# Lip-sync videos using Wav2Lip

- 📁Official repo - https://github.com/Rudrabha/Wav2Lip
- 🌐Website - http://bhaasha.iiit.ac.in/lipsync/
- 📄Paper - https://arxiv.org/abs/2008.10010


In [1]:
# Mount the Drive
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# Install the required libraries & clone the Wav2Lip repo (justinjohn0306 fork)
!pip install -q pytube
!pip install ffmpeg-python mediapipe==0.8.11
!git clone https://github.com/justinjohn0306/Wav2Lip /content/drive/MyDrive/Wav2Lip
%cd /content/drive/MyDrive/Wav2Lip/


Cloning into '/content/drive/MyDrive/Wav2Lip'...
remote: Enumerating objects: 502, done.
remote: Counting objects: 100% (81/81), done.
remote: Compressing objects: 100% (72/72), done.
remote: Total 502 (delta 23), reused 54 (delta 7), pack-reused 421
Receiving objects: 100% (502/502), 29.76 MiB | 16.74 MiB/s, done.
Resolving deltas: 100% (256/256), done.
/content/drive/MyDrive/Wav2Lip


In [3]:
# Download the pre-trained models' weights
!wget 'https://github.com/justinjohn0306/Wav2Lip/releases/download/models/wav2lip.pth' -O 'checkpoints/wav2lip.pth'
!wget 'https://github.com/justinjohn0306/Wav2Lip/releases/download/models/wav2lip_gan.pth' -O 'checkpoints/wav2lip_gan.pth'
!wget 'https://github.com/justinjohn0306/Wav2Lip/releases/download/models/resnet50.pth' -O 'checkpoints/resnet50.pth'
!wget 'https://github.com/justinjohn0306/Wav2Lip/releases/download/models/mobilenet.pth' -O 'checkpoints/mobilenet.pth'
a = !pip install https://raw.githubusercontent.com/AwaleSajil/ghc/master/ghc-1.0-py3-none-any.whl
!pip install git+https://github.com/elliottzheng/batch-face.git@master


--2023-09-15 07:53:28--  https://github.com/justinjohn0306/Wav2Lip/releases/download/models/wav2lip.pth
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/615543729/e18ec62e-10ae-4c65-9862-1c7a0fafe228?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230915%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230915T075329Z&X-Amz-Expires=300&X-Amz-Signature=b3d0949ed662f95c03bb9835bac91e5d989b290151b199660f3e32635fafcdbf&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=615543729&response-content-disposition=attachment%3B%20filename%3Dwav2lip.pth&response-content-type=application%2Foctet-stream [following]
--2023-09-15 07:53:29--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/615543729/e18ec62e-10ae-4c65-9862-1c7a0fafe228?X-Amz-Algo

In [4]:
# # Install the requirements
# %cd /content/drive/MyDrive/Wav2Lip/
# !pip install -r requirements.txt


In [5]:
# Downloading the target YouTube video using pytube
# URL - https://www.youtube.com/watch?v=YMuuEv37s0o
import pytube
yt_url = input("Enter the YouTube URL: ")
video = pytube.YouTube(yt_url)
streams = video.streams
stream = streams.filter(res='720p').first()
stream.download(output_path='/content/drive/MyDrive/Wav2Lip/input_videos/')


Enter the YouTube URL: https://www.youtube.com/watch?v=YMuuEv37s0o


'/content/drive/MyDrive/Wav2Lip/input_videos/TechNews 1562  IPL Final OLA Prime Plus Love scam Motorola Edge 40 Premium TV Days Etc.mp4'

**Trim the original video so that every frame contains a face; the trimmed clip is the input fed to the model.**
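
The trimming step itself is not shown in the notebook. As a minimal sketch, assuming `ffmpeg` is available on the PATH, the cut could be expressed as a command like the one built below. The helper name `build_trim_cmd`, the source filename, and the start/duration values are hypothetical; only the `trimmed_input_video.mp4` name comes from the notebook.

```python
import subprocess

def build_trim_cmd(src, dst, start_s, duration_s):
    """Return an ffmpeg argument list that keeps duration_s seconds of src from start_s."""
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", src,
        "-ss", str(start_s),      # where the kept segment starts, in seconds
        "-t", str(duration_s),    # how long the kept segment is, in seconds
        dst,                      # no "-c copy": ffmpeg re-encodes, so the cut is not tied to keyframes
    ]

# Hypothetical values: keep 120 s starting 10 s into the downloaded video.
cmd = build_trim_cmd(
    "input_videos/original.mp4",
    "input_videos/trimmed_input_video.mp4",
    start_s=10,
    duration_s=120,
)
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
```

Building the command as a list rather than a single string avoids quoting problems with filenames that contain spaces, such as the downloaded video's title.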


In [6]:
# Uploaded the trimmed video in the "input_videos" folder
import os
os.listdir('/content/drive/MyDrive/Wav2Lip/input_videos/')


['TechNews 1562  IPL Final OLA Prime Plus Love scam Motorola Edge 40 Premium TV Days Etc.mp4',
 'trimmed_input_video.mp4']

**The audio must be trimmed so that the video and the audio have the same length.**
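
One way to meet this requirement is to cut the audio to the video's duration, measured in samples. The following is a minimal sketch that assumes the video duration is already known. `trim_to_duration` is a hypothetical helper, not part of Wav2Lip; it works on any sliceable sequence of mono samples, including the array returned by `librosa.load`.

```python
def trim_to_duration(samples, sample_rate, duration_s):
    """Keep only the first duration_s seconds of a mono sample sequence."""
    n_keep = int(round(duration_s * sample_rate))
    return samples[:n_keep]

# Toy example: 10 samples at 4 Hz is 2.5 s of audio; keep the first 1.5 s.
samples = list(range(10))
trimmed = trim_to_duration(samples, sample_rate=4, duration_s=1.5)
print(len(trimmed))  # 6
```

If the requested duration is longer than the audio, slicing simply returns the whole sequence, so the helper never pads.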


In [7]:
# Uploaded the trimmed audio data in the "input_audios" folder
os.listdir('/content/drive/MyDrive/Wav2Lip/input_audios/')


['output10.wav', 'trimmed_input_audio.wav']

In [8]:
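# Re-save the audio as a WAV file; sr=None keeps the native sample rate (no resampling),
# and librosa.load mixes the signal down to mono by default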
import librosa
import soundfile as sf

TRIMMED_AUDIO_PATH = '/content/drive/MyDrive/Wav2Lip/input_audios/output10.wav'
audio, sr = librosa.load(TRIMMED_AUDIO_PATH, sr=None)
sf.write('/content/drive/MyDrive/Wav2Lip/input_audios/trimmed_input_audio.wav',
         audio, sr, format='wav')
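
To confirm that a written file has the expected length, its duration can be read back from the header. This is a minimal sketch using the standard-library `wave` module, which assumes the file is PCM-encoded WAV. The helper name `wav_duration_seconds` and the throwaway demo file are hypothetical.

```python
import os
import tempfile
import wave

def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

# Demo on a throwaway file: 1 second of 16-bit mono silence at 16 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000)

print(wav_duration_seconds(demo_path))  # 1.0
```

Calling the helper on `trimmed_input_audio.wav` and comparing the result with the trimmed video's duration is a quick check before running inference.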


In [9]:
# @title Let's lip-sync the video using the pre-trained models
# @markdown <b>Note: Only change these, if you have to</b>

%cd /content/drive/MyDrive/Wav2Lip

# Set up paths and variables for the output file
output_file_path = '/content/drive/MyDrive/Wav2Lip/results/result_voice.mp4'

# Delete existing output file before processing, if any
if os.path.exists(output_file_path):
    os.remove(output_file_path)

pad_top = 0  # @param {type:"integer"}
pad_bottom = 10  # @param {type:"integer"}
pad_left = 0  # @param {type:"integer"}
pad_right = 0  # @param {type:"integer"}
rescaleFactor = 1  # @param {type:"integer"}
nosmooth = False  # @param {type:"boolean"}
# @markdown ___
# @markdown Model selection:
use_hd_model = True  # @param {type:"boolean"}
checkpoint_path = 'checkpoints/wav2lip.pth' if not use_hd_model else 'checkpoints/wav2lip_gan.pth'


if not nosmooth:
    !python inference.py --checkpoint_path $checkpoint_path --face "/content/drive/MyDrive/Wav2Lip/input_videos/trimmed_input_video.mp4" --audio "/content/drive/MyDrive/Wav2Lip/input_audios/trimmed_input_audio.wav" --pads $pad_top $pad_bottom $pad_left $pad_right --resize_factor $rescaleFactor
else:
    !python inference.py --checkpoint_path $checkpoint_path --face "/content/drive/MyDrive/Wav2Lip/input_videos/trimmed_input_video.mp4" --audio "/content/drive/MyDrive/Wav2Lip/input_audios/trimmed_input_audio.wav" --pads $pad_top $pad_bottom $pad_left $pad_right --resize_factor $rescaleFactor --nosmooth


/content/drive/MyDrive/Wav2Lip
Using cuda for inference.
Load checkpoint from: checkpoints/wav2lip_gan.pth
Models loaded
Reading video frames...
Number of frames available for inference: 3109
(80, 5386)
Length of mel chunks: 3358
  0% 0/27 [00:00<?, ?it/s]face detect time: 93.737557888031
100% 27/27 [02:04<00:00,  4.60s/it]
wav2lip prediction time: 124.14525055885315
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-l