# **Lip Sync**

In this notebook we use Wav2Lip, a pretrained lip-syncing model. It takes an input image and an audio file, and generates a video in which the face in the image appears to lip sync to the audio.

### Setting Up the Environment

Here we install Miniconda, which provides Python 3.6, the same version used in the original paper, so that it matches the package versions we will install.

In [1]:
# Install Miniconda
!wget -q https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
!chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local

# Install Conda dependencies
!conda install -q -y --prefix /usr/local python=3.6 pip

# Make the Conda site-packages directory importable from this notebook
import sys
sys.path.append('/usr/local/lib/python3.6/site-packages/')

PREFIX=/usr/local
installing: python-3.6.5-hc3d631a_2 ...
Python 3.6.5 :: Anaconda, Inc.
installing: ca-certificates-2018.03.07-0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
installing: libgcc-ng-7.2.0-hdf63c60_3 ...
installing: libstdcxx-ng-7.2.0-hdf63c60_3 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: ncurses-6.1-hf484d3e_0 ...
installing: openssl-1.0.2o-h20670df_0 ...
installing: tk-8.6.7-hc745277_3 ...
installing: xz-5.2.4-h14c3975_4 ...
installing: yaml-0.1.7-had09818_2 ...
installing: zlib-1.2.11-ha838bed_2 ...
installing: libedit-3.1.20170329-h6b74fdf_2 ...
installing: readline-7.0-ha6073c6_4 ...
installing: sqlite-3.23.1-he433501_0 ...
installing: asn1crypto-0.24.0-py36_0 ...
installing: certifi-2018.4.16-py36_0 ...
installing: chardet-3.0.4-py36h0f667ec_1 ...
installing: idna-2.6-py36h82fb2a8_1 ...
installing: pycosat-0.6.3-py36h0a5515d_0 ...
installing: pycparser-2.18-py36hf9f622e_1 ...
installing: pysocks-1.6.8-py36_0 ...
installing: ruamel_yaml-0.15.37-py36h14c

In [2]:
# check python version
!python --version

Python 3.6.13 :: Anaconda, Inc.
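
Because the pinned packages installed later target Python 3.6, it can help to fail early if the `python` on the path is a different version. The following is a minimal sketch, assuming the executable prints its version in the `Python X.Y.Z` form shown above; `parse_python_version` and `require_python` are hypothetical helper names and are not part of the repository.

```python
import re
import subprocess

def parse_python_version(text):
    """Extract (major, minor, micro) from text such as 'Python 3.6.13 :: Anaconda, Inc.'."""
    match = re.search(r"Python (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no Python version found in: " + repr(text))
    return tuple(int(part) for part in match.groups())

def require_python(expected=(3, 6), executable="python"):
    """Run `<executable> --version` and raise if major.minor differs from `expected`."""
    result = subprocess.run([executable, "--version"],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    version = parse_python_version(result.stdout.decode())
    if version[:2] != tuple(expected):
        raise RuntimeError(
            "expected Python %d.%d, found %d.%d.%d" % (tuple(expected) + version))
    return version

# Parsing the line printed by the cell above:
print(parse_python_version("Python 3.6.13 :: Anaconda, Inc."))
```

Calling `require_python()` in a cell before the `pip install` step would stop the notebook with a clear message instead of a later dependency error.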


In [3]:
# Clone the source code and the data samples the model will be applied to
!git clone https://github.com/MarwanMohamed95/lip-syncing.git

Cloning into 'lip-syncing'...
remote: Enumerating objects: 84, done.
remote: Counting objects: 100% (84/84), done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 84 (delta 13), reused 79 (delta 8), pack-reused 0
Receiving objects: 100% (84/84), 5.11 MiB | 11.99 MiB/s, done.
Resolving deltas: 100% (13/13), done.


### Loading the Models (Wav2Lip and Face Detector)

In [4]:
# Download the Wav2Lip GAN checkpoint into the lip-syncing/checkpoints directory
!wget 'https://iiitaphyd-my.sharepoint.com/personal/radrabha_m_research_iiit_ac_in/_layouts/15/download.aspx?share=EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA' -O '/content/lip-syncing/checkpoints/wav2lip_gan.pth'

--2024-01-18 01:19:17--  https://iiitaphyd-my.sharepoint.com/personal/radrabha_m_research_iiit_ac_in/_layouts/15/download.aspx?share=EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA
Resolving iiitaphyd-my.sharepoint.com (iiitaphyd-my.sharepoint.com)... 13.107.136.10, 13.107.138.10, 2620:1ec:8f8::10, ...
Connecting to iiitaphyd-my.sharepoint.com (iiitaphyd-my.sharepoint.com)|13.107.136.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 435801865 (416M) [application/octet-stream]
Saving to: ‘/content/lip-syncing/checkpoints/wav2lip_gan.pth’


2024-01-18 01:19:23 (79.1 MB/s) - ‘/content/lip-syncing/checkpoints/wav2lip_gan.pth’ saved [435801865/435801865]



In [5]:
# Download the face detector weights into the lip-syncing/face_detection/detection/sfd directory
!wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" -O "lip-syncing/face_detection/detection/sfd/s3fd.pth"

--2024-01-18 01:19:24--  https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth
Resolving www.adrianbulat.com (www.adrianbulat.com)... 45.136.29.207
Connecting to www.adrianbulat.com (www.adrianbulat.com)|45.136.29.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89843225 (86M) [application/octet-stream]
Saving to: ‘lip-syncing/face_detection/detection/sfd/s3fd.pth’


2024-01-18 01:19:30 (15.6 MB/s) - ‘lip-syncing/face_detection/detection/sfd/s3fd.pth’ saved [89843225/89843225]
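
A truncated download would only surface later as a checkpoint loading error, so the byte counts reported by `wget` above can be checked directly. The following is a minimal sketch, assuming the paths used in the two download cells and taking the expected sizes from the `Length:` lines of the logs (they would need updating if the hosted files change); `check_download` is a hypothetical helper.

```python
import os

# Expected sizes in bytes, copied from the "Length:" lines of the wget logs above.
EXPECTED_SIZES = {
    "/content/lip-syncing/checkpoints/wav2lip_gan.pth": 435801865,
    "lip-syncing/face_detection/detection/sfd/s3fd.pth": 89843225,
}

def check_download(path, expected_bytes):
    """Return True if `path` is a file of exactly `expected_bytes` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes

for path, size in EXPECTED_SIZES.items():
    status = "ok" if check_download(path, size) else "missing or incomplete"
    print(path, "->", status)
```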



### Installing Packages

In [6]:
!cd lip-syncing && pip install -r requirements.txt

Collecting librosa==0.7.0
  Downloading librosa-0.7.0.tar.gz (1.6 MB)
     |████████████████████████████████| 1.6 MB 9.6 MB/s
Collecting numpy==1.17.1
  Downloading numpy-1.17.1-cp36-cp36m-manylinux1_x86_64.whl (20.4 MB)
     |████████████████████████████████| 20.4 MB 1.2 MB/s
Collecting opencv-contrib-python==4.2.0.34
  Downloading opencv_contrib_python-4.2.0.34-cp36-cp36m-manylinux1_x86_64.whl (34.2 MB)
     |████████████████████████████████| 34.2 MB 1.2 MB/s
Collecting opencv-python==4.1.0.25
  Downloading opencv_python-4.1.0.25-cp36-cp36m-manylinux1_x86_64.whl (26.6 MB)
     |████████████████████████████████| 26.6 MB 1.3 MB/s
Collecting torch==1.1.0
  Downloading torch-1.1.0-cp36-cp36m-manylinux1_x86_64.whl (676.9 MB)
     |████████████████████████████████| 676.9 MB 5.0 kB/s
Collecting torchvision==0.3.0
  Downloading torchvision-0.3.0-cp36-cp36m-manylinux1_x86_64.whl (2.6 MB)
     |████████████████████████████████| 2.6 MB 43.8 M

### Inference

Lip sync the input image to the English audio file.

In [8]:
!cd lip-syncing && python3.6 inference.py --checkpoint_path checkpoints/wav2lip_gan.pth --face "/content/lip-syncing/data_samples/test_image_2.png" --audio "/content/lip-syncing/data_samples/English_audio.wav" --outfile "/content/lip-syncing/results/English_Result_2.mp4"

Using cuda for inference.
Number of frames available for inference: 1
(80, 2815)
Length of mel chunks: 876
  0% 0/7 [00:00<?, ?it/s]
  0% 0/1 [00:00<?, ?it/s]
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
  0% 0/1 [00:00<?, ?it/s]
Recovering from OOM error; New batch size: 8

  0% 0/1 [00:00<?, ?it/s]
100% 1/1 [00:01<00:00,  1.55s/it]
Load checkpoint from: checkpoints/wav2lip_gan.pth
Model loaded
100% 7/7 [00:14<00:00,  2.06s/it]
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --ena
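
The chunk and batch counts in the log above can be reproduced by hand. The following is a sketch of the arithmetic, assuming this fork keeps the defaults of the upstream Wav2Lip `inference.py`: 25 fps for a still image, 80 mel frames per second of audio, a 16-frame mel window per video frame, and a batch size of 128. None of these values appear in this notebook, so treat them as assumptions; `count_mel_chunks` is a hypothetical name.

```python
import math

def count_mel_chunks(num_mel_frames, fps=25.0, mel_step_size=16,
                     mel_frames_per_second=80.0):
    """Count the mel windows (one per output video frame) cut from a spectrogram."""
    # Number of mel frames the window start advances for each video frame.
    mel_idx_multiplier = mel_frames_per_second / fps
    chunks = 0
    i = 0
    while True:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + mel_step_size > num_mel_frames:
            chunks += 1  # the last window is taken from the end of the spectrogram
            break
        chunks += 1
        i += 1
    return chunks

chunks = count_mel_chunks(2815)      # spectrogram shape (80, 2815) from the log
batches = math.ceil(chunks / 128)    # assumed default batch size
print(chunks, batches)               # 876 7
```

Under these assumptions the result matches the `Length of mel chunks: 876` line and the 7 steps of the outer progress bar, and 876 frames at 25 fps correspond to roughly 35 seconds of video.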

### Showing the results

In [9]:
from base64 import b64encode
from IPython.display import HTML

output_file_name = "/content/lip-syncing/results/English_Result_2.mp4"

# Read the result and embed it in the page as a base64 data URI
with open(output_file_name, 'rb') as f:
    mp4 = f.read()
encoded_vid = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f'<video width=800 controls><source src="{encoded_vid}" type="video/mp4"></video>')

Lip sync the input image to the Arabic audio file.

In [10]:
!cd lip-syncing && python3.6 inference.py --checkpoint_path checkpoints/wav2lip_gan.pth --face "/content/lip-syncing/data_samples/test_image_2.png" --audio "/content/lip-syncing/data_samples/Arabic_audio.wav" --outfile "/content/lip-syncing/results/Arabic_Result_2.mp4"

Using cuda for inference.
Number of frames available for inference: 1
(80, 4025)
Length of mel chunks: 1255
  0% 0/10 [00:00<?, ?it/s]
  0% 0/1 [00:00<?, ?it/s]
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
  0% 0/1 [00:00<?, ?it/s]
Recovering from OOM error; New batch size: 8

  0% 0/1 [00:00<?, ?it/s]
100% 1/1 [00:01<00:00,  1.57s/it]
Load checkpoint from: checkpoints/wav2lip_gan.pth
Model loaded
100% 10/10 [00:15<00:00,  1.55s/it]
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d -

In [11]:
from base64 import b64encode
from IPython.display import HTML

output_file_name = "/content/lip-syncing/results/Arabic_Result_2.mp4"

# Read the result and embed it in the page as a base64 data URI
with open(output_file_name, 'rb') as f:
    mp4 = f.read()
encoded_vid = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f'<video width=800 controls><source src="{encoded_vid}" type="video/mp4"></video>')
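
The two display cells repeat the same steps, so they could be factored into one function. The following is a minimal sketch; `video_html` is a hypothetical helper, and embedding the whole file as a base64 data URI is only practical for short clips such as these.

```python
from base64 import b64encode

def video_html(path, width=800):
    """Return an HTML <video> tag with the MP4 at `path` embedded as a base64 data URI."""
    with open(path, "rb") as f:
        encoded = b64encode(f.read()).decode("ascii")
    return (
        f'<video width="{width}" controls>'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4">'
        "</video>"
    )

# In a notebook cell, wrap the returned string for display:
# from IPython.display import HTML
# HTML(video_html("/content/lip-syncing/results/Arabic_Result_2.mp4"))
```

Returning a plain string keeps the helper independent of IPython, so it can also be used to write a standalone HTML page.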