# **Lip Sync with Face Restoration**

- In this project we use the pretrained Wav2Lip (lip syncing) and GFPGAN (face restoration) models to generate a high-quality lip-synced video.
- Wav2Lip takes an input image and an audio clip and generates a video in which the face in the image lip-syncs to the audio.
- GFPGAN then enhances the generated video frames: it restores the face, and Real-ESRGAN (used as GFPGAN's background upsampler) enhances the background, producing a high-quality video.

### Setting Up the Environment

Here we install Miniconda and then use conda to install Python 3.7 (the Miniconda 4.5.4 installer itself ships Python 3.6.5, as the log below shows).

In [None]:
# Download the Miniconda 4.5.4 installer and install it into /usr/local
!wget -q https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
!chmod +x Miniconda3-4.5.4-Linux-x86_64.sh
!bash ./Miniconda3-4.5.4-Linux-x86_64.sh -b -f -p /usr/local

# Upgrade the Conda environment to Python 3.7 and install pip
!conda install -q -y --prefix /usr/local python=3.7 pip

# Make packages installed in Conda's Python 3.7 site-packages importable in this session
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

PREFIX=/usr/local
installing: python-3.6.5-hc3d631a_2 ...
Python 3.6.5 :: Anaconda, Inc.
installing: ca-certificates-2018.03.07-0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
installing: libgcc-ng-7.2.0-hdf63c60_3 ...
installing: libstdcxx-ng-7.2.0-hdf63c60_3 ...
installing: libffi-3.2.1-hd88cf55_4 ...
installing: ncurses-6.1-hf484d3e_0 ...
installing: openssl-1.0.2o-h20670df_0 ...
installing: tk-8.6.7-hc745277_3 ...
installing: xz-5.2.4-h14c3975_4 ...
installing: yaml-0.1.7-had09818_2 ...
installing: zlib-1.2.11-ha838bed_2 ...
installing: libedit-3.1.20170329-h6b74fdf_2 ...
installing: readline-7.0-ha6073c6_4 ...
installing: sqlite-3.23.1-he433501_0 ...
installing: asn1crypto-0.24.0-py36_0 ...
installing: certifi-2018.4.16-py36_0 ...
installing: chardet-3.0.4-py36h0f667ec_1 ...
installing: idna-2.6-py36h82fb2a8_1 ...
installing: pycosat-0.6.3-py36h0a5515d_0 ...
installing: pycparser-2.18-py36hf9f622e_1 ...
installing: pysocks-1.6.8-py36_0 ...
installing: ruamel_yaml-0.15.37-py36h14c

In [None]:
# check python version
!python --version

Python 3.7.13


In [None]:
# Clone the source code and the data samples we will run the models on
!git clone https://github.com/MarwanMohamed95/LipSync-with-Face-Restoration.git

Cloning into 'LipSync-with-Face-Restoration'...
remote: Enumerating objects: 171, done.
remote: Counting objects: 100% (171/171), done.
remote: Compressing objects: 100% (137/137), done.
remote: Total 171 (delta 26), reused 162 (delta 17), pack-reused 0
Receiving objects: 100% (171/171), 7.00 MiB | 14.38 MiB/s, done.
Resolving deltas: 100% (26/26), done.


# 1. Lip Syncing

### Downloading the Wav2Lip and face detector models

In [None]:
# Download the Wav2Lip GAN checkpoint into the Wav2Lip/checkpoints directory
wav2lip_model = 'https://iiitaphyd-my.sharepoint.com/personal/radrabha_m_research_iiit_ac_in/_layouts/15/download.aspx?share=EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA'
wav2lip_dir = '/content/LipSync-with-Face-Restoration/Wav2Lip/checkpoints/wav2lip_gan.pth'
!wget $wav2lip_model -O $wav2lip_dir

--2024-01-23 21:09:49--  https://iiitaphyd-my.sharepoint.com/personal/radrabha_m_research_iiit_ac_in/_layouts/15/download.aspx?share=EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA
Resolving iiitaphyd-my.sharepoint.com (iiitaphyd-my.sharepoint.com)... 13.107.136.10, 13.107.138.10, 2620:1ec:8f8::10, ...
Connecting to iiitaphyd-my.sharepoint.com (iiitaphyd-my.sharepoint.com)|13.107.136.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 435801865 (416M) [application/octet-stream]
Saving to: ‘/content/LipSync-with-Face-Restoration/Wav2Lip/checkpoints/wav2lip_gan.pth’


2024-01-23 21:10:26 (11.5 MB/s) - ‘/content/LipSync-with-Face-Restoration/Wav2Lip/checkpoints/wav2lip_gan.pth’ saved [435801865/435801865]



In [None]:
# Download the S3FD face detector weights into the Wav2Lip/face_detection/detection/sfd directory
face_detector_url = "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth"
face_detector_dir = "/content/LipSync-with-Face-Restoration/Wav2Lip/face_detection/detection/sfd/s3fd.pth"
!wget $face_detector_url -O $face_detector_dir

--2024-01-23 21:10:26--  https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth
Resolving www.adrianbulat.com (www.adrianbulat.com)... 45.136.29.207
Connecting to www.adrianbulat.com (www.adrianbulat.com)|45.136.29.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89843225 (86M) [application/octet-stream]
Saving to: ‘/content/LipSync-with-Face-Restoration/Wav2Lip/face_detection/detection/sfd/s3fd.pth’


2024-01-23 21:10:35 (11.9 MB/s) - ‘/content/LipSync-with-Face-Restoration/Wav2Lip/face_detection/detection/sfd/s3fd.pth’ saved [89843225/89843225]
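Before running inference it can help to confirm that both checkpoints downloaded completely, since a failed download from a hosted link can leave a small error or login page saved under the `.pth` name. The sketch below compares file sizes against the `Length:` values printed in the wget logs above; `check_download` and `EXPECTED_SIZES` are hypothetical helpers, and the sizes are only those observed in this run (the hosted files could change).

```python
import os

# Sizes in bytes taken from the "Length:" lines of the wget logs in this notebook run
EXPECTED_SIZES = {
    'wav2lip_gan.pth': 435801865,
    's3fd.pth': 89843225,
}

def check_download(file_path, expected_bytes):
    """Raise if a downloaded checkpoint is missing or its size differs from the expected one."""
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f'{file_path} was not downloaded')
    actual = os.path.getsize(file_path)
    if actual != expected_bytes:
        # A much smaller file often means an HTML error/login page was saved instead of the weights
        raise ValueError(f'{file_path}: expected {expected_bytes} bytes, got {actual}')
    return actual
```

Usage: `check_download(wav2lip_dir, EXPECTED_SIZES['wav2lip_gan.pth'])` and `check_download(face_detector_dir, EXPECTED_SIZES['s3fd.pth'])`.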



### Installing required packages

In [None]:
!cd LipSync-with-Face-Restoration && pip install -r requirements.txt

Collecting librosa==0.7.0
  Downloading librosa-0.7.0.tar.gz (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 18.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting numpy==1.17.1
  Downloading numpy-1.17.1-cp37-cp37m-manylinux1_x86_64.whl (20.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.3/20.3 MB 1.3 MB/s eta 0:00:00
Collecting opencv-contrib-python==4.2.0.34
  Downloading opencv_contrib_python-4.2.0.34-cp37-cp37m-manylinux1_x86_64.whl (34.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.2/34.2 MB 2.2 MB/s eta 0:00:00
Collecting opencv-python==4.1.0.25
  Downloading opencv_python-4.1.0.25-cp37-cp37m-manylinux1_x86_64.whl (26.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.6/26.6 MB 5.3 MB/s eta 0:00:00
Collecting tqdm==4.45.0
  Downloa

### Importing required libraries

In [None]:
import cv2
from tqdm import tqdm
from os import path
import os

### Inference

We apply the Wav2Lip model to sync the lip movements of the face in the image with the input audio. It takes the image and the audio as inputs and generates a lip-synced video as output.

In [None]:
input_image = '/content/LipSync-with-Face-Restoration/data_samples/test_image.png'
input_voice = '/content/LipSync-with-Face-Restoration/data_samples/English_audio.wav'
results_path = '/content/LipSync-with-Face-Restoration/results'
low_quality_video = os.path.join(results_path, 'low_quality_video.mp4')

In [None]:
!cd LipSync-with-Face-Restoration/Wav2Lip && python3.7 inference.py --checkpoint_path checkpoints/wav2lip_gan.pth \
--face $input_image \
--audio $input_voice \
--outfile $low_quality_video

Using cuda for inference.
Number of frames available for inference: 1
(80, 2815)
Length of mel chunks: 876
  0% 0/7 [00:00<?, ?it/s]
  0% 0/1 [00:00<?, ?it/s]
100% 1/1 [00:01<00:00,  1.53s/it]
Load checkpoint from: checkpoints/wav2lip_gan.pth
Model loaded
100% 7/7 [00:17<00:00,  2.54s/it]
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-l
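The log above prints the mel spectrogram shape `(80, 2815)` and 876 mel chunks. The sketch below mirrors, as we understand it, the chunking loop in Wav2Lip's `inference.py`; it assumes the spectrogram has 80 columns per second of audio (16 kHz audio with a hop of 200 samples), that each video frame gets a 16-column window, and that consecutive frames start `80 / fps` columns apart. `wav2lip_mel_chunks` is a hypothetical helper, not part of the repository. Because only one frame is available (a still image), that frame is reused for every chunk, so the chunk count is also the number of output video frames.

```python
import math

def wav2lip_mel_chunks(num_mel_frames, fps=25.0, mel_step_size=16):
    """Return (start, end) mel-column ranges, one per generated video frame.

    Sketch of Wav2Lip's chunking loop (assumed): frame i starts at column
    int(i * 80 / fps); the final window is aligned to the end of the spectrogram.
    """
    mel_idx_multiplier = 80.0 / fps
    chunks = []
    i = 0
    while True:
        start = int(i * mel_idx_multiplier)
        if start + mel_step_size > num_mel_frames:
            chunks.append((num_mel_frames - mel_step_size, num_mel_frames))
            break
        chunks.append((start, start + mel_step_size))
        i += 1
    return chunks

chunks = wav2lip_mel_chunks(2815)      # 2815 mel columns, from the (80, 2815) line in the log
print(len(chunks))                     # 876 video frames
print(math.ceil(len(chunks) / 128))    # 7 batches, assuming --wav2lip_batch_size 128
```

At 25 fps this reproduces the 876 chunks reported above, which also matches the 876 frames read back from the video later (876 / 25 ≈ 35 s of audio). The 7 iterations in the progress bar are consistent with a batch size of 128, which we assume is Wav2Lip's default.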

### Showing the output video (low quality, before face restoration)

In [None]:
from IPython.display import HTML
from base64 import b64encode

# Read the video and embed it in the page as a base64 data URL
with open(low_quality_video, 'rb') as f:
  mp4 = f.read()
decoded_vid = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f'<video width=600 controls><source src="{decoded_vid}" type="video/mp4"></video>')
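Since the same display step is useful again for the final video, it can be wrapped in a small function. The sketch below is a hypothetical helper, `video_html`, that reads an MP4 file and returns the `<video>` tag as a string; it quotes the `src` attribute because base64 padding (`=`) is not valid in an unquoted HTML attribute value.

```python
from base64 import b64encode

def video_html(video_path, width=600):
    """Return an HTML <video> tag that embeds an MP4 file as a base64 data URL."""
    with open(video_path, 'rb') as f:
        data_url = 'data:video/mp4;base64,' + b64encode(f.read()).decode('ascii')
    # Quote src: characters such as '=' are not allowed in unquoted attribute values
    return (f'<video width={int(width)} controls>'
            f'<source src="{data_url}" type="video/mp4"></video>')
```

Usage: `HTML(video_html(finalProcessedOuputVideo))` once the final video exists. Base64 inflates the payload by about a third, so long videos make the notebook noticeably larger.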

# 2. Face Restoration

### Installing GFPGAN and downloading its pretrained face restoration model

In [None]:
gfpgan_url = 'https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth'
gfpgan_dir = '/content/LipSync-with-Face-Restoration/GFPGAN/experiments/pretrained_models'
!cd /content/LipSync-with-Face-Restoration/GFPGAN && python setup.py develop
!wget $gfpgan_url -P $gfpgan_dir

running develop
running egg_info
creating gfpgan.egg-info
writing gfpgan.egg-info/PKG-INFO
writing dependency_links to gfpgan.egg-info/dependency_links.txt
writing requirements to gfpgan.egg-info/requires.txt
writing top-level names to gfpgan.egg-info/top_level.txt
writing manifest file 'gfpgan.egg-info/SOURCES.txt'
reading manifest file 'gfpgan.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'gfpgan.egg-info/SOURCES.txt'
running build_ext
Creating /usr/local/lib/python3.7/site-packages/gfpgan.egg-link (link to .)
Adding gfpgan 1.3.8 to easy-install.pth file

Installed /content/LipSync-with-Face-Restoration/GFPGAN
Processing dependencies for gfpgan==1.3.8
Searching for yapf==0.40.2
Best match: yapf 0.40.2
Adding yapf 0.40.2 to easy-install.pth file
Installing yapf script to /usr/local/bin
Installing yapf-diff script to /usr/local/bin

Using /usr/local/lib/python3.7/site-packages
Searching for tqdm==4.45.0
Best match: tqd

### Saving the lip-sync video frames to the `frames` directory

In [None]:
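unProcessedFrames_dir = os.path.join(results_path, 'frames')
os.makedirs(unProcessedFrames_dir, exist_ok=True)

# Open the Wav2Lip output and read its frame count and frame rate
vidcap = cv2.VideoCapture(low_quality_video)
numberOfFrames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = vidcap.get(cv2.CAP_PROP_FPS)
print("FPS: ", fps, "Frames: ", numberOfFrames)

# Save each frame as a zero-padded JPEG (0000.jpg, 0001.jpg, ...) so the files sort in frame order
for frameNumber in tqdm(range(numberOfFrames)):
    success, image = vidcap.read()
    if not success:  # the reported frame count can be an estimate; stop at the real end of the stream
        break
    cv2.imwrite(path.join(unProcessedFrames_dir, str(frameNumber).zfill(4) + '.jpg'), image)
vidcap.release()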

FPS:  25.0 Frames:  876


100%|██████████| 876/876 [00:07<00:00, 118.59it/s]


In [None]:
enhanced_frames = os.path.join(results_path, 'enhanced_frames')
os.makedirs(enhanced_frames, exist_ok=True)

### Applying the GFPGAN model to the frames in the `frames` directory and saving the high-quality frames to the `enhanced_frames` directory

In [None]:
!cd /content/LipSync-with-Face-Restoration/GFPGAN && python inference_gfpgan.py -i $unProcessedFrames_dir -o $enhanced_frames -v 1.3 -s 2 --bg_upsampler realesrgan

Downloading: "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth" to /usr/local/lib/python3.7/site-packages/weights/RealESRGAN_x2plus.pth

100% 64.0M/64.0M [00:00<00:00, 68.1MB/s]
  f"The parameter '{pretrained_param}' is deprecated since 0.13 and may be removed in the future, "
Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth" to /content/LipSync-with-Face-Restoration/GFPGAN/gfpgan/weights/detection_Resnet50_Final.pth

100% 104M/104M [00:01<00:00, 72.9MB/s]
Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth" to /content/LipSync-with-Face-Restoration/GFPGAN/gfpgan/weights/parsing_parsenet.pth

100% 81.4M/81.4M [00:01<00:00, 70.7MB/s]
Processing 0000.jpg ...
	Tile 1/4
	Tile 2/4
	Tile 3/4
	Tile 4/4
Processing 0001.jpg ...
	Tile 1/4
	Tile 2/4
	Tile 3/4
	Tile 4/4
Processing 0002.jpg ...
	Tile 1/4
	Tile 2/4
	Tile 3/4
	Tile 4/4
Processing 0003.jpg ...
	Tile 1
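The `Tile 1/4 ... Tile 4/4` lines come from Real-ESRGAN, which upsamples the background in tiles to limit GPU memory. The sketch below assumes the tiling rule `ceil(width / tile) * ceil(height / tile)` and GFPGAN's default `--bg_tile` of 400 px, and ignores the small padding applied before tiling; `realesrgan_tile_count` is a hypothetical helper. Under these assumptions, any frame with both dimensions between 401 and 800 px is split into 2 × 2 = 4 tiles.

```python
import math

def realesrgan_tile_count(width, height, tile_size=400):
    """Estimate how many tiles Real-ESRGAN splits one input frame into (sketch)."""
    if tile_size <= 0:
        return 1  # a tile size of 0 disables tiling: the frame is processed in one pass
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)

print(realesrgan_tile_count(720, 480))   # hypothetical 720x480 frame: 2 x 2 tiles
print(realesrgan_tile_count(1280, 720))  # hypothetical 1280x720 frame: 4 x 2 tiles
```

More tiles mean more, smaller forward passes per frame; raising `--bg_tile` trades GPU memory for fewer passes.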

### Converting the restored (high-quality) frames to a video at 25 fps, the same frame rate as the video generated by the Wav2Lip model

In [None]:
# Define paths
restoredFramesPath = os.path.join(enhanced_frames, 'restored_imgs')
high_quality_video = os.path.join(results_path, 'high_quality_video.mp4')

# Zero-padded names (0000.jpg, ...) make lexicographic order equal frame order
dir_list = sorted(os.listdir(restoredFramesPath))

out = None
for filename in tqdm(dir_list):
  img = cv2.imread(os.path.join(restoredFramesPath, filename))
  if img is None:  # skip anything that is not a readable image
    continue
  if out is None:
    # Size the writer from the first restored frame (GFPGAN -s 2 upscales the frames)
    height, width = img.shape[:2]
    # 'mp4v' is the FourCC OpenCV's FFmpeg backend accepts for .mp4 files ('DIVX' triggers a fallback warning)
    out = cv2.VideoWriter(high_quality_video, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))
  out.write(img)  # stream frames to disk instead of holding them all in memory

if out is not None:
  out.release()

100%|██████████| 876/876 [00:13<00:00, 66.05it/s]
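If GFPGAN skipped or failed on any frame, the assembled video would be shorter than the audio and the lips would drift out of sync after the gap. The sketch below compares the extracted frames with the restored ones; it assumes GFPGAN keeps each input's base name (for example `0000.jpg` → `0000.jpg`) when no `--suffix` is given, and compares names without extensions in case the output format differs. `missing_frames` is a hypothetical helper.

```python
import os

def missing_frames(frames_dir, restored_dir):
    """Return sorted frame names (without extension) present in frames_dir but not in restored_dir."""
    def stems(directory):
        return {os.path.splitext(name)[0] for name in os.listdir(directory)}
    return sorted(stems(frames_dir) - stems(restored_dir))
```

Usage: `missing = missing_frames(unProcessedFrames_dir, restoredFramesPath)`; an empty list means every extracted frame has a restored counterpart.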


### Merging the output high quality video with the audio

In [None]:
finalProcessedOuputVideo = os.path.join(results_path, 'final_video.mp4')
!ffmpeg -y -r $fps -i {high_quality_video} -i {input_voice} -map 0 -map 1:a -c:v copy -shortest {finalProcessedOuputVideo}
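The merge step can also be run from Python without the `!` shell magic. The sketch below builds the same ffmpeg argument list as the command above; `build_mux_command` is a hypothetical helper, and its `reencode_h264` switch is an optional variation (not used in this notebook) that re-encodes to H.264 with `libx264` and `yuv420p` instead of copying the MPEG-4 video stream, which some browsers may not play inline. Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.

```python
def build_mux_command(video_path, audio_path, output_path, fps, reencode_h264=False):
    """Build the ffmpeg argv that merges the restored video with the input audio (sketch)."""
    cmd = ['ffmpeg', '-y', '-r', str(fps),
           '-i', video_path,      # input 0: restored video (no audio)
           '-i', audio_path,      # input 1: original speech audio
           '-map', '0', '-map', '1:a']
    if reencode_h264:
        # Optional: re-encode to H.264 for broader browser playback
        cmd += ['-c:v', 'libx264', '-pix_fmt', 'yuv420p']
    else:
        cmd += ['-c:v', 'copy']   # keep the video stream as-is, like the notebook command
    cmd += ['-shortest', output_path]  # stop at the end of the shorter input
    return cmd
```

Usage: `subprocess.run(build_mux_command(high_quality_video, input_voice, finalProcessedOuputVideo, fps), check=True)` after `import subprocess`.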