# Lip-Syncing Audio to Video

### In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment

<figure>
        <img src="https://ar5iv.labs.arxiv.org/html/2212.04970/assets/x1.png" alt="Audio Art" style="width:800px;height:500px;">
</figure>

### Install and import the necessary libraries

In [1]:
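# Pin librosa to a 0.9.x release, which still accepts the positional arguments Wav2Lip passes to librosa.filters.mel
!pip install librosa==0.9.1
# Fetch the Wav2Lip code and install its dependencies
!git clone https://github.com/zabique/Wav2Lip
!cd Wav2Lip && pip install -r requirements.txt
!pip install ffmpeg-python
!rm -rf /sample_data
!mkdir /sample_data

!pip install https://raw.githubusercontent.com/AwaleSajil/ghc/master/ghc-1.0-py3-none-any.whl
# Download the S3FD face-detector weights to the path Wav2Lip loads them from
!wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" -O "/kaggle/working/Wav2Lip/face_detection/detection/sfd/s3fd.pth"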

from base64 import b64decode
import numpy as np
from scipy.io.wavfile import read as wav_read
import io
import ffmpeg

from IPython.display import clear_output 
clear_output()
print("\nDone")


Done


### List the contents of the directories before inference


In [2]:
print("Contents of /kaggle/working/Wav2Lip:")
!ls /kaggle/working/Wav2Lip
print("Contents of /kaggle/input/generative-ai:")
!ls /kaggle/input/generative-ai/

Contents of /kaggle/working/Wav2Lip:
README.md		filelists	     requirements.txt
audio.py		hparams.py	     requirementsCPU.txt
checkpoints		hq_wav2lip_train.py  results
color_syncnet_train.py	inference.py	     temp
evaluation		models		     wav2lip_train.py
face_detection		preprocess.py
Contents of /kaggle/input/generative-ai:
13_K.mp4  96_E.wav  wav2lip_gan.pth
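The listing above shows the three inputs the inference step depends on. As a minimal sketch, assuming the same paths as this notebook, a quick check like the following can flag a missing or empty input before a long run starts. `missing_inputs` is a hypothetical helper written for illustration, not part of Wav2Lip.

```python
from pathlib import Path

def missing_inputs(paths):
    """Return the paths (as strings) that are not regular files or are empty."""
    missing = []
    for p in map(Path, paths):
        if not p.is_file() or p.stat().st_size == 0:
            missing.append(str(p))
    return missing

# Paths assumed to match the dataset listing in this notebook.
inputs = [
    "/kaggle/input/generative-ai/13_K.mp4",         # face video
    "/kaggle/input/generative-ai/96_E.wav",         # target speech
    "/kaggle/input/generative-ai/wav2lip_gan.pth",  # model checkpoint
]
missing = missing_inputs(inputs)
print("Missing inputs:", missing if missing else "none")
```

Outside Kaggle the paths will not exist, so the check simply reports all three as missing.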


### Run inference separately to capture errors, then list the contents of the results directory



In [3]:
import subprocess

# A `!` shell command never raises on failure, so run via subprocess and check the exit code.
print("Running inference...")
cmd = ('python inference.py --checkpoint_path /kaggle/input/generative-ai/wav2lip_gan.pth '
       '--face "/kaggle/input/generative-ai/13_K.mp4" --audio "/kaggle/input/generative-ai/96_E.wav" '
       '--outfile "results/result_voice.mp4"')
proc = subprocess.run(cmd, shell=True, cwd="/kaggle/working/Wav2Lip",
                      stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
print("Inference output:")
print(proc.stdout)
if proc.returncode == 0:
    print("Inference completed successfully.")
else:
    print(f"Inference failed with exit code {proc.returncode}")

print("Contents of /kaggle/working/Wav2Lip/results after inference:")
!ls /kaggle/working/Wav2Lip/results

Running inference...
Inference output:
  return librosa.filters.mel(hp.sample_rate, hp.n_fft, n_mels=hp.num_mels,
Using cuda for inference.
Reading video frames...
Number of frames available for inference: 777
(80, 3233)
Length of mel chunks: 1007

  0%|          | 0/8 [00:00<?, ?it/s]
  0%|          | 0/49 [00:00<?, ?it/s]
  0%|          | 0/49 [00:04<?, ?it/s]
Recovering from OOM error; New batch size: 8
  0%|          | 0/98 [00:00<?, ?it/s]
  1%|          | 1/98 [03:02<4:54:14, 182.01s/it]
  2%|▏         | 2/98 [03:04<2:01:52, 76.17s/it]
  3%|▎         | 3/98 [03:06<1:07:02, 42.34s/it]
  4%|▍         | 4/98 [03:08<41:25, 26.44s/it]
  5%|▌         | 5/98 [03:10<27:21, 17.65s/it]
  6%|▌         | 6/98 [03:12<18:56, 12.35s/it]
  7%|▋         | 7/98 [03:14<13:38,  8.99s/it]
  8%|▊         | 8/98 [03:16<10:11,  6.79s/it]
  9%|▉         | 9/98 [03:18<07:53,  5.32s/it]
 10%|█         | 10/98 [03:20<06:19,  4.31s/it]
 11%|█         | 11/
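The totals in the log (1007 mel chunks and progress bars over 8, 49, and 98 steps) can be reproduced with a little arithmetic. The sketch below is not the Wav2Lip code itself. It assumes one 16-column mel window per video frame, 80 mel columns per second of audio, a 25 fps video, a face-detection batch size of 16, and a Wav2Lip batch size of 128. None of these values is stated in this notebook; they are assumptions chosen because they match the log. The helper names are hypothetical.

```python
import math

def count_mel_chunks(mel_frames, fps, mel_step_size=16, mel_frames_per_second=80):
    """Count mel windows: one window of `mel_step_size` columns per video frame."""
    stride = mel_frames_per_second / fps  # mel columns advanced per video frame
    count = 0
    i = 0
    while True:
        start = int(i * stride)
        count += 1
        if start + mel_step_size > mel_frames:
            # The last window would overrun, so it is taken from the end instead.
            break
        i += 1
    return count

def num_batches(items, batch_size):
    """Number of batches needed to cover `items` elements."""
    return math.ceil(items / batch_size)

mel_chunks = count_mel_chunks(3233, fps=25)  # 3233 from "(80, 3233)"; fps=25 is assumed
print(mel_chunks)                    # 1007, the "Length of mel chunks" in the log
print(num_batches(mel_chunks, 128))  # 8, the outer progress bar
print(num_batches(777, 16))          # 49, face detection before the OOM error
print(num_batches(777, 8))           # 98, face detection after the batch size dropped to 8
```

Read this way, the `0/49` bar that stalls and restarts as `0/98` is the face detector retrying all 777 frames with half the batch size.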

### Check that the result file exists, then save the video to the desired location



In [4]:
import os
import shutil
from base64 import b64encode
from IPython.display import HTML, display

result_file = '/kaggle/working/Wav2Lip/results/result_voice.mp4'
if os.path.exists(result_file):
    # Copy the result to the top-level working directory.
    output_file = '/kaggle/working/result_voice.mp4'
    shutil.copy(result_file, output_file)
    print(f"Result file saved as {output_file}")

    # Embed the video inline as a base64 data URL.
    with open(result_file, 'rb') as f:
        mp4 = f.read()
    data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
    display(HTML(f"""
    <video width="50%" height="50%" controls>
          <source src="{data_url}" type="video/mp4">
    </video>"""))
else:
    print(f"Result file {result_file} not found.")

Result file saved as /kaggle/working/result_voice.mp4
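Since the point of the exercise is a video whose audio track is the target speech, it can be worth confirming that the saved file contains both a video and an audio stream. The following is a rough sketch using `ffmpeg.probe` from the `ffmpeg-python` package installed in the first cell. It assumes the `ffprobe` binary is available in the environment. `summarize_streams` and `probe_result` are hypothetical helpers, and the example dictionary is hand-written to show the shape of the data, not taken from this run.

```python
def summarize_streams(probe):
    """Map codec_type -> codec_name for each stream in an ffprobe-style dict."""
    return {s.get("codec_type"): s.get("codec_name") for s in probe.get("streams", [])}

def probe_result(path):
    """Probe a media file and summarize its streams (requires ffmpeg-python and ffprobe)."""
    import ffmpeg  # imported lazily so summarize_streams works without it
    return summarize_streams(ffmpeg.probe(path))

# Hand-written example of the structure ffprobe reports (illustrative values only).
example = {"streams": [{"codec_type": "video", "codec_name": "h264"},
                       {"codec_type": "audio", "codec_name": "aac"}]}
print(summarize_streams(example))
```

In the notebook, `probe_result('/kaggle/working/result_voice.mp4')` would be the corresponding call. A result without an `audio` key would suggest the audio was not muxed into the output.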
