<h2><b>Mounting Drive</b></h2>

In [23]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


<h2><b>Installing Requirements</b></h2>

In [24]:
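!pip install librosa==0.8.0
# Note: the pip package "ffmpeg" does not install the ffmpeg program itself; Colab already ships with ffmpeg
!pip install numpy==1.20.1
!pip install opencv-contrib-python
!pip install opencv-python
!pip install torch
!pip install torchvision
!pip install tqdm
!pip install numba==0.48
# pytube and gdown are used by the download cells below
!pip install pytube gdown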



<h2><b>Downloading Video And Audio</b></h2>

In [None]:
from pytube import YouTube

# Progress callback: pytube calls it with (stream, chunk, bytes_remaining)
# and we print a 50-character text progress bar
def progress_func(stream, chunk, bytes_remaining):
    current = (stream.filesize - bytes_remaining) / stream.filesize
    percent = '{0:.1f}'.format(current * 100)
    progress = int(50 * current)
    status = '█' * progress + '-' * (50 - progress)
    print('Downloading: [{0}] {1}%'.format(status, percent), end='\r')

# Link of the source video
youtube_link = 'https://www.youtube.com/watch?v=YMuuEv37s0o'
yt = YouTube(youtube_link, on_progress_callback=progress_func)

# Highest-resolution MP4 stream without audio; .first() alone does not
# guarantee the best resolution, so sort by resolution (descending) first
video_stream = (yt.streams
                .filter(only_video=True, file_extension='mp4')
                .order_by('resolution')
                .desc()
                .first())

# Downloading to the current working directory
video_stream.download()

# Start a new line so the message does not overwrite the progress bar
print("\nDownload finished.")
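The next step expects the downloaded file to be named video.mp4. pytube's Stream.download() returns the path of the saved file, so the rename can be scripted instead of done by hand. rename_download is a hypothetical helper written for this notebook, sketched with the standard library only.

```python
import os
from pathlib import Path

def rename_download(saved_path, target="video.mp4"):
    """Move the downloaded file to `target` in the same directory and return the new path.

    os.replace overwrites an existing `target`, so re-running the cell is safe.
    """
    target_path = Path(saved_path).with_name(target)  # same folder, new file name
    os.replace(saved_path, target_path)
    return target_path

# Usage in the download cell:
# saved = video_stream.download()
# rename_download(saved)
```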

In [None]:
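import gdown

# Direct-download URL (uc?id=<file ID>) of the driving audio on Google Drive
url = 'https://drive.google.com/uc?id=1jhUOAeGw8lPjNf7Q1cIcBOvzE3CJ3gVz'
# Local file name; the Wav2Lip inference cell reads output10.wav
output = 'output10.wav'
gdown.download(url, output, quiet=False)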
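The cell above uses the uc?id=&lt;file ID&gt; form of a Drive URL, while Drive's "share" button gives links like https://drive.google.com/file/d/&lt;ID&gt;/view?usp=sharing. A small converter, sketched here as a hypothetical helper, extracts the ID with a regular expression (recent gdown versions can also do this themselves via a fuzzy=True argument, but check your installed version).

```python
import re

def drive_download_url(share_link):
    """Convert a Google Drive share link into the uc?id=<ID> form used with gdown.

    Handles .../file/d/<ID>/... and ...?id=<ID> links; raises ValueError otherwise.
    """
    m = (re.search(r"/file/d/([A-Za-z0-9_-]+)", share_link)
         or re.search(r"[?&]id=([A-Za-z0-9_-]+)", share_link))
    if not m:
        raise ValueError(f"No Drive file ID found in {share_link!r}")
    return f"https://drive.google.com/uc?id={m.group(1)}"
```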


<h2><b>Once the download finishes, rename the video file to video.mp4. The code below keeps only the frames in which a face is visible, up to 1 minute and 8 seconds of such frames, because Wav2Lip needs a face in every frame.</b></h2>

In [None]:
import cv2
from tqdm import tqdm

# Load OpenCV's bundled Haar cascade for frontal faces
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Open the downloaded (and renamed) video
cap = cv2.VideoCapture('video.mp4')
if not cap.isOpened():
    raise IOError('Could not open video.mp4')

# Read the frame rate and the native frame size from the source video
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Number of frames in 1 minute and 8 seconds (68 s) at this frame rate
duration_frames = int((1 * 60 + 8) * fps)

# VideoWriter does not write frames whose size differs from frame_size (and raises
# no error), so use the source video's own dimensions instead of a hard-coded size
frame_size = (width, height)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output.mp4', fourcc, fps, frame_size)

# The progress bar counts written (face) frames, since that is what ends the loop
pbar = tqdm(total=duration_frames)

frames_written = 0
while cap.isOpened() and frames_written < duration_frames:
    ret, frame = cap.read()
    if not ret:
        break  # end of the video or a read error

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # scaleFactor=1.3, minNeighbors=5
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    # Keep the frame only if at least one face was detected
    if len(faces) != 0:
        out.write(frame)
        frames_written += 1
        pbar.update(1)

# Close the progress bar and release both files
pbar.close()
cap.release()
out.release()
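The loop above boils down to "keep the first N frames that contain a face", with N = 68 s × fps. Separating that rule from OpenCV makes it testable and lets the Haar cascade be swapped for another detector. frame_budget, read_frames and select_face_frames are hypothetical helpers sketched for this notebook; read_frames only relies on a cv2.VideoCapture-style read() method.

```python
from itertools import islice

def frame_budget(fps, minutes=1, seconds=8):
    """Number of frames in minutes:seconds of video at `fps` (68 s by default)."""
    return int((minutes * 60 + seconds) * fps)

def read_frames(cap):
    """Yield frames from any object whose read() returns (ok, frame), e.g. cv2.VideoCapture."""
    while True:
        ok, frame = cap.read()
        if not ok:
            return
        yield frame

def select_face_frames(frames, has_face, max_frames):
    """Lazily yield the first `max_frames` frames for which has_face(frame) is True."""
    return islice((frame for frame in frames if has_face(frame)), max_frames)

# Usage with the objects from the cell above:
# has_face = lambda f: len(face_cascade.detectMultiScale(
#     cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), 1.3, 5)) != 0
# for frame in select_face_frames(read_frames(cap), has_face, frame_budget(fps)):
#     out.write(frame)
```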

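<h3><b>Now we use FFmpeg to meet Wav2Lip's input requirements:</b></h3>

>ffmpeg -y -i output.mp4 -r 25 input.mp4
>ffmpeg -y -i input.mp4 -vf "scale=1280:-2" output.mp4
<br>

The first command resamples the video to 25 fps; the second scales it to 1280 pixels wide while preserving the aspect ratio. Using -2 instead of -1 for the height makes FFmpeg round it to an even number, which the usual H.264 encoder (libx264 with 4:2:0 chroma) requires, and -y overwrites existing files instead of prompting. A smaller 25 fps video makes Wav2Lip's work easier.

Note: Make sure ffmpeg is installed. Colab includes it by default; on a PC, download it from https://ffmpeg.org/download.html and add its bin folder to the PATH environment variable. The pip package named ffmpeg does not install the ffmpeg program itself.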
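The two FFmpeg steps can also be run from Python, which makes the preprocessing re-runnable in a single cell. This is a sketch: wav2lip_ffmpeg_commands and run_commands are hypothetical helpers, and the -y flag and the even-height scale=1280:-2 are choices made here. Passing argument lists to subprocess.run avoids shell quoting issues.

```python
import shlex
import subprocess

def wav2lip_ffmpeg_commands(src, resampled, scaled, fps=25, width=1280):
    """Build the two ffmpeg steps (resample to `fps`, then scale to `width`) as argument lists.

    -y overwrites existing outputs instead of prompting; scale=WIDTH:-2 keeps the
    aspect ratio and rounds the computed height to an even number.
    """
    return [
        ["ffmpeg", "-y", "-i", src, "-r", str(fps), resampled],
        ["ffmpeg", "-y", "-i", resampled, "-vf", f"scale={width}:-2", scaled],
    ]

def run_commands(commands):
    """Run each command in order; check=True raises CalledProcessError if ffmpeg fails."""
    for cmd in commands:
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)

# Usage (needs the ffmpeg binary on PATH), with the file names from the note above:
# run_commands(wav2lip_ffmpeg_commands("output.mp4", "input.mp4", "output.mp4"))
```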

In [25]:
!git clone https://github.com/Rudrabha/Wav2Lip.git

fatal: destination path 'Wav2Lip' already exists and is not an empty directory.


In [34]:
!ls "/content/gdrive/MyDrive/Colab Notebooks"

output10.wav  Untitled0.ipynb  video.mp4  wav2lip_gan.pth


In [27]:
!cp -ri "/content/gdrive/MyDrive/Colab Notebooks/wav2lip_gan.pth" /content/Wav2Lip/checkpoints/

cp: overwrite '/content/Wav2Lip/checkpoints/wav2lip_gan.pth'? yes


In [28]:
!wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" -O "Wav2Lip/face_detection/detection/sfd/s3fd.pth"

--2023-07-19 04:39:05--  https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth
Resolving www.adrianbulat.com (www.adrianbulat.com)... 45.136.29.207
Connecting to www.adrianbulat.com (www.adrianbulat.com)|45.136.29.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89843225 (86M) [application/octet-stream]
Saving to: ‘Wav2Lip/face_detection/detection/sfd/s3fd.pth’


2023-07-19 04:39:12 (12.4 MB/s) - ‘Wav2Lip/face_detection/detection/sfd/s3fd.pth’ saved [89843225/89843225]
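wget reports the face-detector weights as 89843225 bytes. An interrupted download can leave a truncated s3fd.pth that only fails later, when Wav2Lip loads the detector, so a quick size check is cheap insurance. This sketch takes the expected size from the log above; has_expected_size is a hypothetical helper.

```python
import os

S3FD_PATH = "Wav2Lip/face_detection/detection/sfd/s3fd.pth"
S3FD_EXPECTED_BYTES = 89843225  # size reported by wget in the output above

def has_expected_size(path, expected_bytes):
    """Return True if `path` is an existing file of exactly `expected_bytes` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes

print("s3fd.pth OK" if has_expected_size(S3FD_PATH, S3FD_EXPECTED_BYTES)
      else "s3fd.pth missing or incomplete: re-run the wget cell")
```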



In [29]:
!ls sample_data

anscombe.json		      mnist_test.csv	     README.md
california_housing_test.csv   mnist_train_small.csv  video.mp4
california_housing_train.csv  output10.wav


In [35]:
!cp "/content/gdrive/MyDrive/Colab Notebooks/video.mp4" "/content/gdrive/MyDrive/Colab Notebooks/output10.wav" sample_data/
!ls sample_data/

anscombe.json		      mnist_test.csv	     README.md
california_housing_test.csv   mnist_train_small.csv  video.mp4
california_housing_train.csv  output10.wav


In [36]:
!cd Wav2Lip && python inference.py --checkpoint_path '../Wav2Lip/checkpoints/wav2lip_gan.pth' --face "../sample_data/video.mp4" --audio "../sample_data/output10.wav" --outfile "outputs.mp4"

Using cuda for inference.
Reading video frames...
Number of frames available for inference: 1702
(80, 5386)
Length of mel chunks: 1680
  0% 0/14 [00:00<?, ?it/s]
  0% 0/105 [00:00<?, ?it/s]
 19% 20/105 [01:30<03:43,  2.63s/it]
 20% 21/105 
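The counts in this log follow from the audio length. As we read Wav2Lip's inference.py, it cuts the mel spectrogram (80 mel frames per second, from 16 kHz audio with a hop of 200 samples) into 16-frame windows, one per video frame, trims the video frames to the number of windows, and then processes them in batches of 16 for face detection and 128 for the Wav2Lip model (the defaults of --face_det_batch_size and --wav2lip_batch_size). The sketch below re-implements only this counting under those assumed defaults; it is not the script's own code.

```python
import math

MEL_FRAMES_PER_SEC = 80  # assumed: 16 kHz sample rate / hop size 200
MEL_STEP_SIZE = 16       # assumed: mel frames per chunk fed to the model

def count_mel_chunks(num_mel_frames, fps):
    """Count mel chunks the way the inference loop slices them (re-implemented here)."""
    mel_idx_multiplier = MEL_FRAMES_PER_SEC / fps  # mel frames per video frame
    i = 0
    while True:
        start_idx = int(i * mel_idx_multiplier)
        if start_idx + MEL_STEP_SIZE > num_mel_frames:
            return i + 1  # i full windows so far, plus a final window aligned to the end
        i += 1

chunks = count_mel_chunks(5386, fps=25)  # mel shape (80, 5386) from the log
frames_used = min(1702, chunks)          # video frames are trimmed to the chunk count
print(chunks, frames_used)                                          # → 1680 1680
print(math.ceil(frames_used / 16), math.ceil(frames_used / 128))    # → 105 14
```

With the logged spectrogram this gives 1680 chunks, 105 face-detection batches and 14 Wav2Lip batches, matching the two progress bars; 22 of the 1702 available video frames go unused.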

<h2><b>Checking whether the output file was generated</b></h2>

In [39]:
!ls Wav2Lip

audio.py		filelists	     outputs.mp4       results
checkpoints		hparams.py	     preprocess.py     temp
color_syncnet_train.py	hq_wav2lip_train.py  __pycache__       wav2lip_train.py
evaluation		inference.py	     README.md
face_detection		models		     requirements.txt


<h2><b>Copying the output file to Google Drive</b></h2>

In [42]:
!cp "Wav2Lip/outputs.mp4" "/content/gdrive/MyDrive/Colab Notebooks/"
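The listing check and the copy can be combined so that nothing is copied when inference failed. This is a minimal sketch with the standard library; copy_if_generated is a hypothetical helper, and "non-empty file" is used as a rough stand-in for "generated".

```python
import shutil
from pathlib import Path

def copy_if_generated(src, dest_dir):
    """Copy `src` into `dest_dir` only if it exists and is non-empty; return the copied path."""
    src = Path(src)
    if not src.is_file() or src.stat().st_size == 0:
        raise FileNotFoundError(f"{src} was not generated (missing or empty)")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir / src.name))  # copy2 also keeps timestamps

# Usage in Colab, with Drive mounted as at the top of the notebook:
# copy_if_generated("Wav2Lip/outputs.mp4", "/content/gdrive/MyDrive/Colab Notebooks")
```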

Overall summary:


- The quality of the result depends heavily on the input video file
- Use JDownloader to download the highest quality available, then convert it using ffmpeg
- The input file should be at most 1280 pixels wide, at 25 fps
- The wav2lip_gan.pth checkpoint must be used
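The input requirements in this summary can be checked before starting a long inference run. A sketch under the summary's targets (width at most 1280 px, 25 fps): wav2lip_input_problems and probe_video are hypothetical helpers, the 0.01 fps tolerance is our choice, and OpenCV is imported lazily so the check itself runs without it.

```python
def wav2lip_input_problems(width, fps, max_width=1280, target_fps=25, tol=0.01):
    """List the ways a video misses the summary's targets; an empty list means it passes."""
    problems = []
    if width > max_width:
        problems.append(f"width {width}px exceeds {max_width}px")
    if abs(fps - target_fps) > tol:
        problems.append(f"frame rate {fps:g} fps differs from {target_fps} fps")
    return problems

def probe_video(path):
    """Return (width, fps) of a video as reported by OpenCV."""
    import cv2  # imported here so wav2lip_input_problems works without OpenCV
    cap = cv2.VideoCapture(path)
    try:
        if not cap.isOpened():
            raise IOError(f"cannot open {path}")
        return int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), cap.get(cv2.CAP_PROP_FPS)
    finally:
        cap.release()

# Usage:
# print(wav2lip_input_problems(*probe_video("sample_data/video.mp4")) or "input looks OK")
```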

