## Wav2Lip on CPU - Google Colab (Free Tier)

This notebook allows you to run Wav2Lip inference on a CPU, making it compatible with Google Colab's free tier. It includes text-to-speech using gTTS to generate audio from text, which is then used to animate a static face image.

### 1. Setup Environment and Install Dependencies

This cell will:
- Clone the Wav2Lip repository.
- Install pinned versions of PyTorch (CPU-only build) and librosa, together with a matching numba release.
- Install other necessary Python packages (gTTS for text-to-speech, opencv-python for image/video processing, etc.).
- Download the pre-trained Wav2Lip model checkpoint.
- Download a sample avatar image.
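
After the setup cell has run, it can be worth confirming that the checkpoint really downloaded: a sharing link that has expired or requires a login may return a small HTML page instead of the model file. The sketch below is one way to check. `looks_like_checkpoint` and its `min_bytes` threshold are hypothetical names chosen for illustration and are not part of Wav2Lip.

```python
import os

def looks_like_checkpoint(path, min_bytes=1_000_000):
    """Return True if `path` exists, has at least `min_bytes` bytes, and is not an HTML page."""
    if not os.path.isfile(path) or os.path.getsize(path) < min_bytes:
        return False
    with open(path, "rb") as f:
        head = f.read(64).lstrip().lower()
    # A failed download from a sharing link is often an HTML error or login page.
    return not (head.startswith(b"<!doctype") or head.startswith(b"<html"))

# Path taken from the setup cell; prints False if the file is missing or suspicious.
print(looks_like_checkpoint("/content/Wav2Lip/checkpoints/wav2lip_gan.pth"))
```

If the check fails, re-running the download or uploading the checkpoint manually is usually quicker than debugging a later `torch.load` error.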

In [None]:
!git clone https://github.com/Rudrabha/Wav2Lip.git

# Install CPU-compatible PyTorch and other dependencies
!pip install torch==1.13.1+cpu torchvision==0.14.1+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
!pip install librosa==0.9.2 numba==0.58.1
!pip install gTTS==2.3.2 opencv-python==4.8.0.76 scipy==1.11.4 tqdm==4.66.1

# Download Wav2Lip pre-trained model
!wget 'https://iiitaphyd-my.sharepoint.com/personal/radrabha_m_research_iiit_ac_in/_layouts/15/download.aspx?share=EAbENTSj11FFp0Q55_iAIVMBcQx28VpVmTuF4h7RnO00rQ' -O '/content/Wav2Lip/checkpoints/wav2lip_gan.pth'

# Download a sample avatar image
!wget 'https://img.freepik.com/free-photo/young-bearded-man-with-striped-shirt_273609-5677.jpg' -O '/content/avatar.jpg'

print("Setup Complete! Wav2Lip repository cloned, dependencies installed, model and sample image downloaded.")

### 2. Import Libraries

Import all the Python libraries that will be used throughout the notebook.
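
Because the setup pins specific package versions, a quick look at what is actually installed can help when an import fails. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is an assumption made for this example.

```python
from importlib import metadata

def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names as used in the pip commands of the setup cell.
print(installed_versions(["torch", "torchvision", "librosa", "numba", "gTTS", "opencv-python"]))
```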

In [None]:
import os
from gtts import gTTS
from IPython.display import Audio, HTML, clear_output
from base64 import b64encode
import cv2
import numpy as np
import subprocess
import torch
import librosa

# Clear output after imports for a cleaner notebook
# clear_output()

print("Libraries imported successfully.")

### 3. Generate Speech from Text using gTTS

This cell defines a function that converts your input text to speech with Google Text-to-Speech (gTTS). gTTS itself produces MP3-encoded audio, so a genuine WAV file requires a conversion step (for example with `ffmpeg`).
The audio is saved to `/content/generated_tts.wav`.
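
Since gTTS produces MP3-encoded audio, the file extension alone does not say what a file contains. The sketch below guesses the container from the first bytes of a file; it is a rough heuristic rather than a full parser, and `sniff_audio_format` and `sniff_file` are hypothetical helper names.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess the container from the first bytes of a file: 'wav', 'mp3', or 'unknown'."""
    # A WAV file is a RIFF container whose form type is 'WAVE'.
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # MP3 files start with an ID3v2 tag or directly with an MPEG frame sync (11 set bits).
    if header[:3] == b"ID3":
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"

def sniff_file(path):
    """Read the first 12 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_audio_format(f.read(12))
```

Calling `sniff_file("/content/generated_tts.wav")` after the TTS cell shows whether the file really is a WAV file.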

In [None]:
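def text_to_speech(text, output_filename="/content/generated_tts.wav"):
    # gTTS always writes MP3 data, so save to an .mp3 file first
    mp3_path = os.path.splitext(output_filename)[0] + ".mp3"
    gTTS(text=text, lang='en').save(mp3_path)
    # Convert to 16 kHz mono WAV, matching the sample rate used in audio.py
    subprocess.run(["ffmpeg", "-y", "-loglevel", "error", "-i", mp3_path,
                    "-ar", "16000", "-ac", "1", output_filename], check=True)
    print(f"Text converted to speech and saved as {output_filename}")
    return output_filename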

# --- Test the TTS --- 
input_text = "Hello, this is a test of the text to speech system for Wav2Lip." # You can change this text
audio_file = text_to_speech(input_text)

# Display audio player in Colab
Audio(audio_file)

### 4. Patch Wav2Lip for CPU and Librosa 0.9.2 Compatibility

The original Wav2Lip code requires some adjustments to run on CPU and with `librosa==0.9.2`.
This cell will overwrite the `Wav2Lip/audio.py` file with a version compatible with our setup.
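
The patched file uses a hop length of 200 samples at 16 kHz, which fixes how many mel frames a clip produces. The sketch below works through that arithmetic, assuming librosa's default centered STFT; the function names are illustrative only.

```python
SAMPLE_RATE = 16000  # hparams['sample_rate']
HOP_LENGTH = 200     # hparams['hop_length']

def frames_per_second(sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """One mel frame is produced every `hop_length` samples."""
    return sample_rate / hop_length

def mel_frames(num_samples, hop_length=HOP_LENGTH):
    """Frame count of a centered STFT: 1 + num_samples // hop_length."""
    return 1 + num_samples // hop_length

three_seconds = 3 * SAMPLE_RATE
print(frames_per_second())        # 80.0
print(mel_frames(three_seconds))  # 241
```

A three-second clip therefore yields a mel spectrogram of shape `(80, 241)`: `num_mels` rows and one column per frame.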

In [None]:
%%writefile /content/Wav2Lip/audio.py
import librosa
import numpy as np
from scipy.io import wavfile
import scipy.signal as sps

hparams = {
    'sample_rate': 16000,
    'preemphasis': 0.97,
    'n_fft': 800,
    'hop_length': 200,
    'win_length': 800,
    'num_mels': 80,
    'fmin': 55,
    'fmax': 7600,
    'ref_db': 20,
    'min_level_db': -100,
    'rescale': True,
    'rescaling_max': 0.999,
    # Mel filters are scaled to be energy-preserving
    'mel_weight_normalize': True, # Use librosa's Slaney (area) normalization
}

def load_wav(path, sr):
    return librosa.load(path, sr=sr)[0]

def save_wav(wav, path, sr):
    wav *= 32767 / max(0.01, np.max(np.abs(wav)))
    #proposed by @dsmiller
    wavfile.write(path, sr, wav.astype(np.int16))

def preemphasis(wav, k, preemphasize=True):
    if preemphasize:
        return sps.lfilter([1, -k], [1], wav)
    return wav

def inv_preemphasis(wav, k, inv_preemphasize=True):
    if inv_preemphasize:
        return sps.lfilter([1], [1, -k], wav)
    return wav

def melspectrogram(wav):
    D = _stft(preemphasis(wav, hparams['preemphasis']))
    S = _amp_to_db(_linear_to_mel(np.abs(D))) - hparams['ref_db']
    if hparams['rescale']:
        S = _normalize(S)
    return S

def _stft(y):
    return librosa.stft(y=y, n_fft=hparams['n_fft'], hop_length=hparams['hop_length'], win_length=hparams['win_length'])

def _linear_to_mel(spectrogram):
    _mel_basis = _build_mel_basis()
    return np.dot(_mel_basis, spectrogram)

def _build_mel_basis():
    # htk=False selects the Slaney mel scale (librosa's default); htk=True would use the HTK formula
    return librosa.filters.mel(sr=hparams['sample_rate'], n_fft=hparams['n_fft'], n_mels=hparams['num_mels'],
                               fmin=hparams['fmin'], fmax=hparams['fmax'], htk=False,
                               norm='slaney' if hparams['mel_weight_normalize'] else None)

def _amp_to_db(x):
    min_level = np.exp(hparams['min_level_db'] / 20 * np.log(10))
    return 20 * np.log10(np.maximum(min_level, x))

def _db_to_amp(x):
    return np.power(10.0, (x * 0.05))

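# Upstream Wav2Lip's hparams.py sets symmetric_mels=True with max_abs_value=4.0,
# so the pretrained checkpoint expects mels scaled to [-4, 4] rather than [0, 1].
MAX_ABS_VALUE = 4.0

def _normalize(S):
    scaled = (S - hparams['min_level_db']) / -hparams['min_level_db']
    return np.clip(2 * MAX_ABS_VALUE * scaled - MAX_ABS_VALUE, -MAX_ABS_VALUE, MAX_ABS_VALUE)

def _denormalize(S):
    scaled = (np.clip(S, -MAX_ABS_VALUE, MAX_ABS_VALUE) + MAX_ABS_VALUE) / (2 * MAX_ABS_VALUE)
    return scaled * -hparams['min_level_db'] + hparams['min_level_db']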

In [None]:
# Keep this in its own cell: everything inside a %%writefile cell is written to the file,
# so leaving the code below in the previous cell would put it inside audio.py.
print("Patched Wav2Lip/audio.py created successfully.")

# Further patch: pin inference.py to the CPU.
def patch_inference_file():
    inference_path = '/content/Wav2Lip/inference.py'
    with open(inference_path, 'r') as f:
        content = f.read()

    # Ensure device is CPU. Both spellings of the device line are tried;
    # str.replace leaves the text untouched when a pattern is absent.
    content = content.replace("device = 'cuda' if torch.cuda.is_available() else 'cpu'",
                              "device = 'cpu'")
    content = content.replace('device = torch.device("cuda" if torch.cuda.is_available() else "cpu")',
                              'device = torch.device("cpu")')
    content = content.replace('model = model.to(device)', 'model = model.to(torch.device("cpu"))')

    # Safeguard only: prefix a face-detection subprocess call with 'python' if one is present.
    content = content.replace("subprocess.call([args.face_detection_script,",
                              "subprocess.call(['python', args.face_detection_script,")

    with open(inference_path, 'w') as f:
        f.write(content)
    print(f"Patched {inference_path} for CPU usage and subprocess calls.")

patch_inference_file()


### 5. Run Wav2Lip Inference

Now, we'll run the Wav2Lip inference script. 
This will take the static image (`/content/avatar.jpg`) and the generated audio (`/content/generated_tts.wav`) to produce a lip-synced video.
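
The inference call can also be assembled as a list of arguments instead of a single shell string, which removes one layer of shell quoting in the notebook. The sketch below is one possible shape for that. `build_wav2lip_command` is a hypothetical helper, and its defaults simply mirror the paths and values used in this notebook.

```python
import shlex

def build_wav2lip_command(face, audio, outfile,
                          checkpoint="/content/Wav2Lip/checkpoints/wav2lip_gan.pth",
                          pads=(0, 10, 0, 0), face_det_batch_size=4, wav2lip_batch_size=32):
    """Return the inference call as a list of arguments (usable without shell=True)."""
    return [
        "python", "/content/Wav2Lip/inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face,
        "--audio", audio,
        "--outfile", outfile,
        "--pads", *(str(p) for p in pads),
        "--face_det_batch_size", str(face_det_batch_size),
        "--wav2lip_batch_size", str(wav2lip_batch_size),
    ]

cmd = build_wav2lip_command("/content/avatar.jpg",
                            "/content/generated_tts.wav",
                            "/content/Wav2Lip/results/result.mp4")
print(shlex.join(cmd))  # shell-safe rendering, useful for logging
# subprocess.run(cmd, check=True)  # pass the list directly, without shell=True
```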

**Important Notes:**
- Ensure the `checkpoint_path` points to the downloaded model.
- `face` is the path to your input image.
- `audio` is the path to your input audio.
- `outfile` is where the result video will be saved.
- The device is chosen inside `inference.py`, which was patched above to use the CPU. No `--device` flag is passed, because the upstream argument parser does not define one and argparse would reject it.
- `--pads` sets the padding around the detected face (top, bottom, left, right). `--face_det_batch_size` and `--wav2lip_batch_size` are lowered to limit memory use on the CPU.
- Wav2Lip calls `ffmpeg` to merge the audio with the generated frames. If you see `ffmpeg` errors, confirm that it is installed and on the `PATH` (for example with `!ffmpeg -version`).
- `inference.py` writes intermediate files to a relative `temp/` directory, so it is run with `/content/Wav2Lip` as the working directory.

In [None]:
# Define file paths
face_image_path = "/content/avatar.jpg"
audio_input_path = "/content/generated_tts.wav"
output_video_path = "/content/Wav2Lip/results/result.mp4" # Save it within the Wav2Lip results folder first
final_output_path = "/content/result.mp4" # Final path for easy access

# Wav2Lip Inference Command
# All input/output paths are absolute; the working directory is set below so that
# inference.py can find its relative temp/ directory.
wav2lip_command = (
    f"python /content/Wav2Lip/inference.py "
    f"--checkpoint_path /content/Wav2Lip/checkpoints/wav2lip_gan.pth "
    f"--face {face_image_path} "
    f"--audio {audio_input_path} "
    f"--outfile {output_video_path} "
    f"--pads 0 10 0 0 " # Adjust padding as needed
    f"--face_det_batch_size 4 " # Lower batch size for CPU
    f"--wav2lip_batch_size 32" # Adjust based on CPU capability
)

print(f"Running Wav2Lip command: {wav2lip_command}")
# Run from the repository root because inference.py uses the relative temp/ directory
subprocess.run(wav2lip_command, shell=True, check=True, cwd="/content/Wav2Lip")

# Move the result to /content for easier access if needed, and to match the plan's output path
if os.path.exists(output_video_path):
    os.rename(output_video_path, final_output_path)
    print(f"Output video saved as {final_output_path}")
else:
    print(f"Error: Output video not found at {output_video_path}")

### 6. Display the Output Video

This cell will display the generated lip-synced video directly in the notebook.
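
Embedding works by turning the video bytes into a base64 `data:` URL that the browser can play without a separate file request. The sketch below isolates that step; `to_data_url` and `video_tag` are hypothetical helper names. Base64 grows the payload by roughly a third, so long videos make the notebook noticeably heavier.

```python
from base64 import b64encode

def to_data_url(raw: bytes, mime: str = "video/mp4") -> str:
    """Encode raw bytes as a data URL with the given MIME type."""
    return f"data:{mime};base64," + b64encode(raw).decode("ascii")

def video_tag(data_url: str, width: int = 400) -> str:
    """Wrap a data URL in a minimal HTML5 <video> element."""
    return f'<video width="{width}" controls><source src="{data_url}" type="video/mp4"></video>'

print(to_data_url(b"abc"))  # data:video/mp4;base64,YWJj
```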

In [None]:
from IPython.display import HTML
from base64 import b64encode

video_path = "/content/result.mp4"

if os.path.exists(video_path):
    with open(video_path, 'rb') as f:
        mp4 = f.read()
    data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
    display(HTML(f'''
    <video width=400 controls>
          <source src="{data_url}" type="video/mp4">
    </video>'''))
    print(f"Displaying video from {video_path}")
else:
    print(f"Video file not found at {video_path}. Please ensure the Wav2Lip script ran successfully.")