<a href="https://colab.research.google.com/github/Shivamani162/Generative-AI-2025/blob/main/Virtual_News_Anchor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Step 1: Install dependencies and FFmpeg
!apt-get update
!apt-get install -y ffmpeg

# Step 2: Clone Wav2Lip repository (skip if already cloned)
import os
if not os.path.exists("/content/Wav2Lip"):
    !git clone https://github.com/Rudrabha/Wav2Lip.git /content/Wav2Lip
%cd /content/Wav2Lip

# Step 3: Install pinned versions of the requirements
# The repository's requirements.txt pins versions too old to install on current Colab, so install working ones instead
!pip install librosa==0.8.0  # Wav2Lip's audio.py calls librosa.filters.mel with positional arguments, which librosa>=0.10 rejects
!pip install numpy==1.23.5  # Older NumPy for compatibility with librosa 0.8.0
!pip install opencv-python==4.11.0.86
!pip install gdown  # Optional Google Drive downloader; not used by this cell

# Step 4: Create necessary directories
!mkdir -p face_detection/detection/sfd
!mkdir -p checkpoints

# Step 5: Download face detection model
print("Downloading face detection model...")
!wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" -O "face_detection/detection/sfd/s3fd.pth"
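# wget -O creates the output file even if the download fails, so also check that it is non-empty
if not os.path.exists("face_detection/detection/sfd/s3fd.pth") or os.path.getsize("face_detection/detection/sfd/s3fd.pth") == 0:
    print("Failed to download face detection model. Please download manually from:")
    print("https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth")
    print("Then upload it here, keeping the name 's3fd-619a316812.pth'.")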
    from google.colab import files
    uploaded = files.upload()
    for filename in uploaded.keys():
        print(f"Uploaded file: {filename}")
        if "s3fd-619a316812.pth" in filename.lower():
            !mv "{filename}" "face_detection/detection/sfd/s3fd.pth"
        else:
            print(f"Uploaded file '{filename}' does not match 's3fd-619a316812.pth'. Please upload the correct file.")
            raise FileNotFoundError("Incorrect file uploaded.")

# Step 6: Download the Wav2Lip checkpoint with wget, retrying up to 3 times
checkpoint_path = "checkpoints/wav2lip_gan.pth"
import time

print("Downloading Wav2Lip checkpoint...")
for attempt in range(3):  # Try 3 times
    !wget "https://huggingface.co/Nekochu/Wav2Lip/resolve/main/wav2lip_gan.pth" -O {checkpoint_path}
    # `!wget` never raises a Python exception and `-O` creates the file even on failure,
    # so judge success by size instead (a complete checkpoint is ~433 MB)
    if os.path.exists(checkpoint_path) and os.path.getsize(checkpoint_path) > 400 * 1024 * 1024:
        break
    print(f"Attempt {attempt + 1} failed. Retrying...")
    time.sleep(5)

# Step 7: If download fails, prompt for manual upload
if not os.path.exists(checkpoint_path) or os.path.getsize(checkpoint_path) < 400 * 1024 * 1024:
    print("Automatic download failed. Please manually download 'wav2lip_gan.pth' from one of these links:")
    print("1. Hugging Face: https://huggingface.co/Nekochu/Wav2Lip/resolve/main/wav2lip_gan.pth")
    print("2. Google Drive: https://drive.google.com/uc?id=1Y7nNhfA-5W9kEyX6cWq30BZz7eA2W5h-")
    print("Steps: Open a link in a browser, download the file (should be ~433 MB), save it as 'wav2lip_gan.pth', then upload it here.")
    from google.colab import files
    uploaded = files.upload()
    for filename in uploaded.keys():
        print(f"Uploaded file: {filename}")
        if "wav2lip_gan.pth" in filename.lower():  # Case-insensitive matching
            !mv "{filename}" {checkpoint_path}
            print(f"Moved {filename} to {checkpoint_path}")
        else:
            print(f"Uploaded file '{filename}' does not match 'wav2lip_gan.pth'. Please upload the correct file.")
            raise FileNotFoundError("Incorrect file uploaded.")

# Step 8: Verify the checkpoint file size (should be ~433 MB)
file_size = os.path.getsize(checkpoint_path) / (1024 * 1024)  # Size in MB
print(f"Checkpoint file size: {file_size:.2f} MB")
if file_size < 400:  # If less than 400 MB, it's likely incomplete
    raise FileNotFoundError("Checkpoint file is incomplete. Please rerun this cell or manually upload a valid file.")

print("Wav2Lip setup complete. Please restart the runtime (Runtime > Restart runtime) and then run the next cell.")

Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Hit:5 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,372 kB]
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Get:9 https://r2u.stat.illinois.edu/ubuntu jammy/main amd64 Packages [2,670 kB]
Get:10 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:12 https://r2u.stat.illinois.edu/ubuntu jammy/main all Packages [8,738 kB]
Get:13 http://archive.ubuntu.com/ubuntu jamm

Downloading face detection model...
--2025-03-12 13:53:25--  https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth
Resolving www.adrianbulat.com (www.adrianbulat.com)... 45.136.29.207
Connecting to www.adrianbulat.com (www.adrianbulat.com)|45.136.29.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89843225 (86M) [application/octet-stream]
Saving to: ‘face_detection/detection/sfd/s3fd.pth’


2025-03-12 13:53:30 (19.6 MB/s) - ‘face_detection/detection/sfd/s3fd.pth’ saved [89843225/89843225]

Downloading Wav2Lip checkpoint...
--2025-03-12 13:53:30--  https://huggingface.co/Nekochu/Wav2Lip/resolve/main/wav2lip_gan.pth
Resolving huggingface.co (huggingface.co)... 18.238.109.92, 18.238.109.102, 18.238.109.121, ...
Connecting to huggingface.co (huggingface.co)|18.238.109.92|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.hf.co/repos/77/6e/776eb19ab2111846cf60d972f9cc20dcc659aa064fa57f090d3a5f3e5f60b199/
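
The cell above decides whether a download worked, but `wget -O` creates the output file even when the transfer fails, so the file's size is the more reliable signal. Below is a minimal pure-Python sketch of the same retry-and-verify idea, assuming `urllib.request` is an acceptable substitute for `wget`; the helper name `download_with_retries` and its parameters are hypothetical and not part of Wav2Lip or this notebook.

```python
import os
import time
import urllib.request


def download_with_retries(url, dest, min_bytes, attempts=3, wait_s=5.0):
    """Download url to dest; succeed only if the file reaches min_bytes.

    Hypothetical helper: a pure-Python alternative to the wget loop above.
    Returns True on success, False once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            urllib.request.urlretrieve(url, dest)  # follows HTTP redirects
            size = os.path.getsize(dest)
            if size >= min_bytes:
                return True
            print(f"Attempt {attempt}: only {size} bytes received.")
        except OSError as exc:  # URLError, HTTPError and ContentTooShortError are OSErrors
            print(f"Attempt {attempt} failed: {exc}")
        if attempt < attempts:
            time.sleep(wait_s)
    return False
```

For the checkpoint, a call such as `download_with_retries(url, "checkpoints/wav2lip_gan.pth", min_bytes=400 * 1024 * 1024)` with the Hugging Face URL from Step 6 would mirror the 400 MB threshold used in Step 8.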

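Step 5 of the next cell avoids awkward characters by renaming every upload to `uploaded_image` plus its lowercased extension. If a recognizable name is preferred, a hypothetical helper like the one below (an assumption, not part of the notebook) replaces each run of characters other than letters, digits, `_` and `-` with a single underscore. It deliberately keeps exactly one dot: Wav2Lip's `inference.py` appears to detect a still image by checking `args.face.split('.')[1]` against `jpg`, `png` and `jpeg`, so extra dots or an uppercase extension would defeat that check.

```python
import os
import re


def sanitize_filename(name, fallback_stem="uploaded_image"):
    """Return a shell-safe filename with a single dot before a lowercase extension."""
    stem, ext = os.path.splitext(os.path.basename(name))
    # Collapse spaces, parentheses, extra dots, etc. into one underscore per run
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_") or fallback_stem
    # Keep only alphanumerics in the extension and lowercase it (e.g. ".JPG" -> "jpg")
    ext = re.sub(r"[^A-Za-z0-9]", "", ext).lower()
    return f"{stem}.{ext}" if ext else stem
```

For instance, the upload in the log below, `Screenshot 2025-03-11 134017.png`, would become `Screenshot_2025-03-11_134017.png`.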
In [2]:
# Step 1: Install gtts to ensure availability
!pip install gtts

# Step 2: Import necessary libraries
from google.colab import files
from IPython.display import Audio, display
from PIL import Image
import os
import random
import glob
import subprocess
import time
from gtts import gTTS
import shutil

# Step 3: Ensure working directory is correct
%cd /content/Wav2Lip

# Step 4: Verify inference.py and checkpoint exist
if not os.path.exists("inference.py"):
    raise FileNotFoundError("inference.py not found in /content/Wav2Lip. Please run the setup cell first.")
if not os.path.exists("checkpoints/wav2lip_gan.pth"):
    raise FileNotFoundError("Checkpoint file 'checkpoints/wav2lip_gan.pth' not found. Please make sure the setup cell completed successfully.")

# Step 5: Upload the anchor's image
print("Please upload the anchor's image (JPG or PNG, at least 256x256 with a clear face):")
uploaded = files.upload()
if not uploaded:
    print("No image uploaded. Please run the code again and upload an image.")
    raise FileNotFoundError("No image uploaded.")

# Sanitize the uploaded image filename
original_image_path = list(uploaded.keys())[0]
sanitized_image_path = "uploaded_image" + os.path.splitext(original_image_path)[1].lower()
sanitized_image_path = sanitized_image_path.replace(" ", "_").replace("(", "").replace(")", "")
shutil.move(original_image_path, sanitized_image_path)
image_path = sanitized_image_path
print(f"Sanitized image path: {image_path}")

# Step 6: Verify the image file is valid and large enough
try:
    img = Image.open(image_path)
except Exception as e:
    print(f"Error opening image: {e}")
    raise FileNotFoundError("Invalid image file.") from e
# Check the size outside the try block so a small image is not misreported as an invalid file
print(f"Image dimensions: {img.size} (width, height)")
if img.size[0] < 256 or img.size[1] < 256:
    print("Error: Image is smaller than 256x256. Please upload a larger image.")
    raise ValueError("Image too small.")

# Step 7: Prompt for custom script input
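script = input("Please enter the script you want the anchor to say (max 100 words): ")
if not script.strip():
    raise ValueError("The script is empty. Please run the cell again and enter some text.")
words = script.split()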
if len(words) > 100:
    print("Warning: Script exceeds 100 words. Proceeding anyway...")
print(f"Entered script: {script}")

# Step 8: Prompt for language selection
language_options = {"en": "English", "fr": "French", "es": "Spanish", "te": "Telugu", "hi": "Hindi"}
print("Available languages: ", ", ".join([f"{code} ({name})" for code, name in language_options.items()]))
language = input("Enter the language code (default: en): ").strip().lower() or "en"  # empty input selects the default
if language not in language_options:
    print("Invalid language code. Defaulting to English.")
    language = "en"
print(f"Using language: {language_options[language]}")

# Step 9: Prompt for voice selection
voice_gender = input("Choose voice gender (male/female, default: female): ").strip().lower() or "female"  # empty input selects the default
if voice_gender not in ["male", "female"]:
    print("Invalid choice. Defaulting to female voice.")
    voice_gender = "female"
print(f"Using {voice_gender} voice.")

# Step 10: Generate audio
if voice_gender == "male":
    print("Warning: gTTS cannot select a voice gender; proceeding with the default voice for this language.")
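tts = gTTS(text=script, lang=language, slow=False)
mp3_path = "audio.mp3"
tts.save(mp3_path)  # gTTS always writes MP3 data, whatever the file extension
# Convert to a real 16 kHz mono WAV file for Wav2Lip
audio_path = "audio.wav"
subprocess.run(["ffmpeg", "-y", "-loglevel", "error", "-i", mp3_path, "-ar", "16000", "-ac", "1", audio_path], check=True)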

# Step 11: Verify audio file
if not os.path.exists(audio_path):
    print("Audio generation failed.")
    raise FileNotFoundError("Audio file was not generated.")
print("Playing generated audio to verify:")
display(Audio(audio_path))

# Step 12: Run Wav2Lip inference
print("Running Wav2Lip inference...")
output_path = "results/result_voice.mp4"
command = [
    "python", "/content/Wav2Lip/inference.py",
    "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
    "--face", image_path,
    "--audio", audio_path,
    "--outfile", output_path,
    "--fps", "25"
]
print("Executing command:", " ".join(command))
start_time = time.time()
try:
    process = subprocess.run(command, capture_output=True, text=True, timeout=600)
    execution_time = time.time() - start_time
    print(f"Inference completed in {execution_time:.2f} seconds.")
    print("Wav2Lip Output:")
    print(process.stdout)
    if process.stderr:
        print("Wav2Lip Errors:")
        print(process.stderr)
    if process.returncode != 0:
        raise RuntimeError("Inference script failed.")
except subprocess.TimeoutExpired as e:
    print("Inference timed out after 600 seconds.")
    raise RuntimeError("Inference took too long and was terminated.") from e

# Step 13: Download the generated video
video_files = glob.glob("results/*.mp4")
if video_files:
    latest_video = max(video_files, key=os.path.getctime)
    print("Generated video:", latest_video)
    files.download(latest_video)
else:
    print("No video generated.")
    raise RuntimeError("Video generation failed.")


/content/Wav2Lip
Please upload the anchor's image (JPG or PNG, at least 256x256 with a clear face):


Saving Screenshot 2025-03-11 134017.png to Screenshot 2025-03-11 134017.png
Sanitized image path: uploaded_image.png
Image dimensions: (556, 548) (width, height)
Please enter the script you want the anchor to say (max 100 words):  తాజా బ్రేకింగ్ న్యూస్! ఇప్పుడే వచ్చిన సమాచారం ప్రకారం, దేశవ్యాప్తంగా విస్తృతంగా ప్రభావం చూపిన ఈ సంఘటనపై అధికారులు అత్యవసర సమావేశం ఏర్పాటు చేసారు. ప్రజలకు అప్రమత్తంగా ఉండాలని సూచనలు ఇవ్వబడ్డాయి. మరిన్ని వివరాలకు మా ఛానల్‌ను అనుసరించండి.
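Entered script:  తాజా బ్రేకింగ్ న్యూస్! ఇప్పుడే వచ్చిన సమాచారం ప్రకారం, దేశవ్యాప్తంగా విస్తృతంగా ప్రభావం చూపిన ఈ సంఘటనపై అధికారులు అత్యవసర సమావేశం ఏర్పాటు చేసారు. ప్రజలకు అప్రమత్తంగా ఉండాలని సూచనలు ఇవ్వబడ్డాయి. మరిన్ని వివరాలకు మా ఛానల్‌ను అనుసరించండి.
[English translation: "Latest breaking news! According to information just received, officials have convened an emergency meeting on this incident, which has had a widespread impact across the country. The public has been advised to stay alert. Follow our channel for more details."]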
Available languages:  en (English), fr (French), es (Spanish), te (Telugu), hi (Hindi)
Enter the language code (default: en): te
Using language: Telugu
Choose voice gender (male/female, default: female): female
Using female voice.
Playing generated audio to verify:


Running Wav2Lip inference...
Executing command: python /content/Wav2Lip/inference.py --checkpoint_path checkpoints/wav2lip_gan.pth --face uploaded_image.png --audio audio.wav --outfile results/result_voice.mp4 --fps 25
Inference completed in 121.15 seconds.
Wav2Lip Output:
Using cpu for inference.
Number of frames available for inference: 1
(80, 1744)
Length of mel chunks: 542
Load checkpoint from: checkpoints/wav2lip_gan.pth
Model loaded

Wav2Lip Errors:

  model_weights = torch.load(path_to_detector)


  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:03<00:00,  3.53s/it]
  checkpoint = torch.load(checkpoint_path,

 20%|██        | 1/5 [00:29<01:57, 29.36s/it]
 40%|████      | 2/5 [00:54<01:20, 26.68s/it]
 60%|██████    | 3/5 [01:18<00:51, 25.72s/it]
 80%|████████  | 4/5 [01:43<00:25, 25.30s/it]
100%|██████████| 5/5 [01:48<00:00, 17.99s/it]
100%|██████████| 5/5 [01:48<00:00, 21.68s/it]
ffmpeg version 4.4.2-0ubuntu0.22.

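The numbers in the log above can be cross-checked by hand. Assuming Wav2Lip's defaults (16 kHz audio with a 200-sample hop, i.e. 80 mel frames per second; 16-frame mel windows; `--wav2lip_batch_size` of 128), the sketch below re-implements the chunking loop as it appears in `inference.py`: video frame `i` takes the 16 mel frames starting at `int(i * 80 / fps)`, and the final window is taken from the end of the spectrogram. The function name and parameters are illustrative, not Wav2Lip's API.

```python
import math


def count_mel_chunks(n_mel_frames, fps, mel_step_size=16, mel_frames_per_sec=80.0):
    """Count the mel windows that Wav2Lip-style chunking produces.

    Assumption: mirrors the loop in Wav2Lip's inference.py, where video frame i
    uses the window starting at int(i * mel_frames_per_sec / fps), and the final
    window is taken from the end of the spectrogram.
    """
    mel_idx_multiplier = mel_frames_per_sec / fps
    chunks = 0
    i = 0
    while True:
        start_idx = int(i * mel_idx_multiplier)
        chunks += 1  # every iteration appends exactly one window
        if start_idx + mel_step_size > n_mel_frames:
            break  # this iteration's window was the final, end-aligned one
        i += 1
    return chunks


n_mel_frames = 1744                            # from "(80, 1744)" in the log
n_chunks = count_mel_chunks(n_mel_frames, fps=25)
n_batches = math.ceil(n_chunks / 128)          # assumed --wav2lip_batch_size default
duration_s = n_mel_frames / 80.0               # 80 mel frames per second

print(n_chunks, n_batches, duration_s)         # 542 5 21.8
```

Under these assumptions the result matches `Length of mel chunks: 542` and the `5/5` progress bar, and corresponds to roughly 21.8 s of speech. Because the input is a single still image (`Number of frames available for inference: 1`), that one frame is reused for all 542 output frames.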