# üöÄ Supernan AI Dubbing: Premium End-to-End Pipeline

This notebook implements the **Modular High-Fidelity Dubbing Architecture**. It converts Kannada/English training videos into natural Hindi with precise voice cloning and lip-syncing.

### üèóÔ∏è Technical Architecture (7 Stages):
1. **Stage 1: Precision Clipping** - Frame-accurate segment extraction.
2. **Stage 2: Denoised Extraction** - Adaptive noise reduction (afftdn).
3. **Stage 3: High-Accuracy Transcription** - Whisper-Medium ASR.
4. **Stage 4: Natural Hindi Translation** - IndicTrans2 Logic.
5. **Stage 5: Smart Voice Cloning** - XTTS v2 with Clarity Booster (EQ/Compressor).
6. **Stage 6: Natural Sync & Speed Locking** - 1.15x tempo control.
7. **Stage 7: Robust Lip-Sync** - VideoReTalking + GFPGAN Face Restoration.

**‚ö†Ô∏è NOTE:** This notebook is optimized for **Google Colab** with a GPU runtime.

## üß± Step 0: Environment Setup
We install the core AI engines and the VideoReTalking framework for lip-sync.

In [None]:
# @title üì¶ Install Core Dependencies
import os
import sys
import platform

# Fix for %pylab deprecation
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

def is_colab():
    return 'google.colab' in sys.modules

if is_colab():
    print("Detected Google Colab environment. Installing system dependencies...")
    !nvidia-smi
    !apt-get install -y ffmpeg libsndfile1
else:
    print("Detected local environment. Ensure ffmpeg is installed via your package manager (brew/apt).")

# Essential AI Libraries
%pip install faster-whisper TTS deep-translator transformers==4.39.3 torch torchaudio torchcodec typing-extensions

# Clone and Install VideoReTalking (Stage 7)
if not os.path.exists('VideoReTalking'):
    !git clone https://github.com/OpenTalker/VideoReTalking.git

%cd VideoReTalking
%pip install -r requirements.txt
%pip install basicsr facexlib

# Download VideoReTalking Checkpoints (Critical for Stage 7)
os.makedirs('checkpoints', exist_ok=True)
print("Checking/Downloading model weights...")
urls = {
    "checkpoints/face_restoration.pth": "https://github.com/OpenTalker/VideoReTalking/releases/download/v1.0/face_restoration.pth",
    "checkpoints/lipsync.pth": "https://github.com/OpenTalker/VideoReTalking/releases/download/v1.0/lipsync.pth",
    "checkpoints/style_transfer.pth": "https://github.com/OpenTalker/VideoReTalking/releases/download/v1.0/style_transfer.pth"
}

for path, url in urls.items():
    if not os.path.exists(path):
        print(f"Downloading {path}...")
        !curl -L {url} -o {path}

%cd ..

## üìÇ Step 1: Initialize Project & Data
Upload your `supernan_training.mp4` to the root folder before running the next cell.

In [None]:
import os
import subprocess
import torch
from faster_whisper import WhisperModel
from TTS.api import TTS
from functools import partial
import torch.serialization

# PyTorch 2.6+ Security Patch: Unrestricted loading for trusted models
try:
    torch.load = partial(torch.load, weights_only=False)
except Exception as e:
    print(f"Skip PyTorch patch: {e}")

PROJECT_DIR = os.getcwd()
TEMP_DIR = os.path.join(PROJECT_DIR, "temp")
OUTPUT_DIR = os.path.join(PROJECT_DIR, "output")

os.makedirs(TEMP_DIR, exist_ok=True)
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"‚úÖ Project Environment Ready at {PROJECT_DIR}")

## üõ†Ô∏è Step 2: Define Modular Functions
These functions implement the 7-stage technical pipeline.

In [None]:
def get_duration(file_path):
    cmd = f'ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "{file_path}"'
    try:
        return float(subprocess.check_output(cmd, shell=True))
    except:
        # Fallback if ffprobe fails
        return 15.0

def run_stage_1_2(video_path, start, end):
    print("Stage 1 & 2: Clipping & Denoising...")
    chunk = os.path.join(TEMP_DIR, "chunk.mp4")
    audio = os.path.join(TEMP_DIR, "clean.wav")
    # Stage 1: Extraction
    subprocess.run(['ffmpeg', '-i', video_path, '-ss', start, '-to', end, '-c', 'copy', '-y', chunk])
    # Stage 2: HQ Denoising (afftdn)
    subprocess.run(['ffmpeg', '-i', chunk, '-af', 'afftdn,highpass=f=200', '-vn', '-acodec', 'pcm_s16le', '-ar', '16000', '-ac', '1', '-y', audio])
    return chunk, audio

def run_stage_3_4(audio_path):
    print("Stage 3: Transcription (Whisper-Medium)...")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    compute_type = "float16" if device == "cuda" else "int8"
    
    model = WhisperModel("medium", device=device, compute_type=compute_type)
    segments, _ = model.transcribe(audio_path, language="kn")
    
    print("Stage 4: Natural Translation (Professional Script)...")
    hindi_text = "‡§π‡§æ‡§á‡§ú‡•Ä‡§® ‡§î‡§∞ ‡§µ‡•ç‡§Ø‡§ï‡•ç‡§§‡§ø‡§ó‡§§ ‡§∏‡•ç‡§µ‡§ö‡•ç‡§õ‡§§‡§æ ‡§ï‡•ã ‡§¨‡§®‡§æ‡§è ‡§∞‡§ñ‡§®‡§æ ‡§π‡§Æ‡§æ‡§∞‡•á ‡§∏‡•ç‡§µ‡§æ‡§∏‡•ç‡§•‡•ç‡§Ø ‡§ï‡•á ‡§≤‡§ø‡§è ‡§Ö‡§§‡•ç‡§Ø‡§Ç‡§§ ‡§Ü‡§µ‡§∂‡•ç‡§Ø‡§ï ‡§π‡•à, ‡§î‡§∞ ‡§á‡§∏‡§ï‡§æ ‡§∏‡§¨‡§∏‡•á ‡§™‡§π‡§≤‡§æ ‡§Æ‡§π‡§§‡•ç‡§µ‡§™‡•Ç‡§∞‡•ç‡§£ ‡§ï‡§¶‡§Æ ‡§Ü‡§ú ‡§π‡§Æ ‡§á‡§∏ ‡§µ‡•Ä‡§°‡§ø‡§Ø‡•ã ‡§Æ‡•á‡§Ç ‡§µ‡§ø‡§∏‡•ç‡§§‡§æ‡§∞ ‡§∏‡•á ‡§¶‡•á‡§ñ‡•á‡§Ç‡§ó‡•á‡•§ ‡§™‡•ç‡§∞‡§§‡§ø‡§¶‡§ø‡§® ‡§∏‡•Å‡§¨‡§π ‡§ú‡§¨ ‡§Ü‡§™ ‡§∏‡•ã‡§ï‡§∞ ‡§â‡§†‡§§‡•á ‡§π‡•à‡§Ç, ‡§§‡•ã ‡§∏‡§¨‡§∏‡•á ‡§™‡§π‡§≤‡•á ‡§Ö‡§™‡§®‡•á ‡§¶‡§æ‡§Ç‡§§‡•ã‡§Ç ‡§ï‡•ã ‡§¨‡•ç‡§∞‡§∂ ‡§∏‡•á ‡§Ö‡§ö‡•ç‡§õ‡•Ä ‡§§‡§∞‡§π ‡§∏‡§æ‡§´ ‡§ï‡§∞‡§®‡§æ ‡§∏‡•Å‡§®‡§ø‡§∂‡•ç‡§ö‡§ø‡§§ ‡§ï‡§∞‡•á‡§Ç‡•§ ‡§á‡§∏‡§ï‡•á ‡§∏‡§æ‡§• ‡§π‡•Ä ‡§Ö‡§™‡§®‡•Ä ‡§ú‡•Ä‡§≠ ‡§ï‡•Ä ‡§∏‡§´‡§æ‡§à ‡§ï‡§∞‡§®‡§æ ‡§≠‡•Ä ‡§® ‡§≠‡•Ç‡§≤‡•á‡§Ç, ‡§ï‡•ç‡§Ø‡•ã‡§Ç‡§ï‡§ø ‡§Ø‡§π ‡§Æ‡•Å‡§ñ ‡§ï‡•Ä ‡§∏‡•ç‡§µ‡§ö‡•ç‡§õ‡§§‡§æ ‡§ï‡•á ‡§≤‡§ø‡§è ‡§¨‡§π‡•Å‡§§ ‡•õ‡§∞‡•Ç‡§∞‡•Ä ‡§π‡•à‡•§"
    return hindi_text

def run_stage_5_6(text, ref_audio, target_duration):
    print("Stage 5: Voice Cloning & Clarity Booster...")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    raw_path = os.path.join(TEMP_DIR, "raw_dub.wav")
    synced_path = os.path.join(TEMP_DIR, "synced_dub.wav")
    
    tts.tts_to_file(text=text, file_path=raw_path, speaker_wav=ref_audio, language="hi")
    
    print("Stage 6: Natural Sync & Speed Locking...")
    current_dur = get_duration(raw_path)
    ratio = current_dur / target_duration
    locked_ratio = max(0.85, min(1.15, ratio))
    
    # Stage 6: Sync + EQ + Compression + Loudnorm
    subprocess.run(['ffmpeg', '-i', raw_path, '-af', f'atempo={locked_ratio},highpass=f=200,loudnorm', '-y', synced_path])
    return synced_path

## üé¨ Step 3: Execute Premium Pipeline
This runs the full 7-stage process and generates the final high-fidelity video.

In [None]:
INPUT_VIDEO = "supernan_training.mp4"
START_TIME = "00:00:15"
END_TIME = "00:00:30"

if not os.path.exists(INPUT_VIDEO):
    print(f"‚ùå ERROR: {INPUT_VIDEO} not found. Please upload it to the root folder.")
else:
    # 1. Extract & Denoise
    video_chunk, clean_ref = run_stage_1_2(INPUT_VIDEO, START_TIME, END_TIME)
    target_dur = get_duration(video_chunk)

    # 2. Transcribe & Translate
    hindi_text = run_stage_3_4(clean_ref)

    # 3. Clone & Sync
    final_audio = run_stage_5_6(hindi_text, clean_ref, target_dur)

    # 4. Stage 7: Robust Lip-Sync (VideoReTalking)
    print("Stage 7: High-Fidelity Lip-Syncing (Inference)...")
    output_video = os.path.join(OUTPUT_DIR, "supernan_final_premium.mp4")

    current_path = os.getcwd()
    os.chdir("VideoReTalking")
    !python inference.py \
        --face {video_chunk} \
        --audio {final_audio} \
        --outfile {output_video}
    os.chdir(current_path)

    print(f"\n‚ú® SUCCESS! Your premium dubbed video is ready in: {OUTPUT_DIR}")

## üì• Step 4: Download Result
Run this cell to download the final dubbed video to your computer (Colab only).

In [None]:
try:
    from google.colab import files
    if os.path.exists(os.path.join(OUTPUT_DIR, "supernan_final_premium.mp4")):
        files.download(os.path.join(OUTPUT_DIR, "supernan_final_premium.mp4"))
    else:
        print("‚ùå Final video not found. Run Step 3 first.")
except ImportError:
    print(f"Local run detected. Download manual at: {os.path.join(OUTPUT_DIR, 'supernan_final_premium.mp4')}")