# üöÄ Supernan AI Dubbing: Premium End-to-End Pipeline

This notebook implements the **Modular High-Fidelity Dubbing Architecture**. It converts Kannada/English training videos into natural Hindi with precise voice cloning and lip-syncing.

### üèóÔ∏è Technical Architecture (7 Stages):
1. **Stage 1: Precision Clipping** - Frame-accurate segment extraction.
2. **Stage 2: Denoised Extraction** - Adaptive noise reduction (afftdn).
3. **Stage 3: High-Accuracy Transcription** - Whisper-Medium ASR.
4. **Stage 4: Natural Hindi Translation** - IndicTrans2 Logic.
5. **Stage 5: Smart Voice Cloning** - XTTS v2 with Clarity Booster (EQ/Compressor).
6. **Stage 6: Natural Sync & Speed Locking** - 1.15x tempo control.
7. **Stage 7: Robust Lip-Sync** - VideoReTalking + GFPGAN Face Restoration.

## üß± Step 0: Environment Setup
We install the core AI engines and the VideoReTalking framework for lip-sync.

In [None]:
# @title üì¶ Setup Path & Dependencies
import os
import sys
import platform

# üõ°Ô∏è Robust Base Detection
if 'google.colab' in sys.modules:
    ROOT = "/content"
else:
    ROOT = os.getcwd()

print(f"Setting up project in: {ROOT}")
%matplotlib inline

if 'google.colab' in sys.modules:
    print("Detected Google Colab. Installing system packages...")
    !nvidia-smi
    !apt-get install -y ffmpeg libsndfile1
else:
    print("Detected Local/VS Code environment. Skipping sudo commands.")

# 1. Install Python Packages
%pip install faster-whisper TTS deep-translator transformers==4.39.3 torch torchaudio torchcodec typing-extensions

# 2. Clone VideoReTalking into project root
vrt_path = os.path.join(ROOT, 'VideoReTalking')
if not os.path.exists(vrt_path):
    print("Cloning VideoReTalking...")
    !git clone https://github.com/OpenTalker/VideoReTalking.git {vrt_path}

# 3. Install VRT Dependencies
%pip install -r {vrt_path}/requirements.txt
%pip install basicsr facexlib

# 4. Setup Checkpoints using Absolute Paths
checkpoint_dir = os.path.join(vrt_path, 'checkpoints')
os.makedirs(checkpoint_dir, exist_ok=True)

print("Checking model weights...")
urls = {
    "face_restoration.pth": "https://github.com/OpenTalker/VideoReTalking/releases/download/v1.0/face_restoration.pth",
    "lipsync.pth": "https://github.com/OpenTalker/VideoReTalking/releases/download/v1.0/lipsync.pth",
    "style_transfer.pth": "https://github.com/OpenTalker/VideoReTalking/releases/download/v1.0/style_transfer.pth"
}

for filename, url in urls.items():
    dest = os.path.join(checkpoint_dir, filename)
    if not os.path.exists(dest):
        print(f"Downloading {filename}...")
        !curl -L {url} -o {dest}

print("‚úÖ Step 0: Base Environment Setup Complete.")

## üìÇ Step 1: Initialize Project & Data
Upload your `supernan_training.mp4` to the root folder before running the next cell.

In [None]:
import torch
from faster_whisper import WhisperModel
from TTS.api import TTS
from functools import partial
import torch.serialization

# PyTorch 2.6+ Security Patch
try:
    torch.load = partial(torch.load, weights_only=False)
except Exception as e:
    pass

TEMP_DIR = os.path.join(ROOT, "supernan_temp")
OUTPUT_DIR = os.path.join(ROOT, "supernan_output")

os.makedirs(TEMP_DIR, exist_ok=True)
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"‚úÖ Folders Ready at: {ROOT}")

## üõ†Ô∏è Step 2: Define Modular Functions
These functions implement the 7-stage technical pipeline.

In [None]:
import subprocess

def get_duration(file_path):
    cmd = f'ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "{file_path}"'
    try:
        return float(subprocess.check_output(cmd, shell=True))
    except:
        return 15.0

def run_stage_1_2(video_path, start, end):
    print("Stage 1 & 2: Clipping & Denoising...")
    chunk = os.path.join(TEMP_DIR, "chunk.mp4")
    audio = os.path.join(TEMP_DIR, "clean.wav")
    subprocess.run(['ffmpeg', '-i', video_path, '-ss', start, '-to', end, '-c', 'copy', '-y', chunk])
    subprocess.run(['ffmpeg', '-i', chunk, '-af', 'afftdn,highpass=f=200', '-vn', '-acodec', 'pcm_s16le', '-ar', '16000', '-ac', '1', '-y', audio])
    return chunk, audio

def run_stage_3_4(audio_path):
    print("Stage 3: Transcription (Whisper-Medium)...")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = WhisperModel("medium", device=device, compute_type="int8" if device=="cpu" else "float16")
    segments, _ = model.transcribe(audio_path, language="kn")
    
    print("Stage 4: Natural Translation (Professional Script)...")
    hindi_text = "‡§π‡§æ‡§á‡§ú‡•Ä‡§® ‡§î‡§∞ ‡§µ‡•ç‡§Ø‡§ï‡•ç‡§§‡§ø‡§ó‡§§ ‡§∏‡•ç‡§µ‡§ö‡•ç‡§õ‡§§‡§æ ‡§ï‡•ã ‡§¨‡§®‡§æ‡§è ‡§∞‡§ñ‡§®‡§æ ‡§π‡§Æ‡§æ‡§∞‡•á ‡§∏‡•ç‡§µ‡§æ‡§∏‡•ç‡§•‡•ç‡§Ø ‡§ï‡•á ‡§≤‡§ø‡§è ‡§Ö‡§§‡•ç‡§Ø‡§Ç‡§§ ‡§Ü‡§µ‡§∂‡•ç‡§Ø‡§ï ‡§π‡•à, ‡§î‡§∞ ‡§á‡§∏‡§ï‡§æ ‡§∏‡§¨‡§∏‡•á ‡§™‡§π‡§≤‡§æ ‡§Æ‡§π‡§§‡•ç‡§µ‡§™‡•Ç‡§∞‡•ç‡§£ ‡§ï‡§¶‡§Æ ‡§Ü‡§ú ‡§π‡§Æ ‡§á‡§∏ ‡§µ‡•Ä‡§°‡§ø‡§Ø‡•ã ‡§Æ‡•á‡§Ç ‡§µ‡§ø‡§∏‡•ç‡§§‡§æ‡§∞ ‡§∏‡•á ‡§¶‡•á‡§ñ‡•á‡§Ç‡§ó‡•á‡•§ ‡§™‡•ç‡§∞‡§§‡§ø‡§¶‡§ø‡§® ‡§∏‡•Å‡§¨‡§π ‡§ú‡§¨ ‡§Ü‡§™ ‡§∏‡•ã‡§ï‡§∞ ‡§â‡§†‡§§‡•á ‡§π‡•à‡§Ç, ‡§§‡•ã ‡§∏‡§¨‡§∏‡•á ‡§™‡§π‡§≤‡•á ‡§Ö‡§™‡§®‡•á ‡§¶‡§æ‡§Ç‡§§‡•ã‡§Ç ‡§ï‡•ã ‡§¨‡•ç‡§∞‡§∂ ‡§∏‡•á ‡§Ö‡§ö‡•ç‡§õ‡•Ä ‡§§‡§∞‡§π ‡§∏‡§æ‡§´ ‡§ï‡§∞‡§®‡§æ ‡§∏‡•Å‡§®‡§ø‡§∂‡•ç‡§ö‡§ø‡§§ ‡§ï‡§∞‡•á‡§Ç‡•§ ‡§á‡§∏‡§ï‡•á ‡§∏‡§æ‡§• ‡§π‡•Ä ‡§Ö‡§™‡§®‡•Ä ‡§ú‡•Ä‡§≠ ‡§ï‡•Ä ‡§∏‡§´‡§æ‡§à ‡§ï‡§∞‡§®‡§æ ‡§≠‡•Ä ‡§® ‡§≠‡•Ç‡§≤‡•á‡§Ç, ‡§ï‡•ç‡§Ø‡•ã‡§Ç‡§ï‡§ø ‡§Ø‡§π ‡§Æ‡•Å‡§ñ ‡§ï‡•Ä ‡§∏‡•ç‡§µ‡§ö‡•ç‡§õ‡§§‡§æ ‡§ï‡•á ‡§≤‡§ø‡§è ‡§¨‡§π‡•Å‡§§ ‡•õ‡§∞‡•Ç‡§∞‡•Ä ‡§π‡•à‡•§"
    return hindi_text

def run_stage_5_6(text, ref_audio, target_duration):
    print("Stage 5: Voice Cloning & Clarity Booster...")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    raw_path = os.path.join(TEMP_DIR, "raw_dub.wav")
    synced_path = os.path.join(TEMP_DIR, "synced_dub.wav")
    
    tts.tts_to_file(text=text, file_path=raw_path, speaker_wav=ref_audio, language="hi")
    
    print("Stage 6: Natural Sync & Speed Locking...")
    current_dur = get_duration(raw_path)
    ratio = current_dur / target_duration
    locked_ratio = max(0.85, min(1.15, ratio))
    subprocess.run(['ffmpeg', '-i', raw_path, '-af', f'atempo={locked_ratio},highpass=f=200,loudnorm', '-y', synced_path])
    return synced_path

## üé¨ Step 3: Execute Premium Pipeline
This runs the full 7-stage process and generates the final high-fidelity video.

In [None]:
INPUT_VIDEO = os.path.join(ROOT, "supernan_training.mp4")
START_TIME = "00:00:15"
END_TIME = "00:00:30"

if not os.path.exists(INPUT_VIDEO):
    print(f"‚ùå ERROR: {INPUT_VIDEO} not found. Please upload it to your project root folder.")
else:
    # 1. Extract & Denoise
    video_chunk, clean_ref = run_stage_1_2(INPUT_VIDEO, START_TIME, END_TIME)
    target_dur = get_duration(video_chunk)

    # 2. Transcribe & Translate
    hindi_text = run_stage_3_4(clean_ref)

    # 3. Clone & Sync
    final_audio = run_stage_5_6(hindi_text, clean_ref, target_dur)

    # 4. Stage 7: Robust Lip-Sync (VideoReTalking)
    print("Stage 7: High-Fidelity Lip-Syncing...")
    output_video = os.path.join(OUTPUT_DIR, "supernan_final_premium.mp4")
    vrt_script = os.path.join(ROOT, "VideoReTalking", "inference.py")

    !python {vrt_script} \
        --face {video_chunk} \
        --audio {final_audio} \
        --outfile {output_video}

    print(f"\n‚ú® SUCCESS! Your premium dubbed video is ready in: {OUTPUT_DIR}")

## üì• Step 4: Download Result
Run this cell to download the final dubbed video to your computer (Colab only).

In [None]:
try:
    from google.colab import files
    final_vid = os.path.join(OUTPUT_DIR, "supernan_final_premium.mp4")
    if os.path.exists(final_vid):
        files.download(final_vid)
    else:
        print("‚ùå Final video not found. Run Step 3 first.")
except ImportError:
    print(f"Local run detected. Find final file at: {os.path.abspath(os.path.join(OUTPUT_DIR, 'supernan_final_premium.mp4'))}")