<a href="https://www.kaggle.com/code/lakhindarpal/viralreel-ai-automated-short-form-content?scriptVersionId=292448811" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
!pip install -qU google-genai
!pip install -q git+https://github.com/m-bain/whisperx.git
!pip install -qU gradio opencv-python gdown yt-dlp 
!pip install -q mediapipe==0.10.14

!wget -O detector.tflite -q https://storage.googleapis.com/mediapipe-models/face_detector/blaze_face_short_range/float16/1/blaze_face_short_range.tflite

!apt-get update -y
!apt-get install -y ffmpeg fonts-roboto

## IMPORTS


In [2]:
import torch
import numpy as np
import os, cv2, json, subprocess, re, traceback
from concurrent.futures import ThreadPoolExecutor

import gradio as gr
from PIL import Image, ImageDraw, ImageFont
from kaggle_secrets import UserSecretsClient
import whisperx
from google import genai
from google.genai import types
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

## CONFIG


In [3]:
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
BATCH_SIZE = 16
MAX_DURATION = 60

## AUTH


In [4]:
try:
    user_secrets = UserSecretsClient()
    api_key = user_secrets.get_secret("GOOGLE_API_KEY")
    client = genai.Client(api_key=api_key)
except Exception as e:
    client = None  # keep the name defined; ContentBrain.analyze() then returns no hooks
    print(f"❌ Secret Error: {e}")

## CORE ENGINE


In [5]:
class ContentBrain:
    def __init__(self):
        print(f"🚀 Loading WhisperX on {DEVICE}...")
        self.model = whisperx.load_model(
            "large-v3-turbo",
            DEVICE,
            compute_type="float16" if DEVICE == "cuda" else "int8",
            vad_method="silero",
        )
        self.align_model, self.metadata = whisperx.load_align_model(
            language_code="en", device=DEVICE
        )

    def transcribe(self, audio_path):
        result = self.model.transcribe(audio_path, batch_size=BATCH_SIZE)
        aligned = whisperx.align(
            result["segments"],
            self.align_model,
            self.metadata,
            audio_path,
            DEVICE,
            return_char_alignments=False,
        )
        return aligned

    def analyze(self, text):
        print("🧠 Thinking (Gemini 2.5 Flash)...")
        prompt = f"""
        Act as a viral content strategist. Analyze this transcript.
        Identify exactly 3 segments (30-50s duration) that work as standalone viral shorts.
        Return JSON ONLY: [{{"start_text": "unique start phrase", "end_text": "unique end phrase", "title": "Engaging Headline"}}]
        Transcript: {text[:150000]}...
        """
        try:
            res = client.models.generate_content(
                model="gemini-2.5-flash",
                contents=prompt,
                config=types.GenerateContentConfig(
                    temperature=0.7, response_mime_type="application/json"
                ),
            )
            return json.loads(res.text)
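        except Exception as e:
            # Surface the failure instead of silently returning no hooks.
            print(f"❌ Gemini Error: {e}")
            return []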
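`analyze()` passes `res.text` straight to `json.loads` and trusts whatever comes back. A small validation step could make the render stage safer. The sketch below is an assumption-laden helper, not part of the notebook or the Gemini API: the name `parse_hooks` and the optional `"segments"` wrapper key are hypothetical. It keeps only hooks that carry the three string keys `render_worker` reads and caps the list at three to match the UI.

```python
import json

# Keys that render_worker reads from each hook.
REQUIRED_KEYS = ("start_text", "end_text", "title")


def parse_hooks(raw_text, max_hooks=3):
    """Parse a model reply into at most max_hooks well-formed hook dicts.

    Accepts a bare JSON list, or an object wrapping the list under a
    hypothetical "segments" key. Malformed items are dropped.
    """
    try:
        data = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return []
    if isinstance(data, dict):
        data = data.get("segments", [])
    if not isinstance(data, list):
        return []
    hooks = []
    for item in data:
        if not isinstance(item, dict):
            continue
        # Every required key must be a non-empty string.
        if all(isinstance(item.get(k), str) and item[k].strip() for k in REQUIRED_KEYS):
            hooks.append({k: item[k].strip() for k in REQUIRED_KEYS})
    return hooks[:max_hooks]
```

It could stand in for `json.loads(res.text)` inside `analyze()`, so a malformed reply degrades to an empty list instead of failing later inside the render threads.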

In [6]:
class SmartCam:
    def __init__(self):
        base_opts = python.BaseOptions(model_asset_path="detector.tflite")
        opts = vision.FaceDetectorOptions(
            base_options=base_opts, min_detection_confidence=0.5
        )
        self.detector = vision.FaceDetector.create_from_options(opts)

    def get_face_center(self, frame):
        mp_img = mp.Image(
            image_format=mp.ImageFormat.SRGB,
            data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB),
        )
        res = self.detector.detect(mp_img)
        if res.detections:
            largest = max(res.detections, key=lambda d: d.bounding_box.width)
            return (
                largest.bounding_box.origin_x + (largest.bounding_box.width / 2)
            ) / frame.shape[1]
        return 0.5
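`SmartCam` only reports where the largest face sits, as a fraction of frame width. The crop itself happens in `render_worker`, which smooths that position with an exponential moving average (`0.9 * previous + 0.1 * target`) and cuts a full-height 9:16 window around it. The same arithmetic, pulled into two hypothetical pure functions (`smooth_center`, `crop_window`) so it can be checked without video, might look like this. The sketch also caps the crop width at the frame width, so a portrait source is never stretched.

```python
def smooth_center(prev_center, target_center, alpha=0.1):
    """Exponential moving average of the horizontal face position (0..1)."""
    return prev_center * (1 - alpha) + target_center * alpha


def crop_window(center, frame_w, frame_h, aspect=9 / 16):
    """Return (x1, crop_w) for a full-height crop of width frame_h * aspect.

    The width is made even and capped at frame_w; x1 is clamped so the
    window always stays inside the frame.
    """
    crop_w = min(int(frame_h * aspect), frame_w)
    crop_w -= crop_w % 2
    cx = int(center * frame_w)
    x1 = max(0, min(cx - crop_w // 2, frame_w - crop_w))
    return x1, crop_w
```

The crop is then `frame[0:h, x1 : x1 + crop_w]`. With `alpha=0.1` the camera closes 10% of the remaining gap per frame, so covering 90% of a sudden jump takes about 22 frames, a little under a second at 30 fps.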

In [7]:
class Renderer:
    def __init__(self):
        self.font_path = "/usr/share/fonts/truetype/roboto/Roboto-Black.ttf"
        try:
            self.font_size = 75
            self.font = ImageFont.truetype(self.font_path, self.font_size)
            self.small_font = ImageFont.truetype(self.font_path, 40)
        except OSError:  # font file missing: fall back to PIL's built-in bitmap font
            self.font = ImageFont.load_default()
            self.small_font = ImageFont.load_default()

    def get_text_width(self, words, draw):
        total = 0
        for wd in words:
            bbox = draw.textbbox((0, 0), wd["word"], font=self.font)
            total += (bbox[2] - bbox[0]) + 20
        return total - 20

    def draw_wrapped_title(self, draw, title, w):
        words = title.upper().split()
        lines = []
        curr = []
        for word in words:
            test = curr + [word]
            bbox = draw.textbbox((0, 0), " ".join(test), font=self.small_font)
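            # Always accept the first word of a line, so an over-long word gets
            # its own line instead of an empty line being emitted before it.
            if (bbox[2] - bbox[0]) < (w - 100) or not curr:
                curr = test
            else:
                lines.append(" ".join(curr))
                curr = [word]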
        lines.append(" ".join(curr))

        y = 80
        for line in lines:
            bbox = draw.textbbox((0, 0), line, font=self.small_font)
            x = (w - (bbox[2] - bbox[0])) // 2
            draw.text(
                (x, y),
                line,
                font=self.small_font,
                fill="#00ffff",
                stroke_width=3,
                stroke_fill="black",
            )
            y += 50

    def draw_karaoke(self, frame, words, time, title):
        draw = ImageDraw.Draw(frame)
        w, h = frame.size
        self.draw_wrapped_title(draw, title, w)

        active_idx = -1
        for i, word in enumerate(words):
            if word["start"] <= time <= word["end"] + 0.2:
                active_idx = i
                break

        if active_idx == -1:
            return frame

        chunk_size = 3
        start = (active_idx // chunk_size) * chunk_size
        end = min(len(words), start + chunk_size)
        visible = words[start:end]

        if self.get_text_width(visible, draw) > (w - 80):
            visible = [words[active_idx]]

        y = h - 450
        x = (w - self.get_text_width(visible, draw)) // 2

        for wd in visible:
            color = "#FFE135" if wd == words[active_idx] else "white"
            draw.text(
                (x, y),
                wd["word"],
                font=self.font,
                fill=color,
                stroke_width=5,
                stroke_fill="black",
            )
            bbox = draw.textbbox((0, 0), wd["word"], font=self.font)
            x += (bbox[2] - bbox[0]) + 20

        return frame
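`draw_karaoke` finds the active word with a linear scan, keeps it lit for 0.2 s after it ends, and shows it inside a fixed page of three words. The hypothetical helpers below isolate that timing logic as a sketch; returning the active word's position inside the page also lets a caller highlight by index instead of comparing word dicts.

```python
def find_active_index(words, t, tail=0.2):
    """Index of the first word with start <= t <= end + tail, or -1 if none."""
    for i, w in enumerate(words):
        if w["start"] <= t <= w["end"] + tail:
            return i
    return -1


def visible_chunk(words, active_idx, chunk_size=3):
    """Return (page, pos): the fixed page of chunk_size words containing
    active_idx, and the active word's position inside that page."""
    start = (active_idx // chunk_size) * chunk_size
    return words[start : start + chunk_size], active_idx - start
```

Because the scan returns the first match, the 0.2 s tail can keep the previous word highlighted briefly after the next one has started.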

## WORKER FUNCTIONS


In [8]:
brain = ContentBrain()
cam = SmartCam()
renderer = Renderer()

🚀 Loading WhisperX on cuda...


2026-01-17 22:14:54 - whisperx.asr - INFO - No language specified, language will be detected for each audio file (increases inference time)
2026-01-17 22:14:54 - whisperx.vads.silero - INFO - Performing voice activity detection using Silero...


Using cache found in /root/.cache/torch/hub/snakers4_silero-vad_master
I0000 00:00:1768688095.609700    1911 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:84) egl_initializedUnable to initialize EGL
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1768688095.613740    2754 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.


In [9]:
def clean_filename(title):
    clean = re.sub(r"[^\w\s-]", "", title).strip().lower()
    return re.sub(r"[-\s]+", "_", clean)
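`clean_filename` turns a title into a slug, but two hooks with the same title (or a title made only of punctuation) map to the same output file, and the render threads would then overwrite each other. A hedged variant, assuming an index prefix is acceptable in the output names (`unique_filename` is a hypothetical helper, not in the notebook):

```python
import re


def unique_filename(title, index, ext=".mp4"):
    """Slugify like clean_filename(), then prefix the hook index.

    The prefix keeps names distinct when two hooks share a title, and an
    empty slug falls back to "reel".
    """
    slug = re.sub(r"[^\w\s-]", "", title).strip().lower()
    slug = re.sub(r"[-\s]+", "_", slug) or "reel"
    return f"{index:02d}_{slug}{ext}"
```

For example, `unique_filename("Linus Torvalds: 'Innovation is Bullshit. Get the Work Done.'", 2)` yields `"02_linus_torvalds_innovation_is_bullshit_get_the_work_done.mp4"`.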

In [10]:
def render_worker(args):
    i, hook, vid_path, all_words = args
    print(f"   ▶️ Processing: {hook['title']}")

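    start_t, end_t = None, None
    s_txt, e_txt = hook["start_text"].strip(), hook["end_text"].strip()

    # Start: the first transcript word that start_text begins with (a loose,
    # single-word match). A None sentinel lets a clip start at 0.0 s.
    for w in all_words:
        token = w["word"].strip()
        if token and s_txt.startswith(token):
            start_t = w["start"]
            break

    # End: the last word inside the MAX_DURATION window that end_text ends with.
    if start_t is not None:
        for w in all_words:
            token = w["word"].strip()
            if (
                token
                and w["start"] > start_t
                and w["end"] < (start_t + MAX_DURATION)
                and e_txt.endswith(token)
            ):
                end_t = w["end"]

    # Fallback if matching failed: a fixed 50 s window, offset by hook index.
    if start_t is None or end_t is None or end_t <= start_t:
        start_t = i * 60 + 60
        end_t = start_t + 50
    if (end_t - start_t) > MAX_DURATION:
        end_t = start_t + MAX_DURATION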

    cap = cv2.VideoCapture(vid_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    start_f, end_f = int(start_t * fps), int(end_t * fps)

    safe_title = clean_filename(hook["title"])
    out = f"{safe_title}.mp4"
    tmp = f"temp_{i}.mp4"

    writer = cv2.VideoWriter(tmp, cv2.VideoWriter_fourcc(*"mp4v"), fps, (720, 1280))
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_f)

    curr_f = start_f
    curr_center = 0.5

    while curr_f < end_f:
        ret, frame = cap.read()
        if not ret:
            break

        tgt_center = cam.get_face_center(frame)
        curr_center = curr_center * 0.9 + tgt_center * 0.1

        h, w, _ = frame.shape
        tgt_w = min(int(h * (9 / 16)), w)  # a portrait source may be narrower than 9:16
        if tgt_w % 2 != 0:
            tgt_w -= 1

        cx = int(curr_center * w)
        x1 = max(0, min(cx - (tgt_w // 2), w - tgt_w))

        final = cv2.resize(frame[0:h, x1 : x1 + tgt_w], (720, 1280))
        img = Image.fromarray(cv2.cvtColor(final, cv2.COLOR_BGR2RGB))
        img = renderer.draw_karaoke(img, all_words, curr_f / fps, hook["title"])
        writer.write(cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR))
        curr_f += 1

    cap.release()
    writer.release()

    subprocess.run(
        [
            "ffmpeg",
            "-y",
            "-i",
            tmp,
            "-ss",
            str(start_t),
            "-to",
            str(end_t),
            "-i",
            vid_path,
            "-c:v",
            "libx264",
            "-pix_fmt",
            "yuv420p",
            "-preset",
            "ultrafast",
            "-c:a",
            "aac",
            "-map",
            "0:v:0",
            "-map",
            "1:a:0",
            "-shortest",
            out,
        ],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    return out
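`render_worker` locates a clip by matching single words: the start is the first word anywhere in the transcript that `start_text` begins with, so a common opener such as "So" or "I" can match far too early, and a phrase that Gemini punctuates differently from WhisperX may not match at all. One alternative, sketched here under the assumption that every word dict has `word`, `start` and `end` keys, is to normalise tokens and search for each full phrase as a consecutive token sequence. `find_phrase_span` and `_norm` are hypothetical names.

```python
import re


def _norm(token):
    """Lower-case a token and drop everything except letters, digits and underscore."""
    return re.sub(r"\W", "", token.lower())


def find_phrase_span(words, start_text, end_text, max_duration=60.0):
    """Return (start_t, end_t) covering start_text ... end_text, or None.

    Both phrases are matched as consecutive normalised token sequences. The
    end phrase must finish after the start phrase begins and no more than
    max_duration seconds later.
    """
    toks = [_norm(w["word"]) for w in words]
    start_seq = [t for t in map(_norm, start_text.split()) if t]
    end_seq = [t for t in map(_norm, end_text.split()) if t]
    if not start_seq or not end_seq:
        return None

    def occurrences(seq, begin=0):
        # Yield every index i (>= begin) where seq appears in toks.
        for i in range(begin, len(toks) - len(seq) + 1):
            if toks[i : i + len(seq)] == seq:
                yield i

    for s in occurrences(start_seq):
        start_t = words[s]["start"]
        for e in occurrences(end_seq, s):
            end_t = words[e + len(end_seq) - 1]["end"]
            if end_t - start_t > max_duration:
                break  # later matches only end later
            if end_t > start_t:
                return start_t, end_t
    return None
```

Inside `render_worker`, `find_phrase_span(all_words, hook["start_text"], hook["end_text"], MAX_DURATION)` could replace both loops, keeping the existing fixed-window fallback for a `None` result. The nested scan favours simplicity over speed.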

In [11]:
def run_pipeline(vid_file, url, progress=gr.Progress()):
    try:
        # 1. Input
        path = "input.mp4"
        if vid_file:
            path = vid_file
        elif url:
            progress(0.1, desc="Loading Video...")
            cmd = [
                "yt-dlp",
                "--add-header",
                "User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
                "-f",
                "bestvideo[ext=mp4][vcodec^=avc]+bestaudio[ext=m4a]/best[ext=mp4]/best",
                "--force-overwrites",
                "-o",
                path,
                url,
            ]
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                return [
                    gr.update(value=None),
                    gr.update(value=None),
                    gr.update(value=None),
                    f"❌ Download Failed:\n{result.stderr[-200:]}",
                ]
        else:
            return [
                gr.update(value=None),
                gr.update(value=None),
                gr.update(value=None),
                "❌ No input provided",
            ]

        if not os.path.exists(path):
            return [
                gr.update(value=None),
                gr.update(value=None),
                gr.update(value=None),
                "❌ Download failed (File not found)",
            ]

        # 2. Audio
        progress(0.2, desc="Analysing Audio...")
        subprocess.run(
            [
                "ffmpeg",
                "-y",
                "-i",
                path,
                "-vn",
                "-acodec",
                "pcm_s16le",
                "-ar",
                "16000",
                "-ac",
                "1",
                "temp.wav",
            ],
            stdout=subprocess.DEVNULL,
        )

        try:
            aligned = brain.transcribe("temp.wav")
        except Exception as e:
            return [
                gr.update(value=None),
                gr.update(value=None),
                gr.update(value=None),
                f"❌ Whisper Error: {e}",
            ]

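        text = " ".join([s["text"] for s in aligned["segments"]])
        # WhisperX may leave tokens it could not align (often numbers) without
        # "start"/"end" keys; drop them so later timing lookups cannot raise KeyError.
        words = [
            w
            for s in aligned["segments"]
            for w in s.get("words", [])
            if "start" in w and "end" in w
        ]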

        # 3. AI
        progress(0.4, desc="Selecting Hooks...")
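        # The UI has three output slots; extra hooks would push the status
        # message out of position in the returned tuple.
        hooks = brain.analyze(text)[:3]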
        if not hooks:
            return [
                gr.update(value=None),
                gr.update(value=None),
                gr.update(value=None),
                "❌ No viral hooks found by AI",
            ]

        # 4. Render
        progress(0.6, desc="Rendering...")
        with ThreadPoolExecutor(max_workers=3) as exe:
            files = list(
                exe.map(
                    render_worker, [(i, h, path, words) for i, h in enumerate(hooks)]
                )
            )

        # 5. Dynamic Return (Update Labels)
        outputs = []
        for i, f in enumerate(files):
            outputs.append(gr.update(value=f, label=hooks[i]["title"]))

        while len(outputs) < 3:
            outputs.append(gr.update(value=None, label="No Reel Generated"))

        outputs.append("✅ Processing Complete!")
        return outputs[0], outputs[1], outputs[2], outputs[3]

    except Exception as e:
        return [
            gr.update(value=None),
            gr.update(value=None),
            gr.update(value=None),
            f"❌ Critical Error:\n{str(e)}",
        ]
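The return block relies on `outputs` holding exactly three video entries before the status string is appended, so the number of hooks has to be capped somewhere. A sketch of a helper that trims or pads to exactly three `(value, label)` pairs (`pad_outputs` is a hypothetical name; each pair would still be wrapped in `gr.update` by the caller):

```python
def pad_outputs(files, titles, slots=3, empty_label="No Reel Generated"):
    """Pair rendered files with their titles, trimmed or padded to `slots` pairs."""
    pairs = list(zip(files, titles))[:slots]
    pairs += [(None, empty_label)] * (slots - len(pairs))
    return pairs
```

The final return could then be built as `[gr.update(value=f, label=t) for f, t in pad_outputs(files, [h["title"] for h in hooks])] + ["✅ Processing Complete!"]`, which always has four entries.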

## GRADIO UI


In [None]:
with gr.Blocks(title="ViralReel AI") as app:
    gr.Markdown("# 🚀 ViralReel AI")
    gr.Markdown("Automated Short-Form Content Generator · Powered by Gemini & WhisperX")

    with gr.Column():
        with gr.Tabs():
            with gr.TabItem("Upload Video"):
                v_in = gr.Video(label="Source File")
            with gr.TabItem("Paste URL"):
                l_in = gr.Textbox(
                    label="YouTube / Drive Link", placeholder="https://..."
                )

        btn = gr.Button("Generate Reels", variant="primary")
        status = gr.Textbox(label="System Logs", interactive=False)

    with gr.Row():
        o1 = gr.Video(label="Pending Reel 1...")
        o2 = gr.Video(label="Pending Reel 2...")
        o3 = gr.Video(label="Pending Reel 3...")

    btn.click(run_pipeline, inputs=[v_in, l_in], outputs=[o1, o2, o3, status])

app.queue().launch(share=True, debug=True)

* Running on local URL:  http://127.0.0.1:7860
* Running on public URL: https://60395b6299a5e6a2eb.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


2026-01-17 22:16:27 - whisperx.asr - INFO - Detected language: en (1.00) in first 30s of audio
🧠 Thinking (Gemini 2.5 Flash)...
   ▶️ Processing: Linus Torvalds: My Wife's Fun vs. Mine (and Skiing Fail)
   ▶️ Processing: The Staggering Pace of Linux Development: 10,000 Lines a Day!
   ▶️ Processing: Linus Torvalds: 'Innovation is Bullshit. Get the Work Done.'


