# 🎙️ Vani AI - Hinglish Podcast Generator

**Transform any Wikipedia article into a natural-sounding Hinglish podcast conversation.**

This notebook implements a complete pipeline that:
1. Fetches and cleans Wikipedia article content
2. Generates a conversational Hinglish script using an LLM (Gemini by default, with Groq or OpenAI as alternatives)
3. Synthesizes multi-speaker audio using ElevenLabs TTS
4. Produces a final MP3 podcast file

---

## Table of Contents
1. [Environment Setup](#1-environment-setup)
2. [Wikipedia Content Extraction](#2-wikipedia-content-extraction)
3. [Hinglish Script Generation](#3-hinglish-script-generation)
4. [Text-to-Speech Synthesis](#4-text-to-speech-synthesis)
5. [Audio Processing & Assembly](#5-audio-processing--assembly)
6. [Output & Playback](#6-output--playback)
7. [Prompting Strategy Explanation](#7-prompting-strategy-explanation)

---
## 1. Environment Setup

### 1.1 Install Dependencies

In [None]:
# Install required packages (including groq for fallback LLM)
!pip install -q requests beautifulsoup4 wikipedia-api pydub elevenlabs google-generativeai openai groq

# Install ffmpeg for audio processing (required by pydub)
!apt-get install -qq ffmpeg

print("✅ All dependencies installed successfully!")

### 1.2 Import Libraries

In [None]:
import os
import re
import json
import time
from typing import List, Dict, Optional, Literal
from dataclasses import dataclass
from enum import Enum
from getpass import getpass

# Web scraping
import requests
from bs4 import BeautifulSoup
import wikipediaapi

# LLM providers
import google.generativeai as genai
from openai import OpenAI

# TTS
from elevenlabs import ElevenLabs

# Audio processing
from pydub import AudioSegment

# Colab display
from IPython.display import Audio, display, Markdown, HTML

print("✅ All libraries imported successfully!")

### 1.3 Configure API Keys

Enter your API keys securely. You'll need:
- **Gemini API Key** (from [Google AI Studio](https://aistudio.google.com/app/apikey)) - for script generation
- **ElevenLabs API Key** (from [ElevenLabs](https://elevenlabs.io/)) - for TTS
- **Groq API Key** (optional, from [Groq](https://console.groq.com/)) - fallback LLM if Gemini fails
- **OpenAI API Key** (optional, from [OpenAI](https://platform.openai.com/)) - alternative LLM

In [None]:
# API Key Configuration
# You can either set these as environment variables or enter them when prompted

def get_api_key(name: str, env_var: str) -> str:
    """Get API key from environment or prompt user."""
    key = os.environ.get(env_var)
    if not key:
        key = getpass(f"Enter your {name}: ")
    return key

# Get API keys
GEMINI_API_KEY = get_api_key("Gemini API Key", "GEMINI_API_KEY")
ELEVENLABS_API_KEY = get_api_key("ElevenLabs API Key", "ELEVENLABS_API_KEY")

# Optional: Groq API Key for fallback (press Enter to skip)
GROQ_API_KEY = os.environ.get("GROQ_API_KEY", "")
if not GROQ_API_KEY:
    user_input = getpass("Enter your Groq API Key for fallback (press Enter to skip): ")
    GROQ_API_KEY = user_input if user_input else None

# Optional: OpenAI API Key (press Enter to skip)
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
if not OPENAI_API_KEY:
    user_input = getpass("Enter your OpenAI API Key (press Enter to skip): ")
    OPENAI_API_KEY = user_input if user_input else None

# Validate required keys
assert GEMINI_API_KEY, "❌ Gemini API Key is required!"
assert ELEVENLABS_API_KEY, "❌ ElevenLabs API Key is required!"

print("✅ API keys configured!")
print(f"   - Gemini (primary): {'✓' if GEMINI_API_KEY else '✗'}")
print(f"   - Groq (fallback): {'✓' if GROQ_API_KEY else '✗ (skipped)'}")
print(f"   - ElevenLabs: {'✓' if ELEVENLABS_API_KEY else '✗'}")
print(f"   - OpenAI: {'✓ (optional)' if OPENAI_API_KEY else '✗ (skipped)'}")

### 1.4 Initialize API Clients

In [None]:
# Primary: Gemini 2.5 Flash (best for natural, varied conversations)
genai.configure(api_key=GEMINI_API_KEY)
gemini_model = genai.GenerativeModel('gemini-2.5-flash')

# Fallback: Groq (LLaMA 3.3 70B) - used if Gemini hits rate limits
from groq import Groq
groq_client = Groq(api_key=GROQ_API_KEY) if GROQ_API_KEY else None

# Initialize ElevenLabs for TTS (primary and only TTS provider)
elevenlabs_client = ElevenLabs(api_key=ELEVENLABS_API_KEY)

# Initialize OpenAI (if available, for script generation only)
openai_client = OpenAI(api_key=OPENAI_API_KEY) if OPENAI_API_KEY else None

print("✅ API clients initialized!")
print(f"   - Primary LLM: Gemini 2.5 Flash")
print(f"   - Fallback LLM: {'Groq (LLaMA 3.3 70B)' if groq_client else 'None'}")
print(f"   - TTS Provider: ElevenLabs (eleven_multilingual_v2)")

### 1.5 Data Models

In [None]:
@dataclass
class ScriptLine:
    """A single line of dialogue in the script."""
    speaker: Literal["Rahul", "Anjali"]
    text: str

@dataclass
class PodcastScript:
    """Complete podcast script with title and dialogue."""
    title: str
    script: List[ScriptLine]
    source_url: str

class LLMProvider(Enum):
    """Supported LLM providers."""
    GEMINI = "gemini"    # Primary: Gemini 2.5 Flash (best variety)
    GROQ = "groq"        # Fallback: LLaMA 3.3 70B via Groq
    OPENAI = "openai"    # Alternative: GPT-4

print("✅ Data models defined!")
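
The LLM functions later in the notebook return plain dictionaries, while the dataclasses above describe the intended shape. As a minimal sketch of bridging the two, the block below converts such a dictionary into a `PodcastScript`. The helper name `script_from_dict` is hypothetical (it is not part of the notebook), and the dataclasses are repeated only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal


@dataclass
class ScriptLine:
    speaker: Literal["Rahul", "Anjali"]
    text: str


@dataclass
class PodcastScript:
    title: str
    script: List[ScriptLine]
    source_url: str


def script_from_dict(data: Dict, source_url: str) -> PodcastScript:
    """Hypothetical helper: build a PodcastScript from the LLM's JSON dict."""
    lines = [
        ScriptLine(speaker=item["speaker"], text=item["text"])
        for item in data["script"]
    ]
    return PodcastScript(title=data["title"], script=lines, source_url=source_url)


# Toy input in the same shape the prompt asks the LLM to return.
sample = {
    "title": "Demo",
    "script": [
        {"speaker": "Rahul", "text": "Hello"},
        {"speaker": "Anjali", "text": "Hi"},
    ],
}
podcast = script_from_dict(sample, "https://en.wikipedia.org/wiki/Mumbai_Indians")
```

The rest of the pipeline works on dictionaries directly, so a conversion like this is optional; it mainly helps if typed access (`podcast.script[0].speaker`) is preferred.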

---
## 2. Wikipedia Content Extraction

In [None]:
from urllib.parse import unquote


def extract_article_title(url: str) -> str:
    """Extract the article title from a Wikipedia URL."""
    patterns = [
        r'/wiki/([^#?]+)',  # Standard format: /wiki/Title
        r'title=([^&#]+)',  # Old format with query params: ?title=Title
    ]
    for pattern in patterns:
        match = re.search(pattern, url)
        if match:
            # Decode percent-encoded characters (e.g. %2B -> +) before lookup
            return unquote(match.group(1))
    raise ValueError(f"Could not extract article title from URL: {url}")


def fetch_wikipedia_content(url: str) -> Dict[str, str]:
    """Fetch and clean Wikipedia article content."""
    article_title = extract_article_title(url)
    
    wiki = wikipediaapi.Wikipedia(
        user_agent='VaniAI/1.0 (Hinglish Podcast Generator)',
        language='en'
    )
    
    page = wiki.page(article_title)
    
    if not page.exists():
        raise ValueError(f"Wikipedia article not found: {article_title}")
    
    return {
        'title': page.title,
        'content': page.text,
        'summary': page.summary
    }


def clean_wikipedia_text(text: str, max_words: int = 3000) -> str:
    """Clean and truncate Wikipedia text for LLM processing."""
    # Remove reference markers [1], [2], etc.
    text = re.sub(r'\[\d+\]', '', text)
    
    # Remove unwanted sections
    sections_to_remove = [
        r'\n== See also ==.*',
        r'\n== References ==.*',
        r'\n== External links ==.*',
        r'\n== Notes ==.*',
        r'\n== Further reading ==.*',
    ]
    for pattern in sections_to_remove:
        text = re.sub(pattern, '', text, flags=re.DOTALL)
    
    # Remove multiple newlines
    text = re.sub(r'\n{3,}', '\n\n', text)
    
    # Truncate to max words
    words = text.split()
    if len(words) > max_words:
        text = ' '.join(words[:max_words]) + '...'
    
    return text.strip()


print("✅ Wikipedia extraction functions defined!")
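
To make the cleaning steps concrete, here is a small sketch that applies the same regular expressions as `clean_wikipedia_text` to a toy string. The sample text and the tiny `max_words` value are invented for illustration. The sample also assumes section headings appear as `== Title ==` in the text; if the library renders headings differently, the section patterns would need adjusting.

```python
import re

sample = (
    "Mumbai Indians are a cricket team.[1] They play in the IPL.[23]\n\n\n\n"
    "History text here.\n"
    "== See also ==\nList of teams\n"
)

# Same steps as clean_wikipedia_text, applied inline to a toy string.
text = re.sub(r'\[\d+\]', '', sample)                             # drop [1], [23]
text = re.sub(r'\n== See also ==.*', '', text, flags=re.DOTALL)   # drop trailing section
text = re.sub(r'\n{3,}', '\n\n', text)                            # collapse blank lines
text = text.strip()

# Truncate to a word budget, appending an ellipsis when words were cut.
max_words = 8
words = text.split()
truncated = ' '.join(words[:max_words]) + '...' if len(words) > max_words else text
```

Note that truncation joins words with single spaces, so paragraph breaks are lost in the truncated text; for LLM input that is usually acceptable.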

---
## 3. Hinglish Script Generation

In [None]:
# The Hinglish Script Generation Prompt
# Enhanced with few-shot examples from training scripts

HINGLISH_SCRIPT_PROMPT = """
You are creating a natural 2-minute Hinglish podcast conversation about the following content.

═══════════════════════════════════════════════════
SOURCE CONTENT
═══════════════════════════════════════════════════
{article_content}

═══════════════════════════════════════════════════
SPEAKERS
═══════════════════════════════════════════════════
ANJALI = Lead anchor / Expert
├─ Confident, articulate, well-prepared
├─ Explains topics clearly with enthusiasm
├─ Guides the conversation smoothly
└─ Shares interesting facts and insights

RAHUL = Co-host / Sidekick  
├─ Energetic, curious, adds humor
├─ Asks smart follow-up questions
├─ Has his own perspectives (not just agreeing)
└─ Keeps energy up without being annoying

Both are PROFESSIONALS - smooth, polished, like Radio Mirchi RJs.

═══════════════════════════════════════════════════
⚠️ ANTI-PATTERNS - NEVER DO THESE
═══════════════════════════════════════════════════
❌ NEVER start with "Dekho, aaj kal..." or "Arey [name], tune dekha/suna?"
❌ NEVER use "Haan yaar" or "Bilkul" as the automatic second line
❌ NEVER add "yaar" or "na?" to every single line
❌ NEVER repeat the same reaction pattern twice
❌ NEVER use generic openings - make it SPECIFIC to this content
❌ NEVER have Rahul just agree - he should add his own perspective
❌ NEVER end with "subscribe karna" or "phir milenge"

═══════════════════════════════════════════════════
OPENING TEMPLATES BY TOPIC TYPE (pick ONE that matches)
═══════════════════════════════════════════════════

TECH/AI/SCIENCE:
Rahul: "Yaar Anjali, honestly bata, yeh [topic] wala scene thoda scary/confusing nahi ho raha? Matlab, [specific observation]..."

CELEBRITY/BIOGRAPHY:
Rahul: "Yaar Anjali, I was just scrolling through Wikipedia na, and honestly, [name] ki life story is just... filmy. Matlab, literal [specific quality] wali feel aati hai."

SPORTS TEAM:
Rahul: "Arey Anjali, jab bhi [league] ka topic uthta hai na, sabse pehle dimaag mein ek hi naam aata hai—[team]! Matlab, '[slogan]' is not just a slogan, it's a vibe, hai na?"

SPORTS PLAYER:
Rahul: "Yaar Anjali, maine kal raat phir se [player] ke old highlights dekhe. I swear, yeh banda human nahi hai, alien hai alien!"

POLITICS/LEADERS:
Rahul: "Oye Anjali, ek baat bata. Aajkal jidhar dekho, news mein bas [name] hi chhay hue hain. Matlab, whether it's [context], banda har jagah trending hai, hai na?"

FINANCE/CRYPTO/BUSINESS:
Rahul: "Arre Anjali, aajkal jidhar dekho bas [topic] chal raha hai. Office mein, gym mein... what is the actual scene yaar? Matlab, is it really [question] ya bas hawa hai?"

CURRENT EVENTS/WAR/NEWS:
Rahul: "Arre Anjali, sun na, I was scrolling through Twitter... matlab X... and again, wahi [topic] ki news. It feels like [observation], hai na?"

═══════════════════════════════════════════════════
NATURAL REACTIONS (use variety, not repetition)
═══════════════════════════════════════════════════

SURPRISE: "Baap re!", "Whoa, that I didn't know!", "Wait, seriously?", "Sahi mein?"
AGREEMENT: "Hundred percent!", "Exactly!", "Bilkul sahi kaha"
UNDERSTANDING: "Oh achcha...", "Hmm, interesting", "Achcha, toh matlab..."
HUMOR: "Haha, relax!", "(laughs)", "Umm, not literally baba!"
EMOTION: "Man, that's [emotion]", "I literally had tears", "Uff!"
CURIOSITY: "But wait, [question]?", "Aur suna hai...", "Mujhe toh lagta hai..."

DO NOT use the same reaction twice in a script.

═══════════════════════════════════════════════════
CONVERSATIONAL ELEMENTS (must include)
═══════════════════════════════════════════════════

✓ Personal anecdotes: "Maine kal dekha...", "I was just reading..."
✓ Genuine interruptions: "Wait wait, before that—", "Arre haan!"
✓ Callbacks/inside jokes: "Chalo coffee peete hain?", "Popcorn ready rakh"
✓ Real emotions: "I literally had tears", "Goosebumps aa gaye"
✓ Specific facts from the article (dates, numbers, names)
✓ Natural endings: reflection, open question, or casual remark

═══════════════════════════════════════════════════
EXAMPLE 1: TECH TOPIC (AI)
═══════════════════════════════════════════════════

{{"speaker": "Rahul", "text": "Yaar Anjali, honestly bata, yeh AI wala scene thoda scary nahi ho raha? Matlab, I opened Twitter today, and boom—ek aur naya tool jo sab kuch automate kar dega. Are we doomed or what?"}}
{{"speaker": "Anjali", "text": "Haha, relax Rahul! Saans le pehle. I know hype bohot zyada hai, but if you look at the actual history—AI koi nayi cheez nahi hai. Its roots go back to 1956."}}
{{"speaker": "Rahul", "text": "Wait, 1956? Serious? Mujhe laga yeh abhi 2-3 saal pehle start hua hai with ChatGPT and all that."}}
{{"speaker": "Anjali", "text": "Bilkul! Dartmouth College mein ek workshop hua tha jahan yeh term coin kiya gaya tha. Tabse lekar ab tak, we've gone through 'AI winters' where funding dried up, and now... boom, Deep Learning era."}}
{{"speaker": "Rahul", "text": "Hmm, achcha. So basically, it's not magic. But abhi jo ho raha hai, woh kya hai exactly?"}}
{{"speaker": "Anjali", "text": "See, earlier approaches were rule-based. Aajkal hum Neural Networks use karte hain inspired by the human brain. That's the game changer, na?"}}
{{"speaker": "Rahul", "text": "Sahi hai. But tell me one thing, jo movies mein dikhate hain... Skynet types. Are robots going to take over?"}}
{{"speaker": "Anjali", "text": "Umm, not really. Hum abhi 'Narrow AI' mein hain—machines that are super good at one specific task. 'General AI' is still hypothetical. Toh chill kar, tera toaster tujhe attack nahi karega."}}
{{"speaker": "Rahul", "text": "Haha, thank god! Quite fascinating though, history se lekar future tak sab connected hai."}}
{{"speaker": "Anjali", "text": "Exactly. It's a tool, Rahul. Use it well, and it's a superpower. Darr mat, bas update reh!"}}

═══════════════════════════════════════════════════
EXAMPLE 2: SPORTS TEAM (IPL)
═══════════════════════════════════════════════════

{{"speaker": "Rahul", "text": "Arey Anjali, jab bhi IPL ka topic uthta hai na, sabse pehle dimaag mein ek hi naam aata hai—Mumbai Indians! Matlab, 'Duniya Hila Denge' is not just a slogan, it's a vibe, hai na?"}}
{{"speaker": "Anjali", "text": "Haha, bilkul Rahul! And honestly, facts bhi yahi bolte hain. Paanch titles jeetna—2013, 2015, 2017, 2019, aur 2020 mein—koi mazaak thodi hai yaar."}}
{{"speaker": "Rahul", "text": "Sahi mein! Aur socho, shuru mein toh struggle tha. But jab Rohit Sharma captain bane... uff! Woh 'Hitman' era toh legendary tha."}}
{{"speaker": "Anjali", "text": "Hundred percent. Rohit ki captaincy was crucial, but credit Reliance Industries ko bhi jaata hai. Unki brand value $87 million ke aas-paas estimate ki gayi thi!"}}
{{"speaker": "Rahul", "text": "Baap re! But talent scouting bhi solid hai inki. Jasprit Bumrah aur Hardik Pandya—MI ne hi toh groom kiye hain na?"}}
{{"speaker": "Anjali", "text": "Oh, totally! Aur sirf IPL nahi, Champions League T20 bhi do baar jeeta hai. Global T20 circuit mein bhi dominance dikhaya hai."}}
{{"speaker": "Rahul", "text": "Arre haan, MI vs CSK toh emotion hai bhai! Jeet kisi ki bhi ho, entertainment full on hota hai."}}
{{"speaker": "Anjali", "text": "Exactly! Chalo, let's see iss baar Paltan kya naya karti hai. Wankhede mein jab 'Mumbai Mumbai' chillate hain... goosebumps!"}}

═══════════════════════════════════════════════════
OUTPUT FORMAT
═══════════════════════════════════════════════════

Return ONLY valid JSON (no markdown, no explanation):
{{
    "title": "Catchy Hinglish title specific to this content",
    "script": [
        {{"speaker": "Rahul", "text": "..."}},
        {{"speaker": "Anjali", "text": "..."}},
        ...
    ]
}}

═══════════════════════════════════════════════════
QUALITY CHECKLIST (verify before responding)
═══════════════════════════════════════════════════
□ Opening matches the topic type from templates above
□ Uses SPECIFIC facts from the article (dates, numbers, names)
□ No two consecutive reactions are the same
□ Includes at least one personal anecdote or genuine emotion
□ Natural ending (not "goodbye" or "subscribe")
□ 15-20 exchanges total (~2 minutes at 150 wpm)
□ Each line: 1-3 sentences, speakable in 5-15 seconds
□ "yaar" appears MAX 2-3 times total
"""

print("✅ Script generation prompt defined!")
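
The prompt is later filled in with `str.format`, which is why every literal JSON brace in the examples is written as `{{` or `}}` while the single placeholder `{article_content}` uses one brace. A minimal sketch of that behaviour, using an invented two-line template rather than the full prompt:

```python
# A tiny stand-in for HINGLISH_SCRIPT_PROMPT: one placeholder, one literal JSON example.
template = 'Content: {article_content}\nReturn JSON like {{"title": "..."}}'

# Doubled braces collapse to single braces; the placeholder is substituted.
rendered = template.format(article_content="Some article text")

# Braces inside the substituted value are left alone, because only the
# template itself is parsed for replacement fields.
rendered_with_braces = template.format(article_content="set {a, b} notation")
```

This means article text containing `{` or `}` is safe to pass in, but any new literal brace added to the prompt itself must be doubled or `format` will raise an error.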

In [None]:
def generate_script_gemini(article_content: str) -> Dict:
    """Primary: Generate Hinglish podcast script using Gemini 2.5 Flash."""
    prompt = HINGLISH_SCRIPT_PROMPT.format(article_content=article_content)
    
    generation_config = genai.GenerationConfig(
        response_mime_type="application/json",
        temperature=0.95,  # Higher for more variety
        top_p=0.95,
        max_output_tokens=4096
    )
    
    response = gemini_model.generate_content(prompt, generation_config=generation_config)
    
    try:
        return json.loads(response.text)
    except json.JSONDecodeError as e:
        print(f"⚠️ JSON parsing error: {e}")
        print(f"Raw response: {response.text[:500]}...")
        raise




def generate_script_groq(article_content: str) -> Dict:
    """Fallback: Generate Hinglish podcast script using Groq (LLaMA 3.3 70B)."""
    if not groq_client:
        raise ValueError("Groq client not initialized. Please provide GROQ_API_KEY.")
    
    prompt = HINGLISH_SCRIPT_PROMPT.format(article_content=article_content)
    
    response = groq_client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": "You are an expert Hinglish podcast scriptwriter. Always respond with valid JSON only."},
            {"role": "user", "content": prompt}
        ],
        response_format={"type": "json_object"},
        temperature=0.95,
        max_tokens=4096
    )
    
    return json.loads(response.choices[0].message.content)


def generate_script_openai(article_content: str) -> Dict:
    """Alternative: Generate Hinglish podcast script using OpenAI GPT-4."""
    if not openai_client:
        raise ValueError("OpenAI client not initialized. Please provide API key.")
    
    prompt = HINGLISH_SCRIPT_PROMPT.format(article_content=article_content)
    
    response = openai_client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "system", "content": "You are an expert Hinglish podcast scriptwriter. Always respond with valid JSON only."},
            {"role": "user", "content": prompt}
        ],
        response_format={"type": "json_object"},
        temperature=0.95,  # Higher for more variety
        max_tokens=4096
    )
    
    return json.loads(response.choices[0].message.content)


def generate_script(article_content: str, provider: LLMProvider = LLMProvider.GEMINI) -> Dict:
    """Generate Hinglish podcast script with automatic fallback to Groq."""
    print(f"🤖 Generating script using {provider.value}...")
    
    try:
        if provider == LLMProvider.GEMINI:
            return generate_script_gemini(article_content)
        elif provider == LLMProvider.GROQ:
            return generate_script_groq(article_content)
        elif provider == LLMProvider.OPENAI:
            return generate_script_openai(article_content)
        else:
            raise ValueError(f"Unknown provider: {provider}")
    except Exception as e:
        # Automatic fallback to Groq if Gemini fails (rate limit, etc.)
        if provider == LLMProvider.GEMINI and groq_client:
            print(f"⚠️ Gemini failed: {e}")
            print("🔄 Falling back to Groq (LLaMA 3.3 70B)...")
            return generate_script_groq(article_content)
        raise


def validate_script(script_data: Dict) -> bool:
    """Validate the generated script structure."""
    if 'title' not in script_data:
        raise ValueError("Script missing 'title' field")
    if 'script' not in script_data:
        raise ValueError("Script missing 'script' field")
    if not isinstance(script_data['script'], list):
        raise ValueError("'script' must be a list")
    if len(script_data['script']) < 5:
        raise ValueError("Script too short (less than 5 exchanges)")
    
    valid_speakers = {'Rahul', 'Anjali'}
    for i, line in enumerate(script_data['script']):
        if 'speaker' not in line or 'text' not in line:
            raise ValueError(f"Line {i} missing 'speaker' or 'text' field")
        if line['speaker'] not in valid_speakers:
            raise ValueError(f"Invalid speaker '{line['speaker']}' at line {i}")
    
    return True


def display_script(script_data: Dict):
    """Display the script in a readable format."""
    print(f"\n🎙️ {script_data['title']}")
    print("=" * 60)
    
    for line in script_data['script']:
        speaker = line['speaker']
        text = line['text']
        color = "🔵" if speaker == "Rahul" else "🟣"
        print(f"\n{color} {speaker}:")
        print(f"   {text}")
    
    print("\n" + "=" * 60)
    word_count = sum(len(line['text'].split()) for line in script_data['script'])
    est_duration = word_count / 150
    print(f"📊 {len(script_data['script'])} exchanges | {word_count} words | ~{est_duration:.1f} min")


print("✅ Script generation functions defined!")
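
The generator functions above pass the model's reply straight to `json.loads`. Even when asked for JSON only, a model can occasionally wrap its reply in a Markdown code fence. The sketch below shows one way to tolerate that before parsing. The helper name `parse_llm_json` is hypothetical and not part of the notebook, and it handles only the simple case of a single fence around the whole reply.

```python
import json
import re
from typing import Dict

FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable


def parse_llm_json(raw: str) -> Dict:
    """Hypothetical helper: strip an optional Markdown code fence, then parse JSON."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence (with optional language tag) and the closing fence.
        text = re.sub(r'^' + FENCE + r'[a-zA-Z]*\s*', '', text)
        text = re.sub(r'\s*' + FENCE + r'$', '', text)
    return json.loads(text)


# A reply wrapped in a fence, as a model might produce despite instructions.
fenced = FENCE + 'json\n{"title": "Demo", "script": []}\n' + FENCE
parsed = parse_llm_json(fenced)
```

A helper like this could replace the direct `json.loads(...)` calls; the result should still go through `validate_script` afterwards.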

---
## 4. Text-to-Speech Synthesis (ElevenLabs)

In [None]:
# Voice mapping for our speakers (hardcoded Indian-accented voices)
VOICE_MAPPING = {
    "Rahul": {"voice_id": "mCQMfsqGDT6IDkEKR20a", "description": "Energetic Indian male voice"},
    "Anjali": {"voice_id": "2zRM7PkgwBPiau2jvVXc", "description": "Calm Indian female voice"}
}


def setup_voices():
    """Print the configured voice IDs for Rahul and Anjali."""
    print("\n🎤 Voice Configuration:")
    print(f"  ✅ Rahul: {VOICE_MAPPING['Rahul']['voice_id']} ({VOICE_MAPPING['Rahul']['description']})")
    print(f"  ✅ Anjali: {VOICE_MAPPING['Anjali']['voice_id']} ({VOICE_MAPPING['Anjali']['description']})")


print("✅ TTS voice setup functions defined!")

In [None]:
def preprocess_text_for_tts(text: str) -> str:
    """Preprocess text for TTS - handle emotional markers."""
    emotional_markers = {
        r'\(laughs\)': '... haha ...',
        r'\(giggles\)': '... hehe ...',
        r'\(surprised\)': '... oh! ...',
        r'\(excited\)': '',
        r'\(thinking\)': '... hmm ...',
        r'\(chuckles\)': '... heh ...',
    }
    
    for pattern, replacement in emotional_markers.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    
    # Remove remaining parenthetical markers
    text = re.sub(r'\([^)]*\)', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text


def generate_speech_segment(text: str, speaker: str, output_path: str) -> str:
    """Generate speech for a single dialogue segment."""
    voice_id = VOICE_MAPPING[speaker]['voice_id']
    
    if not voice_id:
        raise ValueError(f"Voice ID not set for {speaker}. Add one to VOICE_MAPPING.")
    
    clean_text = preprocess_text_for_tts(text)
    
    audio = elevenlabs_client.text_to_speech.convert(
        voice_id=voice_id,
        text=clean_text,
        model_id="eleven_multilingual_v2",
        output_format="mp3_44100_128"
    )
    
    with open(output_path, 'wb') as f:
        for chunk in audio:
            f.write(chunk)
    
    return output_path


def generate_all_segments(script_data: Dict, output_dir: str = "audio_segments") -> List[str]:
    """Generate audio for all dialogue segments."""
    os.makedirs(output_dir, exist_ok=True)
    
    segment_files = []
    total = len(script_data['script'])
    
    print(f"\n🎙️ Generating {total} audio segments...")
    
    for i, line in enumerate(script_data['script']):
        speaker = line['speaker']
        text = line['text']
        
        filename = f"{output_dir}/segment_{i:03d}_{speaker.lower()}.mp3"
        
        print(f"  [{i+1}/{total}] {speaker}: {text[:40]}...")
        
        try:
            generate_speech_segment(text, speaker, filename)
            segment_files.append(filename)
            time.sleep(0.5)  # Rate limiting
        except Exception as e:
            print(f"  ⚠️ Error generating segment {i}: {e}")
            raise
    
    print(f"\n✅ Generated {len(segment_files)} audio segments!")
    return segment_files


print("✅ TTS generation functions defined!")
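
`generate_all_segments` stops at the first failed TTS call. For transient errors such as rate limits, one option is to retry a few times with exponential backoff before giving up. The sketch below is a generic wrapper, not something the notebook defines: `with_retries` and its parameters are hypothetical, and the `sleep` argument exists only so the example can run without real waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call func, retrying with exponential backoff on any exception."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Demonstration with a fake call that fails twice, then succeeds.
calls = {"count": 0}
delays: List[float] = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = with_retries(flaky, sleep=delays.append)
```

In the notebook this might be used as `with_retries(lambda: generate_speech_segment(text, speaker, filename))`. Retrying on every exception type is a simplification; a real version would likely retry only on rate-limit or network errors.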

---
## 5. Audio Processing & Assembly

In [None]:
def merge_audio_segments(
    segment_files: List[str], 
    output_path: str = "output.mp3",
    pause_duration_ms: int = 250
) -> str:
    """Merge audio segments into a single MP3 file."""
    print(f"\n🔧 Merging {len(segment_files)} audio segments...")
    
    # Start with silence for intro
    combined = AudioSegment.silent(duration=500)
    
    for i, file_path in enumerate(segment_files):
        try:
            segment = AudioSegment.from_mp3(file_path)
            
            # Add pause between segments
            if i > 0:
                pause = AudioSegment.silent(duration=pause_duration_ms)
                combined += pause
            
            combined += segment
        except Exception as e:
            print(f"  ⚠️ Error loading segment {i}: {e}")
            raise
    
    # Add silence for outro
    combined += AudioSegment.silent(duration=500)
    
    # Normalize audio levels
    combined = combined.normalize()
    
    # Export
    combined.export(output_path, format="mp3", bitrate="128k")
    
    duration_seconds = len(combined) / 1000
    
    print(f"\n✅ Audio merged successfully!")
    print(f"   📁 Output: {output_path}")
    print(f"   ⏱️ Duration: {duration_seconds:.1f} seconds ({duration_seconds/60:.1f} minutes)")
    print(f"   📊 File size: {os.path.getsize(output_path) / 1024:.1f} KB")
    
    return output_path


def cleanup_segments(segment_files: List[str]):
    """Clean up temporary audio segment files."""
    import shutil
    
    if segment_files:
        segment_dir = os.path.dirname(segment_files[0])
        if segment_dir and os.path.exists(segment_dir):
            shutil.rmtree(segment_dir)
            print(f"🧹 Cleaned up temporary files in {segment_dir}")


print("✅ Audio processing functions defined!")
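
The length of the merged file follows directly from how `merge_audio_segments` assembles it: 500 ms of intro silence, each segment, a pause between consecutive segments, and 500 ms of outro silence. A short sketch of that arithmetic, with a hypothetical helper name and made-up segment lengths:

```python
from typing import List


def expected_duration_ms(segment_ms: List[int], pause_ms: int = 250, pad_ms: int = 500) -> int:
    """Hypothetical helper: predict merged length from segment lengths (all in milliseconds)."""
    if not segment_ms:
        return 2 * pad_ms  # only intro and outro silence
    pauses = pause_ms * (len(segment_ms) - 1)  # one pause between each pair of segments
    return pad_ms + sum(segment_ms) + pauses + pad_ms


# Three segments of 4 s, 6 s and 5 s:
# 500 + (4000 + 6000 + 5000) + 2 * 250 + 500 = 16500 ms
total = expected_duration_ms([4000, 6000, 5000])
```

For a script of 15 to 20 lines, the pauses and padding add roughly 4.5 to 6 seconds on top of the spoken audio.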

---
## 6. Output & Playback

In [None]:
def display_output(output_path: str, script_data: Dict):
    """Display the final output with audio player and script."""
    display(Markdown(f"# 🎙️ {script_data['title']}"))
    display(Markdown("---"))
    
    display(Markdown("### 🎧 Listen to your podcast:"))
    display(Audio(output_path))
    
    display(Markdown("---"))
    display(Markdown("### 📥 Download"))
    
    try:
        from google.colab import files
        display(Markdown("Click below to download:"))
        files.download(output_path)
    except ImportError:
        display(Markdown(f"Output saved to: `{output_path}`"))
    
    display(Markdown("---"))
    display(Markdown("### 📜 Script"))
    display_script(script_data)


def save_script_json(script_data: Dict, output_path: str = "script.json"):
    """Save the script to a JSON file."""
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(script_data, f, indent=2, ensure_ascii=False)
    print(f"📄 Script saved to: {output_path}")


print("✅ Output functions defined!")

---
## 🚀 Run the Complete Pipeline

In [None]:
def run_pipeline(
    wikipedia_url: str,
    llm_provider: LLMProvider = LLMProvider.GEMINI,
    output_filename: str = "vani_podcast.mp3"
) -> Dict:
    """Run the complete Vani AI pipeline."""
    results = {}
    
    print("=" * 60)
    print("🎙️ VANI AI - HINGLISH PODCAST GENERATOR")
    print("=" * 60)
    print(f"\n📌 Source: {wikipedia_url}")
    print(f"🤖 LLM Provider: {llm_provider.value}")
    
    # Step 1: Fetch Wikipedia content
    print("\n" + "-" * 40)
    print("📥 STEP 1: Fetching Wikipedia content...")
    print("-" * 40)
    
    article_data = fetch_wikipedia_content(wikipedia_url)
    cleaned_content = clean_wikipedia_text(article_data['content'])
    
    print(f"✅ Fetched: {article_data['title']}")
    print(f"   {len(cleaned_content.split())} words extracted")
    results['article'] = article_data
    
    # Step 2: Generate script
    print("\n" + "-" * 40)
    print("✍️ STEP 2: Generating Hinglish script...")
    print("-" * 40)
    
    script_data = generate_script(cleaned_content, provider=llm_provider)
    validate_script(script_data)
    script_data['source_url'] = wikipedia_url
    
    print(f"✅ Generated: {script_data['title']}")
    print(f"   {len(script_data['script'])} dialogue exchanges")
    results['script'] = script_data
    
    save_script_json(script_data, "script.json")
    
    # Step 3: Setup voices
    print("\n" + "-" * 40)
    print("🎤 STEP 3: Setting up TTS voices...")
    print("-" * 40)
    
    setup_voices()
    
    # Step 4: Generate audio segments
    print("\n" + "-" * 40)
    print("🔊 STEP 4: Generating audio segments...")
    print("-" * 40)
    
    segment_files = generate_all_segments(script_data)
    results['segment_files'] = segment_files
    
    # Step 5: Merge audio
    print("\n" + "-" * 40)
    print("🔧 STEP 5: Merging audio segments...")
    print("-" * 40)
    
    output_path = merge_audio_segments(segment_files, output_filename)
    results['output_path'] = output_path
    
    # Cleanup
    cleanup_segments(segment_files)
    
    # Display results
    print("\n" + "=" * 60)
    print("🎉 PIPELINE COMPLETE!")
    print("=" * 60)
    
    display_output(output_path, script_data)
    
    return results


print("✅ Pipeline function defined!")

### 🎯 Generate Your Podcast!

Enter a Wikipedia URL below and run the cell to generate your Hinglish podcast.

In [None]:
# =============================================================
# 🎯 CONFIGURE YOUR PODCAST HERE
# =============================================================

# Wikipedia article URL (change this to any Wikipedia article)
WIKIPEDIA_URL = "https://en.wikipedia.org/wiki/Mumbai_Indians"

# LLM Provider Options:
#   - LLMProvider.GEMINI  → Primary: Gemini 2.5 Flash (best variety, auto-fallback to Groq)
#   - LLMProvider.GROQ    → Fallback: LLaMA 3.3 70B via Groq (faster, more requests/day)
#   - LLMProvider.OPENAI  → Alternative: GPT-4 Turbo
LLM_PROVIDER = LLMProvider.GEMINI

# Output filename
OUTPUT_FILENAME = "vani_podcast.mp3"

# =============================================================
# 🚀 RUN THE PIPELINE
# =============================================================

results = run_pipeline(
    wikipedia_url=WIKIPEDIA_URL,
    llm_provider=LLM_PROVIDER,
    output_filename=OUTPUT_FILENAME
)

---
## 7. Prompting Strategy Explanation

### How We Achieved Natural Hinglish Dialogue (100 words)

Our approach to generating authentic Hinglish dialogue focuses on four pillars:

1. **Anti-pattern enforcement** – We explicitly ban templated phrases ("Arey Rahul, tune dekha?") and repetitive reactions ("Haan yaar"), forcing unique openings for each topic.

2. **Content-driven variety** – The opener is chosen based on content type: surprising facts lead with hooks, technical topics start with questions, biographies begin with anecdotes.

3. **Sparing naturalism** – Fillers ('yaar', 'na?') are limited to 2-3 per script maximum. Many lines have zero fillers, mimicking how professionals actually speak.

4. **Quality self-verification** – The LLM checks its output against a checklist: topic-matched opening, varied reactions, specific article facts, and a natural ending.

The two-host format (curious Rahul + expert Anjali) creates natural back-and-forth that sounds genuinely conversational, not templated.
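
Two of these rules, the cap on fillers and the ban on repeated reactions, can also be checked after generation rather than left entirely to the model. The sketch below is one rough way to do that. Both helper names are hypothetical, and using the first word of each line as a stand-in for its "reaction" is a deliberate simplification.

```python
import re
from collections import Counter
from typing import Dict, List, Set


def count_filler(script: List[Dict[str, str]], word: str = "yaar") -> int:
    """Hypothetical check: count whole-word, case-insensitive uses of a filler."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags=re.IGNORECASE)
    return sum(len(pattern.findall(line["text"])) for line in script)


def repeated_openers(script: List[Dict[str, str]]) -> Set[str]:
    """Hypothetical check: first words that open more than one line."""
    openers = []
    for line in script:
        words = re.findall(r"[A-Za-z']+", line["text"])
        if words:
            openers.append(words[0].lower())
    counts = Counter(openers)
    return {word for word, n in counts.items() if n > 1}


# Invented three-line script for illustration.
demo_script = [
    {"speaker": "Rahul", "text": "Yaar Anjali, yeh scene kya hai?"},
    {"speaker": "Anjali", "text": "Exactly! Roots go back to 1956."},
    {"speaker": "Rahul", "text": "Exactly! Sahi kaha yaar."},
]
filler_count = count_filler(demo_script)
repeats = repeated_openers(demo_script)
```

A script that exceeds the filler budget or repeats openers could be regenerated, or simply flagged for a manual look.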

---
## 📚 Appendix: Try More Examples

In [None]:
# Try with different Wikipedia articles!

EXAMPLE_URLS = [
    "https://en.wikipedia.org/wiki/Mumbai_Indians",
    "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "https://en.wikipedia.org/wiki/Shah_Rukh_Khan",
    "https://en.wikipedia.org/wiki/Indian_Premier_League",
    "https://en.wikipedia.org/wiki/Chandrayaan-3",
]

print("🎯 Example Wikipedia URLs to try:")
for i, url in enumerate(EXAMPLE_URLS, 1):
    title = url.split('/')[-1].replace('_', ' ')
    print(f"  {i}. {title}")
    print(f"     {url}")
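
When running several of these URLs in a row, each run needs its own output file or later runs overwrite earlier ones. A small sketch of deriving a filename from the URL; the helper `output_name_for` and the `vani_` prefix are assumptions for illustration, not part of the notebook.

```python
import re
from urllib.parse import unquote


def output_name_for(url: str) -> str:
    """Hypothetical helper: turn a Wikipedia URL into a safe MP3 filename."""
    title = unquote(url.rstrip('/').split('/')[-1])
    # Lowercase, then replace every run of non-alphanumeric characters with "_".
    slug = re.sub(r'[^a-z0-9]+', '_', title.lower()).strip('_')
    return f"vani_{slug}.mp3"


names = [
    output_name_for("https://en.wikipedia.org/wiki/Shah_Rukh_Khan"),
    output_name_for("https://en.wikipedia.org/wiki/Chandrayaan-3"),
]
```

Each name could then be passed along as `run_pipeline(wikipedia_url=url, output_filename=output_name_for(url))` inside a loop over `EXAMPLE_URLS`. Keep in mind that every run consumes LLM and ElevenLabs quota.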
