# 🎤 Indic Parler TTS - Interactive Audio Quality Control

**Features:**
- 69 Named Speakers
- 21 Indian Languages
- 12 Emotion Tags
- Full Audio Quality Controls

Run all cells in order to launch the UI!

## 1️⃣ Check GPU

In [None]:
import torch

if torch.cuda.is_available():
    print(f"✅ GPU Available: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("❌ No GPU! Go to Runtime → Change runtime type → GPU")

## 2️⃣ Clone Repository

In [None]:
!git clone https://github.com/beginner4a3/ui.git
%cd ui
print("✅ Repository cloned!")

## 3️⃣ Install Dependencies

In [None]:
# Quote the requirement so the shell does not treat ">=4.0.0" as an output redirection
%pip install -q "gradio>=4.0.0"
%pip install -q git+https://github.com/huggingface/parler-tts.git
%pip install -q transformers accelerate soundfile scipy huggingface_hub
print("✅ Dependencies installed!")

## 4️⃣ Load Model (Enter your HF Token)

⚠️ **Enter your HuggingFace token below!**

Get your token from: https://huggingface.co/settings/tokens

In [None]:
# ⚠️ ENTER YOUR HUGGINGFACE TOKEN HERE ⚠️
HF_TOKEN = "hf_your_token_here"  # Replace with your actual token

# Load the model
from app import setup_model
setup_model(HF_TOKEN)
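
Because the cell above ships with a placeholder token, a small guard can catch a forgotten edit before the slow model download starts. The sketch below is only an illustration: `looks_like_placeholder` is a hypothetical helper (not part of `app.py`), and the `hf_` prefix check is an assumption based on the placeholder format shown above.

```python
def looks_like_placeholder(token: str) -> bool:
    """Return True when the token is empty, still the template value, or lacks the hf_ prefix."""
    token = token.strip()
    return token in ("", "hf_your_token_here") or not token.startswith("hf_")


# Hypothetical usage: check the value before passing it to setup_model()
example_token = "hf_your_token_here"
if looks_like_placeholder(example_token):
    print("Replace HF_TOKEN with your own token before calling setup_model().")
```

A check like this only validates the shape of the string; whether the token actually grants access is decided by the Hub when the model is downloaded.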

## 5️⃣ Launch Interactive UI

This will start the Gradio interface with a public URL!

In [None]:
from app import launch_app
launch_app()

---

## 📋 Alternative: All-in-One Cell

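Run this single cell instead of steps 2, 4, and 5 above. It does not use the cloned repository, but it still needs the packages installed in step 3.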

In [None]:
# ==========================================
# ALL-IN-ONE: Setup + Load + Launch
# ==========================================

# ⚠️ ENTER YOUR HUGGINGFACE TOKEN HERE ⚠️
HF_TOKEN = "hf_your_token_here"  # Replace with your actual token

# ---

import torch
import gradio as gr

# Configuration
SPEAKERS = ["-- Random Voice --", "Divya (Hindi)", "Rohit (Hindi)", "Maya (Hindi)", 
            "Karan (Hindi)", "Aditi (Tamil)", "Sunita (Tamil)", "Anjali (Telugu)"]
EMOTIONS = ["None", "Neutral", "Happy", "Sad", "Anger", "Fear", "Narration", "News"]
PITCH_MAP = {1: "low-pitched", 2: "slightly low-pitched", 3: "moderate pitch", 
             4: "slightly high-pitched", 5: "high-pitched"}
SPEED_MAP = {1: "slow pace", 2: "slightly slow pace", 3: "moderate pace",
             4: "slightly fast pace", 5: "fast pace"}
EXPR_MAP = {1: "monotone", 2: "slightly expressive", 3: "expressive and animated"}

# Load Model
print("🔐 Logging into HuggingFace...")
from huggingface_hub import login
login(token=HF_TOKEN)

print("🔧 Loading model (this takes a few minutes)...")
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

model = ParlerTTSForConditionalGeneration.from_pretrained(
    "ai4bharat/indic-parler-tts", torch_dtype=dtype, 
    attn_implementation="sdpa", token=HF_TOKEN
).to(device)

tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indic-parler-tts", token=HF_TOKEN)
desc_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

print(f"✅ Model loaded on {device}!")

# Generate function
def generate(text, speaker, gender, emotion, pitch, speed, expr, quality, noise, reverb):
    if speaker != "-- Random Voice --":
        name = speaker.split(" (")[0]
        desc = f"{name}'s voice is {EXPR_MAP[expr]} with a {PITCH_MAP[pitch]} tone at a {SPEED_MAP[speed]}"
    else:
        desc = f"A {gender} speaker with a {PITCH_MAP[pitch]} voice delivers {EXPR_MAP[expr]} speech at a {SPEED_MAP[speed]}"
    
    if emotion != "None":
        desc += f" with a {emotion} tone"
    desc += f". The recording is of {quality}, with {noise} audio and a {reverb} environment."
    
    desc_ids = desc_tokenizer(desc, return_tensors="pt").to(device)
    text_ids = tokenizer(text, return_tensors="pt").to(device)
    
    with torch.no_grad():
        gen = model.generate(
            input_ids=desc_ids.input_ids, attention_mask=desc_ids.attention_mask,
            prompt_input_ids=text_ids.input_ids, prompt_attention_mask=text_ids.attention_mask
        )
    
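    # Cast to float32 first: NumPy has no bfloat16 dtype, so GPU output cannot be converted directly
    audio = gen.to(torch.float32).cpu().numpy().squeeze()
    return (model.config.sampling_rate, audio), f"✅ Done!\n\n📝 {desc}"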

# Build UI (no load button needed)
with gr.Blocks(theme=gr.themes.Soft(primary_hue="purple")) as app:
    gr.Markdown("# 🎤 Indic Parler TTS - Audio Quality Control")
    gr.Markdown(f"**Status:** ✅ Model loaded on {device}")
    gr.Markdown("---")
    
    with gr.Row():
        with gr.Column():
            text = gr.Textbox(label="Text", value="Hello, welcome to Indic Parler TTS!", lines=3)
            speaker = gr.Dropdown(SPEAKERS, value="-- Random Voice --", label="Speaker")
            with gr.Row():
                gender = gr.Radio(["female", "male"], value="female", label="Gender")
                emotion = gr.Dropdown(EMOTIONS, value="None", label="Emotion")
            pitch = gr.Slider(1, 5, 3, step=1, label="Pitch (Low → High)")
            speed = gr.Slider(1, 5, 3, step=1, label="Speed (Slow → Fast)")
            expr = gr.Slider(1, 3, 2, step=1, label="Expressivity")
            quality = gr.Radio(["very high quality", "high quality", "good quality"], 
                              value="very high quality", label="Quality")
            noise = gr.Radio(["very clear", "slightly noisy", "noisy"], 
                            value="very clear", label="Background Noise")
            reverb = gr.Radio(["close-sounding", "slightly distant", "distant-sounding"],
                             value="close-sounding", label="Reverb")
        
        with gr.Column():
            gen_btn = gr.Button("🎙️ Generate Speech", variant="primary", size="lg")
            audio_out = gr.Audio(label="Output")
            status_out = gr.Textbox(label="Description", lines=5)
    
    gen_btn.click(generate, 
                  [text, speaker, gender, emotion, pitch, speed, expr, quality, noise, reverb],
                  [audio_out, status_out])

print("🚀 Launching UI...")
app.launch(share=True)

---

## 📖 Audio Quality Settings Reference

| Setting | Best for Clarity |
|---------|------------------|
| **Pitch** | Moderate (3) |
| **Speed** | Moderate (3) |
| **Expressivity** | Slightly Expressive (2) |
| **Quality** | Very High Quality |
| **Noise** | Very Clear |
| **Reverb** | Close-Sounding |
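
To see how the settings in the table turn into the text prompt sent to the model, the sketch below rebuilds the description string for a random voice using the same maps and sentence template as the all-in-one cell. `build_description` is a hypothetical helper name introduced here for illustration; its defaults are the "best for clarity" values from the table.

```python
# Same lookup tables as the all-in-one cell
PITCH_MAP = {1: "low-pitched", 2: "slightly low-pitched", 3: "moderate pitch",
             4: "slightly high-pitched", 5: "high-pitched"}
SPEED_MAP = {1: "slow pace", 2: "slightly slow pace", 3: "moderate pace",
             4: "slightly fast pace", 5: "fast pace"}
EXPR_MAP = {1: "monotone", 2: "slightly expressive", 3: "expressive and animated"}


def build_description(gender="female", pitch=3, speed=3, expr=2,
                      quality="very high quality", noise="very clear",
                      reverb="close-sounding"):
    """Compose the voice description for a random speaker (no named speaker, no emotion)."""
    desc = (f"A {gender} speaker with a {PITCH_MAP[pitch]} voice delivers "
            f"{EXPR_MAP[expr]} speech at a {SPEED_MAP[speed]}")
    desc += f". The recording is of {quality}, with {noise} audio and a {reverb} environment."
    return desc


print(build_description())
# A female speaker with a moderate pitch voice delivers slightly expressive speech at a moderate pace.
# The recording is of very high quality, with very clear audio and a close-sounding environment.
```

Changing one argument, for example `build_description(speed=5)`, swaps only the matching phrase, which makes it easy to compare how each control affects the generated audio.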