[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/index-tts/index-tts/blob/feat/english-colab-notebook/IndexTTS_Colab_EN.ipynb)

# IndexTTS: Zero-Shot Text-To-Speech on Colab (English UI)

This notebook runs the IndexTTS system in Google Colab. It clones the repository, installs the dependencies, downloads the model checkpoints, and starts the Gradio web UI in English.

In [None]:
# Clone the IndexTTS repository
!git clone https://github.com/index-tts/index-tts.git
%cd index-tts
!ls -la

## Install Dependencies

This step installs `ffmpeg` (required for audio processing), the `uv` package installer, all the Python packages listed in `requirements.txt`, and `WeTextProcessing`.

In [None]:
# Install ffmpeg
!apt-get update && apt-get install -y ffmpeg

# Install uv for faster package installation
!pip install uv

# Install Python dependencies using uv
# Note: pynini can sometimes have installation issues.
# The original repo suggests `conda install -c conda-forge pynini==2.1.5` for Windows.
# For Colab (Linux), pip/uv should generally work. If issues arise with pynini,
# specific compatible versions or alternative installation methods might be needed.
!uv pip install -r requirements.txt --system
!uv pip install WeTextProcessing --system
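
After the install cell finishes, a quick sanity check can save time later. The sketch below reports whether `ffmpeg` is on the `PATH` and whether a few Python modules can be found. The helper name `check_environment` is hypothetical, and the module names `torch` and `gradio` are assumptions about what `requirements.txt` pulls in, so adjust the list to match the actual file.

```python
import importlib.util
import shutil


def check_environment(modules=("torch", "gradio"), binaries=("ffmpeg",)):
    """Return {name: bool} for each required binary and Python module.

    The default module names are assumptions, not taken from requirements.txt.
    """
    status = {}
    for binary in binaries:
        # shutil.which returns None when the executable is not on PATH.
        status[binary] = shutil.which(binary) is not None
    for module in modules:
        # find_spec locates a top-level module without importing it.
        status[module] = importlib.util.find_spec(module) is not None
    return status


for name, ok in check_environment().items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Anything reported as `MISSING` is worth fixing before moving on to the model download.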

## Download Models

The following command downloads the necessary model checkpoints from Hugging Face using `huggingface-cli` (recommended). `wget` is an alternative, but this notebook does not include the corresponding commands.

**Important:** If `huggingface-cli` is not found, uncomment and run the `!pip install huggingface_hub` line in the cell below.

In [None]:
# Option 1: Using huggingface-cli (Recommended)
# Make sure you have huggingface_hub installed and you are logged in if necessary,
# though for public models, login might not be required.
# !pip install huggingface_hub  # User can uncomment if needed
!huggingface-cli download IndexTeam/Index-TTS \
  bigvgan_discriminator.pth bigvgan_generator.pth bpe.model \
  dvae.pth gpt.pth unigram_12000.vocab \
  --repo-type model \
  --local-dir checkpoints --local-dir-use-symlinks False

# Verify checkpoint files
!ls -l checkpoints/
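
The `ls` listing has to be read by eye. As a minimal sketch, the same check can be done in Python so that a missing or empty file is reported explicitly. The file names are the ones passed to `huggingface-cli` above. The helper name `missing_checkpoints` is hypothetical and not part of IndexTTS.

```python
from pathlib import Path

# File names copied from the huggingface-cli command above.
EXPECTED_FILES = [
    "bigvgan_discriminator.pth",
    "bigvgan_generator.pth",
    "bpe.model",
    "dvae.pth",
    "gpt.pth",
    "unigram_12000.vocab",
]


def missing_checkpoints(checkpoint_dir="checkpoints", expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty in checkpoint_dir."""
    root = Path(checkpoint_dir)
    missing = []
    for name in expected:
        path = root / name
        # A zero-byte file usually indicates an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


missing = missing_checkpoints()
print("All checkpoint files present." if not missing else f"Missing or empty: {missing}")
```

If the list is not empty, re-running the download cell is usually the simplest fix.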

## Run the Gradio Web UI

This command starts the Gradio web interface. To open the UI in your browser, click the public URL (usually ending with `gradio.live`) that appears in the output. The UI should be in English.

In [None]:
# Run the Web UI
!python webui.py

---
## (Optional) Command-Line Interface (CLI) Usage

You can also use IndexTTS via its command-line interface.
First, you need a reference audio file. Upload your own recording to the Colab environment, or uncomment the first block in the cell below to generate a placeholder sine tone. The placeholder only lets the example run; replace it with real reference audio.

**Note:** The example below expects a `reference_voice.wav` file in the main `index-tts` directory; otherwise, modify the path.
Colab runs one cell at a time, so stop the Web UI cell above before running this one.

In [None]:
# (Example) Create a dummy reference voice file if you don't have one
# This is just a placeholder. Replace with your actual reference audio.
# import numpy as np
# import soundfile as sf
# samplerate = 22050
# duration = 1
# frequency = 440
# t = np.linspace(0., duration, int(samplerate * duration), endpoint=False)
# data = 0.5 * np.sin(2. * np.pi * frequency * t)
# sf.write('reference_voice.wav', data, samplerate)

# Install IndexTTS as a package for CLI usage
!pip install -e .

# Run CLI inference (make sure 'reference_voice.wav' exists or change path)
# !indextts "Hello, this is a test of the IndexTTS command line interface." \
#   --voice reference_voice.wav \
#   --model_dir checkpoints \
#   --config checkpoints/config.yaml \
#   --output output_cli.wav

# print("If successful, output_cli.wav should be generated.")
# You can then listen to it or download it from the file browser on the left.
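
Before running the CLI, it can help to confirm that the reference file is a readable WAV and to see its basic properties. The sketch below uses Python's standard `wave` module, which only reads uncompressed PCM WAV files, so other formats will raise an error. The helper name `describe_wav` is hypothetical and not part of IndexTTS.

```python
import os
import wave


def describe_wav(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


reference_path = "reference_voice.wav"
if os.path.exists(reference_path):
    print(describe_wav(reference_path))
else:
    print(f"{reference_path} not found; upload a recording first.")
```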

---
## (Optional) Python Script Usage

You can also use IndexTTS directly in Python.

**Note:** Colab runs one cell at a time, so stop the Web UI cell above before running this one.

In [None]:
# from indextts.infer import IndexTTS

# # Ensure you have a reference voice, e.g., 'reference_voice.wav'
# # This assumes 'reference_voice.wav' is in the current directory (index-tts)
# reference_audio_path = "reference_voice.wav"
# text_to_speak = "This is a sample sentence generated using the IndexTTS Python interface."
# output_file_path = "output_script.wav"

# if 'tts' not in locals(): # Avoid re-initializing if already done
#   tts = IndexTTS(model_dir="checkpoints", cfg_path="checkpoints/config.yaml")

# # Check if reference_voice.wav exists, if not, skip inference
# import os
# if os.path.exists(reference_audio_path):
#   tts.infer(reference_audio_path, text_to_speak, output_file_path)
#   print(f"Generated audio saved to {output_file_path}")
#   # You can play/download this file from Colab's file browser
# else:
#   print(f"Reference audio {reference_audio_path} not found. Skipping script inference demo.")
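
To synthesize several sentences with the same reference voice, the single call above can be wrapped in a loop. This is a minimal sketch: `synthesize_batch`, the `outputs` directory, and the file naming scheme are all hypothetical, and the only thing it assumes about IndexTTS is the `tts.infer(reference, text, output)` argument order shown in the cell above.

```python
import os


def synthesize_batch(tts, reference_audio_path, sentences, output_dir="outputs"):
    """Call tts.infer once per sentence and return the output paths."""
    os.makedirs(output_dir, exist_ok=True)
    output_paths = []
    for index, sentence in enumerate(sentences, start=1):
        output_path = os.path.join(output_dir, f"sentence_{index:02d}.wav")
        # Same argument order as the single-sentence call above:
        # reference audio, text, output path.
        tts.infer(reference_audio_path, sentence, output_path)
        output_paths.append(output_path)
    return output_paths


# Runs only when a model was already loaded as `tts` in this session.
if "tts" in globals():
    print(synthesize_batch(tts, "reference_voice.wav",
                           ["First sentence.", "Second sentence."]))
```

Reusing one `tts` object for every sentence avoids reloading the model for each file.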