<a href="https://colab.research.google.com/github/Troyanovsky/awesome-TTS-Colab/blob/main/Kitten_TTS_Nano.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🗣️ Kitten TTS Nano Google Colab

## 📄 Description  
This Colab notebook uses KittenML's open-source text-to-speech model, Kitten TTS, to generate speech from text. The model is small enough to run on a CPU, so no GPU runtime is required.

**Capabilities**: Text-to-speech, Multiple Expressive Voices, CPU-compatible, Ultra-small (25MB, 15M params)

---

## How to use

- Edit `text_to_generate` (and optionally `voice_to_use`) in the generation cell, following the instructions in the comments
- Run all cells in the section you need, starting with the installation cell
- The generated output will be in `output.wav`
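To confirm what ended up in `output.wav`, you can inspect it with Python's standard-library `wave` module. This is a minimal sketch: `describe_wav` is a hypothetical helper, not part of KittenTTS. It assumes the file is uncompressed PCM, which is what `soundfile` writes by default for `.wav` files.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,                    # samples per second
            "channels": wav.getnchannels(),         # 1 = mono
            "sample_width_bytes": wav.getsampwidth(),  # 2 = 16-bit PCM
            "duration_s": frames / rate,            # length in seconds
        }
```

For example, after running the generation cell, `describe_wav("output.wav")` should report a `sample_rate` of 24000, matching the `sample_rate` used when saving.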

---

## 🔗 Resources

- **GitHub Repository:** https://github.com/KittenML/KittenTTS
- **Model Availability (Nano Preview):** https://huggingface.co/KittenML/kitten-tts-nano-0.1

---

## 🎙️ Explore More TTS Models  
Want to try out additional TTS models? Check out the curated collection here:  
👉 [awesome-TTS-Colab](https://github.com/Troyanovsky/awesome-TTS-Colab)

## General TTS

In [1]:
# @title 🚀 Quick Start: Install KittenTTS

# Install KittenTTS directly from the provided wheel
# This ensures you get the exact version for the nano model preview.
!pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl

print("KittenTTS installation complete!")

Collecting kittentts==0.1.0
  Downloading https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl (9.6 kB)
Collecting num2words (from kittentts==0.1.0)
  Downloading num2words-0.5.14-py3-none-any.whl.metadata (13 kB)
Collecting espeakng_loader (from kittentts==0.1.0)
  Downloading espeakng_loader-0.2.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.3 kB)
Collecting misaki>=0.9.4 (from misaki[en]>=0.9.4->kittentts==0.1.0)
  Downloading misaki-0.9.4-py3-none-any.whl.metadata (19 kB)
Collecting onnxruntime (from kittentts==0.1.0)
  Downloading onnxruntime-1.22.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.6 kB)
Collecting addict (from misaki>=0.9.4->misaki[en]>=0.9.4->kittentts==0.1.0)
  Downloading addict-2.4.0-py3-none-any.whl.metadata (1.0 kB)
Collecting phonemizer-fork (from misaki[en]>=0.9.4->kittentts==0.1.0)
  Downloading phonemizer_fork-3.3.2-py3-none-any.whl.metadata (48 kB)
[2K     [90m━━━━━━

In [None]:
# @title 🎶 Generate and Play Audio with Kitten TTS

# --- Configuration ---
# Enter the text you want the model to speak
text_to_generate = "Hello from Kitten TTS! This is a compact and expressive model, running right in Google Colab." # @param {type:"string"}

# Choose one of the available voices for the nano model.
# Available voices:
# 'expr-voice-2-m' (male), 'expr-voice-2-f' (female)
# 'expr-voice-3-m' (male), 'expr-voice-3-f' (female)
# 'expr-voice-4-m' (male), 'expr-voice-4-f' (female)
# 'expr-voice-5-m' (male), 'expr-voice-5-f' (female)
voice_to_use = 'expr-voice-2-f' # @param ["expr-voice-2-m", "expr-voice-2-f", "expr-voice-3-m", "expr-voice-3-f", "expr-voice-4-m", "expr-voice-4-f", "expr-voice-5-m", "expr-voice-5-f"] {type:"string"}

output_filename = "output.wav"
sample_rate = 24000 # The nano model's output rate (24 kHz); saving at a different rate would distort speed and pitch

# --- Imports ---
from kittentts import KittenTTS
import soundfile as sf
from IPython.display import Audio, display

# --- Model Initialization ---
print("Loading Kitten TTS model from KittenML/kitten-tts-nano-0.1...")
try:
    m = KittenTTS("KittenML/kitten-tts-nano-0.1")
    print("Model loaded successfully!")
except Exception as e:
    print(f"Error loading model: {e}")
    print("Please ensure you ran the installation cell above and check the model ID.")
    raise

# --- Audio Generation ---
print(f"Generating audio for text: '{text_to_generate}' with voice: '{voice_to_use}'")
try:
    audio_data = m.generate(text_to_generate, voice=voice_to_use)
    print("Audio generation complete.")
except Exception as e:
    print(f"Error during audio generation: {e}")
    print("Please check the voice name and input text.")
    raise

# --- Save Audio ---
try:
    sf.write(output_filename, audio_data, sample_rate)
    print(f"Audio saved to {output_filename}")
except Exception as e:
    print(f"Error saving audio to file: {e}")
    raise

# --- Display Audio ---
print("\nPlaying generated audio:")
display(Audio(output_filename))  # `rate` is only needed when passing raw samples instead of a file
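If you type a voice name by hand instead of using the dropdown, a typo only surfaces as an error inside `m.generate`. The sketch below checks the name first and suggests the closest match using the standard-library `difflib`. `check_voice` and `AVAILABLE_VOICES` are hypothetical names, not part of the KittenTTS API; the voice list is copied from the comments in the generation cell.

```python
import difflib

# The eight voices listed in the generation cell for kitten-tts-nano-0.1.
AVAILABLE_VOICES = (
    "expr-voice-2-m", "expr-voice-2-f",
    "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f",
    "expr-voice-5-m", "expr-voice-5-f",
)


def check_voice(voice: str, available=AVAILABLE_VOICES) -> str:
    """Return `voice` unchanged if it is known; otherwise raise with a suggestion."""
    if voice in available:
        return voice
    # Suggest the single closest known voice name, if any is similar enough.
    suggestions = difflib.get_close_matches(voice, available, n=1)
    hint = f" Did you mean '{suggestions[0]}'?" if suggestions else ""
    raise ValueError(f"Unknown voice '{voice}'.{hint}")
```

You could call `check_voice(voice_to_use)` just before `m.generate(...)` so that a misspelled name fails early with a readable message.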