![Zonos Header](https://raw.githubusercontent.com/Zyphra/Zonos/refs/heads/main/assets/ZonosHeader.png)

# Zonos-v0.1

Zonos-v0.1 is an open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech. Its expressiveness and quality are on par with, and in some cases surpass, those of top TTS providers.

## Features

- **Natural Speech Generation**: Produces highly natural speech from text prompts.
- **Speech Cloning**: Accurately clones speech from a reference clip of just a few seconds.
- **Fine Control**:
  - Adjust speaking rate
  - Modify pitch variation
  - Control audio quality
  - Express emotions such as happiness, fear, sadness, and anger
- **High-Quality Output**: Generates speech natively at 44 kHz.

With Zonos-v0.1, experience next-level speech synthesis that brings text to life with exceptional clarity and realism.
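
As a rough illustration of the fine-control options listed above, the sketch below collects speaking-rate and pitch-variation settings into keyword arguments. The helper `fine_control_kwargs` is hypothetical, and the parameter names `speaking_rate` and `pitch_std` are assumptions; check `zonos/conditioning.py` in your installed version for the actual `make_cond_dict` signature before relying on them.

```python
def fine_control_kwargs(text, language="en-us", speaking_rate=None, pitch_std=None):
    """Collect conditioning settings into a kwargs dict (hypothetical helper).

    The keys `speaking_rate` and `pitch_std` are assumed parameter names for
    make_cond_dict; confirm them against the installed Zonos source.
    """
    kwargs = {"text": text, "language": language}
    for name, value in (("speaking_rate", speaking_rate), ("pitch_std", pitch_std)):
        if value is None:
            continue  # leave the library default in place
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        kwargs[name] = float(value)
    return kwargs


settings = fine_control_kwargs("Hello, world!", speaking_rate=12, pitch_std=40)
print(settings)
```

If the names match your installed version, the dictionary could be passed along as `make_cond_dict(speaker=speaker, **settings)` once a speaker embedding is available.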


In [None]:
!apt install -y espeak-ng # For Ubuntu
# brew install espeak-ng # For macOS
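
To confirm that the install above actually put the phonemizer on the `PATH`, a small check like the following may help. It is a minimal sketch using only the standard library; the helper name `find_espeak` is made up for illustration.

```python
import shutil


def find_espeak():
    """Return the path of the espeak-ng executable on PATH, or None if absent."""
    return shutil.which("espeak-ng")


espeak_path = find_espeak()
if espeak_path:
    print("espeak-ng found at:", espeak_path)
else:
    print("espeak-ng not found; install it before loading Zonos.")
```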

In [None]:
!apt install -y espeak-ng  # Install eSpeak phonemizer
!pip install -U uv torch torchaudio  # Install Python dependencies
!pip install git+https://github.com/Zyphra/Zonos.git  # Install Zonos


In [None]:
!pip uninstall -y numpy transformers
!pip install numpy==1.23.5
!pip install transformers --no-cache-dir
!pip uninstall -y torch torchvision torchaudio
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

In [None]:
!pip install -U scikit-learn scipy

In [None]:
import torch
import torchvision

print("Torch version:", torch.__version__)
print("Torchvision version:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
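
After the uninstall and reinstall steps, a module that is already imported in a running kernel may not reflect what is now on disk. The sketch below reads versions from package metadata instead of importing the packages. The helper name `installed_versions` and the list of packages are illustrative choices, not part of Zonos.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["numpy", "torch", "torchaudio", "torchvision", "transformers"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

If a version shown here differs from what `torch.__version__` prints, restarting the kernel is usually the simplest way to pick up the reinstalled package.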


In [None]:
# Remove any broken installation
!pip uninstall -y zonos
!rm -rf /usr/local/lib/python3.11/dist-packages/zonos*

# Install dependencies
!apt install -y espeak-ng  # Required for phonemization
!pip install -U uv torch torchaudio transformers

# Clone and install Zonos from the GitHub repository
!git clone https://github.com/Zyphra/Zonos.git
%cd Zonos
!pip install -e .
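
The cleanup step above hard-codes a Python 3.11 `dist-packages` path, which may not match your interpreter. As a sketch, the package directories can be discovered with the standard `site` module and inspected before deleting anything. The helper `find_leftover_installs` is hypothetical, and this simple prefix match may miss markers left by editable installs.

```python
import glob
import os
import site


def find_leftover_installs(prefix="zonos"):
    """List entries starting with `prefix` in the interpreter's package directories."""
    directories = list(site.getsitepackages())
    directories.append(site.getusersitepackages())
    matches = []
    for directory in directories:
        matches.extend(glob.glob(os.path.join(directory, prefix + "*")))
    return sorted(set(matches))


for path in find_leftover_installs():
    print(path)
```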

In [None]:
import torch
import torchaudio
from zonos.model import Zonos
from zonos.conditioning import make_cond_dict

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device=device)

print("Zonos model loaded successfully!")


In [None]:
# Load a reference clip for voice cloning (replace the placeholder path with your own file)
wav, sampling_rate = torchaudio.load("path/to/your/audio_sample.mp3")

# Create a speaker embedding from the reference clip
speaker = model.make_speaker_embedding(wav, sampling_rate)

# Build the conditioning from the text prompt, speaker embedding, and language
cond_dict = make_cond_dict(text="Hello, world!", speaker=speaker, language="en-us")
conditioning = model.prepare_conditioning(cond_dict)

# Generate speech
codes = model.generate(conditioning)

# Decode and save the output
wavs = model.autoencoder.decode(codes).cpu()
torchaudio.save("generated_sample.wav", wavs[0], model.autoencoder.sampling_rate)


In [None]:
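# Run this only if the working directory is not already the Zonos repository root (the install cell above runs %cd Zonos)
%cd Zonos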

In [None]:
!uv run gradio_interface.py # Start Gradio UI