# Sesame CSM (1B) - Text to Speech

This notebook shows how to generate speech from text with the Sesame CSM 1B model, which is well suited to turning instructional or conversational text into natural-sounding audio.

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DhivyaBharathy-web/PraisonAI/blob/main/examples/cookbooks/Sesame_CSM_1B_TTS.ipynb)


## 🧩 Dependencies

In [None]:
!pip install -q "transformers>=4.52.1"
!pip install -q torchaudio
!pip install -q soundfile

## 🛠️ Tools
- transformers for loading the model
- torchaudio for audio processing
- soundfile for writing the generated audio to a WAV file

## 🧾 YAML Prompt
```yaml
task: "Text to Speech"
style: "Clear, educational"
language: "en"
```
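
The prompt above is plain metadata, and the code in this notebook does not read it. If you want those fields available in Python, the sketch below is one way to do it. It assumes the flat `key: "value"` shape shown above and uses a hypothetical helper, `parse_flat_yaml`, which is not a general YAML parser; a real YAML library is the better choice for anything nested.

```python
def parse_flat_yaml(text: str) -> dict:
    """Parse flat `key: "value"` lines (hypothetical helper, not full YAML)."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain commas.
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip().strip('"')
    return result


prompt_yaml = """
task: "Text to Speech"
style: "Clear, educational"
language: "en"
"""

prompt = parse_flat_yaml(prompt_yaml)
print(prompt["task"], "|", prompt["style"], "|", prompt["language"])
```

The resulting dictionary can then be logged next to the generated audio or used to choose the input text.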

## 🧠 Main

In [None]:
from transformers import AutoProcessor, CsmForConditionalGeneration
import torchaudio
import soundfile as sf
import torch

# Assumed Hub checkpoint ID for Sesame CSM 1B; access may require accepting the model's terms.
model_id = "sesame/csm-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = CsmForConditionalGeneration.from_pretrained(model_id).to(device)

# CSM input text starts with a speaker tag such as "[0]".
text = "[0]Welcome to the world of voice synthesis using open models!"
inputs = processor(text, add_special_tokens=True).to(device)
with torch.no_grad():
    # output_audio=True returns decoded waveforms rather than codec tokens.
    audio = model.generate(**inputs, output_audio=True)

# The first (and only) waveform is written as mono audio; 24 kHz is CSM's codec rate.
sf.write("output.wav", audio[0].cpu().float().numpy(), 24000)
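
When CSM is used through Hugging Face Transformers, the input text is written with a speaker id in square brackets at the front, for example `[0]`. Assuming that convention, a small helper can add the tag so that plain sentences need no manual editing. The function name `with_speaker` is hypothetical and is not part of any library.

```python
def with_speaker(text: str, speaker_id: int = 0) -> str:
    """Prefix text with a CSM-style speaker tag such as "[0]" (hypothetical helper)."""
    if speaker_id < 0:
        raise ValueError("speaker_id must be non-negative")
    return f"[{speaker_id}]{text.strip()}"


tagged = with_speaker("Welcome to the world of voice synthesis using open models!")
print(tagged)
```

The tagged string can be passed to the processor in place of a hand-written one.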

## 📤 Output
🖼️ Output Preview (Text Summary):

Prompt: A clear educational text is converted to a .wav file.

🎧 The output audio will say: 'Welcome to the world of voice synthesis using open models!'
This demonstrates how Sesame CSM can be used to build TTS applications.
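
To check that a file such as `output.wav` was written as expected, its header can be read with Python's standard `wave` module. This is a minimal sketch: `describe_wav` and `write_test_tone` are hypothetical helpers, the tone is only a stand-in so the example runs without the model, and the approach assumes a PCM-encoded WAV file, which is the only kind `wave` can read.

```python
import math
import struct
import wave


def describe_wav(path: str) -> dict:
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def write_test_tone(path: str, sample_rate: int, seconds: float) -> None:
    """Write a mono 16-bit 440 Hz tone as a stand-in for generated speech."""
    n = int(sample_rate * seconds)
    samples = (
        int(12000 * math.sin(2 * math.pi * 440 * i / sample_rate)) for i in range(n)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


write_test_tone("stand_in.wav", sample_rate=16000, seconds=0.5)
info = describe_wav("stand_in.wav")
print(info)
```

Calling `describe_wav("output.wav")` after running the notebook should report the sample rate that was passed to `sf.write`; a mismatch usually shows up as audio that plays too fast or too slow.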