# **Text-to-Speech Conversion using Sarvam AI API**

This notebook demonstrates how to convert text into speech using the Sarvam AI Text-to-Speech API. The resulting audio is saved as a `.wav` file.

## **Prerequisites**

Before running this notebook, ensure you have the following installed:

- Python 3.7 or higher
- Required Python packages: `sarvamai`

You can install the required packages using pip:

In [25]:
!pip install sarvamai



## **Import Required Libraries**

First, let's import all the necessary libraries.

In [26]:
from sarvamai import SarvamAI
from sarvamai.play import play, save

### **2. Call the API Endpoint Through the SDK by Passing API Parameters**

To use the TTS Bulbul API, you need an API subscription key. Follow these steps to set up your API key:

1. **Obtain your API key**: If you don’t have an API key, sign up on the [Sarvam AI Dashboard](https://dashboard.sarvam.ai/) to get one.
2. **Replace the placeholder key**: In the code below, replace `"YOUR_SARVAM_AI_API_KEY"` with your actual API key.

In [27]:
SARVAM_API_KEY = "YOUR_SARVAM_AI_API_KEY"

In [28]:
client = SarvamAI(api_subscription_key=SARVAM_API_KEY)
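
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative follows, assuming you have exported the key in an environment variable named `SARVAM_API_KEY`. Both the variable name and the `load_api_key` helper are illustrative and not part of the SDK.

```python
import os


def load_api_key(env_var: str = "SARVAM_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name you exported the key under.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var!r} is not set. "
            "Export your Sarvam AI API key before running this cell."
        )
    return key
```

The returned value can then be passed to the client in place of the literal string, for example `client = SarvamAI(api_subscription_key=load_api_key())`.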

## **Understanding the Parameters**  

- `target_language_code`: The language of the input text, given as a BCP-47 code (for example, `hi-IN` for Hindi).

- `speaker`: The speaker voice to use for the output audio. The call in this notebook passes the name in lowercase (`"anushka"`).
    - Female: Anushka, Manisha, Vidya, Arya
    - Male: Abhilash, Karun, Hitesh

- `pitch`: Controls the pitch of the audio. Lower values result in a deeper voice, while higher values make it sharper. The suitable range is between -0.75 and 0.75. Default is 0.0.

- `pace`: Controls the speed of the audio. Lower values result in slower speech, while higher values make it faster. The suitable range is between 0.5 and 2.0. Default is 1.0.

- `loudness`: Controls the loudness of the audio. Lower values result in quieter audio, while higher values make it louder. The suitable range is between 0.3 and 3.0. Default is 1.0.

- `speech_sample_rate`: Specifies the sample rate of the output audio. Supported values are 8000, 16000, 22050, 24000 Hz. If not provided, the default is 22050 Hz.

- `enable_preprocessing`: Controls whether English words and numeric entities (e.g., numbers, dates) are normalized before synthesis. Set it to `True` for better handling of mixed-language text. Default is `False`.
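
The ranges above are easy to get wrong when experimenting, so it can help to check them locally before sending a request. The following is a minimal sketch of such a check, using only the ranges and defaults listed in this section. The `build_tts_options` helper and the constant names are hypothetical and not part of the SDK.

```python
from typing import Any, Dict

# Ranges and supported values as listed in the parameter descriptions above.
PITCH_RANGE = (-0.75, 0.75)
PACE_RANGE = (0.5, 2.0)
LOUDNESS_RANGE = (0.3, 3.0)
SUPPORTED_SAMPLE_RATES = (8000, 16000, 22050, 24000)


def build_tts_options(
    pitch: float = 0.0,
    pace: float = 1.0,
    loudness: float = 1.0,
    speech_sample_rate: int = 22050,
    enable_preprocessing: bool = False,
) -> Dict[str, Any]:
    """Validate the tuning parameters and return them as keyword arguments."""
    checks = (
        ("pitch", pitch, PITCH_RANGE),
        ("pace", pace, PACE_RANGE),
        ("loudness", loudness, LOUDNESS_RANGE),
    )
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            raise ValueError(
                f"{name}={value} is outside the suitable range {low} to {high}"
            )
    if speech_sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"speech_sample_rate={speech_sample_rate} is not one of "
            f"{SUPPORTED_SAMPLE_RATES}"
        )
    return {
        "pitch": pitch,
        "pace": pace,
        "loudness": loudness,
        "speech_sample_rate": speech_sample_rate,
        "enable_preprocessing": enable_preprocessing,
    }
```

For example, `build_tts_options(pace=1.2)` returns a dictionary with the defaults and a faster pace, while `build_tts_options(pitch=1.0)` raises a `ValueError`. The dictionary could then be unpacked into the conversion call with `**`, assuming the SDK accepts these parameter names as keyword arguments.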


In [29]:
text = "नमस्कार! मैं Maitri NetMeds से बोल रही हूँ। हम एक नई सेवा शुरू कर रहे हैं, जिससे आप अपनी दवाइयों की सब्सक्रिप्शन आसानी से ले सकती हैं। इससे आपको समय-समय पर दवाइयाँ नियमित रूप से मिलती रहेंगी, बिना किसी चिंता के। हमने इस सेवा का लिंक आपके WhatsApp पर भेजा है। कृपया उसे खोलें और अपनी सब्सक्रिप्शन प्रक्रिया को पूरा करें। धन्यवाद, और अगर आपको किसी भी मदद की ज़रूरत हो तो हमसे संपर्क करें।"

In [30]:
response = client.text_to_speech.convert(
    inputs=[text],
    target_language_code="hi-IN",
    speaker="anushka",
    enable_preprocessing=True,
)

### **3. Save/Play the audio output from TTS**

**Play the generated audio**

In [31]:
play(response)

**Save the generated audio to a file**

In [32]:
save(response, "output.wav")
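
To confirm that the saved file is what you expect, you can inspect its header with Python's standard `wave` module. This sketch assumes `output.wav` is an uncompressed PCM WAV file, which is the only kind the `wave` module reads. The `describe_wav` helper is illustrative and not part of the SDK.

```python
import wave
from typing import Dict, Union


def describe_wav(path: str) -> Dict[str, Union[int, float]]:
    """Read basic properties from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": sample_rate,
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / sample_rate,
        }
```

Calling `describe_wav("output.wav")` after the save step should report the sample rate that was requested, which lets you check the effect of `speech_sample_rate` without opening an audio editor.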

## **Output**

After running the notebook, you will have a single `output.wav` file containing the synthesized speech for the text you passed in.

## **Conclusion**
This notebook provides a step-by-step guide to converting text into speech using the Sarvam AI API. You can modify the text, language, and other parameters to suit your specific needs.


### **Additional Resources**

For more details, refer to our official documentation. You can also reach us for support on our Discord server:

- **Documentation**: [docs.sarvam.ai](https://docs.sarvam.ai)  
- **Community**: [Join the Discord Community](https://discord.gg/hTuVuPNF)

---

### **Final Notes**

- Keep your API key secure.
- Use clear, well-punctuated input text for best results.

**Keep Building!** 🚀