# Playing with the Sarvam AI API

Sarvam provides Models & APIs across the stack to help developers build powerful applications. Whether you’re looking for chat completion, text translation, speech-to-text conversion, or a combination of speech recognition and translation, Sarvam has you covered.
Reference: https://docs.sarvam.ai/api-reference-docs/introduction

### Setting up Environment

In [5]:
SARVAM_API_KEY="your-api-key"
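
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key has been exported in an environment variable (the variable name `SARVAM_API_KEY` and the helper `load_api_key` are my own choices, not something the Sarvam docs prescribe):

```python
import os


def load_api_key(env=None, name="SARVAM_API_KEY"):
    """Return the API key from an environment mapping, or fail loudly.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    The variable name is an assumption, pick whatever suits your setup.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

With this in place the first cell could read `SARVAM_API_KEY = load_api_key()`.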

In [6]:
!pip install sarvamai






## Experimenting with Speech to text models

This API transcribes speech to text in multiple Indian languages and English. It also supports real-time transcription for interactive applications.

#### Available models: saarika:v1, saarika:v2, saarika:v2.5, saarika:flash

#### Request Params:
##### file -> Required
The audio file to transcribe. Supported formats are WAV (.wav) and MP3 (.mp3). The API works best with audio files sampled at 16kHz. If the audio contains multiple channels, they will be merged into a single channel.

##### model -> Optional
Specifies the model to use for speech-to-text conversion. The default model is saarika:v2.

##### language_code -> Optional

Specifies the language of the input audio. For the saarika:v1 model this parameter is mandatory; for saarika:v2 it is optional, though supplying it helps accuracy. Use `unknown` when the language is not known and the API will detect it automatically. Note that saarika:v1 does not support the `unknown` language code.
##### Language Codes: unknown, hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, en-IN, gu-IN
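
The rules above are easy to get wrong when switching models, so here is a small pre-flight check that encodes them. It is only a sketch built from the notes in this section: `check_stt_request` is a hypothetical helper of mine, not part of the `sarvamai` SDK, and the API itself remains the final authority on what it accepts.

```python
# Values copied from the parameter notes above.
SUPPORTED_EXTENSIONS = (".wav", ".mp3")
SUPPORTED_LANGUAGE_CODES = {
    "unknown", "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN",
    "od-IN", "pa-IN", "ta-IN", "te-IN", "en-IN", "gu-IN",
}


def check_stt_request(file_name, model="saarika:v2", language_code=None):
    """Return a list of problems found; an empty list means the request looks fine."""
    problems = []
    if not file_name.lower().endswith(SUPPORTED_EXTENSIONS):
        problems.append("file must be a .wav or .mp3")
    if language_code is not None and language_code not in SUPPORTED_LANGUAGE_CODES:
        problems.append(f"unsupported language_code: {language_code}")
    if model == "saarika:v1":
        # saarika:v1 needs an explicit language and cannot auto-detect.
        if language_code is None:
            problems.append("saarika:v1 requires a language_code")
        elif language_code == "unknown":
            problems.append("saarika:v1 does not support 'unknown'")
    return problems
```

For example, `check_stt_request("audio2.mp3", "saarika:v2", "en-IN")` returns an empty list, while the same file with `saarika:v1` and no language code reports one problem.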


### Code: Speech to Text

In [9]:
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key=SARVAM_API_KEY,
)

response = client.speech_to_text.transcribe(
    file=open("audio2.mp3", "rb"),
    model="saarika:v2",
    language_code="en-IN"
)

print(response)


request_id='20250616_b6f41bb1-294e-46cf-a132-b76a0f76cbc2' transcript="Learning programming in schools is becoming increasingly important in today's digital age. It not only teaches students how to write code, but also enhances their problem-solving, logical thinking, and creativity. Programming builds essential skills such as patience, persistence, and analytical thinking, which are valuable in all fields of life, not just in technology." timestamps=None diarized_transcript=None language_code='en-IN'


In [10]:
response = client.speech_to_text.transcribe(
    file=open("audio2.mp3", "rb"),
    model="saarika:v2",
    language_code="bn-IN"
)

print(response)


request_id='20250616_672f85c0-b033-4338-a6ca-9d7bd400c789' transcript='স্কুলে পড়াশোনা প্রোগ্রামিং এখন আবার অবস্থান হচ্ছে এখন আবার ডিজিটাল বছর থেকে অবস্থান হচ্ছে। এটি নতুন পক্ষে শিক্ষকদের কোড রাখার কিছু কিছু কিছু করতে পারে, বাট তার প্রবলেম সম্পূর্ণ, সংক্রান্ত ভুল এবং কৃতিক্রিয়া করতে অবস্থান অনুযায়ী করে। প্রোগ্রামিং অর্থন স্ক্রিপ্ত স্ক্রিপ্ত ব্যক্তি যেমন প্রেসিস্টেন্স এবং অ্যানালিটিক্যাল ভুল ভুল যায় যাদের সব সব বিদ্যুতের সম্পূর্ণ আছে, না শ' timestamps=None diarized_transcript=None language_code='bn-IN'


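#### Transcribing English audio with language_code set to bn-IN does not give a usable Bengali translation :( The transcribe endpoint is not meant to translate, so the output above is garbled.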
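
Since the parameter notes say `unknown` lets saarika:v2 detect the language on its own, one way to avoid this kind of mismatch is to stop guessing the language code. A sketch, assuming the same `client.speech_to_text.transcribe` call used above and the `language_code` and `transcript` fields seen in the printed responses (`transcribe_with_detection` is a hypothetical wrapper, not an SDK function):

```python
def transcribe_with_detection(client, audio_path, model="saarika:v2"):
    """Transcribe a file and let the API detect the spoken language.

    `client` is expected to be a SarvamAI client like the one created above.
    Returns a (detected_language_code, transcript) tuple.
    """
    # The context manager closes the file handle even if the request fails.
    with open(audio_path, "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model=model,
            language_code="unknown",
        )
    return response.language_code, response.transcript
```

Usage would look like `lang, text = transcribe_with_detection(client, "audio2.mp3")`.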

## Experimenting With Speech To Text Translate

Real-Time Speech to Text Translation API

This API automatically detects the input language, transcribes the speech, and translates the text to English.
#### Models available: saaras:v1, saaras:v2, saaras:v2.5, saaras:flash


### Code: Speech To Text Translate

In [11]:
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key=SARVAM_API_KEY,
)

response = client.speech_to_text.translate(
    file=open("audio.mp3", "rb"),
    model="saaras:v2"
)

print(response)


request_id='20250616_3c467a0e-477b-4af0-95a5-8dccbb78d6f3' transcript='Hi, how are you? Today we are doing some research on the APIs of Serum AI. There are various APIs. We are reading the documents on their websites and writing the code accordingly and experimenting. We really enjoy experimenting.' language_code='bn-IN' diarized_transcript=None


### Now this is cool!!! I recorded audio in Bengali, talking about my experiments with Sarvam AI, and boom!! The saaras:v2 model detected that the speech was Bengali and translated it into English

#### USE CASES:
1) Build assistants that take commands in your mother tongue and perform tasks accordingly
2) Build eavesdropping bots that listen to customers and store their reactions for sentiment analysis
3) Build surveillance systems that listen to conversations so that security agencies can explore insights from that data
4) Build audio call bots that understand the local language and feed information to RAG-powered back-end systems

   Possibilities are many!! Keep exploring!!

## Experimenting with Text to Speech Model

This is the model to convert text into spoken audio. The output is a wave file encoded as a base64 string.

#### Request Params:
##### text: string -> Required
The text(s) to be converted into speech.
Features:

    Each text should be no longer than 1500 characters
    Supports code-mixed text (English and Indic languages)
    
Important Note:

    For numbers larger than 4 digits, use commas (e.g., ‘10,000’ instead of ‘10000’)
    This ensures proper pronunciation as a whole number
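
Both constraints above (the 1500-character limit and the comma rule for long numbers) can be handled before calling the API. The helpers below are a sketch of my own, not SDK functions. One assumption to flag loudly: the commas use international grouping (`1,234,567`), which matches the ‘10,000’ example but may not be what you want for Indian-style grouping of larger numbers.

```python
import re

MAX_TTS_CHARS = 1500  # per-text limit quoted above


def add_thousands_commas(text):
    """Insert commas into standalone integers of 5 or more ASCII digits."""
    # Skip digits glued to letters or sitting after a decimal point.
    pattern = r"(?<![\w.])[0-9]{5,}(?!\w)"
    return re.sub(pattern, lambda m: f"{int(m.group()):,}", text)


def split_for_tts(text, limit=MAX_TTS_CHARS):
    """Greedily pack whitespace-separated words into chunks of at most `limit` chars."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit has to be cut hard.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk returned by `split_for_tts(add_thousands_commas(long_text))` can then be sent as a separate text. Note that splitting on whitespace collapses line breaks and repeated spaces.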

##### target_language_code: Required

The language of the text, in BCP-47 format.
###### Language Codes: unknown, hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, en-IN, gu-IN

##### speaker: Optional

The speaker voice to be used for the output audio.

Default: Anushka

Model Compatibility (Speakers compatible with respective model):

    bulbul:v2:
        Female: Anushka, Manisha, Vidya, Arya
        Male: Abhilash, Karun, Hitesh

#### Code: Text To Speech



In [19]:
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key=SARVAM_API_KEY,
)

response = client.text_to_speech.convert(
    text="নিশ্চিতভাবেই! নিচে আমি পদার্থবিজ্ঞানের বিভিন্ন বিষয়কে ঘিরে কিছু প্রশ্ন বাংলায় লিখে দিলাম।",
    target_language_code="bn-IN",
    speaker='vidya'
)

print(response)


request_id='20250616_7190c36d-ceb1-46b5-8c34-6b0ba3c9bb2a' audios=['UklGRiTaAwBXQVZFZm10IBAAAAABAAEAIlYAAESsAAACABAAZGF0YQDaAwAQAAAA... (base64 WAV data, output truncated)

### Now let's play this audio


In [13]:
!pip install pyaudio

Collecting pyaudio
  Downloading PyAudio-0.2.14-cp313-cp313-win_amd64.whl.metadata (2.7 kB)
Downloading PyAudio-0.2.14-cp313-cp313-win_amd64.whl (173 kB)
Installing collected packages: pyaudio
Successfully installed pyaudio-0.2.14





In [20]:
import base64
import wave
import pyaudio
import io

base64_wave_data = response.audios[0]

# Decode base64 to binary
audio_data = base64.b64decode(base64_wave_data)

# Load audio into a BytesIO stream and open it with wave
audio_stream = io.BytesIO(audio_data)
wave_file = wave.open(audio_stream, 'rb')

# Set up PyAudio
p = pyaudio.PyAudio()
stream = p.open(
    format=p.get_format_from_width(wave_file.getsampwidth()),
    channels=wave_file.getnchannels(),
    rate=wave_file.getframerate(),
    output=True
)

# Read and play audio
chunk = 1024
data = wave_file.readframes(chunk)
while data:
    stream.write(data)
    data = wave_file.readframes(chunk)

# Cleanup
stream.stop_stream()
stream.close()
p.terminate()
wave_file.close()
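
If PyAudio is awkward to install on your machine, the same base64 string can simply be written to a `.wav` file and opened in any audio player. A standard-library sketch (`save_base64_wav` is a hypothetical helper name; it assumes the string decodes to a complete WAV file, which is what the endpoint description above says it returns):

```python
import base64
import io
import wave


def save_base64_wav(base64_wave_data, out_path):
    """Decode a base64 WAV string, write it to disk, and return its duration in seconds."""
    audio_bytes = base64.b64decode(base64_wave_data)
    # Parse the header first so a malformed payload fails before anything is written.
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    with open(out_path, "wb") as out_file:
        out_file.write(audio_bytes)
    return duration
```

For the response above this would be called as `save_base64_wav(response.audios[0], "output.wav")`.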


### Cool!! The text was in Bengali and it was read out in Bengali too. We can play with different male and female voices
##### USE CASES:
1) Voice chat assistants that address customer queries
2) AI agent based voice calls to broadcast offers, promotions etc.
3) Teaching agents that explain topics in local languages


## Experimenting With Translate Text Models

Translation converts text from one language to another while preserving its meaning. For Example: ‘मैं ऑफिस जा रहा हूँ’ translates to ‘I am going to the office’ in English, where the script and language change, but the original meaning remains the same.

Available languages:

    bn-IN: Bengali
    en-IN: English
    gu-IN: Gujarati
    hi-IN: Hindi
    kn-IN: Kannada
    ml-IN: Malayalam
    mr-IN: Marathi
    od-IN: Odia
    pa-IN: Punjabi
    ta-IN: Tamil
    te-IN: Telugu

Newly added languages:

    as-IN: Assamese
    brx-IN: Bodo
    doi-IN: Dogri
    kok-IN: Konkani
    ks-IN: Kashmiri
    mai-IN: Maithili
    mni-IN: Manipuri (Meiteilon)
    ne-IN: Nepali
    sa-IN: Sanskrit
    sat-IN: Santali
    sd-IN: Sindhi
    ur-IN: Urdu

#### Request Params:
This endpoint expects an object.
##### 1>> input: string -> Required
The text to translate. The maximum length is 1000 characters for mayura:v1 and 2000 characters for sarvam-translate:v1.

##### 2>> source_language_code: enum -> Required

Source language code for translation input.

###### mayura:v1 Languages: Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu

###### sarvam-translate:v1 Languages: All mayura:v1 languages and Assamese, Bodo, Dogri, Konkani, Kashmiri, Maithili, Manipuri, Nepali, Sanskrit, Santali, Sindhi, Urdu

##### Note: mayura:v1 supports automatic language detection using ‘auto’ as the source language code.

##### 3>> target_language_code: Required

The language code of the translated text. This specifies the target language for translation.

###### mayura:v1 Languages: Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu

###### sarvam-translate:v1 Languages: All mayura:v1 languages and Assamese, Bodo, Dogri, Konkani, Kashmiri, Maithili, Manipuri, Nepali, Sanskrit, Santali, Sindhi, Urdu

##### 4>> speaker_gender: Optional
Specifies the gender of the speaker, for better translations.

##### 5>> mode: Optional

Specifies the tone or style of the translation.

Model Support:

    mayura:v1: Supports formal, classic-colloquial, and modern-colloquial modes
    sarvam-translate:v1: Only formal mode is supported

Default: formal

##### 6>> model: enum -> Optional

Specifies the translation model to use.

    mayura:v1: Supports 12 languages with all modes, output scripts, and automatic language detection.
    sarvam-translate:v1: Supports all 22 scheduled languages of India, formal mode only.

### Code: Translate Text

In [21]:
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key=SARVAM_API_KEY,
)

response = client.text.translate(
    input="Hello, how are you?",
    source_language_code="auto",
    target_language_code="hi-IN",
    speaker_gender="Male"
)

print(response)


request_id='20250616_83c33bad-72b0-41ff-a905-8112c3cefa22' translated_text='नमस्ते, आप कैसे हैं??' source_language_code='en-IN'


In [22]:
response = client.text.translate(
    input="ওহমের সূত্র কী?",
    source_language_code="auto",
    target_language_code="hi-IN",
    speaker_gender="Male"
)

print(response)


request_id='20250616_99bdeaec-9d7d-45ec-808e-cd1edb8e6ecb' translated_text='ओम का नियम क्या है??' source_language_code='bn-IN'


In [23]:
response = client.text.translate(
    input="চুম্বকের উত্তর ও দক্ষিণ মেরুর বৈশিষ্ট্য কী?",
    source_language_code="auto",
    target_language_code="en-IN",
    speaker_gender="Male"
)

print(response)

request_id='20250616_7633f388-cded-465b-abdf-8fa12b37cd00' translated_text='What are the characteristics of the north and south poles of a magnet?' source_language_code='bn-IN'


In [24]:
response = client.text.translate(
    input="চুম্বকের উত্তর ও দক্ষিণ মেরুর বৈশিষ্ট্য কী?",
    source_language_code="auto",
    target_language_code="hi-IN",
    speaker_gender="Male"
)

print(response)

request_id='20250616_ffa64866-38b0-4a09-9728-09ed79504271' translated_text='चुंबक के उत्तर और दक्षिण ध्रुवों की विशेषताएँ क्या हैं??' source_language_code='bn-IN'
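
The cells above repeat the same call with only the target language changed. A small loop keeps that tidy. This is a sketch that reuses the `client.text.translate` call and the `translated_text` field shown in the outputs; `translate_to_many` is a hypothetical helper name, and it makes one API request per target language.

```python
def translate_to_many(client, text, target_codes, source_code="auto"):
    """Translate `text` into each target language and return {code: translation}."""
    results = {}
    for code in target_codes:
        response = client.text.translate(
            input=text,
            source_language_code=source_code,
            target_language_code=code,
        )
        results[code] = response.translated_text
    return results
```

For example, `translate_to_many(client, "ওহমের সূত্র কী?", ["hi-IN", "en-IN"])` would return a dictionary with one entry per language.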


#### WOW!! The translations seem to be pretty good!!
There are lots of use cases that we can try:

#### 1. **Multilingual Customer Support Chatbots**

* **Use case:** Businesses can provide customer support in a user's native language, regardless of agent availability.
* **Example:** A Tamil-speaking user chats in Tamil; the AI translates it into Hindi or English for the support agent and vice versa.

#### 2. **Government e-Services Accessibility**

* **Use case:** Translate government forms, schemes, and policies into regional languages.
* **Example:** PM Kisan scheme documents in 22 Indian languages for rural farmers.

#### 3. **Education and E-Learning Platforms**

* **Use case:** Translate video lectures, quizzes, and study material into a student’s mother tongue.
* **Example:** An IIT course in English being made available in Bengali, Kannada, and Assamese.

#### 4. **News and Media Syndication**

* **Use case:** Regional news channels or digital portals can cross-publish news across states by translating content between languages.
* **Example:** A breaking news story in Malayalam is instantly translated into Hindi and shared nationwide.

#### 5. **Healthcare Communication**

* **Use case:** Doctors and patients across regions communicate via translated consultations or prescriptions.
* **Example:** A Telugu-speaking patient consulting with a Marathi-speaking doctor.

#### 6. **Legal and Judicial Accessibility**

* **Use case:** Translating court judgments or legal documents across languages to ensure regional understanding.
* **Example:** Supreme Court judgments translated into all 8th Schedule languages.

#### 7. **Agriculture Advisory Platforms**

* **Use case:** Translate crop advisory and weather updates for farmers in their local dialect.
* **Example:** AI translates pesticide use instructions into Odia or Gujarati.

#### 8. **Tourism and Hospitality**

* **Use case:** Real-time translation for guides, signage, and hotel services.
* **Example:** A Bengali tourist in Kerala getting real-time restaurant menu translations.

#### 9. **Content Creation & YouTube Localization**

* **Use case:** Translate and subtitle videos in multiple languages for regional audiences.
* **Example:** A Hindi tech YouTuber offering subtitles in Tamil, Marathi, and Malayalam using AI.

#### 10. **Job Portals & Recruitment**

* **Use case:** Job descriptions and CVs translated for rural job seekers in their native language.
* **Example:** A Kannada-speaking job seeker applying to a company in Maharashtra.



## Experimenting with the Chat Completions API

This endpoint calls the Sarvam LLM API to get a chat completion. Supported model(s): sarvam-m.
#### Request Params:
##### messages: list of objects -> Required
A list of messages comprising the conversation so far.
##### model: Required

Model ID used to generate the response, like sarvam-m.

##### temperature: double -> Optional
Range: 0 to 2. Defaults to 0.2.

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
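
To get a feel for why a lower temperature gives more focused output, here is a toy illustration of temperature-scaled softmax over three made-up token scores. This is the textbook formulation, not a claim about how Sarvam implements sampling internally, and the logits are invented for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # Temperature 0 is usually treated as "always pick the top token",
        # which this toy function does not model.
        raise ValueError("temperature must be positive for this illustration")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]
focused = softmax_with_temperature(toy_logits, 0.2)  # top token dominates
varied = softmax_with_temperature(toy_logits, 0.8)   # probability is spread out more
```

Comparing `focused[0]` with `varied[0]` shows the top token taking a much larger share of the probability at 0.2 than at 0.8.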

Read more : https://docs.sarvam.ai/api-reference-docs/chat/completions

### Code:

### Simple RAG - Retrieval Augmented Generation Experiment

In [26]:
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key=SARVAM_API_KEY,
)

response = client.chat.completions(messages=[
    {"role": "user", 
     "content": "Answer the query based on the context provided. Query: How is Study Smart innovating schools? Context: Study Smart Innovations is an educational institution dedicated to fostering a passion for software development and innovation. We provide comprehensive coaching on computer programming and web app development, empowering individuals to create impactful projects. Our seminars and workshops offer opportunities for learning and networking. Explore our collection of ebooks and materials to expand your knowledge and skills. "}])

print(response)


id='20250616_1ba4fc21-e502-4f46-bec6-d7873b50ebe5' choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionResponseMessage(content=' Study Smart Innovations is innovating schools by offering comprehensive coaching in computer programming and web app development, which equips students with technical skills to create impactful projects. They also provide seminars and workshops that facilitate learning and networking opportunities, fostering collaboration and exposure to industry insights. Additionally, their ebooks and educational materials allow students to independently expand their knowledge and skills, supporting self-paced learning and skill development.', role='assistant'))] created=1750095944 model='sarvam-m' object='chat.completion' usage=CompletionUsage(completion_tokens=84, prompt_tokens=103, total_tokens=187)


In [29]:
response = client.chat.completions(messages=[
    {"role": "user", 
     "content": "Answer the query based on the context provided. Query: Who is the owner of onlinestudysmart.com? Context: Study Smart Innovations is an educational institution dedicated to fostering a passion for software development and innovation. We provide comprehensive coaching on computer programming and web app development, empowering individuals to create impactful projects. Our seminars and workshops offer opportunities for learning and networking. The website is onlinestudysmart.com. Owner is Hiranmoy.Explore our collection of ebooks and materials to expand your knowledge and skills. "}])

print(response)

id='20250616_d32a3116-89f6-446d-bce7-7c1206ed2d52' choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionResponseMessage(content=' The owner of onlinestudysmart.com is **Hiranmoy**.', role='assistant'))] created=1750096116 model='sarvam-m' object='chat.completion' usage=CompletionUsage(completion_tokens=18, prompt_tokens=126, total_tokens=144)
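
Both experiments above build the prompt by hand and print the whole response object. The sketch below wraps that pattern: it formats the query and context the same way and returns only the answer text. It assumes the `client.chat.completions(messages=...)` call and the `choices[0].message.content` structure visible in the outputs above; `build_rag_prompt` and `ask_with_context` are hypothetical helper names. There is no retrieval step here, the context is still supplied by the caller.

```python
def build_rag_prompt(query, context):
    """Format a query and its supporting context into a single user message."""
    return (
        "Answer the query based on the context provided. "
        f"Query: {query} Context: {context}"
    )


def ask_with_context(client, query, context):
    """Send the prompt to the chat endpoint and return just the answer text."""
    response = client.chat.completions(
        messages=[{"role": "user", "content": build_rag_prompt(query, context)}]
    )
    # The printed responses above start with a space, so strip whitespace.
    return response.choices[0].message.content.strip()
```

Usage would look like `print(ask_with_context(client, "Who is the owner of onlinestudysmart.com?", context_text))`, where `context_text` holds the passage to ground the answer in.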
