# Sarvam AI Pipeline Notebook 📓  
**STT → Translation → TTS** (with optional Transliteration)  
*Generated on 2025-07-04*  

This Colab/Jupyter notebook demonstrates:  

1. **Speech-to-Text (STT)** using the `saarika:v2.5` model  
2. **Text Translation** using the `sarvam-translate:v1` model  
3. **Text-to-Speech (TTS)** with the **Bulbul** model  
4. (Optional) **Transliteration** examples (Romanisation ↔ Indic, numeral options)  

> 👉 Replace placeholders like `YOUR_SARVAM_API_KEY` with your real key.  


In [1]:
!uv add sarvamai

Resolved 80 packages in 81ms
Audited 74 packages in 0.14ms


In [2]:
from sarvamai import SarvamAI
from sarvamai.play import play, save
import os, sys, json

## 🔑 Set your API key

In [3]:
from dotenv import load_dotenv
load_dotenv()
SARVAM_API_KEY = os.getenv("SARVAM_API_KEY") or "YOUR_SARVAM_API_KEY"  # ⬅️ EDIT ME
client = SarvamAI(api_subscription_key=SARVAM_API_KEY)
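
The cell above silently falls back to the placeholder string when no key is configured, so the first failure only shows up at the first API call. A minimal sketch of a fail-fast check follows; `require_api_key` and `PLACEHOLDER` are hypothetical names introduced here for illustration, not part of the `sarvamai` package.

```python
import os

# The placeholder string used in this notebook (assumed to never be a valid key).
PLACEHOLDER = "YOUR_SARVAM_API_KEY"


def require_api_key(env=os.environ, name="SARVAM_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or a placeholder."""
    key = (env.get(name) or "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file.")
    return key
```

Calling `require_api_key()` after `load_dotenv()` and passing the result to `SarvamAI(api_subscription_key=...)` would surface a missing key before any request is sent.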

## 🎙 Upload or provide audio file (.wav / .mp3)

In [4]:
def get_audio_file():
    supported = ['.wav', '.mp3']
    if 'google.colab' in sys.modules:
        from google.colab import files
        uploaded = files.upload()
        if not uploaded:
            return None
        path = list(uploaded.keys())[0]
    else:
        path = input("Enter path to .wav / .mp3 file: ").strip()
    if not os.path.exists(path):
        print("❌ File not found:", path)
        return None
    if os.path.splitext(path)[1].lower() not in supported:
        print("❌ Unsupported format.")
        return None
    print("✅ Using audio:", path)
    return path

audio_file_path = get_audio_file()

✅ Using audio: ./sample_hindi.wav


## 📝 Step 1 – Speech-to-Text (Saarika v2.5)

In [10]:
%%time
if audio_file_path:
    with open(audio_file_path, 'rb') as f:
        stt_resp = client.speech_to_text.transcribe(
            file=f,
            model='saarika:v2.5',
            language_code='unknown'  # auto-detect
        )
    print("🗣️ Transcribed Text:", stt_resp)
    original_text = stt_resp.transcript
    detected_lang = stt_resp.language_code
else:
    raise ValueError('No audio file - abort.')

🗣️ Transcribed Text: request_id='20250704_409b95aa-7e50-4e4b-9107-a81989a53751' transcript='केशव के घर में चार खिड़कियां हैं।\nकई लोग कुमार को पसंद करते हैं।\nतुम्हारे खरगोश का रंग सफेद है।\nआपकी गाय कल से यहां है।\nकल का खाना सुलेखा घी डालकर बनाएगी।\nअक्षय की खीर गरम हो गई।\nमैंने कल ख्वाब में एक खूबसूरत' timestamps=None diarized_transcript=None language_code='hi-IN'
CPU times: user 7.96 ms, sys: 10.5 ms, total: 18.5 ms
Wall time: 965 ms


## 🌐 Step 2 – Translate Text

In [13]:
%%time
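# Target language for translation and TTS.
# Assumption: "kn-IN", inferred from the Kannada sample output; change as needed.
TARGET_LANG = "kn-IN"
# Fall back to Hindi if auto-detection returned "unknown"
detected_lang = detected_lang if detected_lang != "unknown" else "hi-IN"
translated_texts = []
chunks = [original_text]  # single chunk; split long transcripts into several if needed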

for idx, chunk in enumerate(chunks):
    resp = client.text.translate(
        input=chunk,
        source_language_code=detected_lang,
        target_language_code=TARGET_LANG,
        speaker_gender="Male",
        mode="formal",
        model="sarvam-translate:v1",
        enable_preprocessing=False,
    )
    print(f"Chunk {idx + 1}:\n", resp.translated_text, "\n")
    translated_texts.append(resp.translated_text)

translated_text = "\n".join(translated_texts)
print("📝 Final Translation:", translated_text)

Chunk 1:
 ಕೇಶವ್ ಮನೆಗೆ ನಾಲ್ಕು ಕಿಟಕಿಗಳಿವೆ.
ಕುಮಾರ್ ಎಂದರೆ ಹಲವರಿಗೆ ಇಷ್ಟ.
ನಿಮ್ಮ ಮೊಲ ಬಿಳಿ ಬಣ್ಣದಲ್ಲಿದೆ.
ನಿಮ್ಮ ಹಸು ನಿನ್ನೆ ಇಲ್ಲಿಗೆ ಬಂದಿದೆ.
ನಿನ್ನೆಯ ಅಡುಗೆಯನ್ನು ತುಪ್ಪದಲ್ಲಿ ಅವಳು ಮಾಡ್ತಾಳೆ.
ಅಕ್ಷಯ್ ಖೀರ್ ಬಿಸಿಯಾಗಿದೆ.
ನಿನ್ನೆ ನನಗೆ ಒಂದು ಸುಂದರವಾದ ಕನಸು ಬಿತ್ತು. 

📝 Final Translation: ಕೇಶವ್ ಮನೆಗೆ ನಾಲ್ಕು ಕಿಟಕಿಗಳಿವೆ.
ಕುಮಾರ್ ಎಂದರೆ ಹಲವರಿಗೆ ಇಷ್ಟ.
ನಿಮ್ಮ ಮೊಲ ಬಿಳಿ ಬಣ್ಣದಲ್ಲಿದೆ.
ನಿಮ್ಮ ಹಸು ನಿನ್ನೆ ಇಲ್ಲಿಗೆ ಬಂದಿದೆ.
ನಿನ್ನೆಯ ಅಡುಗೆಯನ
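
The translation cell wraps the whole transcript in a one-element `chunks` list and notes that real chunking can be substituted. One possible way to do that is sketched below: it splits on sentence boundaries (danda, full stop, question mark, exclamation mark, newline) and packs sentences into chunks up to a character budget. The helper name `chunk_text` and the default `max_chars=1000` are assumptions made for illustration; check the translation API's documented input limit before choosing a value.

```python
import re


def chunk_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, breaking after sentence ends."""
    # Zero-width split right after a sentence delimiter, so delimiters stay attached.
    sentences = [s for s in re.split(r"(?<=[।.?!\n])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the budget.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Start a new chunk when adding this sentence would exceed the budget.
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

With this helper, `chunks = chunk_text(original_text)` could stand in for the one-element list. No characters are dropped, so `"".join(chunks)` reproduces the input exactly.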

## 🔊 Step 3 – Generate Speech (Bulbul)

In [15]:
tts_resp = client.text_to_speech.convert(
    text=translated_text,
    target_language_code=TARGET_LANG if TARGET_LANG != 'en' else "en-IN",
    speaker="anushka",             # female voice; change as desired
    enable_preprocessing=True
)
# Play inline (Colab/Jupyter)
play(tts_resp)
# Save to file
save(tts_resp, "output_audio.wav")
print("💾 Saved to output_audio.wav")

💾 Saved to output_audio.wav
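
Steps 1 to 3 can also be chained in a single function, which is convenient when processing several files. The sketch below reuses only the calls and arguments shown in the cells above; `run_pipeline` is a hypothetical helper name, and it returns the raw TTS response so the caller can pass it to `play` or `save`.

```python
def run_pipeline(client, audio_path, target_lang, speaker="anushka"):
    """Transcribe an audio file, translate the transcript, and synthesise speech.

    `client` is expected to expose the same methods as the SarvamAI client
    used in this notebook.
    """
    # Step 1: speech-to-text with automatic language detection
    with open(audio_path, "rb") as f:
        stt = client.speech_to_text.transcribe(
            file=f, model="saarika:v2.5", language_code="unknown"
        )
    # Step 2: translate the transcript into the target language
    translation = client.text.translate(
        input=stt.transcript,
        source_language_code=stt.language_code,
        target_language_code=target_lang,
        speaker_gender="Male",
        mode="formal",
        model="sarvam-translate:v1",
        enable_preprocessing=False,
    )
    # Step 3: synthesise speech from the translated text
    audio = client.text_to_speech.convert(
        text=translation.translated_text,
        target_language_code=target_lang,
        speaker=speaker,
        enable_preprocessing=True,
    )
    return stt.transcript, translation.translated_text, audio
```

For example, `transcript, translated, audio = run_pipeline(client, audio_file_path, "kn-IN")` followed by `save(audio, "output_audio.wav")`. This sketch does not handle an `unknown` detected language or long transcripts; both would need the same fallbacks as the individual cells.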


## 🔡 (Optional) Transliteration Examples

In [16]:
hin_text = "मुझे कल 9:30am को appointment है"
print("Original Hindi:", hin_text)

# Indic → Roman
roman = client.text.transliterate(
    input=hin_text,
    source_language_code="hi-IN",
    target_language_code="en-IN",
    spoken_form=True
).transliterated_text
print("Romanised:", roman)

# Roman → Indic
back = client.text.transliterate(
    input=roman,
    source_language_code="hi-IN",
    target_language_code="hi-IN",
    spoken_form=True
).transliterated_text
print("Back to Hindi:", back)

# Spoken form: native vs. English-style numerals
native_num = client.text.transliterate(
    input=hin_text,
    source_language_code="hi-IN",
    target_language_code="hi-IN",
    spoken_form=True,
    numerals_format="native"
).transliterated_text
print("Native numerals:", native_num)

eng_num = client.text.transliterate(
    input=hin_text,
    source_language_code="hi-IN",
    target_language_code="hi-IN",
    spoken_form=True,
    spoken_form_numerals_language="english"
).transliterated_text
print("English-style numerals:", eng_num)

Original Hindi: मुझे कल 9:30am को appointment है
Romanised: Muale ko kal 9:30am ko appointment hai
Back to Hindi: मुआले को कल साढ़े नौ बजे ए.एम को अपॉइंटमेंट है
Native numerals: मुझे कल सुबह साढ़े नौ बजे अपॉइंटमेंट है
English-style numerals: मुझे कल नाइन थर्टी ए एम को अपॉइंटमेंट है
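
The sample output shows that a Romanise-and-back round trip is not guaranteed to be lossless: the Romanised line begins `Muale ko`, and the text that comes back differs from the original. A small sketch for flagging such drift with the standard library's `difflib` follows; both helper names are hypothetical and chosen for illustration.

```python
from difflib import SequenceMatcher


def round_trip_similarity(original, round_tripped):
    """Return a similarity ratio between 0 and 1 (1.0 means identical strings)."""
    return SequenceMatcher(None, original, round_tripped).ratio()


def changed_words(original, round_tripped):
    """Return (original_words, new_words) pairs for each word span that differs."""
    a, b = original.split(), round_tripped.split()
    matcher = SequenceMatcher(None, a, b)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Calling `changed_words(hin_text, back)` lists the word spans that changed, which gives a quick way to see whether the differences come from spoken-form expansion (for example, the time) or from an actual transliteration error.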
