# Oracle Cloud Infrastructure – Text-to-Speech (TTS)

This notebook demonstrates how to use the Text-to-Speech (TTS) feature of the Oracle Cloud Infrastructure (OCI) Speech service to convert text into speech.

You will:
1. Load a text file.
2. Detect its dominant language using OCI AI Language.
3. Map that language to a supported voice.
4. Synthesize speech and save it as an MP3 file.
5. Listen to the generated audio directly in the notebook.

----
### Learning & Experimentation

- Try using text files in different languages (see examples in `speech/`).
- Modify the code to select different voices or language codes.
- Experiment by creating your own short text files and see how OCI TTS handles them!

> **Tip:** If you get errors related to credentials or API permissions, check your OCI config file and profile, the IAM policies for your compartment, and your service limits.

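When credentials are the suspected cause, one quick check is to confirm that the profile you are using defines the keys the SDK normally expects. The following is a minimal sketch using only the standard library. It assumes the usual INI-style OCI config file and API-key authentication with the keys `user`, `fingerprint`, `tenancy`, `region`, and `key_file`. The helper name `missing_profile_keys` is hypothetical and not part of the OCI SDK.

```python
import configparser
from pathlib import Path

# Keys an API-key based OCI profile is normally expected to define
# (assumption: API signing key authentication, not instance or resource principals).
REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def missing_profile_keys(config_path, profile="DEFAULT"):
    """Return the required keys that are absent or empty in the given profile."""
    parser = configparser.ConfigParser(interpolation=None)
    path = Path(config_path).expanduser()
    # ConfigParser.read returns the list of files it could read; empty means not found.
    if not parser.read(path):
        raise FileNotFoundError(f"OCI config file not found: {path}")
    if profile not in parser:
        raise KeyError(f"Profile {profile!r} not found in {path}")
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if not section.get(key, fallback="").strip()]
```

Calling `missing_profile_keys("~/.oci/config", "DEFAULT")` returns an empty list when the profile defines all five keys. It does not verify that the key file exists or that the credentials are valid.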

### Docs / Links
- Service docs: https://docs.oracle.com/en-us/iaas/Content/speech/home.htm
- Python SDK: https://github.com/oracle/oci-python-sdk/tree/master/src/oci/ai_speech
- Real time: https://github.com/oracle/oci-ai-speech-realtime-python-sdk
- Support Slack: #oci_speech_service_users | #igiu-innovation-lab
- Troubleshooting: #igiu-ai-learning


## Imports & Setup

This code is based on `speech/oci_speech_tts.py`. 

> **You may need to restart the kernel after installing any packages.**


In [1]:
import logging
from pathlib import Path
import os

from dotenv import load_dotenv
from envyaml import EnvYAML
import oci
from oci.ai_speech import AIServiceSpeechClient
from oci.ai_speech.models import (
    SynthesizeSpeechDetails,
    TtsOracleConfiguration,
    TtsOracleSpeechSettings,
    TtsOracleTts2NaturalModelDetails,
)
from oci.ai_language import AIServiceLanguageClient
from oci.ai_language.models import (
    BatchDetectDominantLanguageDetails,
    DominantLanguageDocument,
)
from IPython.display import Audio, display

load_dotenv()
logging.basicConfig(level=logging.INFO, format="%(asctime)s  %(levelname)-8s  %(message)s")
log = logging.getLogger("oci-tts-notebook")


## Helper Functions

*You can experiment by editing these mappings or by making the code read different files, voices, or languages.*

In [2]:
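def load_yaml(path: Path):
    """Load a YAML file with environment-variable substitution; return None on failure."""
    try:
        return EnvYAML(str(path))
    except Exception as exc:
        log.error("Failed to load YAML %s: %s", path, exc)
        return None

def make_oci_config(scfg):
    """Build the OCI SDK config from the config file and profile named in the YAML."""
    return oci.config.from_file(
        os.path.expanduser(scfg["oci"]["configFile"]),
        scfg["oci"]["profile"],
    )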

def detect_language(text, cfg, compartment):
    """Return the lowercase code of the dominant language, or 'en' if detection fails."""
    client = AIServiceLanguageClient(config=cfg)
    req = BatchDetectDominantLanguageDetails(
        documents=[DominantLanguageDocument(key="1", text=text)],
        compartment_id=compartment,
    )
    res = client.batch_detect_dominant_language(req)
    if (
        getattr(res, "status", None) == 200
        and res.data.documents
        and res.data.documents[0].languages
    ):
        top = res.data.documents[0].languages[0]
        lang_code = top.code.lower()
        log.info("Detected language %s (confidence %.2f)", lang_code, top.score)
        return lang_code
    log.warning("Language detection failed; defaulting to 'en'")
    return "en"

# You can add more voices and codes if available! Try experimenting!
LANGUAGE_TTS_MAP = {
    "en": ("Stacy", "en-US"),
    "hi": ("Priya", "hi-IN"),
    "es": ("Paco", "es-ES"),
    "fr": ("Chloe", "fr-FR"),
    "de": ("Hans", "de-DE"),
}

def voice_and_lang_for(code):
    base = code.split("-")[0]
    return LANGUAGE_TTS_MAP.get(base, LANGUAGE_TTS_MAP["en"])

def synthesize_mp3(text, voice_id, oci_lang_code, outfile, cfg, compartment):
    """Synthesize `text` with the given voice and write the MP3 bytes to `outfile`."""
    client = AIServiceSpeechClient(config=cfg)
    details = SynthesizeSpeechDetails(
        text=text,
        is_stream_enabled=False,
        compartment_id=compartment,
        configuration=TtsOracleConfiguration(
            model_details=TtsOracleTts2NaturalModelDetails(
                voice_id=voice_id,
                language_code=oci_lang_code,
            ),
            speech_settings=TtsOracleSpeechSettings(
                text_type=TtsOracleSpeechSettings.TEXT_TYPE_TEXT,
                sample_rate_in_hz=22050,
                output_format=TtsOracleSpeechSettings.OUTPUT_FORMAT_MP3,
            ),
        ),
    )
    res = client.synthesize_speech(details)
    if res.status != 200:
        raise RuntimeError(f"TTS failed HTTP {res.status}")
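    with outfile.open("wb") as fh:
        # Read the response in 8 KiB chunks (assumes a requests-style response,
        # whose iter_content() otherwise defaults to 1-byte chunks).
        for chunk in res.data.iter_content(chunk_size=8192):
            fh.write(chunk)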
    log.info("MP3 saved → %s", outfile.resolve())

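`detect_language` returns the top-ranked language regardless of its confidence score. For short or mixed-language texts it may be safer to fall back to the default when the score is low. The sketch below shows that decision on plain `(code, score)` pairs so it runs without calling the service. The function name `pick_language` and the 0.5 threshold are assumptions for illustration, not values recommended by OCI.

```python
def pick_language(candidates, min_score=0.5, default="en"):
    """Pick the highest-scoring language code, or `default` if none is confident enough.

    `candidates` is an iterable of (code, score) pairs, for example built from
    the `code` and `score` attributes of each detected language.
    """
    best = max(candidates, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < min_score:
        return default
    return best[0].lower()


print(pick_language([("es", 0.93), ("pt", 0.05)]))  # es
print(pick_language([("fr", 0.31)]))                # en
print(pick_language([]))                            # en
```

A threshold like this trades a possibly wrong voice for a predictable default, so the right value depends on how mixed your input texts are.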

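The fallback in `voice_and_lang_for` is silent: any code without an entry in `LANGUAGE_TTS_MAP` resolves to the English voice. The sketch below is a self-contained variant that also reports whether the fallback was used, which can help when testing new mappings. It copies the mapping from the helper cell above. The function name `resolve_voice` and its third return value are illustrative additions, not part of the original script.

```python
# Copy of the mapping from the helper cell: language -> (voice ID, OCI language code).
LANGUAGE_TTS_MAP = {
    "en": ("Stacy", "en-US"),
    "hi": ("Priya", "hi-IN"),
    "es": ("Paco", "es-ES"),
    "fr": ("Chloe", "fr-FR"),
    "de": ("Hans", "de-DE"),
}


def resolve_voice(code, default="en"):
    """Return (voice_id, oci_lang_code, used_fallback) for a detected language code."""
    # Normalize case and separators so 'es-MX', 'es_MX', and 'ES' all map to 'es'.
    base = code.strip().lower().replace("_", "-").split("-")[0]
    if base in LANGUAGE_TTS_MAP:
        voice_id, oci_lang = LANGUAGE_TTS_MAP[base]
        return voice_id, oci_lang, False
    voice_id, oci_lang = LANGUAGE_TTS_MAP[default]
    return voice_id, oci_lang, True


print(resolve_voice("es-MX"))  # ('Paco', 'es-ES', False)
print(resolve_voice("ja"))     # ('Stacy', 'en-US', True)
```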
## Configuration

Set up your inputs here. Change `text_file` to synthesize a different text, and point `config_file` at your own YAML if it is not `sandbox.yaml`. The YAML must provide `oci.configFile`, `oci.profile`, and `oci.compartment`, which the code below reads.

In [5]:
text_file = Path("speech/text_sample_english.txt")  # Try changing this to another file (e.g. 'speech/text_sample_spanish.txt')
config_file = Path("sandbox.yaml")           # Update if your config YAML is elsewhere

if not text_file.exists():
    raise FileNotFoundError(f"File not found: {text_file}")
text = text_file.read_text(encoding="utf-8").strip()
if not text:
    raise ValueError("Input file is empty.")

scfg = load_yaml(config_file)
if scfg is None:
    raise RuntimeError("Could not load config.")
cfg = make_oci_config(scfg)
compartment = scfg["oci"]["compartment"]

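The cell above reads three values from the YAML: `oci.configFile`, `oci.profile`, and `oci.compartment`. A missing key surfaces as a bare `KeyError`, so a small pre-check can give a clearer message. The following sketch works on any mapping with that shape. It uses a plain dictionary as a stand-in for the loaded YAML and assumes `scfg["oci"]` returns a dictionary-like object. The helper name and the placeholder values are hypothetical.

```python
REQUIRED_OCI_KEYS = ("configFile", "profile", "compartment")


def missing_oci_keys(scfg):
    """Return the names of required `oci` keys that are absent or empty."""
    try:
        oci_section = scfg["oci"]
    except KeyError:
        # No `oci` section at all: every required key is missing.
        return list(REQUIRED_OCI_KEYS)
    return [key for key in REQUIRED_OCI_KEYS if not oci_section.get(key)]


# Stand-in for the loaded YAML; placeholder values, not real identifiers.
example = {"oci": {"configFile": "~/.oci/config", "profile": "DEFAULT"}}
print(missing_oci_keys(example))  # ['compartment']
```

In the notebook, a check such as `if missing_oci_keys(scfg): raise RuntimeError(...)` could run right after `load_yaml`, before `make_oci_config` is called.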

## Language Detection, Voice Selection, and Speech Synthesis

Run the next cell to detect the language, pick the mapped voice (falling back to English if the language has no entry in `LANGUAGE_TTS_MAP`), synthesize the audio, and play it.

Feel free to change the text and experiment!

In [6]:
lang_code = detect_language(text, cfg, compartment)
voice_id, oci_lang = voice_and_lang_for(lang_code)

# Output mp3 path (will overwrite if exists)
mp3_path = text_file.with_suffix(".notebook.mp3")

print(f"Synthesizing with voice '{voice_id}' ({oci_lang}) → {mp3_path}")
synthesize_mp3(text, voice_id, oci_lang, mp3_path, cfg, compartment)

# Play audio in Jupyter directly!
display(Audio(str(mp3_path)))

2025-11-04 14:32:18,258  INFO      Detected language en (confidence 1.00)


Synthesizing with voice 'Stacy' (en-US) → speech/text_sample_english.notebook.mp3


2025-11-04 14:32:20,606  INFO      MP3 saved → /Users/ashish/work/code/python/workshop/speech/text_sample_english.notebook.mp3

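After the log reports that the MP3 was saved, a quick sanity check can confirm that the file is non-empty and begins like an MP3 stream. The sketch below relies on general MP3 conventions (an optional `ID3` tag or a frame-sync bit pattern at the start of the file), not on anything documented for OCI TTS specifically. The helper name `looks_like_mp3` is hypothetical.

```python
from pathlib import Path


def looks_like_mp3(path):
    """Heuristic check: True if the file starts with an ID3 tag or an MPEG frame sync."""
    with Path(path).open("rb") as fh:
        head = fh.read(3)
    if len(head) < 2:
        return False  # empty or truncated file
    if head == b"ID3":
        return True
    # MPEG audio frames begin with 11 set bits: 0xFF, then a byte whose top 3 bits are set.
    return head[0] == 0xFF and (head[1] & 0xE0) == 0xE0
```

In the notebook this could be called as `looks_like_mp3(mp3_path)` after `synthesize_mp3` returns. It inspects only the first bytes, so it cannot detect a file that was truncated midway.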

----
## Further Experiments

- Try your own text in any language supported by OCI TTS.
- Add more language/voice mappings to `LANGUAGE_TTS_MAP` and test them.
- Change the sample rate or output format in `synthesize_mp3`.
- Switch to another OCI config file, profile, or compartment by editing the config YAML.

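If you change the output format, the file suffix should change with it. The sketch below derives the output path from the input file and a format name, mirroring the `with_suffix(".notebook.mp3")` call used earlier. The format names other than `MP3` and the suffix table are assumptions for illustration; check the constants on `TtsOracleSpeechSettings` for the formats your SDK version supports.

```python
from pathlib import Path

# Hypothetical format-name -> suffix table; only "MP3" appears in this notebook.
FORMAT_SUFFIXES = {"MP3": ".mp3", "OGG": ".ogg", "PCM": ".pcm"}


def output_path_for(text_file, output_format="MP3"):
    """Return the audio path next to `text_file`, e.g. sample.txt -> sample.notebook.mp3."""
    try:
        suffix = FORMAT_SUFFIXES[output_format.upper()]
    except KeyError:
        raise ValueError(f"Unsupported output format: {output_format!r}") from None
    return Path(text_file).with_suffix(".notebook" + suffix)


print(output_path_for("speech/text_sample_english.txt").as_posix())
# speech/text_sample_english.notebook.mp3
```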
If you encounter any issues or want to contribute improvements, reach out on Slack or review the [OCI TTS docs](https://docs.oracle.com/en-us/iaas/Content/speech/using/using-tts.htm).
