<a href="https://colab.research.google.com/github/MikhailRogachev/ai-agent-gemini-tutorial/blob/master/single_speaker_TTS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Single-speaker text-to-speech
[References](https://ai.google.dev/gemini-api/docs/speech-generation#single-speaker)

To convert text to single-speaker audio, set the response modality to "AUDIO" and pass a SpeechConfig object whose VoiceConfig is set. Choose a voice name from the prebuilt output voices; this notebook uses Kore.

This example saves the model's output audio to a WAV file.

In [1]:
from google import genai
from google.genai import types
from google.colab import userdata

import wave

In [2]:
# Name of the file where the generated audio is saved.
file_name = 'out.wav'

In [3]:
# Read the API key from Colab secrets (google.colab.userdata)
apikey = userdata.get('GOOGLE_API_KEY')

In [4]:
# Save the raw PCM audio data (pcm) to a WAV file named filename.
def wave_file(filename, pcm):
    channels = 1      # mono
    rate = 24000      # sample rate in Hz
    sample_width = 2  # bytes per sample (16-bit)

    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)
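
The WAV-writing step can be checked without calling the API. The following is a minimal sketch, assuming the same mono, 24 kHz, 16-bit layout used in `wave_file`; the helpers `sine_pcm`, `write_wav`, and `read_wav_params` are hypothetical names introduced only for this illustration and are not part of the notebook or of any library.

```python
import math
import struct
import wave


def sine_pcm(freq_hz=440.0, seconds=0.5, rate=24000, amplitude=0.3):
    """Return mono 16-bit little-endian PCM bytes for a sine tone (test data only)."""
    n_frames = int(seconds * rate)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_frames)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def write_wav(filename, pcm, channels=1, rate=24000, sample_width=2):
    """Wrap raw PCM bytes in a WAV header, mirroring wave_file above."""
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)


def read_wav_params(filename):
    """Read back (channels, sample_width, rate, n_frames) from a WAV file."""
    with wave.open(filename, "rb") as wf:
        return (
            wf.getnchannels(),
            wf.getsampwidth(),
            wf.getframerate(),
            wf.getnframes(),
        )
```

Writing `sine_pcm()` with `write_wav` and then calling `read_wav_params` on the result is one way to confirm that the header values match the parameters chosen above before real model output is involved.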

In [5]:
# Create the client using GOOGLE_API_KEY
client = genai.Client(api_key=apikey)

In [6]:
my_phrase = 'Hello my Friend!'

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=my_phrase,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                    voice_name='Kore',
                )
            )
        ),
    ),
)

In [7]:
# Extract the raw audio bytes (PCM) from the first part of the Gemini response
data = response.candidates[0].content.parts[0].inline_data.data
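
The indexing chain above raises an error if the response has no candidates or the first part carries no inline data. A more defensive lookup might look like the sketch below. It assumes only the attribute names already used in the cell above (`candidates`, `content`, `parts`, `inline_data`, `data`); the function `extract_audio_bytes` is a hypothetical helper, not part of the google-genai SDK.

```python
def extract_audio_bytes(response):
    """Return the first inline audio payload found in the response.

    Assumes the response shape used in this notebook:
    response.candidates[i].content.parts[j].inline_data.data
    Raises ValueError when no such payload is present.
    """
    candidates = getattr(response, "candidates", None) or []
    for candidate in candidates:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            inline = getattr(part, "inline_data", None)
            data = getattr(inline, "data", None)
            if data:
                return data
    raise ValueError("response contains no inline audio data")
```

With this helper, `data = extract_audio_bytes(response)` could stand in for the direct indexing, and a failed generation would surface as a clear `ValueError` instead of an `IndexError` or `AttributeError`.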

In [8]:
# Save the received audio data to the WAV file
wave_file(file_name, data)
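
A quick sanity check after saving is to read the file back and compute its duration. This is a minimal sketch using only the standard `wave` module; `wav_duration_seconds` is a hypothetical helper name introduced for this illustration.

```python
import wave


def wav_duration_seconds(filename):
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(filename, "rb") as wf:
        return wf.getnframes() / wf.getframerate()
```

Calling `wav_duration_seconds(file_name)` on the saved file should give a small positive number for a short phrase; a value of zero would suggest that no audio data was written. In Colab, `IPython.display.Audio(file_name)` is one option for listening to the result inline.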