#### Agents SDK Course

## Voice

One of the standout features of the Agents SDK is voice support. This tutorial demonstrates how to build a voice-enabled AI agent that processes spoken input, generates a response, and delivers that response as natural-sounding speech.

First, we need to set up an `OPENAI_API_KEY`. To get one, create an account on [OpenAI](https://platform.openai.com/api-keys) and copy your API key.

In [124]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass.getpass("OpenAI API Key: ")

This integration also uses the [Eleven Labs](https://elevenlabs.io/app/settings/api-keys) API, so you will need to create an account there and grab an API key as well.

In [125]:
os.environ["ELEVENLABS_API_KEY"] = os.getenv("ELEVENLABS_API_KEY") or getpass.getpass("Eleven Labs API Key: ")

Next, we create a simple AI agent from the `Agent` class. Its instructions tell it to repeat the user's question before answering, so the user can confirm the input was understood. The remaining parameters are optional; we use the `gpt-4o-mini` model, which is sufficient for simple responses.

In [126]:
from agents import Agent

agent = Agent(
    name="Assistant",
    instructions="Repeat the user's question back to them, and then answer it. (Answer in English)",
    model="gpt-4o-mini"
)

Next, we initialize a `VoicePipeline` object from the `agents.voice` module. Since we have a single agent, we wrap it in `SingleAgentVoiceWorkflow` and pass that in as the workflow.

In [127]:
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline

pipeline = VoicePipeline(
    workflow=SingleAgentVoiceWorkflow(agent)
)

Next, we need a `wav` file that the agent can read and answer.

For this step we use the `wavfile` module from the `scipy.io` library.

Its `read` function loads the wav file and returns the frame rate alongside the audio data as a NumPy array.

In [128]:
from scipy.io import wavfile

# Load the audio file directly
frame_rate, audio_data = wavfile.read("../resources/president-question.wav")
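Before loading a recording, it can help to confirm its frame rate, sample width, and channel count from the file header. The sketch below uses Python's standard-library `wave` module for that; it is an illustration under stated assumptions, not part of the Agents SDK. The helper name `wav_header` and the temporary `demo.wav` file are hypothetical and exist only so the example runs without the course resources.

```python
import os
import tempfile
import wave


def wav_header(path):
    """Return (frame_rate, sample_width_bytes, channels, n_frames) for a PCM wav file."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getsampwidth(), wf.getnchannels(), wf.getnframes()


# Build a tiny silent wav file so the sketch runs without external resources.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
    wf.setframerate(44100)  # samples per second
    wf.writeframes(b"\x00\x00" * 441)  # 441 silent frames (10 ms)

print(wav_header(demo_path))
```

Pointing `wav_header` at your own recording should show whether it is 16-bit mono, which is what the rest of this walkthrough assumes.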

We can use the print statement to understand what is inside the audio data:

In [129]:
print("Audio Data:", audio_data)

Audio Data: [0 0 0 ... 0 0 0]


Next we can also use the print statement to understand what is inside the frame rate:

In [130]:
print("Frame Rate:", frame_rate)

Frame Rate: 44100


Now we can convert the wav data into the format our agent expects by initializing an `AudioInput` object with the following parameters:
- `buffer` ~ the audio samples read from the wav file.
- `frame_rate` ~ the number of samples per second in the recording.
- `sample_width` ~ how many bytes represent each sample in the buffer.
- `channels` ~ the number of audio channels, e.g. 1 = mono, 2 = stereo.

In [131]:
from agents.voice import AudioInput

audio_input = AudioInput(
    buffer=audio_data,
    frame_rate=frame_rate,
    sample_width=2,
    channels=1
)
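The values `sample_width=2` and `channels=1` match a 16-bit mono recording. For other files, these values can be derived from the array itself, since `scipy.io.wavfile.read` returns a 1-D array for mono audio and a 2-D array of shape `(n_samples, n_channels)` for multi-channel audio. The following is a minimal sketch; the helper names `buffer_layout` and `to_mono` are hypothetical, and averaging channels is just one simple way to downmix.

```python
import numpy as np


def buffer_layout(samples):
    """Return (sample_width_bytes, channels) for an array shaped (n,) or (n, channels)."""
    channels = 1 if samples.ndim == 1 else samples.shape[1]
    return samples.dtype.itemsize, channels


def to_mono(samples):
    """Average the channels of a 2-D buffer; 1-D buffers are returned unchanged."""
    if samples.ndim == 1:
        return samples
    return samples.mean(axis=1).astype(samples.dtype)


# Two stereo frames of 16-bit audio, used as stand-in data.
stereo = np.array([[100, 300], [-200, 0]], dtype=np.int16)
mono = to_mono(stereo)

print(buffer_layout(stereo))  # layout of the stereo buffer
print(buffer_layout(mono))    # layout after downmixing
```

If your own file turns out to be stereo, downmixing it first keeps the `channels=1` setting used here accurate.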

Next we can print the current values to look inside the `AudioInput` object we just created.

In [132]:
print("Audio Input:", audio_input)

Audio Input: AudioInput(buffer=array([0, 0, 0, ..., 0, 0, 0], shape=(179712,), dtype=int16), frame_rate=44100, sample_width=2, channels=1)
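The printed shape and frame rate allow a quick sanity check on the recording length. The short calculation below is only a sketch that reuses the numbers from the output above; the variable names are illustrative.

```python
n_frames = 179712    # from shape=(179712,) in the printed AudioInput
frame_rate = 44100   # samples per second
sample_width = 2     # bytes per int16 sample
channels = 1         # mono

# Duration is the number of frames divided by frames per second.
duration_s = n_frames / frame_rate
# Raw PCM size is frames x bytes per sample x channels.
size_bytes = n_frames * sample_width * channels

print(f"duration: {duration_s:.2f} s, raw size: {size_bytes} bytes")
```

A result of roughly four seconds is plausible for a single spoken question; a wildly different duration would suggest the frame rate or buffer was read incorrectly.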


Now we want to check that the input to our agent is correct. The best way to do this is with the `Audio` object from `IPython.display`, which renders a small player we can use to play the audio sample.

If the audio does not sound like the wav file on your local computer, something has gone wrong in the processing. Note that mp3 files will not work.

In [133]:
from IPython.display import Audio

playable_audio = Audio(
    data=audio_input.buffer,
    rate=audio_input.frame_rate
)

playable_audio

Next we want to run our agent. Instead of using the `Runner`, we use the pipeline we defined earlier.

Then we create an audio player using the `sounddevice` library's `OutputStream` class. We define its sample rate, channels, and dtype for playback; these are separate from the input settings.

Next, we call the `start` method to open the stream for playback.

With the audio player open and running, we can start writing data to it. We loop over the events in the result stream, and for each event of type `voice_stream_event_audio` we call the `write` method to play the audio data it carries.

In [134]:
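import numpy as np
import sounddevice as sd

# Run the voice pipeline on the recorded question
result = await pipeline.run(audio_input)

# Open an output stream for 16-bit mono audio at 24 kHz
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()

# Write each audio chunk to the speakers as it arrives
async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        player.write(event.data)

# Let pending audio finish playing, then release the device
player.stop()
player.close()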
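If no audio device is available, or you want to keep the reply, the streamed chunks can be collected and written to a wav file instead of played. The sketch below is an assumption-laden illustration: it assumes each chunk is a 1-D `int16` NumPy array at 24000 Hz mono (matching the player settings above), uses stand-in arrays in place of real `event.data` values, and introduces a hypothetical helper named `save_chunks`.

```python
import os
import tempfile
import wave

import numpy as np


def save_chunks(chunks, path, frame_rate=24000):
    """Concatenate int16 mono chunks and write them to a wav file; return frames written."""
    audio = np.concatenate([np.asarray(c, dtype=np.int16) for c in chunks])
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono, as assumed above
        wf.setsampwidth(2)            # int16 = 2 bytes per sample
        wf.setframerate(frame_rate)
        wf.writeframes(audio.astype("<i2").tobytes())  # wav data is little-endian
    return len(audio)


# Stand-in chunks; in the notebook these would be appended inside the event loop.
chunks = [np.zeros(1200, dtype=np.int16), np.ones(2400, dtype=np.int16)]
out_path = os.path.join(tempfile.mkdtemp(), "reply.wav")
frames_written = save_chunks(chunks, out_path)
print(frames_written, "frames written to", out_path)
```

In the notebook, this would amount to appending `event.data` to a list inside the loop and calling the helper once the stream ends.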