A real-time voice and text conversation app powered by Google Agent Development Kit (ADK) and Gemini. It streams audio and text through a FastAPI WebSocket server, orchestrates two specialized agents, and delivers detailed post-turn analysis to the browser.
The app uses a manual sequential pattern instead of `SequentialAgent` because:

- The Live Agent requires `runner.run_live()` for bidirectional audio streaming, which is incompatible with `SequentialAgent`.
- The Detail Agent uses the standard `runner.run()` for synchronous text generation.
- Both agents share conversation context via session state (transcripts written after each turn).
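The manual sequential pattern above can be sketched with stdlib-only stand-ins. `run_live`, `run_detail`, and `handle_turn` are hypothetical names used for illustration, not ADK APIs; the real app drives `runner.run_live()` and `runner.run()` instead.

```python
import asyncio

async def run_live(session_state: dict) -> None:
    # Stands in for runner.run_live(): a live audio turn completes,
    # then both transcripts are written into shared session state.
    session_state["input_transcript"] = "What's the weather in Paris?"
    session_state["output_transcript"] = "It's sunny, around 22 degrees."

async def run_detail(session_state: dict) -> str:
    # Stands in for runner.run(): reads both transcripts and returns an analysis.
    return (f"User asked: {session_state['input_transcript']} | "
            f"Agent replied: {session_state['output_transcript']}")

async def handle_turn() -> str:
    state: dict = {}                 # session state shared between the two agents
    await run_live(state)            # step 1: live turn finishes, transcripts stored
    return await run_detail(state)   # step 2: post-turn analysis reads that state

print(asyncio.run(handle_turn()))
```

The point of the manual pattern is visible here: the second stage only starts after the first has written its transcripts, with no `SequentialAgent` involved.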
| File / Module | Role |
|---|---|
| `main.py` | FastAPI app, WebSocket endpoint, agent orchestration |
| `google_search_agent/agent.py` | Live agent: handles real-time audio/text conversation |
| `detail_agent/agent.py` | Detail agent: post-turn analysis of transcripts |
| `static/index.html` | Browser client (audio capture + WebSocket messaging) |
| `.env` | API credentials (see Setup) |
1. Browser captures microphone audio (PCM) → encodes it as base64 → sends it over the WebSocket.
2. The Live Agent transcribes user speech (`input_audio_transcription`) and speaks back (`AUDIO` modality + `output_audio_transcription`).
3. Partial transcripts stream to the browser in real time with `"is_input_transcript": true` / `"is_output_transcript": true` flags.
4. On `turn_complete`, the full transcripts are written to a new Detail Agent session state.
5. The Detail Agent reads both transcripts and produces a structured analysis, streamed back with `"is_detailed_analysis": true`.

In text mode, steps 1–3 use plain `text/plain` messages; steps 4–5 are skipped (analysis only runs in audio mode).
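Step 1 can be illustrated in Python (the actual capture happens in browser JavaScript). This is a minimal sketch assuming 16-bit little-endian PCM samples; `encode_pcm_message` and `decode_pcm_message` are illustrative helpers, not functions from the app.

```python
import base64
import json
import struct

def encode_pcm_message(samples: list[int]) -> str:
    # Pack 16-bit little-endian PCM samples, base64-encode, wrap in the
    # WebSocket message envelope used by the app.
    pcm_bytes = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps({
        "mime_type": "audio/pcm",
        "data": base64.b64encode(pcm_bytes).decode("ascii"),
    })

def decode_pcm_message(message: str) -> list[int]:
    # The reverse path: what the server does before feeding audio to the agent.
    payload = json.loads(message)
    assert payload["mime_type"] == "audio/pcm"
    pcm_bytes = base64.b64decode(payload["data"])
    return list(struct.unpack(f"<{len(pcm_bytes) // 2}h", pcm_bytes))

msg = encode_pcm_message([0, 1000, -1000, 32767])
assert decode_pcm_message(msg) == [0, 1000, -1000, 32767]  # lossless round-trip
```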
{ "mime_type": "text/plain", "data": "Hello!" }
{ "mime_type": "audio/pcm", "data": "<base64-encoded PCM>" }{ "mime_type": "text/plain", "data": "...", "partial": true, "is_input_transcript": true }
{ "mime_type": "text/plain", "data": "...", "partial": false, "is_output_transcript": true }
{ "mime_type": "audio/pcm", "data": "<base64 PCM>" }
{ "mime_type": "application/json", "data": "...", "partial": false, "is_detailed_analysis": true }
{ "turn_complete": true, "interrupted": false }- Python 3.12+
uv(recommended) orpip- A Google AI / Gemini API key
```bash
uv sync
# or: pip install fastapi google-adk google-generativeai python-dotenv python-multipart uvicorn
```

```bash
cp .env.example .env
# Edit .env and set:
# GOOGLE_API_KEY=your_key_here
```

```bash
uvicorn main:app --reload
```

Open http://localhost:8000 in your browser.
```
.
├── main.py                # FastAPI app + WebSocket + agent orchestration
├── google_search_agent/
│   └── agent.py           # Live agent definition
├── detail_agent/
│   └── agent.py           # Detail analysis agent definition
├── static/
│   └── index.html         # Browser client
├── pyproject.toml
├── uv.lock
└── .env                   # API key (git-ignored)
```
| Package | Purpose |
|---|---|
| `google-adk` | Agent Development Kit: runners, sessions, live streaming |
| `google-generativeai` | Gemini API client |
| `fastapi` | WebSocket + HTTP server |
| `uvicorn` | ASGI server |
| `python-dotenv` | Environment variable loading |
| `python-multipart` | Multipart form support |
- `APP_NAME` is set to `"ADK Streaming example"` and used as the session namespace.
- Audio mode uses the `AUDIO` response modality with both `input_audio_transcription` and `output_audio_transcription` enabled.
- Text mode uses the `TEXT` modality with no transcription config.
- The Detail Agent receives a fresh `InMemoryRunner` and session per turn to avoid state pollution.
- Logging for `google.genai` and `google.adk` is set to `ERROR` to suppress verbose streaming warnings.
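The logging note above amounts to a couple of lines with the standard `logging` module; a minimal sketch:

```python
import logging

# Raise the threshold on the two noisy logger namespaces so that only
# ERROR and above get through; warnings from streaming are suppressed.
for name in ("google.genai", "google.adk"):
    logging.getLogger(name).setLevel(logging.ERROR)

assert logging.getLogger("google.genai").level == logging.ERROR
```

Because logger names are hierarchical, this also quiets child loggers such as `google.adk.runners` unless they set their own level.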
MIT