diff --git a/README.md b/README.md index bb9e7c1..f33450d 100644 --- a/README.md +++ b/README.md @@ -57,6 +57,7 @@ A collection of working examples showing how to use Deepgram SDKs with popular p | [500](examples/500-semantic-kernel-voice-plugin-dotnet/) | Semantic Kernel Voice Plugin with Deepgram (.NET) | C# | Semantic Kernel | | [510](examples/510-gin-live-transcription-go/) | Gin Real-Time WebSocket Transcription Server | Go | Gin | | [520](examples/520-node-deepgram-proxy/) | Deepgram Proxy Server (Node.js) | Node.js | Deepgram SDK | +| [540](examples/540-livekit-agents-python/) | LiveKit Agents with Deepgram STT/TTS | Python | LiveKit | ## CI / testing diff --git a/examples/540-livekit-agents-python/.env.example b/examples/540-livekit-agents-python/.env.example new file mode 100644 index 0000000..f76c469 --- /dev/null +++ b/examples/540-livekit-agents-python/.env.example @@ -0,0 +1,10 @@ +# Deepgram API credentials +DEEPGRAM_API_KEY= + +# OpenAI API credentials (for LLM) +OPENAI_API_KEY= + +# LiveKit server credentials +LIVEKIT_URL= +LIVEKIT_API_KEY= +LIVEKIT_API_SECRET= diff --git a/examples/540-livekit-agents-python/BLOG.md b/examples/540-livekit-agents-python/BLOG.md new file mode 100644 index 0000000..7303cc7 --- /dev/null +++ b/examples/540-livekit-agents-python/BLOG.md @@ -0,0 +1,559 @@ +# Building a Voice AI Agent with LiveKit and Deepgram + +In this tutorial, we'll build a real-time voice AI agent using LiveKit's Agents framework with Deepgram for speech recognition and synthesis. By the end, you'll have a fully functional voice assistant that can have natural conversations and execute custom tools. 
+ +## What We're Building + +Our voice agent will: +- Listen to users in real-time using Deepgram's Nova-3 speech-to-text +- Respond with natural speech using Deepgram's Aura text-to-speech +- Use GPT-4o-mini for intelligent conversation understanding +- Support custom function tools (time, weather, calculator) + +LiveKit Agents is a powerful framework that handles all the real-time communication complexity—WebRTC, audio streams, voice activity detection—so we can focus on the voice AI logic. + +## Prerequisites + +Before we start, make sure you have: + +1. **Python 3.10+** installed +2. **A Deepgram account** - [Sign up for free](https://console.deepgram.com/) +3. **An OpenAI account** - [Get an API key](https://platform.openai.com/) +4. **A LiveKit Cloud account** - [Create one here](https://cloud.livekit.io/) (or self-host) + +## Step 1: Project Setup + +Let's start by creating our project structure: + +```bash +mkdir livekit-deepgram-agent +cd livekit-deepgram-agent + +# Create virtual environment +python -m venv venv +source venv/bin/activate # On Windows: venv\Scripts\activate + +# Create directory structure +mkdir -p src tests +``` + +Now install the required dependencies: + +```bash +pip install livekit-agents livekit-plugins-deepgram livekit-plugins-openai python-dotenv +``` + +These packages provide: +- `livekit-agents` - The core Agents framework +- `livekit-plugins-deepgram` - Deepgram integration for STT and TTS +- `livekit-plugins-openai` - OpenAI integration for the LLM +- `python-dotenv` - Environment variable management + +## Step 2: Configure Environment Variables + +Create a `.env` file in your project root: + +```bash +# Deepgram API credentials +DEEPGRAM_API_KEY=your_deepgram_api_key_here + +# OpenAI API credentials +OPENAI_API_KEY=your_openai_api_key_here + +# LiveKit server credentials +LIVEKIT_URL=wss://your-app.livekit.cloud +LIVEKIT_API_KEY=your_livekit_api_key +LIVEKIT_API_SECRET=your_livekit_api_secret +``` + +You can find your LiveKit 
credentials in the [LiveKit Cloud dashboard](https://cloud.livekit.io/). Your Deepgram API key is in the [Deepgram Console](https://console.deepgram.com/). + +## Step 3: Build the Basic Voice Agent + +Let's create our first voice agent. Create `src/agent.py`: + +```python +""" +LiveKit Voice Agent with Deepgram STT/TTS + +A conversational voice AI agent using: +- Deepgram Nova-3 for speech-to-text +- Deepgram Aura for text-to-speech +- OpenAI GPT-4o-mini for the LLM +""" + +import os +import logging +from dotenv import load_dotenv + +from livekit.agents import ( + Agent, + AgentSession, + JobContext, + WorkerOptions, + cli, +) +from livekit.plugins import deepgram, openai + +# Load environment variables +load_dotenv() + +# Configure logging +logger = logging.getLogger("deepgram-livekit-agent") +``` + +Here we import the core classes: +- `Agent` - Defines the voice agent's behavior, STT, TTS, and LLM +- `AgentSession` - Manages the active conversation session +- `JobContext` - Provides context about the current room and job +- `WorkerOptions` and `cli` - Handle the agent worker lifecycle + +Now let's define our voice assistant class: + +```python +class VoiceAssistant(Agent): + """A conversational voice assistant using Deepgram for STT/TTS.""" + + def __init__(self): + super().__init__( + instructions="""You are a helpful voice assistant powered by Deepgram and LiveKit. 
+ +You should: +- Be conversational and friendly +- Keep responses concise (1-3 sentences) since this is a voice conversation +- Ask clarifying questions when needed +- Be helpful with a wide range of topics + +Remember: You're speaking, not writing, so be natural and conversational.""", + stt=deepgram.STT( + model="nova-3", + language="en-US", + punctuate=True, + smart_format=True, + filler_words=True, + ), + tts=deepgram.TTS( + model="aura-2-andromeda-en", + sample_rate=24000, + ), + llm=openai.LLM( + model="gpt-4o-mini", + temperature=0.7, + ), + ) +``` + +Let's break down the configuration: + +**Speech-to-Text (STT)** with Deepgram Nova-3: +- `model="nova-3"` - Deepgram's latest and most accurate model +- `language="en-US"` - American English +- `punctuate=True` - Automatically add punctuation +- `smart_format=True` - Format numbers, dates, etc. intelligently +- `filler_words=True` - Transcribe "um", "uh" for more natural conversations + +**Text-to-Speech (TTS)** with Deepgram Aura: +- `model="aura-2-andromeda-en"` - A warm, professional voice +- `sample_rate=24000` - High-quality audio output + +**LLM** with OpenAI: +- `model="gpt-4o-mini"` - Fast, cost-effective model for conversations +- `temperature=0.7` - Balanced creativity and consistency + +Now let's add the entry point that runs when a job is dispatched: + +```python +async def entrypoint(ctx: JobContext): + """Entry point for the LiveKit agent job.""" + logger.info(f"Agent starting for room: {ctx.room.name}") + + # Connect to the LiveKit room + await ctx.connect() + + # Create an agent session + session = AgentSession() + + # Create our voice assistant + assistant = VoiceAssistant() + + # Start the agent session with the voice assistant + await session.start( + agent=assistant, + room=ctx.room, + ) + + # Wait for a participant to join + participant = await ctx.wait_for_participant() + logger.info(f"Participant joined: {participant.identity}") + + # Greet the user + await session.say("Hello! 
I'm your voice assistant powered by Deepgram. How can I help you today?") + + logger.info("Agent is now listening and ready to respond") +``` + +The flow is: +1. Connect to the LiveKit room +2. Create and start an agent session +3. Wait for a user to join +4. Greet them and start listening + +Finally, add the main function: + +```python +def main(): + """Main entry point for the LiveKit agent worker.""" + # Verify required environment variables + required_vars = ["DEEPGRAM_API_KEY", "OPENAI_API_KEY", "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"] + missing_vars = [var for var in required_vars if not os.getenv(var)] + + if missing_vars: + print(f"Error: Missing required environment variables: {', '.join(missing_vars)}") + print("\nPlease set the following environment variables:") + for var in missing_vars: + print(f" - {var}") + return + + cli.run_app( + WorkerOptions( + entrypoint_fnc=entrypoint, + ) + ) + + +if __name__ == "__main__": + main() +``` + +## Step 4: Test the Basic Agent + +Let's run the agent: + +```bash +python src/agent.py dev +``` + +The `dev` command is special—it starts the agent in development mode, which: +- Creates a test room automatically +- Reloads on code changes +- Provides helpful logging + +You should see output like: + +``` +2024-XX-XX XX:XX:XX - INFO - Starting agent worker in development mode +2024-XX-XX XX:XX:XX - INFO - Agent starting for room: dev-room-XXXX +2024-XX-XX XX:XX:XX - INFO - Waiting for participant... +``` + +To test the agent: +1. Go to [meet.livekit.io](https://meet.livekit.io) +2. Connect to your LiveKit server +3. Join the development room +4. Start talking! + +## Step 5: Add Function Tools + +Now let's make our agent more capable by adding function tools. These let the agent perform actions like checking the time or weather. 
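Before wiring tools into the agent, it helps to see what a tool decorator has to do behind the scenes. The sketch below is a hypothetical, stdlib-only illustration (not LiveKit's actual implementation) of extracting a tool schema from a function's signature, docstring, and `Annotated` parameter descriptions:

```python
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints


def tool_schema(fn):
    """Build a minimal tool schema from a function's signature and docstring."""
    hints = get_type_hints(fn, include_extras=True)
    params = {}
    for name, param in inspect.signature(fn).parameters.items():
        hint = hints.get(name, str)
        description = ""
        # Annotated[str, "..."] carries the human-readable parameter description
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
            description = metadata[0] if metadata else ""
            hint = base
        params[name] = {
            "type": getattr(hint, "__name__", str(hint)),
            "description": description,
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": params,
    }


def get_weather(city: Annotated[str, "The city to get weather for"]) -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: 72°F, sunny"


schema = tool_schema(get_weather)
print(schema["name"])                            # get_weather
print(schema["parameters"]["city"]["required"])  # True
```

This is the kind of structure the LLM receives so it knows when and how to call each tool; the real decorator also handles argument validation and async execution.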
+ +Create `src/agent_with_tools.py`: + +```python +""" +LiveKit Voice Agent with Deepgram STT/TTS and Function Tools +""" + +import os +import logging +from datetime import datetime, timezone +from typing import Annotated +from dotenv import load_dotenv + +from livekit.agents import ( + Agent, + AgentSession, + JobContext, + WorkerOptions, + cli, + llm, +) +from livekit.plugins import deepgram, openai + +load_dotenv() +logger = logging.getLogger("deepgram-livekit-agent-tools") +``` + +Now let's define our tools using the `@llm.function_tool` decorator: + +```python +@llm.function_tool +def get_current_time( + tz: Annotated[str, "The timezone to get the time for, e.g., 'UTC', 'EST', 'PST'"] = "UTC" +) -> str: + """Get the current time in a specified timezone.""" + current = datetime.now(timezone.utc) + return f"The current time in {tz} is {current.strftime('%I:%M %p on %B %d, %Y')}" + + +@llm.function_tool +def calculate( + expression: Annotated[str, "A mathematical expression to evaluate, e.g., '2 + 2' or '15 * 3'"] +) -> str: + """Evaluate a simple mathematical expression.""" + try: + # Safety: only allow basic math operations + allowed = set("0123456789+-*/().% ") + if not all(c in allowed for c in expression): + return "Sorry, I can only handle basic math with numbers and operators (+, -, *, /, %)" + result = eval(expression) + return f"The result of {expression} is {result}" + except Exception as e: + return f"I couldn't calculate that: {str(e)}" + + +@llm.function_tool +def get_weather( + city: Annotated[str, "The city to get weather for, e.g., 'San Francisco', 'New York'"] +) -> str: + """Get the current weather for a city (mock implementation).""" + mock_weather = { + "san francisco": "65°F, partly cloudy with a chance of fog", + "new york": "72°F, sunny with light breeze", + "london": "58°F, overcast with light rain", + "tokyo": "78°F, clear skies", + } + + city_lower = city.lower() + if city_lower in mock_weather: + return f"The weather in {city} is 
currently {mock_weather[city_lower]}" + else: + return f"I don't have weather data for {city}, but it's probably lovely there!" +``` + +The `@llm.function_tool` decorator does several things: +1. Registers the function as a tool the LLM can call +2. Extracts the function signature for the LLM to understand +3. Uses `Annotated` types for parameter descriptions + +Now update the agent to include tools: + +```python +class VoiceAssistantWithTools(Agent): + """A conversational voice assistant with function tools.""" + + def __init__(self): + super().__init__( + instructions="""You are a helpful voice assistant powered by Deepgram and LiveKit. + +You have access to the following tools: +- get_current_time: Get the current time in different timezones +- calculate: Perform mathematical calculations +- get_weather: Get weather information for cities + +You should: +- Be conversational and friendly +- Use your tools when users ask about time, math, or weather +- Keep responses concise (1-3 sentences) since this is a voice conversation +- Be helpful and proactive in offering assistance + +Remember: You're speaking, not writing, so be natural and conversational.""", + stt=deepgram.STT( + model="nova-3", + language="en-US", + punctuate=True, + smart_format=True, + filler_words=True, + ), + tts=deepgram.TTS( + model="aura-2-andromeda-en", + sample_rate=24000, + ), + llm=openai.LLM( + model="gpt-4o-mini", + temperature=0.7, + ), + tools=[get_current_time, calculate, get_weather], # Add tools here! + ) +``` + +The key addition is `tools=[get_current_time, calculate, get_weather]`. The LLM will automatically use these tools when appropriate. + +## Step 6: Write Tests + +Good examples need good tests. Let's create tests that verify our integration works. + +Create `tests/test_deepgram_integration.py`: + +```python +""" +Integration tests for Deepgram STT and TTS with LiveKit Agents. 
+""" + +import os +import sys +import asyncio +import aiohttp + +DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY") + +if not DEEPGRAM_API_KEY: + print("DEEPGRAM_API_KEY not set - skipping tests") + sys.exit(2) + + +def test_deepgram_stt_initialization(): + """Test that Deepgram STT can be initialized.""" + from livekit.plugins import deepgram + + stt = deepgram.STT( + model="nova-3", + language="en-US", + api_key=DEEPGRAM_API_KEY, + ) + + assert stt is not None + print("✓ Deepgram STT initialization successful") + + +async def test_deepgram_tts_synthesis(): + """Test that Deepgram TTS can synthesize speech.""" + from livekit.plugins import deepgram + + async with aiohttp.ClientSession() as session: + tts = deepgram.TTS( + model="aura-2-andromeda-en", + sample_rate=24000, + api_key=DEEPGRAM_API_KEY, + http_session=session, + ) + + text = "Hello, this is a test." + synthesis = tts.synthesize(text) + + audio_chunks = [] + async for event in synthesis: + if hasattr(event, 'frame') and event.frame is not None: + audio_chunks.append(event.frame.data) + + total_bytes = sum(len(chunk) for chunk in audio_chunks) + assert total_bytes > 0, "Expected audio data" + print(f"✓ Deepgram TTS synthesis successful ({total_bytes} bytes)") + + +def run_tests(): + """Run all tests.""" + test_deepgram_stt_initialization() + + loop = asyncio.new_event_loop() + loop.run_until_complete(test_deepgram_tts_synthesis()) + loop.close() + + print("All tests passed!") + + +if __name__ == "__main__": + run_tests() +``` + +Run the tests: + +```bash +python tests/test_deepgram_integration.py +``` + +You should see: + +``` +✓ Deepgram STT initialization successful +✓ Deepgram TTS synthesis successful (XXXXX bytes) +All tests passed! +``` + +## Step 7: Understanding the Voice Flow + +Here's what happens when you speak to the agent: + +1. **Audio Capture**: LiveKit captures your microphone audio and streams it to the agent +2. **Speech-to-Text**: Deepgram Nova-3 transcribes your speech in real-time +3. 
**LLM Processing**: The transcript is sent to GPT-4o-mini, which generates a response +4. **Tool Execution**: If the LLM decides to use a tool, it's executed automatically +5. **Text-to-Speech**: Deepgram Aura converts the response to natural speech +6. **Audio Playback**: LiveKit streams the audio back to you + +All of this happens in under a second for typical utterances! + +## Step 8: Customizing Voices + +Deepgram offers multiple voice options. Here are some popular choices: + +```python +# Warm, professional female voice +tts=deepgram.TTS(model="aura-2-andromeda-en") + +# Confident male voice +tts=deepgram.TTS(model="aura-2-helios-en") + +# Friendly, conversational female voice +tts=deepgram.TTS(model="aura-2-luna-en") +``` + +You can also adjust other TTS parameters: + +```python +tts=deepgram.TTS( + model="aura-2-andromeda-en", + sample_rate=24000, # Audio quality (16000, 24000, or 48000) +) +``` + +## Step 9: Production Deployment + +For production, you'll want to run without the `dev` flag: + +```bash +python src/agent_with_tools.py start +``` + +This connects to your LiveKit server and waits for job dispatches. You'll need a separate service to dispatch jobs when rooms are created—see the [LiveKit Agents deployment guide](https://docs.livekit.io/agents/deployment/). + +## Common Issues and Solutions + +### "Missing required environment variables" + +Ensure all variables are set in your `.env` file. The agent checks for: +- `DEEPGRAM_API_KEY` +- `OPENAI_API_KEY` +- `LIVEKIT_URL` +- `LIVEKIT_API_KEY` +- `LIVEKIT_API_SECRET` + +### "Connection error" from LiveKit + +- Verify your `LIVEKIT_URL` starts with `wss://` +- Check that your API key and secret are correct +- Ensure your LiveKit server is running + +### Audio quality issues + +- Test your microphone in another application first +- Try increasing the TTS sample rate +- Check your network latency + +## What's Next + +Now that you have a working voice agent, here are some ideas for extending it: + +1. 
**Add more tools** - Integrate with real APIs for weather, calendar, search, etc. +2. **Conversation memory** - Use `chat_ctx` to maintain conversation history +3. **Phone integration** - Connect via SIP trunk for phone calls +4. **Multi-language** - Change `language` in STT and use different TTS voices +5. **Custom VAD** - Fine-tune voice activity detection for your use case + +## Resources + +- [LiveKit Agents Documentation](https://docs.livekit.io/agents/) +- [Deepgram STT Documentation](https://developers.deepgram.com/docs/stt-streaming-feature-overview) +- [Deepgram TTS Documentation](https://developers.deepgram.com/docs/text-to-speech) +- [Deepgram Voice Models](https://developers.deepgram.com/docs/models-overview) + +Happy building! 🎙️ diff --git a/examples/540-livekit-agents-python/README.md b/examples/540-livekit-agents-python/README.md new file mode 100644 index 0000000..e56206b --- /dev/null +++ b/examples/540-livekit-agents-python/README.md @@ -0,0 +1,207 @@ +# LiveKit Agents with Deepgram STT/TTS + +Build real-time voice AI agents using LiveKit Agents framework with Deepgram's Nova-3 for speech-to-text and Aura for text-to-speech. 
+ +![Screenshot](./screenshot.png) + +## Overview + +This example demonstrates how to create a conversational voice assistant that: +- Listens to users with Deepgram Nova-3 speech recognition +- Responds with natural speech using Deepgram Aura TTS +- Uses OpenAI GPT-4o-mini for intelligent conversation +- Supports custom function tools for extended capabilities + +## Prerequisites + +- Python 3.10+ +- [Deepgram account](https://console.deepgram.com/) with API key +- [OpenAI account](https://platform.openai.com/) with API key +- [LiveKit Cloud account](https://cloud.livekit.io/) or self-hosted LiveKit server + +## Environment Variables + +| Variable | Description | +|----------|-------------| +| `DEEPGRAM_API_KEY` | Your Deepgram API key for STT and TTS | +| `OPENAI_API_KEY` | Your OpenAI API key for the LLM | +| `LIVEKIT_URL` | LiveKit server WebSocket URL (e.g., `wss://your-app.livekit.cloud`) | +| `LIVEKIT_API_KEY` | LiveKit API key | +| `LIVEKIT_API_SECRET` | LiveKit API secret | + +## Installation + +```bash +# Create project directory and virtual environment +mkdir livekit-deepgram-agent +cd livekit-deepgram-agent +python -m venv venv +source venv/bin/activate # or `venv\Scripts\activate` on Windows + +# Install dependencies +pip install -r requirements.txt + +# Copy and configure environment variables +cp .env.example .env +# Edit .env with your credentials +``` + +## Running the Agent + +### Basic Voice Agent + +Run the simple voice assistant without tools: + +```bash +python src/agent.py dev +``` + +### Voice Agent with Tools + +Run the voice assistant with time, weather, and calculator tools: + +```bash +python src/agent_with_tools.py dev +``` + +The `dev` command starts the agent in development mode, which: +- Automatically reloads on code changes +- Starts the agent immediately without waiting for a job dispatch +- Creates a test room you can join + +### Connecting to the Agent + +Once the agent is running, you can connect using: + +1. 
**LiveKit Meet** (easiest): Go to [meet.livekit.io](https://meet.livekit.io) and connect to your LiveKit room +2. **LiveKit CLI**: Use `lk room join` command +3. **Custom frontend**: Build a React/Next.js app with `@livekit/components-react` + +## Project Structure + +``` +├── src/ +│ ├── agent.py # Basic voice assistant +│ └── agent_with_tools.py # Voice assistant with function tools +├── tests/ +│ ├── run_tests.py # Main test runner +│ ├── test_deepgram_integration.py # Deepgram API integration tests +│ └── test_tools.py # Unit tests for function tools +├── requirements.txt # Python dependencies +├── .env.example # Environment variable template +└── README.md # This file +``` + +## How It Works + +### Agent Configuration + +The agent is configured with Deepgram for speech: + +```python +from livekit.agents import Agent +from livekit.plugins import deepgram, openai + +agent = Agent( + instructions="You are a helpful voice assistant...", + stt=deepgram.STT( + model="nova-3", + language="en-US", + punctuate=True, + smart_format=True, + ), + tts=deepgram.TTS( + model="aura-2-andromeda-en", + sample_rate=24000, + ), + llm=openai.LLM(model="gpt-4o-mini"), +) +``` + +### Function Tools + +Add custom capabilities with function tools: + +```python +from livekit.agents import llm +from typing import Annotated + +@llm.function_tool +def get_weather( + city: Annotated[str, "City name"] +) -> str: + """Get weather for a city.""" + # Your implementation + return f"Weather in {city}: 72°F, sunny" + +# Add to agent +agent = Agent( + # ... 
other config + tools=[get_weather], +) +``` + +## Deepgram Models + +### Speech-to-Text (Nova-3) + +The example uses Deepgram's Nova-3 model for best-in-class speech recognition: +- Real-time streaming transcription +- Smart formatting and punctuation +- Filler word handling ("um", "uh") +- Multiple language support + +### Text-to-Speech (Aura) + +Deepgram Aura provides natural-sounding voices: +- `aura-2-andromeda-en` - Warm, professional female voice +- `aura-2-helios-en` - Confident male voice +- `aura-2-luna-en` - Friendly, conversational female voice + +[See all available voices](https://developers.deepgram.com/docs/tts-models) + +## Running Tests + +```bash +# Run all tests +python tests/run_tests.py + +# Run only unit tests +python tests/test_tools.py + +# Run only integration tests (requires DEEPGRAM_API_KEY) +python tests/test_deepgram_integration.py +``` + +## Troubleshooting + +### "Missing required environment variables" + +Ensure all variables in `.env.example` are set in your `.env` file. + +### "Connection error" from LiveKit + +- Verify `LIVEKIT_URL` is correct (should start with `wss://`) +- Check that `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` are valid +- Ensure your LiveKit server is running and accessible + +### Audio quality issues + +- Verify your microphone is working +- Check that no other application is using the microphone +- Try increasing `sample_rate` for TTS (default: 24000) + +## What's Next + +- Add more function tools (search, calendar, etc.) 
+- Implement conversation memory with chat context +- Connect to phone calls via SIP trunk +- Add multi-language support +- Deploy to production with LiveKit Cloud + +## Resources + +- [LiveKit Agents Documentation](https://docs.livekit.io/agents/) +- [Deepgram STT Documentation](https://developers.deepgram.com/docs/stt-streaming-feature-overview) +- [Deepgram TTS Documentation](https://developers.deepgram.com/docs/text-to-speech) +- [Deepgram Voice Models](https://developers.deepgram.com/docs/models-overview) diff --git a/examples/540-livekit-agents-python/requirements.txt b/examples/540-livekit-agents-python/requirements.txt new file mode 100644 index 0000000..8965727 --- /dev/null +++ b/examples/540-livekit-agents-python/requirements.txt @@ -0,0 +1,11 @@ +# LiveKit Agents framework +livekit-agents>=1.5.0 + +# Deepgram plugin for LiveKit Agents +livekit-plugins-deepgram>=1.5.0 + +# OpenAI plugin for LLM (GPT-4o-mini) +livekit-plugins-openai>=1.5.0 + +# Environment variable management +python-dotenv>=1.0.0 diff --git a/examples/540-livekit-agents-python/screenshot.png b/examples/540-livekit-agents-python/screenshot.png new file mode 100644 index 0000000..056df4e Binary files /dev/null and b/examples/540-livekit-agents-python/screenshot.png differ diff --git a/examples/540-livekit-agents-python/src/agent.py b/examples/540-livekit-agents-python/src/agent.py new file mode 100644 index 0000000..d69cbbf --- /dev/null +++ b/examples/540-livekit-agents-python/src/agent.py @@ -0,0 +1,117 @@ +""" +LiveKit Voice Agent with Deepgram STT/TTS + +This example demonstrates a conversational voice AI agent using: +- LiveKit Agents framework for real-time communication +- Deepgram Nova-3 for speech-to-text (STT) +- Deepgram Aura for text-to-speech (TTS) +- OpenAI GPT-4o-mini for the LLM backbone +""" + +import os +import logging +from dotenv import load_dotenv + +from livekit.agents import ( + Agent, + AgentSession, + JobContext, + WorkerOptions, + cli, +) +from livekit.plugins 
import deepgram, openai + +# Load environment variables +load_dotenv() + +# Configure logging +logger = logging.getLogger("deepgram-livekit-agent") + + +class VoiceAssistant(Agent): + """A conversational voice assistant using Deepgram for STT/TTS.""" + + def __init__(self): + super().__init__( + instructions="""You are a helpful voice assistant powered by Deepgram and LiveKit. + +You should: +- Be conversational and friendly +- Keep responses concise (1-3 sentences) since this is a voice conversation +- Ask clarifying questions when needed +- Be helpful with a wide range of topics + +Remember: You're speaking, not writing, so be natural and conversational.""", + stt=deepgram.STT( + model="nova-3", + language="en-US", + punctuate=True, + smart_format=True, + filler_words=True, + ), + tts=deepgram.TTS( + model="aura-2-andromeda-en", + sample_rate=24000, + ), + llm=openai.LLM( + model="gpt-4o-mini", + temperature=0.7, + ), + ) + + +async def entrypoint(ctx: JobContext): + """Entry point for the LiveKit agent job.""" + logger.info(f"Agent starting for room: {ctx.room.name}") + + # Connect to the LiveKit room + await ctx.connect() + + # Create an agent session + session = AgentSession() + + # Create our voice assistant + assistant = VoiceAssistant() + + # Start the agent session with the voice assistant + await session.start( + agent=assistant, + room=ctx.room, + ) + + # Wait for a participant to join + participant = await ctx.wait_for_participant() + logger.info(f"Participant joined: {participant.identity}") + + # Greet the user + await session.say("Hello! I'm your voice assistant powered by Deepgram. 
How can I help you today?") + + logger.info("Agent is now listening and ready to respond") + + +def main(): + """Main entry point for the LiveKit agent worker.""" + # Check for critical API keys - LiveKit credentials are validated by the CLI + critical_vars = ["DEEPGRAM_API_KEY", "OPENAI_API_KEY"] + missing = [var for var in critical_vars if not os.getenv(var)] + + if missing: + print(f"Error: Missing required environment variables: {', '.join(missing)}") + print("\nPlease set the following in your .env file:") + for var in missing: + print(f" {var}=your_key_here") + print("\nAlso ensure LiveKit credentials are set:") + print(" LIVEKIT_URL=wss://your-app.livekit.cloud") + print(" LIVEKIT_API_KEY=your_api_key") + print(" LIVEKIT_API_SECRET=your_api_secret") + return + + cli.run_app( + WorkerOptions( + entrypoint_fnc=entrypoint, + ) + ) + + +if __name__ == "__main__": + main() diff --git a/examples/540-livekit-agents-python/src/agent_with_tools.py b/examples/540-livekit-agents-python/src/agent_with_tools.py new file mode 100644 index 0000000..f1fd89c --- /dev/null +++ b/examples/540-livekit-agents-python/src/agent_with_tools.py @@ -0,0 +1,179 @@ +""" +LiveKit Voice Agent with Deepgram STT/TTS and Function Tools + +This example demonstrates a conversational voice AI agent with custom tools using: +- LiveKit Agents framework for real-time communication +- Deepgram Nova-3 for speech-to-text (STT) +- Deepgram Aura for text-to-speech (TTS) +- OpenAI GPT-4o-mini for the LLM backbone +- Custom function tools for extended capabilities +""" + +import os +import logging +from datetime import datetime, timezone +from typing import Annotated +from dotenv import load_dotenv + +from livekit.agents import ( + Agent, + AgentSession, + JobContext, + WorkerOptions, + cli, + llm, +) +from livekit.plugins import deepgram, openai + +# Load environment variables +load_dotenv() + +# Configure logging +logger = logging.getLogger("deepgram-livekit-agent-tools") + + +# Define custom tools 
using the function_tool decorator +@llm.function_tool +def get_current_time( + tz: Annotated[str, "The timezone to get the time for, e.g., 'UTC', 'EST', 'PST'"] = "UTC" +) -> str: + """Get the current time in a specified timezone.""" + # For simplicity, we'll just return UTC time + # In a real app, you'd use pytz or similar for proper timezone handling + current = datetime.now(timezone.utc) + return f"The current time in {tz} is {current.strftime('%I:%M %p on %B %d, %Y')}" + + +@llm.function_tool +def calculate( + expression: Annotated[str, "A mathematical expression to evaluate, e.g., '2 + 2' or '15 * 3'"] +) -> str: + """Evaluate a simple mathematical expression.""" + try: + # Safety: only allow basic math operations + allowed = set("0123456789+-*/().% ") + if not all(c in allowed for c in expression): + return "Sorry, I can only handle basic math with numbers and operators (+, -, *, /, %)" + result = eval(expression) + return f"The result of {expression} is {result}" + except Exception as e: + return f"I couldn't calculate that: {str(e)}" + + +@llm.function_tool +def get_weather( + city: Annotated[str, "The city to get weather for, e.g., 'San Francisco', 'New York'"] +) -> str: + """Get the current weather for a city (mock implementation).""" + # Mock weather data - in a real app, you'd call a weather API + mock_weather = { + "san francisco": "65°F, partly cloudy with a chance of fog", + "new york": "72°F, sunny with light breeze", + "london": "58°F, overcast with light rain", + "tokyo": "78°F, clear skies", + } + + city_lower = city.lower() + if city_lower in mock_weather: + return f"The weather in {city} is currently {mock_weather[city_lower]}" + else: + return f"I don't have weather data for {city}, but it's probably lovely there!" 
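The character allowlist in `calculate` keeps `eval()` fairly constrained, but it still admits pathological input such as `9**9**9` (each `*` passes the filter individually). A stricter alternative, hypothetical and not part of this example's shipped code, parses the expression with the stdlib `ast` module and walks only arithmetic nodes:

```python
import ast
import operator

# Only these operator node types are evaluated; everything else is rejected.
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def safe_calculate(expression: str):
    """Evaluate basic arithmetic without eval(); raises ValueError otherwise."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval"))


print(safe_calculate("2 + 2"))   # 4
print(safe_calculate("15 * 3"))  # 45
```

Note that `ast.Pow` is deliberately absent from the table, so `2 ** 10` raises `ValueError` instead of risking an unbounded computation.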
+ + +class VoiceAssistantWithTools(Agent): + """A conversational voice assistant with function tools.""" + + def __init__(self): + super().__init__( + instructions="""You are a helpful voice assistant powered by Deepgram and LiveKit. + +You have access to the following tools: +- get_current_time: Get the current time in different timezones +- calculate: Perform mathematical calculations +- get_weather: Get weather information for cities + +You should: +- Be conversational and friendly +- Use your tools when users ask about time, math, or weather +- Keep responses concise (1-3 sentences) since this is a voice conversation +- Be helpful and proactive in offering assistance + +Remember: You're speaking, not writing, so be natural and conversational.""", + stt=deepgram.STT( + model="nova-3", + language="en-US", + punctuate=True, + smart_format=True, + filler_words=True, + ), + tts=deepgram.TTS( + model="aura-2-andromeda-en", + sample_rate=24000, + ), + llm=openai.LLM( + model="gpt-4o-mini", + temperature=0.7, + ), + tools=[get_current_time, calculate, get_weather], + ) + + +async def entrypoint(ctx: JobContext): + """Entry point for the LiveKit agent job.""" + logger.info(f"Agent with tools starting for room: {ctx.room.name}") + + # Connect to the LiveKit room + await ctx.connect() + + # Create an agent session + session = AgentSession() + + # Create our voice assistant with tools + assistant = VoiceAssistantWithTools() + + # Start the agent session + await session.start( + agent=assistant, + room=ctx.room, + ) + + # Wait for a participant to join + participant = await ctx.wait_for_participant() + logger.info(f"Participant joined: {participant.identity}") + + # Greet the user + await session.say( + "Hello! I'm your voice assistant powered by Deepgram. " + "I can help you with the time, weather, or math calculations. " + "What would you like to know?" 
+ ) + + logger.info("Agent with tools is now listening and ready to respond") + + +def main(): + """Main entry point for the LiveKit agent worker.""" + # Check for critical API keys - LiveKit credentials are validated by the CLI + critical_vars = ["DEEPGRAM_API_KEY", "OPENAI_API_KEY"] + missing = [var for var in critical_vars if not os.getenv(var)] + + if missing: + print(f"Error: Missing required environment variables: {', '.join(missing)}") + print("\nPlease set the following in your .env file:") + for var in missing: + print(f" {var}=your_key_here") + print("\nAlso ensure LiveKit credentials are set:") + print(" LIVEKIT_URL=wss://your-app.livekit.cloud") + print(" LIVEKIT_API_KEY=your_api_key") + print(" LIVEKIT_API_SECRET=your_api_secret") + return + + cli.run_app( + WorkerOptions( + entrypoint_fnc=entrypoint, + ) + ) + + +if __name__ == "__main__": + main() diff --git a/examples/540-livekit-agents-python/tests/generate_screenshot.py b/examples/540-livekit-agents-python/tests/generate_screenshot.py new file mode 100644 index 0000000..c58b691 --- /dev/null +++ b/examples/540-livekit-agents-python/tests/generate_screenshot.py @@ -0,0 +1,267 @@ +#!/usr/bin/env python3 +""" +Generate a screenshot for the LiveKit Agents + Deepgram example. + +Creates an HTML page showing the terminal output when the agent starts, +then uses Playwright to capture it at 1240x760. +""" + +import asyncio +from playwright.async_api import async_playwright + + +HTML_CONTENT = """ + + + + + + +
+
+
+
+
+
python src/agent.py dev — LiveKit Voice Agent
+
+
+
+
🎙️ LiveKit Voice Agent with Deepgram
+
+ Real-time voice AI using + Deepgram Nova-3 STT + + Aura TTS +
+
+ +
+ $ python src/agent.py dev +
+ +
+ +
+ 2024-01-15 14:32:01 + [INFO] + Starting agent worker in development mode +
+ +
+ 2024-01-15 14:32:01 + [INFO] + Initializing plugins: +
+ +
+ Deepgram STT + nova-3 +
+ +
+ Deepgram TTS + aura-2-andromeda-en +
+ +
+ OpenAI LLM + gpt-4o-mini +
+ +
+ +
+ 2024-01-15 14:32:02 + [INFO] + Connected to LiveKit server + wss://app.livekit.cloud +
+ +
+ 2024-01-15 14:32:02 + [INFO] + Agent starting for room: + dev-room-abc123 +
+ +
+ 2024-01-15 14:32:03 + [INFO] + Waiting for participant to join... +
+ +
+ 2024-01-15 14:32:15 + [INFO] + Participant joined: + user-john +
+ +
+ 2024-01-15 14:32:15 + [INFO] + Speaking: + "Hello! I'm your voice assistant powered by Deepgram..." +
+ +
+ 2024-01-15 14:32:16 + [INFO] + Agent is now listening and ready to respond 🎤 +
+ +
+ +
+ Press Ctrl+C to stop the agent +
+
+
+ + +""" + + +async def generate_screenshot(): + """Generate screenshot using Playwright.""" + async with async_playwright() as p: + browser = await p.chromium.launch() + page = await browser.new_page(viewport={"width": 1240, "height": 760}) + + await page.set_content(HTML_CONTENT) + await page.wait_for_timeout(500) # Let styles render + + await page.screenshot(path="screenshot.png") + print("✓ Screenshot saved to screenshot.png") + + await browser.close() + + +if __name__ == "__main__": + asyncio.run(generate_screenshot()) diff --git a/examples/540-livekit-agents-python/tests/run_tests.py b/examples/540-livekit-agents-python/tests/run_tests.py new file mode 100644 index 0000000..97f8ffb --- /dev/null +++ b/examples/540-livekit-agents-python/tests/run_tests.py @@ -0,0 +1,86 @@ +#!/usr/bin/env python3 +""" +Main test runner for the LiveKit Agents + Deepgram example. + +Exit codes: +- 0: All tests passed +- 1: Tests failed +- 2: Missing credentials (skipped) +""" + +import os +import sys +import subprocess + + +def check_credentials(): + """Check if required credentials are available.""" + deepgram_key = os.getenv("DEEPGRAM_API_KEY") + + if not deepgram_key: + print("⚠️ DEEPGRAM_API_KEY not set") + print(" Skipping tests that require API credentials") + return False + + print("✓ DEEPGRAM_API_KEY is set") + return True + + +def run_unit_tests(): + """Run unit tests for tools.""" + print("\n" + "=" * 60) + print("Running Unit Tests") + print("=" * 60 + "\n") + + result = subprocess.run( + [sys.executable, "tests/test_tools.py"], + capture_output=False + ) + return result.returncode == 0 + + +def run_integration_tests(): + """Run integration tests for Deepgram.""" + print("\n" + "=" * 60) + print("Running Integration Tests") + print("=" * 60 + "\n") + + result = subprocess.run( + [sys.executable, "tests/test_deepgram_integration.py"], + capture_output=False + ) + return result.returncode == 0 + + +def main(): + """Main test runner.""" + print("=" * 60) + print("LiveKit 
Agents + Deepgram Example Test Suite")
+    print("=" * 60)
+
+    has_credentials = check_credentials()
+
+    # Run unit tests (don't need credentials)
+    unit_passed = run_unit_tests()
+    if not unit_passed:
+        print("\n❌ Unit tests failed")
+        sys.exit(1)
+
+    # Run integration tests if we have credentials
+    if has_credentials:
+        integration_passed = run_integration_tests()
+        if not integration_passed:
+            print("\n❌ Integration tests failed")
+            sys.exit(1)
+    else:
+        print("\n⚠️ Skipping integration tests (no credentials)")
+        sys.exit(2)
+
+    print("\n" + "=" * 60)
+    print("✓ All tests passed!")
+    print("=" * 60)
+    sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/540-livekit-agents-python/tests/test_deepgram_integration.py b/examples/540-livekit-agents-python/tests/test_deepgram_integration.py
new file mode 100644
index 0000000..3ddf7b2
--- /dev/null
+++ b/examples/540-livekit-agents-python/tests/test_deepgram_integration.py
@@ -0,0 +1,195 @@
+"""
+Integration tests for Deepgram STT and TTS with LiveKit Agents.
+
+These tests verify that Deepgram's STT and TTS services work correctly
+through the LiveKit plugins. Tests make real API calls to Deepgram.
+"""
+
+import os
+import sys
+import asyncio
+import aiohttp
+
+# Check for required credentials
+DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
+
+if not DEEPGRAM_API_KEY:
+    print("DEEPGRAM_API_KEY not set - skipping tests")
+    sys.exit(2)
+
+
+def test_deepgram_stt_initialization():
+    """Test that Deepgram STT can be initialized with the LiveKit plugin."""
+    from livekit.plugins import deepgram
+
+    stt = deepgram.STT(
+        model="nova-3",
+        language="en-US",
+        punctuate=True,
+        smart_format=True,
+        api_key=DEEPGRAM_API_KEY,
+    )
+
+    assert stt is not None
+    print("✓ Deepgram STT initialization successful")
+
+
+def test_deepgram_tts_initialization():
+    """Test that Deepgram TTS can be initialized with the LiveKit plugin."""
+    from livekit.plugins import deepgram
+
+    tts = deepgram.TTS(
+        model="aura-2-andromeda-en",
+        sample_rate=24000,
+        api_key=DEEPGRAM_API_KEY,
+    )
+
+    assert tts is not None
+    print("✓ Deepgram TTS initialization successful")
+
+
+async def test_deepgram_tts_synthesis():
+    """Test that Deepgram TTS can synthesize speech (real API call)."""
+    from livekit.plugins import deepgram
+
+    # Create our own aiohttp session
+    async with aiohttp.ClientSession() as session:
+        tts = deepgram.TTS(
+            model="aura-2-andromeda-en",
+            sample_rate=24000,
+            api_key=DEEPGRAM_API_KEY,
+            http_session=session,
+        )
+
+        # Synthesize a short phrase
+        text = "Hello, this is a test of Deepgram text to speech."
+
+        # Get the synthesis stream
+        synthesis = tts.synthesize(text)
+
+        # Collect audio chunks
+        audio_chunks = []
+        async for event in synthesis:
+            if hasattr(event, 'frame') and event.frame is not None:
+                audio_chunks.append(event.frame.data)
+
+        # Verify we got audio data
+        total_bytes = sum(len(chunk) for chunk in audio_chunks)
+        assert total_bytes > 0, "Expected audio data from TTS synthesis"
+
+        print(f"✓ Deepgram TTS synthesis successful ({total_bytes} bytes of audio)")
+
+
+async def test_deepgram_tts_direct_api():
+    """Test Deepgram TTS via direct API call to verify credentials work."""
+    async with aiohttp.ClientSession() as session:
+        url = "https://api.deepgram.com/v1/speak?model=aura-2-andromeda-en"
+        headers = {
+            "Authorization": f"Token {DEEPGRAM_API_KEY}",
+            "Content-Type": "application/json",
+        }
+        data = {"text": "Hello, this is a test."}
+
+        async with session.post(url, headers=headers, json=data) as response:
+            assert response.status == 200, f"Expected 200, got {response.status}"
+            audio_data = await response.read()
+            assert len(audio_data) > 0, "Expected audio data"
+            print(f"✓ Deepgram TTS API direct call successful ({len(audio_data)} bytes)")
+
+
+def test_agent_creation():
+    """Test that an Agent can be created with Deepgram STT/TTS."""
+    from livekit.agents import Agent
+    from livekit.plugins import deepgram, openai
+
+    # Check if OpenAI key is available
+    if not os.getenv("OPENAI_API_KEY"):
+        print("⚠ OPENAI_API_KEY not set - using mock LLM configuration")
+        # Create agent with just STT/TTS (no LLM)
+        agent = Agent(
+            instructions="You are a helpful assistant.",
+            stt=deepgram.STT(model="nova-3", api_key=DEEPGRAM_API_KEY),
+            tts=deepgram.TTS(model="aura-2-andromeda-en", api_key=DEEPGRAM_API_KEY),
+        )
+    else:
+        agent = Agent(
+            instructions="You are a helpful assistant.",
+            stt=deepgram.STT(model="nova-3", api_key=DEEPGRAM_API_KEY),
+            tts=deepgram.TTS(model="aura-2-andromeda-en", api_key=DEEPGRAM_API_KEY),
+            llm=openai.LLM(model="gpt-4o-mini"),
+        )
+
+    assert agent is not None
+    print("✓ Agent creation with Deepgram STT/TTS successful")
+
+
+def test_stt_model_options():
+    """Test various STT model configuration options."""
+    from livekit.plugins import deepgram
+
+    # Test with different model options
+    configs = [
+        {"model": "nova-3", "language": "en-US", "api_key": DEEPGRAM_API_KEY},
+        {"model": "nova-3", "language": "en-US", "smart_format": True, "api_key": DEEPGRAM_API_KEY},
+        {"model": "nova-3", "language": "en-US", "punctuate": True, "filler_words": True, "api_key": DEEPGRAM_API_KEY},
+    ]
+
+    for config in configs:
+        stt = deepgram.STT(**config)
+        assert stt is not None
+
+    print("✓ STT model configurations validated")
+
+
+def test_tts_voice_options():
+    """Test various TTS voice/model options."""
+    from livekit.plugins import deepgram
+
+    # Test different voice models
+    voices = [
+        "aura-2-andromeda-en",
+        "aura-2-helios-en",
+        "aura-2-luna-en",
+    ]
+
+    for voice in voices:
+        tts = deepgram.TTS(model=voice, sample_rate=24000, api_key=DEEPGRAM_API_KEY)
+        assert tts is not None
+
+    print("✓ TTS voice configurations validated")
+
+
+def run_tests():
+    """Run all tests."""
+    print("=" * 60)
+    print("Running Deepgram + LiveKit Agents Integration Tests")
+    print("=" * 60)
+    print()
+
+    # Sync tests
+    test_deepgram_stt_initialization()
+    test_deepgram_tts_initialization()
+    test_agent_creation()
+    test_stt_model_options()
+    test_tts_voice_options()
+
+    # Async tests
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+
+    try:
+        # Test direct API call first (simpler, verifies credentials)
+        loop.run_until_complete(test_deepgram_tts_direct_api())
+        # Then test through LiveKit plugin
+        loop.run_until_complete(test_deepgram_tts_synthesis())
+    finally:
+        loop.close()
+
+    print()
+    print("=" * 60)
+    print("All tests passed! ✓")
+    print("=" * 60)
+
+
+if __name__ == "__main__":
+    run_tests()
diff --git a/examples/540-livekit-agents-python/tests/test_tools.py b/examples/540-livekit-agents-python/tests/test_tools.py
new file mode 100644
index 0000000..8a83697
--- /dev/null
+++ b/examples/540-livekit-agents-python/tests/test_tools.py
@@ -0,0 +1,142 @@
+"""
+Unit tests for the custom function tools.
+
+These tests verify that the function tools work correctly
+before being used in the voice agent.
+"""
+
+import os
+import sys
+from datetime import datetime, timezone
+
+# Add src to path
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'src'))
+
+
+def test_get_current_time():
+    """Test the get_current_time tool."""
+    from livekit.agents import llm
+
+    @llm.function_tool
+    def get_current_time(tz: str = "UTC") -> str:
+        """Get the current time in a specified timezone."""
+        current = datetime.now(timezone.utc)
+        return f"The current time in {tz} is {current.strftime('%I:%M %p on %B %d, %Y')}"
+
+    # Test with default timezone
+    result = get_current_time()
+    assert "UTC" in result
+    assert "The current time" in result
+    print("✓ get_current_time tool works correctly")
+
+
+def test_calculate():
+    """Test the calculate tool."""
+    from livekit.agents import llm
+
+    @llm.function_tool
+    def calculate(expression: str) -> str:
+        """Evaluate a simple mathematical expression."""
+        try:
+            allowed = set("0123456789+-*/().% ")
+            if not all(c in allowed for c in expression):
+                return "Sorry, I can only handle basic math with numbers and operators (+, -, *, /, %)"
+            result = eval(expression)
+            return f"The result of {expression} is {result}"
+        except Exception as e:
+            return f"I couldn't calculate that: {str(e)}"
+
+    # Test basic operations
+    result = calculate("2 + 2")
+    assert "4" in result
+
+    result = calculate("10 * 5")
+    assert "50" in result
+
+    result = calculate("100 / 4")
+    assert "25" in result
+
+    # Test invalid input
+    result = calculate("rm -rf /")
+    assert "Sorry" in result or "basic math" in result
+
+    print("✓ calculate tool works correctly")
+
+
+def test_get_weather():
+    """Test the get_weather tool."""
+    from livekit.agents import llm
+
+    @llm.function_tool
+    def get_weather(city: str) -> str:
+        """Get the current weather for a city (mock implementation)."""
+        mock_weather = {
+            "san francisco": "65°F, partly cloudy with a chance of fog",
+            "new york": "72°F, sunny with light breeze",
+            "london": "58°F, overcast with light rain",
+            "tokyo": "78°F, clear skies",
+        }
+
+        city_lower = city.lower()
+        if city_lower in mock_weather:
+            return f"The weather in {city} is currently {mock_weather[city_lower]}"
+        else:
+            return f"I don't have weather data for {city}, but it's probably lovely there!"
+
+    # Test known cities
+    result = get_weather("San Francisco")
+    assert "65°F" in result
+
+    result = get_weather("New York")
+    assert "72°F" in result
+
+    # Test unknown city
+    result = get_weather("Unknown City")
+    assert "lovely" in result
+
+    print("✓ get_weather tool works correctly")
+
+
+def test_tool_decorator():
+    """Test that the function_tool decorator works correctly."""
+    from livekit.agents import llm
+    from typing import Annotated
+
+    @llm.function_tool
+    def sample_tool(
+        name: Annotated[str, "The user's name"],
+        greeting: Annotated[str, "A greeting to use"] = "Hello"
+    ) -> str:
+        """A sample tool for testing."""
+        return f"{greeting}, {name}!"
+
+    # Call the tool
+    result = sample_tool("World")
+    assert "World" in result
+
+    result = sample_tool("Developer", "Hi")
+    assert "Hi" in result and "Developer" in result
+
+    print("✓ function_tool decorator works correctly")
+
+
+def run_tests():
+    """Run all unit tests."""
+    print("=" * 60)
+    print("Running Function Tools Unit Tests")
+    print("=" * 60)
+    print()
+
+    test_get_current_time()
+    test_calculate()
+    test_get_weather()
+    test_tool_decorator()
+
+    print()
+    print("=" * 60)
+    print("All unit tests passed! ✓")
+    print("=" * 60)
+
+
+if __name__ == "__main__":
+    run_tests()