A production-ready voice agent implementation using LiveKit and Python, featuring advanced conversational AI capabilities and optional telephony integration.
┌─────────────────────────────────────────────────────────────────────────────────┐
│ LiveKit Voice Agent Architecture │
└─────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Web Client │ │ Phone System │ │ Mobile App │
│ (Next.js) │ │ (Twilio) │ │ (React) │
└─────────┬───────┘ └─────────┬───────┘ └─────────┬───────┘
│ │ │
└──────────────────────┼──────────────────────┘
│
┌────────────▼────────────┐
│ LiveKit Server │
│ (WebRTC Gateway) │
└────────────┬────────────┘
│
┌────────────▼────────────┐
│ Voice Pipeline Agent │
│ │
│ ┌─────────────────┐ │
│ │ Turn Detection │ │
│ │ (Silero) │ │
│ └─────────────────┘ │
│ │
│ ┌─────────────────┐ │
│ │ Audio Pipeline │ │
│ │ ┌─────────────┐ │ │
│ │ │ Krisp │ │ │
│ │ │ (Noise │ │ │
│ │ │ Cancel) │ │ │
│ │ └─────────────┘ │ │
│ └─────────────────┘ │
└────────────┬────────────┘
│
┌────────────────────────┼────────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Speech-to- │ │ Language │ │ Text-to- │
│ Text (STT) │ │ Model (LLM) │ │ Speech (TTS) │
│ │ │ │ │ │
│ Deepgram API │ │ OpenAI API │ │ ElevenLabs │
│ │ │ │ │ API │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
│ ┌─────────▼────────┐ │
│ │ Function Calling │ │
│ │ │ │
│ │ ┌──────────────┐ │ │
│ │ │ Weather │ │ │
│ │ │ Service │ │ │
│ │ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ │ │
│ │ │ Clock │ │ │
│ │ │ Service │ │ │
│ │ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ │ │
│ │ │ Custom │ │ │
│ │ │ Tools │ │ │
│ │ └──────────────┘ │ │
│ └──────────────────┘ │
│ │
└───────────────────┐ ┌─────────────────────┘
│ │
┌──────▼─────▼──────┐
│ Logging & │
│ Analytics │
│ │
│ • Usage Metrics │
│ • Conversation │
│ Summaries │
│ • Performance │
│ Monitoring │
└───────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────┐
│ Data Flow Process │
└─────────────────────────────────────────────────────────────────────────────────┘
1. Audio Input → 2. Noise Cancellation → 3. Speech Detection → 4. STT Processing
↓
8. Audio Output ← 7. TTS Generation ← 6. Response Generation ← 5. LLM Processing
↓
Function Execution
(Weather, Clock, etc.)
┌─────────────────────────────────────────────────────────────────────────────────┐
│ Telephony Integration │
└─────────────────────────────────────────────────────────────────────────────────┘
Phone Call → Twilio SIP → LiveKit SIP Gateway → Voice Agent → Response Pipeline
↑ ↓
└──────────────── Audio Response ←─────────────────────────────┘
Regional SIP Configuration:
• US East: 54.172.60.0, 54.244.51.0
• US West: 54.171.127.192, 35.156.191.128
• Europe: 54.171.127.200, 35.156.191.140
• Asia Pacific: 54.169.127.128, 52.65.191.64
- Users connect via web browsers, mobile apps, or phone calls
- LiveKit server handles WebRTC connections and SIP integration
- Agent automatically detects connection type and optimizes accordingly
- Input: Raw audio from user's microphone or phone
- Noise Cancellation: Krisp AI removes background noise
- Turn Detection: Silero VAD detects when user starts/stops speaking
- Speech-to-Text: Deepgram converts speech to text in real-time
- Language Understanding: OpenAI processes user intent
- Function Calling: Agent can execute tools (weather, time, custom functions)
- Context Management: Maintains conversation history and state
- Text Generation: LLM creates appropriate responses
- Text-to-Speech: ElevenLabs converts text to natural speech
- Audio Delivery: Processed audio sent back to user
- Real-time performance metrics
- Conversation logging and summaries
- Usage analytics and optimization insights
- Intelligent Turn Detection - Natural conversation flow with automatic speech detection
- Function Calling - Extensible tool integration including:
- Weather information retrieval
- Real-time clock functionality
- Comprehensive Logging - Usage analytics and conversation summaries
- Telephony Integration - Inbound call support via Twilio SIP trunking
- Audio Enhancement - Krisp noise cancellation for crystal-clear communication
- Optimized Models - Automatic model switching for telephony vs. web-based interactions
- Python 3.8 or higher
- LiveKit Cloud account or self-hosted LiveKit server
- API keys for required services (OpenAI, ElevenLabs, Deepgram)
- Optional: Twilio account for telephony features
- Clone and navigate to the repository:
git clone https://github.com/danieladdisonorg/livekit-voice-agent.git
cd livekit-voice-agent
- Set up Python environment:
Linux/macOS:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 agent.py download-files
Windows:
python3 -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python3 agent.py download-files
- Environment Setup: Copy the example environment file and configure your API credentials:
cp .env.example .env.local
-
Required Environment Variables:
LIVEKIT_URL=your_livekit_server_url LIVEKIT_API_KEY=your_api_key LIVEKIT_API_SECRET=your_api_secret OPENAI_API_KEY=your_openai_key ELEVEN_API_KEY=your_elevenlabs_key DEEPGRAM_API_KEY=your_deepgram_key
-
Automated Configuration (Optional): If using LiveKit Cloud, you can auto-configure using the CLI:
lk app env
Start the agent in development mode:
python3 agent.py dev
This agent requires a compatible frontend application. We recommend using the LiveKit Next.js Voice Agent Interface for a complete solution.
Enable inbound phone calls through Twilio SIP integration.
- LiveKit CLI installed and authenticated
- Twilio account with phone number
- SIP trunk configuration
- Install LiveKit CLI (macOS):
brew update && brew install livekit-cli
- Authenticate with LiveKit Cloud:
lk cloud auth
-
Create Twilio Resources:
- Sign up for a Twilio account
- Purchase a phone number
- Create a new SIP trunk in the Twilio Console
-
Configure SIP Trunk:
- Navigate to: Elastic SIP Trunking → SIP Trunks → Create
- Add Origination URI:
<YOUR_LIVEKIT_SIP_URI>;transport=tcp
- Associate your phone number with priority 1, weight 1
-
Deploy LiveKit SIP Configuration:
Create Inbound Trunk:
lk sip inbound create inbound-trunk.json
Create Dispatch Rule:
lk sip dispatch create dispatch-rule.json
Update inbound-trunk.json
with appropriate Twilio SIP signaling IP addresses for your region. The default configuration includes US IP addresses.
- Agent Core - Main conversation logic and state management
- Function Registry - Extensible tool calling system
- Audio Pipeline - Real-time audio processing with noise cancellation
- SIP Integration - Telephony gateway for inbound calls
- Logging System - Comprehensive usage and performance analytics
For issues and questions:
- Check the LiveKit Documentation
- Review existing GitHub issues
- Contact support through your LiveKit Cloud dashboard