A comprehensive Python framework for building AI-powered agents using the LiveKit platform. Includes both modern voice agents and video processing capabilities.
The new `agent.py` provides a production-ready voice AI assistant built on the latest LiveKit Agents framework.
Key Features:
- Real-time voice conversations with AI
- Support for multiple STT/LLM/TTS providers via LiveKit Inference
- Background noise cancellation
- Multilingual turn detection
- Function calling support for custom tools
- Metrics and logging
- Production-ready architecture
Quick Start:
```bash
# Install dependencies
pip install -r requirements.txt

# Download required models
python agent.py download-files

# Configure credentials in .env
cp .env.example .env

# Run in console mode (test locally)
python agent.py console

# Run in dev mode (connect to a frontend)
python agent.py dev
```

See the Voice Agent section below for detailed instructions.
The original video agent framework (`video_agent.py`) provides advanced video processing capabilities.
Features:
- Real-time video and audio streaming with LiveKit
- Voice assistant integration with OpenAI LLM and Deepgram STT/TTS
- Custom video frame processing pipeline
- Extensible architecture for adding custom processors
- Event-driven participant and track management
- Data channel messaging support
- Full async/await support
Prerequisites:
- Python 3.9 or higher
- LiveKit server (local or cloud instance)
- API keys for:
- LiveKit (API Key and Secret)
- OpenAI (for LLM capabilities)
- Deepgram (for speech-to-text and text-to-speech)
1. Clone or download this repository

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up your environment variables:

   ```bash
   cp .env.example .env
   ```

4. Edit the `.env` file with your credentials:

   ```
   LIVEKIT_URL=wss://your-livekit-server.com
   LIVEKIT_API_KEY=your_api_key
   LIVEKIT_API_SECRET=your_api_secret
   OPENAI_API_KEY=your_openai_key
   DEEPGRAM_API_KEY=your_deepgram_key
   ```
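If you prefer not to pull in an extra dependency, the variables above can be loaded with a few lines of standard-library Python. This is a minimal sketch; `load_env` is a hypothetical helper, not part of the framework (which may use python-dotenv instead):

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into a dict and
    export them via os.environ (pre-existing values are kept)."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and malformed entries
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
            os.environ.setdefault(key.strip(), value.strip())
    return values
```

Note this deliberately does not handle quoting or multi-line values; use a real dotenv library for anything beyond simple `KEY=VALUE` pairs.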
1. Install Dependencies

   ```bash
   pip install -r requirements.txt
   ```

2. Download Required Models

   Before the first run, download the VAD and turn detection models:

   ```bash
   python agent.py download-files
   ```

3. Get LiveKit Credentials

   Sign up for free at LiveKit Cloud and create a project to get:
   - LIVEKIT_URL (e.g., `wss://your-project.livekit.cloud`)
   - LIVEKIT_API_KEY
   - LIVEKIT_API_SECRET

4. Configure Environment

   ```bash
   cp .env.example .env  # Edit .env with your credentials
   ```
Console Mode - Test directly in your terminal:

```bash
python agent.py console
```

Development Mode - Connect to a frontend application:

```bash
python agent.py dev
```

Production Mode - Deploy for production:

```bash
python agent.py start
```

The voice agent works with any LiveKit-compatible frontend:
| Platform | Repository | Description |
|---|---|---|
| React/Next.js | agent-starter-react | Web voice AI assistant |
| iOS/macOS | agent-starter-swift | Native iOS/macOS app |
| Flutter | agent-starter-flutter | Cross-platform mobile |
| React Native | voice-assistant-react-native | React Native app |
| Android | agent-starter-android | Native Android app |
Change Agent Personality:
Edit the instructions in `agent.py`:

```python
class VoiceAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful assistant that..."
        )
```

Change Voice Models:
Modify the session configuration in `entrypoint()`:

```python
session = AgentSession(
    stt=inference.STT(model="assemblyai/universal-streaming", language="en"),
    llm=inference.LLM(model="openai/gpt-4o-mini"),
    tts=inference.TTS(model="cartesia/sonic-3", voice="your-voice-id"),
)
```
Add Function Tools: Give your agent custom capabilities:
```python
from livekit.agents import function_tool, RunContext

class VoiceAssistant(Agent):
    @function_tool
    async def get_weather(self, context: RunContext, location: str):
        """Look up weather information.

        Args:
            location: City name
        """
        return f"The weather in {location} is sunny."
```

Run a basic video agent that connects to a room and processes video:
```bash
python example_basic.py
```

Run an agent with voice assistant capabilities:

```bash
python example_voice.py
```

Run an agent with custom video processing:

```bash
python example_custom.py
```

Run as a LiveKit Agents worker (development mode):

```bash
python video_agent.py dev
```

For production deployment:

```bash
python video_agent.py start
```
- LiveKitVideoAgent: Base class for video agents
  - Room connection management
  - Track subscription/unsubscription
  - Video/audio processing
  - Data channel messaging

- VoiceEnabledVideoAgent: Extended agent with voice capabilities
  - Voice assistant integration
  - Speech-to-text (STT)
  - Text-to-speech (TTS)
  - LLM-powered conversations

- AgentConfig: Configuration dataclass
  - LiveKit connection settings
  - API credentials
  - Feature flags
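To illustrate, the configuration dataclass might look roughly like the sketch below. Only the field names used in the examples in this README are taken from the source; the environment-variable defaults are an assumption about how `AgentConfig` could be wired up:

```python
import os
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    # LiveKit connection settings (read from the environment by default)
    livekit_url: str = field(default_factory=lambda: os.getenv("LIVEKIT_URL", ""))
    livekit_api_key: str = field(default_factory=lambda: os.getenv("LIVEKIT_API_KEY", ""))
    livekit_api_secret: str = field(default_factory=lambda: os.getenv("LIVEKIT_API_SECRET", ""))
    # Optional provider credentials (required only for voice features)
    openai_api_key: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
    deepgram_api_key: str = field(default_factory=lambda: os.getenv("DEEPGRAM_API_KEY", ""))
    # Feature flags
    enable_video_processing: bool = True
    enable_voice_assistant: bool = True
```

Using a dataclass keeps configuration explicit and testable: every example below constructs one directly, and defaults can still come from `.env` via the environment.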
```python
await agent.connect(room_name="my-room", participant_identity="agent-1")
await agent.disconnect()
```

```python
# Register a custom video processor
async def my_processor(frame, participant):
    # Process frame
    pass

agent.register_video_processor("my-processor", my_processor)
```

```python
# Publish video
await agent.publish_video_track(video_source)

# Publish audio
await agent.publish_audio_track(audio_source)
```

```python
await agent.start_voice_assistant(
    initial_prompt="You are a helpful assistant"
)
```
```python
import asyncio
from video_agent import LiveKitVideoAgent, AgentConfig

async def main():
    config = AgentConfig(
        livekit_url="wss://your-server.com",
        livekit_api_key="key",
        livekit_api_secret="secret",
        enable_video_processing=True,
    )
    agent = LiveKitVideoAgent(config)
    await agent.connect("my-room", "monitor-agent")

    # Agent will monitor video streams
    await asyncio.sleep(3600)  # Run for 1 hour
    await agent.disconnect()

asyncio.run(main())
```
```python
import asyncio
from video_agent import VoiceEnabledVideoAgent, AgentConfig

async def main():
    config = AgentConfig(
        livekit_url="wss://your-server.com",
        livekit_api_key="key",
        livekit_api_secret="secret",
        openai_api_key="openai-key",
        deepgram_api_key="deepgram-key",
        enable_voice_assistant=True,
    )
    agent = VoiceEnabledVideoAgent(config)
    await agent.connect("voice-room", "voice-agent")
    await agent.start_voice_assistant(
        initial_prompt="You are a meeting assistant"
    )
    await asyncio.sleep(float('inf'))

asyncio.run(main())
```
```python
import asyncio
import cv2
from video_agent import LiveKitVideoAgent, AgentConfig

async def face_detector(frame, participant):
    # Convert frame to OpenCV format
    # Apply face detection
    # Send results back
    pass

async def main():
    config = AgentConfig(
        livekit_url="wss://your-server.com",
        livekit_api_key="key",
        livekit_api_secret="secret",
    )
    agent = LiveKitVideoAgent(config)
    agent.register_video_processor("face_detection", face_detector)
    await agent.connect("analysis-room", "analyzer")
    await asyncio.sleep(float('inf'))

asyncio.run(main())
```

Video processors are async functions that receive video frames:
```python
async def my_custom_processor(frame, participant):
    """
    Args:
        frame: VideoFrame from LiveKit
        participant: RemoteParticipant who sent the frame
    """
    # Your processing logic here
    # Example: Run an ML model, detect objects, etc.
    pass

agent.register_video_processor("custom", my_custom_processor)
```

Create your own agent class:
```python
from video_agent import VoiceEnabledVideoAgent

class MyCustomAgent(VoiceEnabledVideoAgent):
    async def _process_video_track(self, track, participant):
        # Override with custom logic
        await super()._process_video_track(track, participant)
        # Add your custom processing
        pass
```

Run as a worker in development mode:

```bash
python video_agent.py dev
```

Create a Dockerfile:
```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

CMD ["python", "video_agent.py", "start"]
```

Build and run:

```bash
docker build -t livekit-video-agent .
docker run -d --env-file .env livekit-video-agent
```

Deploy to cloud platforms (AWS, GCP, Azure) using container services or serverless functions.
| Variable | Description | Required | Default |
|---|---|---|---|
| LIVEKIT_URL | LiveKit server URL | Yes | - |
| LIVEKIT_API_KEY | LiveKit API key | Yes | - |
| LIVEKIT_API_SECRET | LiveKit API secret | Yes | - |
| OPENAI_API_KEY | OpenAI API key | No* | - |
| DEEPGRAM_API_KEY | Deepgram API key | No* | - |
| AGENT_NAME | Agent display name | No | "Video AI Agent" |
| ENABLE_VIDEO_PROCESSING | Enable video processing | No | true |
| ENABLE_VOICE_ASSISTANT | Enable voice features | No | true |
* Required if voice assistant is enabled
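To fail fast on missing configuration at startup, the table above can be checked with a small helper. This is a sketch; `check_env` is not part of the framework:

```python
import os

REQUIRED = ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]
VOICE_REQUIRED = ["OPENAI_API_KEY", "DEEPGRAM_API_KEY"]

def check_env(voice_enabled: bool = True) -> list:
    """Return the names of required variables that are unset or empty."""
    needed = REQUIRED + (VOICE_REQUIRED if voice_enabled else [])
    return [name for name in needed if not os.getenv(name)]
```

Calling `check_env()` before connecting and raising on a non-empty result gives a clear error message instead of a failure deep inside the SDK.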
Connection problems:
- Verify the LiveKit server is running and accessible
- Check that API credentials are correct
- Ensure the firewall allows WebSocket connections

Voice assistant issues:
- Verify OpenAI and Deepgram API keys
- Check API quotas and limits
- Review logs for specific errors

Video processing issues:
- Check that video tracks are being published
- Verify permissions and subscriptions
- Monitor CPU/memory usage
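When connection failures are transient (network blips rather than bad credentials), retrying with exponential backoff usually resolves them. A generic sketch, not part of the framework; `connect_with_retry` wraps any async connect call:

```python
import asyncio
import random

async def connect_with_retry(connect, attempts: int = 5, base_delay: float = 0.5):
    """Call an async `connect` coroutine function, retrying with
    exponential backoff plus jitter on failure."""
    for attempt in range(attempts):
        try:
            return await connect()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

If the call still fails after several attempts, the problem is more likely credentials or firewall configuration than the network.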
Run multiple agents in the same room:
```python
agent1 = LiveKitVideoAgent(config)
agent2 = LiveKitVideoAgent(config)

await agent1.connect("room", "agent-1")
await agent2.connect("room", "agent-2")
```

```python
# Send data to specific participants
await agent.send_data(
    data=b"custom message",
    participant_ids=["user-1", "user-2"]
)
```

Implement real-time video effects by processing frames and publishing modified video.
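Data-channel payloads are raw bytes, so it helps to agree on a framing convention between agent and frontend. A common choice is typed JSON messages; this is a sketch, not a wire format defined by the framework:

```python
import json

def encode_message(msg_type: str, payload: dict) -> bytes:
    """Serialize a typed message to bytes suitable for agent.send_data()."""
    return json.dumps({"type": msg_type, "payload": payload}).encode("utf-8")

def decode_message(data: bytes):
    """Inverse of encode_message, for use in a data-received handler."""
    obj = json.loads(data.decode("utf-8"))
    return obj["type"], obj.get("payload", {})
```

Keeping the `type` field separate from the payload lets a handler dispatch on message kind without parsing the whole body first.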
- Use GPU acceleration for video processing (CUDA, OpenCL)
- Implement frame skipping for heavy processing
- Use connection pooling for API calls
- Monitor and limit concurrent track subscriptions
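The frame-skipping suggestion above can be as simple as a counter-based gate placed at the top of a video processor. A minimal sketch (the class name is illustrative):

```python
class FrameSkipper:
    """Gate heavy per-frame work: process only every `interval`-th frame."""

    def __init__(self, interval: int = 5):
        self.interval = interval
        self._count = 0

    def should_process(self) -> bool:
        self._count += 1
        # Process frames 1, interval+1, 2*interval+1, ...
        return (self._count - 1) % self.interval == 0
```

Inside a processor, `if not skipper.should_process(): return` drops the expensive path (e.g. an ML model) on most frames while keeping the stream subscription alive.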
- Never commit the `.env` file with real credentials
- Use environment variables for secrets
- Implement rate limiting for API calls
- Validate all user inputs
- Use secure WebSocket connections (wss://)
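The rate-limiting recommendation can be implemented with a simple token bucket in front of outbound API calls. A sketch, not tied to any particular API client:

```python
import time

class TokenBucket:
    """Allow roughly `rate` calls per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Checking `bucket.allow()` before each LLM or STT request (and dropping or queueing on `False`) keeps a misbehaving participant from exhausting your provider quota.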
MIT License - feel free to use and modify for your projects.
For issues and questions:
- LiveKit: https://livekit.io/support
- This framework: Create an issue in the repository
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request with tests
- Initial release
- Basic video agent functionality
- Voice assistant integration
- Custom processor support