A comprehensive Python framework for building AI-powered agents using the LiveKit platform. Includes both modern voice agents and video processing capabilities.
The new `agent.py` provides a production-ready voice AI assistant built on the latest LiveKit Agents framework.
Key Features:
- Real-time voice conversations with AI
- Support for multiple STT/LLM/TTS providers via LiveKit Inference
- Background noise cancellation
- Multilingual turn detection
- Function calling support for custom tools
- Metrics and logging
- Production-ready architecture
Quick Start:
```bash
# Install dependencies
pip install -r requirements.txt

# Download required models
python agent.py download-files

# Configure credentials in .env
cp .env.example .env

# Run in console mode (test locally)
python agent.py console

# Run in dev mode (connect to a frontend)
python agent.py dev
```

See the Voice Agent section below for detailed instructions.
The original video agent framework (`video_agent.py`) provides advanced video processing capabilities.
Features:
- Real-time video and audio streaming with LiveKit
- Voice assistant integration with OpenAI LLM and Deepgram STT/TTS
- Custom video frame processing pipeline
- Extensible architecture for adding custom processors
- Event-driven participant and track management
- Data channel messaging support
- Full async/await support
Prerequisites:
- Python 3.9 or higher
- LiveKit server (local or cloud instance)
- API keys for:
- LiveKit (API Key and Secret)
- OpenAI (for LLM capabilities)
- Deepgram (for speech-to-text and text-to-speech)
1. Clone or download this repository

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up your environment variables:

   ```bash
   cp .env.example .env
   ```

4. Edit the `.env` file with your credentials:

   ```
   LIVEKIT_URL=wss://your-livekit-server.com
   LIVEKIT_API_KEY=your_api_key
   LIVEKIT_API_SECRET=your_api_secret
   OPENAI_API_KEY=your_openai_key
   DEEPGRAM_API_KEY=your_deepgram_key
   ```
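If you prefer not to pull in an extra dependency, the variables above can be loaded with a few lines of standard-library Python. This is a minimal sketch; `load_env` is a hypothetical helper, not part of the framework (which may use python-dotenv instead):

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into a dict and
    export them via os.environ (pre-existing values are kept)."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and malformed entries
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
            os.environ.setdefault(key.strip(), value.strip())
    return values
```

Note this deliberately does not handle quoting or multi-line values; use a real dotenv library for anything beyond simple `KEY=VALUE` pairs.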
1. Install Dependencies

   ```bash
   pip install -r requirements.txt
   ```

2. Download Required Models

   Before the first run, download the VAD and turn detection models:

   ```bash
   python agent.py download-files
   ```

3. Get LiveKit Credentials

   Sign up for free at LiveKit Cloud and create a project to get:
   - LIVEKIT_URL (e.g., `wss://your-project.livekit.cloud`)
   - LIVEKIT_API_KEY
   - LIVEKIT_API_SECRET

4. Configure Environment

   ```bash
   cp .env.example .env  # Edit .env with your credentials
   ```
Console Mode - Test directly in your terminal:

```bash
python agent.py console
```

Development Mode - Connect to a frontend application:

```bash
python agent.py dev
```

Production Mode - Deploy for production:

```bash
python agent.py start
```

The voice agent works with any LiveKit-compatible frontend:
| Platform | Repository | Description |
|---|---|---|
| React/Next.js | agent-starter-react | Web voice AI assistant |
| iOS/macOS | agent-starter-swift | Native iOS/macOS app |
| Flutter | agent-starter-flutter | Cross-platform mobile |
| React Native | voice-assistant-react-native | React Native app |
| Android | agent-starter-android | Native Android app |
Change Agent Personality:
Edit the instructions in `agent.py`:

```python
class VoiceAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful assistant that..."
        )
```

Change Voice Models:
Modify the session configuration in `entrypoint()`:

```python
session = AgentSession(
    stt=inference.STT(model="assemblyai/universal-streaming", language="en"),
    llm=inference.LLM(model="openai/gpt-4o-mini"),
    tts=inference.TTS(model="cartesia/sonic-3", voice="your-voice-id"),
)
```
Add Function Tools: Give your agent custom capabilities:
```python
from livekit.agents import function_tool, RunContext

class VoiceAssistant(Agent):
    @function_tool
    async def get_weather(self, context: RunContext, location: str):
        """Look up weather information.

        Args:
            location: City name
        """
        return f"The weather in {location} is sunny."
```

Run a basic video agent that connects to a room and processes video:
```bash
python example_basic.py
```

Run an agent with voice assistant capabilities:

```bash
python example_voice.py
```

Run an agent with custom video processing:

```bash
python example_custom.py
```

Run as a LiveKit Agents worker (development mode):

```bash
python video_agent.py dev
```

For production deployment:

```bash
python video_agent.py start
```
- LiveKitVideoAgent: Base class for video agents
  - Room connection management
  - Track subscription/unsubscription
  - Video/audio processing
  - Data channel messaging

- VoiceEnabledVideoAgent: Extended agent with voice capabilities
  - Voice assistant integration
  - Speech-to-text (STT)
  - Text-to-speech (TTS)
  - LLM-powered conversations

- AgentConfig: Configuration dataclass
  - LiveKit connection settings
  - API credentials
  - Feature flags
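To illustrate, the configuration dataclass might look roughly like the sketch below. Only the field names used in the examples in this README are taken from the source; the environment-variable defaults are an assumption about how `AgentConfig` could be wired up:

```python
import os
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    # LiveKit connection settings (read from the environment by default)
    livekit_url: str = field(default_factory=lambda: os.getenv("LIVEKIT_URL", ""))
    livekit_api_key: str = field(default_factory=lambda: os.getenv("LIVEKIT_API_KEY", ""))
    livekit_api_secret: str = field(default_factory=lambda: os.getenv("LIVEKIT_API_SECRET", ""))
    # Optional provider credentials (required only for voice features)
    openai_api_key: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
    deepgram_api_key: str = field(default_factory=lambda: os.getenv("DEEPGRAM_API_KEY", ""))
    # Feature flags
    enable_video_processing: bool = True
    enable_voice_assistant: bool = True
```

Using a dataclass keeps configuration explicit and testable: every example below constructs one directly, and defaults can still come from `.env` via the environment.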
```python
await agent.connect(room_name="my-room", participant_identity="agent-1")
await agent.disconnect()
```

```python
# Register a custom video processor
async def my_processor(frame, participant):
    # Process frame
    pass

agent.register_video_processor("my-processor", my_processor)
```

```python
# Publish video
await agent.publish_video_track(video_source)

# Publish audio
await agent.publish_audio_track(audio_source)
```

```python
await agent.start_voice_assistant(
    initial_prompt="You are a helpful assistant"
)
```
```python
import asyncio
from video_agent import LiveKitVideoAgent, AgentConfig

async def main():
    config = AgentConfig(
        livekit_url="wss://your-server.com",
        livekit_api_key="key",
        livekit_api_secret="secret",
        enable_video_processing=True,
    )
    agent = LiveKitVideoAgent(config)
    await agent.connect("my-room", "monitor-agent")

    # Agent will monitor video streams
    await asyncio.sleep(3600)  # Run for 1 hour
    await agent.disconnect()

asyncio.run(main())
```
```python
import asyncio
from video_agent import VoiceEnabledVideoAgent, AgentConfig

async def main():
    config = AgentConfig(
        livekit_url="wss://your-server.com",
        livekit_api_key="key",
        livekit_api_secret="secret",
        openai_api_key="openai-key",
        deepgram_api_key="deepgram-key",
        enable_voice_assistant=True,
    )
    agent = VoiceEnabledVideoAgent(config)
    await agent.connect("voice-room", "voice-agent")
    await agent.start_voice_assistant(
        initial_prompt="You are a meeting assistant"
    )
    await asyncio.sleep(float('inf'))

asyncio.run(main())
```
```python
import asyncio
import cv2
from video_agent import LiveKitVideoAgent, AgentConfig

async def face_detector(frame, participant):
    # Convert frame to OpenCV format
    # Apply face detection
    # Send results back
    pass

async def main():
    config = AgentConfig(
        livekit_url="wss://your-server.com",
        livekit_api_key="key",
        livekit_api_secret="secret",
    )
    agent = LiveKitVideoAgent(config)
    agent.register_video_processor("face_detection", face_detector)
    await agent.connect("analysis-room", "analyzer")
    await asyncio.sleep(float('inf'))

asyncio.run(main())
```

Video processors are async functions that receive video frames:
```python
async def my_custom_processor(frame, participant):
    """
    Args:
        frame: VideoFrame from LiveKit
        participant: RemoteParticipant who sent the frame
    """
    # Your processing logic here
    # Example: Run an ML model, detect objects, etc.
    pass

agent.register_video_processor("custom", my_custom_processor)
```

Create your own agent class:
```python
from video_agent import VoiceEnabledVideoAgent

class MyCustomAgent(VoiceEnabledVideoAgent):
    async def _process_video_track(self, track, participant):
        # Override with custom logic
        await super()._process_video_track(track, participant)
        # Add your custom processing
        pass
```

Run as a worker in development mode:

```bash
python video_agent.py dev
```

Create a Dockerfile:
```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

CMD ["python", "video_agent.py", "start"]
```

Build and run:

```bash
docker build -t livekit-video-agent .
docker run -d --env-file .env livekit-video-agent
```

Deploy to cloud platforms (AWS, GCP, Azure) using container services or serverless functions.
| Variable | Description | Required | Default |
|---|---|---|---|
| LIVEKIT_URL | LiveKit server URL | Yes | - |
| LIVEKIT_API_KEY | LiveKit API key | Yes | - |
| LIVEKIT_API_SECRET | LiveKit API secret | Yes | - |
| OPENAI_API_KEY | OpenAI API key | No* | - |
| DEEPGRAM_API_KEY | Deepgram API key | No* | - |
| AGENT_NAME | Agent display name | No | "Video AI Agent" |
| ENABLE_VIDEO_PROCESSING | Enable video processing | No | true |
| ENABLE_VOICE_ASSISTANT | Enable voice features | No | true |
* Required if voice assistant is enabled
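To fail fast on missing configuration at startup, the table above can be checked with a small helper. This is a sketch; `check_env` is not part of the framework:

```python
import os

REQUIRED = ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]
VOICE_REQUIRED = ["OPENAI_API_KEY", "DEEPGRAM_API_KEY"]

def check_env(voice_enabled: bool = True) -> list:
    """Return the names of required variables that are unset or empty."""
    needed = REQUIRED + (VOICE_REQUIRED if voice_enabled else [])
    return [name for name in needed if not os.getenv(name)]
```

Calling `check_env()` before connecting and raising on a non-empty result gives a clear error message instead of a failure deep inside the SDK.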
Connection problems:
- Verify the LiveKit server is running and accessible
- Check that API credentials are correct
- Ensure the firewall allows WebSocket connections

Voice assistant issues:
- Verify OpenAI and Deepgram API keys
- Check API quotas and limits
- Review logs for specific errors

Video processing issues:
- Check that video tracks are being published
- Verify permissions and subscriptions
- Monitor CPU/memory usage
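When connection failures are transient (network blips rather than bad credentials), retrying with exponential backoff usually resolves them. A generic sketch, not part of the framework; `connect_with_retry` wraps any async connect call:

```python
import asyncio
import random

async def connect_with_retry(connect, attempts: int = 5, base_delay: float = 0.5):
    """Call an async `connect` coroutine function, retrying with
    exponential backoff plus jitter on failure."""
    for attempt in range(attempts):
        try:
            return await connect()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

If the call still fails after several attempts, the problem is more likely credentials or firewall configuration than the network.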
Run multiple agents in the same room:
```python
agent1 = LiveKitVideoAgent(config)
agent2 = LiveKitVideoAgent(config)

await agent1.connect("room", "agent-1")
await agent2.connect("room", "agent-2")
```

```python
# Send data to specific participants
await agent.send_data(
    data=b"custom message",
    participant_ids=["user-1", "user-2"]
)
```

Implement real-time video effects by processing frames and publishing modified video.
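Data-channel payloads are raw bytes, so it helps to agree on a framing convention between agent and frontend. A common choice is typed JSON messages; this is a sketch, not a wire format defined by the framework:

```python
import json

def encode_message(msg_type: str, payload: dict) -> bytes:
    """Serialize a typed message to bytes suitable for agent.send_data()."""
    return json.dumps({"type": msg_type, "payload": payload}).encode("utf-8")

def decode_message(data: bytes):
    """Inverse of encode_message, for use in a data-received handler."""
    obj = json.loads(data.decode("utf-8"))
    return obj["type"], obj.get("payload", {})
```

Keeping the `type` field separate from the payload lets a handler dispatch on message kind without parsing the whole body first.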
- Use GPU acceleration for video processing (CUDA, OpenCL)
- Implement frame skipping for heavy processing
- Use connection pooling for API calls
- Monitor and limit concurrent track subscriptions
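The frame-skipping suggestion above can be as simple as a counter-based gate placed at the top of a video processor. A minimal sketch (the class name is illustrative):

```python
class FrameSkipper:
    """Gate heavy per-frame work: process only every `interval`-th frame."""

    def __init__(self, interval: int = 5):
        self.interval = interval
        self._count = 0

    def should_process(self) -> bool:
        self._count += 1
        # Process frames 1, interval+1, 2*interval+1, ...
        return (self._count - 1) % self.interval == 0
```

Inside a processor, `if not skipper.should_process(): return` drops the expensive path (e.g. an ML model) on most frames while keeping the stream subscription alive.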
- Never commit the `.env` file with real credentials
- Use environment variables for secrets
- Implement rate limiting for API calls
- Validate all user inputs
- Use secure WebSocket connections (wss://)
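The rate-limiting recommendation can be implemented with a simple token bucket in front of outbound API calls. A sketch, not tied to any particular API client:

```python
import time

class TokenBucket:
    """Allow roughly `rate` calls per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Checking `bucket.allow()` before each LLM or STT request (and dropping or queueing on `False`) keeps a misbehaving participant from exhausting your provider quota.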
MIT License - feel free to use and modify for your projects.
For issues and questions:
- LiveKit: https://livekit.io/support
- This framework: Create an issue in the repository
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request with tests
- Initial release
- Basic video agent functionality
- Voice assistant integration
- Custom processor support