LiveKit AI Agent Framework

A comprehensive Python framework for building AI-powered agents using the LiveKit platform. Includes both modern voice agents and video processing capabilities.

🎙️ Modern Voice Agent (Recommended)

The new agent.py provides a production-ready voice AI assistant using the latest LiveKit Agents framework.

Key Features:

  • Real-time voice conversations with AI
  • Support for multiple STT/LLM/TTS providers via LiveKit Inference
  • Background noise cancellation
  • Multilingual turn detection
  • Function calling support for custom tools
  • Metrics and logging
  • Production-ready architecture

Quick Start:

# Install dependencies
pip install -r requirements.txt

# Download required models
python agent.py download-files

# Copy the example env file, then add your credentials to .env
cp .env.example .env

# Run in console mode (test locally)
python agent.py console

# Run in dev mode (connect to frontend)
python agent.py dev

See the Voice Agent section below for detailed instructions.

🎥 Video Agent Framework (Legacy)

The original video agent framework (video_agent.py) provides advanced video processing capabilities.

Features:

  • Real-time video and audio streaming with LiveKit
  • Voice assistant integration with OpenAI LLM and Deepgram STT/TTS
  • Custom video frame processing pipeline
  • Extensible architecture for adding custom processors
  • Event-driven participant and track management
  • Data channel messaging support
  • Full async/await support

Prerequisites

  • Python 3.9 or higher
  • LiveKit server (local or cloud instance)
  • API keys for:
    • LiveKit (API Key and Secret)
    • OpenAI (for LLM capabilities)
    • Deepgram (for speech-to-text and text-to-speech)

Installation

  1. Clone or download this repository

  2. Install dependencies:

     pip install -r requirements.txt

  3. Set up your environment variables:

     cp .env.example .env

  4. Edit the .env file with your credentials:

     LIVEKIT_URL=wss://your-livekit-server.com
     LIVEKIT_API_KEY=your_api_key
     LIVEKIT_API_SECRET=your_api_secret
     OPENAI_API_KEY=your_openai_key
     DEEPGRAM_API_KEY=your_deepgram_key
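Before starting the agent, it helps to fail fast when a credential is missing. The sketch below is an illustration (not part of this repository) that checks the variables listed above using only the standard library:

```python
import os

# The variables this README asks for; the voice-only agent needs fewer
REQUIRED_VARS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
]

def missing_vars(env) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Fail fast with a clear message instead of a cryptic connection error later
problems = missing_vars(os.environ)
if problems:
    print(f"Missing environment variables: {', '.join(problems)}")
```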

Voice Agent Usage

Installation and Setup

  1. Install Dependencies

    pip install -r requirements.txt
  2. Download Required Models

    Before first run, download the VAD and turn detection models:

    python agent.py download-files
  3. Get LiveKit Credentials

    Sign up for free at LiveKit Cloud and create a project to get:

    • LIVEKIT_URL (e.g., wss://your-project.livekit.cloud)
    • LIVEKIT_API_KEY
    • LIVEKIT_API_SECRET
  4. Configure Environment

    cp .env.example .env
    # Edit .env with your credentials

Running the Voice Agent

Console Mode - Test directly in your terminal:

python agent.py console

Development Mode - Connect to a frontend application:

python agent.py dev

Production Mode - Deploy for production:

python agent.py start

Connecting a Frontend

The voice agent works with any LiveKit-compatible frontend:

| Platform | Repository | Description |
| --- | --- | --- |
| React/Next.js | agent-starter-react | Web voice AI assistant |
| iOS/macOS | agent-starter-swift | Native iOS/macOS app |
| Flutter | agent-starter-flutter | Cross-platform mobile |
| React Native | voice-assistant-react-native | React Native app |
| Android | agent-starter-android | Native Android app |

Customization Options

Change Agent Personality: Edit the instructions in agent.py:

class VoiceAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful assistant that..."
        )

Change Voice Models: Modify the session configuration in entrypoint():

session = AgentSession(
    stt=inference.STT(model="assemblyai/universal-streaming", language="en"),
    llm=inference.LLM(model="openai/gpt-4o-mini"),
    tts=inference.TTS(model="cartesia/sonic-3", voice="your-voice-id"),
)

Add Function Tools: Give your agent custom capabilities:

from livekit.agents import function_tool, RunContext

class VoiceAssistant(Agent):
    @function_tool
    async def get_weather(self, context: RunContext, location: str):
        """Look up weather information.

        Args:
            location: City name
        """
        return f"The weather in {location} is sunny."

Video Agent (Legacy)

Quick Start

Basic Video Agent

Run a basic video agent that connects to a room and processes video:

python example_basic.py

Voice-Enabled Agent

Run an agent with voice assistant capabilities:

python example_voice.py

Custom Processing

Run an agent with custom video processing:

python example_custom.py

Using LiveKit Agents Worker

Run as a LiveKit Agents worker (production mode):

python video_agent.py dev

For production deployment:

python video_agent.py start

Architecture

Core Components

  1. LiveKitVideoAgent: Base class for video agents

    • Room connection management
    • Track subscription/unsubscription
    • Video/audio processing
    • Data channel messaging
  2. VoiceEnabledVideoAgent: Extended agent with voice capabilities

    • Voice assistant integration
    • Speech-to-text (STT)
    • Text-to-speech (TTS)
    • LLM-powered conversations
  3. AgentConfig: Configuration dataclass

    • LiveKit connection settings
    • API credentials
    • Feature flags
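As a rough illustration of the configuration dataclass, here is a minimal sketch whose field names are inferred from this README's Configuration Options table; the actual class in video_agent.py may differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentConfig:
    """Connection settings, credentials, and feature flags.

    Field names are inferred from this README; the real class may differ.
    """
    livekit_url: str
    livekit_api_key: str
    livekit_api_secret: str
    openai_api_key: Optional[str] = None     # only needed for voice assistant
    deepgram_api_key: Optional[str] = None   # only needed for voice assistant
    agent_name: str = "Video AI Agent"
    enable_video_processing: bool = True
    enable_voice_assistant: bool = True

config = AgentConfig(
    livekit_url="wss://your-server.com",
    livekit_api_key="key",
    livekit_api_secret="secret",
)
```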

Key Methods

Connection Management

await agent.connect(room_name="my-room", participant_identity="agent-1")
await agent.disconnect()

Video Processing

# Register a custom video processor
async def my_processor(frame, participant):
    # Process frame
    pass

agent.register_video_processor("my-processor", my_processor)

Publishing Tracks

# Publish video
await agent.publish_video_track(video_source)

# Publish audio
await agent.publish_audio_track(audio_source)

Voice Assistant

await agent.start_voice_assistant(
    initial_prompt="You are a helpful assistant"
)

Usage Examples

Example 1: Simple Video Monitor

import asyncio
from video_agent import LiveKitVideoAgent, AgentConfig

async def main():
    config = AgentConfig(
        livekit_url="wss://your-server.com",
        livekit_api_key="key",
        livekit_api_secret="secret",
        enable_video_processing=True,
    )

    agent = LiveKitVideoAgent(config)
    await agent.connect("my-room", "monitor-agent")

    # Agent will monitor video streams
    await asyncio.sleep(3600)  # Run for 1 hour

    await agent.disconnect()

asyncio.run(main())

Example 2: Interactive Voice Agent

import asyncio
from video_agent import VoiceEnabledVideoAgent, AgentConfig

async def main():
    config = AgentConfig(
        livekit_url="wss://your-server.com",
        livekit_api_key="key",
        livekit_api_secret="secret",
        openai_api_key="openai-key",
        deepgram_api_key="deepgram-key",
        enable_voice_assistant=True,
    )

    agent = VoiceEnabledVideoAgent(config)
    await agent.connect("voice-room", "voice-agent")

    await agent.start_voice_assistant(
        initial_prompt="You are a meeting assistant"
    )

    await asyncio.sleep(float('inf'))

asyncio.run(main())

Example 3: Custom Video Analysis

import asyncio
import cv2
from video_agent import LiveKitVideoAgent, AgentConfig

async def face_detector(frame, participant):
    # Convert frame to OpenCV format
    # Apply face detection
    # Send results back
    pass

async def main():
    config = AgentConfig(
        livekit_url="wss://your-server.com",
        livekit_api_key="key",
        livekit_api_secret="secret",
    )

    agent = LiveKitVideoAgent(config)
    agent.register_video_processor("face_detection", face_detector)

    await agent.connect("analysis-room", "analyzer")
    await asyncio.sleep(float('inf'))

asyncio.run(main())

Customization

Adding Custom Video Processors

Video processors are async functions that receive video frames:

async def my_custom_processor(frame, participant):
    """
    Args:
        frame: VideoFrame from LiveKit
        participant: RemoteParticipant who sent the frame
    """
    # Your processing logic here
    # Example: Run ML model, detect objects, etc.
    pass

agent.register_video_processor("custom", my_custom_processor)

Extending the Agent

Create your own agent class:

from video_agent import VoiceEnabledVideoAgent

class MyCustomAgent(VoiceEnabledVideoAgent):
    async def _process_video_track(self, track, participant):
        # Override with custom logic
        await super()._process_video_track(track, participant)

        # Add your custom processing
        pass

Deployment

Local Development

python video_agent.py dev

Production (Docker)

Create a Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "video_agent.py", "start"]

Build and run:

docker build -t livekit-video-agent .
docker run -d --env-file .env livekit-video-agent

Cloud Deployment

Deploy to cloud platforms (AWS, GCP, Azure) using container services or serverless functions.

Configuration Options

| Variable | Description | Required | Default |
| --- | --- | --- | --- |
| LIVEKIT_URL | LiveKit server URL | Yes | - |
| LIVEKIT_API_KEY | LiveKit API key | Yes | - |
| LIVEKIT_API_SECRET | LiveKit API secret | Yes | - |
| OPENAI_API_KEY | OpenAI API key | No* | - |
| DEEPGRAM_API_KEY | Deepgram API key | No* | - |
| AGENT_NAME | Agent display name | No | "Video AI Agent" |
| ENABLE_VIDEO_PROCESSING | Enable video processing | No | true |
| ENABLE_VOICE_ASSISTANT | Enable voice features | No | true |

\* Required if voice assistant is enabled

Troubleshooting

Connection Issues

  • Verify LiveKit server is running and accessible
  • Check API credentials are correct
  • Ensure firewall allows WebSocket connections
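A quick pre-flight check of the server URL can catch the most common misconfigurations before any connection attempt. This helper is an illustration, not part of the framework, and uses only the standard library:

```python
from urllib.parse import urlparse

def check_livekit_url(url: str) -> list:
    """Return a list of problems with a LiveKit server URL (empty = looks OK)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss"):
        problems.append(f"unexpected scheme {parsed.scheme!r}; expected ws:// or wss://")
    if not parsed.hostname:
        problems.append("missing hostname")
    if parsed.scheme == "ws":
        problems.append("insecure ws:// connection; prefer wss:// in production")
    return problems
```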

Voice Assistant Issues

  • Verify OpenAI and Deepgram API keys
  • Check API quota and limits
  • Review logs for specific errors

Video Processing Issues

  • Check that video tracks are being published
  • Verify permissions and subscriptions
  • Monitor CPU/memory usage

Advanced Topics

Multi-Agent Coordination

Run multiple agents in the same room:

agent1 = LiveKitVideoAgent(config)
agent2 = LiveKitVideoAgent(config)

await agent1.connect("room", "agent-1")
await agent2.connect("room", "agent-2")

Custom Data Messaging

# Send data to specific participants
await agent.send_data(
    data=b"custom message",
    participant_ids=["user-1", "user-2"]
)

Video Effects and Filters

Implement real-time video effects by processing frames and publishing modified video.
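As a minimal illustration of the idea, the sketch below inverts the colors of a raw RGBA pixel buffer. It is hypothetical glue code: a real processor would first extract the byte buffer from the incoming LiveKit VideoFrame and republish the modified frame.

```python
def invert_rgba(frame_bytes: bytes) -> bytes:
    """Invert the RGB channels of a raw RGBA buffer, leaving alpha untouched."""
    out = bytearray(frame_bytes)
    for i in range(0, len(out), 4):  # assumes length is a multiple of 4 (RGBA)
        out[i] = 255 - out[i]          # R
        out[i + 1] = 255 - out[i + 1]  # G
        out[i + 2] = 255 - out[i + 2]  # B
    return bytes(out)
```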

Performance Optimization

  • Use GPU acceleration for video processing (CUDA, OpenCL)
  • Implement frame skipping for heavy processing
  • Use connection pooling for API calls
  • Monitor and limit concurrent track subscriptions
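Frame skipping can be implemented as a small wrapper around any registered processor. The skip_frames helper below is a sketch (not part of the framework) that drops all but every Nth frame:

```python
import asyncio

def skip_frames(processor, every_n: int):
    """Wrap an async frame processor so only every Nth frame is handled."""
    counter = {"n": 0}

    async def wrapper(frame, participant):
        counter["n"] += 1
        if counter["n"] % every_n != 0:
            return  # drop this frame cheaply
        await processor(frame, participant)

    return wrapper

# Demo with a stub processor that records which frames it handled
processed = []

async def heavy_processor(frame, participant):
    processed.append(frame)

async def demo():
    wrapped = skip_frames(heavy_processor, every_n=3)
    for i in range(9):
        await wrapped(i, None)

asyncio.run(demo())
```

With the real agent, the wrapped processor would be registered as usual, e.g. `agent.register_video_processor("heavy", skip_frames(heavy_processor, every_n=3))`.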

Security Considerations

  • Never commit .env file with real credentials
  • Use environment variables for secrets
  • Implement rate limiting for API calls
  • Validate all user inputs
  • Use secure WebSocket connections (wss://)
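For rate limiting API calls, a minimal token bucket is often enough. The class below is an illustrative sketch, not part of this repository:

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter for outbound API calls."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; return False when rate-limited."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Before each outbound call (LLM, STT, TTS), check `bucket.allow()` and back off or queue the request when it returns False.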

License

MIT License - feel free to use and modify for your projects.

Support

For issues and questions, please open an issue on the repository.

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request with tests

Changelog

Version 1.0.0

  • Initial release
  • Basic video agent functionality
  • Voice assistant integration
  • Custom processor support
