MABSSSSS/livekit

LiveKit Voice AI: WhatsApp Real-Time Integration

A low-latency voice AI system that connects WhatsApp Business calls to the LiveKit Multimodal Agent Framework, bridging traditional telephony and generative voice AI.


System Architecture: The Voice Bridge

This project implements a Telephony-to-AI Gateway. Users call a WhatsApp Business number and talk to an AI agent in real time.

The Flow:

  1. Incoming Call: Meta triggers a Webhook on our FastAPI Server (whatsapp_server.py).
  2. WebRTC Signaling: The server performs SDP (Session Description Protocol) negotiation between Meta and LiveKit.
  3. Room Orchestration: A unique LiveKit room is created and the caller is added as a participant.
  4. Agent Dispatch: The LiveKit Agent Worker (agent.py) is dispatched to the room.
  5. Multimodal Interaction: The agent uses Google's Realtime Multimodal Model to listen, reason, and speak back to the caller with ultra-low latency.
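Step 3 depends on every call getting its own room. The snippet below is a minimal sketch of one way to derive such a name. The function `room_name_for_call`, the `wa-call` prefix, and the idea of keying on a call ID from the webhook are all illustrative assumptions, not the naming scheme used in whatsapp_server.py.

```python
import re
import uuid


def room_name_for_call(call_id: str, prefix: str = "wa-call") -> str:
    """Build a unique, URL-safe room name for one call (hypothetical scheme)."""
    # Keep only characters that are safe in URLs and log lines.
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "", call_id)[:32] or "unknown"
    # A random suffix keeps names unique even if the same call ID arrives twice.
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}-{safe_id}-{suffix}"


if __name__ == "__main__":
    print(room_name_for_call("wacid.ABC/123"))
```

The random suffix means a retried webhook for the same call would land in a fresh room. Whether that is desirable depends on how the server deduplicates events.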

Key Technical Features

  • Real-Time Multimodal AI: Powered by Google's latest voice models for natural, human-like conversations.
  • Advanced Audio Processing: Integrated BVC Noise Cancellation to ensure clarity even in noisy environments.
  • Meta Webhook Security: Full implementation of HMAC-SHA256 signature verification to protect against unauthorized requests.
  • Full Call Lifecycle Management: Automatic room creation, participant signaling, and automated teardown to optimize resource usage.
  • Outbound Calling Capability: Dedicated endpoint to initiate AI-driven calls directly to customers.
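The signature check mentioned above can be sketched with the standard library alone. Meta signs the raw request body with the app secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function name `is_valid_meta_signature` is an assumption for illustration; the actual helper in whatsapp_server.py may look different.

```python
import hashlib
import hmac
from typing import Optional


def is_valid_meta_signature(
    raw_body: bytes, signature_header: Optional[str], app_secret: str
) -> bool:
    """Return True if the header matches the HMAC-SHA256 of the raw body."""
    prefix = "sha256="
    if not signature_header or not signature_header.startswith(prefix):
        return False
    received = signature_header[len(prefix):]
    expected = hmac.new(
        app_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The digest must be computed over the exact bytes received, before any JSON parsing. Re-serializing the parsed payload can change whitespace or key order and make valid requests fail.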

🛠️ Technical Stack

  • Platform: LiveKit (Cloud or Self-Hosted).
  • Backend: FastAPI, Python 3.10+, Node.js (optional, for token generation).
  • AI Models: Google Gemini Multimodal (Voice: Aoede).
  • Integration: Meta Graph API (WhatsApp Business SDK).
  • Signaling: WebRTC (SDP Offer/Answer), HTTPX for async communication.

⚙️ Setup & Configuration

1. Environment Variables (.env.local)

Create a .env.local file with credentials for Meta, LiveKit, and Google:

# Meta / WhatsApp
META_ACCESS_TOKEN="your_token"
META_PHONE_NUMBER_ID="your_id"
META_APP_SECRET="your_secret"
META_WEBHOOK_VERIFY_TOKEN="your_verify_token"

# LiveKit
LIVEKIT_URL="wss://your-project.livekit.cloud"
LIVEKIT_API_KEY="your_key"
LIVEKIT_API_SECRET="your_secret"

# LLM
GOOGLE_API_KEY="your_google_key"
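A missing variable tends to surface only when the first call arrives, so a startup check can help. The sketch below assumes the values have already been loaded into the process environment. How .env.local is loaded is not shown in this README, and `missing_settings` is a hypothetical helper name.

```python
import os
from typing import List, Mapping

# Variable names taken from the .env.local listing above.
REQUIRED_VARS = (
    "META_ACCESS_TOKEN",
    "META_PHONE_NUMBER_ID",
    "META_APP_SECRET",
    "META_WEBHOOK_VERIFY_TOKEN",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GOOGLE_API_KEY",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing settings: " + ", ".join(missing))
    print("All required settings are present.")
```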

2. Installation

# Setup Python Environment
python -m venv venv
# Activate it (Windows Git Bash path shown; on Linux/macOS use venv/bin/activate)
source venv/Scripts/activate
pip install -r requirements.txt

# Setup Token Generator (Optional)
npm install

Execution Guide

To run the complete system, start two components, each in its own terminal:

Step 1: Start the LiveKit Agent Worker

The worker listens for rooms that need an AI assistant.

python agent.py dev

Step 2: Start the WhatsApp Integration Server

The server handles incoming webhooks and dispatches the workers.

uvicorn whatsapp_server:app --reload --port 8000
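Before Meta delivers any call events, it verifies the webhook URL with a GET request carrying `hub.mode`, `hub.verify_token`, and `hub.challenge`. The server must echo the challenge back when the token matches `META_WEBHOOK_VERIFY_TOKEN`. The following is a minimal, framework-independent sketch of that check. `verify_subscription` is a hypothetical name, and wiring it into a FastAPI route is left out.

```python
import hmac
from typing import Mapping, Optional


def verify_subscription(
    params: Mapping[str, str], expected_token: str
) -> Optional[str]:
    """Return the challenge to echo back, or None if verification fails."""
    mode = params.get("hub.mode")
    token = params.get("hub.verify_token", "")
    # Refuse to verify when no token is configured on the server side.
    if not expected_token or mode != "subscribe":
        return None
    if hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8")):
        return params.get("hub.challenge")
    return None
```

A route handler would return the challenge as the plain-text response body on success and respond with HTTP 403 when the function returns None.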

Performance & Security

  • Latency: Audio travels over WebRTC end to end, targeting sub-second response times.
  • Scalability: Each call runs as its own LiveKit agent job with its own JobContext, so concurrent voice sessions scale by adding workers.
  • Integrity: Every Meta request is validated with an HMAC-SHA256 signature before processing.
