A low-latency voice AI system that integrates WhatsApp Business calls with the LiveKit Multimodal Agent Framework, bridging traditional telephony and generative voice AI.
This project implements a Telephony-to-AI Gateway. It lets users call a WhatsApp Business number and talk to an AI agent in real time.
- Incoming Call: Meta triggers a webhook on our FastAPI server (whatsapp_server.py).
- WebRTC Signaling: The server performs SDP (Session Description Protocol) negotiation between Meta and LiveKit.
- Room Orchestration: A unique LiveKit room is created and the caller is added as a participant.
- Agent Dispatch: The LiveKit Agent Worker (agent.py) is dispatched to the room.
- Multimodal Interaction: The agent uses Google's Realtime Multimodal Model to listen, reason, and speak back to the caller with ultra-low latency.
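The call flow above can be sketched as a single async handler. This is only an illustration of the step ordering as listed, not the code in whatsapp_server.py: `FakeBridge`, `handle_incoming_call`, the method names, and the `wa-call-` room naming scheme are all hypothetical, and the real server would call the Meta Graph API and LiveKit APIs where the stand-in simply records each step.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeBridge:
    """Hypothetical stand-in for the Meta and LiveKit clients; it only records steps."""
    steps: List[str] = field(default_factory=list)

    async def negotiate_sdp(self, sdp_offer: str) -> str:
        self.steps.append("negotiate_sdp")
        return "v=0 (placeholder answer)"  # a real bridge would return LiveKit's SDP answer

    async def create_room(self, room: str) -> None:
        self.steps.append("create_room")

    async def dispatch_agent(self, room: str) -> None:
        self.steps.append("dispatch_agent")


async def handle_incoming_call(bridge, call_id: str, sdp_offer: str) -> dict:
    """Run the steps in the order the list describes (assumed, not confirmed)."""
    sdp_answer = await bridge.negotiate_sdp(sdp_offer)    # WebRTC signaling
    room = f"wa-call-{call_id}-{uuid.uuid4().hex[:8]}"    # unique room per call
    await bridge.create_room(room)                        # room orchestration
    await bridge.dispatch_agent(room)                     # agent dispatch
    return {"room": room, "sdp_answer": sdp_answer}


if __name__ == "__main__":
    bridge = FakeBridge()
    print(asyncio.run(handle_incoming_call(bridge, "abc123", "v=0")))
    print(bridge.steps)
```

Passing the bridge in as a parameter keeps the ordering logic testable without network access to Meta or LiveKit.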
- Real-Time Multimodal AI: Powered by Google's latest voice models for natural, human-like conversations.
- Advanced Audio Processing: Integrated BVC Noise Cancellation to ensure clarity even in noisy environments.
- Meta Webhook Security: Full implementation of HMAC-SHA256 signature verification to protect against unauthorized requests.
- Full Call Lifecycle Management: Automatic room creation, participant signaling, and automated teardown to optimize resource usage.
- Outbound Calling Capability: Dedicated endpoint to initiate AI-driven calls directly to customers.
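Besides per-request signature checks, Meta webhooks also go through a one-time subscription handshake: Meta sends a GET request with `hub.mode`, `hub.verify_token`, and `hub.challenge`, and expects the challenge echoed back when the token matches. The sketch below shows that check as a plain function, assuming the token comes from `META_WEBHOOK_VERIFY_TOKEN`. The function name and the `(status, body)` return shape are hypothetical and not taken from whatsapp_server.py.

```python
import hmac
from typing import Mapping, Tuple


def answer_verification_request(params: Mapping[str, str], verify_token: str) -> Tuple[int, str]:
    """Return (HTTP status, response body) for Meta's webhook verification GET."""
    mode = params.get("hub.mode", "")
    token = params.get("hub.verify_token", "")
    challenge = params.get("hub.challenge", "")
    # Constant-time comparison avoids leaking token contents through timing.
    token_ok = hmac.compare_digest(token.encode("utf-8"), verify_token.encode("utf-8"))
    if mode == "subscribe" and token_ok:
        return 200, challenge  # echo the challenge back as the response body
    return 403, "verification failed"
```

In a FastAPI route this would typically receive `request.query_params` and return the body as plain text.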
- Platform: LiveKit (Cloud or Self-Hosted).
- Backend: FastAPI, Python 3.10+, Node.js (for token generation).
- AI Models: Google Gemini Multimodal (Voice: Aoede).
- Integration: Meta Graph API (WhatsApp Business SDK).
- Signaling: WebRTC (SDP Offer/Answer), HTTPX for async communication.
The system needs credentials for Meta, LiveKit, and Google, supplied as environment variables:
# Meta / WhatsApp
META_ACCESS_TOKEN="your_token"
META_PHONE_NUMBER_ID="your_id"
META_APP_SECRET="your_secret"
META_WEBHOOK_VERIFY_TOKEN="your_verify_token"
# LiveKit
LIVEKIT_URL="wss://your-project.livekit.cloud"
LIVEKIT_API_KEY="your_key"
LIVEKIT_API_SECRET="your_secret"
# LLM
GOOGLE_API_KEY="your_google_key"
# Setup Python Environment
python -m venv venv
source venv/Scripts/activate  # Windows (Git Bash); on Linux/macOS use: source venv/bin/activate
pip install -r requirements.txt
# Setup Token Generator (Optional)
npm install
To run the complete system, you need to run two components:
The worker listens for rooms that need an AI assistant.
python agent.py dev
The server handles incoming webhooks and dispatches the workers.
uvicorn whatsapp_server:app --reload --port 8000
- Latency: Sub-second response times using WebRTC.
- Scalability: LiveKit's JobContext allows for thousands of concurrent voice sessions.
- Integrity: Every Meta request is validated with an HMAC-SHA256 signature check before processing.
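The integrity check can be sketched with the standard library alone. Meta signs each webhook payload with the app secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function below is a minimal sketch, assuming the secret comes from `META_APP_SECRET` and the raw, unparsed request body is available. The function name is hypothetical and the actual implementation in whatsapp_server.py may differ.

```python
import hashlib
import hmac
from typing import Optional


def is_valid_meta_signature(app_secret: str, raw_body: bytes, signature_header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header value against the raw request body."""
    prefix = "sha256="
    if not signature_header or not signature_header.startswith(prefix):
        return False
    received = signature_header[len(prefix):]
    # Recompute the HMAC over the exact bytes Meta sent, keyed with the app secret.
    expected = hmac.new(app_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Compare in constant time to avoid timing side channels.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The digest has to be computed over the raw bytes, before any JSON parsing, since re-serializing the payload can change whitespace or key order and break the match.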