TwilioVoiceBot

Project Overview

A voice-based AI bot using Twilio for communication, with a Vue.js frontend and Python backend. Based on https://www.twilio.com/en-us/blog/outbound-calls-python-openai-realtime-api-voice. More details at DeepWiki

Project Structure

Root Directory

_server.py: Main server-side Python script
_client.py: Alternative console-based client for batch processing phone calls. Useful for testing and development. Features include:
- Excel/CSV file processing for bulk phone numbers
- Phone number validation using phonenumbers library
- Organized call results storage
- Color-coded console output for better readability
- Integration with the main bot server for making calls
audio_processing.py: Comprehensive audio processing utilities:
- Stereo audio recording with preserved timing (need to be improved)
- Audio format conversion (µ-law to WAV)(by default, Twilio uses g711, a audio enconding for audio in VoIP)
- In this file, audio filters can be implemented. Currently, the filter implemented is a volume reduction, to reduce the current VAD server sensibility to noise, however, other more sophisticated filters can be impemented, however OpenAI VAD server will probably be improved in the future, making the necessity of filters less important. Also, take on count the the filter must be implemented in a real-time manner, making most of the popular denoising algorithms more complex to implement.
bot_initialization.py: Handles bot setup and session management:
- Dynamic tool loading from YAML configuration
- Voice and system message configuration
- OpenAI voice model configuration. Includes:
  - Turn detection with server-side Voice Activity Detection (VAD)
    - Threshold: 0.9 for speech detection
    - 300ms prefix padding for smooth transitions
    - 800ms silence duration for turn completion
  - Audio format: g711 µ-law for input/output (VoIP standard)
  - Voice selection (configurable via .env)
  - Multi-modal support: text and audio
  - Temperature: 0.6 for balanced response creativity (minimum is 0.6)
  - Dynamic tool integration (currently the only implemented tool is the end_call, used by the model to end the conversation. A data extraction tool is also implemented, used by the model to extract information from the conversation, however, the current tested model is a bit dumb, and often does not understand what the user said, so and independent data extraction moelule with a combination of transcription and structured data estraction was implemented, and it is applied over the recoreded audio file).
- Initial conversation script loading and execution
- Modular tool configuration system
info_extraction.py: Temporal information extraction system:
- Integration with OpenAI's API for transcription and analysis
- Speaker diarization using Pyannote Audio (this is to identify what speaker is speaking at any given moment, making easier for the model to correctly interpret the user input)
- Structured data extraction using Pydantic models (modifyng the pydantic models, allows to modify which information is extracted from the audio file. pydantic models for data extraaction is the class ClientInfoExtraction, at the begining of the file).
- If any acronym or specific name or word, that is uncommon, will be normally used, add it as prompt to the whisper model. Whisper model only supports up to 224 tokens as prompt input. More info about Whisper: https://platform.openai.com/docs/guides/speech-to-text
requirements.txt: Python dependencies
db.json: Local database/storage (Currently using TinyDB. This is a simple JSON file, but it could be replaced with a more robust database like MongoDB or PostgreSQL if needed. Take in count that TinyDB be default has no support concurrent writing, so it could be useful to switch to a more robust database in the future. Calling features are independent from the database.)

Directories

AI/: AI-related components
- prompts/: Prompt configuration files
- tools/: AI tool implementations
frontend/: Vue.js frontend application
- src/: Source code
  - components/: Reusable Vue components
  - pages/: Page-level Vue components
- Configuration files: .eslintrc, quasar.config.js, etc.
static/: Static assets (in this folder goes the compiled frontend files. Frontend is loosely coupled with the backend, so the frontend can be built and deployed separately, and other technologies can be used. Only enure to name the main html file "index.html")
recordings/: Stored voice recordings
call_results/: Results from voice calls (Only for the console client)
numbers_to_call/: Contact information (Only for the console client)

Key Technologies

Backend: Python
Frontend: Vue.js (Quasar Framework)
Voice Communication: Twilio
AI: OpenAI API

Setup and Installation

Install Python dependencies: pip install -r requirements.txt
Install frontend dependencies: cd frontend && npm install (for development only)
Configure environment variables in .env:
- OpenAI Configuration (Create account):
  - OPENAI_API_KEY: Your OpenAI API key (Get key)
  - OPENAI_REALTIME_MODEL: Realtime model (Docs) (default: gpt-4o-mini-realtime-preview)
  - OPENAI_AUDIO_VOICE: TTS voice model (Docs) (default: coral)
  - OPENAI_STT_MODEL: Speech-to-text model (Docs) (default: whisper-1)
  - OPENAI_TEXT_MODEL: Text generation model (Docs) (default: gpt-4o-mini)
- Twilio Configuration (Create account):
  - TWILIO_ACCOUNT_SID: Your Twilio Account SID
  - TWILIO_AUTH_TOKEN: Your Twilio Auth Token
  - PHONE_NUMBER_FROM: Your Twilio phone number
  - When using Twilio in Trial version, it is required to add a phone number as the verified number receive calls from Twilio, and also it is required asset it risk level. It is easier to run the code, and follow the instructions given in the console when Twilio API raises the exceptions. Otherwise, consult the Twilio documentation, (Verify phone number).
- Hugging Face Configuration(Create account):
  - HUGGINGFACE_TOKEN: Token for Pyannote Speaker Diarization (Get token)
  - Pyannote Segmentation is a Gated Model requiring organization name and email (Model page)
- Additional Settings:
  - MODEL_LANGUAGE: Language setting (default: es)
  - CALL_DURATION_LIMIT: Call duration limit in seconds
  - NGROK_TOKEN: Ngrok authentication token (Create account)
  - PORT: Server port (default: 6060)

Development

Backend: Python scripts in root directory
Frontend: Vue components in frontend/src/
AI Prompts: Configured in AI/prompts/

Server API

Call Management

POST /make_call
- Make an outbound call to a phone number
- Body: { "phone_number": string }
- Returns: Call details including status
- Timeout: will last until the call ends or times out.
- This is one of the main functions of the backend.

User Management

POST /users
- Create a new user
- Body: User data object
- Validates phone numbers with country code support
GET /users
- Get all users
- Returns: List of users
GET /users/{phone_number}
- Get specific user details
- Returns: User data or 404
PUT /users/{phone_number}
- Update user information
- Body: Updated user data
DELETE /users/{phone_number}
- Delete user and associated audio files
- Returns: Deletion confirmation

Call History

GET /call_histories
- Get call history with filters
- Query Parameters:
  - start_date: Filter by start date
  - end_date: Filter by end date
  - phone_number: Filter by phone number
  - call_status: Filter by call status
  - extracted_info_keyword: Search in extracted information
POST /call_histories/export
- Export call histories to Excel/CSV
- Body: Filter criteria

Bulk Operations

POST /users/import
- Import users from file
- Supports Excel, CSV, TXT formats
- File format:
  - Excel/CSV: columns = [phone_number, name(optional)]
  - TXT: Comma-separated phone numbers
GET /users/export
- Export users list
- Query Parameter: format (xlsx, csv, txt)
- Returns: File download

WebSocket Endpoint

WebSocket /media-stream
- Handles real-time audio streaming
- Manages communication between Twilio and OpenAI
- Supports:
  - Audio format: g711 µ-law
  - Turn detection
  - Real-time transcription and response
- This function, along with make_call, are the main functions of the backend. The rest are support functions to control call flows.
- Take in count that this functions contains two inner async loops, that takes care of the communication between Twilio and OpenAI, so any improvement made on it must NOT include sleeps or blocking operations, otherwise, it may affect the quality of the call.

Static Files

GET /
- Serves the frontend SPA
- Returns: index.html from static directory

Usage

Run the server: python _server.py

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TwilioVoiceBot

Project Overview

Project Structure

Root Directory

Directories

Key Technologies

Setup and Installation

Development

Server API

Call Management

User Management

Call History

Bulk Operations

WebSocket Endpoint

Static Files

Usage

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
AI		AI
__pycache__		__pycache__
frontend		frontend
static		static
.env_example		.env_example
.gitignore		.gitignore
README.md		README.md
_client.py		_client.py
_server.py		_server.py
audio_processing.py		audio_processing.py
bot_initialization.py		bot_initialization.py
db.json		db.json
info_extraction.py		info_extraction.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

TwilioVoiceBot

Project Overview

Project Structure

Root Directory

Directories

Key Technologies

Setup and Installation

Development

Server API

Call Management

User Management

Call History

Bulk Operations

WebSocket Endpoint

Static Files

Usage

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages