# AudioGroupChat

A real-time audio group chat implementation enabling voice and text communication between humans and AI agents. The project combines WebRTC, speech-to-text, text-to-speech, and large language models (LLMs) to create interactive multi-party conversations.
## Features

- Real-time audio communication using WebRTC
- Multiple AI agents with distinct voices and personalities
- Text-to-Speech (TTS) with customizable voice options
- Speech-to-Text (STT) for human voice input
- Round-robin speaker selection for balanced conversations
- Gradio-based web interface for easy interaction
- Support for both voice and text channels
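The round-robin speaker selection listed above can be sketched in a few lines. This is an illustrative stand-in, not the project's implementation (which lives in `audio_groupchat.py`), and the agent names are placeholders:

```python
from itertools import cycle

def make_round_robin(agents):
    """Return a callable that yields agents in a fixed, repeating order,
    so every participant gets an equal number of speaking turns."""
    order = cycle(agents)
    return lambda: next(order)

# Placeholder agent names for illustration.
next_speaker = make_round_robin(["alice", "bob", "carol"])
turns = [next_speaker() for _ in range(6)]
# Each agent speaks exactly twice, always in the same rotation.
```

Because the rotation is fixed, no agent can speak twice in a row, which matches the `allow_repeat_speaker=False` default described in the API reference below.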
## Prerequisites

- Python 3.8+
- Node.js (for frontend components)
- Ollama (for local LLM support)
## Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd AudioGroupChat
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Configuration

1. Configure Ollama settings in `main_app.py`:

   ```python
   config_list = [{
       "model": "gemma3:1b",  # or another supported model
       "base_url": "http://localhost:11434/v1",
       "price": [0.00, 0.00],
   }]
   ```

2. (Optional) Set Twilio TURN server credentials for improved WebRTC connectivity:

   ```bash
   export TWILIO_ACCOUNT_SID=your_account_sid
   export TWILIO_AUTH_TOKEN=your_auth_token
   ```
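Ollama serves an OpenAI-compatible API at the `base_url` above, so each `config_list` entry maps directly onto a chat-completion request. The `build_chat_request` helper below is illustrative (it is not part of this project) and only shows how the config fields would feed such a request:

```python
def build_chat_request(config, messages):
    """Map one config_list entry onto an OpenAI-style chat-completion
    request: the endpoint URL comes from base_url, the model name from
    the config, and the conversation history rides along as messages."""
    return {
        "url": config["base_url"].rstrip("/") + "/chat/completions",
        "payload": {"model": config["model"], "messages": messages},
    }

config_list = [{
    "model": "gemma3:1b",
    "base_url": "http://localhost:11434/v1",
    "price": [0.00, 0.00],
}]
req = build_chat_request(config_list[0], [{"role": "user", "content": "Hi"}])
# req["url"] -> "http://localhost:11434/v1/chat/completions"
```

Sending `req["payload"]` to `req["url"]` with any HTTP client would reach the local Ollama server, provided Ollama is running.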
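The two environment variables above are typically consumed along these lines. `get_ice_servers` is a hypothetical helper showing the common pattern (prefer Twilio TURN when credentials exist, otherwise fall back to a public STUN server), not code from this repository:

```python
import os

def get_ice_servers():
    """Choose WebRTC ICE servers based on available Twilio credentials."""
    sid = os.environ.get("TWILIO_ACCOUNT_SID")
    token = os.environ.get("TWILIO_AUTH_TOKEN")
    if sid and token:
        # With credentials present, the app would call the Twilio API to
        # fetch short-lived TURN servers; here we only signal that path.
        return {"provider": "twilio", "account_sid": sid}
    # Fallback: STUN only, which can fail behind symmetric NATs --
    # this is why TURN improves connectivity.
    return {"provider": "stun", "urls": ["stun:stun.l.google.com:19302"]}
```

TURN relays media when a direct peer-to-peer path cannot be established, so the Twilio step is optional but helps on restrictive networks.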
## Usage

1. Start the application:

   ```bash
   python main_app.py
   ```

2. Open the provided Gradio interface URL in your browser (typically http://localhost:7860).

3. Start a conversation by:
   - Speaking into your microphone
   - Typing text messages
   - Using the provided UI controls
## Project Structure

- `main_app.py`: Main application entry point
- `audio_groupchat.py`: Core audio group chat implementation
- `gradio_ui.py`: Gradio web interface components
- `test_group_chat.py`: Test cases and examples
## Voice Options

The system supports multiple voice options for AI agents:
- Energetic (fast, US English)
- Calm (slower, US English)
- British (UK English)
- Authoritative (moderate speed, US English)
- Default (standard US English)
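One plausible way to encode the profiles above is a small table of TTS parameters, such as speech rate and locale. The names and fields here are illustrative assumptions; the actual mapping is defined in the project code:

```python
# Hypothetical voice profile table: relative speech rate plus locale code.
VOICE_PROFILES = {
    "energetic":     {"rate": 1.3, "lang": "en-US"},  # fast, US English
    "calm":          {"rate": 0.8, "lang": "en-US"},  # slower, US English
    "british":       {"rate": 1.0, "lang": "en-GB"},  # UK English
    "authoritative": {"rate": 1.0, "lang": "en-US"},  # moderate, US English
    "default":       {"rate": 1.0, "lang": "en-US"},  # standard US English
}

def voice_for(name):
    """Look up a profile case-insensitively, falling back to the default."""
    return VOICE_PROFILES.get(name.lower(), VOICE_PROFILES["default"])
```

A table like this lets each agent carry just a voice name while the TTS layer resolves the concrete synthesis parameters.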
## API Reference

### AudioGroupChat

```python
class AudioGroupChat(GroupChat):
    def __init__(self, agents=None, messages=None, max_round=10,
                 speaker_selection_method="round_robin",
                 allow_repeat_speaker=False):
        ...
```
Key methods:

- `initialize()`: Set up audio processing components
- `add_human_participant(user_id)`: Add a human participant
- `start_audio_session(user_id)`: Start an audio session
### GradioUI

```python
class GradioUI:
    def __init__(self, audio_chat: AudioGroupChat): ...
    def create_interface(self) -> gr.Blocks: ...
```
## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request
## License

This project is licensed under the MIT License; see the LICENSE file for details.