"Speak your language, they hear theirs."
FluentMeet is a state-of-the-art, real-time voice translation video conferencing platform. It eliminates language barriers in global professional collaborations by providing instantaneous, natural-sounding voice translation, allowing participants to communicate naturally in their native tongues.
## Key Features
- Near-Instantaneous Translation: Targeted glass-to-glass latency of under 1.5 seconds.
- Natural Voice Synthesis: High-quality TTS that preserves the natural flow of conversation.
- Zero-Friction Client: Participants join via a secure link with no mandatory account creation for guests.
- Intelligent Audio Routing: An "SFU-Lite" routing layer that forwards raw or translated audio according to each participant's language preference.
- Dual-Language Captions: Real-time transcripts showing both original and translated text concurrently.
- Professional Vocabulary: Optimised for business, technical, and domain-specific context.
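The "Intelligent Audio Routing" feature boils down to a per-listener decision: same-language listeners get the untouched raw stream, everyone else gets a synthesized translation. A minimal sketch of that decision (the `Participant` model, `select_stream` name, and stream labels are illustrative, not the actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    language: str  # e.g. "en", "de", "zh"

def select_stream(speaker: Participant, listener: Participant) -> str:
    """Decide which audio stream a listener should receive.

    Listeners who share the speaker's language get the raw stream
    (lowest latency); everyone else gets the synthesized translation
    in their own language.
    """
    if listener.language == speaker.language:
        return "raw"
    return f"translated:{listener.language}"

speaker = Participant("Alice", "en")
listeners = [Participant("Bob", "en"), Participant("Chen", "zh")]
routes = {p.name: select_stream(speaker, p) for p in listeners}
# routes == {"Bob": "raw", "Chen": "translated:zh"}
```

In a real SFU this mapping would be consulted on every egress frame, so keeping it a pure function of (speaker, listener) makes it cheap to evaluate and easy to test.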
## Tech Stack
- Framework: FastAPI (Asynchronous, high-concurrency architecture).
- Data Persistence: PostgreSQL with SQLAlchemy 2.0 (Async).
- Migration Management: Alembic (Configured for asynchronous migrations).
- Event Streaming: Apache Kafka (Decoupled, event-driven audio processing pipeline).
- Real-time Communication: WebSockets for media signaling and caption streaming.
- In-Memory Store: Redis for live room state, participant sessions, and rate-limiting.
- STT (Speech-to-Text): Deepgram / OpenAI Whisper (High-accuracy streaming).
- Machine Translation: DeepL API / GPT-4o (Context-aware translation).
- TTS (Text-to-Speech): Voice.ai (Natural audio synthesis).
## Architecture
FluentMeet uses an event-driven pipeline to keep latency minimal and scale horizontally:
- Ingest: Speaker's audio is captured via WebRTC and streamed over WebSockets to the Backend.
- STT: Raw audio chunks are pushed to Kafka (`audio.raw`), consumed by STT workers, and converted to text.
- Translation: Original text is pushed to Kafka (`text.original`), consumed by Translation workers, and translated into the target language.
- TTS: Translated text is pushed to Kafka (`text.translated`), consumed by TTS workers, and synthesized into target-language audio.
- Egress: Synthesized audio is pushed back to Kafka (`audio.synthesized`) and routed via WebSockets to the listeners who require that language.
```mermaid
graph TD
    UserA[Speaker] -->|WebRTC/WS| Backend[Signaling & Routing Server]
    Backend -->|Raw Audio| K1((Kafka: audio.raw))
    K1 --> STT[STT Worker]
    STT -->|Original Text| K2((Kafka: text.original))
    K2 --> TL[Translation Worker]
    TL -->|Translated Text| K3((Kafka: text.translated))
    K3 --> TTS[TTS Worker]
    TTS -->|Synthesized Audio| K4((Kafka: audio.synthesized))
    K4 --> Backend
    Backend -->|Translated Audio| UserB[Listener]
```
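The pipeline above can be traced end-to-end with a toy in-process version, using a plain dict as a stand-in for the Kafka topics and trivial stubs for the STT/MT/TTS workers (all function names and payloads here are illustrative, not the real workers):

```python
from collections import defaultdict

# In-memory stand-in for Kafka: topic name -> list of messages.
topics: dict[str, list] = defaultdict(list)

def produce(topic: str, message) -> None:
    topics[topic].append(message)

def stt_worker(audio_chunk: bytes) -> None:
    # A real worker would call Deepgram/Whisper; we fake a transcript.
    produce("text.original", f"transcript({len(audio_chunk)} bytes)")

def translation_worker(text: str, target_lang: str) -> None:
    # Stand-in for DeepL / GPT-4o translation.
    produce("text.translated", f"[{target_lang}] {text}")

def tts_worker(text: str) -> None:
    # Stand-in for Voice.ai synthesis: text in, audio bytes out.
    produce("audio.synthesized", text.encode())

# Drive one utterance through all four stages.
produce("audio.raw", b"\x00" * 320)
for chunk in topics["audio.raw"]:
    stt_worker(chunk)
for text in topics["text.original"]:
    translation_worker(text, "de")
for text in topics["text.translated"]:
    tts_worker(text)

# topics["audio.synthesized"] now holds audio ready for egress routing.
```

Because each stage only talks to its input and output topics, the real workers can be scaled and deployed independently — that decoupling is the point of running the pipeline over Kafka rather than in-process.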
## Getting Started
### Prerequisites
- Python 3.11+
- Docker and Docker Compose
- Access to Deepgram, DeepL, and Voice.ai APIs (API Keys needed).
### Installation
Clone the repository:
```bash
git clone <repository-url>
cd FluentMeet
```
Copy the example environment file and fill in your credentials:
```bash
cp .env.example .env
```
Generate a secure SECRET_KEY for JWT:
```bash
python -c "import secrets; print(secrets.token_hex(32))"
```
It is highly recommended to use a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
Start the required infrastructure (PostgreSQL, Redis, Kafka, Zookeeper):
```bash
docker-compose up -d
```
Initialize the database schema using Alembic:
```bash
alembic upgrade head
```
Start the development server:
```bash
uvicorn app.main:app --reload
```
The API will be available at http://localhost:8000, with interactive API documentation (Swagger UI) at http://localhost:8000/docs.
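The SECRET_KEY generated during setup is what the server uses to sign JWTs. In practice you would use a maintained library such as PyJWT rather than rolling your own, but a stdlib-only sketch shows the mechanics of HS256 signing (the payload fields and secret below are illustrative):

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_jwt(payload: dict, secret: str) -> str:
    """Sign a compact HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

# Illustrative secret; in the app this comes from the .env SECRET_KEY.
secret = "replace-with-your-generated-SECRET_KEY"
token = make_jwt({"sub": "user-123", "exp": int(time.time()) + 900}, secret)
# token is the familiar three dot-separated base64url segments
```

Anyone holding the SECRET_KEY can mint valid tokens, which is why the setup step insists on generating it with `secrets` rather than hard-coding a guessable value.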
## Development
### Database Migrations
Create your SQLAlchemy models in `app/models.py` using the async syntax. Ensure your models are imported in `app/models/__init__.py` for Alembic to detect them during migrations:
```bash
python -m alembic revision --autogenerate -m "Add Meeting model"
python -m alembic upgrade head
```
### Testing
Run the test suite:
```bash
pytest
```
Generate and view a coverage report:
```bash
pytest tests/ -v --cov=app --cov-report=html --cov-report=term
# Open htmlcov/index.html in your browser
```
## Security
- Authentication: JWT-based authentication with `HttpOnly`, `Secure`, `SameSite=Strict` cookies for Refresh Tokens.
- Data Privacy: Ephemeral audio/text processing; no data is persisted after the meeting ends.
- Rate Limiting: Redis-backed throttling to manage API costs and prevent abuse.
- Soft-Delete: Strict account deletion policies preventing reactivation via login.
## Code Quality
- Black: Enforce consistent code formatting.
```bash
black .
```
- isort: Sort imports for readability.
```bash
isort .
```
- ruff: Linting for code quality and style.
```bash
python -m ruff check .
```
## Contributing
We welcome contributions! Please follow these steps:
- Fork the repository.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Ensure your code follows Black and isort formatting.
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
## License
This project is licensed under the MIT License; see the LICENSE file for details.