A complete pipeline for real-time audio ingestion, speaker diarization, transcription, and per-speaker summarization using local services and an LLM. Currently supports the Zoom meeting platform.
- Real-time audio streaming via WebSocket
- Speaker diarization and identification
- Automatic transcription using local Whisper model
- AI-powered per-speaker summaries via LLM
- Reverse proxy for unified endpoint access
- JSON API for transcripts and summaries
┌───────────────────────┐
│ Zoom Meet / Attendee│
│ Bot Audio │
└─────────┬─────────────┘
│ (WebSocket)
▼
┌───────────────────────────┐
│ Reverse Proxy (WebSocket)│
│ reverse_proxy.py │
└─────────┬─────────────────┘
│
▼
┌─────────────────────────────┐
│ Audio Consumer WebSocket │
│ websocket_server.py │
│ │
│ - DiarizedAudioProcessor │
│ → maps audio → speaker │
│ - SpeakerAudioAggregator │
│ → collects per-speaker │
│ audio segments │
│ - process_speaker_segment() │
│ → transcribes using │
│ Whisper model │
│ - TranscriptManager │
│ → stores per-speaker + │
│ complete transcript │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ Transcription Service │
│ transcription_service.py │
│ - WhisperTranscriptionService│
│ → converts audio → text │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ Transcript Microservice │
│ transcript_microservice.py │
│ - Stores raw & per-speaker │
│ transcripts locally │
│ - Fetch loop: polls reverse │
│ proxy for updated transcript│
│ - /transcript endpoint │
│ → returns JSON transcript │
│ - /summary endpoint │
│ → sends per-speaker text │
│ to LLM (Groq ) │
│ → returns summary JSON │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ External LLM API │
│ (Groq / OpenAI / Claude) │
│ - Summarizes per-speaker │
└─────────────────────────────┘
- **Audio Source**: meeting audio streams are sent to the WebSocket audio server.
- **WebSocket Audio Server** (`ws://127.0.0.1:5005`):
  - Receives audio chunks in real-time
  - Groups audio by speaker using `SpeakerAudioAggregator`
  - Maps audio chunks to speakers via `DiarizedAudioProcessor`
  - Sends completed segments to Whisper for transcription
- **Webhook / Local Transcript Server** (`http://127.0.0.1:5006` / `5007`):
  - Receives speaker diarization updates via webhooks
  - Creates transcripts and stores them in `TranscriptManager`
  - Exposes transcripts as JSON at `/transcripts` on port 5007
- **Reverse Proxy** (`http://127.0.0.1:8080`):
  - Consolidates WebSocket and HTTP traffic under one public port
  - Routes `/attendee-websocket*` to the WebSocket server
  - Routes `/webhook*` to the webhook server
  - Supports bidirectional streaming
- **Microservice** (FastAPI on port 8000):
  - Polls local transcripts every 5 seconds
  - Maintains raw transcripts and per-speaker stores
  - Generates per-speaker summaries via the Groq LLM
  - Exposes endpoints for transcripts and summaries
- Speaker diarization and audio chunk mapping (`DiarizedAudioProcessor`)
- Transcript management (`TranscriptManager`)
- WebSocket audio server
- Webhook HTTP server
- Local transcript HTTP server
- Use of `dataclass` for `TranscriptUtterance`
- Asynchronous handling with `asyncio` and `websockets`
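The `TranscriptUtterance` dataclass can be sketched roughly as follows. This is not the actual definition; the field names are assumptions taken from the transcript JSON example shown later in this README:

```python
from dataclasses import dataclass, asdict

# Hypothetical sketch of TranscriptUtterance; field names follow the
# transcript JSON example in this README, not the real source file.
@dataclass
class TranscriptUtterance:
    speaker_uuid: str
    speaker_name: str
    transcription: str
    timestamp_ms: int
    end_timestamp_ms: int
    speaker_is_host: bool = False

    @property
    def duration_ms(self) -> int:
        # Derived field, so it never drifts out of sync with the timestamps.
        return self.end_timestamp_ms - self.timestamp_ms

    def to_dict(self) -> dict:
        d = asdict(self)
        d["duration_ms"] = self.duration_ms
        return d
```

Using a `dataclass` keeps each utterance immutable in shape and trivially serializable for the JSON endpoints.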
- Defines a base interface (`TranscriptionService`) for audio-to-text transcription.
- Implements Whisper-based transcription with `WhisperTranscriptionService`.
- Loads a Whisper model (`tiny`/`base`/`small`/...) on initialization.
- The `transcribe` method converts a NumPy audio array into text, language, and segments.
- `get_transcription_service()` returns the singleton instance.
- `set_transcription_service()` allows replacing the global service (for testing or swapping models).
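A minimal sketch of this interface and the singleton accessors might look like the following. The exact method signatures are assumptions; the Whisper calls use the standard `openai-whisper` API (`whisper.load_model`, `model.transcribe`):

```python
from abc import ABC, abstractmethod

class TranscriptionService(ABC):
    """Base interface for audio-to-text transcription (sketch)."""

    @abstractmethod
    def transcribe(self, audio, sample_rate: int = 16000) -> dict:
        """audio: float32 NumPy array; returns text, language, and segments."""

class WhisperTranscriptionService(TranscriptionService):
    def __init__(self, model_size: str = "base"):
        import whisper  # requires the openai-whisper package
        self.model = whisper.load_model(model_size)  # tiny/base/small/...

    def transcribe(self, audio, sample_rate: int = 16000) -> dict:
        result = self.model.transcribe(audio)
        return {
            "text": result["text"],
            "language": result.get("language"),
            "segments": result.get("segments", []),
        }

_service = None

def get_transcription_service() -> TranscriptionService:
    """Return the process-wide singleton, creating it on first use."""
    global _service
    if _service is None:
        _service = WhisperTranscriptionService()
    return _service

def set_transcription_service(service: TranscriptionService) -> None:
    """Swap the global service (e.g., a fake for tests or a larger model)."""
    global _service
    _service = service
```

The setter makes the heavyweight Whisper dependency easy to stub out in unit tests.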
- `SpeakerAudioSegment`: stores audio, timestamps, and speaker info for a single segment.
- `SpeakerAudioAggregator`: groups incoming audio chunks by speaker into segments.
  - `add_audio_chunk`: adds audio; finalizes a segment if it grows too long or a new speaker appears.
  - `finalize_stale_segments`: ends segments that have been idle for a while.
  - `finalizeAllSegments`: ends all active segments (e.g., at meeting end).
  - Tracks active speakers and provides segment stats for transcription or analysis.
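The aggregation rules above can be sketched like this. The 30-second segment cap, the 2-second idle timeout, and the method bodies are illustrative assumptions, not the real implementation:

```python
import time

class SpeakerAudioSegment:
    """One contiguous run of audio from a single speaker (sketch)."""

    def __init__(self, speaker_id, start_ts):
        self.speaker_id = speaker_id
        self.start_ts = start_ts
        self.chunks = []
        self.last_update = start_ts

class SpeakerAudioAggregator:
    """Groups incoming audio chunks by speaker into segments (sketch)."""

    def __init__(self, max_segment_s=30.0, stale_after_s=2.0):
        self.max_segment_s = max_segment_s
        self.stale_after_s = stale_after_s
        self.active = {}  # speaker_id -> SpeakerAudioSegment

    def add_audio_chunk(self, speaker_id, chunk, now=None):
        """Append audio; return a finalized segment if the active one grew too long."""
        if now is None:
            now = time.time()
        finalized = None
        seg = self.active.get(speaker_id)
        if seg is not None and now - seg.start_ts >= self.max_segment_s:
            finalized = self.active.pop(speaker_id)
            seg = None
        if seg is None:
            seg = SpeakerAudioSegment(speaker_id, now)
            self.active[speaker_id] = seg
        seg.chunks.append(chunk)
        seg.last_update = now
        return finalized

    def finalize_stale_segments(self, now=None):
        """End segments that have been idle longer than the timeout."""
        if now is None:
            now = time.time()
        stale = [sid for sid, seg in self.active.items()
                 if now - seg.last_update >= self.stale_after_s]
        return [self.active.pop(sid) for sid in stale]

    def finalizeAllSegments(self):
        """End every active segment (e.g., when the meeting ends)."""
        done = list(self.active.values())
        self.active.clear()
        return done
```

Finalized segments are what get handed to Whisper; bounding their length keeps transcription latency predictable.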
- **Audio WebSocket Server**:
  - Runs on `ws://127.0.0.1:5005`.
  - Receives real-time audio chunks from meetings or clients.
  - Buffers audio per speaker using `SpeakerAudioAggregator`.
  - Sends completed segments for transcription using `WhisperTranscriptionService`.
- **Webhook / Local Transcript HTTP Server**:
  - Runs on `http://127.0.0.1:5006` (webhooks) and `http://127.0.0.1:5007` (local transcript).
  - Receives speaker diarization updates (`/webhook`).
  - Exposes local transcripts via JSON (`/transcripts`, `/transcripts/per-speaker`).
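As an illustration, the audio server's receive path might look like the sketch below. The incoming message shape (a JSON frame carrying a base64 PCM chunk plus a speaker id) is an assumption made for this example, not Attendee's documented schema:

```python
import base64
import json

def decode_audio_message(raw: str):
    """Parse one WebSocket frame into (speaker_id, raw PCM bytes).

    The JSON layout used here is assumed for illustration.
    """
    msg = json.loads(raw)
    pcm = base64.b64decode(msg["data"]["chunk"])
    return msg["data"].get("speaker_id"), pcm

async def handle_connection(websocket, aggregator):
    """Feed every decoded chunk into the per-speaker aggregator."""
    async for raw in websocket:
        speaker_id, pcm = decode_audio_message(raw)
        aggregator.add_audio_chunk(speaker_id, pcm)

# Serving would then be something like (with the websockets package):
#   websockets.serve(lambda ws: handle_connection(ws, aggregator),
#                    "127.0.0.1", 5005)
```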
- Consolidates multiple local services under one public port.
- WebSocket proxy: `/attendee-websocket*` forwards to `ws://127.0.0.1:5005`.
  - Maintains bidirectional streaming between the client and the local audio server.
- HTTP proxy: `/webhook*` and `/transcripts*` forward to `http://127.0.0.1:5006`.
  - Preserves the request method, headers, and body; returns the backend response.
- Uses `pyngrok` to create a secure tunnel to the reverse proxy on port 8080.
- Provides a public URL for external services to connect to local servers.
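The routing rule is simple enough to state in a few lines; this sketch mirrors the path prefixes and backend targets listed above (the function itself is illustrative, not the actual proxy code):

```python
def route(path: str):
    """Map an incoming request path to its local backend (sketch)."""
    if path.startswith("/attendee-websocket"):
        return ("ws", "ws://127.0.0.1:5005")
    if path.startswith("/webhook") or path.startswith("/transcripts"):
        return ("http", "http://127.0.0.1:5006")
    return None  # unknown paths are rejected
```

Keeping the rule prefix-based means one ngrok tunnel on port 8080 can serve both the bot's audio WebSocket and its webhook callbacks.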
- Purpose: FastAPI microservice to fetch transcripts and generate per-speaker summaries using an LLM.
- Fetch loop: polls `http://127.0.0.1:5007` every 5 seconds to update local stores:
  - `raw_transcripts` → full JSON
  - `per_speaker_store` → transcripts grouped by speaker
- Endpoints:
  - `GET /transcripts` → returns the raw transcript JSON
  - `GET /summary` → returns LLM-generated summaries per speaker
- LLM usage: sends each speaker's transcript to the Groq LLM to generate a concise summary.
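A rough sketch of the per-speaker grouping and summarization step. The transcript field names come from the example JSON later in this README; the Groq model name and prompt wording are placeholder assumptions:

```python
from collections import defaultdict

TRANSCRIPT_URL = "http://127.0.0.1:5007/transcripts"  # polled every 5 seconds

def group_by_speaker(transcript_json: dict) -> dict:
    """Collapse the raw transcript into one text blob per speaker."""
    store = defaultdict(list)
    for utt in transcript_json.get("transcripts", []):
        store[utt["speaker_name"]].append(utt["transcription"])
    return {name: " ".join(parts) for name, parts in store.items()}

def build_summary_prompt(speaker: str, text: str) -> str:
    return f"Concisely summarize what {speaker} said in this meeting:\n\n{text}"

def summarize_all(client, per_speaker: dict,
                  model: str = "llama-3.1-8b-instant") -> dict:
    """client is a groq.Groq instance; the model name is an assumption."""
    out = {}
    for speaker, text in per_speaker.items():
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": build_summary_prompt(speaker, text)}],
        )
        out[speaker] = resp.choices[0].message.content
    return out
```

The resulting dict maps directly onto the `summary_per_speaker` object returned by `GET /summary`.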
- Python 3.8+
- ngrok (for public tunnel)
- Groq API key (for LLM summaries)
```bash
pip install -r requirements.txt
```

Run the following services in separate terminal windows:

```bash
python audio_consumer/websocket_server.py
python reverse_proxy.py
ngrok http 8080
```

Copy the `https://<ngrok-id>.ngrok-free.app` URL for configuration.
```bash
python microservice.py
```

```bash
curl -X POST 'https://app.attendee.dev/api/v1/bots' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Token YOUR_API_TOKEN' \
  -d '{
    "meeting_url": "https://us05web.zoom.us/j/YOUR_MEETING_ID",
    "bot_name": "Meeting Bot",
    "websocket_settings": {
      "audio": {
        "url": "wss://YOUR-NGROK-URL.ngrok-free.app/attendee-websocket",
        "sample_rate": 16000
      }
    },
    "webhooks": [
      {
        "url": "https://YOUR-NGROK-URL.ngrok-free.app/webhook",
        "triggers": ["transcript.update", "bot.state_change"]
      }
    ]
  }'
```

Raw transcripts: `http://127.0.0.1:5007/transcripts`

Per-speaker summaries: `http://127.0.0.1:8000/summary`
```json
{
  "format": "json",
  "meeting_info": {
    "start_time_ms": 1763438437019,
    "end_time_ms": 1763438704648,
    "total_segments": 9,
    "speakers": ["deepit shah"]
  },
  "transcripts": [
    {
      "timestamp_ms": 1763438437019,
      "end_timestamp_ms": 1763438467023,
      "duration_ms": 30004,
      "speaker_uuid": "16778240",
      "speaker_name": "deepit shah",
      "speaker_is_host": true,
      "transcription": "Hi, so let us everything is up and running...",
      "audio_samples": 480160,
      "sample_rate": 16000,
      "processed_at": "2025-11-17T20:03:45.282319"
    }
  ]
}
```

```json
{
  "summary_per_speaker": {
    "deepit shah": "The speaker is setting up and demonstrating a system with multiple components, including a web audio server, webhook server, local transcript server, and a microservice. The microservice generates summaries using an LLM..."
  }
}
```

- 5005: WebSocket Audio Server
- 5006: Webhook Server
- 5007: Local Transcript Server
- 8000: Microservice (FastAPI)
- 8080: Reverse Proxy
```bash
export GROQ_API_KEY="your_groq_api_key"
export WHISPER_MODEL="base"  # Options: tiny, base, small, medium, large
```

- Ensure ngrok is running and the URL is correctly configured
- Check firewall settings for port 8080
- Increase Whisper model size (base → small → medium)
- Verify audio sample rate is 16000 Hz
- Verify Groq API key is set
- Check microservice logs for LLM errors