TechMax101/AI-Meeting-Bot

Real-Time Meeting Transcription and Summarization System

A complete pipeline for real-time audio ingestion, speaker diarization, transcription, and per-speaker summarization using local services and an LLM. Currently supports Zoom meetings.

Features

  • Real-time audio streaming via WebSocket
  • Speaker diarization and identification
  • Automatic transcription using local Whisper model
  • AI-powered per-speaker summaries via LLM
  • Reverse proxy for unified endpoint access
  • JSON API for transcripts and summaries

Architecture Overview

                     ┌───────────────────────┐
                     │  Zoom Meet / Attendee│
                     │      Bot Audio         │
                     └─────────┬─────────────┘
                               │  (WebSocket)
                               ▼
                  ┌───────────────────────────┐
                  │  Reverse Proxy (WebSocket)│
                  │      reverse_proxy.py     │
                  └─────────┬─────────────────┘
                            │
                            ▼
               ┌─────────────────────────────┐
               │  Audio Consumer WebSocket   │
               │  websocket_server.py        │
               │                             │
               │ - DiarizedAudioProcessor    │
               │   → maps audio → speaker    │
               │ - SpeakerAudioAggregator    │
               │   → collects per-speaker   │
               │     audio segments          │
               │ - process_speaker_segment() │
               │   → transcribes using      │
               │     Whisper model           │
               │ - TranscriptManager         │
               │   → stores per-speaker +    │
               │     complete transcript     │
               └─────────┬───────────────────┘
                         │
                         ▼
              ┌─────────────────────────────┐
              │  Transcription Service       │
              │  transcription_service.py   │
              │ - WhisperTranscriptionService│
              │   → converts audio → text    │
              └─────────┬───────────────────┘
                        │
                        ▼
              ┌─────────────────────────────┐
              │  Transcript Microservice    │
              │  microservice.py            │
              │ - Stores raw & per-speaker  │
              │   transcripts locally       │
              │ - Fetch loop: polls local   │
              │   transcript server (5007)  │
              │ - /transcripts endpoint     │
              │   → returns JSON transcript │
              │ - /summary endpoint         │
              │   → sends per-speaker text  │
              │     to LLM (Groq)           │
              │   → returns summary JSON    │
              └─────────┬───────────────────┘
                        │
                        ▼
             ┌─────────────────────────────┐
             │     External LLM API        │
             │  (Groq / OpenAI / Claude)  │
             │ - Summarizes per-speaker   │
             └─────────────────────────────┘

Component Flow

  1. Audio Source: The meeting bot streams meeting audio to the WebSocket audio server

  2. WebSocket Audio Server (ws://127.0.0.1:5005):

    • Receives audio chunks in real-time
    • Groups audio by speaker using SpeakerAudioAggregator
    • Maps audio chunks to speakers via DiarizedAudioProcessor
    • Sends completed segments to Whisper for transcription
  3. Webhook/Local Transcript Server (http://127.0.0.1:5006 and http://127.0.0.1:5007):

    • Receives speaker diarization updates via webhooks on port 5006
    • Creates transcripts and stores them in TranscriptManager
    • Exposes transcripts as JSON at /transcripts on port 5007
  4. Reverse Proxy (http://127.0.0.1:8080):

    • Consolidates WebSocket and HTTP traffic under one public port
    • Routes /attendee-websocket* to WebSocket server
    • Routes /webhook* to webhook server
    • Supports bidirectional streaming
  5. Microservice (FastAPI on port 8000):

    • Polls local transcripts every 5 seconds
    • Maintains raw transcripts and per-speaker stores
    • Generates per-speaker summaries via Groq LLM
    • Exposes endpoints for transcripts and summaries

Project Overview

Core Components

websocket_server.py

  • Speaker diarization and audio chunk mapping (DiarizedAudioProcessor)
  • Transcript management (TranscriptManager)
  • WebSocket audio server
  • Webhook HTTP server
  • Local transcript HTTP server
  • TranscriptUtterance dataclass representing a single utterance
  • Asynchronous handling with asyncio and websockets
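
As a rough illustration of how TranscriptUtterance and TranscriptManager might fit together, here is a minimal sketch. The field names are borrowed from the example transcript JSON in this README; the method names `add`, `complete`, and `per_speaker` are hypothetical and not taken from the repository.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class TranscriptUtterance:
    # Fields mirror the example transcript JSON; the real class may differ.
    timestamp_ms: int
    duration_ms: int
    speaker_uuid: str
    speaker_name: str
    transcription: str


class TranscriptManager:
    """Keeps the complete transcript plus a per-speaker view (hypothetical API)."""

    def __init__(self) -> None:
        self._utterances: List[TranscriptUtterance] = []

    def add(self, utterance: TranscriptUtterance) -> None:
        self._utterances.append(utterance)

    def _ordered(self) -> List[TranscriptUtterance]:
        # Sort by start time so segments that finish out of order still read chronologically.
        return sorted(self._utterances, key=lambda u: u.timestamp_ms)

    def complete(self) -> List[dict]:
        return [asdict(u) for u in self._ordered()]

    def per_speaker(self) -> Dict[str, List[str]]:
        grouped: Dict[str, List[str]] = defaultdict(list)
        for u in self._ordered():
            grouped[u.speaker_name].append(u.transcription)
        return dict(grouped)
```

Keeping one flat list and deriving both views from it avoids the two stores drifting apart, at the cost of re-sorting on each read.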

transcription_service.py

  • Defines a base interface (TranscriptionService) for audio-to-text transcription.
  • Implements Whisper-based transcription with WhisperTranscriptionService.
  • Loads a Whisper model (tiny/base/small/...) on initialization.
  • transcribe method converts NumPy audio array → text, language, and segments.
  • get_transcription_service() returns the singleton instance.
  • set_transcription_service() allows replacing the global service (for testing or swapping models).
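
The interface-plus-singleton arrangement described above could look roughly like the sketch below. It is not the repository's code: `EchoTranscriptionService` is a hypothetical stand-in so the example runs without Whisper installed, and the audio argument is a plain sequence rather than a NumPy array.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Sequence


class TranscriptionService(ABC):
    @abstractmethod
    def transcribe(self, audio: Sequence[float], sample_rate: int = 16000) -> Dict[str, Any]:
        """Return a dict with 'text', 'language', and 'segments' keys."""


class EchoTranscriptionService(TranscriptionService):
    """Hypothetical stand-in: reports the audio length instead of running a model."""

    def transcribe(self, audio: Sequence[float], sample_rate: int = 16000) -> Dict[str, Any]:
        seconds = len(audio) / sample_rate
        return {"text": f"<{seconds:.1f}s of audio>", "language": "en", "segments": []}


# Module-level singleton; created lazily on first use.
_service: Optional[TranscriptionService] = None


def get_transcription_service() -> TranscriptionService:
    global _service
    if _service is None:
        # Assumption: the real code would build WhisperTranscriptionService here.
        _service = EchoTranscriptionService()
    return _service


def set_transcription_service(service: TranscriptionService) -> None:
    """Replace the global service, e.g. with a fake in tests."""
    global _service
    _service = service
```

Callers only ever ask for `get_transcription_service()`, so swapping the model or injecting a test double requires no changes at the call sites.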

audio_aggregator.py

  • SpeakerAudioSegment: stores audio, timestamps, and speaker info for a single segment.
  • SpeakerAudioAggregator: groups incoming audio chunks by speaker into segments.
  • add_audio_chunk: adds audio to the speaker's current segment and finalizes a segment when it grows too long or a new speaker appears.
  • finalize_stale_segments: ends segments that have received no audio for a while.
  • finalizeAllSegments: ends all active segments (e.g., meeting end).
  • Tracks active speakers and provides segment stats for transcription or analysis.
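
The grouping behaviour listed above can be sketched as follows. This is a simplified illustration, not the repository's implementation: the thresholds `max_segment_ms` and `idle_ms`, the segment field names, and the explicit `now_ms` argument are assumptions, and the finalize-all method is spelled in snake_case here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpeakerAudioSegment:
    speaker_id: str
    start_ms: int
    last_ms: int
    chunks: List[bytes] = field(default_factory=list)


class SpeakerAudioAggregator:
    # Threshold defaults are illustrative, not values from the repository.
    def __init__(self, max_segment_ms: int = 30_000, idle_ms: int = 2_000) -> None:
        self.max_segment_ms = max_segment_ms
        self.idle_ms = idle_ms
        self._active: Dict[str, SpeakerAudioSegment] = {}

    def add_audio_chunk(self, speaker_id: str, chunk: bytes, now_ms: int) -> List[SpeakerAudioSegment]:
        """Buffer a chunk and return any segments that this chunk caused to close."""
        finished: List[SpeakerAudioSegment] = []
        # A new speaker closes every other speaker's open segment.
        for other in [sid for sid in self._active if sid != speaker_id]:
            finished.append(self._active.pop(other))
        segment = self._active.get(speaker_id)
        if segment is None:
            segment = SpeakerAudioSegment(speaker_id, now_ms, now_ms)
            self._active[speaker_id] = segment
        segment.chunks.append(chunk)
        segment.last_ms = now_ms
        # Close the segment once it has grown too long.
        if now_ms - segment.start_ms >= self.max_segment_ms:
            finished.append(self._active.pop(speaker_id))
        return finished

    def finalize_stale_segments(self, now_ms: int) -> List[SpeakerAudioSegment]:
        """Close segments that have received no audio for at least idle_ms."""
        stale = [sid for sid, seg in self._active.items() if now_ms - seg.last_ms >= self.idle_ms]
        return [self._active.pop(sid) for sid in stale]

    def finalize_all_segments(self) -> List[SpeakerAudioSegment]:
        """Close everything, e.g. when the meeting ends."""
        finished = list(self._active.values())
        self._active.clear()
        return finished
```

Each returned segment is what would be handed to the transcription service; passing the clock in as `now_ms` keeps the class easy to test.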

reverse_proxy.py

Local Servers

  • Audio WebSocket Server:

    • Runs on ws://127.0.0.1:5005.
    • Receives real-time audio chunks from meetings or clients.
    • Buffers audio per speaker using SpeakerAudioAggregator.
    • Sends completed segments for transcription using WhisperTranscriptionService.
  • Webhook / Local Transcript HTTP Server:

    • Runs on http://127.0.0.1:5006 (webhooks) and http://127.0.0.1:5007 (local transcript).
    • Receives speaker diarization updates (/webhook).
    • Exposes local transcripts via JSON (/transcripts, /transcripts/per-speaker).

Reverse Proxy (Port 8080)

  • Consolidates multiple local services under one public port.
  • WebSocket proxy: /attendee-websocket* → forwards to ws://127.0.0.1:5005.
    • Maintains bidirectional streaming between client and local audio server.
  • HTTP proxy: /webhook* and /transcripts* → forwards to http://127.0.0.1:5006.
    • Preserves request method, headers, and body; returns backend response.
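
The routing rules above amount to a prefix lookup. The sketch below shows only that decision, under the assumption that the request path is forwarded unchanged; `ROUTES` and `resolve_backend` are hypothetical names, and the actual proxying of WebSocket frames and HTTP bodies is omitted.

```python
from typing import Optional

# Path prefix -> local backend, as listed in the section above.
ROUTES = (
    ("/attendee-websocket", "ws://127.0.0.1:5005"),
    ("/webhook", "http://127.0.0.1:5006"),
    ("/transcripts", "http://127.0.0.1:5006"),
)


def resolve_backend(path: str) -> Optional[str]:
    """Return the backend URL for a request path, or None if no route matches."""
    for prefix, backend in ROUTES:
        if path.startswith(prefix):
            # Assumption: the path is passed through to the backend as-is.
            return backend + path
    return None
```

A `None` result is where a proxy would typically answer with a 404 instead of forwarding.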

ngrok Integration

  • Uses pyngrok to create a secure tunnel to the reverse proxy on port 8080.
  • Provides a public URL for external services to connect to local servers.

microservice.py

  • Purpose: FastAPI microservice to fetch transcripts and generate per-speaker summaries using an LLM.
  • Fetch loop: Polls http://127.0.0.1:5007 every 5 seconds to update local stores:
    • raw_transcripts → full JSON
    • per_speaker_store → transcripts grouped by speaker
  • Endpoints:
    • GET /transcripts → returns raw transcript JSON
    • GET /summary → returns LLM-generated summaries per speaker
  • LLM usage: Sends each speaker's transcript to Groq LLM to generate a concise summary
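
To make the per-speaker step concrete, the sketch below groups a raw transcript payload by speaker and builds the text that would be sent to the LLM. The input shape follows the example transcript JSON in this README; the function names and the prompt wording are assumptions, and the Groq call itself is left out.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_speaker(raw: dict) -> Dict[str, str]:
    """Join each speaker's utterances, in time order, into one block of text."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    entries = sorted(raw.get("transcripts", []), key=lambda e: e["timestamp_ms"])
    for entry in entries:
        text = entry.get("transcription", "").strip()
        if text:  # skip empty segments so they do not pad the prompt
            grouped[entry["speaker_name"]].append(text)
    return {name: " ".join(parts) for name, parts in grouped.items()}


def build_summary_prompt(speaker: str, transcript: str) -> str:
    # Hypothetical prompt wording; the repository's actual prompt may differ.
    return f"Summarize concisely what {speaker} said in this meeting:\n\n{transcript}"
```

The result of `group_by_speaker` plays the role of per_speaker_store: one entry per speaker, each of which becomes one LLM request.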

Installation

Prerequisites

  • Python 3.8+
  • ngrok (for public tunnel)
  • Groq API key (for LLM summaries)

Install Dependencies

pip install -r requirements.txt

Setup and Run

Run the following services in separate terminal windows:

1. Start WebSocket Audio Server

python audio_consumer/websocket_server.py

2. Start Reverse Proxy

python reverse_proxy.py

3. Expose Reverse Proxy via ngrok

ngrok http 8080

Copy the https://<ngrok-id>.ngrok-free.app URL for configuration.

4. Start Microservice

python microservice.py
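
The ngrok URL copied in step 3 is used twice when configuring the bot: once as a `wss://` audio endpoint and once as an `https://` webhook endpoint. A small helper along these lines (the name `bot_urls` is hypothetical) derives both from the copied URL:

```python
from urllib.parse import urlsplit, urlunsplit


def bot_urls(ngrok_url: str) -> dict:
    """Derive the WebSocket and webhook URLs from the public ngrok URL."""
    parts = urlsplit(ngrok_url.rstrip("/"))
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("expected a URL like https://<ngrok-id>.ngrok-free.app")
    return {
        # Same host, but the audio stream uses the secure WebSocket scheme.
        "websocket": urlunsplit(("wss", parts.netloc, "/attendee-websocket", "", "")),
        "webhook": urlunsplit(("https", parts.netloc, "/webhook", "", "")),
    }
```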

API Usage

Trigger Meeting Bot

curl -X POST 'https://app.attendee.dev/api/v1/bots' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Token YOUR_API_TOKEN' \
  -d '{
    "meeting_url": "https://us05web.zoom.us/j/YOUR_MEETING_ID",
    "bot_name": "Meeting Bot",
    "websocket_settings": {
      "audio": {
        "url": "wss://YOUR-NGROK-URL.ngrok-free.app/attendee-websocket",
        "sample_rate": 16000
      }
    },
    "webhooks": [
      {
        "url": "https://YOUR-NGROK-URL.ngrok-free.app/webhook",
        "triggers": ["transcript.update", "bot.state_change"]
      }
    ]
  }'
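
For scripting, the same request can be assembled in Python with the standard library. The endpoint, headers, and payload fields are copied from the curl command above; the function name `build_bot_request` and its parameters are hypothetical.

```python
import json
import urllib.request

ATTENDEE_BOTS_URL = "https://app.attendee.dev/api/v1/bots"


def build_bot_request(api_token: str, meeting_url: str, public_host: str) -> urllib.request.Request:
    """Build (but do not send) the bot-creation request shown in the curl example."""
    payload = {
        "meeting_url": meeting_url,
        "bot_name": "Meeting Bot",
        "websocket_settings": {
            "audio": {
                "url": f"wss://{public_host}/attendee-websocket",
                "sample_rate": 16000,
            }
        },
        "webhooks": [
            {
                "url": f"https://{public_host}/webhook",
                "triggers": ["transcript.update", "bot.state_change"],
            }
        ],
    }
    return urllib.request.Request(
        ATTENDEE_BOTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_token}",
        },
        method="POST",
    )
```

Sending it is then a matter of passing the returned object to `urllib.request.urlopen`, with `public_host` set to the ngrok hostname without a scheme.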

Access Transcripts

Raw transcripts:

http://127.0.0.1:5007/transcripts

Per-speaker summaries:

http://127.0.0.1:8000/summary

Example Outputs

Raw Transcript JSON

{
  "format": "json",
  "meeting_info": {
    "start_time_ms": 1763438437019,
    "end_time_ms": 1763438704648,
    "total_segments": 9,
    "speakers": ["deepit shah"]
  },
  "transcripts": [
    {
      "timestamp_ms": 1763438437019,
      "end_timestamp_ms": 1763438467023,
      "duration_ms": 30004,
      "speaker_uuid": "16778240",
      "speaker_name": "deepit shah",
      "speaker_is_host": true,
      "transcription": "Hi, so let us everything is up and running...",
      "audio_samples": 480160,
      "sample_rate": 16000,
      "processed_at": "2025-11-17T20:03:45.282319"
    }
  ]
}
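
The numeric fields in a segment can be cross-checked against each other. The short sketch below, using the values from the example above, compares the wall-clock span with the length implied by the sample count; the helper names are hypothetical.

```python
def samples_to_ms(audio_samples: int, sample_rate: int) -> float:
    """Length of the audio buffer in milliseconds."""
    return audio_samples * 1000 / sample_rate


def wall_clock_ms(segment: dict) -> int:
    """Span between the segment's start and end timestamps."""
    return segment["end_timestamp_ms"] - segment["timestamp_ms"]


# Values copied from the example transcript entry.
segment = {
    "timestamp_ms": 1763438437019,
    "end_timestamp_ms": 1763438467023,
    "duration_ms": 30004,
    "audio_samples": 480160,
    "sample_rate": 16000,
}

audio_ms = samples_to_ms(segment["audio_samples"], segment["sample_rate"])
drift_ms = audio_ms - wall_clock_ms(segment)
```

Here the timestamps span 30004 ms, matching `duration_ms`, while 480160 samples at 16000 Hz correspond to 30010 ms of audio, a difference of 6 ms.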

Summary per Speaker

{
  "summary_per_speaker": {
    "deepit shah": "The speaker is setting up and demonstrating a system with multiple components, including a web audio server, webhook server, local transcript server, and a microservice. The microservice generates summaries using an LLM..."
  }
}

Configuration

Ports

  • 5005: WebSocket Audio Server
  • 5006: Webhook Server
  • 5007: Local Transcript Server
  • 8000: Microservice (FastAPI)
  • 8080: Reverse Proxy

Environment Variables

export GROQ_API_KEY="your_groq_api_key"
export WHISPER_MODEL="base"  # Options: tiny, base, small, medium, large
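
One way a service might read and validate these variables at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical, and falling back to `base` when WHISPER_MODEL is unset is an assumption based on the example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    whisper_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment and fail early on bad values."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set")
    # Assumed default: "base", as in the export example.
    model = env.get("WHISPER_MODEL", "base").strip().lower()
    if model not in WHISPER_MODELS:
        raise ValueError(f"WHISPER_MODEL must be one of {WHISPER_MODELS}, got {model!r}")
    return Settings(groq_api_key=key, whisper_model=model)
```

Failing at startup surfaces a missing key immediately, instead of on the first /summary request.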

Troubleshooting

WebSocket Connection Issues

  • Ensure ngrok is running and the URL is correctly configured
  • Check firewall settings for port 8080

Transcription Quality

  • Increase Whisper model size (base → small → medium)
  • Verify audio sample rate is 16000 Hz

Missing Summaries

  • Verify Groq API key is set
  • Check microservice logs for LLM errors

About

AI meeting bot that transcribes audio from Zoom using local models and summarizes the text per speaker.
