# Meeting Audio Processing Pipeline
### This notebook walks you through an easy workflow for processing meeting audio:
1. Upload a meeting recording.
2. Transcribe the audio to text automatically.
3. Get a summarized version of the meeting notes.
4. Listen to a spoken version of the summary (text-to-speech).
### With this, you'll quickly generate readable and audible meeting summaries, perfect for documentation and accessibility.

## Step 1: Import Libraries and Load Environment Variables

This cell imports all the dependencies used throughout the notebook:

- **`base64`**: for encoding audio files before sending them to cloud APIs.
- **`numpy` / `scipy.io.wavfile`**: for audio array manipulation and WAV file I/O.
- **`tempfile` / `os`**: for creating temporary files and reading environment variables.
- **`gradio`**: for building the interactive web UI.
- **`dotenv`**: for loading API keys from a `.env` file.
- **`huggingface_hub`**: for authenticating with the Hugging Face model hub.
- **`openai`**: the OpenAI Python client, also used to talk to OpenRouter.
- **`transformers`**: for loading the local Whisper speech-recognition model.

`load_dotenv(override=True)` reads the `.env` file and makes its variables available via `os.getenv`.

In [28]:
import base64
import numpy as np
import scipy.io.wavfile as wavfile
import tempfile
import os
import gradio as gr
from dotenv import load_dotenv
from huggingface_hub import login
from openai import OpenAI
from transformers import pipeline

load_dotenv(override=True)

True

## Step 2: Verify Hugging Face Token

Load the `HF_TOKEN` environment variable and confirm it is set. This token is used to authenticate with the Hugging Face Hub in the next step, before the Whisper model is downloaded.

In [29]:
HF_TOKEN = os.getenv("HF_TOKEN")

if not HF_TOKEN:
  print("HF_TOKEN is not set")
else:
  print("HF_TOKEN is set")

HF_TOKEN is set


## Step 3: Authenticate with Hugging Face

Log in to the Hugging Face Hub using the token retrieved in the previous step. Authentication is required only for gated or private models; `openai/whisper-medium.en` is a public checkpoint, so logging in here is a precaution rather than a hard requirement.

In [10]:
# Login to Hugging Face
login(token=HF_TOKEN)
print("Logged in to Hugging Face")

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


Logged in to Hugging Face


## Step 4: Load the Whisper Speech Recognition Model

The Hugging Face `pipeline` API is used to load **OpenAI's Whisper Medium (English)** model locally. On the first run this downloads ~3 GB of model weights. The `transcribe_audio_hf` wrapper function accepts an audio file path and returns the transcribed text. This provides an offline/local alternative to the cloud-based Gemini transcription used later.

In [None]:
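# Load the Whisper pipeline once (downloads ~3 GB on first run).
# chunk_length_s=30 splits long recordings into 30-second windows, which Whisper
# needs for audio longer than 30 seconds (typical for meetings).
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium.en",
    chunk_length_s=30,
)

def transcribe_audio_hf(audio_path: str) -> str:
    """Transcribe audio using HuggingFace Whisper (local model)."""
    # Decoding a file path typically requires ffmpeg to be installed on the system.
    result = pipe(audio_path)
    return result["text"]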
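
Because the cell above loads the model as soon as it runs, the ~3 GB download happens even when only the Gemini engine is used. One possible alternative, sketched below, defers loading until the first local transcription. The names `lazy`, `get_whisper`, and `transcribe_audio_hf_lazy` are illustrative and not part of the notebook; the `transformers` import sits inside the loader so the block can be defined without touching the library.

```python
from functools import lru_cache
from typing import Any, Callable


def lazy(loader: Callable[[], Any]) -> Callable[[], Any]:
    """Return a zero-argument callable that runs `loader` once and caches the result."""
    return lru_cache(maxsize=1)(loader)


def _load_whisper() -> Any:
    # Imported here so the heavy dependency is only touched on first use.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model="openai/whisper-medium.en")


# Nothing is downloaded until get_whisper() is called for the first time.
get_whisper = lazy(_load_whisper)


def transcribe_audio_hf_lazy(audio_path: str) -> str:
    """Hypothetical variant of transcribe_audio_hf that loads Whisper on demand."""
    return get_whisper()(audio_path)["text"]
```

With this variant, the first click on the local engine pays the loading cost and later calls reuse the cached pipeline.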

## Step 5: Verify OpenRouter API Key

Similar to the Hugging Face token check, this cell loads the `OPENROUTER_API_KEY` from the environment and confirms it is available. OpenRouter is used as a unified gateway to access multiple LLM providers (GPT-4o-mini for summarization, Gemini for transcription).

In [30]:
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")

if not OPENROUTER_API_KEY:
  print("OPENROUTER_API_KEY is not set")
else:
  print("OPENROUTER_API_KEY is set")

OPENROUTER_API_KEY is set
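
The key checks in Step 2 and in this step only print a status, so a missing key surfaces later as an API error; `OPENAI_API_KEY`, which is read in Step 6, is not checked at all. A fail-fast variant could look like the sketch below. `require_env` is a hypothetical helper, not something the notebook defines.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Possible usage (commented out so the block runs without real keys):
# OPENROUTER_API_KEY = require_env("OPENROUTER_API_KEY")
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```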


## Step 6: Initialize API Clients

Two OpenAI-compatible clients are created:

- **`openrouter`**: points to the OpenRouter API (`openrouter.ai/api/v1`). This is used for transcription (Gemini) and summarization (GPT-4o-mini).
- **`openai_client`**: points to the standard OpenAI API. This is used for text-to-speech generation, which requires a direct OpenAI connection.

In [31]:
openrouter = OpenAI(api_key=OPENROUTER_API_KEY, base_url="https://openrouter.ai/api/v1")
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

## Step 7: Define the Summarization System Prompt

This system prompt tells the LLM how to behave when summarizing a meeting transcript. It instructs the model to extract key discussion points, decisions, and action items with owners, and to keep the output concise enough to be read aloud (since the summary will later be converted to speech).

In [32]:
SYSTEM_PROMPT = """You are an expert at summarizing meeting transcriptions.
Given a raw meeting transcript, produce a concise, well-structured summary that covers:
- Key discussion points
- Decisions made
- Action items with owners (if mentioned)
Keep the summary clear and suitable for being read aloud."""

## Step 8: Create the Transcript Summarization Function

`summarize_transcript` sends the raw transcript to **GPT-4o-mini** via OpenRouter with streaming enabled. As each chunk of the summary arrives, the function yields the accumulated text so the UI can display it progressively, giving the user real-time feedback instead of waiting for the full response.

In [33]:
def summarize_transcript(transcript: str):
    """Summarize a meeting transcript using OpenRouter with streaming."""
    stream = openrouter.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Please summarize the following meeting transcript:\n\n{transcript}"},
        ],
        stream=True,
    )
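    collected = ""
    for chunk in stream:
        # Some providers emit chunks with no choices (e.g. a final usage chunk); skip them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            collected += delta.content
            yield collected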
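
The accumulate-and-yield pattern used by `summarize_transcript` can be tried in isolation, without any API call. The sketch below replaces the streamed deltas with a plain list of strings; `accumulate_stream` and the sample pieces are illustrative names and data, not part of the notebook.

```python
from typing import Iterable, Iterator, Optional


def accumulate_stream(pieces: Iterable[Optional[str]]) -> Iterator[str]:
    """Yield the running concatenation of text pieces, skipping empty or None pieces."""
    collected = ""
    for piece in pieces:
        if piece:
            collected += piece
            yield collected


# Simulated deltas; real pieces would come from the streaming API.
snapshots = list(accumulate_stream(["Key ", None, "points", ""]))
print(snapshots)  # ['Key ', 'Key points']
```

Yielding the full text so far, rather than each delta, suits Gradio because every yielded value replaces the output component's content.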

## Step 9: Define Audio Transcription (Gemini) and Text-to-Speech Functions

Two utility functions are defined here:

- **`transcribe_audio_openrouter`**: reads the audio file, base64-encodes it, and sends it to **Google Gemini 2.0 Flash** via OpenRouter's chat completions endpoint. The audio is embedded as a data URL inside a multimodal message, and Gemini returns a verbatim transcription.
- **`text_to_speech`**: sends the summary text to the **OpenAI TTS API** (`gpt-4o-mini-tts`, voice "alloy") and streams the response directly to a WAV file on disk.

In [34]:
def transcribe_audio_openrouter(audio_path: str) -> str:
    """Transcribe audio by sending it to Gemini via OpenRouter as a base64 data URL."""
    print("Transcribe audio by sending it to Gemini via OpenRouter as a base64 data URL.")
    with open(audio_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("utf-8")

    # Pick a MIME type from the file extension; unknown extensions fall back to MP3.
    ext = audio_path.rsplit(".", 1)[-1].lower()
    mime = {"mp3": "audio/mpeg", "wav": "audio/wav", "m4a": "audio/mp4"}.get(ext, "audio/mpeg")
    print("ext", ext)
    response = openrouter.chat.completions.create(
        model="google/gemini-2.0-flash-001",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please transcribe this audio recording verbatim."},
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{audio_b64}"}},
                ],
            }
        ],
    )
    return response.choices[0].message.content


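def text_to_speech(text: str, output_path: str = "summary_audio.wav") -> str:
    """Convert text to speech using the OpenAI TTS API directly."""
    # with_streaming_response writes audio to disk as it arrives; calling
    # stream_to_file on a plain create() response is deprecated in the v1 SDK.
    with openai_client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",
        voice="alloy",
        input=text,
        response_format="wav",
    ) as response:
        response.stream_to_file(output_path)
    return output_path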
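
The data-URL construction inside `transcribe_audio_openrouter` can be isolated and checked on its own. The sketch below reuses the same extension-to-MIME table and fallback; `audio_data_url` and `AUDIO_MIME` are hypothetical names, and the three-byte file is a placeholder rather than real audio.

```python
import base64
import os
import tempfile

# Extension-to-MIME table copied from transcribe_audio_openrouter.
AUDIO_MIME = {"mp3": "audio/mpeg", "wav": "audio/wav", "m4a": "audio/mp4"}


def audio_data_url(audio_path: str) -> str:
    """Encode a file as a base64 data URL (hypothetical helper, not in the notebook)."""
    ext = os.path.splitext(audio_path)[1].lstrip(".").lower()
    mime = AUDIO_MIME.get(ext, "audio/mpeg")  # same fallback as the notebook
    with open(audio_path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"


# Demonstration on a tiny placeholder file (not real audio).
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"abc")
print(audio_data_url(tmp.name))  # data:audio/wav;base64,YWJj
os.remove(tmp.name)
```

Base64 inflates the payload by roughly one third, so long recordings produce correspondingly large requests.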

## Step 10: Build the Gradio Web Interface

This is the main orchestration layer. Two core functions tie everything together:

- **`summarize_meeting_audio`**: accepts an uploaded audio file and the chosen transcription engine, runs transcription followed by summarization, and streams progress updates to the UI.
- **`generate_audio_from_summary`**: takes the finished summary text and converts it to a WAV file via the OpenAI TTS API.

A **Gradio Blocks** UI is then assembled with:
- An audio upload widget and a radio selector for the transcription engine (Gemini via OpenRouter *or* local Whisper).
- A "Summarize" button that triggers transcription + summarization.
- A "Generate Audio Summary" button that produces a spoken version of the summary.
- Live status indicators and a markdown display for the generated summary.

In [35]:
def summarize_meeting_audio(audio_file_path: str, transcription_method: str):
    """Transcribe and summarize the meeting audio with streaming."""
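    if audio_file_path is None:
        # Outputs are (status, summary_state, summary_display). Keep summary_state
        # empty so the prompt is not later sent to text-to-speech as a summary.
        yield "", "", "Please upload an audio file."
        return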

    yield "⏳ Transcribing audio...", "", ""

    if transcription_method == "OpenRouter / Gemini":
        transcript = transcribe_audio_openrouter(audio_file_path)
    else:
        transcript = transcribe_audio_hf(audio_file_path)

    yield "✅ Transcribed audio\n\n⏳ Summarizing transcript...", "", ""

    final = ""
    for partial in summarize_transcript(transcript):
        final = partial
        yield "✅ Transcribed audio\n\n⏳ Summarizing transcript...", final, final

    yield "✅ Transcribed audio\n\n✅ Summarized transcript", final, final


def generate_audio_from_summary(summary: str):
    """Convert an existing summary text to speech."""
    if not summary or summary.strip() == "":
        yield "⚠️ No summary to convert to audio.", None
        return
    try:
        yield "⏳ Generating audio from summary...", None
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            tts_path = tmp.name
        audio_path = text_to_speech(summary, tts_path)
        yield "✅ Audio summary generated", audio_path
    except Exception as e:
        yield f"❌ TTS failed: {e}", None


with gr.Blocks(title="Meeting Audio Summarizer", theme=gr.themes.Soft()) as demo:
    gr.Markdown(
        """
        # 🎙️ Meeting Audio Summarizer
        Upload a meeting recording (MP3, WAV, or M4A) and get an AI-generated summary, both as text and audio.
        """
    )

    with gr.Row():
        with gr.Column():
            audio_input = gr.Audio(
                label="Upload Meeting Recording (MP3 / WAV / M4A)",
                type="filepath",
                sources=["upload"],
                waveform_options=gr.WaveformOptions(show_recording_waveform=True),
            )
            transcription_choice = gr.Radio(
                choices=[
                    "OpenRouter / Gemini",
                    "HuggingFace Whisper (local, ~3 GB)",
                ],
                value="OpenRouter / Gemini",
                label="Transcription Engine",
            )
            summarize_btn = gr.Button("Summarize", variant="primary")
            speak_btn = gr.Button("Generate Audio Summary", variant="secondary")

        with gr.Column():
            status_display = gr.Markdown(value="", elem_id="status-log")
            summary_state = gr.State("")
            summary_display = gr.Markdown(
                value="*Your meeting summary will appear here...*",
            )
            summary_audio = gr.Audio(
                label="Listen to Summary",
                type="filepath",
                interactive=False,
                waveform_options=gr.WaveformOptions(show_recording_waveform=True),
            )

    summarize_btn.click(
        fn=summarize_meeting_audio,
        inputs=[audio_input, transcription_choice],
        outputs=[status_display, summary_state, summary_display],
    )

    speak_btn.click(
        fn=generate_audio_from_summary,
        inputs=[summary_state],
        outputs=[status_display, summary_audio],
    )

demo.launch(inbrowser=True)

* Running on local URL:  http://127.0.0.1:7866
* To create a public link, set `share=True` in `launch()`.




ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/sheriffibrahim/projects/llm_engineering/.venv/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sheriffibrahim/projects/llm_engineering/.venv/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sheriffibrahim/projects/llm_engineering/.venv/lib/python3.12/site-packages/fastapi/applications.py", line 1134, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/sheriffibrahim/projects/llm_engineering/.venv/lib/python3.12/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/sheriffi

Transcribe audio by sending it to Gemini via OpenRouter as a base64 data URL.
ext mp3
Transcribe audio by sending it to Gemini via OpenRouter as a base64 data URL.
ext mp3
Transcribe audio by sending it to Gemini via OpenRouter as a base64 data URL.
ext mp3
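
The Gradio event handlers above are plain generators, so the orchestration logic can be exercised without the UI or any API calls. The sketch below swaps the real transcription and summarization functions for stubs and replaces the `if`/`else` on the radio label with a lookup table. Every name in it (`fake_gemini`, `fake_whisper`, `fake_summarize`, `ENGINES`, `run_pipeline`) is hypothetical; only the radio labels and the three-value yield order are taken from the notebook.

```python
from typing import Callable, Dict, Iterator, Tuple


def fake_gemini(path: str) -> str:
    return f"transcript of {path} (cloud)"


def fake_whisper(path: str) -> str:
    return f"transcript of {path} (local)"


def fake_summarize(transcript: str) -> Iterator[str]:
    # Mimics summarize_transcript: yields the growing summary text.
    yield "Summary: "
    yield "Summary: " + transcript


# Radio label -> transcription function; labels match the gr.Radio choices.
ENGINES: Dict[str, Callable[[str], str]] = {
    "OpenRouter / Gemini": fake_gemini,
    "HuggingFace Whisper (local, ~3 GB)": fake_whisper,
}


def run_pipeline(path: str, engine: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (status, summary_state, summary_display) tuples, like summarize_meeting_audio."""
    yield "Transcribing audio...", "", ""
    transcript = ENGINES[engine](path)
    final = ""
    for partial in fake_summarize(transcript):
        final = partial
        yield "Summarizing transcript...", final, final
    yield "Done", final, final


updates = list(run_pipeline("meeting.mp3", "OpenRouter / Gemini"))
print(updates[-1])
```

A lookup table raises a `KeyError` on an unknown label instead of silently falling through to the local model, which can make label typos easier to spot.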
