# Meeting Minutes Generator From Audio File

**Author:** [Pouria Ebrahimnezhad]  
**Date:** [14-03-2025]  
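**Description:** This Jupyter Notebook transcribes city council meeting audio recordings with Whisper (the OpenAI API by default, or an open-source Hugging Face checkpoint in the alternative implementation)
and uses Llama 3.1 8B Instruct to generate structured meeting minutes in Markdown format.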

## Installation
Ensure dependencies are installed:
```bash
pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2
```

## Usage
1. **Specify the audio file path** in the `audio_filename` variable.
2. **Run the notebook** to process the audio and generate minutes.
3. **Review and edit the Markdown output** as needed.
4. **Save or share the meeting minutes.**

## Example Markdown Output

# City Council Meeting - [Date]

## Attendees
- [List of Attendees]

## Agenda
1. [Agenda Item 1]
2. [Agenda Item 2]

## Key Decisions
- [Decision 1]
- [Decision 2]

## Action Items
- [Task 1]
- [Task 2]


In [None]:
#!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2

In [None]:
# imports

import os
import requests
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from google.colab import drive
from huggingface_hub import login
from google.colab import userdata
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig, AutoModelForSpeechSeq2Seq, AutoProcessor
import torch
import re

In [None]:
# Constants

AUDIO_MODEL = "whisper-1"
LLAMA = "meta-llama/Meta-Llama-3.1-8B-Instruct"

In [None]:
# Connect Colab to Google Drive and locate the audio file

drive.mount("/content/drive")
audio_filename = "/content/drive/MyDrive/llms/Seattle-council-extract.mp3"

In [None]:
# Sign in to HuggingFace Hub

hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

In [None]:
# Sign in to OpenAI using Secrets in Colab

openai_api_key = userdata.get('OPENAI_API_KEY')
openai = OpenAI(api_key=openai_api_key)

In [None]:
# Use OpenAI's hosted Whisper model to convert the audio to text

# Open the file in a context manager so the handle is closed after the upload
with open(audio_filename, "rb") as audio_file:
    transcription = openai.audio.transcriptions.create(model=AUDIO_MODEL, file=audio_file, response_format="text")
print(transcription)
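
Long council recordings can be large, so it may help to check the file size before uploading. The sketch below assumes a 25 MB upload limit for the hosted transcription endpoint; that figure and the helper name `fits_upload_limit` are assumptions for illustration, so check the current API documentation before relying on them.

```python
import os

# Assumed upload limit for the hosted transcription endpoint (25 MB).
# This value is an assumption; confirm it against the current API docs.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

If `fits_upload_limit(audio_filename)` returns `False`, one option is to split or re-encode the recording first, or to use the local Whisper pipeline from the alternative implementation.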

In [None]:
system_message = "You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown."
user_prompt = f"Below is an extract transcript of a Seattle council meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\n{transcription}"

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
]


In [None]:
# Define the quantization config to reduce the memory footprint of the model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)
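
To see why 4-bit quantization matters here, a rough back-of-the-envelope estimate of weight storage can help. The sketch below assumes a round figure of 8 billion parameters and counts only the weights; it ignores quantization constants, activations, and the KV cache, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 8e9  # assumed round figure for an 8B-parameter model

fp16_gb = weight_memory_gb(PARAMS, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(PARAMS, 4)    # 4-bit weights

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f" 4-bit weights: ~{nf4_gb:.0f} GB")
```

Under these assumptions the weights shrink to roughly a quarter of their 16-bit size, which is what makes an 8B model practical on a single Colab GPU.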

In [None]:
# Create a tokenizer for the model
tokenizer = AutoTokenizer.from_pretrained(LLAMA)
# setting the padding
tokenizer.pad_token = tokenizer.eos_token
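# Apply the chat template to the messages (including the full transcript) and move the tensor to the GPU;
# add_generation_prompt=True appends the assistant header so the model starts its reply directly
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")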
# use streaming
streamer = TextStreamer(tokenizer)
# Load the model with automatic device placement and the quantization config from above
model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map="auto", quantization_config=quant_config)


outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)

In [None]:
response = tokenizer.decode(outputs[0])
display(Markdown(response))
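
Decoding the full output sequence returns the prompt and the chat-template special tokens along with the generated minutes. A minimal sketch for keeping only the assistant's reply is shown below; it assumes the Llama 3 header and end-of-turn markers, and the helper name `extract_assistant_reply` is hypothetical.

```python
# Llama 3 chat-template markers (assumed to match the tokenizer in use).
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"


def extract_assistant_reply(decoded):
    """Return the text of the last assistant turn, without special tokens."""
    # Keep everything after the last assistant header;
    # if the header is absent, rpartition leaves the whole string in `tail`.
    _, _, tail = decoded.rpartition(ASSISTANT_HEADER)
    # Drop the end-of-turn marker and anything after it.
    reply = tail.split(END_OF_TURN, 1)[0]
    return reply.strip()
```

For example, `display(Markdown(extract_assistant_reply(response)))` would render only the minutes rather than the system prompt and transcript as well.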

## Alternative implementation

This variation uses an open-source model to transcribe the meeting audio.

In [None]:
AUDIO_MODEL = "openai/whisper-medium"
speech_model = AutoModelForSpeechSeq2Seq.from_pretrained(AUDIO_MODEL, torch_dtype=torch.float16, low_cpu_mem_usage=True, use_safetensors=True)
speech_model.to('cuda')
processor = AutoProcessor.from_pretrained(AUDIO_MODEL)

pipe = pipeline(
    "automatic-speech-recognition",
    model=speech_model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch.float16,
    device='cuda',
)

In [None]:
# Use the open-source Whisper model (via the Hugging Face pipeline) to convert the audio to text
result = pipe(audio_filename, return_timestamps=True)

In [None]:
transcription = result["text"]
print(transcription)
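
Because the pipeline is called with `return_timestamps=True`, the result should also carry a `chunks` list whose entries have a `timestamp` tuple and a `text` field. The sketch below formats such chunks into timestamped lines, which can make the minutes easier to trace back to the recording. The helper names are hypothetical, and the handling of a missing end time (`None`) is an assumption about how open-ended chunks are reported.

```python
def format_timestamp(seconds):
    """Render seconds as MM:SS; None (an open-ended chunk) becomes '??:??'."""
    if seconds is None:
        return "??:??"
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_chunks(chunks):
    """Turn a list of {'timestamp': (start, end), 'text': ...} dicts into lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        span = f"{format_timestamp(start)} - {format_timestamp(end)}"
        lines.append(f"[{span}] {chunk['text'].strip()}")
    return "\n".join(lines)
```

Calling `print(format_chunks(result["chunks"]))` would then print one timestamped line per chunk instead of a single block of text.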