<a href="https://colab.research.google.com/github/deepakjangir15/MeetingScribe/blob/main/Meeting_Minutes_from_an_Audio_File.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

# Create meeting minutes from an Audio file

I downloaded a recording of a Denver City Council meeting and selected a portion of it for us to transcribe. You can download the audio here:  
https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing

If you'd rather work with the original data, the HuggingFace dataset is [here](https://huggingface.co/datasets/huuuyeah/meetingbank) and the audio can be downloaded [here](https://huggingface.co/datasets/huuuyeah/MeetingBank_Audio/tree/main).

The goal of this project is to generate meeting minutes, including action items, from the audio.

For this project, you can use either the Denver meeting audio or a recording of your own!



# Installing the pip libraries required

In [1]:
!pip install -q requests torch bitsandbytes transformers sentencepiece accelerate openai httpx==0.27.2 gradio

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m76.4/76.4 kB[0m [31m3.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m3.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m72.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m63.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m883.7/883.7 kB[0m [31m40.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m664.8/664.8 MB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m211.5/211.5 MB[0m [31m6.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.3/56.3 MB[0m [31m11.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

- **requests**: Handles HTTP requests for fetching data from web APIs.  
- **torch**: Core library for PyTorch, used for deep learning and tensor computations.  
- **bitsandbytes**: Enables 8-bit and 4-bit quantisation for efficient model inference; this notebook loads the model in 4-bit.  
- **transformers**: Provides pre-trained transformer models from Hugging Face.  
- **sentencepiece**: Tokeniser for handling text segmentation in NLP models.  
- **accelerate**: Handles device placement when loading large models; needed here for `device_map="auto"`.  
- **openai**: API client for interacting with OpenAI’s models.  
- **httpx==0.27.2**: HTTP client (synchronous and asynchronous) used internally by the `openai` package; pinned to version 0.27.2.  
- **gradio**: Builds user-friendly web interfaces for machine learning models.  
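
After the install cell has run, it can be useful to confirm which versions were actually picked up, especially since `httpx` is pinned. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the pip command in this notebook.
REQUIRED = ["requests", "torch", "bitsandbytes", "transformers",
            "sentencepiece", "accelerate", "openai", "httpx", "gradio"]

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:15} {ver or 'NOT INSTALLED'}")
```

If `httpx` reports anything other than 0.27.2, the pin was probably overridden by a later install in the same runtime.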

# Setting the constants and initialization

In [2]:
# imports

import os
import requests
from openai import OpenAI
from google.colab import drive
from huggingface_hub import login
from google.colab import userdata
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig
import torch
import gradio as gr

In [3]:
# Constants

AUDIO_MODEL = "whisper-1"
LLAMA = "meta-llama/Meta-Llama-3.1-8B-Instruct"

In [4]:
# Sign in to HuggingFace Hub

hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

In [5]:
# Sign in to OpenAI using Secrets in Colab

openai_api_key = userdata.get('OPENAI_API_KEY')
openai = OpenAI(api_key=openai_api_key)
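
The two sign-in cells rely on Colab's Secrets panel, so they fail outside Colab. As a hedged sketch, a small helper could try Colab first and fall back to environment variables. The function name `get_secret` is hypothetical, and the fallback behaviour is an assumption about how you might want to run the notebook locally.

```python
import os


def get_secret(name):
    """Return a secret from Colab's Secrets panel, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing or not shared there.
        pass
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} not found in Colab Secrets or environment")
    return value
```

With this in place, `hf_token = get_secret('HF_TOKEN')` and `openai_api_key = get_secret('OPENAI_API_KEY')` would work in both settings.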

In [6]:
# Initialize Llama model and tokenizer

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

tokenizer = AutoTokenizer.from_pretrained(LLAMA)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    LLAMA,
    device_map="auto",
    quantization_config=quant_config
)

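
Loading the model in 4-bit matters mainly because of the size of the weights. The following back-of-the-envelope sketch assumes a round 8 billion parameters and counts only the weights; it ignores quantisation constants, activations and the KV cache, so real GPU usage will be somewhat higher. The function name is illustrative.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: "8B" taken as a round 8 billion parameters

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # unquantised bfloat16 weights
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit NF4 weights

print(f"bf16: ~{bf16_gb:.0f} GB, 4-bit: ~{nf4_gb:.0f} GB")
```

The bfloat16 figure is in line with the roughly 16 GB downloaded across the four checkpoint shards, while the 4-bit figure is what lets the model fit on a smaller Colab GPU.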

# Defining all the functions required

In [7]:
# Generate meeting minutes

def generate_minutes(transcription, model, tokenizer, progress=gr.Progress()):
    progress(0.6, desc="Generating meeting minutes from transcript...")

    system_message = "You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown."
    user_prompt = f"Below is an extract from the transcript of a meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\n{transcription}"

    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt}
    ]

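    # add_generation_prompt=True appends the assistant header so the model starts its reply directly
    inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")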
    outputs = model.generate(inputs, max_new_tokens=2000)
    response = tokenizer.decode(outputs[0])

    # Clean up the response, keep only the minutes
    progress(0.9, desc="Cleaning and formatting minutes...")
    response = response.split("<|end_header_id|>")[-1].strip().replace("<|eot_id|>","")

    return response
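
The clean-up step in `generate_minutes` works on plain strings, so it can be tried without loading the model. The sketch below mirrors the same idea: keep everything after the last `<|end_header_id|>` marker and drop `<|eot_id|>`. The helper name and the sample string are hypothetical; the sample is only shaped like a decoded Llama 3 chat, not real model output.

```python
def extract_assistant_reply(decoded):
    """Keep only the text after the last header marker and drop end-of-turn tokens."""
    reply = decoded.split("<|end_header_id|>")[-1]
    return reply.replace("<|eot_id|>", "").strip()


# Hypothetical decoded output, shaped like a Llama 3 chat transcript.
sample = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You write minutes.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Transcript goes here.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "# Minutes\n- Item one<|eot_id|>"
)

print(extract_assistant_reply(sample))
```

Because the assistant header is the last header in the decoded text, the split leaves only the generated minutes.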

In [8]:
# Transcribe the uploaded audio file using OpenAI's Whisper model

def transcribe_audio(audio_path, progress=gr.Progress()):
    progress(0.3, desc="Creating transcript from audio...")

    try:
        with open(audio_path, "rb") as audio_file:
            transcription = openai.audio.transcriptions.create(
                model=AUDIO_MODEL,
                file=audio_file,
                response_format="text"
            )
            return transcription
    except Exception as e:
        return f"Error during transcription: {str(e)}"
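
Long meetings produce large files, and the transcription request fails if the upload is too big. The sketch below checks the file size before calling the API. The 25 MB figure is an assumption based on OpenAI's documented upload limit at the time of writing, so verify it against the current documentation; the helper name is illustrative.

```python
import os

# Assumption: a 25 MB upload limit, expressed here as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def is_within_upload_limit(audio_path, max_bytes=MAX_UPLOAD_BYTES):
    """Return True if the file at audio_path is no larger than max_bytes."""
    return os.path.getsize(audio_path) <= max_bytes
```

A call such as `is_within_upload_limit(audio_path)` at the top of `transcribe_audio` would let the function return a clear message instead of surfacing an API error.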

In [9]:
# Process the uploaded audio file, transcribe it, and generate meeting minutes

def process_upload(audio_file, progress=gr.Progress()):
    progress(0.1, desc="Starting process...")

    if audio_file is None:
        return "Please upload an audio file."

    try:
        # Check file format
        if not str(audio_file).lower().endswith('.mp3'):
            return "Please upload an MP3 file."

        # Get transcription
        transcription = transcribe_audio(audio_file)
        if transcription.startswith("Error during transcription:"):
            return transcription

        # Generate minutes
        minutes = generate_minutes(transcription, model, tokenizer)
        progress(1.0, desc="Process complete!")
        return minutes

    except Exception as e:
        return f"Error processing file: {str(e)}"
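
The control flow of `process_upload` can be exercised without a GPU or an API key by passing the two stages in as arguments. This is a sketch of an alternative design, not the notebook's own code: `run_pipeline`, `fake_transcribe` and `fake_summarise` are hypothetical names, and the stages are assumed to raise exceptions on failure rather than return an error string.

```python
def run_pipeline(audio_path, transcribe, summarise):
    """Validate the input, then transcribe and summarise it.

    `transcribe` and `summarise` are hypothetical callables standing in for
    transcribe_audio and generate_minutes; both are expected to raise on failure.
    """
    if audio_path is None:
        return "Please upload an audio file."
    if not str(audio_path).lower().endswith(".mp3"):
        return "Please upload an MP3 file."
    try:
        transcript = transcribe(audio_path)
        return summarise(transcript)
    except Exception as exc:
        return f"Error processing file: {exc}"


# Stub stages so the flow can be tested in isolation.
def fake_transcribe(path):
    return "Error rates were discussed at length."


def fake_summarise(text):
    return f"# Minutes\n\n{text}"


print(run_pipeline("meeting.mp3", fake_transcribe, fake_summarise))
```

Signalling failure with exceptions means a transcript that happens to begin with the word "Error", as in the stub above, is still summarised normally.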

# Using Gradio for the Interface

In [10]:
# Create Gradio interface

interface = gr.Interface(
    fn=process_upload,
    inputs=gr.Audio(type="filepath", label="Upload MP3 File", format="mp3"),
    outputs=gr.Markdown(label="Meeting Minutes", min_height=60),
    title="Meeting Minutes Generator",
    description="Upload an MP3 recording of your meeting to get AI-generated meeting minutes. This process may take a few minutes.",
    flagging_mode="never"
)

> **Information**
>
> I converted the following meeting video to MP3 and used it to test the model. The results below are from that recording.
>
> [GitLab Unfiltered Meeting Video](https://www.youtube.com/watch?v=rOqgRiNMVqg&ab_channel=GitLabUnfiltered)
>
> With the model and functions in place, I use Gradio to test them.

In [13]:
# Launch Gradio interface

interface.launch()

Rerunning server... use `close()` to stop if you need to change `launch()` parameters.
----
Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://22c757d489e4f0f0fd.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


