📌 **Problem:** Struggling with Disorganized, Unproductive Meetings?
We've all sat through meetings that drag on, lack focus, and end without clear outcomes. It's easy to lose track of key points, action items, or even who said what. And once the meeting ends? You're left sorting through messy notes — or worse, wondering, “What did we actually decide?”

**Solution:** Meet Your AI-Powered Meeting Assistant, MinuteMaker
**MinuteMaker transforms your messy, hour-long meeting into:**

📋 A clean, concise summary

💬 A smart Q&A chatbot trained on your conversation

📊 A ready-to-use PowerPoint slide deck

Built with OpenAI Whisper, Google Gemini, ChromaDB, and Python automation, MinuteMaker turns a recorded conversation into a summary, a slide deck, and a searchable transcript.

Import the Libraries

In [17]:
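# google-genai provides `from google import genai`; python-dotenv provides load_dotenv
!pip install -q git+https://github.com/openai/whisper.git python-pptx google-generativeai google-genai python-dotenv
!pip install -q chromadb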

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [18]:
import os
import json
import uuid
import logging
import tempfile
from pathlib import Path
from typing import List, Dict, Any, Tuple
from google import genai

import whisper
from pptx import Presentation
from pptx.util import Pt, Inches
from pptx.dml.color import RGBColor
from pptx.enum.shapes import MSO_SHAPE
from google.genai import types, Client
from google.api_core import retry
import chromadb
from chromadb.utils.embedding_functions import EmbeddingFunction
import enum
from IPython.display import Markdown, display
from PIL import Image, ImageEnhance

In [19]:
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

genai.models.Models.generate_content = retry.Retry(
    predicate=is_retriable)(genai.models.Models.generate_content)
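
The cell above wraps `generate_content` so that rate-limit (429) and service-unavailable (503) errors are retried. As a rough, dependency-free sketch of that idea, the snippet below retries a callable with exponential backoff. `FakeAPIError`, `call_with_retry`, and the delay values are hypothetical names and numbers chosen for illustration; they are not part of `google.api_core` or the Gemini SDK.

```python
import time


class FakeAPIError(Exception):
    """Hypothetical stand-in for genai.errors.APIError; carries an HTTP-like code."""

    def __init__(self, code):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable(exc):
    # Same rule as the notebook's predicate: retry only on 429 and 503.
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}


def call_with_retry(fn, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Call fn(), retrying retriable errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retriable(exc):
                raise
            # Delay doubles on every failed attempt: 0.01s, 0.02s, 0.04s, ...
            sleep(base_delay * (2 ** attempt))


# Usage: a flaky call that fails twice with 429, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeAPIError(429)
    return "ok"


result = call_with_retry(flaky_call)
print(result, attempts["count"])  # prints: ok 3
```

Errors with any other code are raised immediately, which keeps genuine failures such as a bad request or an invalid key visible instead of hiding them behind repeated attempts.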

Configuration Constants (models, storage, and PPT themes)

In [31]:
#### Constants
MODEL_NAME = "gemini-2.0-flash"
EMBEDDING_MODEL = "models/text-embedding-004"

CHROMA_STORAGE_PATH = Path("./chroma_storage")
COLLECTION_NAME = "meeting_summary_collection"

PPTX_FILENAME = "Meeting Summary.pptx"
MEETING_THEMES = {
    "team_sync": {"title": "Team Sync","color": "#8E44AD",  "bg_image": Path("/content/meeting-2.jpg")},
    "project_kickoff": {"title": "Project Kickoff", "color": "#8E44AD",  "bg_image": Path("/content/meeting-1.jpg")},
    "retrospective": {"title": "Sprint Retrospective", "color": "#8E44AD",  "bg_image": Path("/content/meeting-4.jpg")},
    "client_review": {"title": "Client Review", "color": "#8E44AD",  "bg_image": Path("/content/meeting-3.jpg")},
    "default": {"title": "Meeting Summary", "color": "#8E44AD",  "bg_image": Path("/content/meeting-5.jpg")},
}

MAX_OUTPUT_TOKENS = 9000
TEMPERATURE = 0.4
TOP_P = 0.9
TOP_K = 40
EMBEDDING_TASK = "retrieval_document"

In [21]:
logger = logging.getLogger(__name__)
# Suppress httpx and google_genai.models INFO logs by default
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("google_genai.models").setLevel(logging.WARNING)

# Configure root logger
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)

In [22]:
from dotenv import load_dotenv

# Load variables from a local .env file (requires the python-dotenv package)
load_dotenv()

# Read the key from the environment; never hardcode it in the notebook.
# The variable name GOOGLE_API_KEY is an assumption; use whatever your .env defines.
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise RuntimeError("GOOGLE_API_KEY is not set in the environment or .env file")

# main() passes this key to google.genai.Client, so no global configure step is required.


Import the Audio File

In [23]:
from pathlib import Path

src_path = Path('/content/meeting-clip2.wav')
if not src_path.exists():
    raise FileNotFoundError("Audio file not found")

print("Using audio file:", src_path)

Using audio file: /content/meeting-clip2.wav


In [24]:
# save audio file
def save_file(content):
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.wav')
    temp_file.write(content)
    temp_file.close()
    return Path(temp_file.name)

# Audio Transcription (Whisper)
This component leverages the Whisper model to convert spoken meeting audio into raw text.
It loads a pre-trained model and provides transcription capabilities, enabling accurate speech-to-text conversion for meeting recordings.

In [25]:
class AudioTranscriber:
    """
    Transcribe speech to text using Whisper.
    """

    def __init__(self, model_name = "base"):
        try:
            self.model = whisper.load_model(model_name)
            logger.info(f"Loaded Whisper model '{model_name}'")
        except Exception:
            logger.exception("Failed to load Whisper model.")
            raise

    def transcribe(self, audio_path):
        """
        Transcribe the audio file and return the transcript.
        """
        try:
            result = self.model.transcribe(str(audio_path))
            logger.info("Transcription successful.")
            return result.get("text", "")
        except Exception:
            logger.exception(f"Transcription failed for {audio_path}")
            raise

# Meeting Summarization (Gemini)
This module uses Google's Gemini API to generate concise summaries from meeting transcripts.
It sends a carefully crafted prompt and expects a structured JSON response, which is parsed and used for downstream tasks such as slide generation and Q&A.

In [26]:
class MeetingSummarizer:
    """
    Summarizes meeting transcripts via Google GenAI.
    """

    def __init__(self, client):
        self.client = client

    def summarize(self, transcript, prompt):
        """
        Generates a structured summary (sections with titles, summary, bullets).
        """

        config = types.GenerateContentConfig(
            max_output_tokens=MAX_OUTPUT_TOKENS,
            temperature=TEMPERATURE,
            top_p=TOP_P,
            top_k=TOP_K,
        )
        try:
            resp = self.client.models.generate_content(
                model=MODEL_NAME, config=config, contents=[prompt, transcript]
            )
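            summary_text = resp.text
            # Prefer the fenced ```json block; fall back to the raw text when
            # the model returns bare JSON without a fence.
            if "```json" in summary_text:
                json_str = summary_text.split("```json")[1].split("```")[0]
            else:
                json_str = summary_text
            summary_slides = json.loads(json_str)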
            logger.info("Parsed summary to JSON.")
            return summary_slides, summary_text
        except Exception:
            logger.exception("Summarization error.")
            raise
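
The parsing step in `summarize` assumes the model wraps its JSON in a fenced block. A slightly more defensive variant is sketched below using only the standard library: it accepts a `json` fence, a plain fence, or bare JSON, and checks that each section has the keys the slide generator reads. `extract_sections`, `REQUIRED_KEYS`, and the sample reply are hypothetical and written for illustration; this is not the notebook's own parser.

```python
import json
import re

# Keys that the slide generator reads from every section.
REQUIRED_KEYS = {"section_title", "summary", "bullets"}


def extract_sections(reply_text):
    """Parse a model reply into a list of section dicts."""
    # Capture the body of the first fenced block, with or without a json tag.
    match = re.search(r"```(?:json)?\s*(.*?)```", reply_text, re.DOTALL)
    payload = match.group(1) if match else reply_text
    data = json.loads(payload)
    # A single object is wrapped so callers can always iterate over sections.
    if isinstance(data, dict):
        data = [data]
    for i, section in enumerate(data):
        missing = REQUIRED_KEYS - set(section)
        if missing:
            raise ValueError(f"Section {i} is missing keys: {sorted(missing)}")
    return data


# Invented sample reply, shaped like the output the prompt asks for.
reply = (
    "Here is the summary:\n```json\n"
    '[{"section_title": "Team Updates", "summary": "Status round.", '
    '"bullets": ["API done", "UI in review"]}]'
    "\n```"
)
sections = extract_sections(reply)
print(sections[0]["section_title"])  # prints: Team Updates
```

Validating the keys at this point means a malformed reply fails with a clear message here rather than as a `KeyError` in the middle of slide generation.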

# Evaluation with Gemini (Rubric-Based)
This component leverages Google Gemini to evaluate the quality of generated meeting summaries using a rubric-based prompt.
The model assesses the summary across key dimensions:

🧱 Slide Structure

📌 Groundedness

✂️ Conciseness

🗣️ Fluency

This automated review approximates human feedback and gives every generated summary the same quality check.

In [27]:
class SummaryRating(enum.Enum):
    VERY_GOOD = '5'
    GOOD = '4'
    OK = '3'
    BAD = '2'
    VERY_BAD = '1'

class SummaryEvaluator:
    """
    Evaluates Summary generated from transcript via Google GenAI.
    """
    def __init__(self, client):
        self.client = client

    def eval(self, summary, transcript, summary_prompt):
        """
        Evaluate the summary of the transcript and provide a rating on a scale 1 to 5, 1 being "Very Poor" and 5 being "Excellent"
        """
        prompt = (f"""
            # Instruction
            You are an expert evaluator for slide presentations. Your task is to evaluate the quality of a meeting summary generated for a transcript by an AI model, which is intended to be turned into a PowerPoint slide deck.

            We will provide you with the original prompt and transcript given to the model and the AI-generated structured summary. Your evaluation should focus on whether this summary is effective for slide-based presentation.

            # Evaluation
            ## Metric Definition
            You will assess the meeting summary’s quality with regard to slide-readiness. A good slide summary should:
            - Break the content into clear sections
            - Contain accurate and concise summaries
            - Use bullet points that can be used directly in presentation slides
            - Avoid introducing information that wasn't in the source

            ## Criteria
            1. **Structure for Slides**: The summary is clearly broken down into presentation-friendly sections with meaningful titles.
            2. **Groundedness**: The summary uses only content grounded in the original meeting transcript and does not hallucinate.
            3. **Conciseness and Slide-Readiness**: The bullets are clear, well-chunked, and ready to be used on slides (not full paragraphs).
            4. **Fluency and Readability**: The summaries and bullets are easy to understand and grammatically correct.

            ## Rating Rubric
            5 (Excellent): Summary is well-structured, fully grounded, concise, and presentation-ready with fluent writing.
            4 (Good): Summary is mostly well-structured and grounded; bullets are usable with minor edits.
            3 (Fair): Summary is okay but needs editing to be usable in slides (e.g., too verbose, not well-structured).
            2 (Poor): Summary is grounded but hard to use in a slide deck without major revisions.
            1 (Very Poor): Summary is ungrounded, off-topic, or incoherent.

            ## Evaluation Steps
            STEP 1: Assess the summary for presentation-readiness using the 4 criteria.
            STEP 2: Score the summary using the rubric.

            # User Inputs and AI-generated Response
            ## Prompt
            {summary_prompt}
            ## transcript
            {transcript}

            ## AI-generated Summary (JSON format intended for slide generation)
            ```
            {summary}
            ```
                """
        )
        try:
            resp = self.client.chats.create(
                model=MODEL_NAME).send_message(prompt)
            verbose_eval = resp.text
            logger.info("Evaluated the summary generated.")
            return verbose_eval
        except Exception:
            logger.exception("Evaluation error.")
            raise
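
`eval` returns the model's full written evaluation, and the `SummaryRating` enum is defined but not yet populated from it. One possible way to connect the two is sketched below: a regular expression pulls the digit that follows a "Score" label and maps it onto the enum. The "Score: N" format is an assumption based on the sample output shown later in this notebook; the model is not guaranteed to use it, so `parse_rating` (a hypothetical helper) returns `None` when no score is found.

```python
import enum
import re


class SummaryRating(enum.Enum):
    VERY_GOOD = "5"
    GOOD = "4"
    OK = "3"
    BAD = "2"
    VERY_BAD = "1"


def parse_rating(verbose_eval):
    """Return the SummaryRating found after a 'Score' label, or None."""
    # \W* skips punctuation and markdown such as ':** ' between label and digit.
    match = re.search(r"Score\W*([1-5])\b", verbose_eval)
    return SummaryRating(match.group(1)) if match else None


sample = (
    "**STEP 2: Score the summary using the rubric.**\n\n"
    "*   **Score:** 5 (Excellent)"
)
rating = parse_rating(sample)
print(rating.name)  # prints: VERY_GOOD
```

With a parsed rating, the pipeline could, for example, regenerate the summary when the score falls below a chosen threshold.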


# Slide Generation (python-pptx)
This module utilizes the python-pptx library to automatically transform the structured JSON summary into a polished PowerPoint presentation.

Each slide is dynamically generated with:

🏷️ Section Titles

📝 Concise Summaries

🔘 Bullet Points, styled for clarity and readability

The goal is to eliminate the manual effort of turning meeting notes into presentation-ready slides — saving time and ensuring consistency.

In [28]:
class PPTGenerator:
    """
    Create a PowerPoint from structured summary.
    """

    def __init__(self, themes):
        self.themes = themes

    @staticmethod
    def hex_to_rgb(h):
        h = h.lstrip("#")
        return tuple(int(h[i : i + 2], 16) for i in (0, 2, 4))

    @staticmethod
    def is_dark(rgb):
        r, g, b = rgb
        return (0.299 * r + 0.587 * g + 0.114 * b) < 150

    @staticmethod
    def add_bg(slide, rgb, image_path):
        """
        Adds a background image or fallback color to the slide.
        """
        try:
            if image_path:
                # Resize image first
                with Image.open(image_path) as img:
                    img = img.resize((960, 720))  # PowerPoint slide size
                    temp_path = "/tmp/resized_bg.jpg"
                    img.save(temp_path)

                # Apply opacity to resized image
                faded_path = "/tmp/faded_bg.png"
                PPTGenerator.apply_opacity(temp_path, faded_path, opacity=0.3)

                # Add faded image to slide
                img_shape = slide.shapes.add_picture(faded_path, 0, 0, width=Inches(10), height=Inches(7.5))

                # Send to back of z-order
                spTree = slide.shapes._spTree
                spTree.remove(img_shape._element)
                spTree.insert(2, img_shape._element)
                return
        except Exception:
            logger.exception("Background image failed. Falling back to color.")

        # Fallback to solid color background
        if rgb:
            shape = slide.shapes.add_shape(
                MSO_SHAPE.RECTANGLE, Inches(0), Inches(0), Inches(10), Inches(7.5)
            )
            shape.fill.solid()
            shape.fill.fore_color.rgb = RGBColor(*rgb)
            shape.line.fill.background()
            slide.shapes._spTree.insert(2, slide.shapes._spTree[-1])

    @staticmethod
    def apply_opacity(image_path, output_path, opacity = 0.3):
        """
        Saves a faded version of the image to use as background.
        """
        img = Image.open(image_path).convert("RGBA")
        alpha = img.split()[3]
        alpha = ImageEnhance.Brightness(alpha).enhance(opacity)
        img.putalpha(alpha)
        img.save(output_path)

    def generate(self, sections, filename, mtype = "default"):
        """
        Build and save the .pptx file.
        """
        try:
            # Use the themes passed to the constructor rather than the global constant
            theme = self.themes.get(mtype, self.themes["default"])
            rgb = self.hex_to_rgb(theme["color"])
            bg_image = theme.get("bg_image")
            font_rgb = (0, 0, 0)
            use_light_text = self.is_dark(rgb)
            prs = Presentation()

            # Title Slide
            slide = prs.slides.add_slide(prs.slide_layouts[0])
            self.add_bg(slide, rgb=rgb, image_path=bg_image)
            slide.shapes.title.text = theme["title"]
            slide.shapes.title.text_frame.paragraphs[0].font.color.rgb = RGBColor(
                *font_rgb
            )
            body = slide.placeholders[1]
            body.text = filename.stem
            body.text_frame.paragraphs[0].font.color.rgb = RGBColor(*font_rgb)

            # TOC
            toc = prs.slides.add_slide(prs.slide_layouts[1])
            self.add_bg(toc, rgb=rgb, image_path=bg_image)
            toc.shapes.title.text = "Table of Contents"
            toc.shapes.title.text_frame.paragraphs[0].font.color.rgb = RGBColor(
                *font_rgb
            )
            toc_body = toc.placeholders[1]
            toc_body.text = "\n".join(s["section_title"] for s in sections)
            for p in toc_body.text_frame.paragraphs:
                p.font.color.rgb = RGBColor(*font_rgb)

            # Content Slides
            for sec in sections:
                sld = prs.slides.add_slide(prs.slide_layouts[1])
                self.add_bg(sld, rgb=rgb, image_path=bg_image)
                sld.shapes.title.text = sec["section_title"]
                sld.shapes.title.text_frame.paragraphs[0].font.color.rgb = RGBColor(
                    *font_rgb
                )
                box = sld.placeholders[1]
                tf = box.text_frame
                tf.clear()
                summary_text = sec.get("summary", "")
                # clear() leaves one empty paragraph; reuse it to avoid a blank first line
                summary_pt = tf.paragraphs[0]
                summary_pt.text = f"Summary: {summary_text}"
                summary_pt.font.size = Pt(18)
                summary_pt.font.color.rgb = RGBColor(*font_rgb)
                summary_pt.font.bold = True
                for bullet in sec.get("bullets", []):
                    pb = tf.add_paragraph()
                    pb.text = bullet
                    pb.level = 1
                    pb.font.size = Pt(20)
                    pb.font.color.rgb = RGBColor(*font_rgb)


            prs.save(str(filename))
            logger.info(f"PPT saved: {filename}")
        except Exception:
            logger.exception("Failed PPT generation.")
            raise
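
`generate` computes `use_light_text` from the theme color but then always writes black text, which may be intentional for the faded image background. For the solid-color fallback, the flag could drive the font color instead. The sketch below reuses the same hex conversion and luminance threshold as `PPTGenerator`; `pick_font_rgb` is a hypothetical helper and is not part of the class.

```python
def hex_to_rgb(h):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def is_dark(rgb, threshold=150):
    """Weighted luminance check, same weights and threshold as PPTGenerator."""
    r, g, b = rgb
    return (0.299 * r + 0.587 * g + 0.114 * b) < threshold


def pick_font_rgb(background_hex):
    """White text on dark backgrounds, black text on light ones."""
    return (255, 255, 255) if is_dark(hex_to_rgb(background_hex)) else (0, 0, 0)


print(hex_to_rgb("#8E44AD"))     # prints: (142, 68, 173)
print(pick_font_rgb("#8E44AD"))  # prints: (255, 255, 255)
print(pick_font_rgb("#F4F6F7"))  # prints: (0, 0, 0)
```

The purple used by every theme has a luminance of roughly 102, below the threshold of 150, so white text would be the more readable choice on a solid purple slide.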

# Embedding and Retrieval (RAG)
This module implements Retrieval-Augmented Generation (RAG) by embedding previous meeting transcripts and storing them in a vector database (ChromaDB).

When processing new meetings, relevant past meetings are retrieved based on semantic similarity, providing valuable context or identifying recurring themes.

This approach enhances the agent's memory and ability to recognize long-term patterns, improving both the accuracy and relevance of generated insights.

In [29]:
class RAGEngine:
    """
    Retrieval-Augmented Generation using ChromaDB.
    """
    def __init__(self, client, storage_path, collection_name):
        self.client = client
        self.storage_path = storage_path
        self.collection_name = collection_name

    class _EmbeddingFn(EmbeddingFunction):
        def __init__(self, client, model):
            self.client = client
            self.model = model

        def __call__(self, texts):
            try:
                res = self.client.models.embed_content(
                    model=self.model,
                    contents=texts,
                    config=types.EmbedContentConfig(
                    task_type=EMBEDDING_TASK,
                    ),
                )
                # The response object exposes the vectors on its `embeddings` attribute
                raw = getattr(res, 'embeddings', None)
                if not raw:
                    raise ValueError("No embeddings returned from GenAI.")
                processed = [item.values if hasattr(item, 'values') else item for item in raw]
                return processed
            except Exception:
                logger.exception("Embedding failed.")
                # Re-raise: returning [] would hand ChromaDB an invalid embedding list
                raise

    def init_db(self):
        """
        Initialize or get a ChromaDB collection.
        """
        try:
            client = chromadb.PersistentClient(path=str(self.storage_path))
            # Pass the embedding function on every open so that queries are
            # embedded with the same model as the stored documents.
            collection = client.get_or_create_collection(
                name=self.collection_name,
                embedding_function=self._EmbeddingFn(self.client, EMBEDDING_MODEL),
            )
            logger.info(f"ChromaDB collection ready: {self.collection_name}")
            return collection
        except Exception:
            logger.exception("Failed to initialize ChromaDB collection.")
            raise

    def add_document(self, db, text):
        """
        Add a document to ChromaDB.
        """
        try:
            if text:
                db.add(documents=[text], ids=[str(uuid.uuid4())])
                logger.info("Document added to ChromaDB.")
        except Exception:
            logger.exception("Failed to add document to ChromaDB.")

    def query(self, db, query, k = 2):
        """
        Query ChromaDB for top-k relevant documents.
        """
        try:
            result = db.query(query_texts=[query], n_results=k)
            return result.get('documents', [])
        except Exception:
            logger.exception("Failed to query ChromaDB.")
            return []

    def answer(self, db, query, k = 2):
        """
        Answer a question using retrieved passages from ChromaDB.
        """
        try:
            passages = self.query(db, query, k)
            prompt = (
                f"""You are a helpful and informative bot that answers questions using only the
                provided reference passage below.

                Instructions:
                - Provide a complete, well-explained answer based solely on the passage.
                - If the answer is not available, respond with: "I'm not sure."
                - You are responding to a technical audience, so explain clearly but concisely.
                - Break down complex concepts into understandable parts.
                - Ignore irrelevant information.

                Passage:
                {passages}

                Question:
                {query}

                Answer:"""
            )
            resp = self.client.models.generate_content(
                model=MODEL_NAME,
                contents=[prompt]
            )
            return resp.text.strip()
        except Exception:
            logger.exception("Failed to generate answer.")
            return "I'm not sure."
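
To make the retrieval step concrete without calling any external service, here is a toy version of top-k similarity search. It swaps the Gemini embeddings and ChromaDB for a bag-of-words vector and a plain cosine similarity, so it only illustrates the ranking idea and says nothing about the quality of real embeddings. The `embed`, `cosine`, and `top_k` helpers and the sample chunks are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical transcript chunks, invented for illustration.
chunks = [
    "Paul presented a proposal for flexible working hours.",
    "Some people worried that fixed core hours are too rigid.",
    "Everyone will review the proposal before the next meeting.",
]
best = top_k("What are the concerns about core hours?", chunks, k=1)
print(best[0])  # prints: Some people worried that fixed core hours are too rigid.
```

Note that the notebook stores each transcript as a single document, so retrieval only becomes selective once several meetings, or several chunks of one meeting, are in the collection.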

# End-to-End Workflow: Transcribe → Summarize → Evaluate → Generate Slides → Retrieve → Q&A
This function orchestrates the complete MinuteMaker pipeline, showcasing a fully integrated GenAI workflow:

**Transcribe**
Converts raw meeting audio into text using OpenAI Whisper.

**Summarize**
Sends the transcript to Google Gemini with structured prompts to generate a JSON-based summary.

**Evaluate**
Uses a rubric-based GenAI evaluator to score the summary for slide structure, groundedness, conciseness, and fluency.

**Detect Theme**
Classifies the meeting type (team sync, project kickoff, client review, or retrospective) to select the matching slide theme.

**Generate Slides**
Transforms the structured summary into a PowerPoint presentation using python-pptx.

**RAG Q&A**
Embeds the transcript in ChromaDB and performs retrieval-augmented question answering to support context-rich chat or queries.

This design demonstrates a powerful GenAI application pipeline that combines:

🔁 Multimodal processing

🧾 Structured prompting

✅ Automated evaluation

🧠 Knowledge retrieval

🎯 Content generation
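
The Detect Theme step relies on the model replying with exactly one of the known keys, but a reply may arrive with quotes, different casing, or a full sentence. A small normalization step, sketched below, maps such replies onto a known key and falls back to `"default"` otherwise. `normalize_meeting_type` and `KNOWN_TYPES` are hypothetical names; the notebook itself simply lowercases the reply and lets the theme lookup fall back.

```python
import re

# The four meeting types listed in the theme-detection prompt.
KNOWN_TYPES = ("team_sync", "project_kickoff", "client_review", "retrospective")


def normalize_meeting_type(reply, known=KNOWN_TYPES, fallback="default"):
    """Map a free-form model reply to one known theme key."""
    # Lowercase and turn runs of whitespace or hyphens into underscores.
    text = re.sub(r"[\s\-]+", "_", reply.strip().lower())
    for key in known:
        if key in text:
            return key
    return fallback


print(normalize_meeting_type(' "Team_Sync"\n'))       # prints: team_sync
print(normalize_meeting_type("Project kickoff."))     # prints: project_kickoff
print(normalize_meeting_type("Weekly standup"))       # prints: default
```

Returning an explicit `"default"` also makes it easy to log when the model's reply did not match any known type.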

In [32]:
def main(audio_bytes):
    """
    Workflow: transcribe -> summarize -> evaluate -> detect type -> ppt -> RAG Q&A
    """
    try:
        key = GOOGLE_API_KEY
        gen_client = Client(api_key=key)

        #### Transcription ####
        transcriber = AudioTranscriber()
        path = save_file(audio_bytes)
        transcript = transcriber.transcribe(path)

        #### Summarization ####
        summary_prompt = ("""
            You are a meeting assistant. Carefully analyze the meeting transcript and:
                1. Segment it into distinct topics (whenever the conversation focus shifts).
                2. For each topic:
                  - Assign a meaningful short section_title (max 1 line).
                  - Write a concise 1-sentence summary.
                  - Extract 2-3 bullet points (each bullet under 50 words).

            Think step by step. Identify topic shifts chronologically.
            Return the result strictly as a JSON array inside a ```json fenced block. All keys must be in double quotes:
                ```json
                  [
                    {
                      "section_title": "Team Updates",
                      "summary": "...",
                      "bullets": ["...", "...", "..."]
                    },
                    ...
                  ]
                ```
                """
        )
        summarizer = MeetingSummarizer(gen_client)
        summary, summary_text = summarizer.summarize(transcript, summary_prompt)
        display(Markdown("### **1. Structured output of the Summary**"))
        display(Markdown(summary_text))

        #### Summary Evaluation ####
        evaluate = SummaryEvaluator(gen_client)
        evaluation = evaluate.eval(summary, transcript, summary_prompt)
        display(Markdown(f"### 2. Rubric-Based Evaluation\n{evaluation}"))

        #### Theme Detection ####
        theme_prompt = ("""
        You're a Meeting assistant. Given the context, understand it and choose the meeting type accordingly from the list:
        ["team_sync", "project_kickoff", "client_review", "retrospective"].

        Reply with just the meeting type.
        """
        )
        theme_resp = gen_client.models.generate_content(
            model=MODEL_NAME, contents=[theme_prompt, transcript]
        )
        mtype = theme_resp.text.strip().lower()
        display(Markdown("### **3. Meeting Type**"))
        display(Markdown(mtype))

        #### PPT generation ####
        ppt = PPTGenerator(MEETING_THEMES)
        ppt.generate(summary, Path(PPTX_FILENAME), mtype)

        #### RAG Q&A using Chroma ####
        rag = RAGEngine(gen_client, CHROMA_STORAGE_PATH, COLLECTION_NAME)
        db = rag.init_db()
        rag.add_document(db, transcript)


        # Example QA
        questions = [
            "What's the main priority for getting people to the underused room?",
            "What audience groups are excluded?",
            "What's the plan for distributing audio recordings or 'CDs' post-session?",
        ]

        display(Markdown("### **4. RAG Q&A**"))
        for question in questions:
            display(Markdown("Q:" + question))
            ans = rag.answer(db, question, k=2)
            display(Markdown("Ans:"+ans))

    except Exception:
        logger.exception("Main workflow error.")
        raise
    display(Markdown("### **Done**"))


if __name__ == "__main__":
    audio_bytes = src_path.read_bytes()
    main(audio_bytes)




### **1. Structured output of the Summary**

```json
[
  {
    "section_title": "Initial Proposal Discussion",
    "summary": "The team acknowledges Paul's proposal but expresses initial concerns about inflexible core hours and suggests further individual review before a deeper discussion.",
    "bullets": [
      "Paul's proposal is acknowledged as a starting point.",
      "Concerns are raised about the rigidity of core hours.",
      "Team members need time to review the proposal individually."
    ]
  }
]
```

### 2. Rubric-Based Evaluation
## Evaluation

**STEP 1: Assess the summary for presentation-readiness using the 4 criteria.**

*   **Structure for Slides:** The summary is structured with a section title, a summary sentence, and bullet points, which aligns with the prompt instructions and is suitable for slide creation.
*   **Groundedness:** The summary and bullet points accurately reflect the content of the original transcript. No hallucination is present.
*   **Conciseness and Slide-Readiness:** The summary sentence is concise and the bullet points are short and to the point, making them ready to be placed directly onto slides.
*   **Fluency and Readability:** The language used is clear, grammatically correct, and easy to understand.

**STEP 2: Score the summary using the rubric.**

*   **Score:** 5 (Excellent)

The summary is well-structured, fully grounded, concise, presentation-ready, and fluent. It perfectly follows the instructions and delivers a high-quality slide-ready summary.


### **3. Meeting Type**

team_sync

### **4. RAG Q&A**

Q:What's the main priority for getting people to the underused room?

Ans:I'm not sure.

Q:What audience groups are excluded?

Ans:I'm not sure.

Q:What's the plan for distributing audio recordings or 'CDs' post-session?

Ans:I'm not sure.

### **Done**