# Day 5 - Combining Frontier & Open-Source Models for Audio-to-Text Summarization

## **Summary**

This lesson focuses on building a practical application that automatically generates meeting minutes from audio recordings. The project combines the strengths of frontier models (via APIs for audio-to-text transcription) and open-source models (hosted locally or via Hugging Face for text summarization and action item extraction) to create a useful business tool.

## **Highlights**

- 🚀 **Project Goal:** Build an AI system to generate structured meeting minutes (discussion points, takeaways, action items) from audio recordings. This is relevant for automating a common business task, saving time and improving record-keeping.
- 🤝 **Hybrid Approach:** Utilize both frontier models (for accurate speech-to-text conversion via API) and open-source models (using Hugging Face Transformers for text processing). This demonstrates a practical pattern of leveraging different model types based on their strengths and accessibility for specific tasks.
- 🔊 **Input Data:** Use publicly available audio recordings of council meetings from Hugging Face datasets as the source material. This provides realistic data for development and testing.
- ⚙️ **Core Skills:** Solidify understanding of using pipelines, tokenizers, and models within the Hugging Face ecosystem for inference with open-source LLMs. This is crucial for customizing and deploying AI solutions beyond API-only approaches.
- 📈 **Business Relevance:** The project simulates building a real-world feature found in many productivity applications, giving practical experience in developing end-to-end AI solutions.
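
The hybrid pattern described above can be sketched as a two-stage pipeline in which each stage is a swappable function. This is only an illustration of the structure, not the course's implementation: `generate_minutes`, `fake_transcribe`, and `fake_summarize` are hypothetical names, and the two stand-in functions replace the real speech-to-text API call and the open-source LLM call.

```python
from typing import Callable


def generate_minutes(
    audio_path: str,
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
) -> str:
    """Run stage 1 (speech-to-text), then stage 2 (minutes generation)."""
    transcript = transcribe(audio_path)
    if not transcript.strip():
        raise ValueError(f"No speech transcribed from {audio_path!r}")
    return summarize(transcript)


# Hypothetical stand-ins so the sketch runs without any API key or GPU.
def fake_transcribe(path: str) -> str:
    return "Council approved the budget. Alice will publish the report."


def fake_summarize(transcript: str) -> str:
    # Turns each sentence into a Markdown bullet; a real model would summarize.
    return "## Minutes\n\n- " + transcript.replace(". ", ".\n- ")


print(generate_minutes("meeting.mp3", fake_transcribe, fake_summarize))
```

Passing the two stages in as functions keeps the choice of model (frontier API or open-source) independent of the pipeline logic, so either side can be replaced without touching the other.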

## **Reflective Questions**

- **How can I apply this concept in my daily data science work or learning?**
    - You can apply this by identifying tasks that can be broken down into sub-problems solvable by different types of models (e.g., using a powerful API for initial data processing like transcription, then a fine-tuned open-source model for specific analysis or summarization). This hybrid approach is often cost-effective and flexible.
- **Can I explain this concept to a beginner in one sentence?**
    - We're building a tool that listens to meeting recordings, automatically types out what was said using one AI, and then uses another AI to summarize the key points and list who needs to do what.
- **Which type of project or domain would this concept be most relevant to?**
    - This concept is highly relevant for productivity tools, business process automation, corporate knowledge management, legal tech (meeting/deposition summarization), and accessibility tools (providing summaries for people who are deaf or hard of hearing).

# Day 5 - Using Hugging Face & OpenAI for AI-Powered Meeting Minutes Generation

## **Summary**

This section details the step-by-step implementation of the meeting minutes generator within a Google Colab environment. It showcases integrating Google Drive for file access, using the OpenAI Whisper API for audio transcription, and employing a quantized Llama 3.1 model via Hugging Face Transformers for summarizing the transcript into structured minutes with action items.

## **Highlights**

- 💾 **Data Source & Access:** Uses the MeetingBank dataset (specifically a segment from a Denver City Council meeting) stored on Google Drive, demonstrating how to mount and access Drive files directly within Colab. This is relevant for projects requiring access to personal or team data stored in the cloud.
- 🔊 **Audio Transcription:** Leverages OpenAI's `whisper-1` model via its API to convert the meeting audio file (`.mp3`) into text. This highlights the practical use of specialized frontier models for high-accuracy tasks like speech-to-text.
- 📝 **Text Generation with Open Source LLM:** Employs the `meta-llama/Meta-Llama-3.1-8B-Instruct` model from Hugging Face to process the transcript. This involves crafting specific system and user prompts to guide the model in generating minutes in Markdown format, including summary, discussion points, takeaways, and action items.
- ⚙️ **Efficient Inference:** Applies 4-bit quantization (`BitsAndBytesConfig`) as the Llama 3.1 model is loaded onto the Colab GPU (T4). Storing weights in 4 bits instead of 16 cuts their memory footprint to roughly a quarter, making it feasible to run an 8B-parameter model on a standard Colab GPU, a crucial technique for resource-constrained environments.
- 🔄 **Streaming Output:** Uses the `TextStreamer` from Hugging Face `transformers` to display the generated text token by token in real time. This improves user experience for long generations.
- 📄 **Markdown Formatting:** The generated output is explicitly requested and formatted in Markdown, which is then rendered nicely within the Colab notebook using `display(Markdown(...))`. This is useful for creating well-structured, readable reports.
- 💡 **Next Steps:** Proposes an exercise to wrap the implemented logic into a Gradio user interface, allowing users to input a filename from Google Drive and generate minutes. This encourages building practical, user-facing applications.
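
To make the memory saving from 4-bit quantization concrete, here is a rough back-of-the-envelope calculation. It is a sketch under stated assumptions: the parameter count is rounded to 8 billion, and only weight storage is counted. Quantization metadata, layers that stay in higher precision, activations, and the KV cache all add to the real footprint, so actual usage will be somewhat higher than these figures.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8.0e9  # assumption: round figure for an "8B" model

for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(N_PARAMS, bits)
    print(f"{label:>8}: ~{gb:.1f} GB of weights")
```

At 16 bits the weights alone come to about 16 GB, which leaves no headroom on a 16 GB T4, while at 4 bits they come to about 4 GB.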

## **Code Examples**

**Python**

```python
# 1. Mount Google Drive in Colab
from google.colab import drive
drive.mount('/content/drive')
audio_file_path = "/content/drive/My Drive/LLMs/Denver_Extract.mp3" # Example path

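# 2. Set up the OpenAI client and authenticate with Hugging Face
# Assumption: both keys are stored as Colab secrets under these names
from google.colab import userdata
from huggingface_hub import login
from openai import OpenAI
openai_api_key = userdata.get('OPENAI_API_KEY')
client = OpenAI(api_key=openai_api_key)
login(userdata.get('HF_TOKEN')) # Llama 3.1 is a gated model on Hugging Face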

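# 3. Transcribe Audio using OpenAI Whisper
# The context manager closes the file handle once the upload finishes
with open(audio_file_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text" # Returns the transcript as a plain string
    )
print(transcription)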

# 4. Define Prompts for Llama 3.1
SYSTEM_PROMPT = """You are an assistant that produces meeting minutes from transcripts... in markdown."""
USER_PROMPT = f"""Below is the transcript...\nPlease write minutes in markdown...\n\n{transcription}"""
messages = [{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": USER_PROMPT}]

# 5. Setup Quantization Config
import torch
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

# 6. Load Tokenizer and Model (Quantized)
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct" # Or the specific model used
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token # Set pad token if needed

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16, # Or appropriate dtype
    device_map="auto",         # Automatically use GPU if available
    quantization_config=quantization_config
)

# 7. Prepare Inputs and Streamer
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
streamer = TextStreamer(tokenizer, skip_prompt=True)

# 8. Generate Text (Streaming)
outputs = model.generate(
    inputs,
    max_new_tokens=2000,
    streamer=streamer
)

# 9. Decode only the newly generated tokens (outputs[0] also contains the prompt)
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# 10. Display as Markdown
from IPython.display import display, Markdown
display(Markdown(response))
```
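
The Gradio exercise proposed in the highlights could be approached along these lines. This is a minimal sketch, not the course's solution: `minutes_for_file`, `build_demo`, and `DRIVE_DIR` are hypothetical names, and `make_minutes` stands for a function you would write yourself that takes an audio path and returns Markdown by running the transcription and generation steps above.

```python
from pathlib import Path
from typing import Callable

# Assumption: audio files live in this Drive folder (taken from the example path).
DRIVE_DIR = Path("/content/drive/My Drive/LLMs")


def minutes_for_file(
    filename: str,
    make_minutes: Callable[[str], str],
    base_dir: Path = DRIVE_DIR,
) -> str:
    """Validate the filename, then hand the full path to the minutes pipeline."""
    name = Path(filename.strip()).name  # keep only the final path component
    if not name:
        return "**Error:** please enter a filename."
    path = base_dir / name
    if not path.is_file():
        return f"**Error:** `{name}` was not found in `{base_dir}`."
    return make_minutes(str(path))


def build_demo(make_minutes: Callable[[str], str]):
    """Wrap the helper in a simple Gradio interface."""
    import gradio as gr  # imported here so the helper above works without Gradio

    return gr.Interface(
        fn=lambda filename: minutes_for_file(filename, make_minutes),
        inputs=gr.Textbox(label="Audio filename in the Drive folder"),
        outputs=gr.Markdown(label="Minutes"),
        title="Meeting Minutes Generator",
    )
```

In a Colab cell, something like `build_demo(make_minutes).launch()` would then start the UI. Returning error messages as Markdown, rather than raising, keeps the feedback visible in the output panel.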

## **Reflective Questions**

- **How can I apply this concept in my daily data science work or learning?**
    - You can integrate external data sources like Google Drive into your Colab workflows, combine API-based models (like Whisper) for specialized tasks with open-source models (like Llama 3.1) for custom processing, and use quantization to run larger models on available hardware.
- **Can I explain this concept to a beginner in one sentence?**
    - We're connecting Google Drive to our code notebook, using one AI service (OpenAI) to turn spoken audio from a file into text, and then feeding that text to another AI that we run ourselves (Llama 3.1) to automatically write meeting notes in a neat format.
- **Which type of project or domain would this concept be most relevant to?**
    - This is highly relevant for building productivity tools, automating reporting in business intelligence, enhancing accessibility by summarizing long audio/video content, and creating customized NLP applications where combining different AI models offers the best performance or cost-effectiveness.

# Day 5 - Build a Synthetic Test Data Generator: Open-Source AI Model for Business

## **Summary**

This segment introduces the main end-of-week challenge: building a synthetic test data generator using an open-source model, emphasizing its broad applicability and value. It also recaps the key skills mastered in week 3, covering the combined use of frontier and open-source models and proficiency with Hugging Face tools, before previewing the focus of week 4 on LLM selection strategies and code generation.

## **Highlights**

- 🎯 **End-of-Week Challenge:** Create a tool that generates synthetic test data based on user descriptions (e.g., product descriptions, job postings) using an open-source model, preferably with a Gradio UI. This is highly relevant as synthetic data generation is crucial for augmenting datasets, testing systems, and bootstrapping projects when real data is scarce.
- 🛠️ **Value Proposition:** The synthetic data generator is presented as a universally applicable tool for any business vertical, useful for future projects within the course and beyond. Investing time in building it provides a tangible asset.
- ✅ **Week 3 Skills Recap:** Reinforces the ability to confidently code with frontier models, build complex AI assistants (multimodal, tool-using, multi-agent), and integrate frontier and open-source models within a single solution using Hugging Face `pipeline`, `tokenizer`, and `model` APIs for inference.
- 🤔 **Week 4 Preview: Model Selection:** Addresses the critical question of how to choose the right LLM (closed vs. open source, specific models) for a given task. Techniques like using leaderboards and arenas for comparison will be explored.
- 💻 **Week 4 Preview: Code Generation:** Introduces code generation as the practical focus for the upcoming week, utilizing both frontier and open-source models to tackle programming tasks.
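
One possible starting point for the synthetic data challenge is sketched below. It is an illustration under assumptions, not a reference solution: the prompt wording and the function names `build_messages`, `parse_records`, and `generate_records` are hypothetical, and `chat` is a stand-in for whatever function sends chat messages to your open-source model and returns its text reply.

```python
import json
from typing import Callable


def build_messages(description: str, n: int) -> list[dict]:
    """Build a chat prompt asking for n records as a JSON array."""
    system = "You generate synthetic test data. Reply with a JSON array only."
    user = f"Generate {n} records matching this description:\n{description}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


def parse_records(raw: str) -> list:
    """Extract the outermost JSON array, tolerating extra text around it."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("No JSON array found in model output")
    return json.loads(raw[start : end + 1])


def generate_records(description: str, n: int, chat: Callable[[list], str]) -> list:
    """Prompt the model, parse its reply, and keep at most n records."""
    return parse_records(chat(build_messages(description, n)))[:n]


# Hypothetical stand-in for a real model call, so the sketch runs anywhere.
def fake_chat(messages: list) -> str:
    return 'Sure! [{"title": "Data Engineer"}, {"title": "ML Engineer"}]'


print(generate_records("job postings for a tech company", 2, fake_chat))
```

Asking for JSON and parsing defensively matters because instruction-tuned models often wrap structured output in extra conversational text.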

## **Reflective Questions**

- **How can I apply this concept (synthetic data generation) in my daily data science work or learning?**
    - You can use synthetic data generation to create larger, more diverse datasets for training machine learning models (especially when real data is limited or sensitive), generate edge cases for robust testing of applications, or create realistic mock data for demos and prototyping.
- **Can I explain this concept (synthetic data generation challenge) to a beginner in one sentence?**
    - The challenge is to build a tool where you tell an AI what kind of data you need (like fake customer reviews or product names), and it automatically creates lots of realistic examples for you to use.
- **Which type of project or domain would this concept (synthetic data generation) be most relevant to?**
    - It's relevant across many domains, including software testing (generating test inputs/outputs), machine learning (data augmentation, tackling class imbalance), privacy-preserving analytics (generating data without real user info), simulation, and any scenario needing varied data for development or evaluation where real data is insufficient or restricted.