<a href="https://colab.research.google.com/github/Troyanovsky/Building-with-GenAI/blob/main/podcast_summary_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build with GenAI: Summarize Podcasts using Whisper & LLM

- Completely local (not using OpenAI's API). You can run the code on your own computer and keep everything private, or you can use Google Colab's free T4 GPU (just hit Runtime > Run all).
- You can easily adapt the code to other tasks, such as summarizing meetings, taking lecture notes, or synthesizing user research and interviews.

This Colab notebook is the accompanying code for my article at:

This is part of the "Build with GenAI" series. Other tutorial projects can be found at:

In [1]:
# Installing libraries & setting up
!pip install -q --upgrade torch torchvision torchaudio
!pip install -q git+https://github.com/huggingface/transformers
!pip install -q accelerate optimum # These packages are for accelerating transformers module
!pip install feedparser mutagen

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.17.1 requires torch==2.2.1, but you have torch 2.2.2 which is incompatible.
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for transformers (pyproject.toml) ... done

In [2]:
# Install llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

%cd /content
!apt-get update -qq && apt-get install -y -qq aria2

# Download a local large language model. I'm using OpenHermes-2.5-Mistral-7B-16K-GGUF, which has a longer context size and pretty good quality
# If you want to use other local models that can easily run on consumer hardware, check out this repo: https://github.com/Troyanovsky/Local-LLM-Comparison-Colab-UI/
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-16k-GGUF/resolve/main/openhermes-2.5-mistral-7b-16k.Q4_K_M.gguf?download=true -d /content/model/ -o openhermes-2.5-mistral-7b-16k.Q4_K_M.gguf

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.61.tar.gz (37.4 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.61-cp310-cp310-manylinux_2_35_x86_64.whl size=38844314 sha256=2c955f66fb74f11d77992fbae79552a8aaa7be9a0273aae8945093c8b7ca0643
  Stor

In [3]:
# Setting up Whisper for transcribing audio to text
import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available
import gc
import os  # needed by transcribe_local_mp3 below
import time
from mutagen.mp3 import MP3

pipe = None

def load_whisper():
    global pipe
    # Set up ASR pipeline
    pipe = pipeline("automatic-speech-recognition",
                    "openai/whisper-large-v3", # you can change to other models like whisper-large-v2, whisper-medium, whisper-small for faster (but lower quality) results
                    torch_dtype=torch.float16,
                    device="cuda:0", # or mps for Mac
                    model_kwargs={"attn_implementation": "flash_attention_2"} if is_flash_attn_2_available() else {"attn_implementation": "sdpa"}
                    )

def unload_whisper():
    global pipe
    if pipe:
        del pipe  # Delete the pipeline object
        pipe = None
    gc.collect()  # Collect garbage first so the pipeline's tensors are actually released
    torch.cuda.empty_cache()  # Then clear the cache on the CUDA device

def transcribe_local_mp3(file_path):
    start_time = time.time()
    if os.path.exists(file_path):
        try:
            audio = MP3(file_path)
            duration = audio.info.length
            minutes = int(duration // 60)
            seconds = int(duration % 60)
            print(f"Duration of file: {minutes}:{seconds:02}")
        except Exception as e:
            print("Error processing file:", e)
            return None

        # Transcribe audio file using Whisper
        outputs = pipe(file_path,
                       chunk_length_s=30,
                       batch_size=24,
                       return_timestamps=True,
                       generate_kwargs = {"task":"transcribe",
                       "language": "en", #"zh"
                       })

        end_time = time.time()
        print(f"Time used for transcription: {end_time - start_time}")
        return outputs["text"]
    else:
        raise FileNotFoundError(f"The file at {file_path} was not found.")
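
The pipeline call above requests `return_timestamps=True`, but `transcribe_local_mp3` returns only the `"text"` field. Below is a minimal sketch of how the per-chunk timestamps could be turned into a readable transcript. It assumes the output dict also carries a `"chunks"` list of `{"timestamp": (start, end), "text": ...}` items, which is the shape the Hugging Face ASR pipeline produces when timestamps are requested. The helper name `format_timestamped_transcript` and the sample data are made up for illustration.

```python
def format_timestamped_transcript(outputs: dict) -> str:
    """Render ASR chunks as '[mm:ss] text' lines (hypothetical helper)."""
    lines = []
    for chunk in outputs.get("chunks", []):
        start, _end = chunk["timestamp"]
        # Guard against a missing start time
        if start is None:
            stamp = "[--:--]"
        else:
            minutes, seconds = divmod(int(start), 60)
            stamp = f"[{minutes:02}:{seconds:02}]"
        lines.append(f"{stamp} {chunk['text'].strip()}")
    return "\n".join(lines)


# Made-up sample shaped like the pipeline output, for illustration only
sample_outputs = {
    "text": " Welcome to the show. Today we talk about product-market fit.",
    "chunks": [
        {"timestamp": (0.0, 3.2), "text": " Welcome to the show."},
        {"timestamp": (63.5, 68.0), "text": " Today we talk about product-market fit."},
    ],
}
print(format_timestamped_transcript(sample_outputs))
```

To use something like this in the notebook, `transcribe_local_mp3` would need to return the whole `outputs` dict rather than `outputs["text"]`.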

In [4]:
# Setting up a local LLM for summarization or chat
from llama_cpp import Llama

llm = None

def load_llama():
    global llm
    llm = Llama(
            model_path="/content/model/openhermes-2.5-mistral-7b-16k.Q4_K_M.gguf", # if you're using another model, change the name
            chat_format="chatml", # use the chat_format that matches the model
            n_gpu_layers=-1, # use -1 for all layers on GPU
            n_ctx=12288 # context size
    )

def summarize_text(input: str) -> str:
    output = llm.create_chat_completion(
        messages=[
            {
                "role": "system",
                "content": "Summarize the user input in three to five detailed bullet points. Reply bullet points only. Bullet points start with '- '.",
            }, # Feel free to modify the prompt to suit your own formatting needs
            {"role": "user", "content": input},
        ],
        temperature=0.7,
    )
    output_text = output['choices'][0]['message']['content']
    return output_text

def summarize_long_text(input_text: str) -> str:
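    # Chunk size is measured in characters, not tokens (36864 = 3 x the n_ctx of 12288 set in load_llama)
    MAX_CHUNK_SIZE = 36864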
    summaries = []

    # Break down the input text into chunks
    for i in range(0, len(input_text), MAX_CHUNK_SIZE):
        chunk = input_text[i:i+MAX_CHUNK_SIZE]
        summaries.append(summarize_text(chunk))

    # Concatenate all summaries into one text
    concatenated_summaries = "\n".join(summaries)

    # Use summarize_text to generate a final summary of all the chunk summaries
    final_summary = summarize_text(concatenated_summaries)

    return final_summary
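
`summarize_long_text` slices the transcript every `MAX_CHUNK_SIZE` characters, so a chunk can start or end in the middle of a word or sentence. The sketch below shows one possible alternative that breaks after sentence-ending punctuation and only falls back to a hard cut when a single sentence is longer than the limit. `split_into_chunks` is a hypothetical helper, not part of the notebook, and the regular expression is a simple heuristic rather than a real sentence tokenizer.

```python
import re


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, breaking after sentences where possible."""
    # Split on whitespace that follows '.', '!' or '?', keeping the punctuation
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Fallback: hard-cut a single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the current chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two three. Four five six.", max_chars=16))
```

In the loop of `summarize_long_text`, the range-based slicing could then be replaced by iterating over `split_into_chunks(input_text, MAX_CHUNK_SIZE)`.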

In [5]:
# Download the latest episode of the podcast
from datetime import datetime
import requests
import os
import feedparser

def download_latest_episode(podcast_rss_url: str) -> str:
    # Parse the feed
    feed = feedparser.parse(podcast_rss_url)

    if feed.bozo == 1 or not feed.entries:
        print(f"The URL provided is not a valid RSS feed or no entries found: {podcast_rss_url}")
        return None

    # Find the latest episode entry
    latest_episode = feed.entries[0]

    # Find the audio file URL (check the entry's links first, then its enclosures)
    # .get() is used because some feed items lack a "type" or "href" field
    audio_url = None
    for link in latest_episode.get("links", []):
        if link.get("type") == 'audio/mpeg':
            audio_url = link.get("href")
            break

    if audio_url is None:
        for enclosure in latest_episode.get("enclosures", []):
            if enclosure.get("type") == 'audio/mpeg':
                audio_url = enclosure.get("href")
                break

    # Check if the audio URL was found
    if not audio_url:
        return None

    # Define the path where the file will be saved, for simplicity just use the timestamp as the temp file name
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    filename = f"{timestamp}.mp3"
    directory = "/content/temp/"
    if not os.path.exists(directory):
        os.makedirs(directory)
    filepath = os.path.join(directory, filename)

    try:
        # Download the file
        response = requests.get(audio_url, stream=True, timeout=30)  # timeout so a stalled server cannot hang the cell
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(f"Error downloading audio file: {e}")
        return None

    # Save the file
    with open(filepath, 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                file.write(chunk)

    return filepath

def get_podcast_info(podcast_rss_url: str):
    # Parse the RSS feed
    feed = feedparser.parse(podcast_rss_url)

    # Extract podcast and latest episode information
    podcast_title = feed.channel.title if 'title' in feed.channel else None
    latest_episode = feed.entries[0] if feed.entries else None
    episode_title = latest_episode.title if latest_episode else None

    return podcast_title, episode_title, latest_episode
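
`download_latest_episode` only accepts items whose type is exactly `audio/mpeg`, so feeds that publish other audio formats return `None`. Here is a minimal sketch of a more permissive lookup that accepts any `audio/*` MIME type. It relies on feedparser entries being dict-like, so plain dicts stand in for them here. `pick_audio_url` and the sample entry are hypothetical.

```python
from typing import Optional


def pick_audio_url(entry: dict) -> Optional[str]:
    """Return the first link or enclosure whose MIME type starts with 'audio/' (hypothetical helper)."""
    candidates = list(entry.get("links", [])) + list(entry.get("enclosures", []))
    for item in candidates:
        mime_type = item.get("type") or ""  # some items carry no type at all
        if mime_type.startswith("audio/") and item.get("href"):
            return item["href"]
    return None


# Made-up entry shaped like a parsed feed item, for illustration only
sample_entry = {
    "links": [{"type": "text/html", "href": "https://example.com/episode"}],
    "enclosures": [{"type": "audio/x-m4a", "href": "https://example.com/episode.m4a"}],
}
print(pick_audio_url(sample_entry))
```

Note that `transcribe_local_mp3` reads the duration with `mutagen.mp3.MP3` and the download is saved with an `.mp3` extension, so accepting other formats would also require changes there.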

In [6]:
# Glue all components together
def summarize_podcast(podcast_url):
    global llm  # declared so the module-level LLM can be released at the end

    # Get podcast info
    podcast_title, episode_title, latest_episode = get_podcast_info(podcast_url)
    if not latest_episode:
        print(f"No episodes found for the RSS feed at {podcast_url}")
        return podcast_title, episode_title, None

    # Download first, so a failed download does not cost a Whisper model load
    podcast_file = download_latest_episode(podcast_url)
    if podcast_file is None:
        print(f"Can't download the latest episode from the RSS feed at {podcast_url}")
        return podcast_title, episode_title, None

    # Transcribe the podcast
    load_whisper()
    try:
        transcript = transcribe_local_mp3(podcast_file)
    except Exception as e:
        print(f"Error transcribing audio file: {e}")
        transcript = None
    finally:
        unload_whisper()  # free GPU memory before loading the LLM
        os.remove(podcast_file)  # clean up the MP3 file

    if transcript is None:
        print("Error: Unable to generate transcript")
        return podcast_title, episode_title, None

    # Summarize the content
    load_llama()
    try:
        summary = summarize_long_text(transcript)
    except Exception as e:
        print(f"Error summarizing text: {e}")
        summary = None
    finally:
        llm = None  # unload the LLM

    # Always return a 3-tuple so the caller can unpack it, even on failure
    return podcast_title, episode_title, summary
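
The function above loads Whisper, unloads it, and only then loads the LLM, so the two models never occupy GPU memory at the same time. One way to guarantee that the unload step always runs is a small context manager. The sketch below is not part of the notebook: `loaded`, `fake_load` and `fake_unload` are hypothetical names, and the stand-in functions only record the call order.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def loaded(load: Callable[[], None], unload: Callable[[], None]) -> Iterator[None]:
    """Call load() on entry and always call unload() on exit, even if the body raises."""
    load()
    try:
        yield
    finally:
        unload()


# Stand-ins for load_whisper / unload_whisper that just record what happened
events = []


def fake_load() -> None:
    events.append("load")


def fake_unload() -> None:
    events.append("unload")


try:
    with loaded(fake_load, fake_unload):
        events.append("work")
        raise RuntimeError("simulated transcription failure")
except RuntimeError:
    events.append("handled")

print(events)  # → ['load', 'work', 'unload', 'handled']
```

In the notebook this could be used as `with loaded(load_whisper, unload_whisper): transcript = transcribe_local_mp3(podcast_file)`.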

In [7]:
# Call the function
# Replace with the RSS url of the podcast. You can look up podcasts here: https://castos.com/tools/find-podcast-rss-feed/
podcast_title, episode_title, summary = summarize_podcast("https://api.substack.com/feed/podcast/10845.rss")
print(podcast_title)
print(episode_title)
print(summary)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



Duration of file: 87:11
Time used for transcription: 168.79129147529602


llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from /content/model/openhermes-2.5-mistral-7b-16k.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = nurtureai_openhermes-2.5-mistral-7b-16k
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - 

Lenny's Podcast: Product | Growth | Career
A framework for finding product-market fit | Todd Jackson (First Round Capital)
- Product-market fit is crucial for startups during their initial years.
- Todd Jackson's framework helps B2B founders with sales-led companies find product-market fit through four levels: nascent, developing, strong, and extreme.
- Nascent level focuses on finding a specific problem worth solving with an inefficient solution; signs of being stuck include disappearing products and slow growth.
- Developing level emphasizes identifying shared critical needs, improving efficiency, and scaling sales. Signs of stagnation are high churn and difficulty scaling.
- Strong product-market fit involves optimizing for scale and maintaining satisfaction; indicators of getting stuck are low NRR and slow growth.
- Extreme product-market fit targets expanding TAM via new products or markets.
- The framework highlights the importance of continuous improvement in customer satisfacti
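
The system prompt in `summarize_text` asks for bullet points that start with `- `, so the summary string can be split back into a list for further processing, for example to store or render each point separately. This is a minimal sketch with a hypothetical helper, `parse_bullets`. Local models do not always follow formatting instructions, so lines that do not match the expected prefix are skipped. The sample text is made up.

```python
def parse_bullets(summary: str) -> list[str]:
    """Extract the text of lines that start with '- ' (hypothetical helper)."""
    bullets = []
    for line in summary.splitlines():
        line = line.strip()
        if line.startswith("- "):
            bullets.append(line[2:].strip())
    return bullets


# Made-up summary text, for illustration only
sample_summary = "Here is the summary:\n- First point.\n- Second point.\n\nThanks!"
print(parse_bullets(sample_summary))
```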