In [5]:
!pip install yt-dlp
!pip install git+https://github.com/openai/whisper.git
!pip install transformers
!pip install torch


Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-tbqrts28
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-tbqrts28
  Resolved https://github.com/openai/whisper.git to commit 517a43ecd132a2089d85f4ebc044728a71d49f6e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [3]:
!apt update && apt install ffmpeg -y


[33m0% [Working][0m            Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Hit:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Hit:10 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Get:11 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1,542 kB]
Get:12 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [3,140 kB]
Get:13 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64

In [1]:
import subprocess
import whisper
import tempfile
import os
from io import BytesIO
from transformers import pipeline

def stream_youtube_audio(video_url):
    print("🔊 Streaming audio from YouTube...")
    command = [
        "yt-dlp",
        "-f", "bestaudio",
        "--extract-audio",
        "--audio-format", "mp3",
        "-o", "-",
        video_url
    ]

    # Run yt-dlp and capture the audio from stdout.
    # Note: nothing is streamed incrementally; the whole track is buffered in memory.
    process = subprocess.run(command, capture_output=True)

    if process.returncode != 0:
        raise RuntimeError(f"❌ Error downloading audio: {process.stderr.decode(errors='replace')}")

    return BytesIO(process.stdout)  # Wrap the raw bytes in a file-like object

def transcribe_audio(audio_stream):
    print("🗣️ Transcribing with Whisper...")
    model = whisper.load_model("base")

    # Whisper loads audio from a file path, so write the bytes to a temporary file
    with tempfile.NamedTemporaryFile(delete=False, suffix='.mp3') as tmp_audio_file:
        tmp_audio_file.write(audio_stream.read())
        tmp_audio_file_path = tmp_audio_file.name

    try:
        # Transcribe audio from the temporary file
        result = model.transcribe(tmp_audio_file_path)
    finally:
        # Clean up the temporary file even if transcription fails
        os.remove(tmp_audio_file_path)

    return result['text']

def summarize_text(text):
    print("🧠 Summarizing transcript...")
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    # The model accepts a limited input length, so summarize fixed-size character chunks.
    # Note: slicing by character count can cut a chunk mid-word or mid-sentence.
    chunks = [text[i:i+1000] for i in range(0, len(text), 1000)]
    summaries = []
    for chunk in chunks:
        out = summarizer(chunk, max_length=130, min_length=30, do_sample=False)
        summaries.append(out[0]['summary_text'])
    return " ".join(summaries).strip()

def summarize_youtube_video(video_url):
    audio_stream = stream_youtube_audio(video_url)
    try:
        transcript = transcribe_audio(audio_stream)
        summary = summarize_text(transcript)
        return summary
    finally:
        audio_stream.close()
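
The fixed 1000-character slices in summarize_text can cut a sentence in half, which may hurt the per-chunk summaries. As a rough sketch of one alternative, the helper below groups whole sentences into chunks of at most `max_chars` characters. The name `chunk_by_sentence` and the regex-based sentence split are assumptions for illustration, not part of the notebook; the split is naive and will break on abbreviations such as "Dr. Smith".

```python
import re

def chunk_by_sentence(text, max_chars=1000):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than split.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Still fits (or this is the first sentence of a new chunk).
            current = candidate
        else:
            # Adding this sentence would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

To try it, the list comprehension in summarize_text could be swapped for `chunks = chunk_by_sentence(text)`; the rest of the loop would stay the same.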




Testing

In [2]:
# Example usage:
video_url = "https://youtu.be/wo_e0EvEZn8?si=baQuTnFySjjK4KPb"
summary = summarize_youtube_video(video_url)
print(f"Summary: {summary}")
# The script takes about 7 minutes to run

🔊 Streaming audio from YouTube...
🗣️ Transcribing with Whisper...




🧠 Summarizing transcript...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Device set to use cpu
Your max_length is set to 130, but your input_length is only 99. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=49)


Summary: The world you see is not real. You're not living in this very moment that you're experiencing. It turns out your brain constructs your reality. It edits your memories as they happen. It lives in totally different timesfares and tells you a story about the world that feels real. Each day for around two hours you're completely blind. Your brain fills this time with its best guesses of what happened during the blackness. If you could actually see what your eyes see, it would look something like this. What you feel is now is in fact a selectively edited version of the past. You really only consciously experience the world 0.3 to 0.5 seconds after things happened. If your brain showed you the past whether ball was 100 milliseconds ago it would hit you before you could react. So instead your brain takes its location, speed and direction and calculates where the ball should be in the future. By the time the information reaches you and then it creates a fictional version of it. Your c
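
The `max_length` warning in the output above appears because the last chunk is shorter than the fixed `max_length=130`. One possible way to avoid it, sketched below, is to scale the length bounds to each chunk's token count. The helper name `summary_length_bounds` and its default ratio, cap, and floor are hypothetical choices for illustration, not values taken from the notebook or from transformers.

```python
def summary_length_bounds(input_tokens, ratio=0.5, max_cap=130, min_floor=30):
    """Return (min_length, max_length) scaled to the input's token count.

    ratio, max_cap and min_floor are illustrative defaults, not library values.
    """
    # Never ask for a summary longer than a fraction of the input, or than the cap.
    max_length = max(1, min(max_cap, int(input_tokens * ratio)))
    # Keep min_length at or below max_length so short inputs remain valid.
    min_length = min(min_floor, max(1, max_length // 2))
    return min_length, max_length
```

Inside the loop, the token count could be obtained with something like `len(summarizer.tokenizer(chunk)["input_ids"])` and the resulting pair passed as `min_length` and `max_length`. For the 99-token chunk in the warning this yields `max_length=49`, the value the warning itself suggests.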

Making the output legible

In [3]:
import textwrap

print("📋 Summary:\n")
wrapped_summary = textwrap.fill(summary, width=100)  # Adjust width as needed
print(wrapped_summary)

📋 Summary:

The world you see is not real. You're not living in this very moment that you're experiencing. It
turns out your brain constructs your reality. It edits your memories as they happen. It lives in
totally different timesfares and tells you a story about the world that feels real. Each day for
around two hours you're completely blind. Your brain fills this time with its best guesses of what
happened during the blackness. If you could actually see what your eyes see, it would look something
like this. What you feel is now is in fact a selectively edited version of the past. You really only
consciously experience the world 0.3 to 0.5 seconds after things happened. If your brain showed you
the past whether ball was 100 milliseconds ago it would hit you before you could react. So instead
your brain takes its location, speed and direction and calculates where the ball should be in the
future. By the time the information reaches you and then it creates a fictional version of it. You
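
`textwrap.fill` collapses the whole summary into a single paragraph, so the boundaries between per-chunk summaries are lost. A minimal sketch of a variant that keeps one paragraph per chunk is shown below. It assumes the per-chunk summaries are available as a list of strings, which the notebook's summarize_text does not currently return; `wrap_paragraphs` is a hypothetical helper name.

```python
import textwrap

def wrap_paragraphs(paragraphs, width=100):
    """Wrap each paragraph separately and join them with blank lines."""
    # Skip empty or whitespace-only entries so no stray blank paragraphs appear.
    wrapped = [textwrap.fill(p.strip(), width=width) for p in paragraphs if p.strip()]
    return "\n\n".join(wrapped)
```

Usage would look like `print(wrap_paragraphs(chunk_summaries))`, where `chunk_summaries` stands for that assumed list.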