<a href="https://colab.research.google.com/github/BinayPrad/Youtube-Video-Summarzation/blob/main/colabs/youGPTube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YouGPTube 🦾

## TL;DR 👇

* Summarize any YouTube video using Whisper and ChatGPT

## How it works 🤔

![yougptube](https://user-images.githubusercontent.com/18450628/229377710-95fb8645-3d71-47d0-b3ba-0fd05941b083.png)

Here are the main steps:

1) Extract the audio using yt-dlp
2) Split the audio into smaller chunks
3) Transcribe each chunk using Whisper, OpenAI's speech-to-text model
4) Summarize each transcription using ChatGPT

## Imports and dependencies ⚙️

In [73]:
!pip install --upgrade openai yt-dlp

In [76]:
import os
import shutil

import librosa
import openai
import soundfile as sf
import yt_dlp
from yt_dlp.utils import DownloadError

# paste your OpenAI API key here (avoid committing it with the notebook)
openai.api_key = ""

## Utility functions 🔋

In [77]:
def find_audio_files(path, extension=".mp3"):
    """Recursively find all files with extension in path."""
    audio_files = []
    for root, dirs, files in os.walk(path):
        for f in files:
            if f.endswith(extension):
                audio_files.append(os.path.join(root, f))

    return audio_files
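
For comparison, the same recursive search can be written with `pathlib`. This is only a sketch: the name `find_audio_files_pathlib` is hypothetical and not part of the notebook, and it returns the paths sorted, which the `os.walk` version above does not guarantee.

```python
import tempfile
from pathlib import Path


def find_audio_files_pathlib(path, extension=".mp3"):
    """Recursively collect files ending in `extension`, sorted by path."""
    return sorted(str(p) for p in Path(path).rglob(f"*{extension}") if p.is_file())


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "a.mp3").touch()
    (Path(tmp) / "nested" / "b.mp3").touch()
    (Path(tmp) / "notes.txt").touch()  # ignored: wrong extension
    found_names = [Path(f).name for f in find_audio_files_pathlib(tmp)]

print(found_names)
```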

## Download youtube audio 🔈

In [78]:
!pip install yt-dlp

import yt_dlp



In [80]:
def youtube_to_mp3(youtube_url: str, output_dir: str) -> str:
    """Download the audio from a YouTube video and save it to output_dir as an .mp3 file.

    Returns the filename of the saved audio file.
    """

    # config
    ydl_config = {
        "format": "bestaudio/best",
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",
                "preferredcodec": "mp3",
                "preferredquality": "192",
            }
        ],
        "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
        "verbose": True,
    }

    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    print(f"Downloading video from {youtube_url}")

    try:
        with yt_dlp.YoutubeDL(ydl_config) as ydl:
            ydl.download([youtube_url])
    except yt_dlp.utils.DownloadError:
        # The download occasionally fails on the first attempt but succeeds on a
        # retry, so try once more before giving up.
        with yt_dlp.YoutubeDL(ydl_config) as ydl:
            ydl.download([youtube_url])

    audio_filename = find_audio_files(output_dir)[0]
    return audio_filename
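
The retry-on-first-failure pattern used above can be factored into a small helper. The sketch below is an illustration, not part of the notebook: `retry` and `flaky_download` are hypothetical names, and the flaky function merely simulates a download that fails once.

```python
import time


def retry(func, attempts=2, delay_seconds=0.0, exceptions=(Exception,)):
    """Call func() up to `attempts` times, re-raising the last error if all fail."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


# Stand-in for the download call: fails on the first attempt only.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated first-attempt failure")
    return "ok"


result = retry(flaky_download, attempts=2, exceptions=(RuntimeError,))
print(result, calls["count"])
```

With a helper like this, the number of attempts and the exception types to retry on become explicit parameters instead of a duplicated `with` block.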

## Chunk the audio 🍪

Chunking is necessary for very long audio files, since both Whisper and ChatGPT limit how much audio or text can be processed in a single request.
Shorter videos do not need it.

In [81]:
def chunk_audio(filename, segment_length: int, output_dir):
    """Split an audio file into segments of segment_length seconds; return the sorted segment paths."""

    print(f"Chunking audio to {segment_length} second segments...")

    if not os.path.isdir(output_dir):
        os.mkdir(output_dir)

    # load audio file
    audio, sr = librosa.load(filename, sr=44100)

    # calculate duration in seconds
    duration = librosa.get_duration(y=audio, sr=sr)

    # calculate number of segments
    num_segments = int(duration / segment_length) + 1

    print(f"Chunking {num_segments} chunks...")

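    # iterate through segments and save them; zero-padded indices keep the
    # lexicographic sort below in chronological order beyond ten segments
    for i in range(num_segments):
        start = i * segment_length * sr
        end = (i + 1) * segment_length * sr
        segment = audio[start:end]
        if len(segment) == 0:
            # duration was an exact multiple of segment_length; nothing left to write
            break
        sf.write(os.path.join(output_dir, f"segment_{i:03d}.mp3"), segment, sr)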

    chunked_audio_files = find_audio_files(output_dir)
    return sorted(chunked_audio_files)
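
The boundary arithmetic behind the chunking step can be checked without loading any audio. The following is a minimal sketch; `segment_bounds` is a hypothetical helper, and the tiny sample rate is chosen only to keep the numbers readable.

```python
import math


def segment_bounds(num_samples, sample_rate, segment_length):
    """Return (start, end) sample indices for consecutive fixed-length segments."""
    samples_per_segment = segment_length * sample_rate
    num_segments = max(1, math.ceil(num_samples / samples_per_segment))
    return [
        (i * samples_per_segment, min((i + 1) * samples_per_segment, num_samples))
        for i in range(num_segments)
    ]


# 25 seconds of audio at 10 samples/second, cut into 10-second segments.
bounds = segment_bounds(num_samples=250, sample_rate=10, segment_length=10)
print(bounds)

# 20 seconds is an exact multiple of the segment length.
exact_bounds = segment_bounds(num_samples=200, sample_rate=10, segment_length=10)
print(exact_bounds)
```

Rounding up with `math.ceil` avoids an empty trailing segment when the duration is an exact multiple of the segment length.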

## Speech2text 🗣

Here we use OpenAI's Whisper model to transcribe audio files to text.

In [90]:
import openai


def transcribe_audio(audio_files, output_file, model="whisper-1"):
    """Transcribe each audio file with the OpenAI API and return the transcripts in order."""
    transcripts = []
    for audio_file in audio_files:
        with open(audio_file, "rb") as audio:
            # openai>=1.0 exposes transcription as audio.transcriptions.create
            transcript = openai.audio.transcriptions.create(model=model, file=audio)
            transcripts.append(transcript.text)

    if output_file is not None:
        with open(output_file, "w") as file:
            for transcript in transcripts:
                file.write(transcript + "\n")

    return transcripts
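
Because the transcription endpoint limits upload size, it can help to check chunk sizes before sending them. The sketch below assumes a 25 MB limit, which should be verified against the current API documentation; `MAX_UPLOAD_BYTES` and `oversized_files` are hypothetical names, and the demo uses a tiny threshold so it runs on dummy files.

```python
import os
import tempfile

# Assumed upload limit; verify against the current API documentation.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def oversized_files(paths, max_bytes=MAX_UPLOAD_BYTES):
    """Return the paths whose size on disk exceeds max_bytes."""
    return [p for p in paths if os.path.getsize(p) > max_bytes]


# Demonstration with dummy files and a 50-byte threshold.
with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "small.mp3")
    large = os.path.join(tmp, "large.mp3")
    with open(small, "wb") as f:
        f.write(b"\0" * 10)
    with open(large, "wb") as f:
        f.write(b"\0" * 100)
    too_big = [os.path.basename(p) for p in oversized_files([small, large], max_bytes=50)]

print(too_big)
```

If any chunk is reported, a shorter `segment_length` or a lower bitrate would be the natural adjustment.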

## Summarize 📝

Here we ask ChatGPT to take the raw transcripts and summarize them into short bullet points.

In [97]:
def summarize(
    chunks: list[str], system_prompt: str, model="gpt-3.5-turbo", output_file=None
):

    print(f"Summarizing with {model=}")

    summaries = []
    for chunk in chunks:
        response = openai.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": chunk},
            ],
        )
        # the first choice holds the model's reply
        summary = response.choices[0].message.content
        summaries.append(summary)

    if output_file is not None:
        # save all summaries to a .txt file
        with open(output_file, "w") as file:
            for summary in summaries:
                file.write(summary + "\n")

    return summaries

## Putting it all together 🍱

In [98]:
def summarize_youtube_video(youtube_url, outputs_dir):
    raw_audio_dir = f"{outputs_dir}/raw_audio/"
    chunks_dir = f"{outputs_dir}/chunks"
    transcripts_file = f"{outputs_dir}/transcripts.txt"
    summary_file = f"{outputs_dir}/summary.txt"
    segment_length = 10 * 60  # chunk to 10 minute segments

    if os.path.exists(outputs_dir):
        # delete the outputs_dir folder and start from scratch
        shutil.rmtree(outputs_dir)
        os.mkdir(outputs_dir)

    # download the audio using yt-dlp
    audio_filename = youtube_to_mp3(youtube_url, output_dir=raw_audio_dir)

    # chunk each audio file to shorter audio files (not necessary for shorter videos...)
    chunked_audio_files = chunk_audio(
        audio_filename, segment_length=segment_length, output_dir=chunks_dir
    )

    # transcribe each chunked audio file using whisper speech2text
    transcriptions = transcribe_audio(chunked_audio_files, transcripts_file)

    # summarize each transcription using chatGPT
    system_prompt = """
    You are a helpful assistant that summarizes youtube videos.
    You are provided chunks of raw text that were transcribed from the video's audio.
    Summarize the current chunk into succinct and clear bullet points of its contents.
    """
    summaries = summarize(
        transcriptions, system_prompt=system_prompt, output_file=summary_file
    )

    system_prompt_tldr = """
    You are a helpful assistant that summarizes youtube videos.
    Someone has already summarized the video into key points.
    Summarize the key points in one or two sentences that capture the essence of the video.
    """
    # join the chunk summaries into a single entry
    long_summary = "\n".join(summaries)
    # output_file=None so the TL;DR does not overwrite the chunk summaries in summary_file
    short_summary = summarize(
        [long_summary], system_prompt=system_prompt_tldr, output_file=None
    )[0]

    return long_summary, short_summary
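
The two-stage structure above (summarize each chunk, then summarize the joined summaries) can be exercised offline by swapping the model call for a plain function. This is a sketch under that assumption: `two_stage_summary` and `first_sentence` are hypothetical names, and `first_sentence` is only a toy stand-in for ChatGPT.

```python
def two_stage_summary(chunks, summarize_fn):
    """Summarize each chunk, then summarize the joined chunk summaries."""
    chunk_summaries = [summarize_fn(chunk) for chunk in chunks]
    long_summary = "\n".join(chunk_summaries)
    short_summary = summarize_fn(long_summary)
    return long_summary, short_summary


def first_sentence(text):
    """Toy stand-in for a model call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


long_demo, short_demo = two_stage_summary(
    ["Cats purr. They also sleep a lot.", "Dogs bark. They like walks."],
    summarize_fn=first_sentence,
)
print(long_demo)
print(short_demo)
```

Passing the summarizer in as a parameter keeps the control flow testable without an API key or network access.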

In [99]:
youtube_url = "https://www.youtube.com/watch?v=g1pb2aK2we4"
outputs_dir = "outputs/"

long_summary, short_summary = summarize_youtube_video(youtube_url, outputs_dir)

print("Summaries:")
print("=" * 80)
print("Long summary:")
print("=" * 80)
print(long_summary)
print()

print("=" * 80)
print("Video - TL;DR")
print("=" * 80)
print(short_summary)

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2024.08.06 from yt-dlp/yt-dlp [4d9231208] (pip) API
[debug] params: {'format': 'bestaudio/best', 'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}], 'outtmpl': 'outputs//raw_audio/%(title)s.%(ext)s', 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-6.1.85+-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.07.04, 

Downloading video from https://www.youtube.com/watch?v=g1pb2aK2we4
[youtube] Extracting URL: https://www.youtube.com/watch?v=g1pb2aK2we4
[youtube] g1pb2aK2we4: Downloading webpage
[youtube] g1pb2aK2we4: Downloading ios player API JSON
[youtube] g1pb2aK2we4: Downloading web creator player API JSON


[debug] Loading youtube-nsig.6db2bd17 from cache
[debug] [youtube] Decrypted nsig pR86PKsGT2c666_MhF- => DancpcuqlzqbmQ


[youtube] g1pb2aK2we4: Downloading m3u8 information


[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id


[info] g1pb2aK2we4: Downloading 1 format(s): 251


[debug] Invoking http downloader on "https://rr2---sn-q4fl6ns6.googlevideo.com/videoplayback?expire=1724273872&ei=cADGZvqiFI29sfIPmPjr6QY&ip=34.125.244.69&id=o-AF7oR1kVdOXrWWONc-nrJazCc54Ra_dROrnzffpNX8yd&itag=251&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=JU&mm=31%2C26&mn=sn-q4fl6ns6%2Csn-a5m7lnl6&ms=au%2Conr&mv=m&mvi=2&pl=20&vprv=1&svpuc=1&mime=audio%2Fwebm&rqh=1&gir=yes&clen=5840382&dur=302.621&lmt=1685852585630421&mt=1724251537&fvip=4&keepalive=yes&c=IOS&txp=5532434&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cvprv%2Csvpuc%2Cmime%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&sig=AJfQdSswRQIgFQmxmvr3DoNSDuBNgeX5lMQ6vcAU1vVpzjM21lLfcEsCIQDG2e9NAhdRjwaUfDH7S8MRAaa3OCzTz29S_zz7U3ERbQ%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl&lsig=AGtxev0wRQIgGlFbjWMKBVHuUWerr9fW9MMF6jvLfvoo7FMT3boI2O0CIQC-svORgAs0nhqSWwyP3cl1BBpArTZpwpZ88WtmVv0bpw%3D%3D"


[download] Destination: outputs//raw_audio/How stretching actually changes your muscles - Malachy McHugh.webm
[download] 100% of    5.57MiB in 00:00:00 at 14.10MiB/s  


[debug] ffmpeg command line: ffprobe -show_streams 'file:outputs//raw_audio/How stretching actually changes your muscles - Malachy McHugh.webm'


[ExtractAudio] Destination: outputs//raw_audio/How stretching actually changes your muscles - Malachy McHugh.mp3


[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:outputs//raw_audio/How stretching actually changes your muscles - Malachy McHugh.webm' -vn -acodec libmp3lame -b:a 192.0k -movflags +faststart 'file:outputs//raw_audio/How stretching actually changes your muscles - Malachy McHugh.mp3'


Deleting original file outputs//raw_audio/How stretching actually changes your muscles - Malachy McHugh.webm (pass -k to keep)
Chunking audio to 600 second segments...
Chunking 1 chunks...
Summarizing with model='gpt-3.5-turbo'
Summarizing with model='gpt-3.5-turbo'
Summaries:
Long summary:
- Athletes typically stretch before physical activity to prevent injuries like muscle strains and tears.
- There are two types of stretches: dynamic stretches involve controlled movements engaging multiple muscles, while static stretches involve holding a position to maintain a muscle's fixed length and tension.
- Muscles are viscoelastic, not like a rubber band, which means they change under stress from stretching.
- Stretching elongates layers of protective tissue and tendons around muscle fibers, containing elastic proteins like collagen and elastin.
- Muscle fibers are made up of sarcomeres, the smallest contracting units, which can relax to elongate or contract to shorten muscle fibers.
- Impro

In [100]:
youtube_url = "https://www.youtube.com/watch?v=KKNCiRWd_j0"
outputs_dir = "outputs/"

long_summary, short_summary = summarize_youtube_video(youtube_url, outputs_dir)

print("Summaries:")
print("=" * 80)
print("Long summary:")
print("=" * 80)
print(long_summary)
print()

print("=" * 80)
print("Video - TL;DR")
print("=" * 80)
print(short_summary)

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2024.08.06 from yt-dlp/yt-dlp [4d9231208] (pip) API
[debug] params: {'format': 'bestaudio/best', 'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}], 'outtmpl': 'outputs//raw_audio/%(title)s.%(ext)s', 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-6.1.85+-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.07.04, 

Downloading video from https://www.youtube.com/watch?v=KKNCiRWd_j0
[youtube] Extracting URL: https://www.youtube.com/watch?v=KKNCiRWd_j0
[youtube] KKNCiRWd_j0: Downloading webpage
[youtube] KKNCiRWd_j0: Downloading ios player API JSON
[youtube] KKNCiRWd_j0: Downloading web creator player API JSON


[debug] Loading youtube-nsig.6db2bd17 from cache
[debug] [youtube] Decrypted nsig zd7arvoWJq8AxVASwmA => rbdi9Pg01_2yIg


[youtube] KKNCiRWd_j0: Downloading m3u8 information


[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id


[info] KKNCiRWd_j0: Downloading 1 format(s): 251


[debug] Invoking http downloader on "https://rr2---sn-q4fl6n6y.googlevideo.com/videoplayback?expire=1724273990&ei=5gDGZqGVNPaVsfIPxvaS2Q8&ip=34.125.244.69&id=o-APDofa9H07VOhpYHo7-V3PKmSGh6WSvfoxCDU7t8VmMo&itag=251&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=DS&mm=31%2C26&mn=sn-q4fl6n6y%2Csn-a5mlrnlz&ms=au%2Conr&mv=m&mvi=2&pl=20&vprv=1&svpuc=1&mime=audio%2Fwebm&rqh=1&gir=yes&clen=18002272&dur=1321.561&lmt=1716840099826581&mt=1724251537&fvip=1&keepalive=yes&c=IOS&txp=4532434&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cvprv%2Csvpuc%2Cmime%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&sig=AJfQdSswRQIgTpNRQ2x1Dmmria5QLD6S8F1bk9D16ONFMoaJ4jeXHL8CIQDr71N1JsvwHd7rLBC8Lo-NTA_enVvQ1ljaH4WwF2zA-g%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl&lsig=AGtxev0wRQIgUvGVprxeMGqA7k--urQgbJ600Zuzi4kfiQLQ7xUy2JQCIQCSVql4cLgPVLGrUk-UA7-JjlCHfARUiOchYeWU5WUnbQ%3D%3D"


[download] Destination: outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.webm
[download] 100% of   17.17MiB in 00:00:00 at 25.93MiB/s  


[debug] ffmpeg command line: ffprobe -show_streams 'file:outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.webm'


[ExtractAudio] Destination: outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.mp3


[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.webm' -vn -acodec libmp3lame -b:a 192.0k -movflags +faststart 'file:outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.mp3'


Deleting original file outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.webm (pass -k to keep)
Chunking audio to 600 second segments...
Chunking 3 chunks...
Summarizing with model='gpt-3.5-turbo'
Summarizing with model='gpt-3.5-turbo'
Summaries:
Long summary:
- The speaker has been working on AI for almost 15 years and reflects on the evolution of AI from being fringe to mainstream.
- Initially, people were skeptical about AI, thinking it was something out of science fiction.
- AI has surpassed human capabilities in tasks like image recognition, language translation, and playing games like Go and chess.
- People are now realizing the potential impact of AI on society and asking questions about its implications.
- The speaker's six-year-old nephew asked a simple but profound question: "What is an AI anyway?"
- The speaker believes that AI should be understood as a new digital species, which will become companions in people's lives.
- The history of technological innovat