<a href="https://colab.research.google.com/github/anupmalh/demo/blob/master/colabs/youGPTube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YouGPTube 🦾

## TL;DR 👇

* Summarize any YouTube video using Whisper and ChatGPT

## How it works 🤔

![yougptube](https://user-images.githubusercontent.com/18450628/229377710-95fb8645-3d71-47d0-b3ba-0fd05941b083.png)

Here are the main steps:

1) Extract the audio using yt-dlp (a maintained fork of youtube-dl)
2) Split the audio into smaller chunks
3) Transcribe each chunk using Whisper, OpenAI's speech-to-text model
4) Summarize each transcript using ChatGPT, then condense the chunk summaries into a one- or two-sentence TL;DR
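The four steps form a linear pipeline: each stage consumes the previous stage's output. The sketch below is not the notebook's code. It passes each stage in as a plain function (the `run_pipeline` name and the stub stages are hypothetical) so the data flow can be checked offline, without yt-dlp, Whisper or ChatGPT.

```python
from typing import Callable, List


# Each stage is injected as a function, so the flow can be tested with stubs.
def run_pipeline(
    url: str,
    download: Callable[[str], str],        # url -> audio file path
    chunk: Callable[[str], List[str]],     # audio path -> chunk paths
    transcribe: Callable[[str], str],      # chunk path -> transcript
    summarize: Callable[[str], str],       # transcript -> bullet-point summary
) -> List[str]:
    audio_path = download(url)
    chunk_paths = chunk(audio_path)
    transcripts = [transcribe(p) for p in chunk_paths]
    return [summarize(t) for t in transcripts]


# Usage with stub stages (no network access needed).
summaries = run_pipeline(
    "https://www.youtube.com/watch?v=example",
    download=lambda url: "audio.mp3",
    chunk=lambda path: [f"{path}#0", f"{path}#1"],
    transcribe=lambda p: f"transcript of {p}",
    summarize=lambda t: f"- summary of {t}",
)
print(summaries)
# ['- summary of transcript of audio.mp3#0', '- summary of transcript of audio.mp3#1']
```

In the notebook the real stages are `youtube_to_mp3`, `chunk_audio`, `transcribe_audio` and `summarize`, wired together in `summarize_youtube_video`.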

## Imports and dependencies ⚙️

In [1]:
!pip install openai

Collecting openai
  Downloading openai-1.42.0-py3-none-any.whl.metadata (22 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading openai-1.42.0-py3-none-any.whl (362 kB)
Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)

In [2]:
!pip install --upgrade openai



In [3]:
!pip install youtube-dl

Collecting youtube-dl
  Downloading youtube_dl-2021.12.17-py2.py3-none-any.whl.metadata (1.5 kB)
Downloading youtube_dl-2021.12.17-py2.py3-none-any.whl (1.9 MB)
Installing collected packages: youtube-dl
Successfully installed youtube-dl-2021.12.17


In [4]:
import os
import shutil

import librosa
import openai
import soundfile as sf
import youtube_dl
from youtube_dl.utils import DownloadError

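# Read the key from the OPENAI_API_KEY environment variable instead of hard-coding it
openai.api_key = os.environ.get("OPENAI_API_KEY")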

## Utility functions 🔋

In [5]:
def find_audio_files(path, extension=".mp3"):
    """Recursively find all files with extension in path."""
    audio_files = []
    for root, dirs, files in os.walk(path):
        for f in files:
            if f.endswith(extension):
                audio_files.append(os.path.join(root, f))

    return audio_files
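`os.walk` does not guarantee any particular file order, so callers that care about order (such as the chunking step) have to sort the result themselves. A minimal alternative sketch using `pathlib.Path.rglob` returns the matches already sorted; `find_audio_files_sorted` is a hypothetical name, not part of the original notebook.

```python
import tempfile
from pathlib import Path
from typing import List


def find_audio_files_sorted(path: str, extension: str = ".mp3") -> List[str]:
    """Recursively find files ending in `extension`, returned in sorted order."""
    return sorted(str(p) for p in Path(path).rglob(f"*{extension}") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Build a small tree: two .mp3 files (one nested) and one .webm file.
    (Path(tmp) / "nested").mkdir()
    for name in ["b.mp3", "nested/a.mp3", "c.webm"]:
        (Path(tmp) / name).touch()
    found = [Path(f).relative_to(tmp).as_posix() for f in find_audio_files_sorted(tmp)]
    print(found)  # ['b.mp3', 'nested/a.mp3']
```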

## Download youtube audio 🔈

In [6]:
!pip install yt-dlp

import yt_dlp

Collecting yt-dlp
  Downloading yt_dlp-2024.8.6-py3-none-any.whl.metadata (170 kB)
Collecting brotli (from yt-dlp)
  Downloading Brotli-1.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.5 kB)
Collecting mutagen (from yt-dlp)
  Downloading mutagen-1.47.0-py3-none-any.whl.metadata (1.7 kB)
Collecting pycryptodomex (from yt-dlp)
  Downloading pycryptodomex-3.20.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting websockets>=12.0 (from yt-dlp)
  Downloading websockets-13.0-cp310-cp310-manylinux_2_5_x86_64.manylinu

In [7]:
def youtube_to_mp3(youtube_url: str, output_dir: str) -> str:
    """Download the audio from a YouTube video and save it to output_dir as an .mp3 file.

    Returns the path of the saved audio file.
    """

    # yt-dlp config: fetch the best available audio stream, then have ffmpeg
    # convert it to a 192 kbps MP3 named after the video title.
    ydl_config = {
        "format": "bestaudio/best",
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",
                "preferredcodec": "mp3",
                "preferredquality": "192",
            }
        ],
        "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
        "verbose": True,
    }

    os.makedirs(output_dir, exist_ok=True)

    print(f"Downloading video from {youtube_url}")

    try:
        with yt_dlp.YoutubeDL(ydl_config) as ydl:
            ydl.download([youtube_url])
    except yt_dlp.utils.DownloadError:
        # The first download sometimes fails and a second attempt succeeds, so retry once.
        # yt-dlp raises its own DownloadError class (not youtube_dl's), so catch that one.
        with yt_dlp.YoutubeDL(ydl_config) as ydl:
            ydl.download([youtube_url])

    audio_filename = find_audio_files(output_dir)[0]
    return audio_filename
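The download retry is written as a copy of the download block, which allows exactly one extra attempt. A minimal sketch of a generic bounded-retry helper is shown below; `with_retries` is a hypothetical helper, not a yt-dlp API, and the flaky function only simulates the "fails once, then works" behavior described in the comment.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 2,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    delay_s: float = 1.0,
) -> T:
    """Call fn(); if it raises one of `exceptions`, wait delay_s and try again.

    Makes at most `attempts` calls and re-raises the last error if all of them fail.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_s)
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_download() -> str:
    # Simulated download: fails on the first call, succeeds on the second.
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("transient failure")
    return "downloaded"


result = with_retries(flaky_download, attempts=3, exceptions=(IOError,), delay_s=0.0)
print(result, calls["n"])  # downloaded 2
```

Inside `youtube_to_mp3`, the download could be wrapped in a small function that opens `yt_dlp.YoutubeDL(ydl_config)` and calls `download`, then passed to the helper with `exceptions=(yt_dlp.utils.DownloadError,)`. The attempt count and delay become explicit parameters instead of a duplicated block.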

## Chunk the audio 🍪

Chunking is needed for long audio files because both Whisper and ChatGPT limit how much input a single request can take: the Whisper API accepts uploads of at most 25 MB per file, and ChatGPT has a fixed context window measured in tokens.
For shorter videos, a single chunk is enough.
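The size side of that limit is simple arithmetic: a constant-bitrate MP3 takes about `bitrate × duration / 8` bytes. The sketch below assumes the 25 MB limit means 25,000,000 bytes (the conservative reading) and uses the 192 kbps set in `youtube_to_mp3`. The chunks are re-encoded by `soundfile`, whose MP3 bitrate may differ, so the numbers are estimates only. Both function names are hypothetical.

```python
# Assumption: 25 MB upload limit read conservatively as 25,000,000 bytes.
UPLOAD_LIMIT_BYTES = 25_000_000


def mp3_size_bytes(duration_s: float, bitrate_kbps: float) -> float:
    """Approximate size of a constant-bitrate MP3: bits per second * seconds / 8."""
    return duration_s * bitrate_kbps * 1000 / 8


def max_duration_s(bitrate_kbps: float, limit_bytes: int = UPLOAD_LIMIT_BYTES) -> float:
    """Longest constant-bitrate clip that fits under limit_bytes."""
    return limit_bytes * 8 / (bitrate_kbps * 1000)


print(mp3_size_bytes(600, 192))  # 14400000.0 bytes for 10 minutes at 192 kbps
print(max_duration_s(192))       # about 1041.7 seconds (roughly 17 minutes)
```

Under these assumptions, the 10-minute `segment_length` used in `summarize_youtube_video` leaves comfortable headroom below the upload limit.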

In [8]:
import math


def chunk_audio(filename, segment_length: int, output_dir):
    """Split an audio file into consecutive segments of segment_length seconds.

    Returns the paths of the saved segments, in playback order.
    """

    print(f"Chunking audio to {segment_length} second segments...")

    os.makedirs(output_dir, exist_ok=True)

    # load audio file (librosa resamples to 44.1 kHz and downmixes to mono)
    audio, sr = librosa.load(filename, sr=44100)

    # calculate duration in seconds
    duration = librosa.get_duration(y=audio, sr=sr)

    # round up so the tail is kept without adding an empty final segment
    num_segments = math.ceil(duration / segment_length)

    print(f"Chunking {num_segments} chunks...")

    # iterate through segments and save them; zero-padded names keep the
    # lexicographic sort below in playback order (segment_002 < segment_010)
    for i in range(num_segments):
        start = i * segment_length * sr
        end = (i + 1) * segment_length * sr
        segment = audio[start:end]
        sf.write(os.path.join(output_dir, f"segment_{i:03d}.mp3"), segment, sr)

    chunked_audio_files = find_audio_files(output_dir)
    return sorted(chunked_audio_files)
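Two details of the chunking step are easy to get wrong: `int(duration / segment_length) + 1` creates an empty trailing segment when the duration is an exact multiple of the segment length, and names like `segment_10.mp3` sort before `segment_2.mp3`. The sketch below isolates both. `segment_bounds` is a hypothetical helper, and the durations are the `dur` values that appear in the two download logs.

```python
import math
from typing import List, Tuple


def segment_bounds(duration_s: float, segment_length_s: int, sr: int) -> List[Tuple[int, int]]:
    """(start, end) sample indices of each fixed-length segment of a clip."""
    total_samples = int(round(duration_s * sr))
    # Round up: a partial tail gets its own segment, but an exact multiple
    # of segment_length_s does not produce an extra empty one.
    num_segments = math.ceil(duration_s / segment_length_s)
    step = segment_length_s * sr
    return [(i * step, min((i + 1) * step, total_samples)) for i in range(num_segments)]


print(len(segment_bounds(743.341, 600, 44100)))   # 2
print(len(segment_bounds(1321.561, 600, 44100)))  # 3
# Exact multiple: int(1200 / 600) + 1 would give 3, the last one empty.
print(len(segment_bounds(1200, 600, 44100)))      # 2

# Plain indices sort out of playback order once there are 10+ chunks;
# zero-padding keeps name order equal to playback order.
plain = sorted(f"segment_{i}.mp3" for i in (0, 1, 2, 10))
padded = sorted(f"segment_{i:03d}.mp3" for i in (0, 1, 2, 10))
print(plain)   # ['segment_0.mp3', 'segment_1.mp3', 'segment_10.mp3', 'segment_2.mp3']
print(padded)  # ['segment_000.mp3', 'segment_001.mp3', 'segment_002.mp3', 'segment_010.mp3']
```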

## Speech2text 🗣

Here we use OpenAI's Whisper model to transcribe audio files to text.

In [9]:
import openai


def transcribe_audio(audio_files, output_file, model="whisper-1"):
    """Transcribe each audio file with Whisper; optionally write all transcripts to output_file."""
    transcripts = []
    for audio_file in audio_files:
        with open(audio_file, "rb") as audio:
            # openai>=1.0 API: audio.transcriptions.create
            transcript = openai.audio.transcriptions.create(model=model, file=audio)
            transcripts.append(transcript.text)

    if output_file is not None:
        # one transcript per line, in chunk order
        with open(output_file, "w") as file:
            for transcript in transcripts:
                file.write(transcript + "\n")

    return transcripts
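Because the Whisper API rejects files larger than 25 MB, a pre-flight check on the chunk files can catch a `segment_length` that is too long before any API call is made. This is a minimal sketch: `oversized_files` is a hypothetical helper, and the limit constant is an assumption (25 MB read as 25,000,000 bytes).

```python
import os
import tempfile
from typing import List

# Assumption: conservative reading of the 25 MB Whisper upload limit.
UPLOAD_LIMIT_BYTES = 25_000_000


def oversized_files(paths: List[str], limit_bytes: int = UPLOAD_LIMIT_BYTES) -> List[str]:
    """Return the paths whose on-disk size exceeds limit_bytes."""
    return [p for p in paths if os.path.getsize(p) > limit_bytes]


with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "small.mp3")
    large = os.path.join(tmp, "large.mp3")
    with open(small, "wb") as f:
        f.write(b"\0" * 10)
    with open(large, "wb") as f:
        f.write(b"\0" * 20)
    # A tiny limit stands in for the real one so the demo files stay small.
    too_big = [os.path.basename(p) for p in oversized_files([small, large], limit_bytes=15)]
    print(too_big)  # ['large.mp3']
```

If the check returns anything, re-chunking with a smaller `segment_length` is cheaper than a failed upload.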

## Summarize 📝

Here we ask ChatGPT to condense each raw transcript into short bullet points.

In [10]:
def summarize(
    chunks: list[str], system_prompt: str, model="gpt-3.5-turbo", output_file=None
):

    print(f"Summarizing with {model=}")

    summaries = []
    for chunk in chunks:
        response = openai.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": chunk},
            ],
        )
        # the reply text is in the first choice's message
        summary = response.choices[0].message.content
        summaries.append(summary)

    if output_file is not None:
        # save all summaries to a .txt file, one per line
        with open(output_file, "w") as file:
            for summary in summaries:
                file.write(summary + "\n")

    return summaries
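`summarize` calls the module-level `openai` client directly, which makes it hard to test without an API key. A minimal variant that takes the client as a parameter is sketched below. With the openai>=1.0 SDK the real client would be `OpenAI()` from the `openai` package, which exposes the same `chat.completions.create` call. `summarize_with_client` and `FakeCompletions` are hypothetical names, and the fake only mimics the `response.choices[0].message.content` shape.

```python
from types import SimpleNamespace
from typing import Any, List


def summarize_with_client(client: Any, chunks: List[str], system_prompt: str,
                          model: str = "gpt-3.5-turbo") -> List[str]:
    """Same loop as summarize(), but the OpenAI client is passed in explicitly."""
    summaries = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": chunk},
            ],
        )
        summaries.append(response.choices[0].message.content)
    return summaries


class FakeCompletions:
    """Hypothetical stand-in that records requests and mimics the response shape."""

    def __init__(self):
        self.requests = []

    def create(self, model, messages):
        self.requests.append({"model": model, "messages": messages})
        text = "- " + messages[-1]["content"].upper()
        return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=text))])


fake = FakeCompletions()
client = SimpleNamespace(chat=SimpleNamespace(completions=fake))
result = summarize_with_client(client, ["first chunk", "second chunk"], "Summarize.")
print(result)  # ['- FIRST CHUNK', '- SECOND CHUNK']
```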

## Putting it all together 🍱

In [11]:
def summarize_youtube_video(youtube_url, outputs_dir):
    raw_audio_dir = f"{outputs_dir}/raw_audio/"
    chunks_dir = f"{outputs_dir}/chunks"
    transcripts_file = f"{outputs_dir}/transcripts.txt"
    summary_file = f"{outputs_dir}/summary.txt"
    segment_length = 10 * 60  # chunk to 10 minute segments

    # delete any previous outputs_dir and start from scratch
    if os.path.exists(outputs_dir):
        shutil.rmtree(outputs_dir)
    os.makedirs(outputs_dir)

    # download the audio using yt-dlp
    audio_filename = youtube_to_mp3(youtube_url, output_dir=raw_audio_dir)

    # split the audio into shorter files (a single chunk for shorter videos)
    chunked_audio_files = chunk_audio(
        audio_filename, segment_length=segment_length, output_dir=chunks_dir
    )

    # transcribe each chunked audio file using whisper speech2text
    transcriptions = transcribe_audio(chunked_audio_files, transcripts_file)

    # summarize each transcript using ChatGPT
    system_prompt = """
    You are a helpful assistant that summarizes youtube videos.
    You are provided chunks of raw text that were transcribed from the video's audio.
    Summarize the current chunk into succinct and clear bullet points of its contents.
    """
    summaries = summarize(
        transcriptions, system_prompt=system_prompt, output_file=summary_file
    )

    system_prompt_tldr = """
    You are a helpful assistant that summarizes youtube videos.
    Someone has already summarized the video to key points.
    Summarize the key points into one or two sentences that capture the essence of the video.
    """
    # join all chunk summaries into a single input for the TL;DR pass
    long_summary = "\n".join(summaries)
    # write the TL;DR to its own file so it does not overwrite summary.txt
    short_summary = summarize(
        [long_summary],
        system_prompt=system_prompt_tldr,
        output_file=os.path.join(outputs_dir, "tldr.txt"),
    )[0]

    return long_summary, short_summary
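The TL;DR pass sends every chunk summary in one request, so for a very long video the joined text could outgrow ChatGPT's context window. One common workaround is a hierarchical reduce: group the summaries into batches that fit a budget, summarize each batch, and repeat until one text remains. The sketch below uses a character budget as a rough stand-in for a token count (an assumption). `batch_by_chars`, `reduce_summaries` and the stub summarizer are hypothetical. In practice `summarize_one` would call ChatGPT with `system_prompt_tldr`.

```python
from typing import Callable, List


def batch_by_chars(items: List[str], max_chars: int) -> List[List[str]]:
    """Greedily group consecutive items so each batch, joined by newlines, fits max_chars.

    An item longer than max_chars still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for item in items:
        added = len(item) + (1 if current else 0)  # +1 for the newline separator
        if current and current_len + added > max_chars:
            batches.append(current)
            current, current_len = [], 0
            added = len(item)
        current.append(item)
        current_len += added
    if current:
        batches.append(current)
    return batches


def reduce_summaries(summaries: List[str], summarize_one: Callable[[str], str],
                     max_chars: int) -> str:
    """Summarize batches of summaries repeatedly until one text remains."""
    if not summaries:
        return ""
    while len(summaries) > 1:
        batches = batch_by_chars(summaries, max_chars)
        if len(batches) == len(summaries):
            # No two summaries fit in one batch; merge them all rather than loop forever.
            batches = [summaries]
        summaries = [summarize_one("\n".join(batch)) for batch in batches]
    return summaries[0]


def first_letters(text: str) -> str:
    # Stub "summarizer" for the demo: keep the first character of each line.
    return "".join(line[0] for line in text.split("\n"))


print(batch_by_chars(["aaaa", "bbbb", "cccc", "dddd"], max_chars=9))
# [['aaaa', 'bbbb'], ['cccc', 'dddd']]
print(reduce_summaries(["aaaa", "bbbb", "cccc", "dddd"], first_letters, max_chars=9))
# ac
```

Each round strictly reduces the number of texts, so the loop always terminates even if the model's summaries are not much shorter than their inputs.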

In [12]:
youtube_url = "https://www.youtube.com/watch?v=QpzykxnCtvM" #"https://www.youtube.com/watch?v=g1pb2aK2we4"
outputs_dir = "outputs/"

long_summary, short_summary = summarize_youtube_video(youtube_url, outputs_dir)

print("Summaries:")
print("=" * 80)
print("Long summary:")
print("=" * 80)
print(long_summary)
print()

print("=" * 80)
print("Video - TL;DR")
print("=" * 80)
print(short_summary)

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2024.08.06 from yt-dlp/yt-dlp [4d9231208] (pip) API
[debug] params: {'format': 'bestaudio/best', 'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}], 'outtmpl': 'outputs//raw_audio/%(title)s.%(ext)s', 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.70 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-6.1.85+-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)


Downloading video from https://www.youtube.com/watch?v=QpzykxnCtvM


[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.07.04, mutagen-1.47.0, requests-2.32.3, secretstorage-3.3.1, sqlite3-3.37.2, urllib3-2.0.7, websockets-13.0
[debug] Proxy map: {'colab_language_server': '/usr/colab/bin/language_service'}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1830 extractors


[youtube] Extracting URL: https://www.youtube.com/watch?v=QpzykxnCtvM
[youtube] QpzykxnCtvM: Downloading webpage
[youtube] QpzykxnCtvM: Downloading ios player API JSON
[youtube] QpzykxnCtvM: Downloading web creator player API JSON
[youtube] QpzykxnCtvM: Downloading player a87a9450


[debug] Saving youtube-nsig.a87a9450 to cache
[debug] [youtube] Decrypted nsig TRkdNrEIvE3LnP3q => rOnWPkJdl_VdSA
[debug] Loading youtube-nsig.a87a9450 from cache
[debug] [youtube] Decrypted nsig Te0e_9ohIDdVFSer => pYrzRLJCJMoecg


[youtube] QpzykxnCtvM: Downloading m3u8 information


[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id


[info] QpzykxnCtvM: Downloading 1 format(s): 251


[debug] Invoking http downloader on "https://rr2---sn-5ualdnle.googlevideo.com/videoplayback?expire=1724425471&ei=n1DIZq7IDY34zLUPg7Kt2Ao&ip=34.73.47.219&id=o-AJ17zmqeGqWax3FfXqgnnCf_usq2ryizhOI4cRXF3t5g&itag=251&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=NB&mm=31%2C26&mn=sn-5ualdnle%2Csn-a5m7lnld&ms=au%2Conr&mv=m&mvi=2&pl=20&initcwndbps=5438750&bui=AQmm2ez4N0uIVV78FMeoKS9S8-RUUSkKbZX9EVRWgghPCCejJ54bY18lIg9sI2jy4_hWlGXYXOzjoEXq&spc=Mv1m9o2ipcNJjTDshVowNE5AhmJ--LtM_wLZGQAsj_9rLRuP3Hr1&vprv=1&svpuc=1&mime=audio%2Fwebm&ns=GMOTxtmkLJS9DR91vd8_HrEQ&rqh=1&gir=yes&clen=9993394&dur=743.341&lmt=1724292169501395&mt=1724403653&fvip=2&keepalive=yes&c=WEB_CREATOR&sefc=1&txp=5532434&n=pYrzRLJCJMoecg&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cbui%2Cspc%2Cvprv%2Csvpuc%2Cmime%2Cns%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&sig=AJfQdSswRQIgVtyR1gdQwf-k9Zss-fJkO9P1kr6GmPaRuafWhPMnY2UCIQDN_nlUy7J0WM8FEeZ8Ety5kAUzz5UWHFtVvo4fZjJCQA%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2C

[download] Destination: outputs//raw_audio/How To End Malaria Once and for All ｜ Abdoulaye Diabaté ｜ TED.webm
[download] 100% of    9.53MiB in 00:00:00 at 12.94MiB/s  


[debug] ffmpeg command line: ffprobe -show_streams 'file:outputs//raw_audio/How To End Malaria Once and for All ｜ Abdoulaye Diabaté ｜ TED.webm'


[ExtractAudio] Destination: outputs//raw_audio/How To End Malaria Once and for All ｜ Abdoulaye Diabaté ｜ TED.mp3


[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:outputs//raw_audio/How To End Malaria Once and for All ｜ Abdoulaye Diabaté ｜ TED.webm' -vn -acodec libmp3lame -b:a 192.0k -movflags +faststart 'file:outputs//raw_audio/How To End Malaria Once and for All ｜ Abdoulaye Diabaté ｜ TED.mp3'


Deleting original file outputs//raw_audio/How To End Malaria Once and for All ｜ Abdoulaye Diabaté ｜ TED.webm (pass -k to keep)
Chunking audio to 600 second segments...
Chunking 2 chunks...
Summarizing with model='gpt-3.5-turbo'
Summarizing with model='gpt-3.5-turbo'
Summaries:
Long summary:
- Discussion about the challenges faced by people in certain regions
- Mention of Abdulla Diabati and his activities
- Reference to a large number of various items or entities with numerical values
- Mention of Africa and Asia in relation to electric company operations
- Some numbers listed in a sequence of counts
- Addressing global denial of current issues
- Emphasizing the need for Africans to take ownership and drive change in Africa
- Highlighting the poor technical platform in Africa and the need to improve it
- Establishing a World Bank-funded center in Burkina Faso to tackle vector-borne diseases like malaria
- Building a team of next-generation scientists across Africa with support from the

In [13]:
youtube_url = "https://www.youtube.com/watch?v=KKNCiRWd_j0"
outputs_dir = "outputs/"

long_summary, short_summary = summarize_youtube_video(youtube_url, outputs_dir)

print("Summaries:")
print("=" * 80)
print("Long summary:")
print("=" * 80)
print(long_summary)
print()

print("=" * 80)
print("Video - TL;DR")
print("=" * 80)
print(short_summary)

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2024.08.06 from yt-dlp/yt-dlp [4d9231208] (pip) API
[debug] params: {'format': 'bestaudio/best', 'postprocessors': [{'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192'}], 'outtmpl': 'outputs//raw_audio/%(title)s.%(ext)s', 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.70 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-6.1.85+-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.07.04, m

Downloading video from https://www.youtube.com/watch?v=KKNCiRWd_j0
[youtube] Extracting URL: https://www.youtube.com/watch?v=KKNCiRWd_j0
[youtube] KKNCiRWd_j0: Downloading webpage
[youtube] KKNCiRWd_j0: Downloading ios player API JSON
[youtube] KKNCiRWd_j0: Downloading web creator player API JSON


[debug] Loading youtube-nsig.a87a9450 from cache
[debug] [youtube] Decrypted nsig Mfg279zVdbiruznw => FOI5IYCDqXH9rw


[youtube] KKNCiRWd_j0: Downloading m3u8 information


[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id


[info] KKNCiRWd_j0: Downloading 1 format(s): 251


[debug] Invoking http downloader on "https://rr1---sn-5ualdnsr.googlevideo.com/videoplayback?expire=1724425642&ei=SlHIZviFNOaokucPxvfN0QE&ip=34.73.47.219&id=o-AJlyVrE0fTyHpUhwhEydvxpwjnXNCmt3b5O4BclwzPns&itag=251&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=DS&mm=31%2C26&mn=sn-5ualdnsr%2Csn-a5mlrnlz&ms=au%2Conr&mv=m&mvi=1&pl=20&initcwndbps=5438750&vprv=1&svpuc=1&mime=audio%2Fwebm&rqh=1&gir=yes&clen=18002272&dur=1321.561&lmt=1716840099826581&mt=1724403653&fvip=1&keepalive=yes&c=IOS&txp=4532434&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cvprv%2Csvpuc%2Cmime%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&sig=AJfQdSswRQIgBMRqPkjm2bVQOI53OiPeD094MJAKmOa20NBxTaXLvKACIQCGStKWM9n_iECRuucnRvrdeQlP6FeWRA9f4YocrOD8hA%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AGtxev0wRAIgLgjfOSMQ6xrX-hf5C3Z66a-rsRycRxn8MsyS8uf3pLICICL2h8IOilch0PrG0tS6mnOzY6iZC2zZBR6xcTzdwuv-"


[download] Destination: outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.webm
[download] 100% of   17.17MiB in 00:00:00 at 39.05MiB/s  


[debug] ffmpeg command line: ffprobe -show_streams 'file:outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.webm'


[ExtractAudio] Destination: outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.mp3


[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.webm' -vn -acodec libmp3lame -b:a 192.0k -movflags +faststart 'file:outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.mp3'


Deleting original file outputs//raw_audio/What Is an AI Anyway？ ｜ Mustafa Suleyman ｜ TED.webm (pass -k to keep)
Chunking audio to 600 second segments...
Chunking 3 chunks...


KeyboardInterrupt: 