# Transcribe the hooks of YouTube videos




1.   Use YouTube's Data API to search for relevant videos, or specify a list of video IDs directly
2.   Download the videos
3.   Extract the first few seconds of audio (`MAX_DURATION`, set to 8 seconds in the code below)
4.   Use OpenAI's Whisper model to transcribe the audio and collect the results


To use YouTube's search API, you have to create an API key as described at https://developers.google.com/youtube/v3/quickstart/python

However, you can skip this part and provide the list of video IDs directly.
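
If you skip the search step, the rest of the notebook only needs a `video_ids` list. A minimal sketch of building that list from pasted links, using only the standard library. `extract_video_id` is a hypothetical helper (not part of any YouTube library), and it assumes the common `watch?v=`, `youtu.be/` and `/shorts/` URL shapes:

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a YouTube URL, or the input itself if it is a bare ID."""
    parsed = urlparse(url_or_id)
    if not parsed.netloc:
        # No host part, so treat the input as a bare video ID.
        return url_or_id
    if parsed.netloc.endswith("youtu.be"):
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    if parsed.path.startswith("/shorts/"):
        # Shorts links carry the ID as the second path segment.
        return parsed.path.split("/")[2]
    # Standard links carry the ID in the "v" query parameter.
    return parse_qs(parsed.query)["v"][0]


video_ids = [
    extract_video_id(u)
    for u in [
        "https://www.youtube.com/watch?v=K9mzg8ueiYA",
        "https://youtu.be/pPStdjuYzSI",
        "M7uo5jmFDUw",
    ]
]
print(video_ids)
```

With a list like this in place, the search cells can be skipped and the download step run directly.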



In [None]:
!pip install --upgrade google-api-python-client
!pip install --upgrade google-auth-oauthlib google-auth-httplib2
!pip install --upgrade --force-reinstall "git+https://github.com/ytdl-org/youtube-dl.git"
!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install -y ffmpeg
!pip install moviepy

In [25]:
# IMPORTANT - add the API key to Colab secrets (managed in the left sidebar)
# follow the instructions at https://developers.google.com/youtube/v3/quickstart/python

# step 1: search for relevant YouTube videos and get the list of video IDs
from google.colab import userdata
from googleapiclient.discovery import build
from datetime import datetime, timedelta

youtube = build('youtube', 'v3', developerKey=userdata.get('YOUTUBE_API_KEY'))

SEARCH_PARAM = "Fireship"
request = youtube.search().list(
    part='snippet',
    q=SEARCH_PARAM,
    type='video',
    order='viewCount',
    maxResults=25,
    publishedAfter=datetime(2024, 1, 1).isoformat() + 'Z',
    videoDuration='short',
    regionCode='US',
    relevanceLanguage='en'
)
response = request.execute()

In [26]:
for item in response["items"]:
    title = item["snippet"]["title"]
    if title.isascii():
        date = datetime.strptime(
            item["snippet"]["publishTime"], "%Y-%m-%dT%H:%M:%SZ"
        ).strftime("%Y-%m-%d")
        print(f"Published at: {date} | {title=}, Video ID: {item['id']['videoId']}")

Published at: 2024-02-23 | title='Google has the best AI now, but there&#39;s a problem...', Video ID: xPA0LFzUDiE
Published at: 2024-02-13 | title='how god programmed birds probably', Video ID: X8LglXSG53A
Published at: 2024-03-07 | title='Nvidia CUDA in 100 Seconds', Video ID: pPStdjuYzSI
Published at: 2024-03-02 | title='Elon&#39;s bombshell lawsuit against OpenAI', Video ID: KbzGy3whpy0
Published at: 2024-01-24 | title='real HTML programmers debug in 3D', Video ID: gGWQfV1FCis
Published at: 2024-03-08 | title='Apple drops ban hammer on Epic Games over mean tweet', Video ID: wbQwD3QS19I
Published at: 2024-01-04 | title='Pascal in 100 Seconds', Video ID: K9mzg8ueiYA
Published at: 2024-02-22 | title='Expo in 100 Seconds', Video ID: vFW_TxKLyrE
Published at: 2024-02-29 | title='Drizzle ORM in 100 Seconds', Video ID: i_mAHOhpBSA
Published at: 2024-03-18 | title='Erlang in 100 Seconds', Video ID: M7uo5jmFDUw
Published at: 2024-01-31 | title='History of Entire Frontend under a minute - In
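
The titles printed above still contain HTML entities such as `&#39;`, so the response evidently delivers them escaped. A small sketch of decoding them with the standard library before printing or storing them; the sample title is copied from the output above:

```python
import html

raw_title = "Elon&#39;s bombshell lawsuit against OpenAI"

# html.unescape turns numeric and named entities back into plain characters.
clean_title = html.unescape(raw_title)
print(clean_title)
```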

In [27]:
video_ids = [item['id']['videoId'] for item in response['items'] if item["snippet"]["title"].isascii()]

In [7]:
# using youtube-dl yields this error
# ERROR: WARNING: unable to obtain file audio codec with ffprobe
#!youtube-dl -x --audio-format mp3 https://www.youtube.com/watch?v=07x_FurAq5s

In [28]:
# step 2: download the videos
import youtube_dl

ydl_opts = { 'outtmpl': './videos/%(title)s-%(id)s.%(ext)s'}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download([f'https://www.youtube.com/watch?v={video_id}' for video_id in video_ids])

[youtube] xPA0LFzUDiE: Downloading webpage




[dashsegments] Total fragments: 14
[download] Destination: ./videos/Google has the best AI now, but there's a problem...-xPA0LFzUDiE.f313.webm
[download] 100% of 139.53MiB in 00:05
[dashsegments] Total fragments: 1
[download] Destination: ./videos/Google has the best AI now, but there's a problem...-xPA0LFzUDiE.f140.m4a
[download] 100% of 3.63MiB in 00:00
[ffmpeg] Merging formats into "./videos/Google has the best AI now, but there's a problem...-xPA0LFzUDiE.mkv"
Deleting original file ./videos/Google has the best AI now, but there's a problem...-xPA0LFzUDiE.f313.webm (pass -k to keep)
Deleting original file ./videos/Google has the best AI now, but there's a problem...-xPA0LFzUDiE.f140.m4a (pass -k to keep)
[youtube] X8LglXSG53A: Downloading webpage




[dashsegments] Total fragments: 2
[download] Destination: ./videos/how god programmed birds probably-X8LglXSG53A.f137.mp4
[download] 100% of 12.72MiB in 00:00
[dashsegments] Total fragments: 1
[download] Destination: ./videos/how god programmed birds probably-X8LglXSG53A.f251.webm
[download] 100% of 716.12KiB in 00:00
[ffmpeg] Merging formats into "./videos/how god programmed birds probably-X8LglXSG53A.mkv"
Deleting original file ./videos/how god programmed birds probably-X8LglXSG53A.f137.mp4 (pass -k to keep)
Deleting original file ./videos/how god programmed birds probably-X8LglXSG53A.f251.webm (pass -k to keep)
[youtube] pPStdjuYzSI: Downloading webpage




[dashsegments] Total fragments: 10
[download] Destination: ./videos/Nvidia CUDA in 100 Seconds-pPStdjuYzSI.f313.webm
[download] 100% of 90.98MiB in 00:03
[dashsegments] Total fragments: 1
[download] Destination: ./videos/Nvidia CUDA in 100 Seconds-pPStdjuYzSI.f140.m4a
[download] 100% of 2.97MiB in 00:00
[ffmpeg] Merging formats into "./videos/Nvidia CUDA in 100 Seconds-pPStdjuYzSI.mkv"
Deleting original file ./videos/Nvidia CUDA in 100 Seconds-pPStdjuYzSI.f313.webm (pass -k to keep)
Deleting original file ./videos/Nvidia CUDA in 100 Seconds-pPStdjuYzSI.f140.m4a (pass -k to keep)
[youtube] KbzGy3whpy0: Downloading webpage




[dashsegments] Total fragments: 13
[download] Destination: ./videos/Elon's bombshell lawsuit against OpenAI-KbzGy3whpy0.f313.webm
[download] 100% of 129.01MiB in 00:05
[dashsegments] Total fragments: 1
[download] Destination: ./videos/Elon's bombshell lawsuit against OpenAI-KbzGy3whpy0.f140.m4a
[download] 100% of 3.38MiB in 00:00
[ffmpeg] Merging formats into "./videos/Elon's bombshell lawsuit against OpenAI-KbzGy3whpy0.mkv"
Deleting original file ./videos/Elon's bombshell lawsuit against OpenAI-KbzGy3whpy0.f313.webm (pass -k to keep)
Deleting original file ./videos/Elon's bombshell lawsuit against OpenAI-KbzGy3whpy0.f140.m4a (pass -k to keep)
[youtube] gGWQfV1FCis: Downloading webpage
[dashsegments] Total fragments: 1
[download] Destination: ./videos/real HTML programmers debug in 3D-gGWQfV1FCis.f137.mp4
[download] 100% of 7.05MiB in 00:00
[dashsegments] Total fragments: 1
[download] Destination: ./videos/real HTML programmers debug in 3D-gGWQfV1FCis.f140.m4a
[download] 100% of 780.12



[dashsegments] Total fragments: 13
[download] Destination: ./videos/Apple drops ban hammer on Epic Games over mean tweet-wbQwD3QS19I.f313.webm
[download] 100% of 126.10MiB in 00:16
[dashsegments] Total fragments: 1
[download] Destination: ./videos/Apple drops ban hammer on Epic Games over mean tweet-wbQwD3QS19I.f140.m4a
[download] 100% of 3.34MiB in 00:00
[ffmpeg] Merging formats into "./videos/Apple drops ban hammer on Epic Games over mean tweet-wbQwD3QS19I.mkv"
Deleting original file ./videos/Apple drops ban hammer on Epic Games over mean tweet-wbQwD3QS19I.f313.webm (pass -k to keep)
Deleting original file ./videos/Apple drops ban hammer on Epic Games over mean tweet-wbQwD3QS19I.f140.m4a (pass -k to keep)
[youtube] K9mzg8ueiYA: Downloading webpage
[dashsegments] Total fragments: 8
[download] Destination: ./videos/Pascal in 100 Seconds-K9mzg8ueiYA.f313.webm
[download] 100% of 78.15MiB in 00:02
[dashsegments] Total fragments: 1
[download] Destination: ./videos/Pascal in 100 Seconds-K9m



[dashsegments] Total fragments: 7
[download] Destination: ./videos/Drizzle ORM in 100 Seconds-i_mAHOhpBSA.f313.webm
[download] 100% of 62.64MiB in 00:01
[dashsegments] Total fragments: 1
[download] Destination: ./videos/Drizzle ORM in 100 Seconds-i_mAHOhpBSA.f140.m4a
[download] 100% of 2.69MiB in 00:00
[ffmpeg] Merging formats into "./videos/Drizzle ORM in 100 Seconds-i_mAHOhpBSA.mkv"
Deleting original file ./videos/Drizzle ORM in 100 Seconds-i_mAHOhpBSA.f313.webm (pass -k to keep)
Deleting original file ./videos/Drizzle ORM in 100 Seconds-i_mAHOhpBSA.f140.m4a (pass -k to keep)
[youtube] M7uo5jmFDUw: Downloading webpage




[dashsegments] Total fragments: 9
[download] Destination: ./videos/Erlang in 100 Seconds-M7uo5jmFDUw.f313.webm
[download] 100% of 83.58MiB in 00:03
[dashsegments] Total fragments: 1
[download] Destination: ./videos/Erlang in 100 Seconds-M7uo5jmFDUw.f140.m4a
[download] 100% of 2.53MiB in 00:00
[ffmpeg] Merging formats into "./videos/Erlang in 100 Seconds-M7uo5jmFDUw.mkv"
Deleting original file ./videos/Erlang in 100 Seconds-M7uo5jmFDUw.f313.webm (pass -k to keep)
Deleting original file ./videos/Erlang in 100 Seconds-M7uo5jmFDUw.f140.m4a (pass -k to keep)
[youtube] qnYvLh54PYE: Downloading webpage




[dashsegments] Total fragments: 3
[download] Destination: ./videos/History of Entire Frontend under a minute - Inspired from @Fireship  _ Tamil-qnYvLh54PYE.f299.mp4
[download] 100% of 22.18MiB in 00:08
[dashsegments] Total fragments: 1
[download] Destination: ./videos/History of Entire Frontend under a minute - Inspired from @Fireship  _ Tamil-qnYvLh54PYE.f251.webm
[download] 100% of 960.35KiB in 00:00
[ffmpeg] Merging formats into "./videos/History of Entire Frontend under a minute - Inspired from @Fireship  _ Tamil-qnYvLh54PYE.mkv"
Deleting original file ./videos/History of Entire Frontend under a minute - Inspired from @Fireship  _ Tamil-qnYvLh54PYE.f299.mp4 (pass -k to keep)
Deleting original file ./videos/History of Entire Frontend under a minute - Inspired from @Fireship  _ Tamil-qnYvLh54PYE.f251.webm (pass -k to keep)
[youtube] 580FarXeIPU: Downloading webpage




[dashsegments] Total fragments: 1
[download] Destination: ./videos/4  time fire #ship #water #bass #vikings #acapella #new #sad #sunset #trending #lyrics #music-580FarXeIPU.f136.mp4
[download] 100% of 3.52MiB in 00:00
[dashsegments] Total fragments: 1
[download] Destination: ./videos/4  time fire #ship #water #bass #vikings #acapella #new #sad #sunset #trending #lyrics #music-580FarXeIPU.f251.webm
[download] 100% of 295.95KiB in 00:00
[ffmpeg] Merging formats into "./videos/4  time fire #ship #water #bass #vikings #acapella #new #sad #sunset #trending #lyrics #music-580FarXeIPU.mkv"
Deleting original file ./videos/4  time fire #ship #water #bass #vikings #acapella #new #sad #sunset #trending #lyrics #music-580FarXeIPU.f136.mp4 (pass -k to keep)
Deleting original file ./videos/4  time fire #ship #water #bass #vikings #acapella #new #sad #sunset #trending #lyrics #music-580FarXeIPU.f251.webm (pass -k to keep)
[youtube] xrdyWRr5OGQ: Downloading webpage




[dashsegments] Total fragments: 2
[download] Destination: ./videos/Fireships sent to sea by the Vikings  - Symbol of a Journey After Death #shorts #history-xrdyWRr5OGQ.f136.mp4
[download] 100% of 13.28MiB in 00:03
[dashsegments] Total fragments: 1
[download] Destination: ./videos/Fireships sent to sea by the Vikings  - Symbol of a Journey After Death #shorts #history-xrdyWRr5OGQ.f251.webm
[download] 100% of 1.03MiB in 00:00
[ffmpeg] Merging formats into "./videos/Fireships sent to sea by the Vikings  - Symbol of a Journey After Death #shorts #history-xrdyWRr5OGQ.mkv"
Deleting original file ./videos/Fireships sent to sea by the Vikings  - Symbol of a Journey After Death #shorts #history-xrdyWRr5OGQ.f136.mp4 (pass -k to keep)
Deleting original file ./videos/Fireships sent to sea by the Vikings  - Symbol of a Journey After Death #shorts #history-xrdyWRr5OGQ.f251.webm (pass -k to keep)
[youtube] Iu8XMUPJ6gE: Downloading webpage
[dashsegments] Total fragments: 1
[download] Destination: ./vi



[dashsegments] Total fragments: 1
[download] Destination: ./videos/gangastar fireship #gangastergang-qQ-meCF0478.f137.mp4
[download] 100% of 8.59MiB in 00:03
[dashsegments] Total fragments: 1
[download] Destination: ./videos/gangastar fireship #gangastergang-qQ-meCF0478.f251.webm
[download] 100% of 248.30KiB in 00:00
[ffmpeg] Merging formats into "./videos/gangastar fireship #gangastergang-qQ-meCF0478.mkv"
Deleting original file ./videos/gangastar fireship #gangastergang-qQ-meCF0478.f137.mp4 (pass -k to keep)
Deleting original file ./videos/gangastar fireship #gangastergang-qQ-meCF0478.f251.webm (pass -k to keep)
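
The log above shows full video streams of up to roughly 140 MiB being fetched, although only a few seconds of audio are needed. A hedged sketch of asking youtube-dl for the audio stream only; `build_audio_only_opts` and `download_audio` are hypothetical helper names, and the option keys (`outtmpl`, `format`, `ignoreerrors`) are standard `YoutubeDL` options:

```python
def build_audio_only_opts(out_dir: str = "./videos") -> dict:
    """Options for youtube_dl.YoutubeDL that fetch only the best audio stream."""
    return {
        # Same naming scheme as the download cell, so later steps can find the ID.
        "outtmpl": f"{out_dir}/%(title)s-%(id)s.%(ext)s",
        # Prefer an audio-only stream; fall back to the best combined format.
        "format": "bestaudio/best",
        # Keep going if one video in the list fails to download.
        "ignoreerrors": True,
    }


def download_audio(video_ids, out_dir: str = "./videos") -> None:
    """Download the audio stream for each video ID into out_dir."""
    import youtube_dl  # imported lazily so the options helper works without it

    urls = [f"https://www.youtube.com/watch?v={video_id}" for video_id in video_ids]
    with youtube_dl.YoutubeDL(build_audio_only_opts(out_dir)) as ydl:
        ydl.download(urls)
```

Note that step 3 as written opens each file with `VideoFileClip`; audio-only downloads would need `AudioFileClip` from `moviepy.editor` there instead.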


In [30]:
# step 3: clip each video to its first MAX_DURATION seconds and extract the audio
import os
from moviepy.editor import VideoFileClip

os.makedirs("audio", exist_ok=True)

MAX_DURATION = 8  # seconds
for file_name in os.listdir("./videos"):
    video = VideoFileClip(os.path.join("videos", file_name)).subclip(0, MAX_DURATION)
    audio = video.audio
    # note: this drops every "." in the name, not only the file extension
    audio.write_audiofile(os.path.join("audio",
                                       "".join(file_name.split(".")[:-1]) + ".mp3"))

MoviePy - Writing audio in audio/Pascal in 100 Seconds-K9mzg8ueiYA.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/Google has the best AI now, but there's a problem-xPA0LFzUDiE.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/Erlang in 100 Seconds-M7uo5jmFDUw.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/Fireships sent to sea by the Vikings  - Symbol of a Journey After Death #shorts #history-xrdyWRr5OGQ.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/Apple drops ban hammer on Epic Games over mean tweet-wbQwD3QS19I.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/Elon's bombshell lawsuit against OpenAI-KbzGy3whpy0.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/gangastar fireship #gangastergang-qQ-meCF0478.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/Wings of fire ships I despise basically _ #banwhirlpoolxanemone #ilovemycat #glory #wingsoffire-Iu8XMUPJ6gE.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/Drizzle ORM in 100 Seconds-i_mAHOhpBSA.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/Nvidia CUDA in 100 Seconds-pPStdjuYzSI.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/real HTML programmers debug in 3D-gGWQfV1FCis.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/Expo in 100 Seconds-vFW_TxKLyrE.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/how god programmed birds probably-X8LglXSG53A.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/4  time fire #ship #water #bass #vikings #acapella #new #sad #sunset #trending #lyrics #music-580FarXeIPU.mp3
MoviePy - Done.
MoviePy - Writing audio in audio/History of Entire Frontend under a minute - Inspired from @Fireship  _ Tamil-qnYvLh54PYE.mp3
MoviePy - Done.
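
The audio file names above show a side effect of `"".join(file_name.split(".")[:-1])`: every dot in the name is dropped, not just the extension, so the trailing `...` in the Google video's title disappears. A short sketch of the difference, with `os.path.splitext` shown as one alternative that removes only the final extension:

```python
import os

file_name = "Google has the best AI now, but there's a problem...-xPA0LFzUDiE.mkv"

# Approach used in step 3: removes the extension and every other dot.
joined = "".join(file_name.split(".")[:-1])

# Alternative: strip only the final extension, keeping dots inside the title.
stem, ext = os.path.splitext(file_name)

print(joined)
print(stem)
print(ext)
```

Either form works for the later steps, as long as the same naming is used consistently when the files are matched back to video IDs.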


In [10]:
import os
import numpy as np

try:
    import tensorflow  # required in Colab to avoid protobuf compatibility issues
except ImportError:
    pass

import torch
import pandas as pd
import whisper
import torchaudio

from tqdm import tqdm


DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

In [15]:
model = whisper.load_model("base.en", device=DEVICE)
print(
    f"Model is {'multilingual' if model.is_multilingual else 'English-only'} "
    f"and has {sum(np.prod(p.shape) for p in model.parameters()):,} parameters."
)

Model is English-only and has 71,825,408 parameters.


In [31]:
transcribed_result = {}
for file_name in tqdm(os.listdir("./audio")):
  print(f"transcribing {file_name=}")
  text = model.transcribe(os.path.join("audio", file_name))['text']
  transcribed_result[file_name] = text

transcribing file_name='Pascal in 100 Seconds-K9mzg8ueiYA.mp3'
transcribing file_name='4  time fire #ship #water #bass #vikings #acapella #new #sad #sunset #trending #lyrics #music-580FarXeIPU.mp3'
transcribing file_name='how god programmed birds probably-X8LglXSG53A.mp3'
transcribing file_name='History of Entire Frontend under a minute - Inspired from @Fireship  _ Tamil-qnYvLh54PYE.mp3'
transcribing file_name='real HTML programmers debug in 3D-gGWQfV1FCis.mp3'
transcribing file_name='Drizzle ORM in 100 Seconds-i_mAHOhpBSA.mp3'
transcribing file_name='Nvidia CUDA in 100 Seconds-pPStdjuYzSI.mp3'
transcribing file_name="Google has the best AI now, but there's a problem-xPA0LFzUDiE.mp3"
transcribing file_name='Erlang in 100 Seconds-M7uo5jmFDUw.mp3'
transcribing file_name='Apple drops ban hammer on Epic Games over mean tweet-wbQwD3QS19I.mp3'
transcribing file_name='Wings of fire ships I despise basically _ #banwhirlpoolxanemone #ilovemycat #glory #wingsoffire-Iu8XMUPJ6gE.mp3'
transcribing file_name="Elon's bombshell lawsuit against OpenAI-KbzGy3whpy0.mp3"
transcribing file_name='gangastar fireship #gangastergang-qQ-meCF0478.mp3'
transcribing file_name='Fireships sent to sea by the Vikings  - Symbol of a Journey After Death #shorts #history-xrdyWRr5OGQ.mp3'
transcribing file_name='Expo in 100 Seconds-vFW_TxKLyrE.mp3'


In [32]:
transcribed_result

{'Pascal in 100 Seconds-K9mzg8ueiYA.mp3': ' Pascal, a procedural high-level programming language, famous for teaching a generation of kids from the 70s and 80s how to code. It was created by Nick-',
 '4  time fire #ship #water #bass #vikings #acapella #new #sad #sunset #trending #lyrics #music-580FarXeIPU.mp3': ' You All hands',
 'how god programmed birds probably-X8LglXSG53A.mp3': " Just look at this flock of birds. It's so majestic, but how do they all fly together in unison like that? It's not magic. It's an algorithm built into nature and we can",
 'History of Entire Frontend under a minute - Inspired from @Fireship  _ Tamil-qnYvLh54PYE.mp3': " I don't want to put this, it leads to This Choice isbay dock Well if you're more than 60 refreshments those days",
 'real HTML programmers debug in 3D-gGWQfV1FCis.mp3': " If you're an HTML programmer, you've likely seen crazy nested code like this. Ugly code is one thing, but the real problem is that your UI disappeared because you've got a 

In [37]:
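# build one record per transcribed file by matching file names against the video IDs
t = []
for k, v in transcribed_result.items():
  for vid in video_ids:
    # two video IDs are excluded by hand
    if vid in k and vid not in ["580FarXeIPU", "Iu8XMUPJ6gE"]:
      url = f'https://www.youtube.com/watch?v={vid}'
      title = k.split("-")[0]  # text before the first hyphen, so hyphenated titles are cut short
      t.append({"title": title, "url": url, "hook": v})
      print(title)
      print(url)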

Pascal in 100 Seconds
https://www.youtube.com/watch?v=K9mzg8ueiYA
how god programmed birds probably
https://www.youtube.com/watch?v=X8LglXSG53A
History of Entire Frontend under a minute 
https://www.youtube.com/watch?v=qnYvLh54PYE
real HTML programmers debug in 3D
https://www.youtube.com/watch?v=gGWQfV1FCis
Drizzle ORM in 100 Seconds
https://www.youtube.com/watch?v=i_mAHOhpBSA
Nvidia CUDA in 100 Seconds
https://www.youtube.com/watch?v=pPStdjuYzSI
Google has the best AI now, but there's a problem
https://www.youtube.com/watch?v=xPA0LFzUDiE
Erlang in 100 Seconds
https://www.youtube.com/watch?v=M7uo5jmFDUw
Apple drops ban hammer on Epic Games over mean tweet
https://www.youtube.com/watch?v=wbQwD3QS19I
Elon's bombshell lawsuit against OpenAI
https://www.youtube.com/watch?v=KbzGy3whpy0
gangastar fireship #gangastergang
https://www.youtube.com/watch?v=qQ-meCF0478
Fireships sent to sea by the Vikings  
https://www.youtube.com/watch?v=xrdyWRr5OGQ
Expo in 100 Seconds
https://www.youtube.com/wat
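
`k.split("-")[0]` cuts a title at its first hyphen, which is why the "History of Entire Frontend under a minute - Inspired from ..." title is shortened above. A sketch of one alternative that relies on the `%(title)s-%(id)s` naming from the download step and on video IDs being 11 characters long (true for every ID in this notebook; treat it as an assumption in general). `split_title_and_id` is a hypothetical helper:

```python
import os


def split_title_and_id(file_name: str, id_length: int = 11) -> tuple[str, str]:
    """Split '<title>-<video id>.<ext>' into (title, video_id)."""
    stem, _ext = os.path.splitext(file_name)
    # The ID is the fixed-length tail of the stem.
    video_id = stem[-id_length:]
    # Everything before the ID and its separating hyphen is the title.
    title = stem[: -(id_length + 1)]
    return title, video_id


title, video_id = split_title_and_id(
    "gangastar fireship #gangastergang-qQ-meCF0478.mp3"
)
print(title)
print(video_id)
```

Because the ID is read from the end of the name, hyphens inside either the title or the ID itself (as in `qQ-meCF0478`) do not affect the split, and no inner loop over `video_ids` is needed.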

In [41]:
import pandas as pd
pd.DataFrame(t)

Unnamed: 0,title,url,hook
0,Pascal in 100 Seconds,https://www.youtube.com/watch?v=K9mzg8ueiYA,"Pascal, a procedural high-level programming l..."
1,how god programmed birds probably,https://www.youtube.com/watch?v=X8LglXSG53A,Just look at this flock of birds. It's so maj...
2,History of Entire Frontend under a minute,https://www.youtube.com/watch?v=qnYvLh54PYE,"I don't want to put this, it leads to This Ch..."
3,real HTML programmers debug in 3D,https://www.youtube.com/watch?v=gGWQfV1FCis,"If you're an HTML programmer, you've likely s..."
4,Drizzle ORM in 100 Seconds,https://www.youtube.com/watch?v=i_mAHOhpBSA,"Drizzle ORM, a lightweight set of tools that ..."
5,Nvidia CUDA in 100 Seconds,https://www.youtube.com/watch?v=pPStdjuYzSI,"CUDA, a parallel computing platform that allo..."
6,"Google has the best AI now, but there's a problem",https://www.youtube.com/watch?v=xPA0LFzUDiE,This has been the craziest week ever. That is...
7,Erlang in 100 Seconds,https://www.youtube.com/watch?v=M7uo5jmFDUw,"Erlang, a functional fault-tolerant programmi..."
8,Apple drops ban hammer on Epic Games over mean...,https://www.youtube.com/watch?v=wbQwD3QS19I,"Earlier this week, Apple took a massive L. It..."
9,Elon's bombshell lawsuit against OpenAI,https://www.youtube.com/watch?v=KbzGy3whpy0,It's no secret that OpenAI is not open based ...
