# **YouTube Video Transcription with OpenAI's Whisper**

[![blog post shield](https://img.shields.io/static/v1?label=&message=Blog%20post&color=blue&style=for-the-badge&logo=openai&link=https://openai.com/blog/whisper)](https://openai.com/blog/whisper)
[![notebook shield](https://img.shields.io/static/v1?label=&message=Notebook&color=blue&style=for-the-badge&logo=googlecolab&link=https://colab.research.google.com/github/ArthurFDLR/whisper-youtube/blob/main/whisper_youtube.ipynb)](https://colab.research.google.com/github/ArthurFDLR/whisper-youtube/blob/main/whisper_youtube.ipynb)
[![repository shield](https://img.shields.io/static/v1?label=&message=Repository&color=blue&style=for-the-badge&logo=github&link=https://github.com/openai/whisper)](https://github.com/openai/whisper)
[![paper shield](https://img.shields.io/static/v1?label=&message=Paper&color=blue&style=for-the-badge&link=https://cdn.openai.com/papers/whisper.pdf)](https://cdn.openai.com/papers/whisper.pdf)
[![model card shield](https://img.shields.io/static/v1?label=&message=Model%20card&color=blue&style=for-the-badge&link=https://github.com/openai/whisper/blob/main/model-card.md)](https://github.com/openai/whisper/blob/main/model-card.md)

Whisper is a general-purpose speech recognition model. Trained on a large dataset of diverse audio, it is a multi-task model that can perform multilingual speech recognition, speech translation, and language identification.

This Notebook will guide you through the transcription of a YouTube video using Whisper. You'll be able to explore most inference parameters or use the Notebook as-is to store the transcript and video audio in your Google Drive.

In [9]:
! pip install -Uq yt-dlp
! pip install -Uq transformers
! pip install -Uq faster-whisper

import sys
import warnings
# import whisper
from pathlib import Path
import yt_dlp
import subprocess
import torch
import shutil
import numpy as np
from IPython.display import display, Markdown, YouTubeVideo

device = torch.device('cuda:0')
print('Using device:', device, file=sys.stderr)

Using device: cuda:0


In [22]:

#@markdown ---
#@markdown #### **YouTube video or playlist**
URL = "https://www.youtube.com/watch?v=UdxSCFmUk9o" #@param {type:"string"}

video_path_local_list = []

ydl_opts = {
    'format': 'm4a/bestaudio/best',
    'outtmpl': '%(title)s.%(ext)s',
    # ℹ️ See help(yt_dlp.postprocessor) for a list of available Postprocessors and their arguments
    'postprocessors': [{  # Extract audio using ffmpeg
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'wav',
    }]
}

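with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    # Download the audio and collect the metadata in a single extraction pass
    info = ydl.extract_info(URL, download=True)
    # A playlist lists its videos under 'entries'; a single video is treated as a one-item list
    list_video_info = info.get('entries') or [info]
    for video_info in list_video_info:
        # 'outtmpl' names each file after the video title, and the postprocessor converts it to .wav
        video_path_local_list.append(Path(ydl.prepare_filename(video_info)).with_suffix(".wav"))

# Fallback: convert any leftover .mp4 to 16 kHz mono WAV (not triggered when the paths already end in .wav).
# After this loop, `video_path_local` refers to the last downloaded file, which the later cells read.
for video_path_local in video_path_local_list:
    if video_path_local.suffix == ".mp4":
        video_path_local = video_path_local.with_suffix(".wav")
        result = subprocess.run([
            "ffmpeg", "-i", str(video_path_local.with_suffix(".mp4")),
            "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1",
            str(video_path_local),
        ])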


[youtube] Extracting URL: https://www.youtube.com/watch?v=UdxSCFmUk9o
[youtube] UdxSCFmUk9o: Downloading webpage
[youtube] UdxSCFmUk9o: Downloading tv client config
[youtube] UdxSCFmUk9o: Downloading player e7567ecf
[youtube] UdxSCFmUk9o: Downloading tv player API JSON
[youtube] UdxSCFmUk9o: Downloading ios player API JSON
[youtube] UdxSCFmUk9o: Downloading m3u8 information
[info] UdxSCFmUk9o: Downloading 1 format(s): 140
[download] Destination: Stanford CS153： Infra at Scale - Anthropic Cofounder Ben Mann on Scaling Frontier AI Systems.m4a
[download] 100% of   38.83MiB in 00:00:01 at 24.11MiB/s  
[FixupM4a] Correcting container of "Stanford CS153： Infra at Scale - Anthropic Cofounder Ben Mann on Scaling Frontier AI Systems.m4a"
[ExtractAudio] Destination: Stanford CS153： Infra at Scale - Anthropic Cofounder Ben Mann on Scaling Frontier AI Systems.wav
Deleting original file Stanford CS153： Infra at Scale - Anthropic Cofounder Ben Mann on Scaling Frontier AI Systems.m4a (pass -k to keep
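
Before transcribing, it can help to confirm that the extracted audio is a readable WAV file. The following is a minimal sketch using only Python's standard `wave` module, assuming the file is plain PCM WAV. The helper names `describe_wav` and `write_silence` are hypothetical and not part of this notebook; `write_silence` only exists to create demo input.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / sample_rate
    return sample_rate, channels, duration


def write_silence(path, seconds=1.0, sample_rate=16000):
    """Write a mono 16-bit WAV of silence, used here only as demo input."""
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))


# Demo on a generated file; in the notebook you would pass the downloaded .wav path instead
demo_path = Path("demo_silence.wav")
write_silence(demo_path, seconds=2.0)
print(describe_wav(demo_path))
demo_path.unlink()  # remove the demo file again
```

In the notebook, calling `describe_wav(video_path_local)` would report the properties of the downloaded audio, provided the download cell has run.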

In [13]:

from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)
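
The cell above assumes a GPU runtime (`device="cuda"`, `compute_type="float16"`). A minimal sketch of a fallback for CPU-only runtimes follows. The helper `pick_device_and_compute_type` is hypothetical, and the choice of `"int8"` on CPU is an assumption based on the CPU example in the faster-whisper README, not something this notebook tested.

```python
def pick_device_and_compute_type(cuda_available):
    """Map GPU availability to a (device, compute_type) pair."""
    if cuda_available:
        return "cuda", "float16"  # same settings as the cell above
    return "cpu", "int8"  # assumption: int8 quantization as a CPU fallback


# In the notebook, the flag would come from torch.cuda.is_available()
device_name, compute_type = pick_device_and_compute_type(cuda_available=False)
print(device_name, compute_type)
```

The two values could then be passed as `device=device_name` and `compute_type=compute_type` when constructing `WhisperModel`. Expect CPU transcription of a long video to be considerably slower.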

In [16]:
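# The language is forced to English here, so the probability reported in the next cell is expected to be 1.0
segments, info = batched_model.transcribe(
    str(video_path_local),
    batch_size=16,
    language="en",
)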

In [18]:
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

Detected language 'en' with probability 1.000000


In [19]:
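# `segments` is a generator: the audio is decoded while this loop iterates, and it can be consumed only once
results = []
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    results.append([segment.start, segment.end, segment.text])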

[1.17s -> 29.09s]  Well, thanks for coming, Ben. Thanks for having me. Give us a sense of the scale that Anthropic's at right now. Yeah, so in terms of actual numbers, I don't want to give specifics, but you can Google it and see what you find. What I can say is that in the last year, we've 10xed our revenue, and we've, in the last three months leading up to December, we 10xed our revenue just in the coding segment.
[29.09s -> 54.61s]  So we're seeing absolutely explosive growth in all areas and having a pretty fun time trying to serve all that traffic. And how did you get to where you are now? In life? Yeah. Yeah. I guess I started thinking about computer science in undergrad and wasn't one of these people who starts coding when they were five.
[54.67s -> 84.34s]  And I just fell in love with it. I originally thought I wanted to be a mechanical engineer and do robotics, but I hated mechanical engineering and I hated robotics when I took the intro classes. And computer science just kin
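
Because each entry of `results` keeps its start and end time, the transcript can also be exported as subtitles. The following is a minimal sketch that renders `[start, end, text]` rows in the SubRip (SRT) layout. The helpers `format_srt_timestamp` and `results_to_srt` are hypothetical names, and the demo rows are shortened versions of the segments printed above.

```python
def format_srt_timestamp(seconds):
    """Convert seconds (float) to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def results_to_srt(results):
    """Render [start, end, text] rows as SRT subtitle text."""
    blocks = []
    for index, (start, end, text) in enumerate(results, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks
    return "\n".join(blocks)


# Shortened demo rows in the same shape as `results`
demo_results = [
    [1.17, 29.09, " Well, thanks for coming, Ben."],
    [29.09, 54.61, " So we're seeing absolutely explosive growth in all areas."],
]
print(results_to_srt(demo_results))
```

Note that the segments produced here span 25 to 30 seconds each, which is long for on-screen subtitles; the sketch only illustrates the file layout.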

In [28]:
# Save the transcript text to "<video title>.txt"
with open(f"{video_info['title']}.txt", "w", encoding="utf-8") as f:
    f.writelines([r[-1] for r in results])
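
Video titles can contain characters that are not valid in file names, such as `/` on Linux or `:` and `?` on Windows, in which case opening a file named after the raw title would fail. The following is a minimal sketch of a hypothetical `safe_filename` helper that replaces such characters; the character set is an assumption covering common platforms, not an exhaustive rule.

```python
import re


def safe_filename(title, replacement="_"):
    """Replace characters that are invalid in file names on common platforms."""
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', replacement, title)
    cleaned = cleaned.strip(" .")  # Windows rejects trailing dots and spaces
    return cleaned or "untitled"  # fall back when nothing usable remains


print(safe_filename("Infra at Scale: Q/A?"))
```

The transcript file name could then be built as `f"{safe_filename(video_info['title'])}.txt"`.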