<a href="https://colab.research.google.com/github/bemxio/colab-notebooks/blob/main/FasterWhisperDemo/FasterWhisperDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Faster Whisper (YT-DLP variant)

Faster Whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.

This implementation is up to 4 times faster than OpenAI's Whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
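In faster-whisper, precision is selected through the `compute_type` argument of `WhisperModel`. Its README lists `"int8"` for CPU and `"int8_float16"` for GPU as the 8-bit options. The helper below, `whisper_model_kwargs`, is hypothetical and not part of faster-whisper. It only picks those keyword arguments. This is a sketch, and the defaults are assumptions: float16 on GPU (what this notebook uses) and float32 on CPU.

```python
def whisper_model_kwargs(device: str, quantize: bool = False) -> dict:
    """Hypothetical helper (not part of faster-whisper): choose the
    keyword arguments for WhisperModel on a given device.

    8-bit quantization uses "int8" on CPU and "int8_float16" on GPU;
    without it, float16 is used on GPU and float32 on CPU (assumed defaults).
    """
    if device == "cuda":
        compute_type = "int8_float16" if quantize else "float16"
    elif device == "cpu":
        compute_type = "int8" if quantize else "float32"
    else:
        raise ValueError(f"unsupported device: {device!r}")
    return {"device": device, "compute_type": compute_type}


# Usage in the notebook (requires faster-whisper):
#     from faster_whisper import WhisperModel
#     model = WhisperModel("large-v3", **whisper_model_kwargs("cuda", quantize=True))
print(whisper_model_kwargs("cuda"))
print(whisper_model_kwargs("cpu", quantize=True))
```

Calling it with `quantize=False` on `"cuda"` reproduces the notebook's current `device="cuda", compute_type="float16"` setup.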

This notebook is a Google Colab demo of it, made for fun. This variant downloads the media from a URL with yt-dlp and then transcribes its audio.

#### Install required dependencies

In [None]:
%pip install faster-whisper srt yt-dlp
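To confirm the install worked, you can read the installed versions with the standard library's `importlib.metadata`. The sketch below also checks `moviepy`. The next cell imports `moviepy.editor`, but this cell does not install it, so the notebook relies on it already being available in the runtime. MoviePy 2.x removed the `moviepy.editor` module, so a 1.x version is assumed. `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("faster-whisper", "srt", "yt-dlp", "moviepy")) -> dict:
    """Hypothetical helper: map each distribution name to its installed
    version string, or None when it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```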

#### Set parameters for Faster Whisper and download the audio

In [None]:
import pathlib

from moviepy.editor import ipython_display

# constants set by the user in the notebook
SOURCE_URL = "" # @param {type: "string"}

MODEL = "large-v3" # @param ["tiny.en", "tiny", "base.en", "base", "small.en", "small", "medium.en", "medium", "large-v1", "large-v2", "large-v3", "distil-large-v2", "distil-medium.en", "distil-small.en", "distil-large-v3", "large-v3-turbo"]
LANGUAGE = "en" # @param ["af", "am", "ar", "as", "az", "ba", "be", "bg", "bn", "bo", "br", "bs", "ca", "cs", "cy", "da", "de", "el", "en", "es", "et", "eu", "fa", "fi", "fo", "fr", "gl", "gu", "ha", "haw", "he", "hi", "hr", "ht", "hu", "hy", "id", "is", "it", "ja", "jw", "ka", "kk", "km", "kn", "ko", "la", "lb", "ln", "lo", "lt", "lv", "mg", "mi", "mk", "ml", "mn", "mr", "ms", "mt", "my", "ne", "nl", "nn", "no", "oc", "pa", "pl", "ps", "pt", "ro", "ru", "sa", "sd", "si", "sk", "sl", "sn", "so", "sq", "sr", "su", "sv", "sw", "ta", "te", "tg", "th", "tk", "tl", "tr", "tt", "uk", "ur", "uz", "vi", "yi", "yo", "zh", "yue"]
TASK = "transcribe" # @param ["transcribe", "translate"]

# download the video and write its final path (after any merging or renaming) to filename.txt
!python3 -m yt_dlp --no-simulate --print-to-file after_move:filepath filename.txt --output "%(title)s.%(ext)s" "{SOURCE_URL}"

# get the path of the video
with open("filename.txt", "r", encoding="utf-8") as file:
    path = pathlib.Path(file.read().strip())

# delete the filename file
!rm filename.txt

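# show a preview of the audio (moviepy raises a ValueError for media longer than
# maxduration seconds; raising the limit embeds a larger file in the notebook)
ipython_display(str(path), filetype="audio", maxduration=300)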
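The `!` line above interpolates `SOURCE_URL` into a shell command, so a URL containing quotes or `$` could break it. Below is a sketch of the same download without a shell. It assumes `yt-dlp` is installed in the running interpreter. The arguments go to `subprocess.run` as a list, and `--print-to-file after_move:filepath` records the final path of each downloaded file. `build_ytdlp_command`, `read_downloaded_path` and `download` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def build_ytdlp_command(url: str, print_file: str = "filename.txt") -> list:
    """Hypothetical helper: yt-dlp arguments as a list, so no shell quoting is needed."""
    return [
        sys.executable, "-m", "yt_dlp",
        "--no-simulate",
        # append the final path of each file (after merging/moving) to print_file
        "--print-to-file", "after_move:filepath", print_file,
        "--output", "%(title)s.%(ext)s",
        url,
    ]


def read_downloaded_path(print_file: str = "filename.txt") -> Path:
    """Hypothetical helper: return the last non-empty path recorded in print_file."""
    lines = [line.strip() for line in Path(print_file).read_text(encoding="utf-8").splitlines()]
    lines = [line for line in lines if line]
    if not lines:
        raise RuntimeError(f"yt-dlp recorded no file in {print_file}")
    return Path(lines[-1])


def download(url: str, print_file: str = "filename.txt") -> Path:
    """Hypothetical helper: run yt-dlp, return the file's path, delete print_file."""
    subprocess.run(build_ytdlp_command(url, print_file), check=True)
    try:
        return read_downloaded_path(print_file)
    finally:
        Path(print_file).unlink(missing_ok=True)


# Usage in the notebook (needs network access): path = download(SOURCE_URL)
```

Taking the last recorded line matters because `--print-to-file` appends. A leftover file from an earlier run would otherwise contribute a stale path.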

#### Process the audio file with Faster Whisper

In [None]:
from datetime import timedelta

from faster_whisper import WhisperModel
import srt

# initialize necessary variables (device="cuda" requires a GPU runtime)
model = WhisperModel(MODEL, device="cuda", compute_type="float16")
subtitles = []
transcription = ""

# initialize the transcription generator; decoding happens lazily as the loop below consumes it
segments, _ = model.transcribe(str(path), language=LANGUAGE, task=TASK)

# transcribe the audio
for index, segment in enumerate(segments):
    start = timedelta(seconds=segment.start)
    end = timedelta(seconds=segment.end)
    text = segment.text.strip()

    print(f"[{start} --> {end}] {text}")

    subtitles.append(srt.Subtitle(
        index=index + 1,
        start=start,
        end=end,
        content=text
    ))
    transcription += text + "\n"

# save the subtitles and the transcription to separate files
with open(path.with_suffix(".srt"), "w", encoding="utf-8") as file:
    file.write(srt.compose(subtitles))

with open(path.with_suffix(".txt"), "w", encoding="utf-8") as file:
    file.write(transcription)
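`model.transcribe` returns two values. The first is the `segments` generator, which decodes the audio lazily as the loop consumes it. The second is a `TranscriptionInfo` object, discarded as `_` above. That object exposes, among other fields, `language`, `language_probability` and `duration`. With `language=None`, faster-whisper detects the language itself, and these fields report what it found. The sketch below uses a hypothetical helper, `describe_info`, and a stand-in object so it runs without a model.

```python
from types import SimpleNamespace


def describe_info(info) -> str:
    """Hypothetical helper: summarize a faster-whisper TranscriptionInfo
    (anything with language, language_probability and duration attributes)."""
    return (
        f"language={info.language} "
        f"(p={info.language_probability:.2f}), "
        f"duration={info.duration:.1f} s"
    )


# Stand-in with the same attribute names, so this runs without a model.
demo_info = SimpleNamespace(language="en", language_probability=0.9876, duration=62.5)
print(describe_info(demo_info))  # language=en (p=0.99), duration=62.5 s

# Usage in the notebook (requires the model loaded above):
#     segments, info = model.transcribe(str(path), language=None, task=TASK)
#     print(describe_info(info))
```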

Congratulations! You can now find the downloaded media in the Files tab (the little folder icon in the left sidebar). The `.srt` subtitles and `.txt` transcription Faster Whisper generated are there too.
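The `.srt` file is plain SubRip text. It holds numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the text, then a blank line. `srt.compose` handles this in the notebook. The stdlib-only sketch below shows what each cue looks like; `srt_timestamp` and `srt_cue` are hypothetical helpers, not part of the `srt` library.

```python
from datetime import timedelta


def srt_timestamp(td: timedelta) -> str:
    """Format a timedelta as an SRT timestamp, HH:MM:SS,mmm (truncated to ms)."""
    total_ms = td // timedelta(milliseconds=1)  # exact integer division of timedeltas
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index: int, start: timedelta, end: timedelta, text: str) -> str:
    """One SRT cue: index line, time range, text, then a blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n"


print(srt_cue(1, timedelta(seconds=0), timedelta(seconds=3.5), "Hello there."), end="")
```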

Download whatever you need, or change the parameters and run the cells again to generate more.
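Besides the Files tab, Colab can push files straight to your browser with `google.colab.files.download`. The sketch below collects the outputs written next to the downloaded media. `generated_files` is a hypothetical helper, and the `.srt`/`.txt` suffixes match what the transcription cell writes.

```python
from pathlib import Path


def generated_files(media_path, suffixes=(".srt", ".txt")) -> list:
    """Hypothetical helper: existing outputs written next to media_path
    (same name, different suffix)."""
    media_path = Path(media_path)
    candidates = [media_path.with_suffix(suffix) for suffix in suffixes]
    return [candidate for candidate in candidates if candidate.exists()]


# In Colab, send each output to the browser:
#     from google.colab import files
#     for output in generated_files(path):
#         files.download(str(output))
```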