<a href="https://colab.research.google.com/github/BongeZagh/python-practice/blob/main/Whisper_Youtube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Youtube Videos Transcription with OpenAI's Whisper**

[![blog post shield](https://img.shields.io/static/v1?label=&message=Blog%20post&color=blue&style=for-the-badge&logo=openai&link=https://openai.com/blog/whisper)](https://openai.com/blog/whisper)
[![notebook shield](https://img.shields.io/static/v1?label=&message=Notebook&color=blue&style=for-the-badge&logo=googlecolab&link=https://colab.research.google.com/github/ArthurFDLR/whisper-youtube/blob/main/whisper_youtube.ipynb)](https://colab.research.google.com/github/ArthurFDLR/whisper-youtube/blob/main/whisper_youtube.ipynb)
[![repository shield](https://img.shields.io/static/v1?label=&message=Repository&color=blue&style=for-the-badge&logo=github&link=https://github.com/openai/whisper)](https://github.com/openai/whisper)
[![paper shield](https://img.shields.io/static/v1?label=&message=Paper&color=blue&style=for-the-badge&link=https://cdn.openai.com/papers/whisper.pdf)](https://cdn.openai.com/papers/whisper.pdf)
[![model card shield](https://img.shields.io/static/v1?label=&message=Model%20card&color=blue&style=for-the-badge&link=https://github.com/openai/whisper/blob/main/model-card.md)](https://github.com/openai/whisper/blob/main/model-card.md)

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

This Notebook will guide you through the transcription of a Youtube video using Whisper. You'll be able to explore most inference parameters or use the Notebook as-is to store the transcript and video audio in your Google Drive.

In [23]:
#@markdown # **Check GPU type** 🕵️

#@markdown The type of GPU assigned to your Colab session determines the speed at which the video will be transcribed.
#@markdown The higher the number of floating point operations per second (FLOPS), the faster the transcription.
#@markdown But even the least powerful GPU available in Colab is able to run any Whisper model.
#@markdown Make sure you've selected `GPU` as hardware accelerator for the Notebook (Runtime &rarr; Change runtime type &rarr; Hardware accelerator).

#@markdown |  GPU   |  GPU RAM   | FP32 teraFLOPS |     Availability   |
#@markdown |:------:|:----------:|:--------------:|:------------------:|
#@markdown |  T4    |    16 GB   |       8.1      |         Free       |
#@markdown | P100   |    16 GB   |      10.6      |      Colab Pro     |
#@markdown | V100   |    16 GB   |      15.7      |  Colab Pro (Rare)  |

#@markdown ---
#@markdown **Factory reset your Notebook's runtime if you want to get assigned a new GPU.**

!nvidia-smi -L

!nvidia-smi

GPU 0: Tesla T4 (UUID: GPU-1068d293-1869-ee55-bb25-4c466eb34ce4)
Wed May 15 05:43:28 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   62C    P0              28W /  70W |   5235MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
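
The GPU name printed by `nvidia-smi -L` can be matched against the table in the cell above. The following is a minimal sketch, not part of the original notebook: `parse_gpu_names` and `lookup_tflops` are hypothetical helper names, and the FP32 figures are simply copied from that table.

```python
import re

# FP32 teraFLOPS figures copied from the GPU table in the cell above.
FP32_TFLOPS = {"T4": 8.1, "P100": 10.6, "V100": 15.7}


def parse_gpu_names(listing):
    """Return the GPU names found in `nvidia-smi -L` style output."""
    return re.findall(r"^GPU \d+: (.+?) \(UUID:", listing, flags=re.MULTILINE)


def lookup_tflops(gpu_name):
    """Return the table's FP32 teraFLOPS for a GPU name, or None if it is not listed."""
    for key, tflops in FP32_TFLOPS.items():
        if key in gpu_name:
            return tflops
    return None
```

In a Colab cell, the listing could be captured with `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout` and passed to `parse_gpu_names`.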
   

In [None]:
#@markdown # **Install libraries** 🏗️
#@markdown This cell will take a little while to download several libraries, including Whisper.

#@markdown ---

! pip install git+https://github.com/openai/whisper.git
! pip install pytube

import sys
import warnings
import whisper
from pathlib import Path
import pytube
import subprocess
import torch
import shutil
import numpy as np
from IPython.display import display, Markdown, YouTubeVideo

device = torch.device('cuda:0')
print('Using device:', device, file=sys.stderr)

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-3xfyjn9b
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-3xfyjn9b
  Resolved https://github.com/openai/whisper.git to commit ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [3]:
#@markdown # **Optional:** Save data in Google Drive 💾
#@markdown Enter a Google Drive path and run this cell if you want to store the results inside Google Drive.

# Mount Google Drive so results can be copied there; in my experience this is faster than downloading directly from Colab.
from google.colab import drive
drive_mount_path = Path("/") / "content" / "drive"
drive.mount(str(drive_mount_path))
drive_mount_path /= "My Drive"
#@markdown ---
drive_path = "Colab Notebooks/Whisper Youtube" #@param {type:"string"}
#@markdown ---
#@markdown **Run this cell again if you change your Google Drive path.**

drive_whisper_path = drive_mount_path / Path(drive_path.lstrip("/"))
drive_whisper_path.mkdir(parents=True, exist_ok=True)

Mounted at /content/drive


In [4]:
#@markdown # **Model selection** 🧠

#@markdown As of the first public release, there are five pre-trained model sizes to play with:

#@markdown |  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
#@markdown |:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
#@markdown |  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
#@markdown |  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
#@markdown | small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
#@markdown | medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
#@markdown | large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |

#@markdown ---
Model = 'medium' #@param ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large']
#@markdown ---
#@markdown **Run this cell again if you change the model.**

if Model in whisper.available_models():
    # Load the model only after confirming that the name is still offered
    whisper_model = whisper.load_model(Model)
    display(Markdown(
        f"**{Model} model is selected.**"
    ))
else:
    display(Markdown(
        f"**{Model} model is no longer available.**<br /> Please select one of the following:<br /> - {'<br /> - '.join(whisper.available_models())}"
    ))

100%|██████████████████████████████████████| 1.42G/1.42G [00:11<00:00, 135MiB/s]


**medium model is selected.**
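
The "Required VRAM" column of the table above can be turned into a simple rule for choosing a model. This is only a sketch under stated assumptions: `pick_model` is a hypothetical helper, the VRAM figures are the approximate values from the table, and actual memory use depends on the decoding options.

```python
# Approximate VRAM requirements in GB, copied from the model table above
# (ordered from smallest to largest model).
REQUIRED_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(free_vram_gb, english_only=False):
    """Return the largest model name whose listed VRAM need fits, or None."""
    best = None
    for size, need in REQUIRED_VRAM_GB.items():
        if english_only and size == "large":
            continue  # the table lists no English-only large model
        if need <= free_vram_gb:
            best = size
    if best is None:
        return None
    return f"{best}.en" if english_only else best
```

For instance, `pick_model(9.9)` returns `'medium'`, because the table lists about 10 GB for `large`.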

In [5]:
!ls /content/drive/MyDrive/'Colab Notebooks'/'Whisper Youtube'  -alh

total 31M
-rw------- 1 root root  26K Sep  2  2023 '20 Double Tops and Bottoms.srt'
-rw------- 1 root root  19K Sep  2  2023 '27 Swinging and Scalping.srt'
-rw------- 1 root root  35K Sep  2  2023 '28 Orders.srt'
-rw------- 1 root root 101K Sep  2  2023 '29 Protective Stops.srt'
-rw------- 1 root root  38K Sep  2  2023 '30 Actual Risk.srt'
-rw------- 1 root root  80K Sep  2  2023 '32 Scaling In.srt'
-rw------- 1 root root 153K Sep  2  2023  danqin.srt
-rw------- 1 root root  30M Sep  1  2023  sZnVzwo7V-Q.mp4
-rw------- 1 root root  94K Sep  2  2023  sZnVzwo7V-Q.srt


In [6]:
!cd /content/drive/MyDrive/BPAproject && ls -alh

total 1.2G
-rw------- 1 root root  57M Sep  2  2023 '19 Wedges.mp4'
-rw------- 1 root root  32M Sep  2  2023 '20 Double Tops and Bottoms.mp4'
-rw------- 1 root root  23M Sep  2  2023 '21 Triangles.mp4'
-rw------- 1 root root  16M Sep  2  2023 '22 Head and Shoulders.mp4'
-rw------- 1 root root  19M Sep  2  2023 '23 Rounding Tops and Bottoms.mp4'
-rw------- 1 root root  24M Sep  2  2023 '24 Climaxes.mp4'
-rw------- 1 root root  64M Sep  2  2023 '25 Measured Moves, Support, and Resistance.mp4'
-rw------- 1 root root  58M Sep  2  2023 "26 Probability and Trader's Equation.mp4"
-rw------- 1 root root  25M Sep  2  2023 '27 Swinging and Scalping.mp4'
-rw------- 1 root root  40M Sep  2  2023 '28 Orders.mp4'
-rw------- 1 root root 129M Sep  2  2023 '29 Protective Stops.mp4'
-rw------- 1 root root  46M Sep  2  2023 '30 Actual Risk.mp4'
-rw------- 1 root root  24M Sep  2  2023 '31 Protective Stops For Scalps.mp4'
-rw------- 1 root root  92M Sep  2  2023 '32 Scaling In.mp4'
-rw------- 1 root root 

### Compare files
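I wanted to automate the comparison of file paths between the two folders.
This code analyzes the files in two folders and looks for the file-name prefixes they share. It then checks whether the `.srt` file for each of these prefixes exists in both folders, and prints the paths of files that exist in only one of them.

The purpose of this code block is to find and record the `.srt` files with a given prefix that exist in only one of the folders (folder1 or folder2). It does this by iterating over the set of common prefixes and checking whether the file for each prefix exists in both folders. If a file exists in only one folder, its path is printed.

The higher-level goal may be to identify and handle differences or inconsistencies between two datasets (folder1 and folder2). For example, it can be used to synchronize or merge data between datasets, or to compare and verify data integrity.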

In [None]:
import os

folder1 = "/content/drive/MyDrive/BPAproject"
folder2 = "/content/drive/MyDrive/Colab Notebooks/Whisper Youtube"

files1 = os.listdir(folder1)
files2 = os.listdir(folder2)

# Map each file-name prefix (name without extension) in folder1 to its file name
prefix_to_file1 = {}
for file1 in files1:
    prefix = os.path.splitext(file1)[0]
    prefix_to_file1[prefix] = file1

common_prefixes = set(prefix_to_file1)

# Report files that share a prefix but have different extensions
for file2 in files2:
    prefix = os.path.splitext(file2)[0]
    if prefix in common_prefixes:
        file1 = prefix_to_file1[prefix]  # the file in folder1 that carries this prefix
        extension1 = os.path.splitext(file1)[1]
        extension2 = os.path.splitext(file2)[1]
        if extension1 != extension2:
            file1_path = os.path.join(folder1, file1)
            file2_path = os.path.join(folder2, file2)
            print(f"Same prefix, different extension: {file1_path} and {file2_path}")
            common_prefixes.remove(prefix)  # drop prefixes that have already been matched

# List .mp4 files in folder1 that have no matching .srt in folder2
for prefix in common_prefixes:
    file1_path = os.path.join(folder1, f"{prefix}.mp4")
    file2_path = os.path.join(folder2, f"{prefix}.srt")
    if os.path.exists(file1_path) and not os.path.exists(file2_path):
        print(f"Only one extension exists: {file1_path}")
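
If the only question is which videos still lack a transcript, the same check can be written more compactly with `pathlib`. This is a sketch rather than a drop-in replacement: `find_untranscribed` is a hypothetical helper, and the `.mp4` and `.srt` defaults are assumptions taken from the folder listings above.

```python
from pathlib import Path


def find_untranscribed(video_dir, transcript_dir, video_ext=".mp4", transcript_ext=".srt"):
    """Return the videos in video_dir that have no same-named transcript in transcript_dir."""
    video_dir, transcript_dir = Path(video_dir), Path(transcript_dir)
    # File names without extension for every transcript that already exists
    transcribed = {p.stem for p in transcript_dir.glob(f"*{transcript_ext}")}
    return sorted(p for p in video_dir.glob(f"*{video_ext}") if p.stem not in transcribed)
```

Calling `find_untranscribed(folder1, folder2)` with the two Drive folders would return a sorted list of `Path` objects for the videos that still need transcription.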



## Install yt-dlp


In [14]:

!pip install yt-dlp


Collecting yt-dlp
  Downloading yt_dlp-2024.4.9-py3-none-any.whl (3.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 17.3 MB/s eta 0:00:00
Collecting brotli (from yt-dlp)
  Downloading Brotli-1.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 38.4 MB/s eta 0:00:00
Collecting mutagen (from yt-dlp)
  Downloading mutagen-1.47.0-py3-none-any.whl (194 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.4/194.4 kB 27.3 MB/s eta 0:00:00
Collecting pycryptodomex (from yt-dlp)
  Downloading pycryptodomex-3.20.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 50.6 MB/s eta 0:00:00
Collecting websockets>=12.0 (from yt-dlp

# Download the video with yt-dlp

In [15]:
!yt-dlp 'https://www.youtube.com/watch?v=GsahrKB24oM'

[youtube] Extracting URL: https://www.youtube.com/watch?v=GsahrKB24oM
[youtube] GsahrKB24oM: Downloading webpage
[youtube] GsahrKB24oM: Downloading ios player API JSON
[youtube] GsahrKB24oM: Downloading android player API JSON
[youtube] GsahrKB24oM: Downloading player db9cbc4e
[youtube] GsahrKB24oM: Downloading m3u8 information
[info] GsahrKB24oM: Downloading 1 format(s): 244+251
[download] Destination: [字幕]丘成桐：中國數學發展不如美國1940年代 中國數學的現狀與未來 [GsahrKB24oM].f244.webm
[download] 100% of  166.79MiB in 00:00:04 at 39.16MiB/s
[download] Destination: [字幕]丘成桐：中國數學發展不如美國1940年代 中國數學的現狀與未來 [GsahrKB24oM].f251.webm
[download] 100% of   67.48MiB in 00:00:01 at 34.21MiB/s
[Merger] Merging formats into "[字幕]丘成桐：中國數學發展不如美國1940年代 中國數學的現狀與未來 [GsahrKB24oM].webm"
Deleting original file [字幕]丘成桐：中國數學發展不如美國1940年代 中國數學的現狀與未來 [GsahrKB24oM].f244.webm (pass -k to keep)
Deleting original file [字幕]丘成桐：中國數學發展不如美國1940年代 中國數學的現狀與未來 [GsahrKB24oM].f251.webm (pass -k to keep

# Check and rename the file (a Chinese file name might lead to errors)

In [21]:
#! mv '[字幕]丘成桐：中國數學發展不如美國1940年代 中國數學的現狀與未來 [GsahrKB24oM].webm' qiuchentong.webm
!ls -al



total 550620
drwxr-xr-x 1 root root      4096 May 15 04:19  .
drwxr-xr-x 1 root root      4096 May 15 04:02  ..
-rw------- 1 root root  32908920 May 15 04:11 '20 Double Tops and Bottoms.mp4'
-rw-r--r-- 1 root root  38553412 May 15 04:10 '20 Double Tops and Bottoms.wav'
drwxr-xr-x 4 root root      4096 May 14 20:30  .config
drwx------ 6 root root      4096 May 15 04:08  drive
-rw-r--r-- 1 root root 246706841 May 15 04:45  qiuchentong.mp4
-rw-r--r-- 1 root root 245630447 May  9 23:05  qiuchentong.webm
drwxr-xr-x 1 root root      4096 May 14 20:30  sample_data
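
One way to sidestep the manual rename is to have yt-dlp name the file after the video ID by passing an output template with `-o`. The sketch below only builds and runs that command; `build_ytdlp_command` and `download` are hypothetical helper names, and the template `%(id)s.%(ext)s` is one possible choice, not the notebook's original approach.

```python
import subprocess
from pathlib import Path


def build_ytdlp_command(url, output_dir="."):
    """Build a yt-dlp command that saves the video as <video id>.<extension>."""
    # yt-dlp fills in %(id)s and %(ext)s itself, so the file name stays ASCII
    template = str(Path(output_dir) / "%(id)s.%(ext)s")
    return ["yt-dlp", "-o", template, url]


def download(url, output_dir="."):
    """Run yt-dlp and raise CalledProcessError if the download fails."""
    subprocess.run(build_ytdlp_command(url, output_dir), check=True)
```

With the URL used above, the merged file would be named after the ID `GsahrKB24oM` instead of the Chinese title.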


# Convert webm to mp4

In [19]:

!ffmpeg -i qiuchentong.webm -c:v libx264 -crf 23 -c:a aac -b:a 128k qiuchentong.mp4


ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enab

In [22]:
#@markdown # **Video selection** 📺

#@markdown Enter the URL of the YouTube video you want to transcribe, choose whether to save the audio file in your Google Drive, and run the cell.

Type = "Google Drive" #@param ['Youtube video or playlist', 'Google Drive']
#@markdown ---
#@markdown #### **Youtube video or playlist**
URL = "https://www.youtube.com/watch?v=GsahrKB24oM&t=2890s" #@param {type:"string"}
store_audio = True #@param {type:"boolean"}
#@markdown ---
#@markdown #### **Google Drive video, audio (mp4, wav), or folder containing video and/or audio files**
video_path = "/BPAproject/qiuchentong.mp4" #@param {type:"string"}
#@markdown ---
#@markdown **Run this cell again if you change the video.**

video_path_local_list = []

if Type == "Youtube video or playlist":

    try:
        list_video_yt = [pytube.YouTube(URL)]
    except Exception:
        try:
            list_video_yt = list(pytube.Playlist(URL).videos)
        except Exception:
            raise RuntimeError(f"{URL} isn't recognized.")

    for video_yt in list_video_yt:
        try:
            video_yt.check_availability()
            display(
                YouTubeVideo(video_yt.video_id)
            )
        except pytube.exceptions.VideoUnavailable:
            display(
                Markdown(f"**{URL} isn't available.**"),
            )
            raise(RuntimeError(f"{URL} isn't available."))
        video_path_local = Path(".").resolve() / (video_yt.video_id+".mp4")
        video_yt.streams.filter(
            type="audio",
            mime_type="audio/mp4",
            abr="48kbps"
        ).first().download(
            output_path = video_path_local.parent,
            filename = video_path_local.name
        )
        if store_audio:
            shutil.copy(video_path_local, drive_whisper_path / video_path_local.name)
        video_path_local_list.append(video_path_local)

elif Type == "Google Drive":
    # video_path_drive = drive_mount_path / Path(video_path.lstrip("/"))
    video_path = drive_mount_path / Path(video_path.lstrip("/"))
    if video_path.is_dir():
        for video_path_drive in video_path.glob("**/*"):
            if video_path_drive.is_file():
                display(Markdown(f"**{str(video_path_drive)} selected for transcription.**"))
            elif video_path_drive.is_dir():
                display(Markdown(f"**Subfolders not supported.**"))
                continue  # do not try to copy a directory
            else:
                display(Markdown(f"**{str(video_path_drive)} does not exist, skipping.**"))
                continue  # nothing to copy
            video_path_local = Path(".").resolve() / (video_path_drive.name)
            shutil.copy(video_path_drive, video_path_local)
            video_path_local_list.append(video_path_local)
    elif video_path.is_file():
        video_path_local = Path(".").resolve() / (video_path.name)
        shutil.copy(video_path, video_path_local)
        video_path_local_list.append(video_path_local)
        display(Markdown(f"**{str(video_path)} selected for transcription.**"))
    else:
        display(Markdown(f"**{str(video_path)} does not exist.**"))

else:
    raise(TypeError("Please select supported input type."))

for video_path_local in video_path_local_list:
    if video_path_local.suffix == ".mp4":
        video_path_local = video_path_local.with_suffix(".wav")
        result  = subprocess.run(["ffmpeg", "-i", str(video_path_local.with_suffix(".mp4")), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(video_path_local)])


**/content/drive/My Drive/BPAproject/qiuchentong.mp4 does not exist.**
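
The audio extraction at the end of the cell can also be factored into small helpers that return the new `.wav` paths, so later cells can work on the converted files. This is a sketch under assumptions: `wav_conversion_command` and `convert_all` are hypothetical names, the ffmpeg options mirror the ones used in the cell (16 kHz, mono, 16-bit PCM), and `-y` is added to overwrite an existing output file.

```python
import subprocess
from pathlib import Path


def wav_conversion_command(source):
    """Return the ffmpeg command and target path for extracting 16 kHz mono WAV audio."""
    source = Path(source)
    target = source.with_suffix(".wav")
    command = [
        "ffmpeg", "-y", "-i", str(source),
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM audio
        "-ar", "16000",           # 16 kHz sample rate
        "-ac", "1",               # mono
        str(target),
    ]
    return command, target


def convert_all(paths):
    """Convert every .mp4 in paths to .wav and return the paths to transcribe."""
    converted = []
    for path in map(Path, paths):
        if path.suffix == ".mp4":
            command, target = wav_conversion_command(path)
            subprocess.run(command, check=True)
            converted.append(target)
        else:
            converted.append(path)  # leave other formats untouched
    return converted
```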

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [8]:
#@markdown # **Run the model** 🚀

#@markdown Run this cell to transcribe the video. This can take a while, and the time varies with the length of the video and the number of parameters of the model selected above.

#@markdown ## **Parameters** ⚙️

#@markdown ### **Behavior control**
#@markdown ---
language = "English" #@param ['Auto detection', 'Afrikaans', 'Albanian', 'Amharic', 'Arabic', 'Armenian', 'Assamese', 'Azerbaijani', 'Bashkir', 'Basque', 'Belarusian', 'Bengali', 'Bosnian', 'Breton', 'Bulgarian', 'Burmese', 'Castilian', 'Catalan', 'Chinese', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'Faroese', 'Finnish', 'Flemish', 'French', 'Galician', 'Georgian', 'German', 'Greek', 'Gujarati', 'Haitian', 'Haitian Creole', 'Hausa', 'Hawaiian', 'Hebrew', 'Hindi', 'Hungarian', 'Icelandic', 'Indonesian', 'Italian', 'Japanese', 'Javanese', 'Kannada', 'Kazakh', 'Khmer', 'Korean', 'Lao', 'Latin', 'Latvian', 'Letzeburgesch', 'Lingala', 'Lithuanian', 'Luxembourgish', 'Macedonian', 'Malagasy', 'Malay', 'Malayalam', 'Maltese', 'Maori', 'Marathi', 'Moldavian', 'Moldovan', 'Mongolian', 'Myanmar', 'Nepali', 'Norwegian', 'Nynorsk', 'Occitan', 'Panjabi', 'Pashto', 'Persian', 'Polish', 'Portuguese', 'Punjabi', 'Pushto', 'Romanian', 'Russian', 'Sanskrit', 'Serbian', 'Shona', 'Sindhi', 'Sinhala', 'Sinhalese', 'Slovak', 'Slovenian', 'Somali', 'Spanish', 'Sundanese', 'Swahili', 'Swedish', 'Tagalog', 'Tajik', 'Tamil', 'Tatar', 'Telugu', 'Thai', 'Tibetan', 'Turkish', 'Turkmen', 'Ukrainian', 'Urdu', 'Uzbek', 'Valencian', 'Vietnamese', 'Welsh', 'Yiddish', 'Yoruba']
#@markdown > Language spoken in the audio, use `Auto detection` to let Whisper detect the language.
#@markdown ---
verbose = 'Live transcription' #@param ['Live transcription', 'Progress bar', 'None']
#@markdown > Whether to print out the progress and debug messages.
#@markdown ---
output_format = 'srt' #@param ['txt', 'vtt', 'srt', 'tsv', 'json', 'all']
#@markdown > Type of file to generate to record the transcription.
#@markdown ---
task = 'transcribe' #@param ['transcribe', 'translate']
#@markdown > Whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`).
#@markdown ---

#@markdown <br/>

#@markdown ### **Optional: Fine tuning**
#@markdown ---
temperature = 0.15 #@param {type:"slider", min:0, max:1, step:0.05}
#@markdown > Temperature to use for sampling.
#@markdown ---
temperature_increment_on_fallback = 0.2 #@param {type:"slider", min:0, max:1, step:0.05}
#@markdown > Temperature to increase when falling back when the decoding fails to meet either of the thresholds below.
#@markdown ---
best_of = 5 #@param {type:"integer"}
#@markdown > Number of candidates when sampling with non-zero temperature.
#@markdown ---
beam_size = 8 #@param {type:"integer"}
#@markdown > Number of beams in beam search, only applicable when temperature is zero.
#@markdown ---
patience = 1.0 #@param {type:"number"}
#@markdown > Optional patience value to use in beam decoding, as in [*Beam Decoding with Controlled Patience*](https://arxiv.org/abs/2204.05424), the default (1.0) is equivalent to conventional beam search.
#@markdown ---
length_penalty = -0.05 #@param {type:"slider", min:-0.05, max:1, step:0.05}
#@markdown > Optional token length penalty coefficient (alpha) as in [*Google's Neural Machine Translation System*](https://arxiv.org/abs/1609.08144); set to a negative value to use simple length normalization.
#@markdown ---
suppress_tokens = "-1" #@param {type:"string"}
#@markdown > Comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuation.
#@markdown ---
initial_prompt = "" #@param {type:"string"}
#@markdown > Optional text to provide as a prompt for the first window.
#@markdown ---
condition_on_previous_text = True #@param {type:"boolean"}
#@markdown > If True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
#@markdown ---
fp16 = True #@param {type:"boolean"}
#@markdown > Whether to perform inference in fp16.
#@markdown ---
compression_ratio_threshold = 2.4 #@param {type:"number"}
#@markdown > If the gzip compression ratio is higher than this value, treat the decoding as failed.
#@markdown ---
logprob_threshold = -1.0 #@param {type:"number"}
#@markdown > If the average log probability is lower than this value, treat the decoding as failed.
#@markdown ---
no_speech_threshold = 0.6 #@param {type:"slider", min:-0.0, max:1, step:0.05}
#@markdown > If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence.
#@markdown ---

verbose_lut = {
    'Live transcription': True,
    'Progress bar': False,
    'None': None
}

args = dict(
    language = (None if language == "Auto detection" else language),
    verbose = verbose_lut[verbose],
    task = task,
    temperature = temperature,
    temperature_increment_on_fallback = temperature_increment_on_fallback,
    best_of = best_of,
    beam_size = beam_size,
    patience=patience,
    length_penalty=(length_penalty if length_penalty>=0.0 else None),
    suppress_tokens=suppress_tokens,
    initial_prompt=(None if not initial_prompt else initial_prompt),
    condition_on_previous_text=condition_on_previous_text,
    fp16=fp16,
    compression_ratio_threshold=compression_ratio_threshold,
    logprob_threshold=logprob_threshold,
    no_speech_threshold=no_speech_threshold
)

temperature = args.pop("temperature")
temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
if temperature_increment_on_fallback is not None:
    temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
else:
    temperature = [temperature]

if Model.endswith(".en") and args["language"] not in {"en", "English"}:
    warnings.warn(f"{Model} is an English-only model but received '{args['language']}'; using English instead.")
    args["language"] = "en"

for video_path_local in video_path_local_list:
    display(Markdown(f"### {video_path_local}"))

    video_transcription = whisper.transcribe(
        whisper_model,
        str(video_path_local),
        temperature=temperature,
        **args,
    )

    # Save output
    whisper.utils.get_writer(
        output_format=output_format,
        output_dir=video_path_local.parent
    )(
        video_transcription,
        str(video_path_local.stem),
        {
          "max_line_width": 47,
          "max_line_count": 1,
          "highlight_words": False
        }
    )
    try:
        if output_format=="all":
            for ext in ('txt', 'vtt', 'srt', 'tsv', 'json'):
                transcript_file_name = video_path_local.stem + "." + ext
                shutil.copy(
                    video_path_local.parent / transcript_file_name,
                    drive_whisper_path / transcript_file_name
                )
                display(Markdown(f"**Transcript file created: {drive_whisper_path / transcript_file_name}**"))
        else:
            transcript_file_name = video_path_local.stem + "." + output_format
            shutil.copy(
                video_path_local.parent / transcript_file_name,
                drive_whisper_path / transcript_file_name
            )
            display(Markdown(f"**Transcript file created: {drive_whisper_path / transcript_file_name}**"))

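    except Exception:
        # Copying to Google Drive failed (for example, Drive is not mounted); report the local file instead
        display(Markdown(f"**Transcript file created: {video_path_local.parent / transcript_file_name}**"))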
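
The cell above turns `temperature` and `temperature_increment_on_fallback` into a sequence of temperatures; Whisper tries them in order until a decoding passes the thresholds. A dependency-free sketch of that schedule follows. `temperature_schedule` is a hypothetical helper, and returning a single value for a missing or non-positive increment is an assumption of this sketch.

```python
def temperature_schedule(start, increment, stop=1.0):
    """Return the temperatures from start up to stop (inclusive) in steps of increment."""
    if increment is None or increment <= 0:
        return (start,)  # assumption: no fallback, use the starting temperature only
    temperatures = []
    step = 0
    # The small tolerance keeps the end point despite floating-point error
    while start + step * increment <= stop + 1e-6:
        temperatures.append(round(start + step * increment, 6))
        step += 1
    return tuple(temperatures)
```

With the values selected in the form (`0.15` and `0.2`), the schedule is `(0.15, 0.35, 0.55, 0.75, 0.95)`.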

### /content/20 Double Tops and Bottoms.mp4

[00:00.000 --> 00:09.000]  This is Al Brooks and my price action trading course and this module has to do with double tops and bottoms.
[00:11.000 --> 00:17.000]  Here is a list of common reversal patterns, most of which are trading ranges.
[00:17.000 --> 00:23.000]  This particular module has to do with double tops and bottoms as reversal patterns.
[00:24.000 --> 00:28.000]  Perfect double tops and bottoms, they're rare.
[00:28.000 --> 00:33.000]  Triple tops that are perfect, triple bottoms that are perfect, are even more rare.
[00:33.000 --> 00:35.000]  But close is close enough.
[00:35.000 --> 00:40.000]  Most double tops and double bottoms are parts of major trend reversals.
[00:40.000 --> 00:45.000]  In fact, all major trend reversals are a type of double top or double bottom,
[00:45.000 --> 00:51.000]  where the second top or second bottom is not exactly at the same level as the first.
[00:52.000 --> 00:59.000]  Bottom line, the market is trying to reverse whether or not the pat

KeyboardInterrupt: 
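
The live transcription above prints each segment with its start and end time. As a rough illustration of how such segments map to the `srt` output format, here is a minimal sketch; `srt_timestamp` and `segments_to_srt` are hypothetical helpers, and the notebook itself relies on `whisper.utils.get_writer`, which also handles line width and other options. The sketch assumes each segment is a dict with `start`, `end` (in seconds), and `text` keys, as in the `segments` list of the transcription result.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)
```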