<a href="https://colab.research.google.com/github/IshmamF/youtube-translation/blob/main/PrimeHackathon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **YouTube Video Transcription with OpenAI's Whisper**

[![blog post shield](https://img.shields.io/static/v1?label=&message=Blog%20post&color=blue&style=for-the-badge&logo=openai&link=https://openai.com/blog/whisper)](https://openai.com/blog/whisper)
[![notebook shield](https://img.shields.io/static/v1?label=&message=Notebook&color=blue&style=for-the-badge&logo=googlecolab&link=https://colab.research.google.com/github/ArthurFDLR/whisper-youtube/blob/main/whisper_youtube.ipynb)](https://colab.research.google.com/github/ArthurFDLR/whisper-youtube/blob/main/whisper_youtube.ipynb)
[![repository shield](https://img.shields.io/static/v1?label=&message=Repository&color=blue&style=for-the-badge&logo=github&link=https://github.com/openai/whisper)](https://github.com/openai/whisper)
[![paper shield](https://img.shields.io/static/v1?label=&message=Paper&color=blue&style=for-the-badge&link=https://cdn.openai.com/papers/whisper.pdf)](https://cdn.openai.com/papers/whisper.pdf)
[![model card shield](https://img.shields.io/static/v1?label=&message=Model%20card&color=blue&style=for-the-badge&link=https://github.com/openai/whisper/blob/main/model-card.md)](https://github.com/openai/whisper/blob/main/model-card.md)

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

This notebook guides you through transcribing a YouTube video with Whisper. You can explore most inference parameters, or use the notebook as-is to store the transcript and the video's audio in your Google Drive.

In [1]:
#@markdown # **Check GPU type** 🕵️

#@markdown The type of GPU assigned to your Colab session determines how fast the video is transcribed.
#@markdown The higher the number of floating point operations per second (FLOPS), the faster the transcription.
#@markdown But even the least powerful GPU available in Colab is able to run any Whisper model.
#@markdown Make sure you've selected `GPU` as hardware accelerator for the Notebook (Runtime &rarr; Change runtime type &rarr; Hardware accelerator).

#@markdown |  GPU   |  GPU RAM   | FP32 teraFLOPS |     Availability   |
#@markdown |:------:|:----------:|:--------------:|:------------------:|
#@markdown |  T4    |    16 GB   |       8.1      |         Free       |
#@markdown | P100   |    16 GB   |      10.6      |      Colab Pro     |
#@markdown | V100   |    16 GB   |      15.7      |  Colab Pro (Rare)  |

#@markdown ---
#@markdown **Factory reset your Notebook's runtime if you want to get assigned a new GPU.**

!nvidia-smi -L

!nvidia-smi

GPU 0: Tesla T4 (UUID: GPU-a20f027e-9be4-86e4-97be-96d51bf1e165)
Sun Nov 19 11:47:07 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   51C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+----------------------
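The GPU table in the cell above can be turned into a small lookup. The sketch below parses one line of `nvidia-smi -L` output and returns the FP32 teraFLOPS figure from that table. The helper name `gpu_tflops` and the dictionary are illustrative and not part of the original notebook.

```python
import re

# FP32 teraFLOPS figures copied from the GPU table in this notebook.
GPU_FP32_TFLOPS = {"T4": 8.1, "P100": 10.6, "V100": 15.7}

def gpu_tflops(nvidia_smi_line: str):
    """Return (gpu_name, teraflops or None) for one line of `nvidia-smi -L` output."""
    match = re.match(r"GPU \d+: (.+?) \(UUID:", nvidia_smi_line)
    if match is None:
        raise ValueError(f"Unrecognized nvidia-smi line: {nvidia_smi_line!r}")
    name = match.group(1)
    for model, tflops in GPU_FP32_TFLOPS.items():
        # Names look like "Tesla T4" or "Tesla V100-SXM2-16GB", so match by token prefix.
        if any(token.startswith(model) for token in name.split()):
            return name, tflops
    return name, None  # GPU not listed in the table

line = "GPU 0: Tesla T4 (UUID: GPU-a20f027e-9be4-86e4-97be-96d51bf1e165)"
print(gpu_tflops(line))
```

A GPU that is not in the table returns `None` for the figure rather than raising, since Colab may assign other models.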

In [2]:
#@markdown # **Install libraries** 🏗️
#@markdown This cell will take a little while to download several libraries, including Whisper.

#@markdown ---

! pip install git+https://github.com/openai/whisper.git
! pip install yt-dlp

import sys
import warnings
import whisper
from pathlib import Path
import yt_dlp
import subprocess
import torch
import shutil
import numpy as np
from IPython.display import display, Markdown, YouTubeVideo

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')  # fall back to CPU when no GPU is assigned
print('Using device:', device, file=sys.stderr)

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-5spi3qno
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-5spi3qno
  Resolved https://github.com/openai/whisper.git to commit e58f28804528831904c3b6f2c0e473f346223433
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tiktoken (from openai-whisper==20231117)
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 23.7 MB/s eta 0:00:00
Building wheels for collected packages: openai-whisper
  Building wheel for openai-whisper (pyproject.toml) ... done
  Created wheel for openai-whisper: filename=openai_whisper-20231117-py3-none-a

Using device: cuda:0


In [None]:
#@markdown # **Optional:** Save data in Google Drive 💾
#@markdown Enter a Google Drive path and run this cell if you want to store the results inside Google Drive.

# Mount Google Drive so transcripts can be copied there; in my experience this is faster than downloading directly from Colab.
from google.colab import drive
drive_mount_path = Path("/") / "content" / "drive"
drive.mount(str(drive_mount_path))
drive_mount_path /= "My Drive"
#@markdown ---
drive_path = "Colab Notebooks/Whisper Youtube" #@param {type:"string"}
#@markdown ---
#@markdown **Run this cell again if you change your Google Drive path.**

drive_whisper_path = drive_mount_path / Path(drive_path.lstrip("/"))
drive_whisper_path.mkdir(parents=True, exist_ok=True)

Mounted at /content/drive
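The `lstrip("/")` in the cell above matters because joining an absolute path onto another path discards the left-hand side. A minimal sketch with `PurePosixPath`, so it runs without Drive being mounted; the helper name `drive_target` is illustrative:

```python
from pathlib import PurePosixPath

mount = PurePosixPath("/content/drive/My Drive")

def drive_target(mount_path: PurePosixPath, user_path: str) -> PurePosixPath:
    # Strip leading slashes so the user path is treated as relative to the mount.
    return mount_path / PurePosixPath(user_path.lstrip("/"))

# Without lstrip, an absolute right-hand side replaces the mount path entirely.
print(mount / PurePosixPath("/Colab Notebooks/Whisper Youtube"))
# With lstrip, the result stays inside the mounted Drive.
print(drive_target(mount, "/Colab Notebooks/Whisper Youtube"))
```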


In [None]:
#@markdown # **Model selection** 🧠

#@markdown As of the first public release, there are 5 pre-trained model sizes to play with, 4 of which also come in an English-only version:

#@markdown |  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
#@markdown |:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
#@markdown |  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
#@markdown |  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
#@markdown | small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
#@markdown | medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
#@markdown | large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |

#@markdown ---
Model = 'large' #@param ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large']
#@markdown ---
#@markdown **Run this cell again if you change the model.**

if Model in whisper.available_models():
    # Load the weights only once the name is known to be valid
    whisper_model = whisper.load_model(Model)
    display(Markdown(
        f"**{Model} model is selected.**"
    ))
else:
    display(Markdown(
        f"**{Model} model is no longer available.**<br /> Please select one of the following:<br /> - {'<br /> - '.join(whisper.available_models())}"
    ))

 28%|██████████▊                            | 815M/2.88G [00:04<00:12, 184MiB/s]


KeyboardInterrupt: ignored
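The "Required VRAM" column of the model table can guide the choice of model for a given GPU. A rough sketch, assuming the approximate figures from that table and ignoring memory needed by anything else in the session; `largest_model_for` is an illustrative helper, not a Whisper function:

```python
# Approximate VRAM needs in GB, copied from the model table, smallest model first.
MODEL_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]

def largest_model_for(vram_gb: float) -> str:
    """Return the largest multilingual model whose listed VRAM need fits in vram_gb."""
    fitting = [name for name, need in MODEL_VRAM_GB if need <= vram_gb]
    if not fitting:
        raise ValueError(f"No model in the table fits in {vram_gb} GB")
    return fitting[-1]

# The T4 in the nvidia-smi output reports 15360 MiB, i.e. 15 GiB.
print(largest_model_for(15))
```

Larger models are slower (see the "Relative speed" column), so the largest model that fits is not always the best choice for long videos.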

In [None]:
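def transcribeVideo():
  # Reads the notebook globals Type, URL, video_path and video_path_local_list
  # (set in the "Video selection" cell); video_path is rebound in the Drive branch.
  global video_path

  if Type == "Youtube video or playlist":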

    ydl_opts = {
        'format': 'm4a/bestaudio/best',
        'outtmpl': '%(id)s.%(ext)s',
        # ℹ️ See help(yt_dlp.postprocessor) for a list of available Postprocessors and their arguments
        'postprocessors': [{  # Extract audio using ffmpeg
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'wav',
        }]
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        error_code = ydl.download([URL])
        list_video_info = [ydl.extract_info(URL, download=False)]

    for video_info in list_video_info:
        video_path_local_list.append(Path(f"{video_info['id']}.wav"))

  elif Type == "Google Drive":
    # video_path_drive = drive_mount_path / Path(video_path.lstrip("/"))
    video_path = drive_mount_path / Path(video_path.lstrip("/"))
    if video_path.is_dir():
      for video_path_drive in video_path.glob("**/*"):
        if video_path_drive.is_file():
          display(Markdown(f"**{str(video_path_drive)} selected for transcription.**"))
        elif video_path_drive.is_dir():
          display(Markdown(f"**Subfolders not supported.**"))
          continue  # shutil.copy cannot copy a directory
        else:
          display(Markdown(f"**{str(video_path_drive)} does not exist, skipping.**"))
          continue
        video_path_local = Path(".").resolve() / (video_path_drive.name)
        shutil.copy(video_path_drive, video_path_local)
        video_path_local_list.append(video_path_local)
    elif video_path.is_file():
      video_path_local = Path(".").resolve() / (video_path.name)
      shutil.copy(video_path, video_path_local)
      video_path_local_list.append(video_path_local)
      display(Markdown(f"**{str(video_path)} selected for transcription.**"))
    else:
      display(Markdown(f"**{str(video_path)} does not exist.**"))

  else:
    raise(TypeError("Please select supported input type."))

  for video_path_local in video_path_local_list:
    if video_path_local.suffix == ".mp4":
      video_path_local = video_path_local.with_suffix(".wav")
      result  = subprocess.run(["ffmpeg", "-i", str(video_path_local.with_suffix(".mp4")), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(video_path_local)])
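The loop above reassigns its loop variable, so `video_path_local_list` still holds the `.mp4` paths after conversion. The sketch below builds the same ffmpeg command and returns an updated list instead. It only plans the commands and does not run ffmpeg; the helper names are illustrative.

```python
from pathlib import Path

def ffmpeg_wav_command(source: Path):
    """Build the ffmpeg call that extracts 16 kHz mono 16-bit PCM audio from a video."""
    target = source.with_suffix(".wav")
    command = [
        "ffmpeg", "-i", str(source),
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        str(target),
    ]
    return command, target

def plan_conversions(paths):
    """Return (commands to run, path list with each .mp4 replaced by its .wav target)."""
    commands, updated = [], []
    for path in paths:
        if path.suffix == ".mp4":
            command, path = ffmpeg_wav_command(path)
            commands.append(command)
        updated.append(path)
    return commands, updated

commands, wav_paths = plan_conversions([Path("my_video.mp4"), Path("talk.wav")])
print(wav_paths)
```

Each planned command can then be passed to `subprocess.run(command, check=True)` so that a failed conversion raises instead of passing silently.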


In [None]:
def translateLang():

  language = "English"
  verbose = 'Live transcription'
  output_format = 'all'
  task = 'translate'
  temperature = 0.15
  temperature_increment_on_fallback = 0.2
  best_of = 5
  beam_size = 8
  patience = 1.0
  length_penalty = -0.05
  suppress_tokens = "-1"
  initial_prompt = ""
  condition_on_previous_text = True
  fp16 = True
  compression_ratio_threshold = 2.4
  logprob_threshold = -1.0
  no_speech_threshold = 0.6

  verbose_lut = {
      'Live transcription': True,
      'Progress bar': False,
      'None': None
  }

  args = dict(
      language = (None if language == "Auto detection" else language),
      verbose = verbose_lut[verbose],
      task = task,
      temperature = temperature,
      temperature_increment_on_fallback = temperature_increment_on_fallback,
      best_of = best_of,
      beam_size = beam_size,
      patience=patience,
      length_penalty=(length_penalty if length_penalty>=0.0 else None),
      suppress_tokens=suppress_tokens,
      initial_prompt=(None if not initial_prompt else initial_prompt),
      condition_on_previous_text=condition_on_previous_text,
      fp16=fp16,
      compression_ratio_threshold=compression_ratio_threshold,
      logprob_threshold=logprob_threshold,
      no_speech_threshold=no_speech_threshold
  )

  temperature = args.pop("temperature")
  temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
  if temperature_increment_on_fallback is not None:
      temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
  else:
      temperature = [temperature]

  if Model.endswith(".en") and args["language"] not in {"en", "English"}:
      warnings.warn(f"{Model} is an English-only model but received '{args['language']}'; using English instead.")
      args["language"] = "en"

  for video_path_local in video_path_local_list:
      display(Markdown(f"### {video_path_local}"))

      video_transcription = whisper.transcribe(
          whisper_model,
          str(video_path_local),
          temperature=temperature,
          **args,
      )

      # Save output
      whisper.utils.get_writer(
          output_format=output_format,
          output_dir=video_path_local.parent
      )(
          video_transcription,
          str(video_path_local.stem),
          options=dict(
              highlight_words=False,
              max_line_count=None,
              max_line_width=None,
          )
      )

      def exportTranscriptFile(ext: str):
          local_path = video_path_local.parent / video_path_local.with_suffix(ext)
          export_path = drive_whisper_path / video_path_local.with_suffix(ext)
          shutil.copy(
              local_path,
              export_path
          )
          display(Markdown(f"**Transcript file created: {export_path}**"))

      if output_format=="all":
          for ext in ('.txt', '.vtt', '.srt', '.tsv', '.json'):
              exportTranscriptFile(ext)
      else:
          exportTranscriptFile("." + output_format)
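The `np.arange` call above expands the starting temperature into the sequence of temperatures Whisper tries in turn when a decode fails its quality checks. A minimal sketch of the same schedule in plain Python; the function name `temperature_schedule` is illustrative, and the rounding is added here only to keep the values tidy:

```python
def temperature_schedule(start: float, increment, stop: float = 1.0):
    """List start, start + increment, ... up to and including stop."""
    if increment is None or increment <= 0:
        return [start]  # no fallback: a single decoding temperature
    temps = []
    k = 0
    while start + k * increment <= stop + 1e-6:
        temps.append(round(start + k * increment, 2))
        k += 1
    return temps

# Values used in this notebook: temperature = 0.15, increment = 0.2
print(temperature_schedule(0.15, 0.2))
```

With the notebook's settings this gives 0.15, 0.35, 0.55, 0.75, 0.95, so a difficult segment may be decoded up to five times.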


In [5]:
!pip install streamlit
!pip install pyngrok==4.1.1

Collecting streamlit
  Downloading streamlit-1.28.2-py2.py3-none-any.whl (8.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.4/8.4 MB 60.2 MB/s eta 0:00:00
Collecting validators<1,>=0.2 (from streamlit)
  Downloading validators-0.22.0-py3-none-any.whl (26 kB)
Collecting gitpython!=3.1.19,<4,>=3.0.7 (from streamlit)
  Downloading GitPython-3.1.40-py3-none-any.whl (190 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 190.6/190.6 kB 25.5 MB/s eta 0:00:00
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.8.1b0-py2.py3-none-any.whl (4.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.8/4.8 MB 99.3 MB/s eta 0:00:00
Collecting watchdog>=2.1.5 (from streamlit)
  Downloading watchdog-3.0.0-py3-none-manylinux2014_x86_64.whl (82 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 82.1/82.1 kB 12.0 MB/s eta 0:0

In [21]:
!pip install youtube_transcript_api

Collecting youtube_transcript_api
  Downloading youtube_transcript_api-0.6.1-py3-none-any.whl (24 kB)
Installing collected packages: youtube_transcript_api
Successfully installed youtube_transcript_api-0.6.1


In [30]:
!pip install transformers



In [35]:
%%writefile app.py
import streamlit as st
import sys
import warnings
import whisper
from pathlib import Path
import yt_dlp
import subprocess
import torch
import shutil
import numpy as np
import time
import tempfile
from IPython.display import display, Markdown, YouTubeVideo
from transformers import pipeline

def Summarizer_function(transcription):
    # Hugging Face summarization pipeline (downloads a default model on first use)
    summarizer = pipeline("summarization")

    def word_length(paragraph):
      words = paragraph.split()
      return len(words)

    word_count = word_length(transcription)
    if word_count >= 1800:
      # Long transcript: allow a longer summary
      summary = summarizer(transcription, max_length = 5000, min_length=150, do_sample=False)
    else:
      # Short transcript: keep the summary brief
      summary = summarizer(transcription, max_length = 2500, min_length=50, do_sample=False)
    # Return the pipeline output for both branches
    return summary


device = torch.device('cuda:0')

Model = 'tiny' # ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large']

whisper_model = whisper.load_model(Model)

st.title("Language Alchemy")

st.header('Translate any youtube video in the language of your choice!')

def main():
  youtube_link = st.text_input("Enter YouTube Video Link")
  choice = st.selectbox('Select', ['Auto detection', 'Afrikaans', 'Albanian', 'Amharic', 'Arabic', 'Armenian', 'Assamese', 'Azerbaijani', 'Bashkir', 'Basque', 'Belarusian', 'Bengali', 'Bosnian', 'Breton', 'Bulgarian', 'Burmese', 'Castilian', 'Catalan', 'Chinese', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'Faroese', 'Finnish', 'Flemish', 'French', 'Galician', 'Georgian', 'German', 'Greek', 'Gujarati', 'Haitian', 'Haitian Creole', 'Hausa', 'Hawaiian', 'Hebrew', 'Hindi', 'Hungarian', 'Icelandic', 'Indonesian', 'Italian', 'Japanese', 'Javanese', 'Kannada', 'Kazakh', 'Khmer', 'Korean', 'Lao', 'Latin', 'Latvian', 'Letzeburgesch', 'Lingala', 'Lithuanian', 'Luxembourgish', 'Macedonian', 'Malagasy', 'Malay', 'Malayalam', 'Maltese', 'Maori', 'Marathi', 'Moldavian', 'Moldovan', 'Mongolian', 'Myanmar', 'Nepali', 'Norwegian', 'Nynorsk', 'Occitan', 'Panjabi', 'Pashto', 'Persian', 'Polish', 'Portuguese', 'Punjabi', 'Pushto', 'Romanian', 'Russian', 'Sanskrit', 'Serbian', 'Shona', 'Sindhi', 'Sinhala', 'Sinhalese', 'Slovak', 'Slovenian', 'Somali', 'Spanish', 'Sundanese', 'Swahili', 'Swedish', 'Tagalog', 'Tajik', 'Tamil', 'Tatar', 'Telugu', 'Thai', 'Tibetan', 'Turkish', 'Turkmen', 'Ukrainian', 'Urdu', 'Uzbek', 'Valencian', 'Vietnamese', 'Welsh', 'Yiddish', 'Yoruba'])
  button = st.button('Submit')

  if button and youtube_link and choice:

    Type = "Youtube video or playlist"
    URL = youtube_link
    video_path = "Colab Notebooks/transcription/my_video.mp4"

    video_path_local_list = []

    if Type == "Youtube video or playlist":

      ydl_opts = {
          'format': 'm4a/bestaudio/best',
          'outtmpl': '%(id)s.%(ext)s',
          # ℹ️ See help(yt_dlp.postprocessor) for a list of available Postprocessors and their arguments
          'postprocessors': [{  # Extract audio using ffmpeg
              'key': 'FFmpegExtractAudio',
              'preferredcodec': 'wav',
          }]
      }

      with yt_dlp.YoutubeDL(ydl_opts) as ydl:
          error_code = ydl.download([URL])
          list_video_info = [ydl.extract_info(URL, download=False)]

      for video_info in list_video_info:
          video_path_local_list.append(Path(f"{video_info['id']}.wav"))

    elif Type == "Google Drive":
      # video_path_drive = drive_mount_path / Path(video_path.lstrip("/"))
      video_path = drive_mount_path / Path(video_path.lstrip("/"))
      if video_path.is_dir():
        for video_path_drive in video_path.glob("**/*"):
          if video_path_drive.is_file():
            display(Markdown(f"**{str(video_path_drive)} selected for transcription.**"))
          elif video_path_drive.is_dir():
            display(Markdown(f"**Subfolders not supported.**"))
            continue  # shutil.copy cannot copy a directory
          else:
            display(Markdown(f"**{str(video_path_drive)} does not exist, skipping.**"))
            continue
          video_path_local = Path(".").resolve() / (video_path_drive.name)
          shutil.copy(video_path_drive, video_path_local)
          video_path_local_list.append(video_path_local)
      elif video_path.is_file():
        video_path_local = Path(".").resolve() / (video_path.name)
        shutil.copy(video_path, video_path_local)
        video_path_local_list.append(video_path_local)
        display(Markdown(f"**{str(video_path)} selected for transcription.**"))
      else:
        display(Markdown(f"**{str(video_path)} does not exist.**"))

    else:
      raise(TypeError("Please select supported input type."))

    for video_path_local in video_path_local_list:
      if video_path_local.suffix == ".mp4":
        video_path_local = video_path_local.with_suffix(".wav")
        result  = subprocess.run(["ffmpeg", "-i", str(video_path_local.with_suffix(".mp4")), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(video_path_local)])


    language = choice  # spoken language picked in the UI; 'Auto detection' lets Whisper detect it
    verbose = 'Live transcription'
    output_format = 'all'
    task = 'translate'
    temperature = 0.15
    temperature_increment_on_fallback = 0.2
    best_of = 5
    beam_size = 8
    patience = 1.0
    length_penalty = -0.05
    suppress_tokens = "-1"
    initial_prompt = ""
    condition_on_previous_text = True
    fp16 = True
    compression_ratio_threshold = 2.4
    logprob_threshold = -1.0
    no_speech_threshold = 0.6

    verbose_lut = {
        'Live transcription': True,
        'Progress bar': False,
        'None': None
    }

    args = dict(
        language = (None if language == "Auto detection" else language),
        verbose = verbose_lut[verbose],
        task = task,
        temperature = temperature,
        temperature_increment_on_fallback = temperature_increment_on_fallback,
        best_of = best_of,
        beam_size = beam_size,
        patience=patience,
        length_penalty=(length_penalty if length_penalty>=0.0 else None),
        suppress_tokens=suppress_tokens,
        initial_prompt=(None if not initial_prompt else initial_prompt),
        condition_on_previous_text=condition_on_previous_text,
        fp16=fp16,
        compression_ratio_threshold=compression_ratio_threshold,
        logprob_threshold=logprob_threshold,
        no_speech_threshold=no_speech_threshold
    )

    temperature = args.pop("temperature")
    temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
    if temperature_increment_on_fallback is not None:
        temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
    else:
        temperature = [temperature]

    if Model.endswith(".en") and args["language"] not in {"en", "English"}:
        warnings.warn(f"{Model} is an English-only model but received '{args['language']}'; using English instead.")
        args["language"] = "en"

    for video_path_local in video_path_local_list:
        display(Markdown(f"### {video_path_local}"))

        video_transcription = whisper.transcribe(
            whisper_model,
            str(video_path_local),
            temperature=temperature,
            **args,
        )

        # Save output
        whisper.utils.get_writer(
            output_format=output_format,
            output_dir=video_path_local.parent
        )(
            video_transcription,
            str(video_path_local.stem),
            options=dict(
                highlight_words=False,
                max_line_count=None,
                max_line_width=None,
            )
        )

        def exportTranscriptFile(ext: str):
            local_path = video_path_local.parent / video_path_local.with_suffix(ext)
            with tempfile.NamedTemporaryFile(delete=False) as temp_file:
                temp_path = temp_file.name
                shutil.copy(local_path, temp_path)
                shutil.move(temp_path, local_path)

        exportTranscriptFile('.txt')


    col1, col2 = st.columns([1,1])
    with col1:
      st.video(youtube_link)
    with col2:
      with st.spinner(text='In progress'):
        max_wait_time = 75  # You can adjust this value based on your needs
        start_time = time.time()

        while not Path(f"{video_info['id']}.txt").is_file():
            if time.time() - start_time > max_wait_time:
                st.warning("Timeout: Unable to load the file within the specified time.")
                break
            time.sleep(1)

        # Whisper's writer saves the transcript as "<video id>.txt" in the working directory
        video_file_path = f"{video_info['id']}.txt"

        if Path(video_file_path).is_file():
            with open(video_file_path, 'r') as video_file:
                text_content = video_file.read()
            st.text_area("Transcription:", text_content, height=225)  # Adjust height as needed

        else:
            text_content = ""  # nothing to summarize
            st.warning(f"File {video_file_path} not found.")
    st.subheader('Summary:')
    with st.spinner(text="In progress..."):
      if text_content:
        summary = Summarizer_function(text_content)[0]["summary_text"]
        st.write(summary)
      time.sleep(2)

if __name__ == "__main__":
  main()




Overwriting app.py
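`Summarizer_function` in `app.py` passes the whole transcript to the pipeline in a single call, and summarization models typically accept only a limited input length. One way around this is to summarize the transcript in chunks. The sketch below assumes a chunk size of 400 words, which is an arbitrary placeholder, and uses a stand-in function in place of the real pipeline so it runs without downloading a model; all helper names are illustrative.

```python
def chunk_words(text: str, max_words: int = 400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_long(text: str, summarize_chunk, max_words: int = 400) -> str:
    """Summarize each chunk with the given callable and join the partial summaries."""
    return " ".join(summarize_chunk(chunk) for chunk in chunk_words(text, max_words))

# Stand-in for the real pipeline call: keeps only the first word of each chunk.
def first_word(chunk: str) -> str:
    return chunk.split()[0]

print(summarize_long("a b c d e", first_word, max_words=2))
```

In the app, the stand-in could be replaced by a callable that wraps the pipeline, for example one returning `summarizer(chunk, do_sample=False)[0]["summary_text"]`, with `max_length` and `min_length` chosen to suit the chunk size.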


In [None]:
! streamlit run app.py & npx localtunnel --port 8501

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

  You can now view your Streamlit app in your browser.

  Network URL: http://172.28.0.12:8501
  External URL: http://34.124.244.140:8501

npx: installed 22 in 3.199s
your url is: https://tall-papayas-eat.loca.lt
  Stopping...
^C


In [6]:
from pyngrok import ngrok
ngrok.set_auth_token('YOUR_NGROK_AUTH_TOKEN')  # placeholder: use your own token and keep it out of shared notebooks




In [7]:
# Terminate open tunnels if exist
ngrok.kill()

# Set up a new tunnel
site = ngrok.connect(port=8501)

# Run the Streamlit app as a background process
!nohup streamlit run app.py &

# Print the public URL to access the Streamlit app
print('Streamlit URL:', site)

nohup: appending output to 'nohup.out'
Streamlit URL: http://ac1a-35-185-180-67.ngrok-free.app


AttributeError: ignored
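Auth tokens are safer kept out of the notebook source. A minimal sketch, assuming the token is stored in an environment variable named `NGROK_AUTH_TOKEN`; both that name and the helper `read_ngrok_token` are hypothetical choices, not something pyngrok requires:

```python
import os

def read_ngrok_token(env=os.environ, name: str = "NGROK_AUTH_TOKEN") -> str:
    """Fetch the tunnel token from the environment and fail early if it is missing."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable before opening a tunnel.")
    return token

# Demonstration with an explicit mapping instead of the real environment.
print(read_ngrok_token({"NGROK_AUTH_TOKEN": " example-token "}))
```

The result can then be passed to `ngrok.set_auth_token(read_ngrok_token())` in place of a literal string.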

In [None]:
!streamlit run app.py --server.port 8888



Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

  You can now view your Streamlit app in your browser.

  Network URL: http://172.28.0.12:8888
  External URL: http://34.124.244.140:8888

  Stopping...
  Stopping...


In [None]:
#@markdown # **Video selection** 📺

#@markdown Enter the URL of the YouTube video you want to transcribe, choose whether to save the audio file in your Google Drive, and run the cell.

Type = "Youtube video or playlist" #@param ['Youtube video or playlist', 'Google Drive']
#@markdown ---
#@markdown #### **Youtube video or playlist**
URL = "https://www.youtube.com/watch?v=tWCaFVJMUi8&ab_channel=GateSmashers" #@param {type:"string"}
# store_audio = True #@param {type:"boolean"}
#@markdown ---
#@markdown #### **Google Drive video, audio (mp4, wav), or folder containing video and/or audio files**
video_path = "Colab Notebooks/transcription/my_video.mp4" #@param {type:"string"}
#@markdown ---
#@markdown **Run this cell again if you change the video.**

video_path_local_list = []

if Type == "Youtube video or playlist":

    ydl_opts = {
        'format': 'm4a/bestaudio/best',
        'outtmpl': '%(id)s.%(ext)s',
        # ℹ️ See help(yt_dlp.postprocessor) for a list of available Postprocessors and their arguments
        'postprocessors': [{  # Extract audio using ffmpeg
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'wav',
        }]
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        error_code = ydl.download([URL])
        list_video_info = [ydl.extract_info(URL, download=False)]

    for video_info in list_video_info:
        video_path_local_list.append(Path(f"{video_info['id']}.wav"))

elif Type == "Google Drive":
    # video_path_drive = drive_mount_path / Path(video_path.lstrip("/"))
    video_path = drive_mount_path / Path(video_path.lstrip("/"))
    if video_path.is_dir():
        for video_path_drive in video_path.glob("**/*"):
            if video_path_drive.is_file():
                display(Markdown(f"**{str(video_path_drive)} selected for transcription.**"))
            elif video_path_drive.is_dir():
                display(Markdown(f"**Subfolders not supported.**"))
                continue  # shutil.copy cannot copy a directory
            else:
                display(Markdown(f"**{str(video_path_drive)} does not exist, skipping.**"))
                continue
            video_path_local = Path(".").resolve() / (video_path_drive.name)
            shutil.copy(video_path_drive, video_path_local)
            video_path_local_list.append(video_path_local)
    elif video_path.is_file():
        video_path_local = Path(".").resolve() / (video_path.name)
        shutil.copy(video_path, video_path_local)
        video_path_local_list.append(video_path_local)
        display(Markdown(f"**{str(video_path)} selected for transcription.**"))
    else:
        display(Markdown(f"**{str(video_path)} does not exist.**"))

else:
    raise(TypeError("Please select supported input type."))

for video_path_local in video_path_local_list:
    if video_path_local.suffix == ".mp4":
        video_path_local = video_path_local.with_suffix(".wav")
        result  = subprocess.run(["ffmpeg", "-i", str(video_path_local.with_suffix(".mp4")), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(video_path_local)])


[youtube] Extracting URL: https://www.youtube.com/watch?v=tWCaFVJMUi8&ab_channel=GateSmashers
[youtube] tWCaFVJMUi8: Downloading webpage
[youtube] tWCaFVJMUi8: Downloading ios player API JSON
[youtube] tWCaFVJMUi8: Downloading android player API JSON
[youtube] tWCaFVJMUi8: Downloading m3u8 information
[info] tWCaFVJMUi8: Downloading 1 format(s): 140
[download] Destination: tWCaFVJMUi8.m4a
[download] 100% of   12.44MiB in 00:00:00 at 33.84MiB/s  
[FixupM4a] Correcting container of "tWCaFVJMUi8.m4a"
[ExtractAudio] Destination: tWCaFVJMUi8.wav
Deleting original file tWCaFVJMUi8.m4a (pass -k to keep)
[youtube] Extracting URL: https://www.youtube.com/watch?v=tWCaFVJMUi8&ab_channel=GateSmashers
[youtube] tWCaFVJMUi8: Downloading webpage
[youtube] tWCaFVJMUi8: Downloading ios player API JSON
[youtube] tWCaFVJMUi8: Downloading android player API JSON
[youtube] tWCaFVJMUi8: Downloading m3u8 information
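The cell above offers "Youtube video or playlist" but builds a single path from the top-level `id`. For a playlist, that `id` is the playlist's, not a video's. A sketch of collecting one `.wav` path per video, assuming the info dictionary for a playlist carries its videos under an `entries` key as yt-dlp does, and that files are named by the `%(id)s.%(ext)s` template used above; `wav_paths_from_info` is an illustrative helper:

```python
from pathlib import Path

def wav_paths_from_info(info: dict):
    """Return the expected .wav path for a single video or for each playlist entry."""
    entries = info.get("entries")
    # Unavailable playlist items can appear as None, so skip them.
    videos = [info] if entries is None else [entry for entry in entries if entry]
    return [Path(f"{video['id']}.wav") for video in videos]

single = {"id": "tWCaFVJMUi8"}
playlist = {"id": "PL-example", "entries": [{"id": "aaa"}, None, {"id": "bbb"}]}
print(wav_paths_from_info(single))
print(wav_paths_from_info(playlist))
```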


In [None]:
#@markdown # **Run the model** 🚀

#@markdown Run this cell to transcribe the video. This can take a while, and the time varies with the length of the video and the number of parameters of the model selected above.

#@markdown ## **Parameters** ⚙️

#@markdown ### **Behavior control**
#@markdown ---
language = "English" #@param ['Auto detection', 'Afrikaans', 'Albanian', 'Amharic', 'Arabic', 'Armenian', 'Assamese', 'Azerbaijani', 'Bashkir', 'Basque', 'Belarusian', 'Bengali', 'Bosnian', 'Breton', 'Bulgarian', 'Burmese', 'Castilian', 'Catalan', 'Chinese', 'Croatian', 'Czech', 'Danish', 'Dutch', 'English', 'Estonian', 'Faroese', 'Finnish', 'Flemish', 'French', 'Galician', 'Georgian', 'German', 'Greek', 'Gujarati', 'Haitian', 'Haitian Creole', 'Hausa', 'Hawaiian', 'Hebrew', 'Hindi', 'Hungarian', 'Icelandic', 'Indonesian', 'Italian', 'Japanese', 'Javanese', 'Kannada', 'Kazakh', 'Khmer', 'Korean', 'Lao', 'Latin', 'Latvian', 'Letzeburgesch', 'Lingala', 'Lithuanian', 'Luxembourgish', 'Macedonian', 'Malagasy', 'Malay', 'Malayalam', 'Maltese', 'Maori', 'Marathi', 'Moldavian', 'Moldovan', 'Mongolian', 'Myanmar', 'Nepali', 'Norwegian', 'Nynorsk', 'Occitan', 'Panjabi', 'Pashto', 'Persian', 'Polish', 'Portuguese', 'Punjabi', 'Pushto', 'Romanian', 'Russian', 'Sanskrit', 'Serbian', 'Shona', 'Sindhi', 'Sinhala', 'Sinhalese', 'Slovak', 'Slovenian', 'Somali', 'Spanish', 'Sundanese', 'Swahili', 'Swedish', 'Tagalog', 'Tajik', 'Tamil', 'Tatar', 'Telugu', 'Thai', 'Tibetan', 'Turkish', 'Turkmen', 'Ukrainian', 'Urdu', 'Uzbek', 'Valencian', 'Vietnamese', 'Welsh', 'Yiddish', 'Yoruba']
#@markdown > Language spoken in the audio, use `Auto detection` to let Whisper detect the language.
#@markdown ---
verbose = 'Live transcription' #@param ['Live transcription', 'Progress bar', 'None']
#@markdown > Whether to print out the progress and debug messages.
#@markdown ---
output_format = 'all' #@param ['txt', 'vtt', 'srt', 'tsv', 'json', 'all']
#@markdown > Type of file to generate to record the transcription.
#@markdown ---
task = 'translate' #@param ['transcribe', 'translate']
#@markdown > Whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`).
#@markdown ---

#@markdown <br/>

#@markdown ### **Optional: Fine-tuning**
#@markdown ---
temperature = 0.15 #@param {type:"slider", min:0, max:1, step:0.05}
#@markdown > Temperature to use for sampling.
#@markdown ---
temperature_increment_on_fallback = 0.2 #@param {type:"slider", min:0, max:1, step:0.05}
#@markdown > Temperature to increase when falling back when the decoding fails to meet either of the thresholds below.
#@markdown ---
best_of = 5 #@param {type:"integer"}
#@markdown > Number of candidates when sampling with non-zero temperature.
#@markdown ---
beam_size = 8 #@param {type:"integer"}
#@markdown > Number of beams in beam search, only applicable when temperature is zero.
#@markdown ---
patience = 1.0 #@param {type:"number"}
#@markdown > Optional patience value to use in beam decoding, as in [*Beam Decoding with Controlled Patience*](https://arxiv.org/abs/2204.05424), the default (1.0) is equivalent to conventional beam search.
#@markdown ---
length_penalty = -0.05 #@param {type:"slider", min:-0.05, max:1, step:0.05}
#@markdown > Optional token length penalty coefficient (alpha) as in [*Google's Neural Machine Translation System*](https://arxiv.org/abs/1609.08144); set to a negative value to use simple length normalization.
#@markdown ---
suppress_tokens = "-1" #@param {type:"string"}
#@markdown > Comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuation.
#@markdown ---
initial_prompt = "" #@param {type:"string"}
#@markdown > Optional text to provide as a prompt for the first window.
#@markdown ---
condition_on_previous_text = True #@param {type:"boolean"}
#@markdown > If True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
#@markdown ---
fp16 = True #@param {type:"boolean"}
#@markdown > Whether to perform inference in fp16.
#@markdown ---
compression_ratio_threshold = 2.4 #@param {type:"number"}
#@markdown > If the gzip compression ratio is higher than this value, treat the decoding as failed.
#@markdown ---
logprob_threshold = -1.0 #@param {type:"number"}
#@markdown > If the average log probability is lower than this value, treat the decoding as failed.
#@markdown ---
no_speech_threshold = 0.6 #@param {type:"slider", min:-0.0, max:1, step:0.05}
#@markdown > If the probability of the `<|nospeech|>` token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence.
#@markdown ---

verbose_lut = {
    'Live transcription': True,
    'Progress bar': False,
    'None': None
}

args = dict(
    language = (None if language == "Auto detection" else language),
    verbose = verbose_lut[verbose],
    task = task,
    temperature = temperature,
    temperature_increment_on_fallback = temperature_increment_on_fallback,
    best_of = best_of,
    beam_size = beam_size,
    patience=patience,
    length_penalty=(length_penalty if length_penalty>=0.0 else None),
    suppress_tokens=suppress_tokens,
    initial_prompt=(None if not initial_prompt else initial_prompt),
    condition_on_previous_text=condition_on_previous_text,
    fp16=fp16,
    compression_ratio_threshold=compression_ratio_threshold,
    logprob_threshold=logprob_threshold,
    no_speech_threshold=no_speech_threshold
)

temperature = args.pop("temperature")
temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
# The slider allows 0, and a zero step makes np.arange fail, so treat 0 as "no fallback".
if temperature_increment_on_fallback:
    temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
else:
    temperature = [temperature]

if Model.endswith(".en") and args["language"] not in {"en", "English"}:
    warnings.warn(f"{Model} is an English-only model but received '{args['language']}'; using English instead.")
    args["language"] = "en"

for video_path_local in video_path_local_list:
    display(Markdown(f"### {video_path_local}"))

    video_transcription = whisper.transcribe(
        whisper_model,
        str(video_path_local),
        temperature=temperature,
        **args,
    )

    # Save output
    whisper.utils.get_writer(
        output_format=output_format,
        output_dir=video_path_local.parent
    )(
        video_transcription,
        str(video_path_local.stem),
        options=dict(
            highlight_words=False,
            max_line_count=None,
            max_line_width=None,
        )
    )

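    def exportTranscriptFile(ext: str):
        # with_suffix() already keeps the parent directory, so no extra join is needed.
        local_path = video_path_local.with_suffix(ext)
        # Use only the file name so the copy lands directly in the Drive folder.
        export_path = drive_whisper_path / local_path.name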
        shutil.copy(
            local_path,
            export_path
        )
        display(Markdown(f"**Transcript file created: {export_path}**"))

    if output_format=="all":
        for ext in ('.txt', '.vtt', '.srt', '.tsv', '.json'):
            exportTranscriptFile(ext)
    else:
        exportTranscriptFile("." + output_format)


### tWCaFVJMUi8.wav

[00:00.000 --> 00:02.000]  Hello friends, welcome to Gate Smashers
[00:02.000 --> 00:05.000]  In this video we are going to discuss about Quick Sort
[00:05.000 --> 00:09.000]  And in this video we are going to discuss all the important points related to Quick Sort
[00:09.000 --> 00:14.000]  Which are very important for your competitive exams or for your college or university level exams
[00:14.000 --> 00:17.000]  And even for your placements
[00:17.000 --> 00:22.000]  So guys, quickly like the video and subscribe to the channel if you haven't done it yet
[00:22.000 --> 00:26.000]  And please press the bell button so that you don't miss all the latest notifications
[00:26.000 --> 00:28.000]  So let's start with Quick Sort
[00:28.000 --> 00:33.000]  The first important point is that it is a divide and conquer technology or divide and conquer method
[00:33.000 --> 00:35.000]  Now what is divide and conquer?
[00:35.000 --> 00:40.000]  Let's say I have a problem and the size of the problem 

**Transcript file created: /content/drive/My Drive/Colab Notebooks/Whisper Youtube/tWCaFVJMUi8.txt**

**Transcript file created: /content/drive/My Drive/Colab Notebooks/Whisper Youtube/tWCaFVJMUi8.vtt**

**Transcript file created: /content/drive/My Drive/Colab Notebooks/Whisper Youtube/tWCaFVJMUi8.srt**

**Transcript file created: /content/drive/My Drive/Colab Notebooks/Whisper Youtube/tWCaFVJMUi8.tsv**

**Transcript file created: /content/drive/My Drive/Colab Notebooks/Whisper Youtube/tWCaFVJMUi8.json**
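The cell above turns `temperature` and `temperature_increment_on_fallback` into a tuple of temperatures. Whisper's `transcribe` tries them in order and moves to the next one only when a decoded segment fails the compression-ratio or log-probability check. The sketch below reproduces that schedule in plain Python so it can be inspected without loading a model. The helper name `temperature_schedule` is hypothetical and not part of Whisper or this notebook, and the loop is a stand-in for the `np.arange` call used above.

```python
def temperature_schedule(start, increment, stop=1.0):
    """Return the temperatures tried in order, from `start` up to `stop`.

    Hypothetical helper: a pure-Python stand-in for
    np.arange(start, stop + 1e-6, increment) as used in the cell above.
    """
    if not increment:
        # A zero (or missing) increment means no fallback: one temperature only.
        return (start,)
    schedule = []
    step = 0
    # The small epsilon keeps `stop` itself in the schedule despite float rounding.
    while start + step * increment < stop + 1e-6:
        # Round away float noise such as 0.35000000000000003.
        schedule.append(round(start + step * increment, 6))
        step += 1
    return tuple(schedule)


# Values selected in the form above: temperature=0.15, increment=0.2.
print(temperature_schedule(0.15, 0.2))
```

With the values selected in this run, decoding starts at 0.15 and can fall back up to four times. Because every temperature in that schedule is non-zero, each attempt samples `best_of` candidates, and `beam_size` and `patience` only come into play when the starting temperature is 0.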