<a href="https://colab.research.google.com/github/CapitanMurloc/srt/blob/develop/Youtube_SRT_with_Whisper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🦄 **YouTube SRT Whisper** - *Notebook created by [Francisco Javier Estrella Rodriguez aka CapitanMurloc](https://www.youtube.com/channel/UCfDAxbIFtpUsxpA_eUnZQ_w)*

**Work in progress**
- This is the first version.
- Planned: generate audio from the translated subtitles.


---


**Before running this Colab, make sure the runtime is using a GPU. I tried it on a CPU and it was too slow to finish the process.**


---


You can reach me by email at jestrella@teides.com. If you have any suggestions, I am happy to hear them.

My GitHub [here](https://github.com/CapitanMurloc)

My Twitter [here](https://twitter.com/CapitanMurloc)

My LinkedIn [here](https://www.linkedin.com/in/fco-javier-estrella-rodriguez/)

My YouTube channel [here](https://www.youtube.com/channel/UCfDAxbIFtpUsxpA_eUnZQ_w)

If you subscribe to or follow any of these channels, I would appreciate it. I promise to upload quality material.

# ⛩ What is this?

In short, this Colab generates subtitles for YouTube videos (the only source supported for now) and translates them into another language.

**But doesn't YouTube already do this?**

Yes, but this Colab uses OpenAI Whisper, which in my opinion is the best ASR (Automatic Speech Recognition) system available, or at least the best one I have had access to.


## Initial setup

This section downloads the video and audio from a YouTube link.

In [1]:
#@title 1.- Install the required libs
#@markdown Brief description of the libraries in use

#@markdown ---
#@markdown [Whisper](https://github.com/openai/whisper) - The model that transcribes the audio.

#@markdown [PyTube](https://github.com/pytube/pytube) - Downloads videos from YouTube.

#@markdown [Transformers](https://huggingface.co/docs/transformers/index) - Needed to load and run the translation model.

#@markdown [ffmpeg](https://ffmpeg.org) - A complete, cross-platform solution to record, convert and stream audio and video.
%%capture
!pip install git+https://github.com/openai/whisper.git pytube transformers[sentencepiece] sacremoses
!apt-get install -y ffmpeg

In [2]:
#@title 2.- Settings for subtitles
#@markdown `Youtube` is the URL of the video to subtitle.
Youtube = "https://www.youtube.com/watch?v=MUqNwgPjJvQ" #@param {type:"string"}
#@markdown `Source` is the spoken language of the YouTube video.
Source = "English" #@param ["English", "Spanish"]
#@markdown `Destiny` is the target language for the translated subtitles.
Destiny = "Spanish" #@param ["English", "Spanish"]

language_source = "en" if Source == "English" else "es"
language_destiny = "es" if Destiny == "Spanish" else "en"
youtube_url = Youtube
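
The two conditional expressions above work for exactly two languages. As a minimal sketch of how the mapping could be made explicit, the snippet below uses a lookup table and rejects identical source and target languages. `LANGUAGE_CODES`, `language_code` and `language_pair` are hypothetical names that do not exist in the notebook.

```python
# Hypothetical lookup table; only the two languages offered in the form are listed.
LANGUAGE_CODES = {"English": "en", "Spanish": "es"}


def language_code(name):
    """Return the two-letter code for a language name, or raise a clear error."""
    try:
        return LANGUAGE_CODES[name]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None


def language_pair(source, destiny):
    """Return (source_code, destiny_code), refusing a translation into the same language."""
    if source == destiny:
        raise ValueError("Source and Destiny must be different languages")
    return language_code(source), language_code(destiny)


language_source, language_destiny = language_pair("English", "Spanish")
print(language_source, language_destiny)
```

Adding a language then only means adding one entry to the table, provided a matching translation model exists for the pair.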

In [3]:
#@title 3.- ⚙️ Help in choosing the right video and audio
#@markdown This section lets you choose the video and audio quality.

#@markdown 1080p video and 160kbps audio are recommended.

#@markdown **This step can fail if the YouTube video is restricted in a way that prevents its stream information from being extracted.**
import re

def extract_stream_movies(streams):
  extracts = []
  for stream in streams:
    # Extract the stream type and the progressive flag (raw strings avoid invalid-escape warnings)
    stream_type = re.search(r'type="(\w+)"', stream).group(1)
    progressive = re.search(r'progressive="(\w+)"', stream).group(1) == 'True'
    # Keep only adaptive (non-progressive) video streams
    if stream_type == 'video' and not progressive:
      # Extract the itag
      tag = re.search(r'itag="(\d+)"', stream).group(1)
      # Extract the resolution
      res = re.search(r'res="(\d+p)"', stream).group(1)
      # Append the itag and resolution to the list with separator '-'
      extracts.append(tag + '-' + res)
  return extracts

def extract_stream_audios(streams):
  extracts = []
  for stream in streams:
    # Extract the stream type and the progressive flag (raw strings avoid invalid-escape warnings)
    stream_type = re.search(r'type="(\w+)"', stream).group(1)
    progressive = re.search(r'progressive="(\w+)"', stream).group(1) == 'True'
    # Keep only adaptive (non-progressive) audio streams
    if stream_type == 'audio' and not progressive:
      # Extract the itag
      tag = re.search(r'itag="(\d+)"', stream).group(1)
      # Extract the audio bitrate
      abr = re.search(r'abr="(\d+kbps)"', stream).group(1)
      # Append the itag and bitrate to the list with separator '-'
      extracts.append(tag + '-' + abr)
  return extracts

import ipywidgets as widgets
streams = !pytube {youtube_url} --list
del streams[0]
movies = extract_stream_movies(streams)
audios = extract_stream_audios(streams)
movie = widgets.Dropdown(options=movies)
audio = widgets.Dropdown(options=audios)
widgets.Box([widgets.VBox([widgets.Label(value='Select a resolution:'), movie]),widgets.VBox([widgets.Label(value='Select an audio:'), audio])])

Box(children=(VBox(children=(Label(value='Select a resolution:'), Dropdown(options=('299-1080p', '298-720p', '…
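
The two extraction functions call `.group(1)` directly, so any listing line that lacks one of the expected attributes raises an `AttributeError`. A more tolerant variant could be sketched as below. It assumes each listing line carries `key="value"` pairs in the shape the regular expressions above expect. The sample lines are illustrative rather than taken from a real video, and `parse_stream` and `adaptive_options` are hypothetical helpers.

```python
import re

# Illustrative lines in the shape the regular expressions above expect;
# the itags and codecs are examples only.
SAMPLE = [
    '<Stream: itag="299" mime_type="video/mp4" res="1080p" fps="60fps" '
    'vcodec="avc1.64002a" progressive="False" type="video">',
    '<Stream: itag="251" mime_type="audio/webm" abr="160kbps" '
    'acodec="opus" progressive="False" type="audio">',
    'this line is not a stream description',
]

ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_stream(line):
    """Return the key="value" pairs of one listing line as a dict."""
    return dict(ATTRIBUTE.findall(line))


def adaptive_options(lines, stream_type, quality_key):
    """Build 'itag-quality' labels for non-progressive streams of one type."""
    options = []
    for line in lines:
        info = parse_stream(line)
        if info.get("type") != stream_type or info.get("progressive") != "False":
            continue  # wrong type, progressive stream, or not a stream line
        if "itag" in info and quality_key in info:
            options.append(f'{info["itag"]}-{info[quality_key]}')
    return options


print(adaptive_options(SAMPLE, "video", "res"))  # ['299-1080p']
print(adaptive_options(SAMPLE, "audio", "abr"))  # ['251-160kbps']
```

Lines that do not describe a stream are skipped instead of stopping the cell.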

In [4]:
#@title 4.- 🎥 Download movie and audio
#@markdown `Movie` is the name of the video to be saved.
Movie = "movie.mp4" #@param {type:"string"}
#@markdown `Audio` is the name of the audio to be saved.
Audio = "audio.webm" #@param {type:"string"}
import pytube
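def download(url, tag, file_name):
    try:
        yt = pytube.YouTube(url)
        stream = yt.streams.get_by_itag(tag)
        stream.download(filename=file_name)
        return True
    except Exception as error:
        # Report the cause instead of silently hiding it
        print("Download failed:", error)
        return False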
if download(youtube_url, movie.value.split('-')[0], Movie):
  print("Movie downloaded successfully")
else:
  print("Movie download failed. Go back to step 3 (Help in choosing the right video and audio) and pick another resolution from the list.")
if download(youtube_url, audio.value.split('-')[0], Audio):
  print("Audio downloaded successfully")
else:
  print("Audio download failed. Go back to step 3 (Help in choosing the right video and audio) and pick another bitrate from the list.")

download_movie_path = "/content/" + Movie
download_audio_path = "/content/" + Audio

Movie downloaded successfully
Audio downloaded successfully
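
The download step takes the itag from the text before the `-` in each dropdown value. If you would rather not pick by hand, a small helper could choose the highest-quality entry from those `itag-quality` labels. This is only a sketch: `best_option` is a hypothetical function and the option lists are illustrative.

```python
import re


def best_option(options):
    """Return the 'itag-quality' label with the largest numeric quality."""
    def quality(label):
        # '299-1080p' -> 1080, '251-160kbps' -> 160
        match = re.search(r"-(\d+)", label)
        return int(match.group(1)) if match else -1
    return max(options, key=quality)


# Illustrative dropdown contents
videos = ["298-720p", "299-1080p", "135-480p"]
audios = ["249-50kbps", "251-160kbps", "250-70kbps"]

video_tag = best_option(videos).split("-")[0]
audio_tag = best_option(audios).split("-")[0]
print(video_tag, audio_tag)
```

The resulting tags could then be passed to `download` in place of `movie.value.split('-')[0]` and `audio.value.split('-')[0]`.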


## Now we are going to generate the subtitles

At the end of this section you will have a video file with two subtitle tracks: one in the original language and one in the language you chose for the translation.

In [5]:
#@title 1.- Settings for outputs
#@markdown `Output` is the output file name, without extension.
Output = "output" #@param {type:"string"}
#@markdown `Model` is the Whisper model to use.
Model = "large" #@param ['base', 'small', 'medium', 'large']
output_mkv = Output + ".mkv"
output_mp4 = Output + ".mp4"
original_srt_path = "/content/audio_transcription/" + output_mp4 + ".srt"
destiny_srt_path = "/content/audio_transcription/" + Output + "." + language_destiny + ".srt"
complete_srt_path = "/content/audio_transcription/" + output_mp4 + ".txt"
output_mp4_path = "/content/" + output_mp4
output_mkv_path = "/content/" + output_mkv
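
The paths above are built by string concatenation. As an alternative sketch, the same values could be produced in one place with `pathlib`. `build_paths` is a hypothetical helper, and `PurePosixPath` is used here so the result always uses `/` separators, as Colab does.

```python
from pathlib import PurePosixPath


def build_paths(output, language_destiny, root="/content"):
    """Return the notebook's output paths as strings, keyed by variable name."""
    root = PurePosixPath(root)
    transcription = root / "audio_transcription"
    output_mp4 = output + ".mp4"
    return {
        "original_srt_path": str(transcription / (output_mp4 + ".srt")),
        "destiny_srt_path": str(transcription / f"{output}.{language_destiny}.srt"),
        "complete_srt_path": str(transcription / (output_mp4 + ".txt")),
        "output_mp4_path": str(root / output_mp4),
        "output_mkv_path": str(root / (output + ".mkv")),
    }


for name, path in build_paths("output", "es").items():
    print(name, "=", path)
```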

In [6]:
#@title 2.- Merge the movie and audio from the YouTube video (required for the later steps)
#@markdown After the previous step, the video and audio are in separate files. This step combines them into a single MP4 file.
%%capture
!ffmpeg -i {download_movie_path} -i {download_audio_path} -c:v copy -c:a aac {output_mp4}
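
The shell line above breaks if a file name contains spaces, and ffmpeg asks for confirmation when the output file already exists. One possible way around both is to build the command as a list of arguments. In this sketch `merge_command` is a hypothetical helper, and the `-y` flag (ffmpeg's overwrite-without-asking option) is an addition that the original command does not use.

```python
import shlex


def merge_command(video_path, audio_path, output_path, overwrite=True):
    """Build the ffmpeg argument list that merges one video and one audio file."""
    command = ["ffmpeg"]
    if overwrite:
        command.append("-y")  # assumption: overwriting an old output is acceptable
    command += [
        "-i", video_path,
        "-i", audio_path,
        "-c:v", "copy",  # keep the video stream as downloaded
        "-c:a", "aac",   # re-encode the audio to AAC, as in the cell above
        output_path,
    ]
    return command


command = merge_command("/content/movie.mp4", "/content/my audio.webm", "output.mp4")
print(shlex.join(command))  # shows the equivalent, correctly quoted shell line
```

In the notebook the list could be executed with `subprocess.run(command, check=True)`, which raises an error if ffmpeg fails instead of continuing silently.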

In [7]:
#@title 3.- Transcribe the YouTube video. Be patient: this may take a while, depending on the length of the video
#@markdown This step calls the Whisper model with the `transcribe` task. The `large` model is the default because we are looking for high accuracy, but you can choose another one in the settings above.
!whisper {output_mp4_path} --task transcribe --language {language_source} --model {Model} --output_dir audio_transcription

2022-11-17 15:32:21.408483: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
100%|██████████████████████████████████████| 2.87G/2.87G [00:20<00:00, 153MiB/s]
tcmalloc: large alloc 3087007744 bytes == 0xa222000 @  0x7f7fb4e7a1e7 0x4b2590 0x5ad01c 0x5dcfef 0x58f92b 0x590c33 0x5e48ac 0x4d20fa 0x51041f 0x58fd37 0x50c4fc 0x5b4ee6 0x58ff2e 0x50d482 0x58fd37 0x50c4fc 0x5b4ee6 0x6005a3 0x607796 0x60785c 0x60a436 0x64db82 0x64dd2e 0x7f7fb4a77c87 0x5b636a
[00:00.000 --> 00:10.480]  In this video, we'll study the encoder architecture, an example of a popular encoder-only architecture
[00:10.480 --> 00:14.680]  as BERT, which is the most popular model of its kind.
[00:14.680 --> 00:18.480]  Let's first start by understanding how it works.
[00:18.480 --> 00:20.960]  We'll use a small e
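
Each console line above pairs a time range with the text spoken in it, and an SRT file stores the same information as numbered blocks with `HH:MM:SS,mmm` timestamps. The sketch below illustrates that relationship by converting console-style lines into SRT text. It is not how Whisper writes its own `.srt` file, and `to_srt_time` and `console_to_srt` are hypothetical helpers.

```python
import re

# Matches lines such as "[00:00.000 --> 00:10.480]  some text"
LINE = re.compile(r"\[([\d:.]+) --> ([\d:.]+)\]\s*(.*)")


def to_srt_time(stamp):
    """Convert 'MM:SS.mmm' or 'HH:MM:SS.mmm' to the SRT form 'HH:MM:SS,mmm'."""
    parts = stamp.split(":")
    if len(parts) == 2:  # no hours field was printed
        parts.insert(0, "00")
    hours, minutes, seconds = parts
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds.replace('.', ',')}"


def console_to_srt(lines):
    """Turn console transcript lines into SRT text, ignoring unrelated log lines."""
    entries = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # e.g. download progress or warnings
        start, end, text = match.groups()
        index = len(entries) + 1
        entries.append(
            f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text.strip()}\n"
        )
    return "\n".join(entries)


console = [
    "[00:10.480 --> 00:14.680]  as BERT, which is the most popular model of its kind.",
    "[00:14.680 --> 00:18.480]  Let's first start by understanding how it works.",
]
print(console_to_srt(console))
```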

In [8]:
#@title 4.- Support functions
#@markdown Translation uses 🤗 Transformers with the [Helsinki-NLP/opus-mt-XX-to-XX](https://huggingface.co/Helsinki-NLP/opus-mt-en-es) models.
#@markdown ---
#@markdown Thanks to 🤗 [HuggingFace](https://huggingface.co)
from transformers import pipeline
import os
import shutil

def translate(lines):
  translates = []
  task = "translation_" + language_source + "_to_" + language_destiny
  target = "Helsinki-NLP/opus-mt-" + language_source + "-" + language_destiny
  translator = pipeline(task, model=target)
  for line in lines:
    translation = translator(line)
    translates.append(translation[0]["translation_text"])
  return translates

def open_file(path):
    # Read the whole file and return its lines without the trailing newline characters.
    # The `with` block closes the file automatically.
    with open(path, "r") as file:
        return file.read().splitlines()

# Copy a file to another path under a new name
def copy_srt(src, dst):
    if os.path.isfile(src):
      shutil.copy(src, dst)
      print("File copied successfully.")

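# Searches a text file for lines containing `search` and replaces each one with `replace`
def replace_line(filename, search, replace):
    with open(filename, 'r') as f:
        lines = f.readlines()
    with open(filename, 'w') as f:
        for line in lines:
            # Skip empty search strings: '' is contained in every line
            if search and search in line:
                # Keep the line break so the next srt line stays on its own line
                line = replace + '\n'
            f.write(line)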

def replace_srt(source, filename):
  # `source` is the plain-text transcript; `filename` is the srt file translated in place
  searches = open_file(source)
  replaces = translate(searches)
  for search, replace in zip(searches, replaces):
    replace_line(filename, search, replace)
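
`replace_line` finds the lines to translate by substring search, so a short sentence can also match inside a longer, unrelated line. An alternative sketch is to walk the SRT structure and translate only the text lines of each block. `translate_srt` is a hypothetical function, and `translate_line` stands in for whatever performs the translation (for example a wrapper around the pipeline used in `translate`). `str.upper` is only a placeholder so the example runs without downloading a model.

```python
def translate_srt(srt_text, translate_line):
    """Translate only the text lines of an SRT document.

    `translate_line` is any callable str -> str. Index and timestamp lines
    are copied through unchanged.
    """
    output = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.split("\n")
        # A regular block is: index, "start --> end", then one or more text lines
        if len(lines) >= 3 and "-->" in lines[1]:
            header, body = lines[:2], lines[2:]
            lines = header + [translate_line(text) for text in body]
        output.append("\n".join(lines))
    return "\n\n".join(output) + "\n"


sample = (
    "1\n00:00:00,000 --> 00:00:10,480\nHello\n\n"
    "2\n00:00:10,480 --> 00:00:14,680\nSee you\n"
)
print(translate_srt(sample, str.upper))
```

Because the blank line between blocks is rebuilt explicitly, the output stays a valid SRT file whatever the translator returns.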

In [9]:
#@title 5.- Create a new srt for the target language from the original srt
copy_srt(original_srt_path, destiny_srt_path)

File copied successfully.


In [10]:
#@title 6.- Translate the copied srt into the target language. Please be patient, it may take some time to finish.
replace_srt(complete_srt_path, destiny_srt_path)

Downloading:   0%|          | 0.00/1.47k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/312M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/44.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/802k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/826k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.59M [00:00<?, ?B/s]

In [11]:
#@title 7.- Now we add both subtitle tracks to the video and write the result as a new video in [MKV format](https://es.wikipedia.org/wiki/Matroska)
%%capture
!ffmpeg -i {output_mp4_path} -i {original_srt_path} -i {destiny_srt_path} -map 0:v -map 0:a -map 1 -map 2 -c:v copy -c:a copy -c:s srt -metadata:s:s:0 language={language_source} -metadata:s:s:1 language={language_destiny} {output_mkv}
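
The command above takes three inputs (the video plus two srt files), maps the video and audio of input 0 and the subtitles of inputs 1 and 2, and tags each subtitle track with a language. To make that pattern easier to read, the sketch below builds the same argument list for any number of subtitle files. `mux_command` is a hypothetical helper and the file names are illustrative.

```python
def mux_command(video_path, subtitles, output_path):
    """Build the ffmpeg argument list that adds subtitle tracks to a video.

    `subtitles` is a list of (srt_path, language_code) pairs in track order.
    """
    command = ["ffmpeg", "-i", video_path]
    for srt_path, _ in subtitles:
        command += ["-i", srt_path]
    # Input 0 provides video and audio; inputs 1..N provide the subtitles
    command += ["-map", "0:v", "-map", "0:a"]
    for index in range(len(subtitles)):
        command += ["-map", str(index + 1)]
    command += ["-c:v", "copy", "-c:a", "copy", "-c:s", "srt"]
    # Tag each subtitle stream with its language
    for index, (_, language) in enumerate(subtitles):
        command += [f"-metadata:s:s:{index}", f"language={language}"]
    command.append(output_path)
    return command


command = mux_command(
    "/content/output.mp4",
    [("original.srt", "en"), ("translated.srt", "es")],
    "output.mkv",
)
print(" ".join(command))
```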

In [12]:
#@title 8.- 😎 If you have made it this far, CONGRATULATIONS! You can now download the result.
#@markdown When you run this step you will see a progress bar while your result is downloaded.
from google.colab import files
files.download(output_mkv_path)
