<a href="https://colab.research.google.com/github/CapitanMurloc/srt/blob/develop/Youtube_SRT_with_Whisper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🦄 **Youtube SRT Whisper** - *Notebook created by [Francisco Javier Estrella Rodriguez aka CapitanMurloc](https://www.youtube.com/channel/UCfDAxbIFtpUsxpA_eUnZQ_w)*

# ⛩ Description




## Initial setup

In [1]:
#@title 1.- Install the required libs
%%capture
!pip install git+https://github.com/openai/whisper.git pytube datasets evaluate transformers[sentencepiece] sacremoses
!apt install -y ffmpeg

In [12]:
#@title 2.- Settings for subtitles
#@markdown `Youtube` is the URL of the video to subtitle.
Youtube = "https://www.youtube.com/watch?v=MUqNwgPjJvQ" #@param {type:"string"}
#@markdown `Source` is the language spoken in the video.
Source = "English" #@param ["English", "Spanish"]
#@markdown `Destiny` is the language the subtitles are translated into.
Destiny = "Spanish" #@param ["English", "Spanish"]

language_source = "en" if Source == "English" else "es"
language_destiny = "es" if Destiny == "Spanish" else "en"
youtube_url = Youtube
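
The two conditionals above cover exactly two languages. A minimal sketch of the same mapping as a lookup table follows. `LANGUAGE_CODES`, `language_code` and `translation_model` are hypothetical names that are not part of this notebook, and the model naming scheme is an assumption based on the `Helsinki-NLP/opus-mt-en-es` model used later.

```python
# Hypothetical lookup table; extend it with any other language you need
LANGUAGE_CODES = {"English": "en", "Spanish": "es"}

def language_code(name):
    """Return the two-letter code for a language name chosen in the form."""
    if name not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {name}")
    return LANGUAGE_CODES[name]

def translation_model(source, destiny):
    """Build a Helsinki-NLP model name for the language pair (assumed naming scheme)."""
    return f"Helsinki-NLP/opus-mt-{language_code(source)}-{language_code(destiny)}"

print(translation_model("English", "Spanish"))  # Helsinki-NLP/opus-mt-en-es
```

With a table like this, an unsupported language fails loudly instead of silently falling back to the other option.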

In [83]:
#@title 3.- ⚙️ Help in choosing the right audio and video
import re

def extract_stream_movies(streams):
  extracts = []
  for stream in streams:
    # Extract the type
    stream_type = re.search(r'type="(\w+)"', stream).group(1)
    # Extract the progressive flag and convert it from string to boolean
    progressive = re.search(r'progressive="(\w+)"', stream).group(1) == 'True'
    if stream_type == 'video' and not progressive:
      # Extract the itag
      tag = re.search(r'itag="(\d+)"', stream).group(1)
      # Extract the resolution
      res = re.search(r'res="(\d+p)"', stream).group(1)
      # Append the itag and resolution to the list with separator '-'
      extracts.append(tag + '-' + res)
  return extracts

def extract_stream_audios(streams):
  extracts = []
  for stream in streams:
    # Extract the type
    stream_type = re.search(r'type="(\w+)"', stream).group(1)
    # Extract the progressive flag and convert it from string to boolean
    progressive = re.search(r'progressive="(\w+)"', stream).group(1) == 'True'
    if stream_type == 'audio' and not progressive:
      # Extract the itag
      tag = re.search(r'itag="(\d+)"', stream).group(1)
      # Extract the audio bitrate
      abr = re.search(r'abr="(\d+kbps)"', stream).group(1)
      # Append the itag and bitrate to the list with separator '-'
      extracts.append(tag + '-' + abr)
  return extracts

import ipywidgets as widgets
streams = !pytube {youtube_url} --list
del streams[0]
movies = extract_stream_movies(streams)
audios = extract_stream_audios(streams)
movie = widgets.Dropdown(options=movies)
audio = widgets.Dropdown(options=audios)
widgets.Box([widgets.VBox([widgets.Label(value='Select a resolution:'), movie]),widgets.VBox([widgets.Label(value='Select an audio:'), audio])])

Box(children=(VBox(children=(Label(value='Select a resolution:'), Dropdown(options=('299-1080p', '298-720p', '…
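
The helpers above read the one-line description that `pytube --list` prints for each stream. A minimal sketch of that parsing on a single line follows. The sample line is an assumption modeled on pytube's stream representation, and `parse_stream` is a hypothetical helper that is not part of this notebook.

```python
import re

# Sample line, assumed to follow the format printed by `pytube <url> --list`
SAMPLE = ('<Stream: itag="299" mime_type="video/mp4" res="1080p" fps="60fps" '
          'vcodec="avc1.64002a" progressive="False" type="video">')

def parse_stream(line):
    """Return every key="value" attribute of one stream line as a dict."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))

attrs = parse_stream(SAMPLE)
# Same 'itag-resolution' label that the dropdown shows
label = attrs["itag"] + "-" + attrs["res"]
print(label)  # 299-1080p
```

Collecting all attributes at once avoids one `re.search` per field, and a missing attribute shows up as a `KeyError` that names the field.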

In [107]:
#@title 4.- 🎥 Download movie and audio
#@markdown `Movie` is the name of the video to be saved.
Movie = "movie.mp4" #@param {type:"string"}
#@markdown `Audio` is the name of the audio to be saved.
Audio = "audio.webm" #@param {type:"string"}
import pytube
def download(url, tag, file_name):
    # Download the stream with the given itag; return True on success
    try:
        yt = pytube.YouTube(url)
        stream = yt.streams.get_by_itag(int(tag))
        if stream is None:
            return False
        stream.download(filename=file_name)
        return True
    except Exception:
        return False

if download(youtube_url, movie.value.split('-')[0], Movie):
  print("Movie downloaded successfully")
else:
  print("Movie download failed. Go back to step 3, 'Help in choosing the right audio and video', and pick one of the listed resolutions.")
if download(youtube_url, audio.value.split('-')[0], Audio):
  print("Audio downloaded successfully")
else:
  print("Audio download failed. Go back to step 3, 'Help in choosing the right audio and video', and pick one of the listed bitrates.")

Movie downloaded successfully
Audio downloaded successfully


## Generate the subtitles (transcription with Whisper)
Run this sequence of cells to merge the downloaded video and audio, transcribe the result with Whisper, and translate the subtitles. (Open this block if you are interested in how this process works under the hood or if you want to change the commands.)

In [8]:
%%capture
!ffmpeg -i "{Movie}" -i "{Audio}" -c:v copy -c:a aac output.mp4

ffmpeg version 3.4.11-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-li

In [9]:
%%capture
!whisper "/content/output.mp4" --task transcribe --language {language_source} --model large --output_dir audio_transcription

100%|█████████████████████████████████████| 2.87G/2.87G [01:24<00:00, 36.5MiB/s]
tcmalloc: large alloc 3087007744 bytes == 0x9cda000 @  0x7ff077a2b1e7 0x4b2590 0x5ad01c 0x5dcfef 0x58f92b 0x590c33 0x5e48ac 0x4d20fa 0x51041f 0x58fd37 0x50c4fc 0x5b4ee6 0x58ff2e 0x50d482 0x58fd37 0x50c4fc 0x5b4ee6 0x6005a3 0x607796 0x60785c 0x60a436 0x64db82 0x64dd2e 0x7ff077628c87 0x5b636a
[00:00.000 --> 00:10.480]  In this video, we'll study the encoder architecture, an example of a popular encoder-only architecture
[00:10.480 --> 00:14.680]  as BERT, which is the most popular model of its kind.
[00:14.680 --> 00:18.480]  Let's first start by understanding how it works.
[00:18.480 --> 00:20.960]  We'll use a small example using three words.
[00:20.960 --> 00:25.360]  We use these as inputs and pass them through the encoder.
[00:25.360 --> 00:30.120]  We retrieve a numerical representation of each word.
[00:30.120 --> 00:35.180]  Here for example, the encoder converts those three words, welcome to NYC, in
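
Whisper prints each segment as `[MM:SS.mmm --> MM:SS.mmm]`, while an SRT file stores cues with `HH:MM:SS,mmm` timestamps. A minimal sketch of that conversion follows, using one segment from the output above. `srt_timestamp` and `srt_cue` are hypothetical helpers written for illustration, not Whisper's own code.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def srt_cue(index, start, end, text):
    """Build one SRT cue: cue number, timestamp line, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"

# Third segment of the transcript above: 14.68 s to 18.48 s
print(srt_cue(3, 14.68, 18.48, " Let's first start by understanding how it works."))
```

Working in integer milliseconds keeps the rounding in one place, so the seconds and milliseconds fields cannot disagree.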

Next, we use transformers to translate the subtitles from English to Spanish.

In [42]:
from transformers import pipeline
import os
import re
import shutil

# Matches an SRT timestamp line such as "00:00:00,000 --> 00:00:10,480"
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->")

def translate(lines):
  # Load the English-to-Spanish model once, then translate line by line
  translates = []
  translator = pipeline("translation_en_to_es",model="Helsinki-NLP/opus-mt-en-es")
  for line in lines:
    translation = translator(line)
    translates.append(translation[0]["translation_text"])
  return translates

def open_file(path):
    # Read the whole file and return its lines without the newline characters
    with open(path, "r", encoding="utf-8") as file:
        return file.read().splitlines()

# Copy a file to another path under a new name
def copy_srt(src, dst):
    if os.path.isfile(src):
      shutil.copy(src, dst)
      print("File copied successfully.")
    else:
      print("File not found: " + src)

# Subtitle text is any line that is not blank, a cue number, or a timestamp
def is_text_line(line):
    stripped = line.strip()
    return bool(stripped) and not stripped.isdigit() and not TIMESTAMP.match(stripped)

# Translate the subtitle text of an SRT file in place
def replace_srt(filename):
  lines = open_file(filename)
  # Cue numbers, timestamps and blank lines are kept as they are
  positions = [i for i, line in enumerate(lines) if is_text_line(line)]
  replaces = translate([lines[i] for i in positions])
  for i, replace in zip(positions, replaces):
    lines[i] = replace
  # Write the file once, restoring the newline after every line
  with open(filename, "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

In [44]:
copy_srt("/content/audio_transcription/output.mp4.srt", "/content/audio_transcription/output.es.srt")

File copied successfully.


In [45]:
replace_srt("/content/audio_transcription/output.es.srt")
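
Translation should change only the subtitle text, never the timing. A quick sanity check, sketched under that assumption, compares the timestamp lines of the original and the translated file. `timestamps` and `same_timing` are hypothetical helpers, and the Spanish lines below are hand-written for illustration, not output of the model.

```python
import re

# A full SRT timestamp line, e.g. "00:00:14,680 --> 00:00:18,480"
TIMESTAMP_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")

def timestamps(srt_text):
    """Return the timestamp lines of an SRT document, in order."""
    return [line.strip() for line in srt_text.splitlines()
            if TIMESTAMP_LINE.match(line.strip())]

def same_timing(original, translated):
    """True when both documents carry the same cues in the same order."""
    return timestamps(original) == timestamps(translated)

original = (
    "1\n00:00:14,680 --> 00:00:18,480\n"
    "Let's first start by understanding how it works.\n\n"
    "2\n00:00:18,480 --> 00:00:20,960\n"
    "We'll use a small example using three words.\n"
)
translated = (
    "1\n00:00:14,680 --> 00:00:18,480\n"
    "Empecemos por entender cómo funciona.\n\n"
    "2\n00:00:18,480 --> 00:00:20,960\n"
    "Usaremos un pequeño ejemplo con tres palabras.\n"
)
print(same_timing(original, translated))  # True
```

In the notebook, the two strings would come from reading `output.mp4.srt` and `output.es.srt`; a `False` result means the translation step altered or dropped a cue.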

In [46]:
%%capture
!ffmpeg -i "/content/output.mp4" -i "/content/audio_transcription/output.mp4.srt" -i "/content/audio_transcription/output.es.srt" -map 0:v -map 0:a -map 1 -map 2 -c:v copy -c:a copy -c:s srt -metadata:s:s:0 language=eng -metadata:s:s:1 language=spa output.mkv
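
The command above follows a regular pattern: one extra `-i` input, one `-map` and one `-metadata:s:s:N` entry per subtitle track. A minimal sketch that builds the same argument list for any number of tracks follows. `mux_command` is a hypothetical helper, and the three-letter language codes in the example are an assumption.

```python
import shlex

def mux_command(video, subtitles, output):
    """Build an ffmpeg command that copies video and audio and adds one
    subtitle track per (path, language) pair."""
    cmd = ["ffmpeg", "-i", video]
    for path, _ in subtitles:
        cmd += ["-i", path]
    # Input 0 provides video and audio; inputs 1..N are the subtitle files
    cmd += ["-map", "0:v", "-map", "0:a"]
    for n in range(len(subtitles)):
        cmd += ["-map", str(n + 1)]
    cmd += ["-c:v", "copy", "-c:a", "copy", "-c:s", "srt"]
    # Tag each subtitle stream with its language
    for n, (_, language) in enumerate(subtitles):
        cmd += [f"-metadata:s:s:{n}", f"language={language}"]
    cmd.append(output)
    return cmd

cmd = mux_command(
    "/content/output.mp4",
    [("/content/audio_transcription/output.mp4.srt", "eng"),
     ("/content/audio_transcription/output.es.srt", "spa")],
    "output.mkv",
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.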