<a href="https://colab.research.google.com/github/NotLevente/yt-felirat/blob/main/youtube_magyar_felirat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **YouTube Subtitler**

This application uses the [OpenAI Whisper](https://openai.com/research/whisper) speech recognition technology to create subtitles for Hungarian YouTube videos. It requires the Google Chrome or MS Edge browser.

---

Run each step by clicking its "Play" button.

The **first** step downloads and installs the required components. This only needs to be done once per session. Note, however, that the free Google Colab tier deletes the session after 20 minutes of inactivity, so if you left the notebook alone for half an hour and it reports an error, run the first cell again.
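
A quick way to tell whether the session was reset is to check that the installed packages can still be found. The sketch below is only an illustration: `missing_modules` is a hypothetical helper that is not part of this notebook, and it assumes the two packages from the first cell are importable as `whisper` and `yt_dlp`, as in the cell's own imports.

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the top-level module names that cannot be found in this session."""
    return [name for name in names if find_spec(name) is None]

# Hypothetical check: an empty list means the packages from the first cell are still installed.
print(missing_modules(["whisper", "yt_dlp"]))
```

A non-empty list suggests the first cell has to be run again. An empty list only covers the packages; the loaded model is also lost when the session is deleted.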

In the **second** step, enter the URL of the video you want to subtitle, e.g. https://www.youtube.com/watch?v=suISpg_m-hw . The finished subtitle file is downloaded as `<video ID>.srt`, in this case `suISpg_m-hw.srt`.
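
The naming rule can be sketched in a few lines. This is an illustration only: the notebook itself takes the ID from yt-dlp, while the hypothetical `subtitle_filename` helper below assumes a standard watch URL with a `v` query parameter and does not handle short links or playlists.

```python
from urllib.parse import urlparse, parse_qs

def subtitle_filename(url, ext="srt"):
    """Derive '<video id>.<ext>' from a watch URL (assumes a 'v' query parameter)."""
    query = parse_qs(urlparse(url).query)
    video_id = query["v"][0]
    return f"{video_id}.{ext}"

print(subtitle_filename("https://www.youtube.com/watch?v=suISpg_m-hw"))  # suISpg_m-hw.srt
```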



The subtitle file can be loaded into the video with [this Chrome add-on](https://chrome.google.com/webstore/detail/subtitles-for-youtube/oanhbddbfkjaphdibnebkklpplclomal?hl=en).
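
For orientation, an `.srt` file is plain text: numbered cues, each with a `start --> end` timestamp line followed by the subtitle text. The sketch below shows that layout under simple assumptions; `srt_timestamp` and `to_srt` are hypothetical helpers written for this illustration, not the writer that Whisper uses, and the sample sentences are made up.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp form used in SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(cues)

# Made-up sample segments, in seconds
print(to_srt([(0.0, 2.5, "Jó napot!"), (2.5, 5.0, "Üdvözlöm a nézőket.")]))
```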

---

Borrowed, adapted for Hungarian, and simplified from [this notebook](https://colab.research.google.com/github/ArthurFDLR/whisper-youtube/blob/main/whisper_youtube.ipynb).

In [None]:
#@markdown # **<-- 1. Preparing the ingredients...** 🏗️
#@markdown Installing the dependencies takes about 2 minutes.


! pip install git+https://github.com/openai/whisper.git
! pip install yt-dlp

import sys
import warnings
import whisper
from pathlib import Path
import yt_dlp
import subprocess
import torch
import shutil
import numpy as np
from IPython.display import display, Markdown, YouTubeVideo
from google.colab import files

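# Use the GPU when Colab provides one; otherwise fall back to the (much slower) CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Using device:', device, file=sys.stderr)
Model = 'large'

# Validate the model name before loading, because load_model raises on unknown names
if Model in whisper.available_models():
    whisper_model = whisper.load_model(Model, device=device)
    display(Markdown(
        f"**{Model} model is selected.**"
    ))
else:
    display(Markdown(
        f"**{Model} model is no longer available.**<br /> Please select one of the following:<br /> - {'<br /> - '.join(whisper.available_models())}"
    ))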

In [None]:
#@markdown # **<-- 2. Choose a video and start** 🚀

#@markdown Enter the URL of the YouTube video you want to subtitle.

Type = "Youtube video or playlist"
##@markdown ---
##@markdown #### **Youtube video or playlist**
URL = "https://www.youtube.com/watch?v=suISpg_m-hw" #@param {type:"string"}
# store_audio = True #@param {type:"boolean"}
#@markdown **Rerun only this cell if you want to subtitle another video.**
video_path_local_list = []

if Type == "Youtube video or playlist":
    
    ydl_opts = {
        'format': 'm4a/bestaudio/best',
        'outtmpl': '%(id)s.%(ext)s',
        # ℹ️ See help(yt_dlp.postprocessor) for a list of available Postprocessors and their arguments
        'postprocessors': [{  # Extract audio using ffmpeg
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'wav',
        }]
    }

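    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        # A single call downloads the audio and returns the metadata
        info = ydl.extract_info(URL, download=True)

    # A playlist result lists its videos under 'entries'; a single video has no such key
    list_video_info = info.get('entries') or [info]

    for video_info in list_video_info:
        if video_info:  # entries that failed to download can be None
            video_path_local_list.append(Path(f"{video_info['id']}.wav"))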

elif Type == "Google Drive":
    # NOTE: unreachable while Type is fixed above; drive_mount_path and video_path
    # are not defined in this notebook and would have to be set up first.
    video_path = drive_mount_path / Path(video_path.lstrip("/"))
    if video_path.is_dir():
        for video_path_drive in video_path.glob("**/*"):
            if video_path_drive.is_file():
                display(Markdown(f"**{str(video_path_drive)} selected for transcription.**"))
                # Copy only regular files; shutil.copy fails on directories
                video_path_local = Path(".").resolve() / (video_path_drive.name)
                shutil.copy(video_path_drive, video_path_local)
                video_path_local_list.append(video_path_local)
            elif video_path_drive.is_dir():
                display(Markdown("**Subfolders not supported.**"))
            else:
                display(Markdown(f"**{str(video_path_drive)} does not exist, skipping.**"))
    elif video_path.is_file():
        video_path_local = Path(".").resolve() / (video_path.name)
        shutil.copy(video_path, video_path_local)
        video_path_local_list.append(video_path_local)
        display(Markdown(f"**{str(video_path)} selected for transcription.**"))
    else:
        display(Markdown(f"**{str(video_path)} does not exist.**"))

else:
    raise TypeError("Please select supported input type.")

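for i, video_path_local in enumerate(video_path_local_list):
    if video_path_local.suffix == ".mp4":
        wav_path = video_path_local.with_suffix(".wav")
        # Extract 16 kHz mono PCM audio; -y overwrites a leftover file instead of prompting
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(video_path_local), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(wav_path)],
            check=True,
        )
        # Store the converted path so the transcription loop picks up the .wav file
        video_path_local_list[i] = wav_path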

##@markdown ### **Behavior control**
##@markdown ---
language = "Hungarian"
##@markdown > Language spoken in the audio, use `Auto detection` to let Whisper detect the language.
##@markdown ---
verbose = 'Live transcription'
##@markdown > Whether to print out the progress and debug messages.
##@markdown ---
output_format = 'srt'
##@markdown > Type of file to generate to record the transcription.
##@markdown ---
task = 'transcribe'
##@markdown > Whether to perform X->X speech recognition (`transcribe`) or X->English translation (`translate`).
##@markdown ---

##@markdown <br/>

##@markdown ### **Optional: Fine tuning**
##@markdown ---
temperature = 0.15 
##@markdown > Temperature to use for sampling.
##@markdown ---
temperature_increment_on_fallback = 0.2
##@markdown > Temperature to increase when falling back when the decoding fails to meet either of the thresholds below.
##@markdown ---
best_of = 5 
##@markdown > Number of candidates when sampling with non-zero temperature.
##@markdown ---
beam_size = 8
##@markdown > Number of beams in beam search, only applicable when temperature is zero.
##@markdown ---
patience = 1.0 
##@markdown > Optional patience value to use in beam decoding, as in [*Beam Decoding with Controlled Patience*](https://arxiv.org/abs/2204.05424), the default (1.0) is equivalent to conventional beam search.
##@markdown ---
length_penalty = -0.05 
##@markdown > Optional token length penalty coefficient (alpha) as in [*Google's Neural Machine Translation System*](https://arxiv.org/abs/1609.08144), set to a negative value to use simple length normalization.
##@markdown ---
suppress_tokens = "-1" 
##@markdown > Comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuations.
##@markdown ---
initial_prompt = "" 
##@markdown > Optional text to provide as a prompt for the first window.
##@markdown ---
condition_on_previous_text = True 
##@markdown > if True, provide the previous output of the model as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
##@markdown ---
fp16 = True 
##@markdown > whether to perform inference in fp16.
##@markdown ---
compression_ratio_threshold = 2.4 
##@markdown > If the gzip compression ratio is higher than this value, treat the decoding as failed.
##@markdown ---
logprob_threshold = -1.0 
##@markdown > If the average log probability is lower than this value, treat the decoding as failed.
##@markdown ---
no_speech_threshold = 0.6
##@markdown > If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence.
##@markdown ---

# Whisper's verbose flag: True prints the text live, False shows a progress bar, None is silent
verbose_lut = {
    'Live transcription': True,
    'Progress bar': False,
    'None': None
}

args = dict(
    language = (None if language == "Auto detection" else language),
    verbose = verbose_lut[verbose],
    task = task,
    temperature = temperature,
    temperature_increment_on_fallback = temperature_increment_on_fallback,
    best_of = best_of,
    beam_size = beam_size,
    patience=patience,
    length_penalty=(length_penalty if length_penalty>=0.0 else None),
    suppress_tokens=suppress_tokens,
    initial_prompt=(None if not initial_prompt else initial_prompt),
    condition_on_previous_text=condition_on_previous_text,
    fp16=fp16,
    compression_ratio_threshold=compression_ratio_threshold,
    logprob_threshold=logprob_threshold,
    no_speech_threshold=no_speech_threshold
)

temperature = args.pop("temperature")
temperature_increment_on_fallback = args.pop("temperature_increment_on_fallback")
if temperature_increment_on_fallback is not None:
    temperature = tuple(np.arange(temperature, 1.0 + 1e-6, temperature_increment_on_fallback))
else:
    temperature = [temperature]

if Model.endswith(".en") and args["language"] not in {"en", "English"}:
    warnings.warn(f"{Model} is an English-only model but received '{args['language']}'; using English instead.")
    args["language"] = "en"

for video_path_local in video_path_local_list:
    display(Markdown(f"### {video_path_local}"))

    video_transcription = whisper.transcribe(
        whisper_model,
        str(video_path_local),
        temperature=temperature,
        **args,
    )

    # Save output
    whisper.utils.get_writer(
        output_format=output_format,
        output_dir=video_path_local.parent
    )(
        video_transcription,
        str(video_path_local.stem),
        options=dict(
            highlight_words=False,
            max_line_count=None,
            max_line_width=None,
        )
    )
    transcript_file_name = video_path_local.stem + "." + output_format
    transcript_local_path = video_path_local.parent / transcript_file_name
    try:
        # Hand the subtitle file to the browser for download
        files.download(str(transcript_local_path))
    except Exception:
        # If the download fails, the file is still available in the Colab session
        display(Markdown(f"**Transcript file created: {transcript_local_path}**"))
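
The `temperature` and `temperature_increment_on_fallback` settings above expand into a fallback schedule: decoding starts at the first value and, when a segment fails the thresholds, is retried at the next one, up to 1.0. A minimal pure-Python sketch of that expansion follows; `fallback_temperatures` is a hypothetical name introduced here, and the rounding is added only to keep the printed values tidy.

```python
def fallback_temperatures(start, step, stop=1.0):
    """List the temperatures tried in order: start, start + step, ... up to stop."""
    temps = []
    k = 0
    # The small tolerance keeps a value that lands on `stop` despite float error
    while start + k * step <= stop + 1e-6:
        temps.append(round(start + k * step, 2))
        k += 1
    return tuple(temps)

print(fallback_temperatures(0.15, 0.2))  # (0.15, 0.35, 0.55, 0.75, 0.95)
```

With the values used in this notebook, the first attempt runs at 0.15 and up to four retries follow at increasing temperatures.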