<a href="https://colab.research.google.com/github/codeing-link/youtube/blob/main/autotranslate_muti_ok.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Video Transcription and Translation with Faster Whisper and ChatGPT**


[![notebook shield](https://img.shields.io/static/v1?label=&message=Notebook&color=blue&style=for-the-badge&logo=googlecolab&link=https://colab.research.google.com/github/lewangdev/autotranslate/blob/main/autotranslate.ipynb)](https://colab.research.google.com/github/lewangdev/autotranslate/blob/main/autotranslate.ipynb)
[![repository shield](https://img.shields.io/static/v1?label=&message=Repository&color=blue&style=for-the-badge&logo=github&link=https://github.com/lewangdev/autotranslate)](https://github.com/lewangdev/autotranslate)

This Notebook guides you through transcribing and translating videos with [Faster Whisper](https://github.com/guillaumekln/faster-whisper) and ChatGPT. You can explore most inference parameters, or use the Notebook as-is to store the transcript, the translation and the video's audio in your Google Drive.

In [None]:
#@markdown # **Check GPU type** 🕵️

#@markdown The type of GPU assigned to your Colab session determines how fast the video is transcribed.
#@markdown The higher the number of floating point operations per second (FLOPS), the faster the transcription.
#@markdown Even the least powerful GPU available in Colab can run any Whisper model.
#@markdown Make sure you've selected `GPU` as hardware accelerator for the Notebook (Runtime &rarr; Change runtime type &rarr; Hardware accelerator).

#@markdown |  GPU   |  GPU RAM   | FP32 teraFLOPS |     Availability   |
#@markdown |:------:|:----------:|:--------------:|:------------------:|
#@markdown |  T4    |    16 GB   |       8.1      |         Free       |
#@markdown | P100   |    16 GB   |      10.6      |      Colab Pro     |
#@markdown | V100   |    16 GB   |      15.7      |  Colab Pro (Rare)  |

#@markdown ---
#@markdown **Factory reset your Notebook's runtime if you want to get assigned a new GPU.**

!nvidia-smi -L

!nvidia-smi
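If you want the assigned GPU's name inside Python (for example, to log it next to the transcript), one option is to parse the output of `nvidia-smi -L`. This is a minimal sketch: the helper names `parse_gpu_names` and `lookup_tflops` are hypothetical, the regex assumes the usual `GPU 0: <name> (UUID: ...)` line format, and the teraFLOPS values are copied from the table above.

```python
import re

# FP32 teraFLOPS copied from the GPU table above
FP32_TFLOPS = {"T4": 8.1, "P100": 10.6, "V100": 15.7}

def parse_gpu_names(nvidia_smi_l_output: str) -> list[str]:
    """Extract GPU model names from `nvidia-smi -L` lines such as
    'GPU 0: Tesla T4 (UUID: GPU-...)' (assumed format)."""
    names = []
    for line in nvidia_smi_l_output.splitlines():
        match = re.match(r"GPU \d+: (.+?) \(UUID:", line.strip())
        if match:
            names.append(match.group(1))
    return names

def lookup_tflops(gpu_name: str):
    """Return the table's FP32 teraFLOPS for a known GPU model, else None."""
    for model, tflops in FP32_TFLOPS.items():
        # Whole-word match so 'P100-PCIE-16GB' matches 'P100'
        if re.search(rf"\b{model}\b", gpu_name):
            return tflops
    return None

# In Colab, feed it the real output, e.g.:
# out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout
# for name in parse_gpu_names(out): print(name, lookup_tflops(name))
```

GPUs missing from the table (for example an A100 on a paid tier) simply return `None`.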

In [None]:
#@markdown # **Install libraries** 🏗️
#@markdown This cell will take a little while to download several libraries.

#@markdown ---
! pip install faster-whisper==0.10.0
! pip install yt-dlp==2023.11.16
! pip install openai==0.28.1

! wget https://github.com/Purfview/whisper-standalone-win/releases/download/libs/cuBLAS.and.cuDNN_linux.zip
! unzip -o cuBLAS.and.cuDNN_linux.zip -d /usr/lib
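Because the cell pins exact versions, it can help to confirm what actually got installed (Colab images sometimes ship other versions of these packages). A minimal sketch using the standard library's `importlib.metadata`; the `PINNED` dict simply mirrors the `pip install` lines above and `check_pins` is a hypothetical helper name.

```python
from importlib.metadata import version, PackageNotFoundError

# Versions pinned by the pip commands above
PINNED = {"faster-whisper": "0.10.0", "yt-dlp": "2023.11.16", "openai": "0.28.1"}

def check_pins(pins: dict) -> dict:
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        report[name] = (installed, installed == wanted)
    return report

for name, (installed, ok) in check_pins(PINNED).items():
    print(f"{name}: installed={installed} pinned={PINNED[name]} {'OK' if ok else 'MISMATCH'}")
```

If a line reports `MISMATCH`, restarting the runtime after the installs usually makes Python pick up the pinned version.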


In [None]:
#@markdown # **Import libraries for Python** 🐍

#@markdown This cell imports all the libraries used by the Python code.
import sys
import warnings
from faster_whisper import WhisperModel
from pathlib import Path
import yt_dlp
import subprocess
import torch
import shutil
import numpy as np
from IPython.display import display, Markdown, YouTubeVideo

device = torch.device('cuda:0')
print('Using device:', device, file=sys.stderr)

In [None]:
#@markdown # **Optional:** Save data in Google Drive 💾
#@markdown Enter a Google Drive path and run this cell if you want to store the results inside Google Drive.

# Mount Google Drive so results can be copied there; in my experience this is faster than downloading them directly from Colab.
from google.colab import drive
drive_mount_path = Path("/") / "content" / "drive"
drive.mount(str(drive_mount_path))
drive_mount_path /= "My Drive"
#@markdown ---
drive_path = "Colab Notebooks/Videos Transcription and Translation" #@param {type:"string"}
#@markdown ---
#@markdown **Run this cell again if you change your Google Drive path.**

drive_whisper_path = drive_mount_path / Path(drive_path.lstrip("/"))
drive_whisper_path.mkdir(parents=True, exist_ok=True)
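The `lstrip("/")` above matters because of how `pathlib` joins paths: if the right-hand side is absolute, everything before it is discarded. This small sketch uses `PurePosixPath` (so it runs on any OS without touching the filesystem) to show the difference; the notebook itself uses `Path`, which behaves the same way here.

```python
from pathlib import PurePosixPath

drive_mount_path = PurePosixPath("/content/drive/My Drive")

# Joining an absolute path discards the base entirely...
assert drive_mount_path / "/Colab Notebooks/out" == PurePosixPath("/Colab Notebooks/out")

# ...so stripping the leading slash keeps the result inside Drive
user_input = "/Colab Notebooks/out"
safe = drive_mount_path / user_input.lstrip("/")
print(safe)  # /content/drive/My Drive/Colab Notebooks/out
```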

In [None]:
#@markdown # **Model selection** 🧠

#@markdown The following pre-trained models are available:

#@markdown |  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
#@markdown |:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
#@markdown |  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~0.8 GB     |      ~32x      |
#@markdown |  base  |    74 M    |     `base.en`      |       `base`       |     ~1.0 GB     |      ~16x      |
#@markdown | small  |   244 M    |     `small.en`     |      `small`       |     ~1.4 GB     |      ~6x       |
#@markdown | medium |   769 M    |    `medium.en`     |      `medium`      |     ~2.7 GB     |      ~2x       |
#@markdown | large-v1  |   1550 M   |        N/A         |      `large-v1`       |    ~4.3 GB     |       1x       |
#@markdown | large-v2  |   1550 M   |        N/A         |      `large-v2`       |    ~4.3 GB     |       1x       |
#@markdown | large-v3  |   1550 M   |        N/A         |      `large-v3`       |    ~3.6 GB     |       1x       |

#@markdown ---
model_size = 'large-v2' #@param ['tiny', 'tiny.en', 'base', 'base.en', 'small', 'small.en', 'medium', 'medium.en', 'large-v1', 'large-v2', 'large-v3']
device_type = "cuda" #@param ['cuda', 'cpu']
compute_type = "float16" #@param ['float16', 'int8_float16', 'int8']
#@markdown `float16` and `int8_float16` need a GPU; select `int8` when `device_type` is `cpu`.
#@markdown ---
#@markdown **Run this cell again if you change the model.**

model = WhisperModel(model_size, device=device_type, compute_type=compute_type)
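faster-whisper runs on CTranslate2, which generally cannot use `float16` on a CPU, so pairing `device_type="cpu"` with the default `compute_type="float16"` tends to fail when the model loads. A minimal sketch of a guard you could call before constructing the model; `choose_compute_type` is a hypothetical helper, and the allowed sets are an assumption limited to the options in this form (plus `float32`, which CPUs also handle).

```python
# Compute types from the form above, grouped by where they can run (assumption)
GPU_TYPES = {"float16", "int8_float16", "int8"}
CPU_TYPES = {"int8", "float32"}  # float32 works on CPU but is not offered in the form

def choose_compute_type(device: str, requested: str) -> str:
    """Hypothetical helper: keep `requested` if it suits `device`, else fall back to int8."""
    allowed = GPU_TYPES if device == "cuda" else CPU_TYPES
    if requested in allowed:
        return requested
    print(f"compute_type={requested!r} is not suitable for device={device!r}; using 'int8'")
    return "int8"
```

Usage: `model = WhisperModel(model_size, device=device_type, compute_type=choose_compute_type(device_type, compute_type))`.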


In [None]:
#@markdown # **Video selection** 📺

#@markdown Choose the input type, enter the URL of the video or playlist (or a Google Drive path) you want to transcribe, and run the cell.

Type = "Video or playlist URL" #@param ['Video or playlist URL', 'Google Drive']
#@markdown ---
#@markdown #### **Video or playlist URL**
URL = "https://www.youtube.com/playlist?list=PL-c0DN3fTeQcbHY41LZ_5NZaCf1_3YLQa" #@param {type:"string"}
# store_audio = True #@param {type:"boolean"}
#@markdown ---
#@markdown #### **Google Drive video, audio (mp4, wav), or folder containing video and/or audio files**
video_path = "Colab Notebooks/transcription/my_video.mp4" #@param {type:"string"}
#@markdown ---
#@markdown **Run this cell again if you change the video.**

video_path_local_list = []

if Type == "Video or playlist URL":

    ydl_opts = {
        'format': 'm4a/bestaudio/best',
        'outtmpl': '%(id)s.%(ext)s',
        # ℹ️ See help(yt_dlp.postprocessor) for a list of available Postprocessors and their arguments
        'postprocessors': [{  # Extract audio using ffmpeg
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'wav',
        }]
    }

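    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        # Download the audio and return its metadata in a single pass
        info = ydl.extract_info(URL, download=True)

    # A playlist URL returns an 'entries' list with one info dict per video;
    # a single-video URL returns that video's info dict directly
    entries = info.get('entries')
    if entries is None:
        entries = [info]
    for video_info in entries:
        if video_info:  # skip unavailable playlist items
            video_path_local_list.append(Path(f"{video_info['id']}.wav"))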

elif Type == "Google Drive":
    video_path = drive_mount_path / Path(video_path.lstrip("/"))
    if video_path.is_dir():
        for video_path_drive in video_path.iterdir():
            if video_path_drive.is_file():
                display(Markdown(f"**{video_path_drive} selected for transcription.**"))
                # Copy the file into the working directory before transcription
                video_path_local = Path(".").resolve() / video_path_drive.name
                shutil.copy(video_path_drive, video_path_local)
                video_path_local_list.append(video_path_local)
            elif video_path_drive.is_dir():
                display(Markdown(f"**Subfolders not supported, skipping {video_path_drive}.**"))
    elif video_path.is_file():
        video_path_local = Path(".").resolve() / video_path.name
        shutil.copy(video_path, video_path_local)
        video_path_local_list.append(video_path_local)
        display(Markdown(f"**{video_path} selected for transcription.**"))
    else:
        display(Markdown(f"**{video_path} does not exist.**"))

else:
    raise TypeError("Please select a supported input type.")

# Convert .mp4 inputs to 16 kHz mono 16-bit PCM WAV and point the list at the WAV files
for i, video_path_local in enumerate(video_path_local_list):
    if video_path_local.suffix == ".mp4":
        wav_path = video_path_local.with_suffix(".wav")
        subprocess.run(["ffmpeg", "-y", "-i", str(video_path_local), "-vn", "-acodec", "pcm_s16le",
                        "-ar", "16000", "-ac", "1", str(wav_path)], check=True)
        video_path_local_list[i] = wav_path
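The URL branch relies on the shape of the metadata that yt-dlp's `extract_info` returns: a single video's info dict carries its own `id`, while a playlist's info dict holds the individual videos in an `entries` list. With `'outtmpl': '%(id)s.%(ext)s'` and the WAV post-processor, each video should end up as `<id>.wav`. This is a minimal, offline sketch of that mapping using fake dicts; `wav_paths_from_info` is a hypothetical name, and skipping `None` entries is a defensive assumption for unavailable videos.

```python
def wav_paths_from_info(info: dict) -> list[str]:
    """Return the expected '<id>.wav' filenames for a yt-dlp info dict.

    Single video: the dict itself has an 'id'.
    Playlist: the dict has an 'entries' list of per-video dicts."""
    entries = info.get("entries")
    if entries is None:
        entries = [info]  # not a playlist
    return [f"{entry['id']}.wav" for entry in entries if entry]

print(wav_paths_from_info({"id": "abc123"}))
print(wav_paths_from_info({"id": "PL_x", "entries": [{"id": "v1"}, None, {"id": "v2"}]}))
```

Note that the playlist's own `id` never becomes a filename; only the entries' ids do.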


In [None]:
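#@markdown # **Collect the .wav files in the current directory into a list**
import os

# Gather all .wav files in the current directory (sorted for a stable order)
video_path_local_list = sorted(file for file in os.listdir('.') if file.endswith('.wav'))

# Print the list to check which .wav files were found
print(video_path_local_list)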

In [None]:
#@markdown # **Run the model** 🚀
from datetime import datetime, timedelta
import re

def format_time(time_str):
    # Parse a "HH:MM:SS,fff" time string
    hours, minutes, seconds_frac = time_str.split(':')
    seconds, fraction = seconds_frac.split(',')
    # Re-emit it as HH:MM:SS,mmm (milliseconds padded to three digits)
    formatted_time = f"{int(hours):02d}:{int(minutes):02d}:{int(seconds):02d},{int(fraction[:3]):03d}"
    return formatted_time

def time_format_to_seconds(t):
    h, m, s_ms = t.split(':')
    s, ms = s_ms.split(',')
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def write_to_srt_file(segments, filename):
    with open(filename, "w", encoding="utf-8") as file:
        for index, segment in enumerate(segments, start=1):
            start_seconds = time_format_to_seconds(segment['start'])
            end_seconds = time_format_to_seconds(segment['end'])
            file.write(f"{index}\n")
            file.write(f"{seconds_to_time_format(start_seconds)} --> {seconds_to_time_format(end_seconds)}\n")
            file.write(f"{segment['text']}\n\n")
#@markdown Run this cell to transcribe the video(s). This can take a while; the duration varies with the length of the video and the size of the model selected above.
def seconds_to_time_format(s):
    # Round to whole milliseconds first so the fraction can never round up to 1000
    total_ms = round(s * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, milliseconds = divmod(rem, 1000)

    # Return the SRT-style string HH:MM:SS,mmm
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"


#@markdown ## **Parameters** ⚙️

#@markdown ### **Behavior control**
#@markdown #### Language
language_options = {
    "Auto Detect": "auto",
    "English": "en",
    "中文(Chinese)": "zh",
    "日本語(Japanese)": "ja",
    "Deutsch(German)": "de",
    "Français(French)": "fr"
}

language_option = "Auto Detect" #@param ["Auto Detect", "English", "中文(Chinese)", "日本語(Japanese)", "Deutsch(German)", "Français(French)"] {allow-input: true}
language = language_options.get(language_option, language_option)

#@markdown #### Initial prompt
initial_prompt = "Hello, Let's begin to talk." #@param {type:"string"}
#@markdown ---
#@markdown #### Word-level timestamps
word_level_timestamps = True #@param {type:"boolean"}
#@markdown ---
#@markdown #### VAD filter
vad_filter = False #@param {type:"boolean"}
vad_filter_min_silence_duration_ms = 50 #@param {type:"integer"}
#@markdown ---

for video_path_local in video_path_local_list:
  segments, info = model.transcribe(str(video_path_local), beam_size=5,
                                    language=None if language == "auto" else language,
                                    initial_prompt=initial_prompt,
                                    word_timestamps=word_level_timestamps,
                                    vad_filter=vad_filter,
                                    vad_parameters=dict(min_silence_duration_ms=vad_filter_min_silence_duration_ms))

  language_detected = info.language
  display(Markdown(f"Detected language '{info.language}' with probability {info.language_probability}"))

  fragments = []
  line_text = []

  for segment in segments:
    print(f"[{seconds_to_time_format(segment.start)} --> {seconds_to_time_format(segment.end)}] {segment.text}")

    start_time = format_time(seconds_to_time_format(segment.start))
    end_time = format_time(seconds_to_time_format(segment.end))
    text = segment.text
    line_text.append({'start': start_time, 'end': end_time, 'text': text})

    if word_level_timestamps:
      for word in segment.words:
        ts_start = seconds_to_time_format(word.start)
        ts_end = seconds_to_time_format(word.end)
        #print(f"[{ts_start} --> {ts_end}] {word.word}")
        fragments.append(dict(start=word.start,end=word.end,text=word.word))
    else:
      ts_start = seconds_to_time_format(segment.start)
      ts_end = seconds_to_time_format(segment.end)
      #print(f"[{ts_start} --> {ts_end}] {segment.text}")
      fragments.append(dict(start=segment.start,end=segment.end,text=segment.text))



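  # Path.with_suffix works for both str and Path entries; Path.replace() would try to rename the file
  write_to_srt_file(line_text, str(Path(video_path_local).with_suffix(".srt")))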
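Each SRT cue is a counter line, a `start --> end` line with `HH:MM:SS,mmm` timestamps, the text, and a blank separator line. The sketch below condenses that format into two standalone functions; `srt_timestamp` and `srt_block` are hypothetical names, not part of the notebook. Rounding to whole milliseconds before splitting into hours, minutes and seconds avoids a fraction that rounds up to `1000` (for example `1.9996` s becoming `00:00:01,1000`).

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)          # integer milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def srt_block(index: int, start: float, end: float, text: str) -> str:
    """One SRT cue: counter, timing line, trimmed text, blank separator line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n\n"

print(srt_block(1, 0.0, 1.9996, " Hello, let's begin."), end="")
```

The design choice is to keep all arithmetic in integers once the milliseconds are rounded, so no carry between fields can be lost.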


In [None]:
#@markdown # **Save the .srt files to Google Drive, then delete the local .srt and .wav files**
import os
import shutil

def copy_srt_files_to_target(target_directory):
    # Get the current working directory
    current_directory = os.getcwd()

    # Make sure the target directory exists
    os.makedirs(target_directory, exist_ok=True)

    # Walk through the files in the current directory
    for filename in os.listdir(current_directory):
        if filename.endswith(".srt"):
            # Build the full source and target file paths
            source_path = os.path.join(current_directory, filename)
            target_path = os.path.join(target_directory, filename)

            # Copy the file
            shutil.copy(source_path, target_path)
            print(f"Copied: {filename}")

# Target directory: the Google Drive folder configured in the "Save data in Google Drive" cell
#print(drive_whisper_path)
target_folder = drive_whisper_path
copy_srt_files_to_target(target_folder)



def delete_files_with_extensions(extensions):
    # Get the current working directory
    current_directory = os.getcwd()

    # Walk through the files in the current directory
    for filename in os.listdir(current_directory):
        if any(filename.endswith(ext) for ext in extensions):
            # Build the full path of the file
            file_path = os.path.join(current_directory, filename)

            # Delete the file
            os.remove(file_path)
            print(f"Deleted: {filename}")

# File extensions to delete
extensions_to_delete = [".srt", ".wav"]
delete_files_with_extensions(extensions_to_delete)
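The cell copies the subtitles first and then deletes local files unconditionally, so a failed copy would lose the `.srt` output. A minimal `pathlib` sketch of a safer variant that only deletes a subtitle once its copy exists in the destination; `archive_and_clean` and its parameters are hypothetical, and the default extensions mirror the ones used above.

```python
import shutil
from pathlib import Path

def archive_and_clean(src_dir: Path, dst_dir: Path,
                      keep_exts=(".srt",), delete_exts=(".srt", ".wav")) -> list[Path]:
    """Copy files with `keep_exts` into dst_dir, then delete files with `delete_exts`
    from src_dir; a kept file is deleted only if its copy exists in dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src_dir.iterdir()):
        if path.is_file() and path.suffix in keep_exts:
            target = dst_dir / path.name
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    # Snapshot the listing before deleting anything
    for path in list(src_dir.iterdir()):
        if not (path.is_file() and path.suffix in delete_exts):
            continue
        if path.suffix in keep_exts and not (dst_dir / path.name).exists():
            continue  # copy missing: keep the local file
        path.unlink()
    return copied
```

In the notebook this would be called as `archive_and_clean(Path("."), drive_whisper_path)`.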