<a href="https://colab.research.google.com/github/Blowdok/AI-tools-for-you/blob/main/youtube_whisper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YouTube Video Transcription with OpenAI's Whisper

[![License](https://img.shields.io/github/license/kazuki-sf/youtube-whisper)](https://github.com/kazuki-sf/youtube-whisper)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kazuki-sf/youtube-whisper/blob/main/youtube_whisper.ipynb)

## How to Use the Notebook
You can use `Copy to Drive` to save your own copy of the notebook, or run it directly.
1. Enter the URL of the YouTube video or Short you want to transcribe.
2. Choose the Whisper model you want to use.
3. Run the code cells (Steps 1-3) in order and wait for the transcription to complete.

## Notes
* A `T4 GPU` or better is recommended for running the notebook. To change the runtime type, go to `Runtime` -> `Change runtime type` -> `Hardware accelerator` -> `GPU`.
* Whenever you change the YouTube URL, run `Step 1` and then `Step 3`. If you change the Whisper model, also rerun `Step 2` before `Step 3`, because that cell loads the model; otherwise you can skip `Step 2` after its first run.
* When you run `Step 3`, your browser might ask for permission to download multiple files.
* This project is not affiliated with OpenAI. The code provided here is for educational purposes only.
* The table below lists the available Whisper models and their relative speed. The `large-v2` and `large-v3` options offered in Step 1 are newer checkpoints of the `large` model. For more information, see the official GitHub page: https://github.com/openai/whisper#available-models-and-languages
---

|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
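To choose a model programmatically, the table can be encoded as data. The sketch below uses a hypothetical helper, `pick_whisper_model` (not part of Whisper), that returns the largest model whose approximate VRAM requirement from the table fits the memory you have. The thresholds are only the table's rough estimates, not guarantees.

```python
# Approximate VRAM requirements (GB) copied from the table above, smallest to largest.
WHISPER_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_whisper_model(available_vram_gb: float, english_only: bool = False) -> str:
    """Return the largest Whisper model whose approximate VRAM need fits.

    Hypothetical helper: thresholds are the rough table estimates only.
    English-only ".en" variants exist for every size except "large".
    """
    chosen = None
    for size, vram in WHISPER_VRAM_GB:
        if vram <= available_vram_gb:
            chosen = size  # keep the last (largest) model that fits
    if chosen is None:
        raise ValueError(f"No Whisper model fits in {available_vram_gb} GB of VRAM")
    if english_only and chosen != "large":
        return f"{chosen}.en"
    return chosen


print(pick_whisper_model(15))                    # large
print(pick_whisper_model(4, english_only=True))  # small.en
```

Real memory use also depends on audio length and other processes on the GPU, so treat the result as a starting point.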



In [1]:
# @title Step 1: Enter URL & Choose Whisper Model

# @markdown Enter the URL of the YouTube video
YouTube_URL = "https://www.youtube.com/watch?v=NyNHRBojsqw" #@param {type:"string"}

# @markdown Choose the whisper model you want to use
whisper_model = "medium" # @param ["tiny", "base", "small", "medium", "large", "large-v2", "large-v3"]

# @markdown Save the transcription as a text (.txt) file?
text = True #@param {type:"boolean"}

# @markdown Save the transcription as an SRT (.srt) file?
srt = False #@param {type:"boolean"}
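Step 1 accepts both regular videos and Shorts. yt-dlp parses these URLs itself, but if you want to sanity-check `YouTube_URL` before downloading, a minimal standard-library sketch could look like this. `extract_video_id` is a hypothetical helper that only recognizes the `watch?v=`, `youtu.be/` and `/shorts/` URL shapes.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None if unrecognized.

    Hypothetical helper: covers watch?v=..., youtu.be/... and /shorts/... only.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path component
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Regular videos carry the ID in the "v" query parameter
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "shorts":
            return parts[1]
    return None


print(extract_video_id("https://www.youtube.com/watch?v=NyNHRBojsqw"))  # NyNHRBojsqw
```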


In [2]:
# =========================
# 🚀 STEP 2: INSTALL DEPENDENCIES
# =========================
# yt-dlp downloads the audio, Whisper is installed from GitHub, and ffmpeg decodes audio for both
!pip install -q -U yt-dlp
!pip install -q git+https://github.com/openai/whisper.git
!apt-get -qq update && apt-get -qq install -y ffmpeg

import os
import torch
import whisper
import yt_dlp
from pathlib import Path
from datetime import datetime
from google.colab import files

# Detect the GPU (fall back to CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"🔍 Device in use: {device}")

# Load the Whisper model directly onto the selected device
model = whisper.load_model(whisper_model, device=device)
print(f"🎤 Whisper model loaded: {whisper_model}")


[33m0% [Working][0m            Hit:1 http://security.ubuntu.com/ubuntu jammy-security InRelease
[33m0% [Connecting to archive.ubuntu.com (185.125.190.81)] [Connected to cloud.r-project.org (3.171.85.6[0m                                                                                                    Hit:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
[33m0% [Waiting for headers] [Connected to r2u.stat.illinois.edu (192.17.190.167)] [Connected to develop[0m                                                                                                    Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy

  checkpoint = torch.load(fp, map_location=device)


🎤 Whisper model loaded: medium
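Both yt-dlp's `FFmpegExtractAudio` post-processor and Whisper's audio loader call the `ffmpeg` executable, which is why Step 2 installs it. A quick standard-library check that it is on `PATH` can save a confusing error later; `ffmpeg_available` is a hypothetical helper name used only in this sketch.

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if ffmpeg_available():
    # Show only the first line of `ffmpeg -version`, which names the build
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    first_line = out.stdout.splitlines()[:1]
    print(first_line[0] if first_line else "ffmpeg found")
else:
    print("ffmpeg not found: rerun Step 2 or install it with apt-get")
```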


In [None]:
# =========================
# 🚀 STEP 3: UTILITIES
# =========================
import re

def to_snake_case(name):
    """Convert a name to snake_case, collapsing runs of whitespace and colons into one underscore."""
    return re.sub(r"[\s:]+", "_", name.lower())

def format_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    # Work in whole milliseconds so the fractional part is not lost
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, milliseconds = divmod(remainder, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{milliseconds:03}"

# =========================
# 🚀 STEP 4: DOWNLOAD THE AUDIO
# =========================
def download_audio_from_youtube(url, out_dir="/content"):
    """Download the audio track of a YouTube video with yt-dlp and fix the resulting file name."""
    print("\n🎬 Downloading audio...")

    file_name = f"audio_{datetime.now().strftime('%Y%m%d_%H%M%S')}.mp3"
    file_path = Path(out_dir) / file_name

    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': str(file_path),
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

    # FFmpegExtractAudio appends its own ".mp3" to outtmpl, producing ".mp3.mp3"; rename it back
    double_ext_path = file_path.with_suffix(".mp3.mp3")
    if double_ext_path.exists():
        double_ext_path.rename(file_path)

    print(f"\n✅ Audio downloaded: {file_path}")
    return file_path

# =========================
# 🚀 STEP 5: TRANSCRIPTION WITH WHISPER
# =========================
def transcribe_audio(model, file, text=True, srt=True):
    """Transcribe an audio file with Whisper and save text and/or subtitle files."""
    file_path = Path(file)

    # Check that the downloaded file exists
    if not file_path.exists():
        raise FileNotFoundError(f"🚨 Error: audio file '{file_path}' does not exist. Check the download step.")

    print("\n📝 Transcribing with Whisper...")
    result = model.transcribe(str(file_path), verbose=False)

    txt_path = None
    srt_path = None

    if text:
        txt_path = file_path.with_suffix(".txt")
        print(f"\n📄 Saving transcription as TXT: {txt_path}")
        with open(txt_path, "w", encoding="utf-8") as txt_file:
            txt_file.write(result["text"].strip())

    if srt:
        srt_path = file_path.with_suffix(".srt")
        print(f"\n📄 Saving subtitles as SRT: {srt_path}")
        with open(srt_path, "w", encoding="utf-8") as srt_file:
            # Read each segment's text directly so the `text` flag is never overwritten
            for index, segment in enumerate(result["segments"], start=1):
                srt_file.write(f"{index}\n")
                srt_file.write(f"{format_time(segment['start'])} --> {format_time(segment['end'])}\n")
                srt_file.write(f"{segment['text'].strip()}\n\n")

    print("\n✨ Transcription complete!")

    # Automatically download the generated files (None skips a file)
    download_transcription(txt_path, srt_path)

    return result

# =========================
# 🚀 STEP 6: DOWNLOAD THE FILES
# =========================
def download_transcription(txt_file, srt_file):
    """Send the .txt and .srt files to the browser in Google Colab (None skips a file)."""
    print("\n📥 Downloading transcription files...")

    if txt_file:
        files.download(str(txt_file))
        print(f"✅ {txt_file} downloaded!")

    if srt_file:
        files.download(str(srt_file))
        print(f"✅ {srt_file} downloaded!")

# =========================
# 🚀 STEP 7: RUN THE SCRIPT
# =========================
if __name__ == "__main__":
    print("\n🚀 Starting the process...\n")

    # 📥 Part 1: Download the audio
    audio_file = download_audio_from_youtube(YouTube_URL)

    # 📝 Part 2: Transcribe with Whisper
    result = transcribe_audio(model, audio_file, text=text, srt=srt)

    print("\n🎉 All done!")



🚀 Starting the process...


🎬 Downloading audio...
[youtube] Extracting URL: https://www.youtube.com/watch?v=NyNHRBojsqw
[youtube] NyNHRBojsqw: Downloading webpage
[youtube] NyNHRBojsqw: Downloading tv client config
[youtube] NyNHRBojsqw: Downloading player 1080ef44
[youtube] NyNHRBojsqw: Downloading tv player API JSON
[youtube] NyNHRBojsqw: Downloading ios player API JSON
[youtube] NyNHRBojsqw: Downloading m3u8 information
[info] NyNHRBojsqw: Downloading 1 format(s): 251
[download] Destination: /content/audio_20250128_234142.mp3
[download] 100% of    3.66MiB in 00:00:00 at 22.00MiB/s  
[ExtractAudio] Destination: /content/audio_20250128_234142.mp3.mp3
Deleting original file /content/audio_20250128_234142.mp3 (pass -k to keep)

✅ Audio downloaded: /content/audio_20250128_234142.mp3

📝 Transcribing with Whisper...




Detected language: Shona


  0%|          | 0/21355 [00:00<?, ?frames/s]
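For reference, each SRT cue is an index, a `start --> end` line with `HH:MM:SS,mmm` timestamps, the text, and a blank line. The sketch below builds that text from segment dictionaries shaped like the `start`/`end`/`text` entries of Whisper's `result["segments"]`. `format_timestamp` and `segments_to_srt` are hypothetical helpers, and the sample segments are made up for illustration. Converting to whole milliseconds before splitting with `divmod` keeps the fractional part of each timestamp.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))  # whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end' (seconds) and 'text' keys."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Joining with "\n" leaves one blank line between consecutive cues
    return "\n".join(cues)


# Made-up segments for illustration only
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 3661.042, "text": " A much later line."},
]
print(segments_to_srt(sample))
```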