<a href="https://colab.research.google.com/github/alexwhite116/youtube-transcription/blob/main/youtube_transcription.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YouTube Transcription Project

This notebook aims to:
1. Create/implement a model to transcribe a YouTube video given a link and store the transcription.
2. (Stretch) Implement a search function to enter text and return the most relevant video link based on its transcription.
3. (Stretch) Implement a summary of videos retrieved via the search function, also presenting the video title and thumbnail.
4. (Stretch) Implement automation for a specific streamer so that transcriptions/summaries of all of their content are available (and searchable).
5. Deploy the above as part of a Gradio app (stretch goals optional/iterative).

## Imports

We import the following libraries:

* `torch`: For machine learning model implementation
  * `torchaudio`: For working with the audio of the videos for transcription
* `pytube`: To enable us to work with YouTube streams
* `moviepy.editor`: To allow us to manipulate audio files
* `IPython`: To allow us to play audio files in order to understand our data


In [6]:
import os

import random

import moviepy.editor
import IPython

import torch
import torchaudio
print(f"PyTorch version: {torch.__version__}")
print(f"torchaudio version: {torchaudio.__version__}")

try:
  import pytube
except ImportError:
  # Install pytube if it is not already available, then retry the import
  !pip install pytube
  import pytube
print(f"PyTube version: {pytube.__version__}")

PyTorch version: 2.2.1+cu121
torchaudio version: 2.2.1+cu121
PyTube version: 15.0.0
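
As an aside, the availability of each dependency can also be checked up front without importing it. The following is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is an assumption made for illustration and is not used elsewhere in this notebook.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None

# Report which of the notebook's third-party dependencies are missing
required = ("torch", "torchaudio", "pytube", "moviepy")
missing = [name for name in required if not is_installed(name)]
print("Missing packages:", missing or "none")
```

Anything listed as missing would need a `pip install` before running the cells that follow.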


## Downloading from YouTube

We first define a function to download audio from any given YouTube URL using `pytube`.

In [2]:
def download_youtube_audio(link: str) -> str:
  """Downloads the audio of a YouTube video, converts it to .mp3 and returns the path.

  Args:
    link (str): URL of the YouTube video to be downloaded.

  Returns:
    str: Path to the .mp3 audio file in the local directory.
  """
  streams = pytube.YouTube(link).streams.filter(only_audio=True) # Keep only the audio streams of the video
  video_path = streams[0].download() # Take the first audio stream, which is typically saved as an .mp4 file
  audio_path = os.path.splitext(video_path)[0] + ".mp3" # Swap the file extension for .mp3
  audio = moviepy.editor.AudioFileClip(video_path)
  audio.write_audiofile(audio_path)
  audio.close()
  os.remove(video_path)
  return audio_path
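
The function above derives the output path by changing the extension of the downloaded file. A minimal sketch of that step in isolation, using `os.path.splitext` so that it does not depend on the length of the original extension; the helper name `to_mp3_path` and the example paths are assumptions made for illustration. Note that this only builds the new path; the actual conversion is done by MoviePy.

```python
import os

def to_mp3_path(video_path: str) -> str:
    """Return the same path with its file extension replaced by .mp3."""
    root, _ext = os.path.splitext(video_path)
    return root + ".mp3"

print(to_mp3_path("/content/example video.mp4"))   # /content/example video.mp3
print(to_mp3_path("/content/example video.webm"))  # /content/example video.mp3
```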

Now we download the video *[I rank EVERY BOSS in Elden Ring from best to worst as the Doll from Bloodborne](https://www.youtube.com/watch?v=bql0s1aCR_4&t=1s)* from YouTube and save its audio.

In [3]:
link = "https://www.youtube.com/watch?v=bql0s1aCR_4&t=1s"

audio_path = download_youtube_audio(link)

MoviePy - Writing audio in /content/I rank EVERY BOSS in Elden Ring from best to worst as the Doll from Bloodborne.mp3
MoviePy - Done.




Next, we define a function to play a snippet of our audio file. By default, this is a random 30-second clip.

In [17]:
def play_audio_snippet(audio_path: str,
                       start_time: float | None = None,
                       length: float = 30,
                       clip_path: str = "/content/sample_clip.mp3") -> None:
  """Plays a snippet of an audio file, starting at a random time by default."""
  clip = moviepy.editor.AudioFileClip(audio_path)
  if start_time is None:
    # Leave room for a full snippet of `length` seconds before the end of the audio
    start_time = random.uniform(0, max(clip.duration - length, 0))
  if start_time < 0 or start_time + length > clip.duration:
    clip.close()
    raise ValueError("Invalid snippet time given")
  snippet = clip.subclip(start_time, start_time + length)
  if os.path.exists(clip_path):
    os.remove(clip_path)
  snippet.write_audiofile(clip_path)
  clip.close() # Release the underlying file reader once the snippet is written
  IPython.display.display(IPython.display.Audio(clip_path))
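
The window selection in the function above can be sketched on its own, without MoviePy, which makes the bounds checks easier to reason about. This is a minimal sketch under stated assumptions: the helper name `pick_snippet_window` is hypothetical, and the audio duration is passed in as a plain number of seconds rather than read from a clip.

```python
import random
from typing import Optional, Tuple

def pick_snippet_window(duration: float,
                        length: float = 30,
                        start_time: Optional[float] = None) -> Tuple[float, float]:
    """Return (start, end) in seconds for a snippet that fits inside the audio."""
    if length <= 0 or length > duration:
        raise ValueError("Snippet length must be positive and no longer than the audio")
    if start_time is None:
        # Choose a random start that leaves room for a full-length snippet
        start_time = random.uniform(0, duration - length)
    elif start_time < 0 or start_time + length > duration:
        raise ValueError("Invalid snippet time given")
    return start_time, start_time + length

print(pick_snippet_window(120, 30, start_time=10))  # (10, 40)
print(pick_snippet_window(120, 30))                 # random 30-second window
```

Drawing the random start from `[0, duration - length]` rather than a fixed offset keeps the default valid when `length` is changed.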

In [19]:
play_audio_snippet(audio_path)

MoviePy - Writing audio in /content/sample_clip.mp3
MoviePy - Done.


