<a href="https://colab.research.google.com/github/RQledotai/holocron-colab/blob/master/notebooks/whisper_to_synopsis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Whisper to Synopsis

## Description

This Google Colab notebook demonstrates how to convert an MP4 video into a blog post by:
1. Extracting the audio track from the video
2. Transcribing the audio track to text
3. Summarizing the text into key takeaways

## Initialization

### Install Dependencies

In [40]:
%pip install moviepy



In [41]:
%pip install pytubefix



In [42]:
%pip install speechrecognition



In [43]:
%pip install vosk



In [44]:
%pip install google-generativeai
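
Before moving on, it can help to confirm that each package is importable in the current runtime. The sketch below is an illustration only: `missing_modules` is a hypothetical helper, and the import names are assumed to correspond to the packages installed above (note that `speechrecognition` is imported as `speech_recognition`).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the names from `names` that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # a missing parent package (e.g. `google`) raises instead of returning None
            found = False
        if not found:
            missing.append(name)
    return missing


# import names assumed to match the pip packages installed above
required = ['moviepy', 'pytubefix', 'speech_recognition', 'vosk', 'google.generativeai']
print(f'Missing modules: {missing_modules(required)}')
```

An empty list suggests that all dependencies were installed into the active kernel.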



### Download Vosk model

For the Vosk API to work, we need to download a transcription model (see [Model list](https://alphacephei.com/vosk/models)). Here we use the lightweight wideband model for Android and RPi, `vosk-model-small-en-us-0.15`.

In [45]:
from vosk import Model

model = Model(lang="en-us")
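
`Model(lang="en-us")` leaves the choice of model to Vosk. To make the model named above explicit, the `Model` constructor also accepts a `model_name` argument. A minimal sketch, assuming that model name is still published on the Vosk model list; `load_vosk_model` is a hypothetical helper, not part of Vosk:

```python
VOSK_MODEL_NAME = 'vosk-model-small-en-us-0.15'


def load_vosk_model(model_name=VOSK_MODEL_NAME):
    """Load a specific Vosk model by name, downloading it if it is not cached locally."""
    # imported lazily so the helper can be defined before vosk is installed
    from vosk import Model
    return Model(model_name=model_name)
```

Calling `model = load_vosk_model()` could then stand in for the `Model(lang="en-us")` call.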

## Extract Audio Track

For this notebook, the first step is to retrieve a video from YouTube. In this case, we will download the [*How Google Search Works* video](https://www.youtube.com/watch?v=0eKVizvYSUQ).

In [46]:
from pytubefix import YouTube

yt_object = YouTube('https://www.youtube.com/watch?v=0eKVizvYSUQ')
# print the title of the video
print(f'Title: {yt_object.title}')

Title: How Google Search Works (in 5 minutes)


In [47]:
# download the video stream
yt_object_high_res = yt_object.streams.get_highest_resolution()
print(f'Downloading: {yt_object_high_res}')
yt_object_high_res.download()

Downloading: <Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2" progressive="True" type="video">


'/content/How Google Search Works (in 5 minutes).mp4'

The next step is to extract the audio track from the video. We use the [`moviepy` library](https://zulko.github.io/moviepy/) for this.

In [48]:
import moviepy.editor as mpe

video = mpe.VideoFileClip('/content/How Google Search Works (in 5 minutes).mp4')
video.audio.write_audiofile('/content/audio.wav')
video.close()

MoviePy - Writing audio in /content/audio.wav




MoviePy - Done.
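
The extraction cell hard-codes both file paths and imports `moviepy.editor`, a module that MoviePy 2.x no longer provides. The sketch below is one possible variant under those assumptions: it derives the `.wav` path from the video path and falls back to the older import when needed. `audio_path_for` and `extract_audio` are hypothetical helper names.

```python
from pathlib import Path


def audio_path_for(video_path):
    """Return the video path with its extension swapped for `.wav`."""
    return Path(video_path).with_suffix('.wav')


def extract_audio(video_path):
    """Write the audio track of `video_path` to a sibling .wav file and return its path."""
    try:
        # MoviePy 2.x exposes VideoFileClip at the package top level
        from moviepy import VideoFileClip
    except ImportError:
        # MoviePy 1.x, as used in the notebook's extraction cell
        from moviepy.editor import VideoFileClip

    audio_path = audio_path_for(video_path)
    video = VideoFileClip(str(video_path))
    try:
        video.audio.write_audiofile(str(audio_path))
    finally:
        # release the file handles even if writing fails
        video.close()
    return audio_path
```

Because `download()` returns the path of the saved file, the two steps could be chained as `extract_audio(yt_object_high_res.download())`.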


## Transcribe Audio Track

The next step is to transcribe the audio track. For this step, we leverage a combination of the [`SpeechRecognition` library](https://github.com/Uberi/speech_recognition) and the [Vosk API](https://alphacephei.com/vosk/).

In [49]:
import speech_recognition as sr

# initialize the recognizer engine
recognizer_engine = sr.Recognizer()

# load the audio file to be processed
with sr.AudioFile('/content/audio.wav') as audio_file:
  # record() reads the whole file, whereas listen() returns after the first phrase
  audio_track = recognizer_engine.record(audio_file)
  audio_output = recognizer_engine.recognize_vosk(audio_track)

In [50]:
import json

# print the recognized text
print(audio_output)

Please download the model from https://github.com/alphacep/vosk-api/blob/master/doc/models.md and unpack as 'model' in the current folder.
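
The message printed above is not a transcript: in the version of `SpeechRecognition` used here, `recognize_vosk` appears to look for a directory named `model` in the working directory and does not use the `Model` object loaded earlier. One possible workaround, sketched below, is to attach that object to the recognizer before calling it. This relies on an undocumented `vosk_model` attribute, so treat it as an assumption that may not hold across versions. `transcribe` and `extract_text` are hypothetical helpers, and `extract_text` assumes the usual Vosk result shape, a JSON string with a `text` field.

```python
import json


def extract_text(vosk_result):
    """Pull the transcript out of a Vosk JSON result string."""
    return json.loads(vosk_result).get('text', '')


def transcribe(wav_path, vosk_model):
    """Transcribe a whole .wav file with SpeechRecognition and a preloaded Vosk model."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # assumption: recognize_vosk reuses this attribute instead of loading ./model
    recognizer.vosk_model = vosk_model
    with sr.AudioFile(wav_path) as source:
        # record() reads the entire file; listen() returns after the first phrase
        audio = recognizer.record(source)
    return extract_text(recognizer.recognize_vosk(audio))
```

With the objects defined earlier, this could be called as `print(transcribe('/content/audio.wav', model))`.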
