# YouTube Video-to-Text Summarization Using Hugging Face Pipelines
In this notebook we use the automatic speech recognition (ASR) pipeline from the Hugging Face `transformers` library. We download a YouTube video in MP4 format, convert it to MP3, pass the audio to the pipeline to produce a text transcript, and then summarize that transcript with a summarization pipeline.

In [1]:
!nvidia-smi

Fri Sep  1 16:52:33 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   64C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

The cell above checks whether a GPU is available for processing. This runtime has an NVIDIA Tesla T4 with 15360 MiB of memory.
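The same check can be made from Python and turned into the `device` argument that `transformers.pipeline` accepts. The sketch below assumes PyTorch is the backend; `pick_pipeline_device` is a hypothetical helper, not part of `transformers`. It returns `0` (the first GPU) when PyTorch reports CUDA as available and `-1` (CPU) otherwise, including when PyTorch is not installed.

```python
import importlib.util


def pick_pipeline_device() -> int:
    """Hypothetical helper: choose the `device` argument for transformers.pipeline.

    Returns 0 (the first CUDA GPU) if PyTorch is installed and can see a GPU,
    otherwise -1 (run on the CPU).
    """
    if importlib.util.find_spec("torch") is None:
        return -1  # PyTorch missing: no CUDA backend to use
    import torch

    return 0 if torch.cuda.is_available() else -1


print("pipeline device:", pick_pipeline_device())
```

Passing `device=pick_pipeline_device()` to `pipeline(...)` would let the notebook fall back to the CPU instead of failing on a machine without a GPU.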

In [11]:
# Importing dependencies

! pip install transformers
from transformers import pipeline



In [12]:
! pip install pytube
! pip install moviepy



In [14]:
# Download the video from YouTube and convert its audio track to MP3
from pytube import YouTube
from moviepy.editor import VideoFileClip  # moviepy 1.x import path
import os

# Video URL
video_url = "https://www.youtube.com/watch?v=UNP03fDSj1U"

# Output filenames
mp4_filename = "video.mp4"     # the downloaded video is saved as "video.mp4"
mp3_filename = "ytaudio.mp3"   # the extracted audio is saved as "ytaudio.mp3"

# Download the YouTube video (highest-resolution progressive stream)
yt = YouTube(video_url)
stream = yt.streams.get_highest_resolution()
stream.download(filename=mp4_filename)

# Extract the audio track from the MP4 and write it as MP3
video_clip = VideoFileClip(mp4_filename)
audio_clip = video_clip.audio
audio_clip.write_audiofile(mp3_filename, codec="mp3")

# Release the file handles before deleting the intermediate MP4 file
audio_clip.close()
video_clip.close()
os.remove(mp4_filename)

print(f"Downloaded video as {mp4_filename} and converted to {mp3_filename}")


MoviePy - Writing audio in ytaudio.mp3

MoviePy - Done.
Downloaded video as video.mp4 and converted to ytaudio.mp3
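Because the cell above writes fixed filenames, running it for a second video would overwrite the first download. One option, sketched here assuming the standard `watch?v=<id>` and `youtu.be/<id>` URL forms, is to name the outputs after the video ID. `video_id_from_url` is a hypothetical helper built only on the Python standard library.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Hypothetical helper: extract the video ID from a YouTube URL.

    Handles the two common forms:
      https://www.youtube.com/watch?v=<id>
      https://youtu.be/<id>
    """
    parsed = urlparse(url)
    if parsed.netloc.endswith("youtu.be"):
        # Short links carry the ID in the path
        return parsed.path.lstrip("/")
    ids = parse_qs(parsed.query).get("v")
    if not ids:
        raise ValueError(f"No video ID found in {url!r}")
    return ids[0]


video_id = video_id_from_url("https://www.youtube.com/watch?v=UNP03fDSj1U")
print(f"{video_id}.mp4", f"{video_id}.mp3")  # UNP03fDSj1U.mp4 UNP03fDSj1U.mp3
```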


In [15]:
# Embed an inline MP3 audio player in the notebook with IPython

from IPython.display import Audio, display
display(Audio('ytaudio.mp3', autoplay=True))

In [16]:
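# Extra dependencies for transcription: kenlm and pyctcdecode enable n-gram language-model decoding for wav2vec2 checkpoints that ship a language model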
! pip install kenlm
! pip install pyctcdecode
# ! pip install pyannote.core



In [1]:
# from pyannote.audio import Pipeline
# pipeline = Pipeline.from_pretrained("pyannote/voice-activity-detection", use_auth_token="<YOUR_HF_TOKEN>")  # never commit a real access token
from transformers import pipeline

# Models tried:
#   openai/whisper-large-v2
#   jonatasgrosman/wav2vec2-large-xlsr-53-english
#   facebook/wav2vec2-base-960h
# Note: despite the variable name `whisper`, the model used here is wav2vec2, not Whisper.
# device=0 runs the pipeline on the first GPU; omit it (or pass device=-1) to run on the CPU.
whisper = pipeline('automatic-speech-recognition', model="facebook/wav2vec2-base-960h", device=0)


Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
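`facebook/wav2vec2-base-960h` is a CTC model: it predicts one character (or a blank) per audio frame. Without an n-gram language model (the role `kenlm` and `pyctcdecode` play for checkpoints that ship one), those predictions are decoded greedily. The sketch below illustrates greedy CTC decoding on a toy example; the vocabulary, the `<pad>` blank symbol, and the `|` word delimiter are assumptions modeled on wav2vec2's character vocabulary, and `greedy_ctc_decode` is not the pipeline's own code. Because each frame is decoded independently, nothing prevents phonetically plausible misspellings such as "WRIGHT" or "AVENEW" in this notebook's transcript.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: List[str],
    blank: str = "<pad>",
    word_delimiter: str = "|",
) -> str:
    """Greedy (best-path) CTC decoding.

    1. pick the highest-scoring symbol for every frame,
    2. collapse consecutive repeats,
    3. drop the blank symbol,
    4. turn the word delimiter into a space.
    """
    best = [vocab[max(range(len(row)), key=row.__getitem__)] for row in frame_scores]
    collapsed = [s for i, s in enumerate(best) if i == 0 or s != best[i - 1]]
    chars = [s for s in collapsed if s != blank]
    return "".join(chars).replace(word_delimiter, " ").strip()


def one_hot(symbol: str, vocab: List[str]) -> List[float]:
    """Toy frame score that puts all the weight on one symbol."""
    return [1.0 if v == symbol else 0.0 for v in vocab]


vocab = ["<pad>", "|", "S", "E"]
# A blank between the two E frames keeps the double letter
frames = [one_hot(s, vocab) for s in ["S", "S", "E", "<pad>", "E", "|"]]
print(greedy_ctc_decode(frames, vocab))  # SEE
```

Without the blank frame, repeated `E` predictions collapse into a single letter, which is why CTC models need the blank symbol to spell double letters.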


In [2]:
text = whisper('ytaudio.mp3')
text

{'text': "A FEW YEARS AGO I FELT LIKE I WAS STUCK IN A RUT SO I DECIDED TO FOLLOW IN THE FOOTSTEPS OF THE GREAT AMERICAN PHILOSOPHER MORGAN SPURLOCK AND TRY SOMETHING NEW FOR THIRTY DAYS THE IDEA IS ACTUALLY PRETTY SIMPLE THINK ABOUT SOMETHING YOU'VE ALWAYS WANTED TO ADD TO YOUR LIFE AND TRY IT FOR THE NEXT THIRTY DAYS IT TURNS OUT THIRTY DAYS IS JUST ABOUT THE WRIGHT AMOUNT OF TIME TO AVENEW HABIT OR SUBTRACT A HABIT LIKE WATCHING THE NEWS FROM YOUR LIFE THERE'S A FEW THINGS THAT I LEARNED WHILE DOING THESE THIRTY DAY CHALLENGES THE FIRST WAS INSTEAD OF THE MONTHS FLYING BY FOR GOTTON THE TIME WAS MUCH MORE MEMORABLE THIS WAS PART OF A CHALLENGE I DID TO TAKE A PICTURE EVERY DAY FOR A MONTH AND I REMEMBER EXACTLY WHERE I WAS AND WHAT I WAS DOING THAT DAY I ALSO NOTICED THAT AS I STARTED TO DO MORE AND HARDER THIRTY DAY CHALLENGES MY SELF CONFIDENCE GREW I WENT FROM DEATH DWELLING COMPUTER NERD TO THE KIND OF GOT WHO BIKES TO WORK FOR FUN EVEN LAST YEAR I AT IT UP HIKING UP MOUNT KILLI
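`facebook/wav2vec2-base-960h` emits upper-case text without punctuation, as the output above shows, and the summarizer then echoes that casing. One possible cleanup, sketched under the assumption that lower-casing improves readability (whether it changes BART's summaries is untested here), is to lower-case the transcript, restore the pronoun "I", and capitalize the first letter. `normalize_transcript` is a hypothetical helper; it does not restore punctuation or proper nouns.

```python
import re


def normalize_transcript(raw: str) -> str:
    """Hypothetical helper: make an all-caps CTC transcript easier to read.

    Lower-cases everything, restores the pronoun "I" (including contractions
    such as "I'm" and "I've"), and capitalizes the first letter.
    Punctuation and proper nouns are not restored.
    """
    text = raw.lower()
    # Standalone "i" (also before an apostrophe, as in "i'm") becomes "I"
    text = re.sub(r"\bi\b", "I", text)
    return text[:1].upper() + text[1:]


print(normalize_transcript("A FEW YEARS AGO I FELT LIKE I WAS STUCK IN A RUT"))
# A few years ago I felt like I was stuck in a rut
```

For example, `normalize_transcript(text["text"])` could be passed to the summarizer in place of the raw transcript.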

# Transcription Summarization
Now that the transcript has been generated, we can pass it to a summarization pipeline; here we use `facebook/bart-large-cnn`.

In [6]:
# The ASR pipeline returns a dict; wrap its "text" field in a list for the summarizer
text_list = [text["text"]]

# Initialize the summarization pipeline
summarizer = pipeline("summarization", model='facebook/bart-large-cnn')

# Summarize the text (deterministic decoding, no sampling)
summaries = summarizer(text_list, max_length=300, min_length=50, do_sample=False)

# max_length: the maximum number of tokens the model may generate for the summary.
# Generation stops once this limit is reached, so a lower value gives shorter,
# more concise summaries but may drop important details.

# min_length: the minimum number of tokens in the generated summary.
# The model is prevented from ending the summary before this many tokens,
# so a higher value keeps the summaries from being too short.

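`facebook/bart-large-cnn` accepts at most 1024 input tokens, so a transcript from a longer video could be truncated or rejected depending on the pipeline settings. A common workaround, sketched here under assumptions, is to split the transcript into word-based chunks, summarize each chunk, and join the partial summaries. `chunk_words` and `summarize_long_text` are hypothetical helpers; the 400-word default is an assumption meant to stay well under the token limit for typical English text, and `summarize_fn` stands in for any callable that returns one `{"summary_text": ...}` dict per input string, as the `summarizer` pipeline does when called on a list.

```python
from typing import Callable, Dict, List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(
    text: str,
    summarize_fn: Callable[[List[str]], List[Dict[str, str]]],
    max_words: int = 400,
) -> str:
    """Summarize each chunk with `summarize_fn` and join the partial summaries."""
    chunks = chunk_words(text, max_words)
    if not chunks:
        return ""
    results = summarize_fn(chunks)
    return " ".join(r["summary_text"] for r in results)


def first_five_words(chunks: List[str]) -> List[Dict[str, str]]:
    # Stand-in for the real pipeline so the sketch runs without downloading a model
    return [{"summary_text": " ".join(c.split()[:5])} for c in chunks]


demo = "word " * 1000
print(len(chunk_words(demo)))  # 3
print(summarize_long_text(demo, first_five_words))
```

With the real model, `summarize_fn` could be `lambda chunks: summarizer(chunks, max_length=300, min_length=50, do_sample=False)`. Joining per-chunk summaries keeps each call within the model's input limit, at the cost of losing context that spans chunk boundaries.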

In [8]:
# Print the summaries
# for i, summary in enumerate(summaries):
#     print(f"Summary for document_{i + 1}:", summary["summary_text"])

summaries

[{'summary_text': 'A FEW YEARS AGO I FELT LIKE I WAS STUCK IN A RUT SO I DECIDED TO FOLLOW IN THE FOOTSTEPS OF THE GREAT AMERICAN PHILOSOPHER MORGAN SPURLOCK. THIRTY DAYS is just about the WRIGHT AMOUNT of time to AVENEW HABIT or SUBTRACT A HABit.'}]

In [10]:
import torch
print(torch.cuda.memory_summary(device=None, abbreviated=False))

|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      | 395744 KiB |  10429 MiB | 133479 MiB | 133092 MiB |
|       from large pool | 395136 KiB |  10429 MiB | 133402 MiB | 133016 MiB |
|       from small pool |    608 KiB |      1 MiB |     76 MiB |     76 MiB |
|---------------------------------------------------------------------------|
| Active memory         | 395744 KiB |  10429 MiB | 133479 MiB | 133092 MiB |
|       from large pool | 395136 KiB |  10429 MiB | 133402 MiB | 133016 MiB |
|       from small pool |    608 KiB |      1 MiB |     76 MiB |     76 MiB |
|---------------------------------------------------------------