# YouTube Video Downloader and Transcript Analyzer

This notebook demonstrates how to download a YouTube video, extract its audio, transcribe the audio with OpenAI Whisper, and analyze the transcription with a HuggingFace language model (Mistral-7B-Instruct). The analyses produce an abstract summary, key points, action items, a sentiment assessment, and a detailed summary.



## 1. Setup

Install the libraries used for downloading YouTube videos, extracting audio, transcribing speech, and running the text analysis.

In [None]:
# Install yt-dlp library for downloading YouTube videos
!pip install yt-dlp

# Install ffmpeg-python for audio extraction (a Python wrapper: the ffmpeg binary itself must also be available)
!pip install --upgrade ffmpeg-python

# Install Whisper for transcription
!pip install -U openai-whisper

# Install libraries for text analysis
!pip install -q -U langchain transformers bitsandbytes accelerate
!pip install langchain-community langchain-core


Collecting yt-dlp
  Downloading yt_dlp-2024.7.25-py3-none-any.whl.metadata (170 kB)
Collecting brotli (from yt-dlp)
  Downloading Brotli-1.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.5 kB)
Collecting mutagen (from yt-dlp)
  Downloading mutagen-1.47.0-py3-none-any.whl.metadata (1.7 kB)
Collecting pycryptodomex (from yt-dlp)
  Downloading pycryptodomex-3.20.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting requests<3,>=2.32.2 (from yt-dlp)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Co
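
Before moving on, it can help to confirm that the installs succeeded and that the `ffmpeg` executable is reachable, since `ffmpeg-python` only wraps the command-line tool. The sketch below is not part of the original notebook; `check_environment` and `REQUIRED_MODULES` are hypothetical names, and the module list is an assumption based on the packages installed above.

```python
import importlib.util
import shutil

# Import names, not pip names: "ffmpeg-python" is imported as "ffmpeg"
# and "openai-whisper" as "whisper". This list is an assumption.
REQUIRED_MODULES = ["yt_dlp", "ffmpeg", "whisper", "transformers", "langchain"]


def check_environment(modules=REQUIRED_MODULES, binary="ffmpeg"):
    """Return a dict mapping each requirement to True (found) or False."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # ffmpeg-python only builds command lines; the executable must be on PATH.
    status[f"{binary} (binary)"] = shutil.which(binary) is not None
    return status


if __name__ == "__main__":
    for requirement, found in check_environment().items():
        print(f"{requirement}: {'ok' if found else 'MISSING'}")
```

Any entry reported as `MISSING` points to an install step that should be rerun before the later cells.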

## 2. Download YouTube Video

This section contains a function that downloads a YouTube video (video up to 480p merged with its audio track into an MP4 file) and saves it to a specified path.


In [None]:
import os

import yt_dlp

def download_youtube_video(url, download_path='videos'):
    """
    Downloads a YouTube video and saves it in the specified path.

    Parameters:
    url (str): The URL of the YouTube video to download.
    download_path (str): The directory where the downloaded video will be saved.

    Returns:
    str: The path to the downloaded video.
    """
    ydl_opts = {
        'format': 'bestvideo[height<=480]+bestaudio[ext=m4a]/mp4',
        'outtmpl': f'{download_path}/%(title)s.%(ext)s',
        'merge_output_format': 'mp4',
        'http_headers': {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0'
        }
    }

    os.makedirs(download_path, exist_ok=True)

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
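        info_dict = ydl.extract_info(url, download=True)
        # Ask yt-dlp for the file name it actually used: titles are sanitized
        # on disk, so rebuilding the path from the raw title can miss the file.
        base, _ = os.path.splitext(ydl.prepare_filename(info_dict))
        # merge_output_format forces the final container to MP4.
        video_path = f"{base}.mp4"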

    return video_path


In [None]:
# YouTube URL and download path

youtube_url = 'https://www.youtube.com/watch?v=UBSDX8Wyz0s'
download_path = 'videos'
video_path = download_youtube_video(youtube_url, download_path)

[youtube] Extracting URL: https://www.youtube.com/watch?v=UBSDX8Wyz0s
[youtube] UBSDX8Wyz0s: Downloading webpage




[youtube] UBSDX8Wyz0s: Downloading ios player API JSON
[youtube] UBSDX8Wyz0s: Downloading player 1f8742dc
[youtube] UBSDX8Wyz0s: Downloading web player API JSON
[youtube] UBSDX8Wyz0s: Downloading m3u8 information
[info] UBSDX8Wyz0s: Downloading 1 format(s): 244+140
[download] Destination: videos/Practice English Conversation (Family life - Illiteracy) Improve English Speaking Skills.f244.webm
[download] 100% of   14.79MiB in 00:00:00 at 26.45MiB/s  
[download] Destination: videos/Practice English Conversation (Family life - Illiteracy) Improve English Speaking Skills.f140.m4a
[download] 100% of    9.88MiB in 00:00:00 at 14.21MiB/s  
[Merger] Merging formats into "videos/Practice English Conversation (Family life - Illiteracy) Improve English Speaking Skills.mp4"
Deleting original file videos/Practice English Conversation (Family life - Illiteracy) Improve English Speaking Skills.f140.m4a (pass -k to keep)
Deleting original file videos/Practice English Conversation (Family life - Illite
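
The options dictionary is what controls which streams yt-dlp fetches. As a minimal sketch, the helper below rebuilds the same options with the height cap as a parameter. `build_ydl_opts` is a hypothetical helper, and the `audio_only` branch is an assumed variant that the notebook does not use; it could save bandwidth because only the audio is transcribed later.

```python
def build_ydl_opts(download_path="videos", max_height=480, audio_only=False):
    """Build a yt-dlp options dict (hypothetical helper, not in the notebook)."""
    if audio_only:
        # Assumed variant: prefer an m4a audio stream, else any audio-only stream.
        fmt = "bestaudio[ext=m4a]/bestaudio"
    else:
        # Same selector as the notebook, with the height cap parameterized.
        fmt = f"bestvideo[height<={max_height}]+bestaudio[ext=m4a]/mp4"

    opts = {
        "format": fmt,
        "outtmpl": f"{download_path}/%(title)s.%(ext)s",
    }
    if not audio_only:
        # Merge the separate video and audio streams into one MP4 file.
        opts["merge_output_format"] = "mp4"
    return opts


print(build_ydl_opts(max_height=720)["format"])
# prints: bestvideo[height<=720]+bestaudio[ext=m4a]/mp4
```

The returned dictionary can be passed to `yt_dlp.YoutubeDL(...)` in place of the hard-coded `ydl_opts`.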

## 3. Extract Audio from Video

This section contains a function to extract audio from the downloaded video file and save it as an MP3 file.


In [None]:
import ffmpeg

def extract_audio_from_video(video_path, audio_path):
    """
    Extracts audio from a video file and saves it as an MP3 file.

    Parameters:
    video_path (str): The path to the video file.
    audio_path (str): The path where the extracted audio will be saved.
    """
    if not os.path.isfile(video_path):
        print(f"Error: Video file does not exist: {video_path}")
        return

    try:
        (
            ffmpeg
            .input(video_path)
            .output(audio_path, format='mp3')
            .run(overwrite_output=True, capture_stdout=True, capture_stderr=True)
        )
        print(f"Audio extracted successfully to: {audio_path}")
    except ffmpeg.Error as e:
        print(f"FFmpeg error: {e.stderr.decode('utf8')}")
    except Exception as e:
        print(f"An unexpected error occurred: {str(e)}")


In [None]:
# Define the path for the extracted audio file

# Swap only the file extension; str.replace would also alter ".mp4" inside the title
audio_path = os.path.splitext(video_path)[0] + ".mp3"
extract_audio_from_video(video_path, audio_path)

Audio extracted successfully to: videos/Practice English Conversation (Family life - Illiteracy) Improve English Speaking Skills.mp3
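
For readers who want to see what the `ffmpeg-python` call amounts to, here is a rough sketch of an equivalent command line built by hand. Both helper names are hypothetical. The `-y` and `-f mp3` flags mirror `overwrite_output=True` and `format='mp3'` in the function above, while `-vn` is an added assumption that explicitly drops the video stream.

```python
from pathlib import Path


def audio_path_for(video_path, suffix=".mp3"):
    """Swap only the final extension of the video path."""
    return str(Path(video_path).with_suffix(suffix))


def build_ffmpeg_command(video_path, audio_path):
    """Return the argument list for a plain ffmpeg audio extraction."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it exists
        "-i", video_path,  # input file
        "-vn",             # drop the video stream (assumption, not in the notebook)
        "-f", "mp3",       # force the MP3 format, as format='mp3' does
        audio_path,
    ]


command = build_ffmpeg_command("clip.mp4", audio_path_for("clip.mp4"))
print(" ".join(command))
```

The list can be handed to `subprocess.run(command, check=True)` when the wrapper library is not available.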


## 4. Transcribe Audio

This section contains code to transcribe the extracted audio using the OpenAI Whisper model.


In [None]:
import whisper

# Load the Whisper model
model = whisper.load_model("medium")

# Transcribe the audio file
result = model.transcribe(audio_path)
print(result["text"])

# Store the transcription text
transcription = result["text"]


100%|██████████████████████████████████████| 1.42G/1.42G [00:14<00:00, 109MiB/s]


 Good morning, sir. Can you help me, please? It's the first time I come to this restaurant. Good morning, sir. Welcome to Johnny's Cafe. Here is the menu. What would you like to order? Well, can you tell me what's on the menu, please? I can't... Oh, can you hurry up, please? I have been waiting for 10 minutes. Hurry up! I'm sorry, sir. I know you have been waiting, but... I can't... Don't be sorry. Just tell the boy what you would like to order. That's all. Yes, sir. I'm sorry. Can you please tell me what's on the menu? Of course, sir. We have cheese sandwich, American coffee, fruit juice... Seriously? Are you going to tell him all the menu or what? Are you kidding? Why don't you take that menu and read it while other people order some food? Damn! Calm down, sir. I will suggest that to the gentleman. Listen... Excuse me, sir. If you want, you can take the menu and have a seat. In some minutes, I will go to your table and take your order after you have read the entire menu. Yes, I would
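
Besides `"text"`, the dictionary returned by `model.transcribe` also carries a `"segments"` list whose entries include `start`, `end`, and `text`. The sketch below formats those segments as timestamped lines, which makes a dialogue like the one above easier to follow. The helper names are hypothetical, and the sample dictionary is a hand-written stand-in with invented timings, not real Whisper output.

```python
def format_timestamp(seconds):
    """Convert seconds to HH:MM:SS, discarding fractions of a second."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(result):
    """Render each segment as '[start - end] text', one per line."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return "\n".join(lines)


# Hand-written stand-in for a Whisper result; the timings are invented.
sample = {
    "text": " Good morning, sir. Can you help me, please?",
    "segments": [
        {"start": 0.0, "end": 2.4, "text": " Good morning, sir."},
        {"start": 2.4, "end": 4.9, "text": " Can you help me, please?"},
    ],
}
print(format_segments(sample))
```

With the real `result` from the cell above, `format_segments(result)` would produce one line per recognized segment.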

In [None]:
transcription

" Good morning, sir. Can you help me, please? It's the first time I come to this restaurant. Good morning, sir. Welcome to Johnny's Cafe. Here is the menu. What would you like to order? Well, can you tell me what's on the menu, please? I can't... Oh, can you hurry up, please? I have been waiting for 10 minutes. Hurry up! I'm sorry, sir. I know you have been waiting, but... I can't... Don't be sorry. Just tell the boy what you would like to order. That's all. Yes, sir. I'm sorry. Can you please tell me what's on the menu? Of course, sir. We have cheese sandwich, American coffee, fruit juice... Seriously? Are you going to tell him all the menu or what? Are you kidding? Why don't you take that menu and read it while other people order some food? Damn! Calm down, sir. I will suggest that to the gentleman. Listen... Excuse me, sir. If you want, you can take the menu and have a seat. In some minutes, I will go to your table and take your order after you have read the entire menu. Yes, I woul

## 5. Analyze Transcription

This section contains functions to generate various analyses from the transcription, including abstract summaries, key points, action items, sentiment analysis, and detailed summaries using HuggingFace models.


In [None]:
import torch
from langchain.chains import LLMChain
from langchain_community.llms import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

In [None]:
from huggingface_hub import login
from google.colab import userdata

# Log in to HuggingFace
HUGGING_FACE_TOKEN = userdata.get("HUGGING_FACE_TOKEN")
login(HUGGING_FACE_TOKEN)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
# Configure model quantization for efficient inference
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Load the Mistral model
model_4bit = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/596 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/2.10k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]
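
The shard sizes in the download log above give a feel for why 4-bit quantization matters. The back-of-the-envelope sketch below assumes the checkpoint stores each weight in 2 bytes (16-bit) and ignores quantization constants, activations, and the KV cache, so the figures are rough lower bounds rather than measured memory use. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in GB (10**9 bytes), ignoring overhead."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter count inferred from the three shards in the log:
# 4.94 + 5.00 + 4.54 = 14.48 GB, assumed to be 2 bytes per weight.
approx_params = 14.48e9 / 2

print(f"16-bit weights: {weight_memory_gb(approx_params, 16):.2f} GB")
print(f" 4-bit weights: {weight_memory_gb(approx_params, 4):.2f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the 16-bit footprint, which is what lets a 7B model fit on a typical Colab GPU.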

In [None]:
# Initialize the text generation pipeline

pipeline_inst = pipeline(
        "text-generation",
        model=model_4bit,
        tokenizer=tokenizer,
        use_cache=True,
        device_map="auto",
        temperature=0.1,
        max_new_tokens=612,
        do_sample=True,
        top_k=5,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
)
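
The settings `do_sample=True`, `temperature=0.1`, and `top_k=5` together make sampling nearly deterministic. The pure-Python sketch below illustrates the idea on a toy vocabulary: keep the five highest logits, divide them by the temperature, and apply a softmax. It is an illustration of the technique only, not the code path `transformers` runs internally, and the logits are made-up numbers.

```python
import math


def top_k_temperature_probs(logits, temperature=0.1, top_k=5):
    """Return sampling probabilities after top-k filtering and temperature scaling."""
    # Keep the indices of the k largest logits; all other tokens get probability 0.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = {i: logits[i] / temperature for i in keep}
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled.values())
    weights = {i: math.exp(value - peak) for i, value in scaled.items()}
    total = sum(weights.values())
    return [weights[i] / total if i in weights else 0.0 for i in range(len(logits))]


toy_logits = [2.0, 1.5, 1.0, 0.5, 0.0, -1.0, -2.0]  # made-up values
sharp = top_k_temperature_probs(toy_logits, temperature=0.1, top_k=5)
flat = top_k_temperature_probs(toy_logits, temperature=1.0, top_k=5)
print([round(p, 4) for p in sharp])
print([round(p, 4) for p in flat])
```

At temperature 0.1 almost all probability mass lands on the top token, whereas at 1.0 it is spread across the five kept tokens.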

In [None]:
llm = HuggingFacePipeline(pipeline=pipeline_inst)

  warn_deprecated(


In [None]:
def abstract_summary_extraction(transcription):
    template = """You are a highly skilled AI trained in language comprehension and summarization. I would like you to
              read the following text and summarize it into a concise abstract paragraph. Aim to retain the most
              important points, providing a coherent and readable summary that could help a person understand the
              main points of the discussion without needing to read the entire text. Please avoid unnecessary
              details or tangential points. Only provide the summary as response
    -------------------
    Text: {transcription}
    -------------------
    """
    prompt = PromptTemplate(template=template, input_variables=["transcription"])

    llm_chain = LLMChain(prompt=prompt, llm=llm)
    response = llm_chain.run({"transcription":transcription})
    return response
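
Because the template is an indented triple-quoted string, the indentation becomes part of the prompt sent to the model. As a small sketch, `textwrap.dedent` removes it, and `str.format` stands in for the `{transcription}` substitution that `PromptTemplate` performs. The shortened wording and the names `TEMPLATE` and `render_prompt` are illustrative assumptions, not the notebook's own code.

```python
import textwrap

# Shortened version of the abstract-summary prompt, for illustration only.
TEMPLATE = textwrap.dedent("""\
    You are a highly skilled AI trained in language comprehension and summarization.
    Read the following text and summarize it into a concise abstract paragraph.
    Only provide the summary as the response.
    -------------------
    Text: {transcription}
    -------------------
    """)


def render_prompt(transcription):
    """Fill the placeholder the way an f-string-style template would."""
    return TEMPLATE.format(transcription=transcription.strip())


print(render_prompt(" Good morning, sir. Can you help me, please?"))
```

The dedented string can be passed to `PromptTemplate(template=..., input_variables=["transcription"])` unchanged.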

In [None]:
def key_points_extraction(transcription):
    # Adjacent string literals are concatenated; {transcription} is filled in by PromptTemplate
    template = (
        "You are a proficient AI with a specialty in distilling information into key points. "
        "Based on the following text, identify and list the main points that were discussed or brought up. "
        "These should be the most important ideas, findings, or topics that are crucial to the essence of the discussion. "
        "Your goal is to provide a list that someone could read to quickly understand what was talked about.\n\n"
        "Text:\n{transcription}\n\nKey Points:"
    )

    prompt = PromptTemplate(template=template, input_variables=["transcription"])

    llm_chain = LLMChain(prompt=prompt, llm=llm)
    response = llm_chain.run({"transcription":transcription})
    return response

In [None]:
def action_item_extraction(transcription):
    # Adjacent string literals are concatenated; {transcription} is filled in by PromptTemplate
    template = (
        "You are an AI expert in analyzing conversations and extracting action items. "
        "Please review the text and identify any tasks, assignments, or actions that were agreed upon or mentioned as needing to be done. "
        "These could be tasks assigned to specific individuals, or general actions that the group has decided to take. "
        "Please list these action items clearly and concisely.\n\n"
        "Text:\n{transcription}\n\nAction Items:"
    )

    prompt = PromptTemplate(template=template, input_variables=["transcription"])

    llm_chain = LLMChain(prompt=prompt, llm=llm)
    response = llm_chain.run({"transcription":transcription})
    return response

In [None]:
def sentiment_analysis(transcription):
    # Adjacent string literals are concatenated; {transcription} is filled in by PromptTemplate
    template = (
        "As an AI with expertise in language and emotion analysis, your task is to analyze the sentiment of the following text. "
        "Please consider the overall tone of the discussion, the emotion conveyed by the language used, and the context in which words and phrases are used. "
        "Indicate whether the sentiment is generally positive, negative, or neutral, and provide brief explanations for your analysis where possible.\n\n"
        "Text:\n{transcription}\n\nSentiment Analysis:"
    )

    prompt = PromptTemplate(template=template, input_variables=["transcription"])

    llm_chain = LLMChain(prompt=prompt, llm=llm)
    response = llm_chain.run({"transcription":transcription})
    return response

In [None]:
def detailed_summary(transcription):
    template = '''Summarize the following video transcription in detail. Ensure that you cover the following aspects:
            1. Provide an introduction that includes the title, main topic, and purpose of the video.
            2. Divide the transcription into sections based on topic changes, speakers, or segments, and summarize each section.
            3. Highlight all key points, arguments, data, statistics, examples, and anecdotes.
            4. Extract important quotes, definitions, step-by-step processes, and instructions.
            5. Note significant visual or audio elements such as slides, graphics, demonstrations, and changes in tone or emotion.
            6. List any action items, recommendations, or next steps given in the video.
            7. Conclude with the speaker’s closing remarks, calls to action, and information on additional resources or contacts.

            Transcription:
            [{transcription}]'''
    prompt = PromptTemplate(template=template, input_variables=["transcription"])

    llm_chain = LLMChain(prompt=prompt, llm=llm)
    response = llm_chain.run({"transcription":transcription})
    return response

In [None]:
# Generate different types of analysis from the transcription

abstract = abstract_summary_extraction(transcription)
key_points = key_points_extraction(transcription)
action_items = action_item_extraction(transcription)
sentiment = sentiment_analysis(transcription)
# Use a distinct variable name so the result does not shadow the function
detailed_summary_text = detailed_summary(transcription)

  warn_deprecated(
  warn_deprecated(


In [None]:
# Display results
print("Abstract Summary:\n", abstract)
print("Key Points:\n", key_points)
print("Action Items:\n", action_items)
print("Sentiment Analysis:\n", sentiment)
print("Detailed Summary:\n", detailed_summary_text)

Abstract Summary:
 You are a highly skilled AI trained in language comprehension and summarization. I would like you to
              read the following text and summarize it into a concise abstract paragraph. Aim to retain the most
              important points, providing a coherent and readable summary that could help a person understand the
              main points of the discussion without needing to read the entire text. Please avoid unnecessary
              details or tangential points. Only provide the summary as response
    -------------------
    Text:  Good morning, sir. Can you help me, please? It's the first time I come to this restaurant. Good morning, sir. Welcome to Johnny's Cafe. Here is the menu. What would you like to order? Well, can you tell me what's on the menu, please? I can't... Oh, can you hurry up, please? I have been waiting for 10 minutes. Hurry up! I'm sorry, sir. I know you have been waiting, but... I can't... Don't be sorry. Just tell the boy what you
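
The output above starts with the prompt itself, which suggests that the pipeline returns the prompt followed by the completion. One way to keep only the model's answer is to cut the prompt off after generation, as in the sketch below. `strip_prompt_echo` is a hypothetical helper, and it assumes the echoed prompt is reproduced character for character. The `text-generation` pipeline also accepts `return_full_text=False`, which may achieve the same effect at the source, depending on how the LangChain wrapper post-processes the output.

```python
def strip_prompt_echo(prompt, generated):
    """Return only the text the model added after the prompt."""
    if generated.startswith(prompt):
        # Drop the echoed prompt and any whitespace that follows it.
        return generated[len(prompt):].lstrip()
    # No echo detected: return the text as is, minus surrounding whitespace.
    return generated.strip()


prompt = "Text: Good morning, sir.\nSummary:"
generated = prompt + " A customer greets a waiter."
print(strip_prompt_echo(prompt, generated))
# prints: A customer greets a waiter.
```

Applying this inside each analysis function would require the fully rendered prompt, which `prompt.format(transcription=transcription)` provides.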

## Project Motivation

I started this project to create a comprehensive tool for extracting and analyzing content from YouTube videos. With the increasing amount of valuable information shared on YouTube, there is a need for efficient ways to process and understand video content without watching the entire video. This tool can help by providing concise summaries, key points, action items, and sentiment analysis from the video transcripts.

## Use Cases

This tool can be helpful in various scenarios, including:

- **Education:** Summarize lectures and extract key points for easier studying.
- **Business Meetings:** Capture action items and key discussion points from meetings.
- **Content Creation:** Generate summaries and insights from interviews, podcasts, and webinars.
- **Research:** Analyze video content for research purposes, extracting valuable information efficiently.
- **Marketing:** Understand customer feedback from video reviews and testimonials.

## Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.
