# YouTube Video Summarization

## My First Frontier LLM Project!

Welcome to my first LLM-based project! The goal is to use large language models (LLMs) to summarize YouTube videos, so that instead of watching an entire video you can simply read the summary. Currently, it only supports videos with English transcripts.

## Important Note
Be mindful when testing with longer videos: a longer transcript means more tokens sent to the model, which can lead to high costs on your OpenAI API bill.
To reduce costs, you can switch to Ollama, which runs models locally for free.
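
One way to make the backend switchable is sketched below. It assumes Ollama is running locally and serving its OpenAI-compatible endpoint at the default address `http://localhost:11434/v1`. The helper name `client_settings` and the model name `llama3.2` are illustrative choices and not part of this notebook.

```python
def client_settings(use_ollama: bool) -> dict:
    """Return keyword arguments for OpenAI(...) plus a model name.

    Assumption: Ollama is running locally on its default port (11434).
    """
    if use_ollama:
        return {
            "client_kwargs": {
                "base_url": "http://localhost:11434/v1",
                # Ollama ignores the key, but the client needs a non-empty value
                "api_key": "ollama",
            },
            "model": "llama3.2",  # hypothetical: use any model you have pulled
        }
    # Default: the OpenAI client reads OPENAI_API_KEY from the environment
    return {"client_kwargs": {}, "model": "gpt-4o-mini"}


settings = client_settings(use_ollama=True)
print(settings["model"], settings["client_kwargs"].get("base_url"))
```

With these settings, `OpenAI(**settings["client_kwargs"])` would create the client, and `settings["model"]` would take the place of the hard-coded model name in the summarization call.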


In [1]:
!pip install youtube-transcript-api openai python-dotenv



In [2]:
# imports

import os
import re

from dotenv import load_dotenv
from IPython.display import Markdown, display
from openai import OpenAI
from youtube_transcript_api import YouTubeTranscriptApi

# If you get an error running this cell, then please head over to the troubleshooting notebook!

In [3]:
# Load environment variables in a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [4]:
openai = OpenAI()

In [5]:
class YoutubeVideoID:
    def __init__(self, url):
        self.url = url
        self.video_id = self.extract_video_id(url)

    def extract_video_id(self, url):
        """
        Extracts the YouTube video ID from a given URL.
        Supports both regular and shortened URLs.
        """
        # Regular expression to match YouTube video URL and extract the video ID
        regex = r"(?:https?:\/\/)?(?:www\.)?(?:youtube\.com\/(?:[^\/\n\s]+\/\S+\/|\S*\?v=)|(?:youtu\.be\/))([a-zA-Z0-9_-]{11})"
        match = re.match(regex, url)
        
        if match:
            return match.group(1)
        else:
            raise ValueError("Invalid YouTube URL")

    def __str__(self):
        return f"Video ID: {self.video_id}"

In [6]:
# Example usage
video_url = "https://www.youtube.com/watch?v=kqaMIFEz15s"

yt_video = YoutubeVideoID(video_url)
print(yt_video)

Video ID: kqaMIFEz15s
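
The regex in `extract_video_id` expects `?v=` to be the first query parameter. As a hedged alternative, the sketch below parses the URL with the standard library instead, so extra parameters such as `feature=share` or `t=30` do not matter. The function is a hypothetical stand-in, not the class used in this notebook, and it keeps the same assumption that a video ID is 11 characters from `[a-zA-Z0-9_-]`.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Hypothetical alternative to the regex, built on urllib.parse."""
    # urlparse only fills in the hostname when a scheme is present
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        # Regular links carry the ID in the "v" query parameter
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        candidate = ""

    if not re.fullmatch(r"[a-zA-Z0-9_-]{11}", candidate):
        raise ValueError("Invalid YouTube URL")
    return candidate


print(extract_video_id("https://www.youtube.com/watch?feature=share&v=kqaMIFEz15s"))
```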


In [7]:
def get_transcript(video_id, language='en'):
    try:
        # Try to get the transcript in the desired language (English by default)
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=[language])
        # Join all the 'text' fields into a single string
        return " ".join([item['text'] for item in transcript])
    except Exception as e:
        print(f"Error fetching transcript: {e}")
        return None


In [8]:
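# Fetch transcript using the video ID
transcript_text = get_transcript(yt_video.video_id)
# get_transcript returns None on failure, so check before calling len()
if transcript_text is None:
    raise RuntimeError("Transcript could not be fetched; see the error above")
print(len(transcript_text))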

16073
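
To get a feel for what a transcript of this size costs, the sketch below turns the character count into a rough usage estimate. The 4-characters-per-token ratio is an assumed rule of thumb for English text, not an exact tokenizer count. The chunk size (3000 characters) and output cap (200 tokens) are the values used later in this notebook. The request count is a lower bound because splitting on sentence boundaries can produce more chunks, and the system prompt sent with each request is not counted.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough average for English text


def estimate_usage(n_chars: int, chunk_size: int = 3000, max_output_tokens: int = 200) -> dict:
    """Rough token and request estimate for a transcript of n_chars characters."""
    min_requests = math.ceil(n_chars / chunk_size)
    return {
        "input_tokens": math.ceil(n_chars / CHARS_PER_TOKEN),
        "min_requests": min_requests,
        # Each request can return at most max_output_tokens tokens
        "max_output_tokens": min_requests * max_output_tokens,
    }


print(estimate_usage(16073))
```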


In [9]:
# Function to summarize text using the OpenAI Chat Completions API
def summarize_text(text):
    try:
        system_prompts = """
        You are a helpful assistant who provides concise and accurate summaries of text. Your task is to:
        
        - Capture the key points of the content.
        - Keep the summary brief and easy to understand.
        - Avoid overly long summaries, but do not make them so short that key points are lost.
        - Use bullet points where appropriate to enhance clarity and structure.
        """
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompts},
                {"role": "user", "content": f"Summarize the following text:\n{text}"}
            ],
            max_tokens=200  # hard cap on output length; a summary that reaches it is cut off mid-sentence
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error summarizing text: {e}")
        return None
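
Because `max_tokens` caps the output, a summary can stop mid-sentence. A chat completion reports this through the `finish_reason` field of each choice, which is `"length"` when the token limit was reached. The sketch below shows one way to check for that. The helper name and the `fake_response` object are hypothetical stand-ins so the example runs without calling the API.

```python
from types import SimpleNamespace


def summary_was_truncated(response) -> bool:
    """True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"


# Hypothetical stand-in shaped like a chat completion response
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="- Key point one\n- Key point"),
        )
    ]
)

if summary_was_truncated(fake_response):
    print("Summary hit the max_tokens limit and may end mid-sentence")
```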

In [10]:
def split_text(text, chunk_size=3000):
    """
    Splits large text into smaller chunks based on the given chunk size.
    Ensures that chunks end with a full stop where possible to maintain sentence integrity.
    
    :param text: str, the text to be split
    :param chunk_size: int, maximum size of each chunk (default 3000 characters)
    :return: list of str, where each str is a chunk of text
    """
    chunks = []
    while len(text) > chunk_size:
        # Find the last full stop within the first chunk_size characters
        split_point = text.rfind('.', 0, chunk_size)
        if split_point == -1:  # No full stop found: fall back to a hard cut
            split_point = chunk_size - 1
        
        # Keep the full stop with its sentence, then continue with the rest
        chunks.append(text[:split_point + 1])
        text = text[split_point + 1:]
    
    # Add the remaining text as the final chunk, unless only whitespace is left
    if text.strip():
        chunks.append(text.strip())
    
    return chunks

transcript_chunks = split_text(transcript_text)

# Summarize each chunk individually, skipping chunks whose request failed
summaries = []
for chunk in transcript_chunks:
    summary = summarize_text(chunk)
    if summary:
        summaries.append(summary)


# Combine the individual summaries, separated by blank lines so that
# Markdown lists from adjacent chunks do not run together
full_summary = "\n\n".join(summaries)
display(Markdown(full_summary))


- Two years ago, the narrator discussed cybersecurity trends for 2023 and 2024, and is now predicting for 2025 and beyond.
- A review of last year's predictions shows substantial developments:
  - **Passkeys Adoption**: The transition from passwords to passkeys (FIDO technology) has seen significant growth, with 4.2 million passkeys saved and one in three users utilizing them.
  - **AI Phishing**: Generative AI has improved phishing attempts, creating highly personalized and legitimate-looking emails, reducing errors that were once common in such scams.
  - **Deepfakes**: A significant incident occurred where a deepfake impersonated a CFO in a video call, resulting in a $25 million fraud. This was noted as a concerning trend during events like the 2024 presidential election. 

Overall, the narrator's previous predictions about the advancement of cybersecurity threats and technologies appear to have largely materialized.

- **Deepfake Incident**: A deepfake robocall featuring Joe Biden misled voters about the necessity of participating in the primary election, suggesting they could save their votes for the general election.
  
- **Issues with Generative AI**: 
  - Generative AI sometimes generates false information (referred to as "hallucinations").
  - An example involved converting a friend's running pace from kilometers to miles, where an AI chatbot incorrectly reported a world record time before finally correcting it to a more realistic pace.

- **Concerns About AI Security**:
  - There is a growing need for cybersecurity measures to protect AI deployments, which has become a primary concern for companies.
  - The speaker reports that securing AI is the top question from clients regarding their AI strategies.

- **AI in Cybersecurity**: 
  - Predictive uses for AI in cybersecurity include developing advanced chatbots to assist analysts with accurate, fact-based responses using retrieval-augmented generation (RAG) technology

The text discusses the future implications of technology and AI on cybersecurity, highlighting both positive and negative trends. Key points include:

- **Advancements in Technology**: Trends in technology, particularly generative AI, are emerging in the market, aiding case management and incident tracking.
- **Generative AI Benefits**: Generative AI can effectively summarize incidents, assisting in the transition of cases between individuals.
- **Looking Ahead**: While past trends in technology may evolve, AI will play a significant role in shaping the future, with both advantages and disadvantages.
- **Concerns About Shadow AI**: Unauthorized AI deployments (referred to as "shadow AI") pose risks, such as data leakage and operational issues within organizations.
- **Deepfake Risks**: The growing sophistication of deepfake technology presents dangers to businesses and governments, exemplified by significant financial fraud cases.

Overall, the piece emphasizes the dual nature of technological advances, particularly in AI, highlighting both their potential benefits and the serious risks they may

- **Reliability of Sources**: Concerns are raised about the authenticity of information, particularly how deepfakes can mislead people, influencing beliefs.
  
- **Legal Implications**: The text discusses the challenges deepfakes pose in legal contexts, such as evidence in court. This could create reasonable doubt about the authenticity of evidence (e.g., whether a video shows the actual perpetrator of a crime).

- **Cybersecurity Threats**: Generative AI can be exploited to write malware, making cyber attacks easier for those without technical skills. A study indicated that generative AI chatbots could generate exploit code successfully a high percentage of the time.

- **Increase in Attacks**: A major online retailer reported a significant increase in cyber attacks, attributed to the misuse of generative AI technologies.

- **Expanded Attack Surface**: Each new technology, including AI, increases the points of entry for cyber attacks. This includes risks associated with "shadow AI" and the potential for data

- **Prompt Injection Concerns**: 
  - Generative AI is vulnerable to prompt injection attacks, where it is manipulated to perform unintended actions.
  - This presents a significant security risk, as mentioned by the OWASP organization as the top attack type against large language models.
  - There is currently a need for improved defenses against these attacks.

- **Positive Use of AI in Cybersecurity**:
  - Despite the risks, AI can be leveraged to enhance cybersecurity measures.
  - Generative AI can assist in threat analysis and provide expert advice on potential responses to security breaches, rather than automating responses directly.
  - This approach allows cybersecurity professionals to evaluate and choose between suggested actions.

- **Quantum Computing Threats**:
  - Quantum computers pose a future risk to cryptography, as they could potentially break current encryption methods.
  - While quantum technology has beneficial applications, its ability to compromise encrypted messages is a significant concern that needs to be addressed, with uncertainty regarding the timeline of when

- Emphasis on the urgency of transitioning to quantum-safe or post-quantum cryptographic algorithms to protect data from potential quantum computer attacks.
- Concern over the “harvest now, decrypt later” tactic, where attackers can store data now and decrypt it later when quantum computers become powerful enough.
- Highlighting the risks for sensitive information, especially related to nation-states that may remain classified for long periods.
- Anticipation of organizations beginning to adopt quantum-safe cryptography projects.
- Mention of available resources for deeper insights on the IBM Technology Channel related to these topics.
- Invitation for readers to share their own predictions in the comments for community knowledge exchange.
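
The sentence-boundary chunking that feeds these summaries can be checked in isolation. The sketch below is a compact re-implementation written for illustration (the name `split_on_sentences` is hypothetical, and it is not the notebook's exact `split_text`). It cuts after the last full stop that fits in each chunk, falls back to a hard cut when there is none, and keeps whitespace so the chunks join back into the original text.

```python
def split_on_sentences(text: str, chunk_size: int = 3000) -> list[str]:
    """Split text into chunks of at most chunk_size characters."""
    chunks = []
    while len(text) > chunk_size:
        # Last full stop that still fits inside the chunk
        cut = text.rfind(".", 0, chunk_size)
        # No full stop found: hard cut; otherwise keep the full stop
        cut = chunk_size if cut == -1 else cut + 1
        chunks.append(text[:cut])
        text = text[cut:]
    if text.strip():
        chunks.append(text)
    return chunks


sample = "First sentence. Second sentence. Third sentence."
parts = split_on_sentences(sample, chunk_size=20)
print(parts)
```

Two properties are worth verifying on any splitter of this kind: no chunk exceeds `chunk_size`, and no text is lost between chunks.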