This project uses Gemini AI to analyze and summarize entire YouTube channels, providing insights for creators and for the platform itself. Because the data volume is substantial (potentially on the order of a million tokens), the analysis focuses on the video content itself. Transcribing and summarizing many videos from a single channel makes it possible to identify trends across the channel and to find opportunities for content optimization.

Concept:

This project aims to create a tool that, given a YouTube channel URL and a user-defined number of videos, will automatically download video audio, transcribe it, and consolidate the data. This aggregated text will then be processed by Gemini AI to extract meaningful summaries, identifying similarities, variations, improvements, and learning points within the channel's content. This analysis will allow creators to understand their audience, optimize content strategies, and potentially identify emerging trends.

Technical Approach:

The algorithm will incorporate:

Automated Video Audio Download: Efficiently downloading audio from a given YouTube channel's videos.

Automated Transcription: Accurate transcription of downloaded audio into text.

Data Aggregation: Consolidating transcribed text into a comprehensive data set for Gemini AI analysis.

Gemini AI Analysis: Employing Gemini AI's capabilities for summarization, trend identification, and extraction of key insights.

Insight Presentation: Presenting analyzed data in a user-friendly format, potentially highlighting key themes, top-performing videos, and audience engagement patterns.
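
As a rough illustration of the data aggregation step, the sketch below estimates whether a set of transcripts fits within a model's context window before it is sent for analysis. The four-characters-per-token ratio and the one-million-token budget are assumptions chosen for illustration, and the function names are hypothetical; an exact count would need the model's own tokenizer.

```python
# Illustrative sketch only: both constants below are assumptions,
# not values taken from this notebook or from the Gemini documentation.
CHARS_PER_TOKEN = 4          # common rule of thumb for English text
CONTEXT_BUDGET = 1_000_000   # hypothetical token budget


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def select_within_budget(transcripts: list[str], budget: int = CONTEXT_BUDGET) -> list[str]:
    """Keep transcripts, in order, until the next one would exceed the budget."""
    selected = []
    used = 0
    for text in transcripts:
        cost = estimate_tokens(text)
        if used + cost > budget:
            break
        selected.append(text)
        used += cost
    return selected
```

Stopping at the first transcript that does not fit keeps the selection contiguous, which is simpler to reason about than skipping ahead to smaller transcripts.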

Potential Applications and Benefits:

Creator Insights: Assist YouTube channel owners in understanding their audience and improving content quality. Identify popular topics, trends, and areas for potential improvement.

YouTube Platform Enhancement: Provide data-driven insights to YouTube for algorithm enhancement, content trend identification, and improved content recommendation.




In [2]:
from IPython.display import HTML

# Embed the YouTube video
HTML("""
<iframe width="560" height="315" src="https://www.youtube.com/embed/F2PV2qsE3w4" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
""")




In [1]:
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("GENAI_API_KEY")


**Clear the working directory before each run so that files from earlier runs are not mixed up with the new ones.**

In [5]:
import os
import shutil

# Delete all files and folders under /kaggle/working to start fresh
working_directory = "/kaggle/working"
for filename in os.listdir(working_directory):
    file_path = os.path.join(working_directory, filename)
    try:
        if os.path.isfile(file_path) or os.path.islink(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            shutil.rmtree(file_path)
    except Exception as e:
        print(f'Failed to delete {file_path}. Reason: {e}')

In [6]:
!pip install yt-dlp openai-whisper
!apt-get install -y ffmpeg


Collecting yt-dlp
  Downloading yt_dlp-2024.10.22-py3-none-any.whl.metadata (171 kB)
     171.6/171.6 kB 5.4 MB/s eta 0:00:00
Collecting openai-whisper
  Downloading openai-whisper-20240930.tar.gz (800 kB)
     800.5/800.5 kB 29.2 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting mutagen (from yt-dlp)
  Downloading mutagen-1.47.0-py3-none-any.whl.metadata (1.7 kB)
Collecting pycryptodomex (from yt-dlp)
  Downloading pycryptodomex-3.21.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting websockets>=13.0 (from yt-dlp)
  Downloading websockets-13.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


**How to Download and Analyze YouTube Videos**

You can download the audio of either a **single video** or **multiple videos** from a channel for further analysis. Follow these steps to get started:

### **1. Purpose**
This tool **downloads the audio of YouTube videos** and then **analyzes** the content. This is useful for research, content summarization, or insight extraction.

### **2. Options Available**
You have two main options:

- **Download a Single Video**
- **Download Multiple Videos from a Channel**

### **3. Download a Single Video**
To download just one video:

1. At the **Download Type** prompt, enter **`specific`**.
2. At the next prompt, provide the **URL** of the video.
   - Example: [https://www.youtube.com/watch?v=az210VxLulE](https://www.youtube.com/watch?v=az210VxLulE)

This option downloads the audio of one video of your choice for analysis.

### **4. Download Multiple Videos**
If you want to download multiple videos from a YouTube channel:

1. At the **Download Type** prompt, enter **`multiple`**.
2. Provide the **URL of the YouTube channel's Videos section** (a playlist URL also works).
   - Example: https://www.youtube.com/@Google/videos
3. Enter the **number of videos** you want to download.

This option downloads the audio of up to the specified number of videos for analysis.

**Note**: Be cautious when selecting longer videos, as processing them may take significant time and resources. During testing, consider upgrading to **Colab Pro** for better performance and faster processing.

### **5. Start Download**
Run the download cell and answer the prompts. The download begins as soon as the last prompt is answered; the prompt-based cell has no separate **"Download"** button.

Following these steps will download the audio you need for the transcription and analysis cells that follow.
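
The two options differ only in whether the number of videos is capped. The sketch below builds the option dictionary without prompting, which can help when the notebook is run non-interactively. The helper name `make_audio_opts` is hypothetical; the option keys are the ones used by the download cell in this notebook.

```python
import os


def make_audio_opts(download_folder="./downloads", num_videos=None):
    """Build yt-dlp options for MP3 audio extraction (hypothetical helper)."""
    opts = {
        "format": "bestaudio/best",
        "outtmpl": os.path.join(download_folder, "%(title)s.%(ext)s"),
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }],
    }
    if num_videos is not None:
        if num_videos <= 0:
            raise ValueError("num_videos must be a positive integer")
        # Cap the number of playlist or channel entries that get processed.
        opts["playlistend"] = num_videos
    return opts
```

With `yt_dlp` installed, the result can be passed straight to the downloader, for example `yt_dlp.YoutubeDL(make_audio_opts(num_videos=3)).download([url])`.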





In [64]:
# Prompt-based approach: the earlier widget version did not display its input options.

import yt_dlp
import os

# Function to download specific video or multiple videos based on input
def download_videos():
    # Prompt user to choose download type
    choice = input("Download Type (Enter 'specific' for a single video or 'multiple' for a playlist/channel): ").strip().lower()
    
    # Validate choice and gather additional inputs based on the choice
    if choice == 'specific':
        specific_video_url = input("Enter YouTube video URL: ").strip()
        if specific_video_url:
            download_specific_video(specific_video_url)
        else:
            print("No URL provided. Exiting.")
    elif choice == 'multiple':
        youtube_url = input("Enter YouTube playlist or channel URL: ").strip()
        try:
            num_videos = int(input("Enter the number of videos to download: ").strip())
        except ValueError:
            print("Invalid number entered. Exiting.")
            return
        if youtube_url and num_videos > 0:
            download_multiple_videos(youtube_url, num_videos)
        else:
            print("Invalid URL or number of videos. Exiting.")
    else:
        print("Invalid choice. Please enter 'specific' or 'multiple'.")

# Shared yt-dlp options: fetch the best audio stream and convert it to MP3
def _audio_ydl_opts(download_folder, num_videos=None):
    os.makedirs(download_folder, exist_ok=True)
    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': os.path.join(download_folder, '%(title)s.%(ext)s'),
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }]
    }
    if num_videos is not None:
        # Stop after the first num_videos entries of the playlist or channel
        ydl_opts['playlistend'] = num_videos
    return ydl_opts

# Helper function to download a specific video
def download_specific_video(url):
    with yt_dlp.YoutubeDL(_audio_ydl_opts("./downloads")) as ydl:
        ydl.download([url])
    print(f"Downloaded: {url}")

# Helper function to download multiple videos from a playlist or channel
def download_multiple_videos(url, num_videos):
    with yt_dlp.YoutubeDL(_audio_ydl_opts("./downloads", num_videos)) as ydl:
        ydl.download([url])
    print(f"Downloaded {num_videos} videos from: {url}")

# Run the main function to prompt for inputs and start downloading
download_videos()


Download Type (Enter 'specific' for a single video or 'multiple' for a playlist/channel):  specific
Enter YouTube video URL:  https://www.youtube.com/watch?v=az210VxLulE


[youtube] Extracting URL: https://www.youtube.com/watch?v=az210VxLulE
[youtube] az210VxLulE: Downloading webpage
[youtube] az210VxLulE: Downloading ios player API JSON
[youtube] az210VxLulE: Downloading mweb player API JSON
[youtube] az210VxLulE: Downloading m3u8 information
[info] az210VxLulE: Downloading 1 format(s): 251-7
[download] Destination: ./downloads/The Smallest, Most Teeny Tiny, Miniature Museum Tour by @TheSquaretoSpare.webm
[download] 100% of    4.96MiB in 00:00:00 at 14.43MiB/s  
[ExtractAudio] Destination: ./downloads/The Smallest, Most Teeny Tiny, Miniature Museum Tour by @TheSquaretoSpare.mp3
Deleting original file ./downloads/The Smallest, Most Teeny Tiny, Miniature Museum Tour by @TheSquaretoSpare.webm (pass -k to keep)
Downloaded: https://www.youtube.com/watch?v=az210VxLulE


In [65]:
'''# Kaggle Notebook Code to Download YouTube Videos with Conditional Input Widgets

import yt_dlp
import os
import ipywidgets as widgets
from IPython.display import display

# Create widgets for user input
choice_widget = widgets.Dropdown(
    options=['specific', 'multiple'],
    value='specific',
    description='Download Type:',
    disabled=False,
)

specific_video_url_widget = widgets.Text(
    value='',
    placeholder='Enter YouTube video URL here',
    description='Video URL:',
    disabled=False,
)

youtube_url_widget = widgets.Text(
    value='',
    placeholder='Enter YouTube playlist/channel URL here',
    description='Playlist URL:',
    disabled=True,
)

num_videos_widget = widgets.IntText(
    value=5,
    description='Number of Videos:',
    disabled=True,
)

# Function to enable or disable the input widgets based on choice
def update_widgets(change):
    if choice_widget.value == 'specific':
        specific_video_url_widget.disabled = False
        youtube_url_widget.disabled = True
        num_videos_widget.disabled = True
    elif choice_widget.value == 'multiple':
        specific_video_url_widget.disabled = True
        youtube_url_widget.disabled = False
        num_videos_widget.disabled = False

# Attach the update function to the choice widget
choice_widget.observe(update_widgets, names='value')

# Initial call to set widget states correctly
update_widgets(None)

# Display widgets
display(choice_widget, specific_video_url_widget, youtube_url_widget, num_videos_widget)

# Button to trigger download
download_button = widgets.Button(
    description="Download Videos",
    button_style='success',
    icon='download'
)

output = widgets.Output()

# Function to handle button click
def on_download_button_click(b):
    with output:
        output.clear_output()  # Clear previous outputs
        download_folder = "./downloads"
        if not os.path.exists(download_folder):
            os.makedirs(download_folder)
        
        choice = choice_widget.value
        specific_video_url = specific_video_url_widget.value
        youtube_url = youtube_url_widget.value
        num_videos = num_videos_widget.value
        
        if choice == 'specific' and specific_video_url:
            ydl_opts = {
                'format': 'bestaudio/best',
                'outtmpl': os.path.join(download_folder, '%(title)s.%(ext)s'),
                'postprocessors': [{
                    'key': 'FFmpegExtractAudio',
                    'preferredcodec': 'mp3',
                    'preferredquality': '192',
                }]
            }
            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
                ydl.download([specific_video_url])
            print(f"Downloaded: {specific_video_url}")
        elif choice == 'multiple' and youtube_url:
            ydl_opts = {
                'format': 'bestaudio/best',
                'outtmpl': os.path.join(download_folder, '%(title)s.%(ext)s'),
                'postprocessors': [{
                    'key': 'FFmpegExtractAudio',
                    'preferredcodec': 'mp3',
                    'preferredquality': '192',
                }],
                'playlistend': num_videos
            }
            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
                ydl.download([youtube_url])
            print(f"Downloaded {num_videos} videos from: {youtube_url}")
        else:
            print("Invalid choice or missing URL. Please set 'specific' or 'multiple' and provide the appropriate URL.")

# Link the button click to the function
download_button.on_click(on_download_button_click)

# Display the button and output area
display(download_button, output)'''



In [71]:
import warnings

warnings.filterwarnings("ignore", category=FutureWarning, message=".*torch.load.*weights_only.*")


In [72]:
import os
import whisper
from glob import glob

# Folder holding the downloaded audio (relative to the Kaggle working directory)
download_folder = "./downloads"
os.makedirs(download_folder, exist_ok=True)

# Load the Whisper model
model = whisper.load_model("base")

# Directory to store transcriptions
transcription_folder = "./transcriptions"
os.makedirs(transcription_folder, exist_ok=True)

# Iterate over all downloaded audio files and transcribe them
audio_files = sorted(glob(os.path.join(download_folder, "*.mp3")))
for audio_file in audio_files:
    print(f"Transcribing {audio_file}...")
    result = model.transcribe(audio_file)

    # Save the transcription to a text file, swapping only the file extension
    base_name = os.path.splitext(os.path.basename(audio_file))[0]
    output_file = os.path.join(transcription_folder, base_name + ".txt")
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(result["text"])

    print(f"Transcription saved to {output_file}")

print("CELL EXECUTION IS COMPLETED")

Transcribing ./downloads/The Smallest, Most Teeny Tiny, Miniature Museum Tour by @TheSquaretoSpare.mp3...




Transcription saved to ./transcriptions/The Smallest, Most Teeny Tiny, Miniature Museum Tour by @TheSquaretoSpare.txt
CELL EXECUTION IS COMPLETED



In [74]:
import os
from glob import glob

# Define the folder containing transcriptions and the output file
transcription_folder = "./transcriptions"
merged_output_file = "./merged_transcription.txt"

# Get all text files in the transcription folder (sorted for a reproducible order)
transcription_files = sorted(glob(os.path.join(transcription_folder, "*.txt")))

# Combine all transcription files into one
with open(merged_output_file, "w", encoding="utf-8") as outfile:
    for transcription_file in transcription_files:
        with open(transcription_file, "r", encoding="utf-8") as infile:
            outfile.write(infile.read() + "\n\n")  # Add line breaks between files

print(f"All transcriptions merged into {merged_output_file}")


All transcriptions merged into ./merged_transcription.txt
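
When several videos are merged, the model cannot tell where one transcript ends and the next begins. A possible variant of the merge step, sketched below, writes a header line with the source file name before each transcript. The `=== name ===` header format and the function name `merge_transcripts` are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path


def merge_transcripts(transcription_folder, merged_output_file):
    """Merge all .txt transcripts into one file, labelling each with its name.

    Returns the number of transcripts merged.
    """
    files = sorted(Path(transcription_folder).glob("*.txt"))
    with open(merged_output_file, "w", encoding="utf-8") as outfile:
        for path in files:
            # Header line so each transcript can be attributed to its video.
            outfile.write(f"=== {path.stem} ===\n")
            outfile.write(path.read_text(encoding="utf-8").strip() + "\n\n")
    return len(files)
```

Keeping the merged file outside the transcription folder, as the notebook does, prevents it from being picked up as a transcript on a later run.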


**Once you reach this point, you can ask different questions to learn more about the videos we just downloaded.**
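
One way to try different questions is to keep the transcript fixed and vary only the question. The sketch below shows a small prompt builder; the function name `build_prompt`, the prompt layout, and the sample questions are illustrative assumptions.

```python
def build_prompt(question: str, transcript_text: str) -> str:
    """Combine a question with the transcript into one prompt string."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return f"{question}\n\nTranscript:\n{transcript_text}"


# Hypothetical example questions; replace them with your own.
example_questions = [
    "What is this about?",
    "Which topics come up most often?",
    "What could the creator improve?",
]
```

Each prompt can then be sent with the configured model, for example `model.generate_content(build_prompt(question, transcript_text))`.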

In [78]:
# Configure Gemini with an API key stored in Kaggle Secrets
import google.generativeai as genai
import os

# Get the API key from Kaggle secrets
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("GENAI_API_KEY")
if not api_key:
    raise ValueError("API key not found. Please set the GENAI_API_KEY in Kaggle Secrets.")

# Configure the generative AI library with the API key
genai.configure(api_key=api_key)

# Load the merged transcription file
merged_file_path = "./merged_transcription.txt"
with open(merged_file_path, "r") as file:
    transcript_text = file.read()

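# Approximate the transcript length by counting whitespace-separated words.
# Note: this is a word count, not a true token count from the model's tokenizer.
token_length = len(transcript_text.split())
print(f"Token length: {token_length}")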

# Use the generative model to summarize the content
model = genai.GenerativeModel(model_name='gemini-1.5-flash-latest')
response = model.generate_content(f"What's this about?:\n{transcript_text}")

# Print the summary
print("Summary:")
print(response.text)


Token length: 482
Summary:
This is a script for a YouTube video about the world of miniature creations. The speaker, Khan, is introducing a virtual "Museum of Minatures" showcasing different genres of miniature art.

Here's a breakdown of the video's content:

**Introduction:**

* Khan introduces the concept of miniaturization and his channel, "The Squares Is Bare," which focuses on miniature creations.
* He sets up the video as a tour of a virtual "Museum of Minatures" on YouTube.

**Gallery of Food:**

* Highlights the popularity of miniature food within the miniature community.
* Mentions the skill and artistry involved in creating tiny food.
* Explains the appeal of miniature food, including its cuteness and connection to cooking shows and Japanese kawaii aesthetics.

**Gallery of Figurines:**

* Focuses on miniature figurines, noting the intricate detail work required.
* Mentions the availability of tutorials for creating figurines, covering techniques like sculpting, assembly, an