# 📊 F1 YouTube Data Extraction

## Notebook 01: Data Extraction from YouTube Data API v3

This notebook extracts videos and comments from the **official Formula 1 YouTube channel** for descriptive analytics.

### What we'll extract:
- **Videos**: Title, description, view count, likes, comments, upload date
- **Comments**: Text, author, likes, publish date, replies

### API Quota Note:
The YouTube Data API has a default daily quota of 10,000 units. Each `search.list` request costs 100 units, while each `videos.list` (video details) or `commentThreads.list` request costs 1 unit.
Plan your extraction accordingly!
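
As a rough way to budget a run before starting, the per-request costs above can be turned into an estimate. This is a minimal sketch, not the extractor's own logic: `estimate_quota` is a hypothetical helper, and it assumes videos are listed with `search.list` (up to 50 results per page), details are fetched in batches of up to 50 IDs, and comments are fetched in pages of up to 100 threads.

```python
import math

# Per-request costs from the YouTube Data API v3 quota table.
SEARCH_LIST_COST = 100      # search.list, one page of up to 50 results
VIDEOS_LIST_COST = 1        # videos.list, one batch of up to 50 video IDs
COMMENT_THREADS_COST = 1    # commentThreads.list, one page of up to 100 threads
DAILY_QUOTA = 10_000


def estimate_quota(num_videos: int, comments_per_video: int) -> int:
    """Rough upper bound on quota units for one run (hypothetical helper)."""
    search_pages = math.ceil(num_videos / 50)
    detail_batches = math.ceil(num_videos / 50)
    # Assumes every video needs the full number of comment pages.
    comment_pages = num_videos * math.ceil(comments_per_video / 100)
    return (
        search_pages * SEARCH_LIST_COST
        + detail_batches * VIDEOS_LIST_COST
        + comment_pages * COMMENT_THREADS_COST
    )


units = estimate_quota(num_videos=200, comments_per_video=100)
print(f"Estimated cost: {units} of {DAILY_QUOTA} daily units")
```

Under these assumptions the search pages dominate the cost, so lowering the number of videos requested saves more quota than lowering the comments per video.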

In [1]:
# Setup and imports
import sys
from pathlib import Path

# Add src directory to path
sys.path.insert(0, str(Path.cwd().parent))

import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Import our modules
from src import config
from src.youtube_extractor import YouTubeF1Extractor, load_existing_data

print("✅ Imports successful!")
print(f"📁 Raw data directory: {config.RAW_DATA_DIR}")
print(f"📁 Processed data directory: {config.PROCESSED_DATA_DIR}")

✅ Imports successful!
📁 Raw data directory: c:\Users\ahmed\Downloads\dba-youtube-project\Descriptive\data\raw
📁 Processed data directory: c:\Users\ahmed\Downloads\dba-youtube-project\Descriptive\data\processed


## 1. Initialize YouTube Extractor

First, let's create our extractor and verify API connectivity by fetching channel information.
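
The internals of `get_channel_info()` are not shown in this notebook. As a rough illustration of what such a call might do with a `channels.list` response, the sketch below flattens the `snippet` and `statistics` parts into the keys printed in the next cell. `parse_channel_response` is a hypothetical helper, and the sample response is built from the values in this notebook's output, not from a live API call.

```python
def parse_channel_response(response: dict) -> dict:
    """Flatten a channels.list response (hypothetical helper)."""
    items = response.get("items", [])
    if not items:
        # An empty dict is falsy, so `if channel_info:` fails cleanly.
        return {}
    snippet = items[0].get("snippet", {})
    stats = items[0].get("statistics", {})
    return {
        "title": snippet.get("title", "N/A"),
        # The API returns counts as strings; convert them so that
        # format specs such as ":," work when printing.
        "subscriber_count": int(stats.get("subscriberCount", 0)),
        "video_count": int(stats.get("videoCount", 0)),
        "view_count": int(stats.get("viewCount", 0)),
    }


# Sample response shaped like the API's JSON, for illustration only.
sample = {
    "items": [
        {
            "snippet": {"title": "FORMULA 1"},
            "statistics": {
                "subscriberCount": "13900000",
                "videoCount": "9521",
                "viewCount": "9009223863",
            },
        }
    ]
}

info = parse_channel_response(sample)
print(f"{info['title']}: {info['subscriber_count']:,} subscribers")
```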

In [2]:
# Initialize the YouTube extractor
extractor = YouTubeF1Extractor()

# Get and display channel information
channel_info = extractor.get_channel_info()

if channel_info:
    print("🏎️ Formula 1 YouTube Channel")
    print("=" * 50)
    print(f"📺 Channel: {channel_info.get('title', 'N/A')}")
    print(f"👥 Subscribers: {channel_info.get('subscriber_count', 0):,}")
    print(f"🎬 Total Videos: {channel_info.get('video_count', 0):,}")
    print(f"👀 Total Views: {channel_info.get('view_count', 0):,}")
else:
    print("❌ Could not fetch channel info. Check your API key.")

🏎️ Formula 1 YouTube Channel
📺 Channel: FORMULA 1
👥 Subscribers: 13,900,000
🎬 Total Videos: 9,521
👀 Total Views: 9,009,223,863


## 2. Extract Videos

Extract videos from the 2024 F1 season. Adjust `MAX_VIDEOS` to fit your needs and remaining API quota.
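
The API's `publishedAfter` and `publishedBefore` filters expect RFC 3339 timestamps. If you want a different season, the bounds can be generated rather than typed by hand. The helper below is a hypothetical sketch (`season_window` is not part of this project) that treats a season as the full calendar year in UTC.

```python
from datetime import datetime, timezone


def season_window(year: int) -> tuple[str, str]:
    """Return RFC 3339 start/end timestamps for a calendar year (hypothetical helper)."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # "Z" marks UTC
    return start.strftime(fmt), end.strftime(fmt)


start_date, end_date = season_window(2024)
print(start_date, end_date)
```

For 2024 this yields the same strings as the `START_DATE` and `END_DATE` constants used in this notebook.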

In [3]:
# Configuration for extraction
MAX_VIDEOS = 200  # Adjust based on API quota (each search = ~100 quota units)
START_DATE = "2024-01-01T00:00:00Z"  # 2024 F1 Season
END_DATE = "2024-12-31T23:59:59Z"

print(f"📅 Extracting videos from {START_DATE[:10]} to {END_DATE[:10]}")
print(f"🎬 Maximum videos: {MAX_VIDEOS}")

📅 Extracting videos from 2024-01-01 to 2024-12-31
🎬 Maximum videos: 200


In [4]:
# Extract videos from F1 channel
videos_df = extractor.get_videos(
    max_results=MAX_VIDEOS,
    published_after=START_DATE,
    published_before=END_DATE,
    save_to_csv=True
)

print(f"\n✅ Extracted {len(videos_df)} videos")
videos_df.head()

Fetching videos from F1 channel...
  Date range: 2024-01-01 to 2024-12-31
  Max videos: 200
  Fetched 50 videos...
  Fetched 85 videos...
  Saved 85 videos to c:\Users\ahmed\Downloads\dba-youtube-project\Descriptive\data\raw\f1_youtube_videos.csv

✅ Extracted 85 videos


Unnamed: 0,video_id,title,description,published_at,channel_id,channel_title,tags,category_id,duration,view_count,like_count,comment_count,favorite_count,thumbnail_url
0,qlDfY6Vp6bY,EVERY F1 Sprint Highlight of the 2024 F1 Season,"The F1 Sprint delivered once again in 2024, wi...",2024-12-26T14:00:49Z,UCB_qr75-ydFVKSF9Dmo6izg,FORMULA 1,F1|Formula One|Formula 1|Sports|Sport|Action|G...,17,PT37M14S,616095,5642,226,0,https://i.ytimg.com/vi/qlDfY6Vp6bY/hqdefault.jpg
1,L56zt85tv48,When F1 Does Secret Santa... 🎁,"Lando Norris, Lance Stroll & Charles Leclerc r...",2024-12-24T14:00:23Z,UCB_qr75-ydFVKSF9Dmo6izg,FORMULA 1,F1|Formula One|Formula 1|Sports|Sport|Action|G...,17,PT1M51S,190312,14643,75,0,https://i.ytimg.com/vi/L56zt85tv48/hqdefault.jpg
2,3IDclJXQBAU,Lewis Hamilton Winning For Mercedes For One Ho...,As we mark the end of an incredible era for Le...,2024-12-24T14:00:02Z,UCB_qr75-ydFVKSF9Dmo6izg,FORMULA 1,F1|Formula One|Formula 1|Sports|Sport|Action|G...,17,PT1H,1039614,13583,586,0,https://i.ytimg.com/vi/3IDclJXQBAU/hqdefault.jpg
3,q_X37xjVCTs,The F1 Grid Does Secret Santa 2024!,It's F1 Secret Santa time again! Let's unwrap ...,2024-12-22T15:00:52Z,UCB_qr75-ydFVKSF9Dmo6izg,FORMULA 1,F1|Formula One|Formula 1|Sports|Sport|Action|G...,17,PT17M50S,3131787,137503,3664,0,https://i.ytimg.com/vi/q_X37xjVCTs/hqdefault.jpg
4,Ky4S7V3x8Sg,Yuki Tsunoda Got Nothing Right! 😅,"Yuki Tsunoda takes on Wrong Answers Only, incl...",2024-12-21T12:00:32Z,UCB_qr75-ydFVKSF9Dmo6izg,FORMULA 1,,17,PT1M32S,313706,23411,227,0,https://i.ytimg.com/vi/Ky4S7V3x8Sg/hqdefault.jpg


In [5]:
# Quick overview of video statistics
print("📊 Video Statistics Overview")
print("=" * 50)
print(f"Total Views: {videos_df['view_count'].sum():,}")
print(f"Total Likes: {videos_df['like_count'].sum():,}")
print(f"Total Comments: {videos_df['comment_count'].sum():,}")
print("\nAverage per video:")
print(f"  Views: {videos_df['view_count'].mean():,.0f}")
print(f"  Likes: {videos_df['like_count'].mean():,.0f}")
print(f"  Comments: {videos_df['comment_count'].mean():,.0f}")

📊 Video Statistics Overview
Total Views: 77,699,344
Total Likes: 2,119,049
Total Comments: 60,176

Average per video:
  Views: 914,110
  Likes: 24,930
  Comments: 708


## 3. Extract Comments

Extract comments for each video. This step makes the most API requests (at least one per video), so keep an eye on your quota.
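
How `get_all_comments()` pages through results is not shown here. As a sketch of the usual pattern, assuming each response carries an `items` list and an optional `nextPageToken` (as `commentThreads.list` responses do), a per-video loop might look like the following. `collect_comments` and `fake_fetch` are hypothetical names; the fake fetcher stands in for a real API call so the example runs offline.

```python
from typing import Callable, Optional


def collect_comments(
    fetch_page: Callable[[Optional[str]], dict],
    max_comments: int,
) -> list:
    """Follow nextPageToken until the cap is reached or pages run out."""
    comments: list = []
    page_token: Optional[str] = None
    while len(comments) < max_comments:
        response = fetch_page(page_token)
        comments.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no further pages for this video
    return comments[:max_comments]


def fake_fetch(page_token: Optional[str]) -> dict:
    """Stand-in for one commentThreads.list request (illustration only)."""
    pages = {
        None: {"items": ["c1", "c2"], "nextPageToken": "p2"},
        "p2": {"items": ["c3"]},
    }
    return pages[page_token]


print(collect_comments(fake_fetch, max_comments=100))
```

Each loop iteration corresponds to one request, so a cap of 100 comments per video usually means a single request per video.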

In [6]:
# Extract comments for all videos
MAX_COMMENTS_PER_VIDEO = 100  # Top 100 most relevant comments per video

comments_df = extractor.get_all_comments(
    videos_df=videos_df,
    max_comments_per_video=MAX_COMMENTS_PER_VIDEO,
    save_to_csv=True
)

print(f"\n✅ Extracted {len(comments_df)} comments from {len(videos_df)} videos")
comments_df.head()

Fetching comments for 85 videos...
  Processed 10/85 videos, 812 comments total...
  Processed 20/85 videos, 1763 comments total...
  Processed 30/85 videos, 2521 comments total...
  Processed 40/85 videos, 3485 comments total...
  Processed 50/85 videos, 4409 comments total...
  Processed 60/85 videos, 5244 comments total...
  Processed 70/85 videos, 6115 comments total...
  Processed 80/85 videos, 6865 comments total...
  Saved 7361 comments to c:\Users\ahmed\Downloads\dba-youtube-project\Descriptive\data\raw\f1_youtube_comments.csv

✅ Extracted 7361 comments from 85 videos


Unnamed: 0,comment_id,video_id,author_display_name,author_channel_id,text_original,text_display,like_count,published_at,updated_at,reply_count
0,UgyYBK8ojubh6swa8iV4AaABAg,qlDfY6Vp6bY,@albayrakcan,UCeogecQkVZ770cr31q8PPag,20 seconds into the video and lando already fu...,20 seconds into the video and lando already fu...,315,2024-12-26T15:13:43Z,2024-12-26T15:13:43Z,0
1,UgxYiH72tXX7lIPGNyR4AaABAg,qlDfY6Vp6bY,@jmestrada8942,UCNMlG4yDmyENsaPfNUuu5zw,15:08 the sight of Max being chased by both Mc...,"<a href=""https://www.youtube.com/watch?v=qlDfY...",123,2024-12-26T15:57:38Z,2024-12-26T15:57:38Z,0
2,UgyBme1iyneT865TASd4AaABAg,qlDfY6Vp6bY,@keirangeorge9046,UCodpnrI4jbT-iAYHxoFzm7g,Love seeing the old boys leading for a bit in ...,Love seeing the old boys leading for a bit in ...,57,2024-12-26T17:22:32Z,2024-12-26T17:22:32Z,2
3,UgxMAgnQ6sil1SG8osl4AaABAg,qlDfY6Vp6bY,@ananyar908,UCeQuQ_ME40NGo7hOMbGDxzA,love how norris returned piastri's favour 🧡,love how norris returned piastri&#39;s favour 🧡,13,2024-12-28T18:20:48Z,2024-12-28T18:20:48Z,0
4,Ugw8XNARIGUYD2dkJOx4AaABAg,qlDfY6Vp6bY,@emmanewman9271,UCrPWB1Hg-6f97mkFxSm_Tew,0:00 China \n6:51 Miami\n12:58 Austin\n18:49 U...,"<a href=""https://www.youtube.com/watch?v=qlDfY...",67,2024-12-26T14:50:48Z,2024-12-26T16:21:07Z,2


## 4. Data Summary

Quick look at the extracted data before moving to cleaning.

In [7]:
# Summary of extracted data
print("📊 EXTRACTION SUMMARY")
print("=" * 60)
print(f"\n🎬 VIDEOS ({len(videos_df)} total)")
print(f"   Columns: {list(videos_df.columns)}")
print(f"   Date range: {videos_df['published_at'].min()[:10]} to {videos_df['published_at'].max()[:10]}")

print(f"\n💬 COMMENTS ({len(comments_df)} total)")
print(f"   Columns: {list(comments_df.columns)}")
print(f"   Avg comments per video: {len(comments_df)/len(videos_df):.1f}")

print("\n📁 Data saved to:")
print(f"   {config.VIDEOS_CSV}")
print(f"   {config.COMMENTS_CSV}")

📊 EXTRACTION SUMMARY

🎬 VIDEOS (85 total)
   Columns: ['video_id', 'title', 'description', 'published_at', 'channel_id', 'channel_title', 'tags', 'category_id', 'duration', 'view_count', 'like_count', 'comment_count', 'favorite_count', 'thumbnail_url']
   Date range: 2024-01-07 to 2024-12-26

💬 COMMENTS (7361 total)
   Columns: ['comment_id', 'video_id', 'author_display_name', 'author_channel_id', 'text_original', 'text_display', 'like_count', 'published_at', 'updated_at', 'reply_count']
   Avg comments per video: 86.6

📁 Data saved to:
   c:\Users\ahmed\Downloads\dba-youtube-project\Descriptive\data\raw\f1_youtube_videos.csv
   c:\Users\ahmed\Downloads\dba-youtube-project\Descriptive\data\raw\f1_youtube_comments.csv


## 5. Load Existing Data (Alternative)

If you've already extracted data, load the saved CSV files with `load_existing_data()` instead of calling the API again.
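
Before deciding between loading and re-extracting, it can help to check that both CSV files are actually present. The sketch below is one way to do that; `has_cached_data` is a hypothetical helper, not part of `src.youtube_extractor`, and the demo uses a temporary directory in place of the real data folder.

```python
import tempfile
from pathlib import Path


def has_cached_data(*csv_paths: Path) -> bool:
    """True only if every path is an existing, non-empty file (hypothetical helper)."""
    return all(p.is_file() and p.stat().st_size > 0 for p in csv_paths)


# Demo in a temporary directory so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    videos_csv = Path(tmp) / "f1_youtube_videos.csv"
    comments_csv = Path(tmp) / "f1_youtube_comments.csv"

    videos_csv.write_text("video_id\nqlDfY6Vp6bY\n")
    before = has_cached_data(videos_csv, comments_csv)  # comments file missing

    comments_csv.write_text("comment_id\nUgyYBK8ojubh6swa8iV4AaABAg\n")
    after = has_cached_data(videos_csv, comments_csv)

print(before, after)
```

In this notebook the same check could be run against `config.VIDEOS_CSV` and `config.COMMENTS_CSV`, which saves quota when the files are already on disk.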

In [8]:
# Load existing data (uncomment to use)
# videos_df, comments_df = load_existing_data()

print("✅ Extraction notebook complete!")
print("➡️ Next: Run 02_cleaning.ipynb for data cleaning")

✅ Extraction notebook complete!
➡️ Next: Run 02_cleaning.ipynb for data cleaning
