## Step 1: Install Required Libraries

Install the necessary packages including youtube-transcript-api for transcript extraction.

In [2]:
!pip install --upgrade requests pandas youtube-transcript-api

Defaulting to user installation because normal site-packages is not writeable





## Step 2: Import Libraries and Configuration

Import the required libraries and load the API credentials from `config.py`.

In [3]:
import requests
import pandas as pd
from youtube_transcript_api import YouTubeTranscriptApi
from config import API_KEY, CHANNEL_ID
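
The `config` module itself is not shown in this notebook. A minimal sketch of what such a `config.py` could look like, assuming the credentials are supplied through environment variables (the names `YOUTUBE_API_KEY` and `YOUTUBE_CHANNEL_ID` are hypothetical and chosen only for this example):

```python
# config.py (hypothetical layout; the real file is not shown in this notebook)
import os

# Read credentials from the environment so they are not committed with the notebook.
# Both environment variable names are assumptions made for this sketch.
API_KEY = os.environ.get("YOUTUBE_API_KEY", "")
CHANNEL_ID = os.environ.get("YOUTUBE_CHANNEL_ID", "")

if not API_KEY or not CHANNEL_ID:
    print("Warning: YOUTUBE_API_KEY or YOUTUBE_CHANNEL_ID is not set")
```

Keeping the key out of the notebook also means exported copies like this one do not leak it.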

## Step 3: Fetch Video Data

Use the YouTube Data API to retrieve the channel's most recent uploads, 50 per page, up to a limit of `MAX_VIDEOS` (500 in this run).

In [None]:
# Force reload config module
import importlib
import config
importlib.reload(config)
from config import API_KEY, CHANNEL_ID
print(f"Using Channel ID: {CHANNEL_ID}")

# First, get the channel's uploads playlist ID
channel_url = "https://www.googleapis.com/youtube/v3/channels"
channel_params = {
    "key": API_KEY,
    "id": CHANNEL_ID,
    "part": "contentDetails"
}

channel_response = requests.get(channel_url, params=channel_params).json()
uploads_playlist_id = channel_response["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]
print(f"Uploads Playlist ID: {uploads_playlist_id}")

# Now fetch videos from the uploads playlist (50 per page, up to MAX_VIDEOS in total)
playlist_url = "https://www.googleapis.com/youtube/v3/playlistItems"
videos = []
next_page_token = None
page_num = 1
MAX_VIDEOS = 500

while True:
    playlist_params = {
        "key": API_KEY,
        "playlistId": uploads_playlist_id,
        "part": "snippet",
        "maxResults": 50
    }
    
    if next_page_token:
        playlist_params["pageToken"] = next_page_token
    
    response = requests.get(playlist_url, params=playlist_params).json()
    
    # Extract videos from this page
    for item in response.get("items", []):
        video_id = item["snippet"]["resourceId"]["videoId"]
        title = item["snippet"]["title"]
        published = item["snippet"]["publishedAt"]
        videos.append([video_id, title, published])
        
        # Stop if we've reached the limit
        if len(videos) >= MAX_VIDEOS:
            break
    
    print(f"Page {page_num}: Fetched {len(response.get('items', []))} videos (Total: {len(videos)})")
    
    # Check if we've reached the limit or there are no more pages
    if len(videos) >= MAX_VIDEOS:
        print(f"Reached video limit of {MAX_VIDEOS}")
        break
    
    next_page_token = response.get("nextPageToken")
    if not next_page_token:
        break
    
    page_num += 1

df = pd.DataFrame(videos, columns=["video_id", "title", "published_date"])
print(f"\n✅ Total videos fetched: {len(df)}")
df.head()

Using Channel ID: UC8butISFwT-Wl7EV0hUK0BQ
Uploads Playlist ID: UU8butISFwT-Wl7EV0hUK0BQ
Page 1: Fetched 50 videos (Total: 50)
Page 2: Fetched 50 videos (Total: 100)
Page 3: Fetched 50 videos (Total: 150)
Page 4: Fetched 50 videos (Total: 200)
Page 5: Fetched 50 videos (Total: 250)
Page 6: Fetched 50 videos (Total: 300)
Page 7: Fetched 50 videos (Total: 350)
Page 8: Fetched 50 videos (Total: 400)
Page 9: Fetched 50 videos (Total: 450)
Page 10: Fetched 50 videos (Total: 500)
Reached video limit of 500

✅ Total videos fetched: 500


| | video_id | title | published_date |
|---|---|---|---|
| 0 | f9-Vf_WNvT0 | How to check if a string only contains whitesp... | 2025-12-19T13:18:33Z |
| 1 | dSIVVX0vAsY | Unity 2D Pixel Art Game Tutorial | 2025-12-18T14:58:24Z |
| 2 | uNS75JWOmBI | Don't chase a path that looks good just becaus... | 2025-12-18T12:52:39Z |
| 3 | vZHhmzT9TL8 | How to use the .append() method in Python | 2025-12-17T13:29:40Z |
| 4 | Q7P20fHJlm4 | Intro to Supabase – Full Tutorial for Beginners | 2025-12-16T14:43:47Z |
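
The paging loop above indexes straight into the JSON response, so a bad key or exhausted quota would surface as a `KeyError` or an empty result rather than a clear message. The sketch below isolates the `nextPageToken` logic and checks for an `error` object first. It is an illustration, not the notebook's code: `collect_items` and `fetch_page` are hypothetical names, and the API is replaced by an in-memory stand-in so the example runs offline.

```python
def collect_items(fetch_page, max_items):
    """Follow nextPageToken links until max_items are collected or pages run out.

    fetch_page is a hypothetical callable: it takes a page token (or None)
    and returns the decoded JSON response as a dict.
    """
    items = []
    token = None
    while len(items) < max_items:
        response = fetch_page(token)
        # Failures such as a bad key or exhausted quota arrive as an "error" object
        if "error" in response:
            raise RuntimeError(f"API error: {response['error'].get('message', response['error'])}")
        items.extend(response.get("items", []))
        token = response.get("nextPageToken")
        if not token:
            break
    return items[:max_items]


# Offline stand-in for the API so the sketch runs without a key
fake_pages = {
    None: {"items": [1, 2, 3], "nextPageToken": "p2"},
    "p2": {"items": [4, 5, 6], "nextPageToken": "p3"},
    "p3": {"items": [7]},
}
result = collect_items(lambda token: fake_pages[token], max_items=5)
print(result)  # [1, 2, 3, 4, 5]
```

In the notebook, `fetch_page` would wrap the `requests.get(playlist_url, ...)` call and add `pageToken` to the parameters whenever the token is not `None`.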


## Step 4: Extract Transcripts

Fetch the English transcript for each video. If a video has no captions, or the request fails for any other reason, the error is caught and a `No transcript: ...` placeholder is stored instead.

In [13]:
from youtube_transcript_api import YouTubeTranscriptApi
import time

def get_transcript(video_id):
    """Extract transcript for a video"""
    try:
        # Use fetch method (correct for version 1.2.3)
        api = YouTubeTranscriptApi()
        transcript_data = api.fetch(video_id, languages=['en'])
        # fetch() returns a FetchedTranscript; join the text of its snippets
        if hasattr(transcript_data, 'snippets'):
            full_transcript = " ".join([segment.text for segment in transcript_data.snippets])
        else:
            full_transcript = str(transcript_data)
        return full_transcript
    except Exception as e:
        return f"No transcript: {str(e)[:80]}"

# Extract transcripts for all videos with progress tracking
print("Extracting transcripts...")
print(f"Processing {len(df)} videos...")

transcripts = []
success_count = 0
for idx, video_id in enumerate(df['video_id'], 1):
    transcript = get_transcript(video_id)
    transcripts.append(transcript)
    
    if not transcript.startswith('No transcript'):
        success_count += 1
    
    # Show progress every 10 videos
    if idx % 10 == 0:
        print(f"Progress: {idx}/{len(df)} videos processed | {success_count} successful")
    
    # Wait 1 second between requests to reduce the risk of rate limiting or an IP ban
    time.sleep(1)

df['transcript'] = transcripts
print(f"\n✅ Completed! Processed {len(df)} videos.")
print(f"✅ Successfully extracted {success_count} transcripts")

# Show sample
df.head()

Extracting transcripts...
Processing 500 videos...
Progress: 10/500 videos processed | 0 successful
Progress: 20/500 videos processed | 0 successful
Progress: 30/500 videos processed | 0 successful
Progress: 40/500 videos processed | 0 successful
Progress: 50/500 videos processed | 0 successful
Progress: 60/500 videos processed | 0 successful
Progress: 70/500 videos processed | 0 successful
Progress: 80/500 videos processed | 0 successful
Progress: 90/500 videos processed | 0 successful
Progress: 100/500 videos processed | 0 successful
Progress: 110/500 videos processed | 0 successful
Progress: 120/500 videos processed | 0 successful
Progress: 130/500 videos processed | 0 successful
Progress: 140/500 videos processed | 0 successful
Progress: 150/500 videos processed | 0 successful
Progress: 160/500 videos processed | 0 successful
Progress: 170/500 videos processed | 0 successful
Progress: 180/500 videos processed | 0 successful
Progress: 190/500 videos processed | 0 successful
Progress

KeyboardInterrupt: 
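
The run above processed 190 videos without a single success before it was interrupted, which suggests the loop should give up once failures pile up. A minimal sketch of that idea follows, assuming a caller-supplied `fetch_one` function that raises on failure. Both `fetch_until_blocked` and `fetch_one` are hypothetical names, and the threshold of 5 is an arbitrary choice.

```python
import time


def fetch_until_blocked(video_ids, fetch_one, max_consecutive_failures=5, delay=1.0):
    """Fetch transcripts in order, stopping early after a run of failures.

    fetch_one is a hypothetical callable that returns the transcript text
    for a video ID or raises an exception.
    """
    results = {}
    consecutive_failures = 0
    for video_id in video_ids:
        try:
            results[video_id] = fetch_one(video_id)
            consecutive_failures = 0  # any success resets the counter
        except Exception as exc:
            results[video_id] = None  # None marks a failed fetch
            consecutive_failures += 1
            if consecutive_failures >= max_consecutive_failures:
                print(f"Stopping after {consecutive_failures} consecutive failures: {exc}")
                break
        time.sleep(delay)
    return results


def always_blocked(video_id):
    # Stand-in that simulates a blocked IP
    raise RuntimeError("blocked")


partial = fetch_until_blocked(["a", "b", "c", "d"], always_blocked,
                              max_consecutive_failures=2, delay=0)
print(list(partial))  # ['a', 'b']
```

Storing `None` for failures, rather than an error string, also makes it easier to separate successes from failures later.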

## Step 5: Check Transcript Status

Check whether YouTube is still blocking transcript requests from this IP by fetching the transcript of the first video in `df`.

In [14]:
# Test if IP ban is back
print("Testing IP ban status...")
test_video_id = df.iloc[0]['video_id']
print(f"Testing video: {test_video_id}")

try:
    api = YouTubeTranscriptApi()
    transcript = api.fetch(test_video_id, languages=['en'])
    print("✅ SUCCESS! IP is NOT banned")
    if hasattr(transcript, 'snippets'):
        print(f"Got {len(transcript.snippets)} transcript segments")
except Exception as e:
    error_msg = str(e)
    if "blocking requests from your IP" in error_msg or "IpBlocked" in type(e).__name__:
        print("❌ IP IS BANNED AGAIN!")
        print(f"Error: {error_msg[:200]}")
    else:
        print(f"Different error: {error_msg[:200]}")

Testing IP ban status...
Testing video: f9-Vf_WNvT0
❌ IP IS BANNED AGAIN!
Error: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=f9-Vf_WNvT0! This is most likely caused by:

YouTube is blocking requests from your IP. This usually is due to one of the
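
The check in this cell can be factored into a small helper so the extraction loop can reuse it. The sketch below mirrors the two signals used above (the exception class name and the message text). `is_ip_block` is a hypothetical helper, and the `IpBlocked` class defined here is only a local stand-in for demonstration, not the library's class.

```python
def is_ip_block(exc, blocked_names=("IpBlocked",)):
    """Return True if an exception looks like a YouTube IP block."""
    # Compare class names along the inheritance chain so subclasses also match
    class_names = {cls.__name__ for cls in type(exc).__mro__}
    if class_names & set(blocked_names):
        return True
    # Fall back to the message text seen in the error output above
    return "blocking requests from your IP" in str(exc)


class IpBlocked(Exception):
    """Local stand-in used only to demonstrate the helper."""


print(is_ip_block(IpBlocked("request refused")))                        # True
print(is_ip_block(RuntimeError("YouTube is blocking requests from your IP")))  # True
print(is_ip_block(ValueError("video unavailable")))                     # False
```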


## Step 6: Save Results

Export the data with transcripts to a CSV file.

In [10]:
# Save all data
df.to_csv("youtube_metadata_with_transcripts.csv", index=False)
print("Saved to youtube_metadata_with_transcripts.csv")

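# Also save only videos with successful transcripts
# (failed fetches are stored as strings starting with "No transcript")
df['has_transcript'] = ~df['transcript'].str.startswith('No transcript')
df_with_transcripts = df[df['has_transcript']]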
df_with_transcripts.to_csv("youtube_videos_transcripts_only.csv", index=False)
print(f"Saved {len(df_with_transcripts)} videos with transcripts to youtube_videos_transcripts_only.csv")

Saved to youtube_metadata_with_transcripts.csv
Saved 37 videos with transcripts to youtube_videos_transcripts_only.csv
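
The extraction loop in Step 4 keeps its results in memory and only assigns `df['transcript']` after the last video, so an interrupted run leaves nothing on disk. One possible remedy, sketched here with the standard `csv` module, is to append each result to a checkpoint file as it arrives and skip already-saved IDs on the next run. The helper names and the checkpoint file name are hypothetical.

```python
import csv
import os
import tempfile


def append_row(path, row, header=("video_id", "transcript")):
    """Append one result to a CSV file, writing the header if the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerow(row)


def load_done_ids(path):
    """Return the set of video IDs already saved, so a rerun can skip them."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {record["video_id"] for record in csv.DictReader(f)}


# Demonstration with a temporary file (the file name is an assumption)
checkpoint = os.path.join(tempfile.mkdtemp(), "transcripts_checkpoint.csv")
append_row(checkpoint, ["abc123", "hello world"])
append_row(checkpoint, ["def456", "second transcript"])
done = load_done_ids(checkpoint)
print(sorted(done))  # ['abc123', 'def456']
```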


## Summary and Next Steps

### Current Issue: IP Blocked by YouTube ⚠️
YouTube is **STILL blocking** our IP address (as of Dec 18, 2025). The ban from yesterday has not lifted yet. This is preventing all transcript extractions.

**Error:** "YouTube is blocking requests from your IP"

### What We've Tried:
1. ✅ freeCodeCamp channel (UC8butISFwT-Wl7EV0hUK0BQ) - 1000 videos fetched
2. ✅ MIT OpenCourseWare channel (UCEBb1b_L6zDS3xTUrIALZOw) - 50 videos fetched
3. ✅ TED channel (UCAuUUnT6oDeKwE6v1NGQxug) - 50 videos fetched
4. ⚠️ All transcript extractions blocked due to IP ban (tested multiple times)
5. ✅ Fixed code to use correct `fetch()` method for version 1.2.3

### Solutions:
1. **Wait 1-2 hours** - YouTube's IP ban is usually temporary
2. **Use different network** - Try from different Wi-Fi/mobile hotspot
3. **Use cookies** - Extract YouTube cookies from browser and pass them to the API
4. **Use proxies** - Configure proxy servers (see youtube-transcript-api README)
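
The first solution, waiting and trying again, can be automated. Below is a minimal sketch under the assumption that the block lifts within a couple of hours. `retry_with_cooldown` is a hypothetical helper and the default wait schedule is only an illustration of the 1-2 hour guidance above. The `sleep` argument is injectable so the example can run instantly.

```python
import time


def retry_with_cooldown(action, wait_seconds=(0, 3600, 7200), sleep=time.sleep):
    """Call action() once per entry in wait_seconds, sleeping before each try.

    Returns the first successful result; re-raises the last error if all fail.
    """
    if not wait_seconds:
        raise ValueError("wait_seconds must contain at least one entry")
    last_error = None
    for wait in wait_seconds:
        sleep(wait)
        try:
            return action()
        except Exception as exc:
            last_error = exc
    raise last_error


attempts = []


def flaky():
    # Stand-in that fails twice, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("still blocked")
    return "ok"


outcome = retry_with_cooldown(flaky, sleep=lambda seconds: None)
print(outcome, len(attempts))  # ok 3
```

In the notebook, `action` could be a single test fetch like the one in Step 5, with the full extraction started only after it succeeds.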

### Code Status:
- Video fetching works perfectly ✅
- Transcript extraction updated to call `fetch()` inside the `get_transcript()` helper ✅
- Rate limiting set to 1 second per video ✅
- Progress tracking implemented ✅

### Recommended Next Action:
Wait 1-2 hours and then rerun the transcript extraction cell, or switch to a different network connection.