# YouTube Data API v3 - Get Video Data

This notebook demonstrates how to use the YouTube Data API v3 to fetch video information including:
- Video title
- Description
- Channel information
- Statistics (views, likes, comments)
- Video metadata

## Prerequisites
1. Get a YouTube Data API key from the [Google Cloud Console](https://console.cloud.google.com/) (the YouTube Data API v3 must be enabled for your project)
2. Install the required packages: `google-api-python-client` and `python-dotenv`

## 1. Install Required Packages

In [None]:
# Install the required packages
!pip install google-api-python-client python-dotenv

## 2. Import Required Libraries

In [1]:
import os
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from dotenv import load_dotenv
import json
from datetime import datetime

# Load environment variables
load_dotenv()

print("✅ Libraries imported successfully")

✅ Libraries imported successfully


## 3. Set Up API Key

Add your YouTube API key to a `.env` file in the backend directory:
```
YOUTUBE_API_KEY=your_api_key_here
```

In [2]:
# Get API key from environment variable
YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY')

if not YOUTUBE_API_KEY:
    print("⚠️ WARNING: YOUTUBE_API_KEY not found in environment variables")
    print("Please add YOUTUBE_API_KEY to your .env file")
    # For testing, you can set it directly here (not recommended for production)
    # YOUTUBE_API_KEY = "your_api_key_here"
else:
    print("✅ API key loaded successfully")
    # Avoid printing any part of the key: notebook outputs are often shared or committed

✅ API key loaded successfully


## 4. Initialize YouTube API Client

In [3]:
# Build the YouTube API client
youtube = build('youtube', 'v3', developerKey=YOUTUBE_API_KEY)
print("✅ YouTube API client initialized")

✅ YouTube API client initialized


## 5. Helper Function to Extract Video ID from URL

In [4]:
def extract_video_id(url_or_id):
    """
    Extract video ID from YouTube URL or return the ID if already provided.
    
    Supports formats:
    - https://www.youtube.com/watch?v=VIDEO_ID
    - https://youtu.be/VIDEO_ID
    - VIDEO_ID (direct ID)
    """
    if 'youtube.com/watch?v=' in url_or_id:
        return url_or_id.split('watch?v=')[1].split('&')[0]
    elif 'youtu.be/' in url_or_id:
        return url_or_id.split('youtu.be/')[1].split('?')[0]
    else:
        # Assume it's already a video ID
        return url_or_id

# Test the function
test_urls = [
    'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
    'https://youtu.be/dQw4w9WgXcQ',
    'dQw4w9WgXcQ'
]

print("Testing video ID extraction:")
for url in test_urls:
    video_id = extract_video_id(url)
    print(f"  {url} → {video_id}")

Testing video ID extraction:
  https://www.youtube.com/watch?v=dQw4w9WgXcQ → dQw4w9WgXcQ
  https://youtu.be/dQw4w9WgXcQ → dQw4w9WgXcQ
  dQw4w9WgXcQ → dQw4w9WgXcQ
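
The string-splitting approach above covers the three listed formats. As a hedged sketch, a `urllib.parse`-based variant could also tolerate reordered query parameters and the `/shorts/`, `/embed/`, and `/live/` path styles. The name `extract_video_id_robust` and the set of URL shapes it handles are assumptions for illustration, not part of the original notebook.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id_robust(url_or_id):
    """Return the video ID from common YouTube URL shapes, or None if unrecognized."""
    text = url_or_id.strip()
    # A bare ID contains no slash, so return it unchanged
    if '/' not in text:
        return text
    # urlparse only fills in the hostname when a scheme is present
    parsed = urlparse(text if '://' in text else f'https://{text}')
    host = (parsed.hostname or '').lower()
    path_parts = [part for part in parsed.path.split('/') if part]

    if host == 'youtu.be':
        return path_parts[0] if path_parts else None
    if host == 'youtube.com' or host.endswith('.youtube.com'):
        if parsed.path == '/watch':
            # parse_qs handles any parameter order, e.g. ?t=10&v=ID
            return parse_qs(parsed.query).get('v', [None])[0]
        if len(path_parts) >= 2 and path_parts[0] in ('shorts', 'embed', 'live'):
            return path_parts[1]
    return None
```

Unlike the original helper, this version returns `None` for URLs it does not recognize instead of passing them through as if they were IDs.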


## 6. Function to Get Video Data

In [5]:
def get_video_data(video_id):
    """
    Fetch the raw videos.list response for a YouTube video.
    
    Args:
        video_id (str): YouTube video ID
        
    Returns:
        dict: The full API response (the video resource is response['items'][0]),
              or {'error': ...} if the request fails or the video is not found.
    """
    try:
        # Request video data
        request = youtube.videos().list(
            part='snippet,contentDetails,statistics,status',
            id=video_id
        )
        response = request.execute()
        
        if not response.get('items'):
            return {'error': 'Video not found'}
        
        # Later cells index into response['items'][0], so return the response unchanged
        return response
        
    except HttpError as e:
        return {'error': f'HTTP Error: {e.resp.status} - {e.content}'}
    except Exception as e:
        return {'error': f'Error: {str(e)}'}


def summarize_video_data(response):
    """
    Flatten a videos.list response into a dict of commonly used fields.
    """
    video = response['items'][0]
    snippet = video['snippet']
    statistics = video.get('statistics', {})
    
    return {
        'video_id': video['id'],
        'title': snippet['title'],
        'description': snippet['description'],
        'channel_title': snippet['channelTitle'],
        'channel_id': snippet['channelId'],
        'published_at': snippet['publishedAt'],
        'thumbnail_url': snippet['thumbnails']['high']['url'],
        'duration': video['contentDetails']['duration'],
        # The API returns counts as strings; convert them so ':,' formatting works
        'view_count': int(statistics.get('viewCount', 0)),
        'like_count': int(statistics.get('likeCount', 0)),
        'comment_count': int(statistics.get('commentCount', 0)),
        'tags': snippet.get('tags', []),
        'category_id': snippet['categoryId'],
        'default_language': snippet.get('defaultLanguage', 'N/A'),
    }

print("✅ get_video_data() function defined")

✅ get_video_data() function defined
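
The `id` parameter of `videos.list` accepts a comma-separated list of video IDs, so several videos can be fetched in one request. A minimal sketch of batching follows. The helper names `chunked` and `get_videos_data` are hypothetical, `client` is assumed to behave like the `youtube` object built earlier, and the batch size of 50 reflects the commonly documented per-request limit, which is worth confirming against the current API reference.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def get_videos_data(client, video_ids,
                    parts='snippet,contentDetails,statistics', batch_size=50):
    """Fetch raw video resources for many IDs, sending batch_size IDs per request."""
    items = []
    for batch in chunked(list(video_ids), batch_size):
        # One request per batch; IDs are joined with commas
        response = client.videos().list(part=parts, id=','.join(batch)).execute()
        # IDs that do not resolve to a video are simply absent from 'items'
        items.extend(response.get('items', []))
    return items
```

Calling `get_videos_data(youtube, ['dQw4w9WgXcQ', ...])` would return a flat list of video resources. Unlike `get_video_data`, this sketch lets `HttpError` propagate to the caller.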


## 7. Get Data for a Specific Video

Replace the URL below with any YouTube video URL you want to analyze.

In [6]:
# Example: Get data for a YouTube video
# Replace this with your desired video URL
video_url = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'

# Extract video ID
video_id = extract_video_id(video_url)
print(f"🎬 Video ID: {video_id}\n")

# Fetch video data
video_data = get_video_data(video_id)

# # Check for errors (get_video_data returns the raw response, so index into items[0])
# if 'error' in video_data:
#     print(f"❌ Error: {video_data['error']}")
# else:
#     video = video_data['items'][0]
#     stats = video['statistics']
#     print("✅ Video data retrieved successfully!\n")
#     print(f"📺 Title: {video['snippet']['title']}")
#     print(f"👤 Channel: {video['snippet']['channelTitle']}")
#     print(f"📅 Published: {video['snippet']['publishedAt']}")
#     print(f"⏱️ Duration: {video['contentDetails']['duration']}")
#     print(f"👁️ Views: {int(stats.get('viewCount', 0)):,}")
#     print(f"👍 Likes: {int(stats.get('likeCount', 0)):,}")
#     print(f"💬 Comments: {int(stats.get('commentCount', 0)):,}")
video_data

🎬 Video ID: dQw4w9WgXcQ



{'kind': 'youtube#videoListResponse',
 'etag': 'SWl9MXAwiDch0AUZVyEox9h1s7E',
 'items': [{'kind': 'youtube#video',
   'etag': '69pihEvHGLXBHela9eq-UxFZTtA',
   'id': 'dQw4w9WgXcQ',
   'snippet': {'publishedAt': '2009-10-25T06:57:33Z',
    'channelId': 'UCuAXFkgsw1L7xaCfnd5JJOw',
    'title': 'Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)',
    'description': 'The official video for “Never Gonna Give You Up” by Rick Astley. \n\nNever: The Autobiography 📚 OUT NOW! \nFollow this link to get your copy and listen to Rick’s ‘Never’ playlist ❤️ #RickAstleyNever\nhttps://linktr.ee/rickastleynever\n\n“Never Gonna Give You Up” was a global smash on its release in July 1987, topping the charts in 25 countries including Rick’s native UK and the US Billboard Hot 100.  It also won the Brit Award for Best single in 1988. Stock Aitken and Waterman wrote and produced the track which was the lead-off single and lead track from Rick’s debut LP “Whenever You Need Somebody”.  The alb

In [8]:
def print_keys(d, parent_key=''):
    for key, value in d.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        print(full_key)
        if isinstance(value, dict):
            print_keys(value, full_key)

print_keys(video_data['items'][0])

kind
etag
id
snippet
snippet.publishedAt
snippet.channelId
snippet.title
snippet.description
snippet.thumbnails
snippet.thumbnails.default
snippet.thumbnails.default.url
snippet.thumbnails.default.width
snippet.thumbnails.default.height
snippet.thumbnails.medium
snippet.thumbnails.medium.url
snippet.thumbnails.medium.width
snippet.thumbnails.medium.height
snippet.thumbnails.high
snippet.thumbnails.high.url
snippet.thumbnails.high.width
snippet.thumbnails.high.height
snippet.thumbnails.standard
snippet.thumbnails.standard.url
snippet.thumbnails.standard.width
snippet.thumbnails.standard.height
snippet.thumbnails.maxres
snippet.thumbnails.maxres.url
snippet.thumbnails.maxres.width
snippet.thumbnails.maxres.height
snippet.channelTitle
snippet.tags
snippet.categoryId
snippet.liveBroadcastContent
snippet.defaultLanguage
snippet.localized
snippet.localized.title
snippet.localized.description
snippet.defaultAudioLanguage
contentDetails
contentDetails.duration
contentDetails.dimension
contentD

In [9]:
# Save the video resource (the first item of the response) as a JSON file
with open(f'youtube_video_{video_id}.json', 'w', encoding='utf-8') as f:
    json.dump(video_data['items'][0], f, ensure_ascii=False, indent=4)
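
The saved item keeps `contentDetails.duration` as an ISO 8601 duration string (for example `PT3M33S`) and the `statistics` counts as strings. A small sketch for turning both into numbers follows. The helper names are illustrative, and the parser is an assumption-laden simplification that only handles day, hour, minute, and second components.

```python
import re

# Matches durations such as PT45S, PT3M33S, PT1H2M3S, or P1DT2H
_DURATION_RE = re.compile(
    r'^P(?:(?P<days>\d+)D)?'
    r'(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$'
)

def parse_iso8601_duration(value):
    """Convert an ISO 8601 duration string into a total number of seconds."""
    match = _DURATION_RE.match(value)
    if not match:
        raise ValueError(f'Unsupported duration: {value!r}')
    # Components that are absent from the string count as zero
    parts = {name: int(num) if num else 0 for name, num in match.groupdict().items()}
    return (parts['days'] * 86400 + parts['hours'] * 3600
            + parts['minutes'] * 60 + parts['seconds'])

def stats_as_ints(statistics):
    """Return a copy of a statistics dict with every count converted to int."""
    return {key: int(value) for key, value in statistics.items()}
```

With the response from this notebook, `parse_iso8601_duration(video_data['items'][0]['contentDetails']['duration'])` would give the video length in seconds.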

## 8. Display Full Video Data (JSON Format)

In [None]:
# Display the complete video data in JSON format
if 'error' not in video_data:
    print(json.dumps(video_data, indent=2))

## 9. Display Video Description

In [None]:
# Display the video description
if 'error' not in video_data:
    print("📝 Video Description:")
    print("=" * 80)
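    # get_video_data returns the raw API response, so the description is under items[0].snippet
    print(video_data['items'][0]['snippet']['description'])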
    print("=" * 80)

## 10. Additional Function: Get Video Transcript/Captions

To get video transcripts, you'll need the `youtube-transcript-api` package.

In [None]:
# Install youtube-transcript-api
!pip install youtube-transcript-api

In [None]:
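# The exception classes are exported from the package root, so the private
# `_errors` module does not need to be imported directly
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound

def get_video_transcript(video_id, languages=None):
    """
    Get the transcript/captions for a YouTube video.
    
    Args:
        video_id (str): YouTube video ID
        languages (list): List of language codes to try (default: ['en'])
        
    Returns:
        list: Transcript data with timestamps
    """
    # Use None as the default to avoid sharing one mutable list between calls
    if languages is None:
        languages = ['en']
    try:
        # get_transcript() is the classic static interface. If your installed release
        # no longer provides it (newer 1.x releases appear to use an instance-based
        # fetch() instead), pin an older version or adapt this call.
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=languages)
        return transcript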
    except TranscriptsDisabled:
        return {'error': 'Transcripts are disabled for this video'}
    except NoTranscriptFound:
        return {'error': f'No transcript found in languages: {languages}'}
    except Exception as e:
        return {'error': f'Error: {str(e)}'}

print("✅ get_video_transcript() function defined")

## 11. Get Transcript for the Video

In [None]:
# Get transcript for the video
transcript = get_video_transcript(video_id)

if isinstance(transcript, dict) and 'error' in transcript:
    print(f"❌ {transcript['error']}")
else:
    print(f"✅ Transcript retrieved! ({len(transcript)} segments)\n")
    print("First 5 segments:")
    print("=" * 80)
    for i, segment in enumerate(transcript[:5]):
        timestamp = segment['start']
        text = segment['text']
        print(f"[{timestamp:.2f}s] {text}")
    print("=" * 80)
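
The `start` values printed above are offsets in seconds. For display, a clock-style timestamp is often easier to scan. A minimal sketch, where `format_timestamp` is a hypothetical helper name and the sample segments are made up for the demonstration rather than taken from a real transcript:

```python
def format_timestamp(seconds):
    """Format an offset in seconds as MM:SS, or H:MM:SS once it reaches an hour."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f'{hours}:{minutes:02d}:{secs:02d}'
    return f'{minutes:02d}:{secs:02d}'

# Made-up segments in the same shape as the transcript entries used above
sample_segments = [
    {'text': 'first line', 'start': 0.0, 'duration': 2.5},
    {'text': 'later line', 'start': 3725.4, 'duration': 1.0},
]

for segment in sample_segments:
    print(f"[{format_timestamp(segment['start'])}] {segment['text']}")
```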

## 12. Convert Transcript to Full Text

In [None]:
# Combine all transcript segments into a single text
if isinstance(transcript, list):
    full_text = ' '.join([segment['text'] for segment in transcript])
    
    print("📄 Full Transcript:")
    print("=" * 80)
    print(full_text)
    print("=" * 80)
    print(f"\nTotal characters: {len(full_text)}")
    print(f"Total words: {len(full_text.split())}")

## 13. Save Video Data and Transcript to File

In [None]:
import json

def save_video_data(video_data, transcript, output_dir='output'):
    """
    Save video data and transcript to JSON files.
    """
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    
    # video_data is the raw API response, so the ID lives on the first item
    video_id = video_data['items'][0]['id']
    
    # Save video metadata
    metadata_path = os.path.join(output_dir, f'{video_id}_metadata.json')
    with open(metadata_path, 'w', encoding='utf-8') as f:
        json.dump(video_data, f, indent=2, ensure_ascii=False)
    print(f"✅ Metadata saved to: {metadata_path}")
    
    # Save transcript if available
    if isinstance(transcript, list):
        transcript_path = os.path.join(output_dir, f'{video_id}_transcript.json')
        with open(transcript_path, 'w', encoding='utf-8') as f:
            json.dump(transcript, f, indent=2, ensure_ascii=False)
        print(f"✅ Transcript saved to: {transcript_path}")
        
        # Save full text version
        text_path = os.path.join(output_dir, f'{video_id}_transcript.txt')
        full_text = ' '.join([segment['text'] for segment in transcript])
        with open(text_path, 'w', encoding='utf-8') as f:
            f.write(full_text)
        print(f"✅ Full transcript text saved to: {text_path}")

# Save the data
if 'error' not in video_data:
    save_video_data(video_data, transcript)

## Summary

This notebook demonstrates:
1. ✅ Setting up YouTube Data API v3
2. ✅ Extracting video IDs from URLs
3. ✅ Fetching video metadata (title, description, statistics)
4. ✅ Getting video transcripts/captions
5. ✅ Saving data to files

### Next Steps:
- Integrate this with your notes database
- Build a web interface to input video URLs
- Add automatic note generation from transcripts
- Implement video search functionality