# YouTube Trend Analyzer

A notebook for analyzing YouTube trending videos across 50+ countries, with optional language-specific search for Tamil, Hindi, Telugu, and Bengali and family-friendly filtering on those searches.

## Features
- ✅ 50+ countries support
- ✅ Pure language content filtering by keyword (Tamil, Hindi, Telugu, Bengali)
- ✅ Strict SafeSearch on language-specific searches for family-friendly results
- ✅ Interactive data visualization
- ✅ Real-time trending data

## Setup Instructions
1. Get a YouTube Data API key from Google Cloud Console
2. Set your API key in the environment variable `YOUTUBE_API_KEY`
3. Install required packages: `pip install google-api-python-client pandas matplotlib seaborn plotly`
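
As a minimal sketch of step 2, the key can be read from the environment and rejected early if it is missing or still set to the placeholder. The helper name `resolve_api_key` and its `env` parameter are hypothetical and used only for illustration; the notebook itself reads the variable with `os.getenv` in the Configuration section.

```python
import os

PLACEHOLDER = "YOUR_API_KEY_HERE"  # same placeholder string the notebook uses


def resolve_api_key(env=None):
    """Return the API key from `env` (defaults to os.environ) or raise ValueError.

    Hypothetical helper for illustration; not one of the notebook's cells.
    """
    env = os.environ if env is None else env
    key = env.get("YOUTUBE_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise ValueError("YOUTUBE_API_KEY is not set")
    return key
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.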

In [None]:
# Install required packages
!pip install google-api-python-client pandas matplotlib seaborn plotly ipywidgets

In [None]:
# Import required libraries
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("✅ All libraries imported successfully!")

## Configuration

Set up your YouTube API key and configure regional language mappings.

In [None]:
# YouTube API Configuration
# Replace 'YOUR_API_KEY_HERE' with your actual YouTube Data API key
YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY', 'YOUR_API_KEY_HERE')

if YOUTUBE_API_KEY == 'YOUR_API_KEY_HERE':
    print("⚠️ Please set your YouTube API key!")
    print("You can set it as an environment variable: export YOUTUBE_API_KEY='your_key_here'")
    print("Or replace 'YOUR_API_KEY_HERE' in the code above.")
else:
    print("✅ YouTube API key configured!")

# Regional language mappings
REGION_LANGUAGES = {
    # North America
    'US': 'en', 'CA': 'en', 'MX': 'es',
    # South America
    'BR': 'pt', 'AR': 'es', 'CL': 'es', 'CO': 'es', 'PE': 'es',
    # Europe
    'GB': 'en', 'DE': 'de', 'FR': 'fr', 'IT': 'it', 'ES': 'es',
    'NL': 'nl', 'BE': 'nl', 'CH': 'de', 'AT': 'de', 'SE': 'sv',
    'NO': 'no', 'DK': 'da', 'FI': 'fi', 'PL': 'pl', 'CZ': 'cs',
    'HU': 'hu', 'PT': 'pt', 'GR': 'el', 'IE': 'en', 'RU': 'ru',
    'UA': 'uk', 'TR': 'tr',
    # Asia
    'IN': 'hi', 'JP': 'ja', 'KR': 'ko', 'CN': 'zh', 'TH': 'th',
    'VN': 'vi', 'PH': 'en', 'ID': 'id', 'MY': 'ms', 'SG': 'en',
    'TW': 'zh', 'HK': 'zh', 'PK': 'ur', 'BD': 'bn', 'LK': 'si',
    'AE': 'ar', 'SA': 'ar', 'IL': 'he',
    # Africa
    'ZA': 'en', 'NG': 'en', 'KE': 'en', 'EG': 'ar', 'MA': 'ar',
    'TN': 'ar', 'GH': 'en',
    # Oceania
    'AU': 'en', 'NZ': 'en'
}

# Country names for display
COUNTRY_NAMES = {
    'US': '🇺🇸 United States', 'CA': '🇨🇦 Canada', 'MX': '🇲🇽 Mexico',
    'BR': '🇧🇷 Brazil', 'AR': '🇦🇷 Argentina', 'CL': '🇨🇱 Chile',
    'IN': '🇮🇳 India', 'JP': '🇯🇵 Japan', 'KR': '🇰🇷 South Korea',
    'GB': '🇬🇧 United Kingdom', 'DE': '🇩🇪 Germany', 'FR': '🇫🇷 France',
    'AU': '🇦🇺 Australia', 'NZ': '🇳🇿 New Zealand'
}

print(f"✅ Regional mappings configured for {len(REGION_LANGUAGES)} countries")

## Core Functions

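Two functions do the fetching: `get_pure_language_content` runs SafeSearch-restricted searches and keeps only videos whose metadata matches the target language, and `fetch_trending_videos` loads the regional trending chart and swaps in the language-specific results when enough are found.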
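
The filtering step can be tried on its own, without an API key. The sketch below mirrors the substring checks used in `get_pure_language_content`; the helper names `is_family_friendly` and `looks_like_language` and the sample snippets are made up for illustration.

```python
def is_family_friendly(snippet, exclude_terms):
    """True if no excluded term appears in the title, description, or channel name."""
    text = " ".join(
        snippet.get(field, "") for field in ("title", "description", "channelTitle")
    ).lower()
    return not any(term in text for term in exclude_terms)


def looks_like_language(snippet, primary_term, native_script):
    """True if the language's English or native-script name appears in the metadata."""
    title = snippet.get("title", "").lower()
    description = snippet.get("description", "").lower()
    channel = snippet.get("channelTitle", "").lower()
    return (
        primary_term in title or native_script in title
        or primary_term in description or native_script in description
        or primary_term in channel
    )


# Made-up snippets for illustration only; real ones come from the API response.
samples = [
    {"title": "Tamil Song 2024", "description": "", "channelTitle": "Music Channel"},
    {"title": "Late night show (18+)", "description": "tamil comedy", "channelTitle": "X"},
    {"title": "Cooking basics", "description": "easy recipes", "channelTitle": "Chef"},
]
exclude = ["adult", "explicit", "18+", "mature"]
kept = [
    s["title"]
    for s in samples
    if is_family_friendly(s, exclude) and looks_like_language(s, "tamil", "தமிழ்")
]
print(kept)
```

Because these are plain substring matches, a harmless word such as "premature" would also trip the `mature` term, and a video in the target language that never names the language in its metadata is dropped.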

In [None]:
def get_pure_language_content(youtube, region_code, language_code, max_results):
    """
    Get pure language-specific content with family-friendly filtering
    """
    # Curated search terms for family-friendly content in each language
    language_search_terms = {
        'ta': {
            'primary': 'tamil',
            'queries': ['tamil movie trailer', 'tamil song', 'tamil news', 'tamil comedy', 'tamil music', 'tamil dance'],
            'native_script': 'தமிழ்',
            'exclude_terms': ['adult', 'explicit', '18+', 'mature']
        },
        'hi': {
            'primary': 'hindi',
            'queries': ['hindi movie trailer', 'bollywood song', 'hindi comedy', 'hindi music', 'hindi dance', 'hindi news'],
            'native_script': 'हिंदी',
            'exclude_terms': ['adult', 'explicit', '18+', 'mature']
        },
        'te': {
            'primary': 'telugu',
            'queries': ['telugu movie trailer', 'telugu song', 'tollywood', 'telugu comedy', 'telugu music', 'telugu dance'],
            'native_script': 'తెలుగు',
            'exclude_terms': ['adult', 'explicit', '18+', 'mature']
        },
        'bn': {
            'primary': 'bengali',
            'queries': ['bengali movie', 'bengali song', 'kolkata', 'bengali comedy', 'bengali music', 'bengali dance'],
            'native_script': 'বাংলা',
            'exclude_terms': ['adult', 'explicit', '18+', 'mature']
        }
    }
    
    if language_code not in language_search_terms:
        return []
        
    lang_config = language_search_terms[language_code]
    all_videos = []
    
    # Search with multiple queries to get diverse content
    for query in lang_config['queries'][:3]:  # Use first 3 queries
        try:
            search_request = youtube.search().list(
                part='id,snippet',
                q=query,
                type='video',
                regionCode=region_code,
                maxResults=15,
                order='relevance',
                # RFC 3339 timestamp in UTC with whole seconds, as the API expects
                publishedAfter=(datetime.utcnow() - timedelta(days=30)).strftime('%Y-%m-%dT%H:%M:%SZ'),
                videoDefinition='high',
                safeSearch='strict',  # Family-friendly content only
                videoDuration='medium'  # Avoid very short or very long videos
            )
            
            search_response = search_request.execute()
            
            for item in search_response.get('items', []):
                all_videos.append(item['id']['videoId'])
                
        except Exception as e:
            print(f"⚠️ Search query '{query}' failed: {str(e)}")
            continue
    
    if not all_videos:
        return []
        
    # Remove duplicates while preserving order
    unique_video_ids = list(dict.fromkeys(all_videos))[:max_results * 2]
    
    # Get detailed video information
    videos_request = youtube.videos().list(
        part='snippet,statistics,contentDetails',
        id=','.join(unique_video_ids)
    )
    
    videos_response = videos_request.execute()
    
    # Filter and validate videos
    filtered_videos = []
    for video in videos_response.get('items', []):
        title = video['snippet'].get('title', '').lower()
        description = video['snippet'].get('description', '').lower()
        channel_title = video['snippet'].get('channelTitle', '').lower()
        
        # Check for family-friendly content
        exclude_terms = lang_config['exclude_terms']
        if any(term in title or term in description or term in channel_title for term in exclude_terms):
            continue
            
        # Ensure content has substantial engagement (not spam)
        view_count = int(video.get('statistics', {}).get('viewCount', 0))
        if view_count < 1000:  # Minimum views threshold
            continue
            
        # Check if it's actually in the target language
        primary_term = lang_config['primary']
        native_script = lang_config['native_script']
        
        if (primary_term in title or native_script in title or 
            primary_term in description or native_script in description or
            primary_term in channel_title):
            filtered_videos.append(video)
            
    # Sort by view count and return top videos
    filtered_videos.sort(key=lambda x: int(x.get('statistics', {}).get('viewCount', 0)), reverse=True)
    return filtered_videos[:max_results]


def fetch_trending_videos(region_code='US', max_results=10, language_code=None):
    """
    Fetch trending videos from YouTube Data API for a specific region and language.
    """
    if YOUTUBE_API_KEY == 'YOUR_API_KEY_HERE':
        raise ValueError("Please set your YouTube API key first!")
    
    try:
        # Build YouTube API service
        youtube = build('youtube', 'v3', developerKey=YOUTUBE_API_KEY)
        
        # Build API request parameters
        request_params = {
            'part': 'snippet,statistics,contentDetails',
            'chart': 'mostPopular',
            'regionCode': region_code,
            'maxResults': max_results
        }
        
        # Optional: hl localizes the returned metadata; it does not filter videos by language
        if language_code:
            request_params['hl'] = language_code
        
        # Make API request for trending videos
        request = youtube.videos().list(**request_params)
        response = request.execute()
        
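        # For Indian regional languages, try to replace the chart with language-specific results.
        # Only ta, hi, te, and bn have search terms configured; the others keep the regional chart.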
        if language_code and language_code in ['ta', 'hi', 'te', 'bn', 'mr', 'gu', 'kn', 'ml', 'pa', 'ur']:
            # Get pure language-specific content instead of mixing
            language_specific_content = get_pure_language_content(youtube, region_code, language_code, max_results)
            if language_specific_content and len(language_specific_content) >= 5:
                # Use only language-specific content if we have enough
                response['items'] = language_specific_content
        
        # Check if response contains videos
        if 'items' not in response or not response['items']:
            raise ValueError(f"No trending videos found for region {region_code}")
        
        # Process video data
        videos_data = []
        for item in response['items']:
            video_data = {
                'video_id': item.get('id', ''),
                'title': item['snippet'].get('title', 'Unknown Title'),
                'channel': item['snippet'].get('channelTitle', 'Unknown Channel'),
                'views': int(item.get('statistics', {}).get('viewCount', 0)),
                'likes': int(item.get('statistics', {}).get('likeCount', 0)),
                'comments': int(item.get('statistics', {}).get('commentCount', 0)),
                'published_at': item['snippet'].get('publishedAt', ''),
                'description': item['snippet'].get('description', '')[:200] + '...' if len(item['snippet'].get('description', '')) > 200 else item['snippet'].get('description', ''),
                'thumbnail_url': item['snippet'].get('thumbnails', {}).get('medium', {}).get('url', ''),
                'duration': item.get('contentDetails', {}).get('duration', ''),
                'channel_id': item['snippet'].get('channelId', '')
            }
            videos_data.append(video_data)
        
        # Create DataFrame
        df = pd.DataFrame(videos_data)
        
        lang_info = f" in {language_code}" if language_code else ""
        print(f"✅ Successfully fetched {len(videos_data)} videos for region {region_code}{lang_info}")
        return df
        
    except HttpError as e:
        error_details = str(e)
        if "quotaExceeded" in error_details:
            raise ValueError("YouTube API quota exceeded. Please try again later.")
        elif "keyInvalid" in error_details:
            raise ValueError("Invalid YouTube API key. Please check your API key.")
        else:
            raise ValueError(f"YouTube API error: {error_details}")
    except Exception as e:
        raise Exception(f"Failed to fetch trending videos: {str(e)}")

print("✅ Core functions defined successfully!")

## Interactive Usage

Choose a country and language to analyze trending videos.

In [None]:
# Interactive country and language selection
print("🌍 Available countries:")
for code, name in list(COUNTRY_NAMES.items())[:10]:  # Show first 10
    print(f"  {code}: {name}")
print(f"  ... and {len(REGION_LANGUAGES) - 10} more countries")

print("\n🗣️ Special language support for Indian regional languages:")
print("  ta: Tamil (தமிழ்)")
print("  hi: Hindi (हिंदी)")
print("  te: Telugu (తెలుగు)")
print("  bn: Bengali (বাংলা)")

# Set your preferences here
SELECTED_REGION = 'IN'  # Change to your preferred country code
SELECTED_LANGUAGE = 'ta'  # Change to your preferred language code, or None to skip the language filter

print(f"\n📊 Analyzing trending videos for: {COUNTRY_NAMES.get(SELECTED_REGION, SELECTED_REGION)}")
if SELECTED_LANGUAGE:
    print(f"🗣️ Language filter: {SELECTED_LANGUAGE}")

## Fetch and Analyze Data

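Fetch the current trending videos for the selected region. When a supported language is selected, the results come from the family-friendly language-specific search instead.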
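
The `duration` column of the fetched DataFrame holds the API's ISO 8601 duration strings (for example `PT4M13S`), which are awkward to sort or plot. A minimal sketch of a converter to seconds follows; `duration_to_seconds` is a hypothetical helper, not part of the notebook, and it only handles the day, hour, minute, and second components.

```python
import re

# Matches durations such as PT4M13S, PT1H2M, or P1DT2H (days, hours, minutes, seconds).
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(iso_duration):
    """Convert an ISO 8601 duration string to seconds; return None if it does not parse."""
    match = _DURATION_RE.match(iso_duration or "")
    if not match:
        return None
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


print(duration_to_seconds("PT4M13S"))
```

Applied to the DataFrame, this would look like `df['duration_seconds'] = df['duration'].map(duration_to_seconds)`.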

In [None]:
# Fetch trending videos
try:
    df = fetch_trending_videos(
        region_code=SELECTED_REGION, 
        max_results=10, 
        language_code=SELECTED_LANGUAGE
    )
    
    print(f"\n📈 Analysis Results:")
    print(f"Total Videos: {len(df)}")
    print(f"Total Views: {df['views'].sum():,}")
    print(f"Average Views: {df['views'].mean():,.0f}")
    print(f"Total Likes: {df['likes'].sum():,}")
    print(f"Total Comments: {df['comments'].sum():,}")
    
    # Display top videos
    print(f"\n🎥 Top 5 Trending Videos:")
    for i, row in df.head().iterrows():
        print(f"{i+1}. {row['title'][:50]}...")
        print(f"   📺 {row['channel']} | 👀 {row['views']:,} views | 👍 {row['likes']:,} likes")
        print(f"   🔗 https://youtube.com/watch?v={row['video_id']}")
        print()
        
except Exception as e:
    print(f"❌ Error: {str(e)}")
    print("Please check your API key and try again.")

## Data Visualization

Create interactive charts to visualize the trending video data.

In [None]:
# Only run if we have data
if 'df' in locals() and len(df) > 0:
    # 1. Bar chart of views by video
    fig1 = px.bar(
        df.head(10), 
        x='title', 
        y='views',
        title=f'Top 10 Trending Videos - Views ({COUNTRY_NAMES.get(SELECTED_REGION, SELECTED_REGION)})',
        labels={'views': 'View Count', 'title': 'Video Title'},
        color='views',
        color_continuous_scale='Reds'
    )
    
    # Rotate x-axis labels for better readability
    fig1.update_layout(
        xaxis_tickangle=-45,
        height=500,
        title_font_size=16
    )
    
    # Label each bar with a shortened title (the x-axis tick labels keep the full title)
    fig1.update_traces(
        text=[title[:30] + '...' if len(title) > 30 else title for title in df.head(10)['title']],
        textposition='outside'
    )
    
    fig1.show()
else:
    print("❌ No data available for visualization. Please fetch data first.")

In [None]:
# Only run if we have data
if 'df' in locals() and len(df) > 0:
    # 2. Scatter plot of views vs likes
    fig2 = px.scatter(
        df, 
        x='views', 
        y='likes',
        size='comments',
        hover_data=['title', 'channel'],
        title=f'Views vs Likes Analysis ({COUNTRY_NAMES.get(SELECTED_REGION, SELECTED_REGION)})',
        labels={'views': 'View Count', 'likes': 'Like Count'},
        color='comments',
        color_continuous_scale='Viridis'
    )
    
    fig2.update_layout(
        height=500,
        title_font_size=16
    )
    
    fig2.show()
    
    # 3. Pie chart of channels
    channel_counts = df['channel'].value_counts().head(8)
    
    fig3 = px.pie(
        values=channel_counts.values, 
        names=channel_counts.index,
        title=f'Distribution by Channels ({COUNTRY_NAMES.get(SELECTED_REGION, SELECTED_REGION)})',
        color_discrete_sequence=px.colors.qualitative.Set3
    )
    
    fig3.update_layout(
        height=500,
        title_font_size=16
    )
    
    fig3.show()
else:
    print("❌ No data available for visualization. Please fetch data first.")

## Engagement Analysis

Analyze engagement metrics and patterns.
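
The metrics in this section reduce to two ratios and a weighted sum: like rate and comment rate as percentages of views, and a score that counts the comment rate twice. A minimal sketch for a single video follows. The names `engagement_metrics` and `comment_weight` are illustrative, and the zero-view guard is an added assumption that the notebook cell itself does not include.

```python
def engagement_metrics(views, likes, comments, comment_weight=2.0):
    """Return like rate and comment rate (both in percent) plus the combined score.

    Returns None when views is zero or negative, since the rates are undefined.
    """
    if views <= 0:
        return None
    like_rate = likes / views * 100
    comment_rate = comments / views * 100
    return {
        "like_rate": like_rate,
        "comment_rate": comment_rate,
        "engagement_score": like_rate + comment_weight * comment_rate,
    }


# Made-up counts for illustration only.
print(engagement_metrics(views=1000, likes=50, comments=10))
print(engagement_metrics(views=0, likes=0, comments=0))
```

The weight of 2 on comments is the notebook's own choice; exposing it as a parameter makes it easy to see how sensitive the ranking is to that choice.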

In [None]:
# Only run if we have data
if 'df' in locals() and len(df) > 0:
    # Calculate engagement metrics
    df['like_rate'] = (df['likes'] / df['views']) * 100
    df['comment_rate'] = (df['comments'] / df['views']) * 100
    df['engagement_score'] = df['like_rate'] + (df['comment_rate'] * 2)  # Comments weighted more
    
    # Create subplot with multiple metrics
    fig4 = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Like Rate %', 'Comment Rate %', 'Engagement Score', 'Views Distribution'),
        specs=[[{"secondary_y": False}, {"secondary_y": False}],
               [{"secondary_y": False}, {"secondary_y": False}]]
    )
    
    # Like rate
    fig4.add_trace(
        go.Bar(x=df['title'].str[:20], y=df['like_rate'], name='Like Rate %', marker_color='red'),
        row=1, col=1
    )
    
    # Comment rate
    fig4.add_trace(
        go.Bar(x=df['title'].str[:20], y=df['comment_rate'], name='Comment Rate %', marker_color='blue'),
        row=1, col=2
    )
    
    # Engagement score
    fig4.add_trace(
        go.Bar(x=df['title'].str[:20], y=df['engagement_score'], name='Engagement Score', marker_color='green'),
        row=2, col=1
    )
    
    # Views distribution
    fig4.add_trace(
        go.Histogram(x=df['views'], name='Views Distribution', marker_color='orange'),
        row=2, col=2
    )
    
    fig4.update_layout(
        height=800,
        title_text=f"Engagement Analysis - {COUNTRY_NAMES.get(SELECTED_REGION, SELECTED_REGION)}",
        title_font_size=16,
        showlegend=False
    )
    
    # Update x-axis for better readability
    fig4.update_xaxes(tickangle=-45)
    
    fig4.show()
    
    # Print engagement insights
    print(f"\n📊 Engagement Insights:")
    print(f"Highest Like Rate: {df.loc[df['like_rate'].idxmax(), 'title'][:50]}... ({df['like_rate'].max():.2f}%)")
    print(f"Highest Comment Rate: {df.loc[df['comment_rate'].idxmax(), 'title'][:50]}... ({df['comment_rate'].max():.2f}%)")
    print(f"Best Engagement: {df.loc[df['engagement_score'].idxmax(), 'title'][:50]}... (Score: {df['engagement_score'].max():.2f})")
    
else:
    print("❌ No data available for engagement analysis. Please fetch data first.")

## Export Data

Save the analyzed data for further use.

In [None]:
# Only run if we have data
if 'df' in locals() and len(df) > 0:
    # Create filename with timestamp
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    lang_suffix = f"_{SELECTED_LANGUAGE}" if SELECTED_LANGUAGE else ""
    filename = f'youtube_trending_{SELECTED_REGION}{lang_suffix}_{timestamp}.csv'
    
    # Save to CSV
    df.to_csv(filename, index=False)
    print(f"✅ Data exported to: {filename}")
    
    # Display summary
    print(f"\n📋 Data Summary:")
    print(df.describe())
    
else:
    print("❌ No data available to export. Please fetch data first.")

## Compare Multiple Regions

Analyze and compare trending content across different countries.
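
To preview the shape of the regional statistics without spending API quota, the aggregation can be run on synthetic rows. This is a sketch under stated assumptions: every number below is made up, and it uses pandas named aggregation, which yields flat column names rather than the two-level columns produced by the dictionary form in the cell below.

```python
import pandas as pd

# Synthetic rows standing in for fetch_trending_videos() output; all numbers are made up.
combined_df = pd.DataFrame({
    "region": ["US", "US", "IN", "IN"],
    "views": [1000, 3000, 500, 1500],
    "likes": [100, 300, 50, 150],
    "comments": [10, 30, 5, 15],
})

# One row per region, one flat column per statistic.
regional_stats = combined_df.groupby("region").agg(
    views_mean=("views", "mean"),
    views_max=("views", "max"),
    views_min=("views", "min"),
    likes_mean=("likes", "mean"),
    comments_mean=("comments", "mean"),
)
print(regional_stats)
```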

In [None]:
# Compare multiple regions (disabled by default)
# The comparison code below is wrapped in a triple-quoted string, so it does not run
# and uses no API quota until you enable it.

COMPARE_REGIONS = ['US', 'IN', 'GB', 'JP', 'BR']  # Add your preferred regions
comparison_data = []

print("🌍 Regional comparison is disabled by default.")
print("⚠️ Enabling it uses API quota (one request per region).")

# To run the comparison, remove the triple quotes around the block below
"""
for region in COMPARE_REGIONS:
    try:
        print(f"Fetching data for {COUNTRY_NAMES.get(region, region)}...")
        region_df = fetch_trending_videos(region_code=region, max_results=5)
        region_df['region'] = region
        comparison_data.append(region_df)
    except Exception as e:
        print(f"Failed to fetch data for {region}: {str(e)}")

if comparison_data:
    # Combine all data
    combined_df = pd.concat(comparison_data, ignore_index=True)
    
    # Create comparison visualization
    fig_compare = px.box(
        combined_df, 
        x='region', 
        y='views',
        title='Views Distribution Across Regions',
        labels={'views': 'View Count', 'region': 'Region'}
    )
    
    fig_compare.update_layout(height=500, title_font_size=16)
    fig_compare.show()
    
    # Regional statistics
    regional_stats = combined_df.groupby('region').agg({
        'views': ['mean', 'max', 'min'],
        'likes': 'mean',
        'comments': 'mean'
    }).round(0)
    
    print("\n📊 Regional Comparison:")
    print(regional_stats)
else:
    print("No comparison data available.")
"""

print("💡 Tip: Remove the triple quotes around the code above to run the regional comparison.")

## Conclusion

This notebook provides a comprehensive analysis of YouTube trending videos with:

### ✅ Key Features
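- **Family-Safe Content**: Language-specific searches use strict SafeSearch plus keyword exclusion; the plain regional trending chart is returned as the API provides it
- **Pure Language Filtering**: Tamil, Hindi, Telugu, and Bengali videos selected by language keywords, without mixing in other languages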
- **50+ Countries**: Global trending analysis across all continents
- **Interactive Visualizations**: Charts and graphs using Plotly
- **Engagement Analysis**: Like rates, comment rates, and engagement scores
- **Data Export**: Save results for further analysis

### 🔧 How to Use
1. Set your YouTube API key in the configuration section
2. Choose your preferred region and language
3. Run the cells to fetch and analyze data
4. Explore the interactive visualizations
5. Export data for further use

### 🛡️ Safety Features
- SafeSearch strict mode on language-specific searches
- Keyword filtering of titles, descriptions, and channel names for inappropriate material
- Minimum view-count threshold (1,000 views) to screen out spam
- Family-friendly search terms for regional languages

### 📱 Perfect for
- Content creators researching trends
- Marketers analyzing regional preferences
- Educators studying global media patterns
- Families wanting safe content discovery

**Made with ❤️ for safe, family-friendly YouTube trend analysis**