# YouTube Sri Lankan Content Analysis

This notebook walks through exploratory data analysis (EDA) of YouTube videos from Sri Lankan channels, using the data that has already been collected and processed.

## Setup and Data Loading

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
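import warnings

# Suppress library warnings to keep cell output readable.
# Note: this also hides deprecation notices; comment it out when debugging.
warnings.filterwarnings('ignore')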

# Set style (the 'seaborn-v0_8' style name requires matplotlib >= 3.6)
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

In [None]:
# Load processed data
import sys
import os
sys.path.append('../scripts')

from utils import load_from_csv

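# Load the most recent processed data file.
# Guard against a missing directory so the sample-data fallback below is reachable
# instead of os.listdir raising FileNotFoundError.
processed_dir = '../data/processed'
processed_files = []
if os.path.isdir(processed_dir):
    processed_files = [
        f for f in os.listdir(processed_dir)
        if f.startswith('processed_videos_') and f.endswith('.csv')
    ]

if processed_files:
    # max() compares names lexicographically, so this assumes the part of the
    # filename after 'processed_videos_' sorts in chronological order.
    latest_file = max(processed_files)
    df = load_from_csv(os.path.join(processed_dir, latest_file))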
    print(f"Loaded {len(df)} videos from {latest_file}")
    print(f"Dataset shape: {df.shape}")
else:
    print("No processed data files found. Please run process_data.py first.")
    # Create sample data for demonstration
    df = pd.DataFrame({
        'video_id': ['sample_1', 'sample_2'],
        'title': ['Sample Video 1', 'Sample Video 2'],
        'view_count': [1000, 2000],
        'like_count': [50, 100],
        'comment_count': [10, 20]
    })
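
The loading cell picks the latest file by calling `max()` on the filenames, which only works when the names sort chronologically. The following is a minimal sketch of an alternative that selects by file modification time instead. `find_latest_file` is a hypothetical helper written for illustration; it is not part of the project's `utils` module.

```python
import os
from typing import Optional


def find_latest_file(directory: str,
                     prefix: str = "processed_videos_",
                     suffix: str = ".csv") -> Optional[str]:
    """Return the path of the most recently modified matching file, or None.

    Hypothetical helper: the prefix and suffix defaults mirror the filter
    used in the loading cell.
    """
    if not os.path.isdir(directory):
        return None
    candidates = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith(prefix) and name.endswith(suffix)
    ]
    if not candidates:
        return None
    # Choose by modification time rather than by filename ordering.
    return max(candidates, key=os.path.getmtime)
```

Modification time can be misleading if files are copied or restored from a backup, so the filename-based approach remains reasonable when the names carry a sortable timestamp.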

## Data Overview

In [None]:
# Basic information about the dataset
print("Dataset Info:")
print(f"Total videos: {len(df)}")
print(f"Total features: {len(df.columns)}")
print(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")

if 'published_at' in df.columns:
    df['published_at'] = pd.to_datetime(df['published_at'])
    print(f"Date range: {df['published_at'].min()} to {df['published_at'].max()}")

print("\nColumn types:")
print(df.dtypes.value_counts())
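
Once `published_at` is parsed, publishing-time features can be derived from it for the temporal analysis. The sketch below assumes the timestamps are in UTC and converts them to Sri Lanka local time (`Asia/Colombo`) before extracting the hour and weekday. The function name and the `publish_hour` and `publish_day` columns are illustrative assumptions, not columns produced by the processing scripts.

```python
import pandas as pd


def add_publish_time_features(videos: pd.DataFrame,
                              tz: str = "Asia/Colombo") -> pd.DataFrame:
    """Return a copy with local publish hour and weekday name added."""
    out = videos.copy()
    # Parse as UTC (assumption), then convert to the local time zone.
    local = pd.to_datetime(out["published_at"], utc=True).dt.tz_convert(tz)
    out["publish_hour"] = local.dt.hour
    out["publish_day"] = local.dt.day_name()
    return out


# Made-up timestamps, for demonstration only.
sample = pd.DataFrame({
    "published_at": ["2024-01-05T20:00:00Z", "2024-01-06T03:15:00Z"],
})
features = add_publish_time_features(sample)
print(features[["published_at", "publish_hour", "publish_day"]])
```

Converting to local time first matters here: a video published at 20:00 UTC on a Friday lands at 01:30 on Saturday in Sri Lanka, which would otherwise be counted in the wrong hour and day.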

In [None]:
# Display first few rows
df.head()

In [None]:
# Basic statistics for numeric columns
numeric_cols = df.select_dtypes(include=[np.number]).columns
df[numeric_cols].describe()

## Summary

This notebook analyzes YouTube video performance for Sri Lankan content. The analysis covers:

- **Channel Category Analysis**: Distribution and performance across different content categories
- **Engagement Metrics**: Views, likes, comments, and engagement ratios
- **Temporal Patterns**: Publishing times and their impact on performance
- **Content Analysis**: Title characteristics and their correlation with success

Use these results to understand trends in Sri Lankan YouTube content and to inform content strategy.
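
As a minimal sketch of the engagement ratios listed above, the following computes per-view rates from the `view_count`, `like_count`, and `comment_count` columns used in the sample data. The ratio definitions and the `like_rate`, `comment_rate`, and `engagement_rate` column names are assumptions made for illustration; the processing scripts may define engagement differently.

```python
import numpy as np
import pandas as pd


def add_engagement_ratios(videos: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with per-view engagement ratios added."""
    out = videos.copy()
    # Treat zero views as missing so the division yields NaN instead of inf.
    views = out["view_count"].replace(0, np.nan)
    out["like_rate"] = out["like_count"] / views
    out["comment_rate"] = out["comment_count"] / views
    out["engagement_rate"] = (out["like_count"] + out["comment_count"]) / views
    return out


# Same placeholder values as the notebook's fallback sample data.
sample = pd.DataFrame({
    "video_id": ["sample_1", "sample_2"],
    "view_count": [1000, 2000],
    "like_count": [50, 100],
    "comment_count": [10, 20],
})
ratios = add_engagement_ratios(sample)
print(ratios[["video_id", "like_rate", "comment_rate", "engagement_rate"]])
```

Normalizing by views makes videos of very different reach comparable, which raw like and comment counts do not.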