<a href="https://www.kaggle.com/code/georgiosspyrou1/youtube-video-virality-predictor-eda?scriptVersionId=225139961" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# 🌍 YouTube Trending Videos Analysis & Virality Prediction 🚀  

## 📌 Project Overview  
This project aims to **collect, analyze, and model YouTube trending video data** across multiple countries using the **YouTube API**. By gathering daily trending video statistics, we seek to uncover the patterns and factors that contribute to a video's virality.  

## 🛠️ How the Data is Collected  
The data is automatically collected using the **YouTube API** and stored in this GitHub repository:  
🔗 [Trending Video Metadata](https://github.com/gpsyrou/tube-virality/tree/main/assets/meta/trending)  

The main project repository can be found here: https://github.com/gpsyrou/tube-virality

### **Data Collection Process**  
1. **Fetching Trending Videos**:  
   - We start by retrieving the latest trending videos from **multiple countries** via the **YouTube API**.  
   - These trending videos are stored in JSON format in our repository.  

2. **Daily Statistics Updates**:  
   - Each day, a GitHub Actions workflow runs automatically to **gather fresh statistics** for all stored trending videos.  
   - The script fetches video engagement metrics (views, likes, comments, etc.) and updates them in JSON files.  
   - The updated dataset is saved here:  
     🔗 [Video Statistics](https://github.com/gpsyrou/tube-virality/tree/main/assets/meta/video_stats)  
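
The fetching step above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual collection script: it assumes the public YouTube Data API v3 `videos.list` endpoint with `chart=mostPopular`, and the helper names `build_trending_url` and `flatten_item`, the row layout, and the hand-written `sample_item` are hypothetical. No network call is made here.

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_trending_url(region_code, api_key, max_results=50):
    """Compose a videos.list request for one region's most popular videos."""
    params = {
        "part": "snippet,statistics",
        "chart": "mostPopular",
        "regionCode": region_code,
        "maxResults": max_results,
        "key": api_key,
    }
    return f"{API_URL}?{urlencode(params)}"


def flatten_item(item, country_code, collection_date, position):
    """Turn one API response item into a flat row (hypothetical layout)."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})

    def to_int(value):
        # The API returns counts as strings; a count may also be absent.
        return int(value) if value is not None else None

    return {
        "id": item["id"],
        "trending_position": position,
        "collection_date": collection_date,
        "publishedAt": snippet.get("publishedAt"),
        "country_code": country_code,
        "channelTitle": snippet.get("channelTitle"),
        "title": snippet.get("title"),
        "categoryId": snippet.get("categoryId"),
        "viewCount": to_int(stats.get("viewCount")),
        "likeCount": to_int(stats.get("likeCount")),
        "commentCount": to_int(stats.get("commentCount")),
    }


# Hand-written sample shaped like an API item, for illustration only.
sample_item = {
    "id": "-5RSoJ5Ky00",
    "snippet": {
        "publishedAt": "2025-02-21T11:30:00Z",
        "channelTitle": "Supercars",
        "categoryId": "2",
    },
    "statistics": {"viewCount": "115482", "likeCount": "2781"},
}

url = build_trending_url("GB", api_key="YOUR_API_KEY")  # placeholder key
row = flatten_item(sample_item, "GB", "2025-02-23", position=36)
print(row["viewCount"], row["commentCount"])
```

Keeping URL construction and response flattening separate from the HTTP call makes both pieces easy to test without an API key.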

## 🔍 Objectives  
- **Understand Viral Trends**: Identify factors that make a video go viral across different countries.  
- **Statistical Analysis**: Explore engagement trends (views, likes, comments) over time.  
- **Machine Learning Models**: Build predictive models to estimate a video's likelihood of becoming viral.  

## 📊 Dataset & Features  
Our dataset consists of **daily trending videos** from multiple regions, with key features such as:  
- **Video Metadata**: Title, description, channel, published date, tags  
- **Engagement Metrics**: Views, likes, comments, favorites  
- **Technical Details**: Duration, resolution, captions availability  
- **Categorical Information**: Topic categories, privacy settings, license type  

## 🛠️ Methodology  
1. **Data Collection**: Fetch trending video data daily using the YouTube API.  
2. **Data Cleaning & Transformation**: Standardizing formats, handling missing values, and encoding categorical features.  
3. **Exploratory Analysis**: Visualizing key trends in video engagement across countries.  
4. **Feature Engineering**: Extracting meaningful attributes (e.g., video age, growth rate).  
5. **Model Building**: Training machine learning models (classification, regression) to predict virality.  
6. **Evaluation & Interpretation**: Understanding model results and key drivers of viral content.  
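
As an illustration of the feature engineering step, the sketch below derives a video's age and its day-over-day view growth. It assumes the column names used later in this notebook (`video_id`, `country_code`, `collection_date`, `publishedAt`, `viewCount`); the function name `add_age_and_growth` and the definition of growth as a relative change between consecutive snapshots are assumptions, not the project's confirmed feature definitions.

```python
import pandas as pd

# Two snapshots of the same video, using values shown in the notebook output.
snapshots = pd.DataFrame({
    "video_id": ["--MGIIuNZy8", "--MGIIuNZy8"],
    "country_code": ["IN", "IN"],
    "collection_date": ["2025-02-24", "2025-02-25"],
    "publishedAt": ["2025-02-22T15:00:00Z", "2025-02-22T15:00:00Z"],
    "viewCount": [1551499.0, 2070021.0],
})


def add_age_and_growth(frame):
    """Add video age (days) and relative view growth between snapshots."""
    out = frame.copy()
    # Parse both columns as UTC so the subtraction is between like types.
    collected = pd.to_datetime(out["collection_date"], utc=True)
    published = pd.to_datetime(out["publishedAt"], utc=True)
    out["video_age_days"] = (collected - published).dt.total_seconds() / 86400

    # Growth is computed per video and country, in chronological order.
    out = out.sort_values(["video_id", "country_code", "collection_date"])
    views = out.groupby(["video_id", "country_code"])["viewCount"]
    out["view_growth"] = views.diff() / views.shift()
    return out


features = add_age_and_growth(snapshots)
print(features[["collection_date", "video_age_days", "view_growth"]])
```

The first snapshot of each video has no previous value, so its growth is `NaN`; a model would need to drop or impute those rows.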

## 🚀 Expected Outcomes  
- A **comprehensive dataset** of YouTube trending videos across multiple countries.  
- Insights into **what makes a video go viral** based on statistical analysis.  
- Predictive models that can estimate a video's potential virality.  
- Open-source tools for **researchers, content creators, and marketers** to better understand online video trends.  
  

**Stay tuned as we uncover the secrets of YouTube virality! 🚀📈**  


In [1]:
import os 
os.getcwd()

'/kaggle/working'

In [2]:
import pandas as pd
import json
import requests

In [3]:
from io import StringIO


class GitHubDataLoader:
    """Download a file from a public GitHub repository via its raw URL."""

    def __init__(self, user, repo, file_path, branch="main"):
        self.user = user
        self.repo = repo
        self.file_path = file_path
        self.branch = branch
        self.raw_url = (
            "https://raw.githubusercontent.com/"
            f"{self.user}/{self.repo}/{self.branch}/{self.file_path}"
        )

    def get_csv_file(self, timeout=30):
        # A timeout keeps the notebook from hanging if GitHub is unreachable.
        response = requests.get(self.raw_url, timeout=timeout)
        response.raise_for_status()  # Raise an HTTPError on 4xx/5xx responses
        return response.text  # CSV content as a string


class CSVLoader:
    """Parse CSV text into a pandas DataFrame."""

    def __init__(self, csv_content):
        self.csv_content = csv_content

    def to_dataframe(self):
        return pd.read_csv(StringIO(self.csv_content))
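
The two steps performed by the loader classes, composing a raw GitHub URL and parsing CSV text, can be tried without network access. The sketch below is illustrative only: `raw_github_url` is a hypothetical standalone helper, and the inline `csv_text` is a tiny hand-made sample rather than the real file.

```python
from io import StringIO

import pandas as pd


def raw_github_url(user, repo, file_path, branch="main"):
    """Build the raw.githubusercontent.com URL for a file in a repository."""
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{file_path}"


url = raw_github_url("gpsyrou", "tube-virality", "db/ods/trending_videos.csv")
print(url)

# Parse CSV text from memory, as CSVLoader does with the downloaded content.
csv_text = (
    "id,country_code,viewCount\n"
    "-5RSoJ5Ky00,GB,115482\n"
    "zrOk2ftcb2I,DE,182049\n"
)
frame = pd.read_csv(StringIO(csv_text))
print(frame.shape)
```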



In [4]:
YOUTUBE_CATEGORY_MAP = {
    1: "Film & Animation",
    2: "Autos & Vehicles",
    10: "Music",
    15: "Pets & Animals",
    17: "Sports",
    18: "Short Movies",
    19: "Travel & Events",
    20: "Gaming",
    21: "Videoblogging",
    22: "People & Blogs",
    23: "Comedy",
    24: "Entertainment",
    25: "News & Politics",
    26: "Howto & Style",
    27: "Education",
    28: "Science & Technology",
    29: "Nonprofits & Activism",
    30: "Movies",
    31: "Anime/Animation",
    32: "Action/Adventure",
    33: "Classics",
    34: "Comedy (Movies)",
    35: "Documentary",
    36: "Drama",
    37: "Family",
    38: "Foreign",
    39: "Horror",
    40: "Sci-Fi/Fantasy",
    41: "Thriller",
    42: "Shorts",
    43: "Shows",
    44: "Trailers"
}

## Trending Videos Analysis

In [5]:
loader = GitHubDataLoader("gpsyrou", "tube-virality", "db/ods/trending_videos.csv")
trending_videos_content = loader.get_csv_file()

trending_videos = CSVLoader(trending_videos_content)
df = trending_videos.to_dataframe()

df.shape

(2500, 15)

In [6]:
df.columns

Index(['id', 'trending_position', 'collection_date', 'publishedAt',
       'country_code', 'channelId', 'channelTitle', 'title', 'description',
       'categoryId', 'viewCount', 'likeCount', 'commentCount', 'thumbnail_url',
       'defaultAudioLanguage'],
      dtype='object')

In [7]:
df.rename(
    columns={'id': 'video_id'}, 
    inplace=True
)

In [8]:
df['categoryId'] = df['categoryId'].astype(int)
df['category'] = df['categoryId'].map(YOUTUBE_CATEGORY_MAP)
df['category'] = df['category'].fillna("Unknown Category")
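
Note that `astype(int)` raises an error if `categoryId` contains missing or non-numeric values. Whether the real dataset has such rows is not stated here, so the following is only a defensive sketch: it uses a small subset of the category map, and `label_category` is a hypothetical helper.

```python
import pandas as pd

# Subset of the category map, for illustration.
CATEGORY_MAP = {10: "Music", 17: "Sports", 24: "Entertainment"}


def label_category(value):
    """Map a numeric category id to its name, tolerating missing values."""
    if pd.isna(value):
        return "Unknown Category"
    return CATEGORY_MAP.get(int(value), "Unknown Category")


raw_ids = pd.Series(["24", "17", None, "999", "abc"], name="categoryId")

# Invalid entries become NaN instead of raising.
numeric_ids = pd.to_numeric(raw_ids, errors="coerce")
categories = numeric_ids.map(label_category)
print(categories.tolist())
```

Missing, unparseable, and unmapped ids all end up in the same "Unknown Category" bucket, matching the fallback label used in the cell above.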

In [9]:
df.sort_values(by=['video_id', 'collection_date'])

Unnamed: 0,video_id,trending_position,collection_date,publishedAt,country_code,channelId,channelTitle,title,description,categoryId,viewCount,likeCount,commentCount,thumbnail_url,defaultAudioLanguage,category
481,--MGIIuNZy8,32,2025-02-24,2025-02-22T15:00:00Z,IN,UCOQNJjhXwvAScuELTT_i7cQ,Sony LIV,Aman के Competition 'Sonic Lamb' को मिली Peyus...,https://www.sonyliv.com/shows/shark-tank-india...,24,1551499.0,32762.0,2140.0,https://i.ytimg.com/vi/--MGIIuNZy8/hqdefault.jpg,en,Entertainment
444,--MGIIuNZy8,45,2025-02-25,2025-02-22T15:00:00Z,IN,UCOQNJjhXwvAScuELTT_i7cQ,Sony LIV,Aman के Competition 'Sonic Lamb' को मिली Peyus...,https://www.sonyliv.com/shows/shark-tank-india...,24,2070021.0,39518.0,2442.0,https://i.ytimg.com/vi/--MGIIuNZy8/hqdefault.jpg,en,Entertainment
64,-0dKgdKWJP0,15,2025-03-01,2025-02-28T12:30:07Z,IN,UCdxbhKxr8pyWTx1ExCSmJRw,Girliyapa,Medical Dreams - E05 - Faisla | Season Finale ...,Mock test ka waqt aa gaya hai! Ab choice simpl...,24,504175.0,23709.0,2466.0,https://i.ytimg.com/vi/-0dKgdKWJP0/hqdefault.jpg,hi,Entertainment
2085,-5RSoJ5Ky00,36,2025-02-23,2025-02-21T11:30:00Z,GB,UCcCBrelNGfpchXKLFt8gtKg,Supercars,Race 1 Extended Highlights - Thrifty Sydney 50...,Lights are out on season 2025 with a tense spr...,2,115482.0,2781.0,125.0,https://i.ytimg.com/vi/-5RSoJ5Ky00/hqdefault.jpg,en,Autos & Vehicles
937,-7tWhJzScqk,38,2025-02-23,2025-02-23T04:30:07Z,IN,UCJcCB-QYPIBcbKcBQOTwhiA,Vj Siddhu Vlogs,மொட்டமாடி Party பண்ணக்கூடாதா😳😱 | Vj Siddhu Vlogs,For Business inquiries please contact us :7200...,22,1041552.0,104563.0,683.0,https://i.ytimg.com/vi/-7tWhJzScqk/hqdefault.jpg,ta,People & Blogs
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
849,zfpZ8vbVnMs,50,2025-03-01,2025-02-27T00:20:32Z,US,UCOUJtDtJMwImzRHHGjfLyKw,Jana Duggar,Honeymoon & Moving to Nebraska,We spent the first few days of our honeymoon i...,22,199675.0,8822.0,395.0,https://i.ytimg.com/vi/zfpZ8vbVnMs/hqdefault.jpg,en-US,People & Blogs
284,zpieZkvFnlE,35,2025-02-23,2025-02-20T00:00:51Z,US,UCkzCjdRMrW2vXLx8mvPVLdQ,Man City,HIGHLIGHTS Real Madrid 3-1 Man City | Mbappe (...,Manchester City were knocked out this season’s...,17,7242114.0,88119.0,5384.0,https://i.ytimg.com/vi/zpieZkvFnlE/hqdefault.jpg,en-GB,Sports
942,zpieZkvFnlE,43,2025-02-23,2025-02-20T00:00:51Z,IN,UCkzCjdRMrW2vXLx8mvPVLdQ,Man City,HIGHLIGHTS Real Madrid 3-1 Man City | Mbappe (...,Manchester City were knocked out this season’s...,17,7242114.0,88119.0,5384.0,https://i.ytimg.com/vi/zpieZkvFnlE/hqdefault.jpg,en-GB,Sports
1959,zrOk2ftcb2I,10,2025-02-23,2025-02-20T19:06:26Z,DE,UCMkyn9IIMiZrF0SdvwXYcsQ,Stefan Raab - Topic,Rambo Zambo (Was is Bubatz?) (feat. Fritze Merz),Provided to YouTube by Raab Music\n\nRambo Zam...,10,182049.0,3767.0,0.0,https://i.ytimg.com/vi/zrOk2ftcb2I/hqdefault.jpg,,Music
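
The sorted output shows that a video can appear on several collection dates and in more than one country (for example `zpieZkvFnlE` in both US and IN). One possible way to summarise that reach per video is sketched below on a few rows copied from the output above; the summary column names `countries`, `days_trending`, and `best_position` are assumptions chosen for illustration.

```python
import pandas as pd

# A few rows taken from the sorted output above.
rows = pd.DataFrame({
    "video_id": ["zpieZkvFnlE", "zpieZkvFnlE", "zrOk2ftcb2I",
                 "--MGIIuNZy8", "--MGIIuNZy8"],
    "country_code": ["US", "IN", "DE", "IN", "IN"],
    "collection_date": ["2025-02-23", "2025-02-23", "2025-02-23",
                        "2025-02-24", "2025-02-25"],
    "trending_position": [35, 43, 10, 32, 45],
})

reach = (
    rows.groupby("video_id")
    .agg(
        countries=("country_code", "nunique"),        # distinct countries
        days_trending=("collection_date", "nunique"),  # distinct dates
        best_position=("trending_position", "min"),    # lower is better
    )
    .sort_values(["countries", "days_trending"], ascending=False)
)
print(reach)
```

Applied to the full `df`, the same aggregation would rank videos by how widely and how long they trended, which could serve as one candidate virality signal.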
