# YouTube Video Trend Analysis

## Introduction
YouTube is one of the largest video-sharing platforms in the world, with millions of videos uploaded and watched daily. Understanding what makes a video trend can be valuable for content creators, marketers, and analysts. In this project, we analyze a dataset of trending YouTube videos to uncover patterns and insights related to video popularity.

Through this analysis, we hope to discover actionable insights that can help improve content strategy and audience engagement on YouTube.


![](https://cdn.pixabay.com/animation/2023/03/24/18/16/18-16-28-807_512.gif)

## 1. Import Libraries
Import the necessary Python libraries.

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt

## 2. Load Dataset
Load the YouTube dataset and inspect its structure.

In [None]:
youTube = pd.read_csv('youtube.csv')

In [None]:
youTube.head()

## 3. Initial Data Exploration
- Check for missing values and duplicates.
- Explore data types and unique values.

In [None]:
# finding duplicates
youTube.duplicated().sum()  # Check for duplicates in the DataFrame

In [None]:
# finding duplicates in the video_id column
youTube['video_id'].duplicated().sum()

In [None]:
# exploring the data types
youTube.dtypes 
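
The exploration checklist above also mentions missing values and unique values. A minimal sketch of how those two checks could look is shown here; the `sample` DataFrame and its values are hypothetical stand-ins for `youTube`, used only so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (hypothetical values, for illustration only)
sample = pd.DataFrame({
    'video_id': ['a1', 'a1', 'b2', 'c3'],
    'category_id': [10, 10, 24, 24],
    'description': ['intro', 'intro', np.nan, 'vlog'],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()

# Number of distinct values in each column (NaN is not counted by default)
unique_per_column = sample.nunique()

print(missing_per_column)
print(unique_per_column)
```

On the real data, the same two calls would be applied to `youTube` directly.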

## 4. Descriptive Statistics
Get a statistical summary of the numeric features.

In [None]:
# get a statistical summary of the data
youTube.describe()

## 5. Data Cleaning
- Handle duplicates
- Convert date fields to datetime
- Standardize category codes

### Converting the dates to datetime format

In [None]:
youTube['publish_date'] = pd.to_datetime(youTube['publish_date'],dayfirst=True)

In [None]:
youTube['trending_date'] = pd.to_datetime(youTube['trending_date'], format='%y.%d.%m')
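
The format string `'%y.%d.%m'` tells pandas to read each value as two-digit year, then day, then month. The sketch below illustrates this on two hypothetical sample strings; the actual values in `trending_date` are assumed to follow the same layout.

```python
import pandas as pd

# Hypothetical sample strings in year.day.month order
raw_dates = pd.Series(['17.14.11', '18.01.02'])

# %y = two-digit year, %d = day of month, %m = month
parsed = pd.to_datetime(raw_dates, format='%y.%d.%m')

print(parsed)
```

Passing an explicit `format` avoids ambiguity for values such as `18.01.02`, where day and month could otherwise be swapped.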

In [None]:
youTube.head()

In [None]:
# keep one row per video_id (its first appearance); copy() so later column assignments do not act on a view of youTube
youTube_unique = youTube.drop_duplicates(subset='video_id', keep='first').copy()

In [None]:
youTube_unique.shape 

In [None]:
youTube_unique.head()
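
With `keep='first'`, the row that survives depends on the order of the rows in the DataFrame. If the file is not already sorted by date, one option is to sort by `trending_date` first so that the retained row is each video's earliest trending appearance. The sketch below uses a hypothetical `records` table to show the idea.

```python
import pandas as pd

# Hypothetical records: video 'a1' appears twice, out of date order
records = pd.DataFrame({
    'video_id': ['a1', 'b2', 'a1'],
    'trending_date': pd.to_datetime(['2017-11-16', '2017-11-15', '2017-11-14']),
    'views': [300, 50, 100],
})

# Sort chronologically, then keep the first (earliest) row per video
first_appearance = (
    records.sort_values('trending_date')
           .drop_duplicates(subset='video_id', keep='first')
           .copy()
)

print(first_appearance)
```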

### Replace the category id with category names (for the youTube and youTube_unique tables)

In [None]:
categories_typpe = pd.read_csv('categories_id.csv')

In [None]:
youTube['category_id'] = youTube['category_id'].replace(categories_typpe.set_index('category_id')['category'].to_dict())
# change the column name category_id to category
youTube.rename(columns={'category_id': 'category'}, inplace=True)
youTube.head()

In [None]:
youTube_unique['category_id'] = youTube_unique['category_id'].replace(categories_typpe.set_index('category_id')['category'].to_dict())
# change the column name category_id to category
youTube_unique.rename(columns={'category_id': 'category'}, inplace=True)
youTube_unique.head()
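
Note that `replace` leaves any id without a match in the lookup table unchanged, so an unmapped numeric id can silently remain in the `category` column. As an alternative sketch (not the approach used above), `Series.map` turns unmapped ids into `NaN`, which makes them easy to find. The `lookup` and `videos` tables here are hypothetical.

```python
import pandas as pd

# Hypothetical lookup table and video table; id 99 has no category name
lookup = pd.DataFrame({'category_id': [10, 24],
                       'category': ['Music', 'Entertainment']})
videos = pd.DataFrame({'video_id': ['a1', 'b2', 'c3'],
                       'category_id': [10, 24, 99]})

# Series indexed by id, used as the mapping
id_to_name = lookup.set_index('category_id')['category']
videos['category'] = videos['category_id'].map(id_to_name)

# Rows whose id was not found in the lookup table
unmapped = videos[videos['category'].isna()]

print(videos)
print(unmapped)
```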

## 6. Insights

## 1- What are the Top 10 Trending YouTube Categories?

In [None]:
# Group the DataFrame by 'category' and count the number of videos in each category
trending_videos_by_category = youTube_unique.groupby('category').size().reset_index(name='count')

# Sort the DataFrame by 'count'
trending_videos_by_category = trending_videos_by_category.sort_values(by='count', ascending=False)
trending_videos_by_category.reset_index(drop=True, inplace=True)

trending_videos_by_category.head(10)

In [None]:
fig = px.bar(trending_videos_by_category.head(10),x='count',y='category',
       title='Top 10 Trending YouTube Categories',text='count',color_discrete_sequence=['red']) 

fig.update_layout(
    plot_bgcolor='white',paper_bgcolor='white', title_x=0.5)

fig.show()

## 2- How many videos became trending in each month?

In [None]:
youTube_unique['trending_month'] = youTube_unique['trending_date'].dt.month
monthly_trends = youTube_unique.groupby('trending_month').size().reset_index(name ='Number of Trending Videos')
monthly_trends

#### There are 4 months in which no video became trending
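
Months without any trending video do not appear in the grouped table at all, so they are easy to overlook. One possible way to make them explicit is to reindex the result over all 12 months and fill the gaps with zero. The `monthly` table below holds hypothetical counts, not results from the dataset.

```python
import pandas as pd

# Hypothetical monthly counts with months 7-10 absent
monthly = pd.DataFrame({
    'trending_month': [1, 2, 3, 4, 5, 6, 11, 12],
    'Number of Trending Videos': [5, 3, 4, 6, 2, 1, 7, 8],
})

# Reindex over months 1..12 so that missing months show up with a count of 0
all_months = (
    monthly.set_index('trending_month')
           .reindex(range(1, 13), fill_value=0)
           .rename_axis('trending_month')
           .reset_index()
)

print(all_months)
```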

In [None]:
fig = px.line(
    monthly_trends,x='trending_month', y='Number of Trending Videos',
    title='Number of Videos That Became Trending per Month',markers=True,text='Number of Trending Videos',
    color_discrete_sequence=['red'])

fig.update_traces(
    textposition='top center')

fig.update_layout(plot_bgcolor='white')

fig.show()

In [None]:
youTube_unique['trending_days'] = (youTube_unique['trending_date'] - youTube_unique['publish_date']).dt.days

avg_trending_days = youTube_unique.groupby('category')['trending_days'].mean().reset_index().round(1).sort_values(by='trending_days', ascending=True)

fig = px.line( avg_trending_days,x='category', y='trending_days',
    title='Average Number of Days to Trend by Category',
    markers=True, text='trending_days', color_discrete_sequence=['red'])

fig.update_traces(textposition='top center' )

fig.update_layout(plot_bgcolor='white')

fig.show()
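
The mean number of days to trend can be pulled upward by a few old videos that resurface long after publication. As a hedged sketch, the median can be computed alongside the mean to see how much such outliers matter. The `videos` table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: one Music video trends 300 days after publication
videos = pd.DataFrame({
    'category': ['Music', 'Music', 'Music', 'News', 'News'],
    'trending_days': [1, 2, 300, 1, 1],
})

# Mean and median days to trend per category
summary = (
    videos.groupby('category')['trending_days']
          .agg(['mean', 'median'])
          .round(1)
          .reset_index()
)

print(summary)
```

A large gap between the two columns for a category suggests that its average is driven by a small number of late-trending videos.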

## 3- Which Countries Have the Most Published Videos?

In [None]:
country_most_published = youTube_unique.groupby('publish_country')['video_id'].count().reset_index(name='count')
country_most_published.reset_index(drop=True, inplace=True)
country_most_published 

In [None]:
fig = px.pie(
    country_most_published,values='count',names='publish_country',
    title='Countries with Most Published Videos',
    color_discrete_sequence=px.colors.sequential.RdBu ) 

fig.update_traces(
    textposition='inside',textinfo='percent+label')


fig.show()

## 4- Which Countries Have the Most Trending Videos?

In [None]:
country_most_trending = youTube.groupby('publish_country')['video_id'].count().reset_index(name='count')
country_most_trending.reset_index(drop=True, inplace=True)
country_most_trending

In [None]:
fig = px.pie(
    country_most_trending,values='count', names='publish_country',
    title='Countries with Most Trending Videos',
    
    color_discrete_sequence=px.colors.sequential.RdBu
)
fig.update_traces(
    textposition='inside',
    textinfo='percent+label')

fig.show()

## 5- Which Channels Have the Most Published Videos?

In [None]:
top_channels_published = youTube_unique.groupby('channel_title')['video_id'].count().reset_index(name='count').sort_values(by='count', ascending=False)
top_channels_published.reset_index(drop=True, inplace=True)
top_channels_published.head(10)

In [None]:
fig = px.funnel(
    top_channels_published.head(10),x='count',y='channel_title',
    title='Top 10 Channels with Most Published Videos',
    color_discrete_sequence=['red'])

fig.update_traces(textposition='inside',textinfo='label+value')

fig.update_layout(plot_bgcolor='white')

fig.show()

## 6- Which Channels Have the Most Trending Videos?

In [None]:
top_channels_trending = youTube.groupby('channel_title')['video_id'].count().reset_index(name='count').sort_values(by='count', ascending=False)
top_channels_trending.reset_index(drop=True, inplace=True)
top_channels_trending.head(10)

In [None]:
fig = px.funnel(top_channels_trending.head(10),x='count', y='channel_title',
    title='Top 10 Channels with Most Trending Videos',
    color_discrete_sequence=['red'])

fig.update_traces(
    textposition='inside',textinfo='label+value')

fig.update_layout(plot_bgcolor='white')
fig.show()

## 7- What are the most common days and times for videos to be published on YouTube?

In [None]:
publish_pattern = (youTube.groupby(['published_day_of_week', 'time_frame']).size().reset_index(name='count'))
publish_pattern.sort_values(['time_frame'], inplace=True)

fig = px.density_heatmap(publish_pattern,x='time_frame',y='published_day_of_week',z='count',
    title='Most Common Publishing Days and Times',
    labels={'count': 'Number of Videos'},color_continuous_scale='Reds')
fig.show()
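
Day names sort alphabetically by default, which puts Friday before Monday. A possible way to get calendar order is an ordered categorical, sketched below. This assumes `published_day_of_week` holds full English day names, which the notebook does not confirm; the `pattern` table is hypothetical.

```python
import pandas as pd

# Assumed day labels (full English names); adjust if the column uses another form
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# Hypothetical counts per day
pattern = pd.DataFrame({
    'published_day_of_week': ['Sunday', 'Monday', 'Friday'],
    'count': [3, 5, 2],
})

# An ordered categorical makes sorting follow the calendar week
pattern['published_day_of_week'] = pd.Categorical(
    pattern['published_day_of_week'], categories=day_order, ordered=True
)
ordered = pattern.sort_values('published_day_of_week').reset_index(drop=True)

print(ordered)
```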


## 8- How do views and likes vary across different YouTube categories?

In [None]:

fig = px.scatter( youTube_unique,x='views', y='likes',size='comment_count',color='category',
                 
    size_max=50,
    labels={
        'views': 'Views',
        'likes': 'Likes',
        'comment_count': 'Comments',
        'category': 'Category'})
fig.update_layout(plot_bgcolor='white')

fig.show()  

## 9- What is the correlation between views, likes, and dislikes on trending YouTube videos?

In [None]:
px.imshow(youTube[['views', 'likes', 'dislikes', 'comment_count']].corr(numeric_only=True),
          text_auto=True, title='Correlation Matrix of Views, Likes, Dislikes, and Comment Count', color_continuous_scale='Reds').show()
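
`DataFrame.corr` computes the Pearson correlation by default, which measures linear association and can be sensitive to a few very large values. As a complementary sketch, the rank-based Spearman correlation can be computed with `method='spearman'`. The `stats` table below is hypothetical and chosen so that the two measures differ.

```python
import pandas as pd

# Hypothetical data: likes rise steadily while views grow tenfold each step
stats = pd.DataFrame({
    'views': [10, 100, 1000, 10000],
    'likes': [1, 2, 3, 4],
})

# Pearson measures linear association
pearson = stats.corr(method='pearson')

# Spearman works on ranks, so any strictly increasing relationship scores 1
spearman = stats.corr(method='spearman')

print(pearson)
print(spearman)
```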

In [None]:
fig = px.bar(
    youTube_unique.groupby('comments_disabled')['views'].mean().reset_index(),
    x='comments_disabled',
    y='views',
    title='Average Views by Comment Status',
    text='views',
    color_discrete_sequence=['red']
)

fig.update_layout(
    plot_bgcolor='white')

fig.show()
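
A group average is harder to interpret when one group contains very few videos. One option, sketched here on a hypothetical `videos` table, is to report the group size next to the mean.

```python
import pandas as pd

# Hypothetical data: only one video has comments disabled
videos = pd.DataFrame({
    'comments_disabled': [False, False, False, True],
    'views': [100, 200, 300, 1000],
})

# Mean views and number of videos per comment status
summary = (
    videos.groupby('comments_disabled')['views']
          .agg(['mean', 'count'])
          .reset_index()
)

print(summary)
```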
