# YouTube Channels Analysis



This analysis explores the top 100 YouTube channels of 2024, focusing on metrics like subscribers, views, and uploads. We examine trends across countries and identify top-performing channels. Key areas include:

- Top channels by subscribers and views.
- Distribution of channels and views by country.
- Relationships between views, subscribers, and uploads.
- Engagement metrics like average views per upload.

Join me to uncover insights into YouTube's most popular channels and their success factors.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/top-100-youtube-channels-in-2024/YOUTUBE CHANNELS DATASET.csv


In [2]:
df=pd.read_csv('/kaggle/input/top-100-youtube-channels-in-2024/YOUTUBE CHANNELS DATASET.csv')

In [3]:
df.head()

| Unnamed: 0 | Ranking | Username | Subscribers | Uploads | Views | Country |
|---|---|---|---|---|---|---|
| 0 | 1.0 | MrBeast | 336M | 838 | 66853633536 | US |
| 1 | 2.0 | T-Series | 281M | 22313 | 277242795553 | IN |
| 2 | 3.0 | Cocomelon-Nursery Rhymes | 186M | 1370 | 194361752276 | US |
| 3 | 4.0 | Youtube Movies | 185M | 0 | 0 | NaN |
| 4 | 5.0 | Set India | 180M | 148727 | 172709029653 | IN |


In [4]:
# Ensure numeric columns are properly formatted
df['Subscribers'] = df['Subscribers'].replace({'M': 'e6', 'B': 'e9'}, regex=True).replace({',': ''}, regex=True).astype(float)
df['Views'] = df['Views'].replace({',': ''}, regex=True).astype(float)
df['Uploads'] = df['Uploads'].replace({',': ''}, regex=True).astype(float)

# Select only numeric columns for correlation
numeric_df = df.select_dtypes(include=['float64', 'int64'])
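The `replace({'M': 'e6', 'B': 'e9'})` trick works because strings such as `'336e6'` are valid float literals, but `astype(float)` raises as soon as a value carries any other suffix (for example `K`) or stray text. A minimal sketch of a more defensive parser follows; `parse_abbreviated_counts` is a hypothetical helper, and the `K` suffix is an assumption, since the rows shown here only use `M`:

```python
import pandas as pd

# Multipliers for abbreviated counts. "K" is an assumption: the rows shown
# in this notebook only use "M" ("B" is handled by the original cell too).
SUFFIX_MULTIPLIERS = {"": 1.0, "K": 1e3, "M": 1e6, "B": 1e9}


def parse_abbreviated_counts(series: pd.Series) -> pd.Series:
    """Convert strings such as '336M', '45.6M' or '1,234' to floats.

    Values that do not match the pattern become NaN instead of raising,
    so problem rows can be inspected afterwards.
    """
    cleaned = series.astype(str).str.replace(",", "", regex=False).str.strip()
    # Split each value into its numeric part and an optional suffix letter.
    parts = cleaned.str.extract(r"^(?P<number>\d+(?:\.\d+)?)(?P<suffix>[KMB]?)$")
    numbers = pd.to_numeric(parts["number"], errors="coerce")
    multipliers = parts["suffix"].map(SUFFIX_MULTIPLIERS)
    return numbers * multipliers


raw = pd.Series(["336M", "45.6M", "1,234", "2B", "850K", "n/a"])
print(parse_abbreviated_counts(raw).tolist())
```

On the notebook's data this would be applied to the raw strings right after loading, e.g. `df['Subscribers'] = parse_abbreviated_counts(df['Subscribers'])`, and any NaN results point at values that need a closer look.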

In [5]:
df



| Unnamed: 0 | Ranking | Username | Subscribers | Uploads | Views | Country |
|---|---|---|---|---|---|---|
| 0 | 1.0 | MrBeast | 336000000.0 | 838.0 | 6.685363e+10 | US |
| 1 | 2.0 | T-Series | 281000000.0 | 22313.0 | 2.772428e+11 | IN |
| 2 | 3.0 | Cocomelon-Nursery Rhymes | 186000000.0 | 1370.0 | 1.943618e+11 | US |
| 3 | 4.0 | Youtube Movies | 185000000.0 | 0.0 | 0.000000e+00 | NaN |
| 4 | 5.0 | Set India | 180000000.0 | 148727.0 | 1.727090e+11 | IN |
| ... | ... | ... | ... | ... | ... | ... |
| 96 | 97.0 | Voce Sabia | 46000000.0 | 1714.0 | 8.180068e+09 | BR |
| 97 | 98.0 | Katy Perry | 45600000.0 | 170.0 | 2.761685e+10 | US |
| 98 | 99.0 | Speed Records | 45500000.0 | 11880.0 | 3.058744e+10 | IN |
| 99 | 100.0 | Zhong | 45500000.0 | 1861.0 | 1.799816e+10 | US |


# Top 5 Channels by Subscribers


In [6]:
import plotly.express as px

# Sort by Subscribers
df_sorted = df.sort_values(by='Subscribers', ascending=False).head(5)

# Bar plot
fig = px.bar(df_sorted, x='Username', y='Subscribers', title='Top 5 Channels by Subscribers')
fig.show(renderer='iframe_connected')

### Key Insights:
**MrBeast** leads with **336 million** subscribers, followed by **T-Series** with **281 million**. **Cocomelon - Nursery Rhymes** is third with **186 million**, while **YouTube Movies** and **Set India** round out the top five with **185 million** and **180 million**, respectively (note that YouTube Movies reports 0 uploads, 0 views, and no country in this dataset). Entertainment, music, and kids' content dominate the top five, with MrBeast and T-Series well ahead of the rest.

# Distribution of Channels by Country


In [7]:
# Group by Country and count channels
country_counts = df['Country'].value_counts().reset_index()
country_counts.columns = ['Country', 'Count']

# Bar plot
fig = px.bar(country_counts, x='Country', y='Count', title='Number of Channels by Country')
fig.show(renderer='iframe_connected')

### Key Insights:
**India (IN)** has the most channels (**28**), just ahead of the **United States (US)** (**27**). **Brazil (BR)** ranks third with **5**, and **South Korea (KR)** and **Mexico (MX)** have **4** each. Together, India and the United States account for more than half of the top 100, while every other country contributes only a handful of channels.
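`value_counts()` drops missing values by default, so channels without a country (Youtube Movies in the preview) do not appear in the chart above. A small sketch on the five preview rows that keeps them under an explicit label and expresses counts as shares of all channels; the `'Unknown'` label is an illustrative choice:

```python
import pandas as pd

# The five rows from the df.head() preview (Country is missing for Youtube Movies).
preview = pd.DataFrame({
    "Username": ["MrBeast", "T-Series", "Cocomelon-Nursery Rhymes", "Youtube Movies", "Set India"],
    "Country": ["US", "IN", "US", None, "IN"],
})

# Default behaviour: the missing country is silently left out of the counts.
print(preview["Country"].value_counts())

# Keep missing values under an explicit label and convert counts to percentages.
country_share = (
    preview["Country"]
    .fillna("Unknown")
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(country_share)
```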

# Views vs Subscribers Scatter Plot


In [8]:
# Scatter plot
fig = px.scatter(df, x='Subscribers', y='Views', color='Country', title='Views vs Subscribers')
fig.show(renderer='iframe_connected')

### Key Insights:
Channels from the **United States (US)** and **India (IN)** make up most of the points on the plot and include the largest subscriber and view counts. Near the bottom of the top 100, several US channels sit at around **45-46 million** subscribers with views ranging from roughly **8 to 30 billion**, so channels with similar subscriber counts can differ in total views by several times. Channels from **Brazil (BR)** and the **United Arab Emirates (AE)** span **45-69 million** subscribers and **8-64 billion** views. Overall, views rise with subscribers, but with wide scatter across countries.

# Average Views per Upload


In [9]:
# Views and Uploads were already converted to float in cell [4], so they are not re-parsed here

# Calculate average views per upload; zero uploads become NaN so that a channel
# with views but no uploads does not produce an infinite ratio
df['Avg_Views_Per_Upload'] = df['Views'] / df['Uploads'].replace(0, np.nan)

# Sort the DataFrame by 'Avg_Views_Per_Upload' in descending order
df_sorted = df.sort_values('Avg_Views_Per_Upload', ascending=False)

# Bar plot
fig = px.bar(df_sorted, x='Username', y='Avg_Views_Per_Upload', title='Average Views per Upload (Descending Order)')
fig.show(renderer='iframe_connected')

### Key Insights:
The top 10 YouTube channels by **average views per upload** are led by **Bad Bunny** (236.9M), followed by **EminemMusic** (164.6M) and **Katy Perry** (162.5M). **Cocomelon-Nursery Rhymes**, with **194.4B** total views and **186M** subscribers, ranks fourth (141.9M). Other top performers include **Justin Bieber**, **Taylor Swift**, and **Ariana Grande**, with averages between 128.9M and 135.1M. Children's channels such as **Vlad & Niki** and **Like Nastya** also rank highly, with averages above 117.8M. Most of these top creators are from the **United States (US)**.
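Dividing by a zero upload count is worth guarding against: pandas returns `NaN` for `0 / 0` (as with Youtube Movies, which reports 0 views and 0 uploads) and `inf` for a positive number divided by zero, and an `inf` would sort to the top of a descending chart. A small sketch; the third row, `Hypothetical Channel`, is made up to show the `inf` case:

```python
import numpy as np
import pandas as pd

# Two preview rows plus one made-up channel with views but no uploads.
channels = pd.DataFrame({
    "Username": ["Cocomelon-Nursery Rhymes", "Youtube Movies", "Hypothetical Channel"],
    "Views": [194361752276.0, 0.0, 5.0e9],
    "Uploads": [1370.0, 0.0, 0.0],
})

# Plain division: 0/0 gives NaN and x/0 gives inf.
naive = channels["Views"] / channels["Uploads"]

# Treat zero uploads as missing so the ratio is undefined (NaN) rather than infinite.
safe = channels["Views"] / channels["Uploads"].replace(0, np.nan)

print(naive.tolist())
print(safe.tolist())
```

With the `NaN` version, `sort_values(ascending=False)` places undefined ratios last by default instead of ranking them first.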

# Subscribers vs Uploads

In [10]:

fig= px.scatter(df, x='Uploads', y='Subscribers', color='Country', title='Subscribers vs Uploads')
fig.show(renderer='iframe_connected')

# Views Distribution by Country

In [11]:
# Group by Country and sum views
country_views = df.groupby('Country')['Views'].sum().reset_index()
country_views_sorted = country_views.sort_values('Views', ascending=False)

# Bar plot
fig = px.bar(country_views_sorted, x='Country', y='Views', title='Total Views by Country (Descending Order)')
fig.show(renderer='iframe_connected')

### Key Insights:
**India (IN)** has the highest total views at **1.57 trillion**, followed by the **United States (US)** with **1.22 trillion** views. South Korea (**KR**) ranks third with **155.7 billion** views, while Brazil (**BR**) and Pakistan (**PK**) follow closely with **125.3 billion** and **123.5 billion** views, respectively. Other countries in the top 10 include the United Arab Emirates (**AE**) with **92.5 billion** views, Argentina (**AR**) with **84.3 billion** views, Russia (**RU**) with **77.9 billion** views, Mexico (**MX**) with **64.2 billion** views, and Japan (**JP**) with **63.9 billion** views. This highlights the significant viewership dominance of **India** and the **United States**, with other countries contributing substantially but at a smaller scale.

# Correlation Heatmap

In [12]:
import plotly.graph_objects as go
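# Re-select numeric columns so Avg_Views_Per_Upload (added after cell [4]) is included,
# and drop the 'Unnamed: 0' column, which only duplicates the row index
numeric_df = df.select_dtypes(include='number').drop(columns=['Unnamed: 0'], errors='ignore')

# Calculate correlation matrix
corr_matrix = numeric_df.corr()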

# Heatmap
fig = go.Figure(data=go.Heatmap(
    z=corr_matrix.values,
    x=corr_matrix.columns,
    y=corr_matrix.columns,
    colorscale='Viridis',
    text=corr_matrix.round(2).values,  # Add correlation values as text
    hoverinfo='text'  # Show text on hover
))
fig.update_layout(
    title='Correlation Heatmap',
    xaxis_title='Features',
    yaxis_title='Features'
)
fig.show(renderer='iframe_connected')

### Significant Correlations:
1. **Ranking vs Subscribers**: Strong negative correlation (`-0.69`). Higher-ranked channels (lower numerical rank) have more subscribers, as expected, since the list appears to be ordered by subscriber count.
2. **Subscribers vs Views**: Moderate positive correlation (`0.64`). Channels with more subscribers tend to have more views.
3. **Uploads vs Avg_Views_Per_Upload**: Weak negative correlation (`-0.25`). Channels with more uploads tend to have slightly lower average views per upload.

### Key Insight:
- **Subscriber count** is closely tied to **ranking** and moderately to **views**, while the number of **uploads** has only a weak negative relationship with **average views per upload**.
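One likely reason the Ranking-Subscribers coefficient is not closer to -1 is that the heatmap uses Pearson correlation, which measures linear association and is sensitive to the large gaps between the biggest channels. A minimal sketch on the five preview rows, comparing Pearson with Spearman (rank-based) correlation through `DataFrame.corr(method=...)`:

```python
import pandas as pd

# Ranking and subscriber counts for the five rows in the df.head() preview.
sample = pd.DataFrame({
    "Ranking": [1, 2, 3, 4, 5],
    "Subscribers": [336e6, 281e6, 186e6, 185e6, 180e6],
})

# Pearson: linear association, pulled around by the big gaps at the top.
pearson = sample.corr(method="pearson").loc["Ranking", "Subscribers"]

# Spearman: Pearson correlation of the ranks, so any strictly decreasing
# relationship gives -1 (up to floating-point rounding).
spearman = sample.corr(method="spearman").loc["Ranking", "Subscribers"]

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

On the full data, `numeric_df.corr(method='spearman')` would give the rank-based version of the whole heatmap.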

# Channel Views Distribution by Country (Box Plot)

In [13]:
# Calculate median views by country and sort in descending order
median_views_by_country = df.groupby('Country')['Views'].median().sort_values(ascending=False)

# Sort the DataFrame by median views for the box plot
df_sorted = df.copy()
df_sorted['Country'] = pd.Categorical(df_sorted['Country'], categories=median_views_by_country.index, ordered=True)
df_sorted = df_sorted.sort_values('Country')

# Box plot
fig = px.box(df_sorted, x='Country', y='Views', title='Views Distribution by Country (Sorted by Median Views)')
fig.show(renderer='iframe_connected')

### Key Insights:
- **Pakistan (PK)** has the highest median and mean views (**61.8 billion**), followed by the **Philippines (PH)** with **56.3 billion** views. Both countries have only a few channels in the top 100, so these figures are sensitive to individual channels.
- **India (IN)** ranks third in mean views (**55.9 billion**) but fifth in median views (**38.4 billion**). A mean well above the median indicates a right-skewed distribution, with a few very large channels such as T-Series pulling the average up.
- The **United States (US)** has a high mean view count (**45.0 billion**) but does not appear in the top 5 for median views, again suggesting a right-skewed distribution driven by a few very large channels.
- Countries like **Argentina (AR)**, **Russia (RU)**, and **South Korea (KR)** have similar median and mean views, with values between **31.9 billion** and **42.1 billion**, suggesting more evenly sized channels.
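The gap between mean and median described above can be reproduced with a per-country summary that also reports how many channels each figure rests on. A minimal sketch on made-up data (countries `A` and `B` and their values are hypothetical); on the notebook's data the same `groupby` would run on `df` with the `Views` column:

```python
import pandas as pd

# Made-up per-channel views (billions): "A" has one outsized channel,
# "B" has evenly sized channels.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "A", "B", "B", "B"],
    "Views_B": [10.0, 12.0, 14.0, 200.0, 40.0, 42.0, 44.0],
})

summary = toy.groupby("Country")["Views_B"].agg(["count", "median", "mean"])

# A mean well above the median signals right skew: a few very large
# channels pulling the average up. A ratio near 1 means a balanced spread.
summary["mean_to_median"] = (summary["mean"] / summary["median"]).round(2)
print(summary)
```

The `count` column is a reminder that medians and means for countries with one or two channels describe those specific channels more than the country as a whole.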

# Interactive Scatter Plot with Hover Data

In [14]:
# Scatter plot with hover data
fig = px.scatter(df, x='Subscribers', y='Views', color='Country', hover_data=['Username', 'Uploads'], title='Subscribers vs Views with Hover Data')
fig.show(renderer='iframe_connected')

# Cumulative Views by Country (Pie Chart)


In [15]:

country_views = df.groupby('Country')['Views'].sum().reset_index()

# Pie chart
fig = px.pie(country_views, values='Views', names='Country', title='Proportion of Total Views by Country')
fig.show(renderer='iframe_connected')

### Key Insights:
**India** and the **United States** collectively account for **~70%** of the total views, highlighting their dominance in global YouTube viewership. Other countries contribute smaller but meaningful shares, reflecting YouTube's widespread but uneven global reach.
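The "~70%" figure can be computed directly rather than read off the pie chart. A minimal sketch; `country_view_share` is a hypothetical helper and the numbers in `toy` are made up purely to show the calculation:

```python
import pandas as pd


def country_view_share(df: pd.DataFrame, countries: list[str]) -> float:
    """Return the percentage of total Views contributed by `countries`.

    groupby() drops rows whose Country is missing, so those channels are
    excluded from both the numerator and the denominator.
    """
    totals = df.groupby("Country")["Views"].sum()
    # reindex() returns 0 for countries that are not present at all.
    selected = totals.reindex(countries, fill_value=0).sum()
    return 100 * selected / totals.sum()


# Made-up totals, only to show the calculation.
toy = pd.DataFrame({
    "Country": ["IN", "US", "BR", "KR"],
    "Views": [45.0, 25.0, 20.0, 10.0],
})
print(country_view_share(toy, ["IN", "US"]))
```

On the notebook's data the call would be `country_view_share(df, ['IN', 'US'])`.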

# Top 10 Channels by Views

In [16]:
# Sort by Views
df_sorted = df.sort_values(by='Views', ascending=False).head(10)

# Bar plot
fig = px.bar(df_sorted, x='Username', y='Views', title='Top 10 Channels by Views')
fig.show(renderer='iframe_connected')

### Key Insights:
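Diverse genres, including entertainment, music, and kids' content, dominate the top 10. Among the channels in the `df.head()` preview, **T-Series** has the most views (**277.2 billion**), ahead of **Cocomelon-Nursery Rhymes** (**194.4 billion**) and **Set India** (**172.7 billion**); **HAR PAL GEO** and **El Reino Infantil** also rank among the most-viewed channels.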
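A quick cross-check of the ordering: `DataFrame.nlargest` returns the rows with the largest values in a column and, on this data, selects the same rows as the `sort_values(...).head(...)` pattern used in the cell above. A minimal sketch on the five preview rows:

```python
import pandas as pd

# Usernames and total views from the df.head() preview.
preview = pd.DataFrame({
    "Username": ["MrBeast", "T-Series", "Cocomelon-Nursery Rhymes", "Youtube Movies", "Set India"],
    "Views": [66853633536, 277242795553, 194361752276, 0, 172709029653],
})

# Top 3 rows by Views, largest first.
top3 = preview.nlargest(3, "Views")
print(top3["Username"].tolist())
```

On the full DataFrame the equivalent call is `df.nlargest(10, 'Views')`.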