# Exploratory Data Analysis
___
Since **01/02/2021** (1 February 2021), the script `../yt_trends_mining/trends.py` has used the [YouTube Data API](https://developers.google.com/youtube/v3) to collect video and channel information for every video on a country's [YouTube Trends](https://www.youtube.com/feed/trending) page and to store it as a CSV file in a [Google Cloud Storage](https://cloud.google.com/storage) bucket. The script runs once a day at **17:00 UTC** and covers the following countries (ISO 3166 codes): ["US", "BR", "RU", "JP", "IN", "GB", "DE", "FR", "CA", "AU"]. The file `20210201_US.csv` shows the structure of these CSV files.

In [1]:
# Import libraries
import os
import pandas as pd
import plotly.express as px
from emoji import EMOJI_DATA
from utils import category_dict, extract_emojis, get_data_from_gcs, get_seconds, print_size

# Set Pandas options
pd.set_option("display.float_format", lambda x: "%.3f" % x)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option('display.max_colwidth', 100)

# Add Google Cloud credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "../keys/credentials.json"

# List of country codes
countries = ["US", "BR", "RU", "JP", "IN", "GB", "DE", "FR", "CA", "AU"]

# Dataframe & dictionary to compare the different countries
ANALYSIS = pd.DataFrame({"countries": countries})
COUNTRIES = {}

# Want to show plots?
show_plot = False

First, all CSVs for a country are downloaded and stored in a country-specific subfolder under `../data/` (for example `../data/US/`). If data has already been downloaded and only a few days are missing, only the missing CSVs are downloaded. Afterwards, all CSVs are read into pandas DataFrames.
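The incremental download can be sketched without calling the Cloud Storage API. `get_data_from_gcs` lives in `utils` and is not shown in this notebook, so the helper `missing_files` below is hypothetical, as is the assumption that blob names match the local file names (for example `20210201_US.csv`):

```python
import os
import tempfile


def missing_files(remote_names, local_dir):
    """Return the remote CSV names that are not yet present in local_dir."""
    # A missing folder simply means nothing has been downloaded yet
    local = set(os.listdir(local_dir)) if os.path.isdir(local_dir) else set()
    return sorted(
        name for name in remote_names
        if name.endswith(".csv") and name not in local
    )


# Usage with a throwaway folder that already holds the first day
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "20210201_US.csv"), "w").close()
    todo = missing_files(["20210201_US.csv", "20210202_US.csv"], tmp)
    print(todo)
```

Only the names returned by such a helper would then need to be fetched from the bucket.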

In [2]:
# Download all country CSVs from GCS & print the subfolder size
for country in countries:
    get_data_from_gcs(country)
    print_size(f"../data/{country}/")

All blobs already downloaded.
Size of all csvs in ../data/US/: 61.080156 MB
All blobs already downloaded.
Size of all csvs in ../data/BR/: 53.524605 MB
All blobs already downloaded.
Size of all csvs in ../data/RU/: 84.734388 MB
All blobs already downloaded.
Size of all csvs in ../data/JP/: 81.669064 MB
All blobs already downloaded.
Size of all csvs in ../data/IN/: 62.223617 MB
All blobs already downloaded.
Size of all csvs in ../data/GB/: 59.414285 MB
All blobs already downloaded.
Size of all csvs in ../data/DE/: 68.76828 MB
All blobs already downloaded.
Size of all csvs in ../data/FR/: 56.25469 MB
All blobs already downloaded.
Size of all csvs in ../data/CA/: 60.948984 MB
All blobs already downloaded.
Size of all csvs in ../data/AU/: 58.777264 MB


In [3]:
# Concatenate all country CSVs in one Dataframe & wrangle it
for country in countries:
    df = pd.concat([pd.read_csv(f"../data/{country}/{file}") for file in os.listdir(f"../data/{country}") if file.endswith(".csv")])
    df = df.assign(categoryNames=lambda x: x["categoryIds"].map(category_dict))
    df["today"] = pd.to_datetime(df["today"]).dt.normalize()
    df["publishedAt"] = pd.to_datetime(df["publishedAt"]).dt.normalize()
    df["duration_in_s"] = df["durations"].apply(get_seconds)

    if "Unnamed: 0" in df.columns:
        df.drop(columns=["Unnamed: 0"], inplace=True)
    
    COUNTRIES[country] = df

# Basic information
international = pd.concat(COUNTRIES.values(), ignore_index=True)
min_date = international["today"].min().strftime("%Y-%m-%d")
max_date = international["today"].max().strftime("%Y-%m-%d")
print(f"The data set covers the period from {min_date} to {max_date} and consists of {international.shape[0]} rows and {international.shape[1]} columns.")

The data set covers the period from 2021-02-01 to 2023-02-22 and consists of 373964 rows and 26 columns.
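The cell above derives `duration_in_s` from the `durations` column with `get_seconds`, which is imported from `utils` and not shown here. The column holds ISO 8601 durations such as `PT4M13S`. A minimal sketch of such a conversion, under the assumption that only days, hours, minutes and seconds occur (the name `duration_to_seconds` is hypothetical and this is not necessarily how `get_seconds` is implemented):

```python
import re

# Matches ISO 8601 durations such as "PT4M13S" or "P1DT2H" (weeks, months and years are not handled)
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def duration_to_seconds(value):
    """Convert an ISO 8601 duration string to seconds; return None if it cannot be parsed."""
    match = _DURATION.match(value) if isinstance(value, str) else None
    if match is None:
        return None
    days, hours, minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


print(duration_to_seconds("PT4M13S"))  # 253
```

Returning `None` for missing or malformed values keeps the function safe to use with `Series.apply` on columns that contain `NaN`.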


In [4]:
# First 3 rows of the "US" dataframe
COUNTRIES["US"].head(3)

Unnamed: 0,sequence,today,videoIds,publishedAt,titles,channelIds,channelTitles,thumbnails,categoryIds,descriptions,tags,views,likes,dislikes,comments,durations,regionRestrictions,relevantTopicIds,channelPublishedAt,channelCountry,channelViewCount,channelSubscriberCount,channelVideoCount,channelTopicIds,categoryNames,duration_in_s
0,1,2022-05-06 00:00:00+00:00,p38WgakuYDo,2022-05-06 00:00:00+00:00,Bad Bunny - Moscow Mule (Official Video) | Un Verano Sin Ti,UCmBA_wu8xGg1OfOkfW13Q0Q,Bad Bunny,https://i.ytimg.com/vi/p38WgakuYDo/default.jpg,10,Bad Bunny - Moscow Mule (Official Video) | Un Verano Sin Ti\nUn Verano Sin Ti https://www.unvera...,"['Bad', 'Bunny', 'Bad Bunny', 'YHLQMDLG', 'EUTDM', 'UnVeranoSinTi', 'Un Verano Sin Ti', 'UVST', ...",8231777.0,1169179.0,,43117.0,PT4M13S,,,2014-06-09T02:50:42Z,PR,19573859774,39000000.0,120,"['/m/0glt670', '/m/04rlf', '/m/0g293']",Music,253
1,2,2022-05-06 00:00:00+00:00,0NzlbBmkM3E,2022-05-05 00:00:00+00:00,A$AP Rocky - D.M.B. (Official Video),UCHE7rAi1Fw1CBmQXFtvJmrw,LIVELOVEASAPVEVO,https://i.ytimg.com/vi/0NzlbBmkM3E/default.jpg,10,A$AP ROCKY’S “D.M.B.”: https://AsapRocky.lnk.to/DMB\n\nOFFICIAL VIDEO FOR ”D.M.B.” BY A$AP ROCKY...,"['ASAP Rocky', 'rihanna', 'asap rocky music', 'asap rocky new music', 'rih rih', 'fenty', 'rocky...",3366590.0,341304.0,,16167.0,PT4M53S,{'blocked': ['RU']},,2011-12-07T23:05:46Z,,1736059684,1440000.0,80,"['/m/04rlf', '/m/0glt670']",Music,293
2,3,2022-05-06 00:00:00+00:00,Wg86eQkdudI,2022-05-05 00:00:00+00:00,House of the Dragon | Official Teaser Trailer | HBO Max,UCx-KWLTKlB83hDI6UKECtJQ,HBO Max,https://i.ytimg.com/vi/Wg86eQkdudI/default.jpg,1,History does not remember blood. It remembers names. August 21. #HouseoftheDragon\n\nAbout HBO M...,"['hbo', 'hbo max', 'hbo max movie', 'hbo max series', 'hbo youtube', 'hbo max youtube', 'hbo max...",5057320.0,87868.0,,6410.0,PT1M44S,,,2019-07-03T14:47:23Z,US,601805114,1380000.0,1186,"['/m/02jjt', '/m/02vxn', '/m/0f2f9']",Film & Animation,104


The DataFrames still contain the raw data. For example, a video appears multiple times if it was on the trending page for several days. The data therefore has to be aggregated further for each specific analysis.
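As an illustration of this kind of aggregation, the sketch below collapses a toy frame to one row per video. The toy data is made up, and grouping by `videoIds` is an assumption made for this example; the analyses in this notebook group by channel and title instead:

```python
import pandas as pd

# Toy stand-in for one country's raw frame: video "a" trended on two days
raw = pd.DataFrame(
    {
        "videoIds": ["a", "a", "b"],
        "titles": ["First", "First", "Second"],
        "today": pd.to_datetime(["2021-02-01", "2021-02-02", "2021-02-01"]),
        "views": [100.0, 250.0, 40.0],
    }
)

# One row per video: title, number of distinct trending days, highest view count seen
per_video = (
    raw.groupby("videoIds")
    .agg(
        titles=("titles", "first"),
        days_in_trend=("today", "nunique"),
        max_views=("views", "max"),
    )
    .reset_index()
)
print(per_video)
```

Using `nunique` rather than `count` avoids counting a day twice if the same video was collected more than once for that day.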

<br>

## 1. Number of videos per day in each category
___

In [5]:
top3_categories = {}
for country in countries:
    df = COUNTRIES[country]

    # Want to see the plots?
    if show_plot:
        px.bar(
            df.groupby(["today", "categoryNames"])
            .agg(count=("categoryNames", "count"))
            .reset_index(),
            x="today",
            y="count",
            color="categoryNames",
            labels={"count": "Number of videos", "today": "Date", "categoryNames": "Video category"},
            title=f"{country}: Number of videos per category for each day",
            color_discrete_map={
                "Film & Animation": "brown",
                "Autos & Vehicles": "lightblue",
                "Music": "green",
                "Pets & Animals": "black",
                "Sports": "grey",
                "Travel & Events": "yellow",
                "Gaming": "magenta",
                "People & Blogs": "orange",
                "Comedy": "purple",
                "Entertainment": "blue",
                "News & Politics": "red",
                "Howto & Style": "pink",
                "Education": "olive",
                "Science & Technology": "cyan",
                "Nonprofits & Activism": "lightgreen"
            }
        ).update_layout(height=600, width=1200).show()

        fig = px.bar(
            df["categoryNames"].value_counts().sort_values(ascending=True).reset_index(),
            x="categoryNames",
            y="index",
            orientation="h",
            title=f"{country}: Number of videos per category",
            labels={
                "categoryNames": "Number of videos",
                "index": "Category"
            }
        ).update_layout(height=400, width=1200).show()

    # What are the top 3 video categories?
    top3 = (
        df["categoryNames"]
        .value_counts()
        .sort_values(ascending=False)
        .reset_index()
        .assign(percentage=lambda x: (x["categoryNames"] / x["categoryNames"].sum()) * 100)
        .head(3)
        .drop(columns=["categoryNames"])
        .set_index("index")
        .to_dict(orient="index")
    )
    top3 = {k: round(v["percentage"], 1) for k, v in top3.items()}
    top3_categories.update({country: top3})

# Store & show results
ANALYSIS = ANALYSIS.assign(top3_categories=lambda x: x["countries"].map(top3_categories))
ANALYSIS.head(10)

Unnamed: 0,countries,top3_categories
0,US,"{'Gaming': 21.1, 'Entertainment': 20.4, 'Music': 16.0}"
1,BR,"{'Entertainment': 24.9, 'Music': 21.3, 'Gaming': 14.9}"
2,RU,"{'People & Blogs': 17.9, 'Entertainment': 16.8, 'News & Politics': 13.5}"
3,JP,"{'Entertainment': 32.8, 'Gaming': 16.2, 'Music': 13.5}"
4,IN,"{'Entertainment': 35.6, 'People & Blogs': 19.1, 'Music': 13.7}"
5,GB,"{'Entertainment': 22.1, 'Sports': 18.4, 'Gaming': 17.9}"
6,DE,"{'Entertainment': 26.2, 'Music': 17.5, 'Sports': 12.3}"
7,FR,"{'Entertainment': 25.8, 'Music': 18.5, 'Gaming': 12.1}"
8,CA,"{'Entertainment': 21.7, 'Gaming': 21.0, 'Music': 12.6}"
9,AU,"{'Entertainment': 22.1, 'Gaming': 21.6, 'Sports': 14.0}"
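The cell above relies on `value_counts().reset_index()` producing a column named `index`, which is the behaviour of pandas versions before 2.0. A sketch of the same top-3 share calculation that does not depend on those column names, using made-up category data:

```python
import pandas as pd

# Made-up categories: 4x Gaming, 3x Music, 2x Sports, 1x Comedy
categories = pd.Series(
    ["Gaming", "Gaming", "Music", "Sports", "Gaming",
     "Music", "Comedy", "Music", "Gaming", "Sports"],
    name="categoryNames",
)

# normalize=True returns shares instead of counts, sorted in descending order
top3 = (
    categories.value_counts(normalize=True)
    .head(3)
    .mul(100)
    .round(1)
    .to_dict()
)
print(top3)
```

As in the original cell, missing category names are excluded from the denominator because `value_counts` drops `NaN` by default.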


<br>

## 2. Country of origin of the main content creators
___

In [6]:
origin_share = {}
for country in countries:
    df = COUNTRIES[country]

    # Group data
    origin = (
        df["channelCountry"]
        .value_counts()
        .sort_values(ascending=False)
        .reset_index()
        .assign(percentage=lambda x: (x["channelCountry"] / x["channelCountry"].sum()) * 100)
    )

    # Want to see the plot?
    if show_plot:
        origin["index"] = origin["index"].where(origin["index"].isin(origin["index"].head(5)), "Other")
        fig = (
            px.pie(
                origin,
                values="percentage",
                names="index",
                title=f"{country}: Origin countries of the content creator")
            .update_layout(height=500, width=500)
            .update_traces(hole=.4, textposition='inside', textinfo='percent+label')
            .update_layout(annotations=[dict(text=country, x=0.50, y=0.5, font_size=25, showarrow=False)])
            .show()
        )

    # What is the percentage of videos from the country of origin?
    origin_country = origin[origin["index"] == country]
    origin_share.update({country: round(origin_country["percentage"].values[0], 1)})

# Store & show results
ANALYSIS = ANALYSIS.assign(origin_share=lambda x: x["countries"].map(origin_share))
ANALYSIS.head(10)

Unnamed: 0,countries,top3_categories,origin_share
0,US,"{'Gaming': 21.1, 'Entertainment': 20.4, 'Music': 16.0}",76.7
1,BR,"{'Entertainment': 24.9, 'Music': 21.3, 'Gaming': 14.9}",91.8
2,RU,"{'People & Blogs': 17.9, 'Entertainment': 16.8, 'News & Politics': 13.5}",67.7
3,JP,"{'Entertainment': 32.8, 'Gaming': 16.2, 'Music': 13.5}",95.1
4,IN,"{'Entertainment': 35.6, 'People & Blogs': 19.1, 'Music': 13.7}",92.2
5,GB,"{'Entertainment': 22.1, 'Sports': 18.4, 'Gaming': 17.9}",41.4
6,DE,"{'Entertainment': 26.2, 'Music': 17.5, 'Sports': 12.3}",71.9
7,FR,"{'Entertainment': 25.8, 'Music': 18.5, 'Gaming': 12.1}",86.9
8,CA,"{'Entertainment': 21.7, 'Gaming': 21.0, 'Music': 12.6}",7.0
9,AU,"{'Entertainment': 22.1, 'Gaming': 21.6, 'Sports': 14.0}",15.9


<br>

## 3. Analysis of the video titles
___

In [7]:
title_length = {}
title_upper = {}
emoji_share = {}
emoji_top3 = {}
for country in countries:
    df = COUNTRIES[country]

    # Group data
    data = (
        df.groupby(["channelTitles", "titles", "publishedAt", "categoryNames"])
            .agg(todays_count=("today", "count"))
            .sort_values("todays_count", ascending=False)
            .reset_index()
    )

    # How long is the average title?
    data["title_length"] = data["titles"].apply(lambda x: len(x))
    title_length.update({country: round(data["title_length"].mean(), 1)})

    # What percentage of video titles have capital letters?
    data["title_upper"] = data["titles"].apply(lambda x: sum(1 for c in x if c.isupper()))
    data["title_upper"] = data["title_upper"] / data["title_length"]
    title_upper.update({country: round(data["title_upper"].mean() * 100, 1)})

    # What percentage of videos have emojis in the title?
    data["emoji_in_title"] = data["titles"].apply(lambda x: sum(1 for c in x if c in EMOJI_DATA))
    emoji_share.update({country: round((data[data["emoji_in_title"] >= 1].shape[0] / data.shape[0]) * 100, 1)})

    # What are the 3 most popular emojis?
    data["emojis"] = data["titles"].apply(extract_emojis)
    emojis = data[data["emoji_in_title"] >= 1].copy()
    emojis = emojis.assign(emojis=lambda x: x["emojis"].str.split("")).explode("emojis")
    emojis = emojis["emojis"].value_counts()[1:].head(3).to_dict()
    emoji_top3.update({country: emojis})

# Store & show results
ANALYSIS = ANALYSIS.assign(title_length=lambda x: x["countries"].map(title_length))
ANALYSIS = ANALYSIS.assign(title_upper=lambda x: x["countries"].map(title_upper))
ANALYSIS = ANALYSIS.assign(emoji_share=lambda x: x["countries"].map(emoji_share))
ANALYSIS = ANALYSIS.assign(emoji_top3=lambda x: x["countries"].map(emoji_top3))
ANALYSIS.head(10)

Unnamed: 0,countries,top3_categories,origin_share,title_length,title_upper,emoji_share,emoji_top3
0,US,"{'Gaming': 21.1, 'Entertainment': 20.4, 'Music': 16.0}",76.7,50.0,22.6,4.0,"{'😂': 84, '😳': 64, '😱': 51}"
1,BR,"{'Entertainment': 24.9, 'Music': 21.3, 'Gaming': 14.9}",91.8,54.9,45.0,9.0,"{'😱': 255, '😂': 225, '🔥': 156}"
2,RU,"{'People & Blogs': 17.9, 'Entertainment': 16.8, 'News & Politics': 13.5}",67.7,57.9,30.7,8.6,"{'😱': 491, '⚡': 321, '🔥': 249}"
3,JP,"{'Entertainment': 32.8, 'Gaming': 16.2, 'Music': 13.5}",95.1,37.8,7.9,3.9,"{'‼': 121, '⁉': 59, '🔥': 57}"
4,IN,"{'Entertainment': 35.6, 'People & Blogs': 19.1, 'Music': 13.7}",92.2,68.4,16.5,26.2,"{'😂': 1305, '❤': 608, '🤣': 571}"
5,GB,"{'Entertainment': 22.1, 'Sports': 18.4, 'Gaming': 17.9}",41.4,52.4,23.1,5.6,"{'🔥': 103, '😂': 100, '🤯': 79}"
6,DE,"{'Entertainment': 26.2, 'Music': 17.5, 'Sports': 12.3}",71.9,55.7,22.7,15.7,"{'🔥': 471, '😂': 444, '😱': 437}"
7,FR,"{'Entertainment': 25.8, 'Music': 18.5, 'Gaming': 12.1}",86.9,51.8,29.9,11.0,"{'🔥': 153, '😱': 151, '😂': 148}"
8,CA,"{'Entertainment': 21.7, 'Gaming': 21.0, 'Music': 12.6}",7.0,50.5,22.4,4.6,"{'😂': 117, '😳': 72, '🔥': 57}"
9,AU,"{'Entertainment': 22.1, 'Gaming': 21.6, 'Sports': 14.0}",15.9,49.9,22.8,5.0,"{'😂': 95, '🔥': 66, '😳': 57}"
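The `title_upper` column is the mean of a per-title ratio: uppercase characters divided by the total title length, with spaces, digits and punctuation included in the denominator. A small pure-Python sketch of that ratio on made-up titles (the helper `upper_share` is hypothetical and adds a guard for empty titles that the cell above does not have):

```python
def upper_share(title):
    """Percentage of characters in a title that are uppercase letters."""
    if not title:
        return 0.0
    return 100 * sum(1 for char in title if char.isupper()) / len(title)


titles = ["HBO Max", "bad bunny", "BEST TEACHER"]
shares = [upper_share(title) for title in titles]

# Mean of the per-title shares, so a short title weighs as much as a long one
mean_share = sum(shares) / len(shares)
print([round(share, 1) for share in shares], round(mean_share, 1))
```

Because each title contributes equally, the result differs from the share of uppercase characters across all titles pooled together.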


In [8]:
# What are the 10 most popular international emojis?
international = pd.concat(COUNTRIES.values(), ignore_index=True)
international["emojis"] = international["titles"].apply(extract_emojis)
international = international.assign(emojis=lambda x: x["emojis"].str.split("")).explode("emojis")
top10_emojis = international["emojis"].value_counts()[1:].head(10).to_dict()

# Want to see the plot?
if show_plot:
    fig = px.bar(
        pd.DataFrame.from_dict(top10_emojis, orient="index").reset_index(),
        x="index",
        y=0,
        orientation="v",
        title="10 most popular international emojis",
        labels={
            "index": "Emoji",
            0: "Number of videos"
        }
    ).update_layout(height=500, width=800).show()

top10_emojis

{'😂': 5038,
 '🔥': 2980,
 '😱': 2922,
 '❤': 2012,
 '😍': 1843,
 '🤣': 1750,
 '😳': 1360,
 '😭': 1216,
 '🤯': 910,
 '🙏': 837}

<br>

## 4. The video with the most days in the trend
___

In [9]:
days_in_trend = {}
for country in countries:
    df = COUNTRIES[country]

    # Group data
    data = (
        df.groupby(["channelTitles", "titles", "categoryNames"])
            .agg(todays_count=("today", "count"))
            .sort_values("todays_count", ascending=False)
            .reset_index()
    )

    # Which video has the most days on the trend page?
    max_todays_count = data["todays_count"].max()
    days = data[data["todays_count"] == max_todays_count].iloc[:1]
    days = days.to_dict(orient="records")
    days = [f"({v['categoryNames']}) {v['channelTitles']}: {v['titles']} for {v['todays_count']} days" for v in days][0]
    days_in_trend.update({country: days})

# Store & show results
ANALYSIS = ANALYSIS.assign(days_in_trend=lambda x: x["countries"].map(days_in_trend))
ANALYSIS.head(10)

Unnamed: 0,countries,top3_categories,origin_share,title_length,title_upper,emoji_share,emoji_top3,days_in_trend
0,US,"{'Gaming': 21.1, 'Entertainment': 20.4, 'Music': 16.0}",76.7,50.0,22.6,4.0,"{'😂': 84, '😳': 64, '😱': 51}",(Science & Technology) SpaceX: Starlink Mission for 24 days
1,BR,"{'Entertainment': 24.9, 'Music': 21.3, 'Gaming': 14.9}",91.8,54.9,45.0,9.0,"{'😱': 255, '😂': 225, '🔥': 156}",(Education) YouTube: Join the BTS #PermissiontoDance Challenge only on YouTube #Shorts for 6 days
2,RU,"{'People & Blogs': 17.9, 'Entertainment': 16.8, 'News & Politics': 13.5}",67.7,57.9,30.7,8.6,"{'😱': 491, '⚡': 321, '🔥': 249}","(Nonprofits & Activism) Серебряный Дождь: Прямая трансляция радиостанции ""Серебряный Дождь"" for..."
3,JP,"{'Entertainment': 32.8, 'Gaming': 16.2, 'Music': 13.5}",95.1,37.8,7.9,3.9,"{'‼': 121, '⁉': 59, '🔥': 57}",(People & Blogs) まあたその「ほぼメインチャンネル」: まじ緊急まあたそ&ふくれな大暴れの舞台裏をご覧あれ！！ for 7 days
4,IN,"{'Entertainment': 35.6, 'People & Blogs': 19.1, 'Music': 13.7}",92.2,68.4,16.5,26.2,"{'😂': 1305, '❤': 608, '🤣': 571}",(People & Blogs) gogo 2728: An Emotional story 🥺❤️ | gogo2728 | deepesh_zo | mr_roshan | shubham...
5,GB,"{'Entertainment': 22.1, 'Sports': 18.4, 'Gaming': 17.9}",41.4,52.4,23.1,5.6,"{'🔥': 103, '😂': 100, '🤯': 79}",(Science & Technology) SpaceX: Starlink Mission for 15 days
6,DE,"{'Entertainment': 26.2, 'Music': 17.5, 'Sports': 12.3}",71.9,55.7,22.7,15.7,"{'🔥': 471, '😂': 444, '😱': 437}",(Science & Technology) SpaceX: Starlink Mission for 7 days
7,FR,"{'Entertainment': 25.8, 'Music': 18.5, 'Gaming': 12.1}",86.9,51.8,29.9,11.0,"{'🔥': 153, '😱': 151, '😂': 148}",(News & Politics) Emmanuel Macron: Adresse aux Français. for 10 days
8,CA,"{'Entertainment': 21.7, 'Gaming': 21.0, 'Music': 12.6}",7.0,50.5,22.4,4.6,"{'😂': 117, '😳': 72, '🔥': 57}",(Science & Technology) SpaceX: Starlink Mission for 16 days
9,AU,"{'Entertainment': 22.1, 'Gaming': 21.6, 'Sports': 14.0}",15.9,49.9,22.8,5.0,"{'😂': 95, '🔥': 66, '😳': 57}",(Entertainment) Beast Reacts: Extreme Try Not To Laugh Challenge! for 14 days


<br>

## 5. The video with the most views (total & country of origin)
___

In [10]:
most_views = {}
most_views_coo = {}
for country in countries:
    df = COUNTRIES[country]

    # Which video has the most views?
    data = df[df["views"] == df["views"].max()]
    data = data.to_dict(orient="records")
    data = [f"({v['categoryNames']}) {v['channelTitles']}: {v['titles']} with {v['views']} views" for v in data][0]
    most_views.update({country: data})

    # Which video from the country of origin has the most views?
    data = df[df["channelCountry"] == country]
    data = data[data["views"] == data["views"].max()]
    data = data.to_dict(orient="records")
    data = [f"({v['categoryNames']}) {v['channelTitles']}: {v['titles']} with {v['views']} views" for v in data][0]
    most_views_coo.update({country: data})

# Store & show results
ANALYSIS = ANALYSIS.assign(most_views=lambda x: x["countries"].map(most_views))
ANALYSIS = ANALYSIS.assign(most_views_coo=lambda x: x["countries"].map(most_views_coo))
ANALYSIS.head(10)

Unnamed: 0,countries,top3_categories,origin_share,title_length,title_upper,emoji_share,emoji_top3,days_in_trend,most_views,most_views_coo
0,US,"{'Gaming': 21.1, 'Entertainment': 20.4, 'Music': 16.0}",76.7,50.0,22.6,4.0,"{'😂': 84, '😳': 64, '😱': 51}",(Science & Technology) SpaceX: Starlink Mission for 24 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189432443.0 views,"(Entertainment) MrBeast: $456,000 Squid Game In Real Life! with 99337676.0 views"
1,BR,"{'Entertainment': 24.9, 'Music': 21.3, 'Gaming': 14.9}",91.8,54.9,45.0,9.0,"{'😱': 255, '😂': 225, '🔥': 156}",(Education) YouTube: Join the BTS #PermissiontoDance Challenge only on YouTube #Shorts for 6 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189432443.0 views,(Sports) CazéTV: JOGO COMPLETO: BRASIL X CROÁCIA | QUARTAS DE FINAL | COPA DO MUNDO 2022 with 31...
2,RU,"{'People & Blogs': 17.9, 'Entertainment': 16.8, 'News & Politics': 13.5}",67.7,57.9,30.7,8.6,"{'😱': 491, '⚡': 321, '🔥': 249}","(Nonprofits & Activism) Серебряный Дождь: Прямая трансляция радиостанции ""Серебряный Дождь"" for...",(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 165977661.0 views,(Comedy) dednahype: 😱OMG😱 BEST TEACHER with 77344758.0 views
3,JP,"{'Entertainment': 32.8, 'Gaming': 16.2, 'Music': 13.5}",95.1,37.8,7.9,3.9,"{'‼': 121, '⁉': 59, '🔥': 57}",(People & Blogs) まあたその「ほぼメインチャンネル」: まじ緊急まあたそ&ふくれな大暴れの舞台裏をご覧あれ！！ for 7 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 208865178.0 views,(Film & Animation) アニプレックス YouTube チャンネル: TVアニメ「鬼滅の刃」遊郭編 第1弾PV 2021年放送開始 with 12286506.0 views
4,IN,"{'Entertainment': 35.6, 'People & Blogs': 19.1, 'Music': 13.7}",92.2,68.4,16.5,26.2,"{'😂': 1305, '❤': 608, '🤣': 571}",(People & Blogs) gogo 2728: An Emotional story 🥺❤️ | gogo2728 | deepesh_zo | mr_roshan | shubham...,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 233425437.0 views,(Music) DM - Desi Melodies: Filhaal2 Mohabbat | Akshay Kumar Ft Nupur Sanon | Ammy Virk | BPraak...
5,GB,"{'Entertainment': 22.1, 'Sports': 18.4, 'Gaming': 17.9}",41.4,52.4,23.1,5.6,"{'🔥': 103, '😂': 100, '🤯': 79}",(Science & Technology) SpaceX: Starlink Mission for 15 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 165977740.0 views,(News & Politics) Guardian News: Watch the uncensored moment Will Smith smacks Chris Rock on sta...
6,DE,"{'Entertainment': 26.2, 'Music': 17.5, 'Sports': 12.3}",71.9,55.7,22.7,15.7,"{'🔥': 471, '😂': 444, '😱': 437}",(Science & Technology) SpaceX: Starlink Mission for 7 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189515569.0 views,(Comedy) Tsuriki Show: Guess the sound challenge🎵 #shorts with 58814660.0 views
7,FR,"{'Entertainment': 25.8, 'Music': 18.5, 'Gaming': 12.1}",86.9,51.8,29.9,11.0,"{'🔥': 153, '😱': 151, '😂': 148}",(News & Politics) Emmanuel Macron: Adresse aux Français. for 10 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189515569.0 views,(Sports) PSG - Paris Saint-Germain: #LEOMESSI: First steps and first training at the Ooredoo Cen...
8,CA,"{'Entertainment': 21.7, 'Gaming': 21.0, 'Music': 12.6}",7.0,50.5,22.4,4.6,"{'😂': 117, '😳': 72, '🔥': 57}",(Science & Technology) SpaceX: Starlink Mission for 16 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189515569.0 views,(Comedy) Manchurek Triplets: Whipped Cream or Mustard? 🤣 #Shorts with 48668343.0 views
9,AU,"{'Entertainment': 22.1, 'Gaming': 21.6, 'Sports': 14.0}",15.9,49.9,22.8,5.0,"{'😂': 95, '🔥': 66, '😳': 57}",(Entertainment) Beast Reacts: Extreme Try Not To Laugh Challenge! for 14 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 245528669.0 views,(Sports) How Ridiculous: Javelin Balloon Pop Racing Is INTENSE!! with 161580180.0 views


<br>

## 6. Channel with the most trending videos
___

In [11]:
trendiest_channel = {}
for country in countries:
    df = COUNTRIES[country]

    # Group data
    data = (
        df.groupby(["channelTitles", "titles", "publishedAt", "durations", "thumbnails", "categoryNames"])
            .agg(
                todays=("today", list),
                todays_count=("today", "count"),
                min_views=("views", "min"),
                max_views=("views", "max"),
                min_likes=("likes", "min"),
                max_likes=("likes", "max"),
                min_comments=("comments", "min"),
                max_comments=("comments", "max"),
                channelSubscriberCount=("channelSubscriberCount", "mean"),
                duration_in_s=("duration_in_s", "mean")
            )
            .sort_values("todays_count", ascending=False)
            .reset_index()
    )

    data = data.groupby("channelTitles").agg(
        count=("channelTitles", "count"),
        min_views=("min_views", "mean"),
        durations=("duration_in_s", "mean"),
        subcribers=("channelSubscriberCount", "mean"),
        days_in_trends=("todays_count", "sum"),
        categoryNames=("categoryNames", lambda x: x.value_counts().index[0])
    ).sort_values("count", ascending=False).head(50)
    data["durations"] = data["durations"] / 60
    data["avg_days_in_trends"] = data["days_in_trends"] / data["count"]
    data = data.rename(columns={"durations": "durations_in_min"})
    data = data.reset_index()
    data = data.round(0)

    # Want to see the plot?
    if show_plot:
        fig = px.scatter(
            data,
            x="count",
            y="days_in_trends",
            color="categoryNames",
            size="subcribers",
            labels={
                "count": "Number of videos",
                "days_in_trends": "Number of days in trends",
                "categoryNames": "Video category",
                "avg_days_in_trends": "Average number of days in trends",
                "durations_in_min": "Average duration of video in minutes",
                "min_views": "Average number of views",
                "subcribers": "Average number of subscribers"
            },
            title=f"{country}: Top 50 successful channels",
            hover_name="channelTitles",
            hover_data=["durations_in_min"],
            size_max=40,
            color_discrete_map={
                "Film & Animation": "brown",
                "Autos & Vehicles": "lightblue",
                "Music": "green",
                "Pets & Animals": "black",
                "Sports": "grey",
                "Travel & Events": "yellow",
                "Gaming": "magenta",
                "People & Blogs": "orange",
                "Comedy": "purple",
                "Entertainment": "blue",
                "News & Politics": "red",
                "Howto & Style": "pink",
                "Education": "olive",
                "Science & Technology": "cyan",
                "Nonprofits & Activism": "lightgreen"
            }
        ).update_layout(height=600, width=1200).show()

    # Which channel has the most trending videos?
    max_trending_count = data["count"].max()
    data = data[data["count"] == max_trending_count].iloc[:1]
    data = data.to_dict(orient="records")
    data = [f"({v['categoryNames']}) {v['channelTitles']}: {v['count']} videos" for v in data][0]
    trendiest_channel.update({country: data})

# Store & show results
ANALYSIS = ANALYSIS.assign(trendiest_channel=lambda x: x["countries"].map(trendiest_channel))
ANALYSIS.head(10)

Unnamed: 0,countries,top3_categories,origin_share,title_length,title_upper,emoji_share,emoji_top3,days_in_trend,most_views,most_views_coo,trendiest_channel
0,US,"{'Gaming': 21.1, 'Entertainment': 20.4, 'Music': 16.0}",76.7,50.0,22.6,4.0,"{'😂': 84, '😳': 64, '😱': 51}",(Science & Technology) SpaceX: Starlink Mission for 24 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189432443.0 views,"(Entertainment) MrBeast: $456,000 Squid Game In Real Life! with 99337676.0 views",(Sports) NFL: 188 videos
1,BR,"{'Entertainment': 24.9, 'Music': 21.3, 'Gaming': 14.9}",91.8,54.9,45.0,9.0,"{'😱': 255, '😂': 225, '🔥': 156}",(Education) YouTube: Join the BTS #PermissiontoDance Challenge only on YouTube #Shorts for 6 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189432443.0 views,(Sports) CazéTV: JOGO COMPLETO: BRASIL X CROÁCIA | QUARTAS DE FINAL | COPA DO MUNDO 2022 with 31...,(Sports) ge: 531 videos
2,RU,"{'People & Blogs': 17.9, 'Entertainment': 16.8, 'News & Politics': 13.5}",67.7,57.9,30.7,8.6,"{'😱': 491, '⚡': 321, '🔥': 249}","(Nonprofits & Activism) Серебряный Дождь: Прямая трансляция радиостанции ""Серебряный Дождь"" for...",(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 165977661.0 views,(Comedy) dednahype: 😱OMG😱 BEST TEACHER with 77344758.0 views,(Entertainment) A4: 371 videos
3,JP,"{'Entertainment': 32.8, 'Gaming': 16.2, 'Music': 13.5}",95.1,37.8,7.9,3.9,"{'‼': 121, '⁉': 59, '🔥': 57}",(People & Blogs) まあたその「ほぼメインチャンネル」: まじ緊急まあたそ&ふくれな大暴れの舞台裏をご覧あれ！！ for 7 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 208865178.0 views,(Film & Animation) アニプレックス YouTube チャンネル: TVアニメ「鬼滅の刃」遊郭編 第1弾PV 2021年放送開始 with 12286506.0 views,(Comedy) 東海オンエア: 392 videos
4,IN,"{'Entertainment': 35.6, 'People & Blogs': 19.1, 'Music': 13.7}",92.2,68.4,16.5,26.2,"{'😂': 1305, '❤': 608, '🤣': 571}",(People & Blogs) gogo 2728: An Emotional story 🥺❤️ | gogo2728 | deepesh_zo | mr_roshan | shubham...,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 233425437.0 views,(Music) DM - Desi Melodies: Filhaal2 Mohabbat | Akshay Kumar Ft Nupur Sanon | Ammy Virk | BPraak...,(Entertainment) Vijay Television: 246 videos
5,GB,"{'Entertainment': 22.1, 'Sports': 18.4, 'Gaming': 17.9}",41.4,52.4,23.1,5.6,"{'🔥': 103, '😂': 100, '🤯': 79}",(Science & Technology) SpaceX: Starlink Mission for 15 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 165977740.0 views,(News & Politics) Guardian News: Watch the uncensored moment Will Smith smacks Chris Rock on sta...,(Sports) Sky Sports Football: 476 videos
6,DE,"{'Entertainment': 26.2, 'Music': 17.5, 'Sports': 12.3}",71.9,55.7,22.7,15.7,"{'🔥': 471, '😂': 444, '😱': 437}",(Science & Technology) SpaceX: Starlink Mission for 7 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189515569.0 views,(Comedy) Tsuriki Show: Guess the sound challenge🎵 #shorts with 58814660.0 views,(Autos & Vehicles) JP Performance: 165 videos
7,FR,"{'Entertainment': 25.8, 'Music': 18.5, 'Gaming': 12.1}",86.9,51.8,29.9,11.0,"{'🔥': 153, '😱': 151, '😂': 148}",(News & Politics) Emmanuel Macron: Adresse aux Français. for 10 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189515569.0 views,(Sports) PSG - Paris Saint-Germain: #LEOMESSI: First steps and first training at the Ooredoo Cen...,(Sports) beIN SPORTS France: 377 videos
8,CA,"{'Entertainment': 21.7, 'Gaming': 21.0, 'Music': 12.6}",7.0,50.5,22.4,4.6,"{'😂': 117, '😳': 72, '🔥': 57}",(Science & Technology) SpaceX: Starlink Mission for 16 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 189515569.0 views,(Comedy) Manchurek Triplets: Whipped Cream or Mustard? 🤣 #Shorts with 48668343.0 views,(Sports) NBA: 123 videos
9,AU,"{'Entertainment': 22.1, 'Gaming': 21.6, 'Sports': 14.0}",15.9,49.9,22.8,5.0,"{'😂': 95, '🔥': 66, '😳': 57}",(Entertainment) Beast Reacts: Extreme Try Not To Laugh Challenge! for 14 days,(Music) HYBE LABELS: BTS (방탄소년단) 'Butter' Official MV with 245528669.0 views,(Sports) How Ridiculous: Javelin Balloon Pop Racing Is INTENSE!! with 161580180.0 views,(Sports) Optus Sport: 122 videos


<br>

## 7. Top 25 international videos and channels
___

In [12]:
# Which 25 international videos have the most trending days?
top25_videos = (
    pd.concat(COUNTRIES.values(), ignore_index=True)
    .groupby(["channelTitles", "titles", "publishedAt", "categoryNames"])
    .agg(todays_count=("today", "count"))
    .sort_values("todays_count", ascending=False)
    .reset_index()
    .drop(columns=["publishedAt"])
    .head(25)
)

top25_videos

Unnamed: 0,channelTitles,titles,categoryNames,todays_count
0,FIFA,Jung Kook from BTS performs 'Dreamers' at FIFA World Cup opening ceremony,Sports,45
1,HYBE LABELS,BTS (방탄소년단) 'Butter' Official MV,Music,45
2,BLACKPINK,BLACKPINK - ‘Pink Venom’ M/V,Music,44
3,officialpsy,PSY - 'That That (prod. & feat. SUGA of BTS)' MV,Music,42
4,HYBE LABELS,BTS (방탄소년단) 'Yet To Come (The Most Beautiful Moment)' Official MV,Music,42
5,HYBE LABELS,BTS (방탄소년단) 'Permission to Dance' Official MV,Music,42
6,NFL,"Dr. Dre, Snoop Dogg, Eminem, Mary J. Blige & Kendrick Lamar FULL Pepsi Super Bowl LVI Halftime Show",Sports,40
7,BLACKPINK,BLACKPINK - ‘Shut Down’ M/V,Music,40
8,Linkin Park,Lost [Official Music Video] - Linkin Park,Music,38
9,Guardian News,"Watch the uncensored moment Will Smith smacks Chris Rock on stage at the Oscars, drops F-bomb",News & Politics,37


In [13]:
# Which 25 international channels have the most trending days across all their videos?
top25_channels = (
    pd.concat(COUNTRIES.values(), ignore_index=True)
    .groupby(["channelTitles", "titles", "publishedAt", "categoryNames"])
    .agg(todays_count=("today", "count"))
    .sort_values("todays_count", ascending=False)
    .reset_index()
    .groupby("channelTitles")
    .agg(
        count=("channelTitles", "count"),
        days_in_trends=("todays_count", "sum"),
        categoryNames=("categoryNames", lambda x: x.value_counts().index[0]))
    .sort_values("days_in_trends", ascending=False)
    .head(25)
)

top25_channels.reset_index()

Unnamed: 0,channelTitles,count,days_in_trends,categoryNames
0,HYBE LABELS,82,940,Music
1,東海オンエア,392,920,Comedy
2,MrBeast,48,874,Entertainment
3,ge,531,858,Sports
4,NFL,197,756,Sports
5,Clash of Clans,56,737,Gaming
6,SMTOWN,74,731,Music
7,JYP Entertainment,69,727,Music
8,Marvel Entertainment,60,725,Entertainment
9,Sky Sports Football,491,718,Sports
