## Deep Dive into YouTube Trending
During the Data Analytics examination, the Data Scientist at the company N asks for an in-depth analysis of YouTube Trending datasets.

This notebook will clean all the data sources and format them for further analysis.
&nbsp;
### Data Sources
- _categories.csv_ contains YouTube categories mapped by id
- _CAvideos.csv_ contains daily history of YouTube Trending Tab for the Canada region
- _GBvideos.csv_ contains daily history for YouTube Trending Tab for the United Kingdom region
- _USvideos.csv_ contains daily history of YouTube Trending Tab for the United States region

In [15]:
import pandas as pd
from pathlib import Path
from datetime import datetime

### File Locations

In [16]:
today = datetime.today()
categories = {
    "clean": Path.cwd() / "datasets" / "processed" / f"categories_{today:%b-%d-%Y}.pkl",
    "raw": Path.cwd() / "datasets" / "raw" / "categories.csv"
}
trending = {
    "main": {
        "clean": Path.cwd() / "datasets" / "processed" / f"trending_{today:%b-%d-%Y}.pkl",
    },
    "canada": {
        "raw": Path.cwd() / "datasets" / "raw" / "CAvideos.csv"
    },
    "uk": {
        "raw": Path.cwd() / "datasets" / "raw" / "GBvideos.csv"
    },
    "us": {
        "raw": Path.cwd() / "datasets" / "raw" / "USvideos.csv"
    }
}

### Categories | Read

In [17]:
df = pd.read_csv(categories["raw"], sep=';', on_bad_lines='skip')

### Categories | Column Cleanup
* Remove all surrounding spaces
* Replace all inner spaces with underscores

In [18]:
clean_columns = lambda columns: [col.strip().replace(" ", "_") for col in columns]
df.columns = clean_columns(df.columns)
df.head(10)

Unnamed: 0,category_id,category_id_lvl_0,category_id_lvl_1,category_id_lvl_2,category_title_lvl_0,category_title_lvl_1,category_title_lvl_2
0,101,100,101.0,,Politics & Nonprofits,Nonprofits & Activism,
1,102,100,102.0,,Politics & Nonprofits,News & Politics,
2,201,200,201.0,,Cinema,Film & Animation,
3,202,200,202.0,,Cinema,Trailers,
4,203,200,203.0,,Cinema,Full Movies,
5,204,200,204.0,,Cinema,TV Series & Shows,
6,301,300,301.0,,Education & Science,Science & Technology,
7,302,300,302.0,,Education & Science,Education,
8,401,400,401.0,,DIY & Beauty,Other Beauty,
9,402,400,402.0,,DIY & Beauty,DIY,


 ### Categories | Column Transformation
* Rename category_id_lvl_0 to root_id
* Rename category_id_lvl_1, to parent_id
* Rename category_id_lvl_2 to category_id
* Rename category_title_lvl_0 to root
* Rename category_title_lvl_1 to parent
* Rename category_title_lvl_2 to category
* Initialize missed parameters
* Fill the gaps in transformed data
* Rearrange columns

In [19]:
df = df.drop(["category_id"], axis = 1)

df.rename(columns = {
    "category_id_lvl_0": "root_id",
    "category_id_lvl_1": "parent_id",
    "category_id_lvl_2": "category_id",
    "category_title_lvl_0": "root",
    "category_title_lvl_1": "parent",
    "category_title_lvl_2": "category"
}, inplace = True)

for i, row in df.iterrows():
    if not row["root_id"] in df["category_id"]:
        df = df.append({
            "category_id": row["root_id"],
            "parent_id": row["root_id"],
            "root_id": row["root_id"],
            "category": row["root"],
            "parent": row["root"],
            "root": row["root"],
        }, ignore_index=True)

for i, row in df.iterrows():
    if pd.isna(row["parent_id"]):
        df.at[i, "parent_id"] = row["root_id"]
    if pd.isna(row["parent"]):
        df.at[i, "parent"] = row["root"]

for i, row in df.iterrows():
    if not row["parent_id"] in df["category_id"]:
        df = df.append({
            "category_id": row["parent_id"],
            "parent_id": row["parent_id"],
            "root_id": row["root_id"],
            "category": row["parent"],
            "parent": row["parent"],
            "root": row["root"],
        }, ignore_index=True)

for i, row in df.iterrows():
    if pd.isna(row["category_id"]):
        df.at[i, "category_id"] = row["parent_id"]
    if pd.isna(row["category"]):
        df.at[i, "category"] = row["parent"]

df = df.reindex(columns=["category_id", "parent_id", "root_id", "category", "parent", "root"])

df.head(10)

Unnamed: 0,category_id,parent_id,root_id,category,parent,root
0,101.0,101.0,100,Nonprofits & Activism,Nonprofits & Activism,Politics & Nonprofits
1,102.0,102.0,100,News & Politics,News & Politics,Politics & Nonprofits
2,201.0,201.0,200,Film & Animation,Film & Animation,Cinema
3,202.0,202.0,200,Trailers,Trailers,Cinema
4,203.0,203.0,200,Full Movies,Full Movies,Cinema
5,204.0,204.0,200,TV Series & Shows,TV Series & Shows,Cinema
6,301.0,301.0,300,Science & Technology,Science & Technology,Education & Science
7,302.0,302.0,300,Education,Education,Education & Science
8,401.0,401.0,400,Other Beauty,Other Beauty,DIY & Beauty
9,402.0,402.0,400,DIY,DIY,DIY & Beauty


### Categories | Data Clean Up
* Remove duplicates
* Clean numeric data types
* Sort data

In [20]:
df = df.drop_duplicates(subset=None, keep="first", inplace=False)

df[["category_id", "parent_id", "root_id"]] = df[["category_id", "parent_id", "root_id"]].fillna(value=0)

df["category_id"] = df["category_id"].astype("int32")
df["parent_id"] = df["parent_id"].astype("int32")
df["root_id"] = df["root_id"].astype("int32")

df = df.sort_values("category_id")

df.head(10)

Unnamed: 0,category_id,parent_id,root_id,category,parent,root
0,101,101,100,Nonprofits & Activism,Nonprofits & Activism,Politics & Nonprofits
1,102,102,100,News & Politics,News & Politics,Politics & Nonprofits
163,200,200,200,Cinema,Cinema,Cinema
2,201,201,200,Film & Animation,Film & Animation,Cinema
3,202,202,200,Trailers,Trailers,Cinema
4,203,203,200,Full Movies,Full Movies,Cinema
5,204,204,200,TV Series & Shows,TV Series & Shows,Cinema
167,300,300,300,Education & Science,Education & Science,Education & Science
6,301,301,300,Science & Technology,Science & Technology,Education & Science
7,302,302,300,Education,Education,Education & Science


### Categories | Data Overview

In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 169 entries, 0 to 162
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   category_id  169 non-null    int32 
 1   parent_id    169 non-null    int32 
 2   root_id      169 non-null    int32 
 3   category     169 non-null    object
 4   parent       169 non-null    object
 5   root         169 non-null    object
dtypes: int32(3), object(3)
memory usage: 7.3+ KB


### Categories | Save Output

In [22]:
df.to_pickle(categories["clean"])

### Trending | Read + Concatenation
* Read three datasets
* Concatenate these

In [23]:
df_trending_ca = pd.read_csv(trending["canada"]["raw"], sep=';', on_bad_lines='skip')
df_trending_uk = pd.read_csv(trending["canada"]["raw"], sep=';', on_bad_lines='skip')
df_trending_us = pd.read_csv(trending["canada"]["raw"], sep=';', on_bad_lines='skip')

df = pd.concat([df_trending_ca, df_trending_uk, df_trending_us])

### Trending | Column Cleanup
* Remove all surrounding spaces
* Replace all inner spaces with underscores

In [24]:
df.columns = clean_columns(df.columns)
df.head(10)

Unnamed: 0,video_id,trending_date,title,channel_title,publish_time,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description,country,tags,category_id
0,n1WpP7iowLc,17.14.11,Eminem - Walk On Water (Audio) ft. Beyoncé,EminemVEVO,2017-11-10T17:00:03.000Z,17158579.0,787425.0,43420.0,125882.0,https://i.ytimg.com/vi/n1WpP7iowLc/default.jpg,False,False,False,Eminem's new track Walk on Water ft. Beyoncé i...,CA,",'eminem','walk','on','water','aftermath/shady...",600.0
1,0dBIkQ4Mz1M,17.14.11,PLUSH - Bad Unboxing Fan Mail,iDubbbzTV,2017-11-13T17:00:00.000Z,1014651.0,127794.0,1688.0,13030.0,https://i.ytimg.com/vi/0dBIkQ4Mz1M/default.jpg,False,False,False,STill got a lot of packages. Probably will las...,CA,",'plush','bad unboxing','unboxing','fan mail',...",805.0
2,5qpjK5DgCt4,17.14.11,"Racist Superman | Rudy Mancuso, King Bach & Le...",Rudy Mancuso,2017-11-12T19:05:24.000Z,3191434.0,146035.0,5339.0,8181.0,https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg,False,False,False,WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...,CA,",'racist superman','rudy','mancuso','king','ba...",805.0
3,d380meD0W0M,17.14.11,I Dare You: GOING BALD!?,nigahiga,2017-11-12T18:01:41.000Z,2095828.0,132239.0,1989.0,17518.0,https://i.ytimg.com/vi/d380meD0W0M/default.jpg,False,False,False,I know it's been a while since we did this sho...,CA,",'ryan','higa','higatv','nigahiga','i dare you...",806.0
4,2Vv-BfVoq4g,17.14.11,Ed Sheeran - Perfect (Official Music Video),Ed Sheeran,2017-11-09T11:04:14.000Z,33523622.0,1634130.0,21082.0,85067.0,https://i.ytimg.com/vi/2Vv-BfVoq4g/default.jpg,False,False,False,🎧: https://ad.gt/yt-perfect\n💰: https://atlant...,CA,",'edsheeran','ed sheeran','acoustic','live','c...",600.0
5,0yIWz1XEeyc,17.14.11,Jake Paul Says Alissa Violet CHEATED with LOGA...,DramaAlert,2017-11-13T07:37:51.000Z,1309699.0,103755.0,4613.0,12143.0,https://i.ytimg.com/vi/0yIWz1XEeyc/default.jpg,False,False,False,► Follow for News! - https://twitter.com/KEEMS...,CA,",'#dramaalert','drama','alert','dramaalert','k...",102.0
6,_uM5kFfkhB8,17.14.11,Vanoss Superhero School - New Students,VanossGaming,2017-11-12T23:52:13.000Z,2987945.0,187464.0,9850.0,26629.0,https://i.ytimg.com/vi/_uM5kFfkhB8/default.jpg,False,False,False,Vanoss Merch Shop: https://vanoss.3blackdot.co...,CA,",'funny moments','montage video games','gaming...",805.0
7,2kyS6SvSYSE,17.14.11,WE WANT TO TALK ABOUT OUR MARRIAGE,CaseyNeistat,2017-11-13T17:13:01.000Z,748374.0,57534.0,2967.0,15959.0,https://i.ytimg.com/vi/2kyS6SvSYSE/default.jpg,False,False,False,SHANTELL'S CHANNEL - https://www.youtube.com/s...,CA,",'shantell martin'",804.0
8,JzCsM1vtn78,17.14.11,THE LOGANG MADE HISTORY. LOL. AGAIN.,Logan Paul Vlogs,2017-11-12T20:19:24.000Z,4477587.0,292837.0,4123.0,36391.0,https://i.ytimg.com/vi/JzCsM1vtn78/default.jpg,False,False,False,Join the movement. Be a Maverick ► https://Sho...,CA,",'logan paul vlog','logan paul','logan','paul'...",806.0
9,43sm-QwLcx4,17.14.11,Finally Sheldon is winning an argument about t...,Sheikh Musa,2017-11-10T14:10:46.000Z,505161.0,4135.0,976.0,1484.0,https://i.ytimg.com/vi/43sm-QwLcx4/default.jpg,False,False,False,Sheldon is roasting pastor of the church\nyoun...,CA,",'god','sheldon cooper','young sheldon'",804.0


Unnamed: 0,video_id,trending_date,title,channel_title,publish_time,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description,country,tags,category_id
0,n1WpP7iowLc,17.14.11,Eminem - Walk On Water (Audio) ft. Beyoncé,EminemVEVO,2017-11-10T17:00:03.000Z,17158579.0,787425.0,43420.0,125882.0,https://i.ytimg.com/vi/n1WpP7iowLc/default.jpg,False,False,False,Eminem's new track Walk on Water ft. Beyoncé i...,CA,",'eminem','walk','on','water','aftermath/shady...",600.0
1,0dBIkQ4Mz1M,17.14.11,PLUSH - Bad Unboxing Fan Mail,iDubbbzTV,2017-11-13T17:00:00.000Z,1014651.0,127794.0,1688.0,13030.0,https://i.ytimg.com/vi/0dBIkQ4Mz1M/default.jpg,False,False,False,STill got a lot of packages. Probably will las...,CA,",'plush','bad unboxing','unboxing','fan mail',...",805.0
2,5qpjK5DgCt4,17.14.11,"Racist Superman | Rudy Mancuso, King Bach & Le...",Rudy Mancuso,2017-11-12T19:05:24.000Z,3191434.0,146035.0,5339.0,8181.0,https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg,False,False,False,WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...,CA,",'racist superman','rudy','mancuso','king','ba...",805.0
3,d380meD0W0M,17.14.11,I Dare You: GOING BALD!?,nigahiga,2017-11-12T18:01:41.000Z,2095828.0,132239.0,1989.0,17518.0,https://i.ytimg.com/vi/d380meD0W0M/default.jpg,False,False,False,I know it's been a while since we did this sho...,CA,",'ryan','higa','higatv','nigahiga','i dare you...",806.0
4,2Vv-BfVoq4g,17.14.11,Ed Sheeran - Perfect (Official Music Video),Ed Sheeran,2017-11-09T11:04:14.000Z,33523622.0,1634130.0,21082.0,85067.0,https://i.ytimg.com/vi/2Vv-BfVoq4g/default.jpg,False,False,False,🎧: https://ad.gt/yt-perfect\n💰: https://atlant...,CA,",'edsheeran','ed sheeran','acoustic','live','c...",600.0
5,0yIWz1XEeyc,17.14.11,Jake Paul Says Alissa Violet CHEATED with LOGA...,DramaAlert,2017-11-13T07:37:51.000Z,1309699.0,103755.0,4613.0,12143.0,https://i.ytimg.com/vi/0yIWz1XEeyc/default.jpg,False,False,False,► Follow for News! - https://twitter.com/KEEMS...,CA,",'#dramaalert','drama','alert','dramaalert','k...",102.0
6,_uM5kFfkhB8,17.14.11,Vanoss Superhero School - New Students,VanossGaming,2017-11-12T23:52:13.000Z,2987945.0,187464.0,9850.0,26629.0,https://i.ytimg.com/vi/_uM5kFfkhB8/default.jpg,False,False,False,Vanoss Merch Shop: https://vanoss.3blackdot.co...,CA,",'funny moments','montage video games','gaming...",805.0
7,2kyS6SvSYSE,17.14.11,WE WANT TO TALK ABOUT OUR MARRIAGE,CaseyNeistat,2017-11-13T17:13:01.000Z,748374.0,57534.0,2967.0,15959.0,https://i.ytimg.com/vi/2kyS6SvSYSE/default.jpg,False,False,False,SHANTELL'S CHANNEL - https://www.youtube.com/s...,CA,",'shantell martin'",804.0
8,JzCsM1vtn78,17.14.11,THE LOGANG MADE HISTORY. LOL. AGAIN.,Logan Paul Vlogs,2017-11-12T20:19:24.000Z,4477587.0,292837.0,4123.0,36391.0,https://i.ytimg.com/vi/JzCsM1vtn78/default.jpg,False,False,False,Join the movement. Be a Maverick ► https://Sho...,CA,",'logan paul vlog','logan paul','logan','paul'...",806.0
9,43sm-QwLcx4,17.14.11,Finally Sheldon is winning an argument about t...,Sheikh Musa,2017-11-10T14:10:46.000Z,505161.0,4135.0,976.0,1484.0,https://i.ytimg.com/vi/43sm-QwLcx4/default.jpg,False,False,False,Sheldon is roasting pastor of the church\nyoun...,CA,",'god','sheldon cooper','young sheldon'",804.0


 ### Trending | Column Transformation
* Remove trending_date, channel_title, publish_time, likes, dislikes, comment_count, thumbnail_link, comments_disabled, rating_disabled, video_error_or_removed, country
* Rearrange the columns

In [25]:
df = df.drop(["trending_date", "channel_title", "publish_time", "likes", "dislikes", "comment_count", "thumbnail_link", "comments_disabled", "ratings_disabled", "video_error_or_removed", "country"], axis = 1)

df = df.reindex(columns=["video_id", "category_id", "title", "description", "tags"])

df.head(10)

Unnamed: 0,video_id,category_id,title,description,tags
0,n1WpP7iowLc,600.0,Eminem - Walk On Water (Audio) ft. Beyoncé,Eminem's new track Walk on Water ft. Beyoncé i...,",'eminem','walk','on','water','aftermath/shady..."
1,0dBIkQ4Mz1M,805.0,PLUSH - Bad Unboxing Fan Mail,STill got a lot of packages. Probably will las...,",'plush','bad unboxing','unboxing','fan mail',..."
2,5qpjK5DgCt4,805.0,"Racist Superman | Rudy Mancuso, King Bach & Le...",WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...,",'racist superman','rudy','mancuso','king','ba..."
3,d380meD0W0M,806.0,I Dare You: GOING BALD!?,I know it's been a while since we did this sho...,",'ryan','higa','higatv','nigahiga','i dare you..."
4,2Vv-BfVoq4g,600.0,Ed Sheeran - Perfect (Official Music Video),🎧: https://ad.gt/yt-perfect\n💰: https://atlant...,",'edsheeran','ed sheeran','acoustic','live','c..."
5,0yIWz1XEeyc,102.0,Jake Paul Says Alissa Violet CHEATED with LOGA...,► Follow for News! - https://twitter.com/KEEMS...,",'#dramaalert','drama','alert','dramaalert','k..."
6,_uM5kFfkhB8,805.0,Vanoss Superhero School - New Students,Vanoss Merch Shop: https://vanoss.3blackdot.co...,",'funny moments','montage video games','gaming..."
7,2kyS6SvSYSE,804.0,WE WANT TO TALK ABOUT OUR MARRIAGE,SHANTELL'S CHANNEL - https://www.youtube.com/s...,",'shantell martin'"
8,JzCsM1vtn78,806.0,THE LOGANG MADE HISTORY. LOL. AGAIN.,Join the movement. Be a Maverick ► https://Sho...,",'logan paul vlog','logan paul','logan','paul'..."
9,43sm-QwLcx4,804.0,Finally Sheldon is winning an argument about t...,Sheldon is roasting pastor of the church\nyoun...,",'god','sheldon cooper','young sheldon'"


Unnamed: 0,video_id,category_id,title,description,tags
0,n1WpP7iowLc,600.0,Eminem - Walk On Water (Audio) ft. Beyoncé,Eminem's new track Walk on Water ft. Beyoncé i...,",'eminem','walk','on','water','aftermath/shady..."
1,0dBIkQ4Mz1M,805.0,PLUSH - Bad Unboxing Fan Mail,STill got a lot of packages. Probably will las...,",'plush','bad unboxing','unboxing','fan mail',..."
2,5qpjK5DgCt4,805.0,"Racist Superman | Rudy Mancuso, King Bach & Le...",WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...,",'racist superman','rudy','mancuso','king','ba..."
3,d380meD0W0M,806.0,I Dare You: GOING BALD!?,I know it's been a while since we did this sho...,",'ryan','higa','higatv','nigahiga','i dare you..."
4,2Vv-BfVoq4g,600.0,Ed Sheeran - Perfect (Official Music Video),🎧: https://ad.gt/yt-perfect\n💰: https://atlant...,",'edsheeran','ed sheeran','acoustic','live','c..."
5,0yIWz1XEeyc,102.0,Jake Paul Says Alissa Violet CHEATED with LOGA...,► Follow for News! - https://twitter.com/KEEMS...,",'#dramaalert','drama','alert','dramaalert','k..."
6,_uM5kFfkhB8,805.0,Vanoss Superhero School - New Students,Vanoss Merch Shop: https://vanoss.3blackdot.co...,",'funny moments','montage video games','gaming..."
7,2kyS6SvSYSE,804.0,WE WANT TO TALK ABOUT OUR MARRIAGE,SHANTELL'S CHANNEL - https://www.youtube.com/s...,",'shantell martin'"
8,JzCsM1vtn78,806.0,THE LOGANG MADE HISTORY. LOL. AGAIN.,Join the movement. Be a Maverick ► https://Sho...,",'logan paul vlog','logan paul','logan','paul'..."
9,43sm-QwLcx4,804.0,Finally Sheldon is winning an argument about t...,Sheldon is roasting pastor of the church\nyoun...,",'god','sheldon cooper','young sheldon'"


### Trending | Data Clean Up
* Remove duplicates
* Clean numeric data types
* Clean text data types
* Remove urls from description column
* Remove pronouns
* Sort data

In [26]:
df = df.drop_duplicates(subset=None, keep="first", inplace=False)

df[["video_id", "category_id"]] = df[["video_id", "category_id"]].fillna(value=0)

df["category_id"] = df["category_id"].astype("int32")

df[["title", "description", "tags"]] = df[["title", "description", "tags"]].fillna(value=" "*256)

url_pattern = r"(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})"
df["description"].replace(url_pattern, '', regex=True, inplace=True)

df = df.sort_values("category_id")

df.head(10)

Unnamed: 0,video_id,category_id,title,description,tags
28433,XkGTE4WMMko,0,A BANANA PHONE?! - What the Fruit?,Current Banana phone pricing: you have NOT tr...,",'banana phone','apple','iphone','bluetooth','..."
37714,piO-68zx2UQ,0,Molly Makes Fresh Herb Falafel | From the Test...,"Once you make this falafel recipe, you'll neve...",",'falafel','falafel ball recipe','herbs','tahi..."
2456,1aig3FJwkQs,0,ART with PERMANENT MARKERS? - No Undo Possible!,"➨ Get my APP, Courses, eBooks, Brushes and mor...",",'josiah','brooks','jazza','jazzastudios','ani..."
37721,P_xXvDNZjHI,0,3 EASY SUMMER SALADS | hot for food,Make these 3 easy summer salads to get your gr...,",'veganism','healthy','vegetarian','vegan','ve..."
31897,NqygMHtn2wU,0,Fung Bros and Sean Evans Review International ...,Instant ramen is one of the world's most ubiqu...,",'first we feast','fwf','firstwefeast','food',..."
31891,T5b65lgsvsM,0,THE SIMPSONS and RICK AND MORTY - Style SWAP C...,Sub for more fun with art!\nMusic Remix credit...,",'josiah','brooks','jazza','jazzastudios','ani..."
37736,PKrD9rqjvYE,0,House of VERSACE Fashion Cake | Man About Cake...,To celebrate his love of fashion (and his obse...,",'man about cake','joshua john russell','craft..."
19006,1eXyk9ie0H0,0,HARLEY QUINN MALLET TUTORIAL (ft. Glam&Gore) |...,HEY HEY HEY HEY! Welcome back for Episode 2 of...,",'alexa poletti','alexapoletti','harley quinn'..."
31881,\nMust-See WWE videos on YouTube: https://goo....,0,",'wwe','world wrestling entertainment','wrestl...",...,...
31878,4kN-FlYp5rE,0,Ronda Rousey locks Mickie James in an armbar d...,"As Raw's main event descends into chaos, Ronda...",...


Unnamed: 0,video_id,category_id,title,description,tags
28433,XkGTE4WMMko,0,A BANANA PHONE?! - What the Fruit?,Current Banana phone pricing: you have NOT tr...,",'banana phone','apple','iphone','bluetooth','..."
37714,piO-68zx2UQ,0,Molly Makes Fresh Herb Falafel | From the Test...,"Once you make this falafel recipe, you'll neve...",",'falafel','falafel ball recipe','herbs','tahi..."
2456,1aig3FJwkQs,0,ART with PERMANENT MARKERS? - No Undo Possible!,"➨ Get my APP, Courses, eBooks, Brushes and mor...",",'josiah','brooks','jazza','jazzastudios','ani..."
37721,P_xXvDNZjHI,0,3 EASY SUMMER SALADS | hot for food,Make these 3 easy summer salads to get your gr...,",'veganism','healthy','vegetarian','vegan','ve..."
31897,NqygMHtn2wU,0,Fung Bros and Sean Evans Review International ...,Instant ramen is one of the world's most ubiqu...,",'first we feast','fwf','firstwefeast','food',..."
31891,T5b65lgsvsM,0,THE SIMPSONS and RICK AND MORTY - Style SWAP C...,Sub for more fun with art!\nMusic Remix credit...,",'josiah','brooks','jazza','jazzastudios','ani..."
37736,PKrD9rqjvYE,0,House of VERSACE Fashion Cake | Man About Cake...,To celebrate his love of fashion (and his obse...,",'man about cake','joshua john russell','craft..."
19006,1eXyk9ie0H0,0,HARLEY QUINN MALLET TUTORIAL (ft. Glam&Gore) |...,HEY HEY HEY HEY! Welcome back for Episode 2 of...,",'alexa poletti','alexapoletti','harley quinn'..."
31881,\nMust-See WWE videos on YouTube: https://goo....,0,",'wwe','world wrestling entertainment','wrestl...",...,...
31878,4kN-FlYp5rE,0,Ronda Rousey locks Mickie James in an armbar d...,"As Raw's main event descends into chaos, Ronda...",...


### Trending | Data Overview

In [27]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 25064 entries, 28433 to 21619
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   video_id     25064 non-null  object
 1   category_id  25064 non-null  int32 
 2   title        25064 non-null  object
 3   description  25064 non-null  object
 4   tags         25064 non-null  object
dtypes: int32(1), object(4)
memory usage: 1.1+ MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 25064 entries, 28433 to 21619
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   video_id     25064 non-null  object
 1   category_id  25064 non-null  int32 
 2   title        25064 non-null  object
 3   description  25064 non-null  object
 4   tags         25064 non-null  object
dtypes: int32(1), object(4)
memory usage: 1.1+ MB


### Trending | Save Output

In [28]:
df.to_pickle(trending["main"]["clean"])