<a href="https://colab.research.google.com/github/drusho/drusho.github.io/blob/master/_notebooks/2021-07-17-webscrapping-youtube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Top 10 YouTube Tech Channels
> Analyzing Web Scraped Data Obtained with Selenium

- toc: false
- badges: false
- comments: true
- categories: [Selenium, Web Scraping, Pandas]
- image: "images/thumbnails/header_youtube_web.png"

<br>

__Notebook Created by David Rusho__

[GitHub Blog](https://drusho.github.io) | [GitHub](https://github.com/drusho/webscrape_youtube) | [Tableau](https://public.tableau.com/app/profile/drusho/) | [LinkedIn](https://linkedin.com/in/davidrusho)


## About the Data

Web scraping was performed on the _Top 10 Tech Channels_ on YouTube using _[Selenium](https://selenium-python.readthedocs.io/)_, a tool that drives an automated browser from Python and is often used for web scraping and web testing.  The channels to scrape were taken from the __[Top 10 Tech Youtubers](https://blog.bit.ai/top-tech-youtubers/)__ list on blog.bit.ai.  The scraped data included:
* Channel names
* Number of subscribers per channel
* Data from each channel's most popular videos:
	* Video titles
	* Posting date
	* Number of views


Each channel contributed around 200 videos.  In total, data from about 2,000 videos was scraped.

#hide
## Data Cleaning

In [None]:
#hide
import pandas as pd

In [None]:
# #collapse
yt = pd.read_csv('youtube_video_scrap.csv',parse_dates=['channel_join_date'])
yt.head(2)

Unnamed: 0.1,Unnamed: 0,channel_name,subscribers,title,views,post_date,channel_join_date,channel_views,channel_description
0,0,iJustine,6.89M subscribers,Black Eyed Peas - I gotta Feeling (Parody),18M views,11 years ago,NaT,,
1,1,iJustine,6.89M subscribers,Cake Decorating Challenge with Ro | Nerdy Numm...,12M views,5 years ago,NaT,,
2,2,iJustine,6.89M subscribers,The Voice of Siri!,11M views,5 years ago,NaT,,
3,3,iJustine,6.89M subscribers,Ugliest iPhone Cases Ever?,9.4M views,3 years ago,NaT,,
4,4,iJustine,6.89M subscribers,Making a mini cake with Ro!,9.1M views,3 years ago,NaT,,


In [None]:
#hide
# create df of channel details
channel_details = yt[yt.channel_join_date.notna()]
channel_details = channel_details.drop(columns=['Unnamed: 0','subscribers','title','views','post_date']).reset_index(drop=True)
channel_details.head(2)

Unnamed: 0,channel_name,channel_join_date,channel_views,channel_description
0,iJustine,2006-05-07,"1,288,833,680 views","Tech, video games, failed cooking attempts, vl..."
1,Android Authority,2011-04-03,"767,831,088 views","Your source for the best phones, streaming, ap..."
2,Mrwhosetheboss,2011-04-20,"1,207,247,023 views",My name is Arun Maini. I'm a 25 year old Econo...
3,Jon Rettinger,2007-06-07,"574,893,891 views","Welcome to the video home of Jon Rettinger, fo..."
4,Jonathan Morrison,2010-03-24,"430,629,457 views","High quality videos blending tech + aesthetic,..."


In [None]:
#hide
#create df video details
video_details = yt[yt.channel_join_date.isna()]
video_details = video_details.drop(columns=['Unnamed: 0','channel_join_date','channel_views','channel_description']).reset_index(drop=True)
video_details.head(2)

Unnamed: 0,channel_name,subscribers,title,views,post_date
0,iJustine,6.89M subscribers,Black Eyed Peas - I gotta Feeling (Parody),18M views,11 years ago
1,iJustine,6.89M subscribers,Cake Decorating Challenge with Ro | Nerdy Numm...,12M views,5 years ago
2,iJustine,6.89M subscribers,The Voice of Siri!,11M views,5 years ago
3,iJustine,6.89M subscribers,Ugliest iPhone Cases Ever?,9.4M views,3 years ago
4,iJustine,6.89M subscribers,Making a mini cake with Ro!,9.1M views,3 years ago


In [None]:
#hide
# merge channel details onto every video row; channel_name is the only shared column
merged = channel_details.merge(video_details, on='channel_name')
merged.head(2)

Unnamed: 0,channel_name,channel_join_date,channel_views,channel_description,subscribers,title,views,post_date
0,iJustine,2006-05-07,"1,288,833,680 views","Tech, video games, failed cooking attempts, vl...",6.89M subscribers,Black Eyed Peas - I gotta Feeling (Parody),18M views,11 years ago
1,iJustine,2006-05-07,"1,288,833,680 views","Tech, video games, failed cooking attempts, vl...",6.89M subscribers,Cake Decorating Challenge with Ro | Nerdy Numm...,12M views,5 years ago
2,iJustine,2006-05-07,"1,288,833,680 views","Tech, video games, failed cooking attempts, vl...",6.89M subscribers,The Voice of Siri!,11M views,5 years ago
3,iJustine,2006-05-07,"1,288,833,680 views","Tech, video games, failed cooking attempts, vl...",6.89M subscribers,Ugliest iPhone Cases Ever?,9.4M views,3 years ago
4,iJustine,2006-05-07,"1,288,833,680 views","Tech, video games, failed cooking attempts, vl...",6.89M subscribers,Making a mini cake with Ro!,9.1M views,3 years ago


In [None]:
#hide
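# Convert views and subscribers to int by swapping the K/M suffix for zeros.
# regex=False keeps '.' a literal dot on every pandas version.
# Caveat: dropping the '.' afterwards inflates decimals ('6.89M' becomes 689000000).
merged.subscribers = merged.subscribers.str.replace('M subscribers','000000',regex=False).str.replace('.','',regex=False).astype('int')
# Rows whose views field reads '1 year ago' are set to 0.
merged.views = merged.views.str.replace('M views','000000',regex=False).str.replace('K views','000',regex=False).str.replace('.','',regex=False).str.replace('1 year ago','0',regex=False).astype('int')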
merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1999 entries, 0 to 1998
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   channel_name         1999 non-null   object        
 1   channel_join_date    1999 non-null   datetime64[ns]
 2   channel_views        1999 non-null   object        
 3   channel_description  1999 non-null   object        
 4   subscribers          1999 non-null   int64         
 5   title                1999 non-null   object        
 6   views                1999 non-null   int64         
 7   post_date            1997 non-null   object        
dtypes: datetime64[ns](1), int64(2), object(5)
memory usage: 140.6+ KB


In [None]:
#hide
# replace missing values in post_date col with empty strings
merged['post_date'] = merged['post_date'].fillna('')

In [None]:
#hide
# convert channel_views to int (int64, since some totals exceed the 32-bit range)
merged.channel_views = merged.channel_views.str.replace(',','').str.replace('views','').astype('int64')
merged.head(2)

Unnamed: 0,channel_name,channel_join_date,channel_views,channel_description,subscribers,title,views,post_date
0,iJustine,2006-05-07,1288833680,"Tech, video games, failed cooking attempts, vl...",689000000,Black Eyed Peas - I gotta Feeling (Parody),18000000,11 years ago
1,iJustine,2006-05-07,1288833680,"Tech, video games, failed cooking attempts, vl...",689000000,Cake Decorating Challenge with Ro | Nerdy Numm...,12000000,5 years ago
2,iJustine,2006-05-07,1288833680,"Tech, video games, failed cooking attempts, vl...",689000000,The Voice of Siri!,11000000,5 years ago
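
The `subscribers` value for iJustine above reads 689000000 even though the scraped string was `6.89M subscribers`: appending six zeros and then deleting the decimal point multiplies any value with decimals by an extra 10 or 100. A minimal sketch of an alternative, assuming the scraped strings always look like `<number>[K|M|B] views` or `<number>[K|M|B] subscribers`, scales the number by its suffix instead. The helper name `parse_count` and the `B` suffix are illustrative assumptions, not part of the original notebook.

```python
import re

# Multiplier for each count suffix; "B" is an assumption for completeness.
_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

# A number (optional thousands commas and decimals), an optional suffix,
# then an optional "views"/"subscribers" label.
_COUNT_RE = re.compile(
    r"\s*(\d[\d,]*(?:\.\d+)?)\s*([KMB]?)\s*(?:views?|subscribers?)?\s*"
)


def parse_count(text):
    """Return the count in `text` as an int, or None if it is not a count."""
    match = _COUNT_RE.fullmatch(str(text))
    if match is None:
        return None  # e.g. '1 year ago' that ended up in the views column
    number = float(match.group(1).replace(",", ""))
    return round(number * _SUFFIX[match.group(2)])


print(parse_count("6.89M subscribers"))
print(parse_count("628K views"))
print(parse_count("1 year ago"))
```

Applied to the raw string columns before conversion, this could look like `video_details['views'].map(parse_count)`. The tables in this notebook reflect the original conversion, not this sketch.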


## Data Analysis

### Dataframe Sample

In [36]:
#collapse
# Sample of dataframe
merged.sample(3)

|  | channel_name | channel_join_date | channel_views | channel_description | subscribers | title | views | post_date |
|---|---|---|---|---|---|---|---|---|
| 609 | Jon Rettinger | 2007-06-07 | 574893891 | Welcome to the video home of Jon Rettinger, fo... | 159000000 | Xbox 360 vs. PS3: Round 4 (CPU) | 2000000 | 11 years ago |
| 1409 | Linus Tech Tips | 2008-11-24 | 4932592261 | Tech can be complicated; we try to make it eas... | 137000000 | Upgrading Our WORST Gaming Rigs | 10000000 | 2 years ago |
| 994 | Jonathan Morrison | 2010-03-24 | 430629457 | High quality videos blending tech + aesthetic,... | 264000000 | DIY Nintendo Switch Gaming Desk + Setup! | 628000 | 4 years ago |


### List of Youtube Channels in Dataframe

In [56]:
#collapse
# List of Video Channels
merged.groupby('channel_name')['channel_name'].count().to_frame(name='Video Count').reset_index()

|  | channel_name | Video Count |
|---|---|---|
| 0 | Android Authority | 200 |
| 1 | Austin Evans | 200 |
| 2 | Jon Rettinger | 200 |
| 3 | Jonathan Morrison | 199 |
| 4 | Linus Tech Tips | 200 |
| 5 | Marques Brownlee | 200 |
| 6 | Mrwhosetheboss | 200 |
| 7 | Unbox Therapy | 200 |
| 8 | UrAvgConsumer | 200 |
| 9 | iJustine | 200 |


### Top 10 Videos by Views

Discoveries so far:

* Most of these videos are at least a year old, which suggests that videos keep accumulating views over time.

* Two videos list a dollar amount in the title.

* Marques Brownlee labels some videos with a series name (e.g., "Dope Tech").

* Unbox Therapy dominates the list of videos by views: 7 of the 10 videos belong to this channel alone.
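
The age observation relies on `post_date` strings such as `11 years ago` or `9 months ago`, which cannot be compared directly. A minimal sketch of one way to turn them into an approximate age in days follows. The helper name `approx_age_days` and the 30-day month and 365-day year are simplifying assumptions, not something the original notebook uses.

```python
import re

# Approximate length of each unit in days (months and years are rounded).
_DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

_AGE_RE = re.compile(r"\s*(\d+)\s+(day|week|month|year)s?\s+ago\s*")


def approx_age_days(post_date):
    """Return the approximate age in days, or None if the text is not '<n> <unit>(s) ago'."""
    match = _AGE_RE.fullmatch(str(post_date))
    if match is None:
        return None  # covers the empty strings used for missing post dates
    return int(match.group(1)) * _DAYS_PER_UNIT[match.group(2)]


for text in ["11 years ago", "9 months ago", "1 year ago", ""]:
    print(repr(text), approx_age_days(text))
```

With a numeric age available, a filter such as "at least 365 days old" becomes a simple comparison instead of a string match.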

In [None]:
#collapse
# Top 10 Videos by Views
merged.groupby(['title','channel_name','post_date'])['views'].max().sort_values(ascending=False).head(10).reset_index()

|  | title | channel_name | post_date | views |
|---|---|---|---|---|
| 0 | 2020 iPad Pro Review: It's... A Computer?! | Marques Brownlee | 1 year ago | 99000000 |
| 1 | The Secret Android Button | Unbox Therapy | 5 years ago | 98000000 |
| 2 | Dope Tech: Self-Lacing Nike Mag! | Marques Brownlee | 4 years ago | 98000000 |
| 3 | The FASTEST gaming PC money can buy | Linus Tech Tips | 2 years ago | 98000000 |
| 4 | Fortnite on an INSANE $20,000 Gaming PC | Unbox Therapy | 3 years ago | 98000000 |
| 5 | A Keyboard Made Of Glass? | Unbox Therapy | 5 years ago | 98000000 |
| 6 | iPhone 12 - The iPhone is New Again | Unbox Therapy | 9 months ago | 98000000 |
| 7 | $1000 Earphones! (Shure SE846 Unboxing & Test) | Unbox Therapy | 7 years ago | 97000000 |
| 8 | This is the iPhone SE 2 | Unbox Therapy | 1 year ago | 97000000 |
| 9 | Human Headphones Just Changed The Game | Unbox Therapy | 1 year ago | 97000000 |
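
How the top 10 splits across channels can be checked by tallying the `channel_name` column of the table above. The sketch below copies that column by hand and counts it with the standard library; the variable names are illustrative only.

```python
from collections import Counter

# channel_name column of the Top 10 table above, in row order (rows 0 to 9).
top10_channels = [
    "Marques Brownlee", "Unbox Therapy", "Marques Brownlee", "Linus Tech Tips",
    "Unbox Therapy", "Unbox Therapy", "Unbox Therapy", "Unbox Therapy",
    "Unbox Therapy", "Unbox Therapy",
]

channel_counts = Counter(top10_channels)
for channel, count in channel_counts.most_common():
    print(f"{channel}: {count} of {len(top10_channels)}")
```

On the DataFrame itself, calling `value_counts()` on the `channel_name` column of the top 10 result should give the same tally.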
