# Data Extraction

I am interested in five YouTube fitness channels: MadFit, blogilates, emi wong, Rebecca-Louise and Chloe Ting.

I used the YouTube Data API v3 to scrape the channel and video information.
To do so, I carried out the following steps in order:
- created a project on Google Developers Console
- requested an authorization credential (API key)
- enabled the YouTube Data API v3 for my project

The functions I used to scrape the YouTube data come from thu-vu92's GitHub project 'youtube-api-analysis', and I give her credit for that code. This project was inspired by the YouTube API tutorial video created by Thu Vu data analytics, titled 'Youtube API for Python: How to Create a Unique Data Portfolio Project'.


In [1]:
import os
import pandas as pd
import youtube_scraping_functions as ytfuns  # Module to scrape channels and videos

from googleapiclient.discovery import build  # Google API
from IPython.display import JSON             # Display JSON
from functools import partial                # Use with map() to fix an argument


In [2]:
# API Key generated from Google Cloud
api_key = 'XXXX'

# Channel IDs of the five fitness channels, taken from each channel's YouTube URL
channel_ids = ['UCpQ34afVgk8cRQBjSJ1xuJQ', # MadFit
               'UCvGEK5_U-kLgO6-AMDPeTUQ', # EmiWong
               'UCIJwWYOfsCfz6PjxbONYXSg', # Blogilates
               'UCCgLoMYIyP0U56dEhEL1wXQ', # ChloeTing
               'UCi0AqmA_3DGPFCu5qY0LLSg', # Rebecca-Louise
              ]

# Get credentials and create an API client
youtube = build("youtube", "v3", developerKey=api_key)
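
Rather than pasting the key into the notebook, it can be read from the environment. The helper below is a minimal sketch of that idea; the function name `load_api_key` and the variable name `YOUTUBE_API_KEY` are my own choices for illustration, not something the API or the original project requires.

```python
import os


def load_api_key(env=os.environ, var_name="YOUTUBE_API_KEY"):
    """Return the API key stored under `var_name`, or raise if it is missing.

    `env` is any mapping; it defaults to the process environment.
    Both names are hypothetical and can be changed freely.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

With this in place, the client could be created with `build("youtube", "v3", developerKey=load_api_key())`, which keeps the key out of the saved notebook.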

### Get channel statistics 

In [3]:
# get channel data and convert to dataframe
channels_df = pd.DataFrame(ytfuns.get_channel_stats(youtube, channel_ids))
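
The body of `ytfuns.get_channel_stats` is not shown in this notebook. The sketch below is my own guess at what such a function might do, not the code from the module: it calls the `channels().list` endpoint once for all channel IDs and maps the response fields onto the column names that appear in `channels_df`. The name `sketch_channel_stats` is hypothetical.

```python
def sketch_channel_stats(youtube, channel_ids):
    """Return one dict per channel, shaped like the rows of channels_df.

    `youtube` is a client created with googleapiclient's build().
    This is an illustrative sketch, not the module's implementation.
    """
    response = youtube.channels().list(
        part="snippet,contentDetails,statistics",
        id=",".join(channel_ids),
    ).execute()

    rows = []
    for item in response.get("items", []):
        snippet = item["snippet"]
        stats = item.get("statistics", {})
        rows.append({
            "ChannelName": snippet["title"],
            "ChannelDescription": snippet.get("description", ""),
            "PublishedDate": snippet["publishedAt"],
            # Counts arrive as strings; subscriberCount may be absent if hidden
            "TotalSubscribers": stats.get("subscriberCount"),
            "TotalViews": stats.get("viewCount"),
            "TotalVideos": stats.get("videoCount"),
            # The "uploads" playlist lists every public video of the channel
            "playlistID": item["contentDetails"]["relatedPlaylists"]["uploads"],
        })
    return rows
```

The rows do not necessarily come back in the order of `channel_ids` (in the output here Chloe Ting is listed first), which is why the playlist ID is looked up by channel name in the next step.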

### Get video statistics 

In [4]:
# create empty dataframe to store video data
video_df = pd.DataFrame()

# Create a dataframe of video statistics of all videos from all the channels
for c in channels_df['ChannelName'].unique():
    print("Getting video information from channel: " + c)
    playlist_id = channels_df.loc[channels_df['ChannelName'] == c, 'playlistID'].iloc[0]
    
    # get list of video ids of all videos in the channel
    video_ids = ytfuns.get_video_ids(youtube, playlist_id)
    
    # get video statistics of each video in the channel
    video_data = pd.DataFrame(ytfuns.get_video_stats(youtube, video_ids))

    # concat video data of the channel to the dataframe
    video_df = pd.concat([video_df, video_data], ignore_index=True)


Getting video information from channel: Chloe Ting
Getting video information from channel: blogilates
Getting video information from channel: MadFit
Getting video information from channel: Rebecca-Louise
Getting video information from channel: emi wong
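
The channels here have several hundred to over a thousand videos each, while a single `playlistItems().list` call returns at most 50 items, so the video IDs have to be collected page by page. The sketch below shows one way to do that, plus a helper that splits the IDs into batches of 50 for `videos().list`. It is my own illustration of the technique, not the code in `youtube_scraping_functions`; `iter_upload_ids` and `chunked` are hypothetical names.

```python
def iter_upload_ids(youtube, playlist_id, page_size=50):
    """Yield every video ID in a playlist, following nextPageToken."""
    token = None
    while True:
        params = {
            "part": "contentDetails",
            "playlistId": playlist_id,
            "maxResults": page_size,
        }
        if token:
            params["pageToken"] = token  # only sent from the second page on
        response = youtube.playlistItems().list(**params).execute()

        for item in response.get("items", []):
            yield item["contentDetails"]["videoId"]

        token = response.get("nextPageToken")
        if not token:  # no token means this was the last page
            break


def chunked(ids, size=50):
    """Split `ids` into consecutive lists of at most `size` elements."""
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

Each batch could then be passed to `youtube.videos().list(part="snippet,contentDetails,statistics", id=",".join(batch))` to fetch the statistics shown in `video_df`.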


#### Check that the data has been scraped properly

In [5]:
channels_df.head()

| | ChannelName | ChannelDescription | PublishedDate | TotalSubscribers | TotalViews | TotalVideos | playlistID |
|---|---|---|---|---|---|---|---|
| 0 | Chloe Ting | Subscribe to my channel and find weekly workou... | 2011-08-17T04:29:09Z | 24700000 | 2980737335 | 407 | UUCgLoMYIyP0U56dEhEL1wXQ |
| 1 | blogilates | Hey guys! My name is Cassey Ho, I am a certifi... | 2009-06-13T09:05:48Z | 8690000 | 2820126375 | 1183 | UUIJwWYOfsCfz6PjxbONYXSg |
| 2 | MadFit | This is a place where I post REAL TIME, AT HOM... | 2018-03-02T01:46:06Z | 8000000 | 943060836 | 723 | UUpQ34afVgk8cRQBjSJ1xuJQ |
| 3 | Rebecca-Louise | Hey, \n\nWelcome to #TEAMBURN 🙌🏻 \n\nI am so e... | 2012-09-22T18:04:00Z | 720000 | 117668198 | 1257 | UUi0AqmA_3DGPFCu5qY0LLSg |
| 4 | emi wong | welcome to my channel!\nhope my videos can hel... | 2014-11-02T14:43:34Z | 6100000 | 819791658 | 499 | UUvGEK5_U-kLgO6-AMDPeTUQ |


In [6]:
video_df.head()

| | video_id | channelTitle | title | description | tags | publishedAt | viewCount | likeCount | favouriteCount | commentCount | duration | definition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | e7zzES8PeG4 | Chloe Ting | Shocking Before After Transformation Results! ... | Check out these amazing before and after trans... | [Abs, Abs results, Abs workout results, Before... | 2023-06-28T14:00:23Z | 36033 | 2677 | | 163 | PT9M22S | hd |
| 1 | AZ1ihabY6bI | Chloe Ting | when you're having a bad day | Cute samoyed doggies in Seoul!! | [samoyed, seoul, day in my life, doggies, dogs... | 2023-06-19T14:55:36Z | 87568 | 4739 | | 77 | PT18S | hd |
| 2 | 5GLA8MrlDnM | Chloe Ting | A day in my life living in Korea | Short vlog from a day out and about while in S... | [dayinmylife, korea, seoul, vlog, chloeting, c... | 2023-06-05T14:51:22Z | 317017 | 9618 | | 708 | PT12M37S | hd |
| 3 | IOJ7Fxa8e2Y | Chloe Ting | GROW YOUR BOOTY with these exercises | See the full video here: https://youtu.be/4zuY... | [glute workout, booty workout, gym workout, fi... | 2023-05-24T15:32:44Z | 171898 | 5484 | | 57 | PT23S | hd |
| 4 | ljNgkSctkXg | Chloe Ting | INTENSE Full Body Workout - 30 Min No Equipment | This is a 30 min full body intense workout fro... | [workout, home workout, full body workout, ful... | 2023-05-17T14:00:27Z | 620735 | 17223 | | 863 | PT31M14S | hd |
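
Looking at the first rows only shows that the columns are populated. A further check, sketched below under my own assumptions, is to compare the number of videos scraped per channel with the `TotalVideos` figure reported for that channel. The function name `count_gaps` is hypothetical, and it works on plain Python objects so it can be tried without the API.

```python
from collections import Counter


def count_gaps(expected_totals, scraped_titles):
    """Return {channel: expected - scraped} for channels whose counts differ.

    expected_totals: mapping of channel name to its reported video count
                     (strings are accepted, since the API returns strings).
    scraped_titles:  iterable with one channel name per scraped video.
    """
    scraped = Counter(scraped_titles)
    gaps = {}
    for channel, total in expected_totals.items():
        diff = int(total) - scraped.get(channel, 0)
        if diff != 0:
            gaps[channel] = diff
    return gaps
```

With the dataframes above, the inputs could be built as `dict(zip(channels_df["ChannelName"], channels_df["TotalVideos"]))` and `video_df["channelTitle"]`. A small gap does not necessarily mean the scrape failed, since the channel total and the uploads playlist may not count exactly the same set of videos; a large gap is a stronger hint that some pages were missed.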


### Save raw data 

In [7]:
# get the parent of the current working directory; this should be the top level
# project folder, assuming the notebook is run from a subfolder of the project
project_dir = os.path.dirname(os.getcwd())

# file path of where the scraped data will be stored
path = project_dir + "/data/raw"

In [8]:
# Save dataframes as csv files 
channels_df.to_csv(path + "/fitness_channels_2023_06_28.csv", index=False)
video_df.to_csv(path + "/fitness_videos_2023_06_28.csv", index=False)
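
The file names above hard-code the scrape date, and `to_csv` fails if the target folder does not exist yet. The helper below is a small sketch that addresses both points; `dated_csv_path` is a hypothetical name, and the `stem_YYYY_MM_DD.csv` pattern simply mirrors the file names used in this notebook.

```python
import os
from datetime import date


def dated_csv_path(base_dir, stem, day=None):
    """Return '<base_dir>/<stem>_YYYY_MM_DD.csv', creating base_dir if needed.

    `day` defaults to today's date; pass a datetime.date to fix it.
    """
    day = day or date.today()
    os.makedirs(base_dir, exist_ok=True)  # no error if the folder already exists
    return os.path.join(base_dir, f"{stem}_{day:%Y_%m_%d}.csv")
```

For example, `channels_df.to_csv(dated_csv_path(path, "fitness_channels"), index=False)` would write a file stamped with the day the notebook is run.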