# COMP30760 ASSIGNMENT 1 - TASK 1 - 18439746 - Matthew O'Donnell
In this assignment we will collect detailed channel, video and playlist data from the YouTube Data API v3 for five different YouTube channels (BT Sport, Fox Sports, Google Analytics, Marvel HQ, Netflix Futures).
This notebook covers Task 1: Data Collection.

In [167]:
import json, requests, urllib.parse
from pathlib import Path
from datetime import datetime
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

Settings for the YouTube API and our data collection.

In [168]:
# API Key
api_key = "AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4"
# Prefix for our API URLs
api_prefix = "https://www.googleapis.com/youtube/v3"
# The names of the Youtube channels we would like to analyse
channel_names = ["BT Sport", "Fox Sports", "Google Analytics", "Marvel HQ", "Netflix Futures"]
# The channel IDs for each of these YouTube channels
channel_ids = {"BT Sport":"UC4i_9WvfPRTuRWEaWyfKuFw", "Fox Sports":"UCwNqHDsnBCKT-olwJwIFyfg",
                    "Google Analytics":"UCJ5UyIAa5nEGksjcdp43Ixw", "Marvel HQ":"UCxwitsUVNzwS5XBSC5UQV8Q",
                    "Netflix Futures":"UCpInjhuJ1WUekcFeXPPyGGg"}
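
Since the API key is a credential, one option is to keep it out of the notebook and read it from the environment at run time. The following is a minimal sketch of that idea; the helper `load_api_key` and the variable name `YOUTUBE_API_KEY` are assumptions made for illustration and are not part of the original notebook.

```python
import os

def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The default variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError("Set the %s environment variable first" % env_var)
    return key
```

With this in place, the settings cell could use `api_key = load_api_key()` after the variable has been exported in the shell that starts the notebook.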

Create a directory for raw data storage if one does not already exist.

In [169]:
dir_raw = Path("raw")
dir_raw.mkdir(parents=True, exist_ok=True)

We define a fetch function for retrieving data from the YouTube API.

In [170]:
def fetch(endpoint, params=None):
    # copy the parameters so the caller's dictionary is never modified
    params = dict(params or {})
    # construct the url
    url = api_prefix
    if not endpoint.startswith("/"):
        url += "/"
    url += endpoint
    params["key"] = api_key
    url += "?" + urllib.parse.urlencode(params)
    print("Fetching %s" % url)
    # fetch the page, raising an exception on HTTP errors
    response = requests.get(url)
    response.raise_for_status()
    # parse the JSON body into a dictionary
    return response.json()
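
To make the URL construction step concrete, here is a small offline sketch of how `urllib.parse.urlencode` turns a parameter dictionary into a query string. The helper name `build_url` and the placeholder key `YOUR_API_KEY` are assumptions for illustration; no request is sent.

```python
import urllib.parse

def build_url(prefix, endpoint, params):
    """Join prefix and endpoint, then append the URL-encoded query string."""
    if not endpoint.startswith("/"):
        endpoint = "/" + endpoint
    return prefix + endpoint + "?" + urllib.parse.urlencode(params)

url = build_url(
    "https://www.googleapis.com/youtube/v3",
    "channels",
    {"id": "UC4i_9WvfPRTuRWEaWyfKuFw", "part": "snippet,statistics", "key": "YOUR_API_KEY"},
)
print(url)
```

Note that the comma in `snippet,statistics` is percent-encoded as `%2C`, which is why the logged URLs in this notebook show `part=snippet%2Cstatistics`.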

YouTube associates an etag and a unique ID with every channel. We will test our fetch function by requesting each channel by its ID and recording the etag returned.

In [171]:
channel_metadata = {}
channel_etags = {}
for channel_name in channel_names:
    channel_data = fetch("/channels", {"id":channel_ids[channel_name], "part": 'snippet,statistics'})
    for result in channel_data['items']:
        if result["id"] == channel_ids[channel_name]:
            print("Found match for %s: Etag=%s Type=%s" % (channel_name, result["etag"], result["kind"]))
            channel_metadata[channel_name] = result 
            channel_etags[channel_name] = result["etag"]
            break
print("Found etags for %d channels" % len(channel_etags))

Fetching https://www.googleapis.com/youtube/v3/channels?id=UC4i_9WvfPRTuRWEaWyfKuFw&part=snippet%2Cstatistics&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Found match for BT Sport: Etag=qgcjfqFiV9g8eZ2uBOctSFxcdyg Type=youtube#channel
Fetching https://www.googleapis.com/youtube/v3/channels?id=UCwNqHDsnBCKT-olwJwIFyfg&part=snippet%2Cstatistics&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Found match for Fox Sports: Etag=Ot7M9B4LHKLQ88vfVHo-DZ-vmOk Type=youtube#channel
Fetching https://www.googleapis.com/youtube/v3/channels?id=UCJ5UyIAa5nEGksjcdp43Ixw&part=snippet%2Cstatistics&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Found match for Google Analytics: Etag=g249XY3j-SR-wckOPWgdpOaLErg Type=youtube#channel
Fetching https://www.googleapis.com/youtube/v3/channels?id=UCxwitsUVNzwS5XBSC5UQV8Q&part=snippet%2Cstatistics&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Found match for Marvel HQ: Etag=i2pTCA1kh_5Xc6bgMN8U2wevFvY Type=youtube#channel
Fetching https://www.googleapis.com/youtube/v3/channe

Let's check the fetched data by making sure our channels' details are correct.

In [172]:
metadata_rows = []
for channel_name in channel_names:
    row = {"channel": channel_name, "etag": channel_etags[channel_name], "id": channel_ids[channel_name]}
    row["launch_date"] = channel_metadata[channel_name]["snippet"]["publishedAt"]
    row["views"] = channel_metadata[channel_name]["statistics"]["viewCount"]
    row["subscriber_count"] =  channel_metadata[channel_name]["statistics"]["subscriberCount"]
    row["hidden_subs"] =  channel_metadata[channel_name]["statistics"]["hiddenSubscriberCount"]
    row["video_count"] =  channel_metadata[channel_name]["statistics"]["videoCount"]
    metadata_rows.append(row)
pd.DataFrame(metadata_rows).set_index("channel")

channel,etag,id,launch_date,views,subscriber_count,hidden_subs,video_count
BT Sport,qgcjfqFiV9g8eZ2uBOctSFxcdyg,UC4i_9WvfPRTuRWEaWyfKuFw,2013-01-02T14:09:35Z,2316604852,4010000,False,8612
Fox Sports,Ot7M9B4LHKLQ88vfVHo-DZ-vmOk,UCwNqHDsnBCKT-olwJwIFyfg,2006-06-05T15:03:01Z,539099766,955000,False,4440
Google Analytics,g249XY3j-SR-wckOPWgdpOaLErg,UCJ5UyIAa5nEGksjcdp43Ixw,2008-02-25T18:07:41Z,50446777,375000,False,461
Marvel HQ,i2pTCA1kh_5Xc6bgMN8U2wevFvY,UCxwitsUVNzwS5XBSC5UQV8Q,2017-09-12T21:11:41Z,780763404,1720000,False,640
Netflix Futures,q17JPiBVr0HPLSLD0cwAayWwLIA,UCpInjhuJ1WUekcFeXPPyGGg,2017-10-16T17:09:57Z,1215418772,3070000,False,1367
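
The counts in this table are taken straight from the JSON response, where they appear to arrive as strings rather than numbers. Before any arithmetic they would need converting. The sketch below shows one way to do that with pandas, using two rows copied from the table above; the column list `count_columns` is a name introduced here for illustration.

```python
import pandas as pd

# Two rows copied from the table above, with counts kept as strings
rows = [
    {"channel": "BT Sport", "launch_date": "2013-01-02T14:09:35Z",
     "views": "2316604852", "subscriber_count": "4010000", "video_count": "8612"},
    {"channel": "Fox Sports", "launch_date": "2006-06-05T15:03:01Z",
     "views": "539099766", "subscriber_count": "955000", "video_count": "4440"},
]
df = pd.DataFrame(rows).set_index("channel")

# Convert the count columns to integers and the launch date to a timestamp
count_columns = ["views", "subscriber_count", "video_count"]
df[count_columns] = df[count_columns].apply(pd.to_numeric)
df["launch_date"] = pd.to_datetime(df["launch_date"])

print(df.dtypes)
# Numeric columns now support arithmetic, e.g. average views per video
print(df["views"] / df["video_count"])
```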


We will now create a function to collect data for each of our channels: the channel's general details, up to 50 of its most recent search results (with full details for each video among them), and up to 20 of its playlists.
Note that we write this data out to four separate files per channel, which we will later parse and analyse.

In [173]:
def fetch_current_conditions(channel_name):
    #Create each of the four .json files which we will be writing data to
    fname1 = "%s*channels.json" % (channel_name)
    out_path1 = dir_raw / fname1
    print("Writing data to %s" % out_path1)
    fout1 = open(out_path1, "w")
    
    fname2 = "%s*videos.json" % (channel_name)
    out_path2 = dir_raw / fname2
    print("Writing data to %s" % out_path2)
    fout2 = open(out_path2, "w")
    
    fname3 = "%s*videoDetails.json" % (channel_name)
    out_path3 = dir_raw / fname3
    print("Writing data to %s" % out_path3)
    fout3 = open(out_path3, "w")
    
    fname4 = "%s*playlists.json" % (channel_name)
    out_path4 = dir_raw / fname4
    print("Writing data to %s" % out_path4)
    fout4 = open(out_path4, "w")
    
    # Create our first endpoint URL to get details of each of our channels
    endpoint1 = "/channels"
    # Fetch the data 
    params1 = {"id":channel_ids[channel_name], "part": 'snippet,statistics,contentDetails'}
    conditions_data1 = fetch(endpoint1, params1)
    json.dump(conditions_data1, fout1, indent=4, sort_keys=True)
    
    # Create another endpoint URL to get details of a certain number of videos in each of our channels
    endpoint2 = "/search"
    # Fetch the data
    params2 = {"channelId":channel_ids[channel_name], "part": 'snippet,id', "order":'date', "maxResults":50}
    conditions_data2 = fetch(endpoint2, params2)
    json.dump(conditions_data2, fout2, indent=4, sort_keys=True)
    
    # Now we use the video IDs found by the search above to get additional
    # details of each video, such as likes and comments.
    # my_dict maps each video ID to the /videos response for that video
    my_dict = {}
    endpoint4 = "/videos"
    for result in conditions_data2['items']:
        # search results can also include channels and playlists, so keep videos only
        if result['id']['kind'] == "youtube#video":
            video_id = result["id"]["videoId"]
            params4 = {"id": video_id, "part": 'statistics,snippet'}
            my_dict[video_id] = fetch(endpoint4, params4)
    json.dump(my_dict, fout3, indent=4, sort_keys=True)
    
    
    # Create another endpoint URL to get details of up to 20 playlists in each of our channels
    endpoint3 = "/playlists"
    # Fetch the data
    params3 = {"channelId":channel_ids[channel_name], "part": 'snippet,status,contentDetails', "order": 'date', "maxResults":20}
    conditions_data3 = fetch(endpoint3, params3)
    
    json.dump(conditions_data3, fout4, indent=4, sort_keys=True)
    fout1.close()
    fout2.close()
    fout3.close()
    fout4.close()
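
The function above sends one `/videos` request per video. As I understand the API documentation, the `id` parameter of `/videos` also accepts a comma-separated list of IDs (up to 50 per request), so the same details could be collected with far fewer calls. The following is a minimal sketch of the batching step only; `batch_video_ids` is a hypothetical helper, and the IDs are taken from the request log in this notebook.

```python
def batch_video_ids(video_ids, batch_size=50):
    """Yield comma-separated ID strings, each covering up to batch_size videos."""
    for start in range(0, len(video_ids), batch_size):
        yield ",".join(video_ids[start:start + batch_size])

ids = ["gygi4biFcXk", "SER8xp5jEGE", "d3P0FhE7o5Y"]
for id_param in batch_video_ids(ids, batch_size=2):
    # Each string could be passed as the "id" parameter of one /videos request
    print({"id": id_param, "part": "statistics,snippet"})
```

A batched response would hold several videos in a single `items` list, so storing them by video ID would mean looping over `items` rather than saving the whole response under one key.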

We then run the following code to get general details, videos and playlists for each of our five channels.

In [174]:
for channel_name in channel_names:
    fetch_current_conditions(channel_name)

Writing data to raw/BT Sport*channels.json
Writing data to raw/BT Sport*videos.json
Writing data to raw/BT Sport*videoDetails.json
Writing data to raw/BT Sport*playlists.json
Fetching https://www.googleapis.com/youtube/v3/channels?id=UC4i_9WvfPRTuRWEaWyfKuFw&part=snippet%2Cstatistics%2CcontentDetails&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/search?channelId=UC4i_9WvfPRTuRWEaWyfKuFw&part=snippet%2Cid&order=date&maxResults=50&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=gygi4biFcXk&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=SER8xp5jEGE&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=d3P0FhE7o5Y&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=kj5N

Fetching https://www.googleapis.com/youtube/v3/videos?id=IqcR1Vl-UgY&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=YOx-hjKlzLo&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=8hn_2A-AKVg&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=wVwANkN03pg&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=LleRg6HmHWk&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=OcDgoNBSYWk&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=S2bmzvJ-0cw&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.google

Fetching https://www.googleapis.com/youtube/v3/videos?id=trTuipKPxBk&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=ekt-HY5tt1M&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=ObfMlYMk5QI&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=l2tNKF7Wei8&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=OsHSkSiUwLo&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=xQ6lx5ol8lw&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=cQI2ilpH_aI&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.google

Fetching https://www.googleapis.com/youtube/v3/videos?id=hEedlP_6tQg&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=exafd6P7clY&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=Zu0K1YKgUJQ&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=bBU_QmCVP2Y&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=ScGhcglTFDo&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=eYSYt551Z9k&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=pE_XtyrsqR4&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.google

Fetching https://www.googleapis.com/youtube/v3/videos?id=tPhPQoX4YJo&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=R7nKawZcWWo&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=onH5GoXcR84&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=kAMrJJCB2tU&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=ixX92SCzwqo&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=p1XQRmN8vkA&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.googleapis.com/youtube/v3/videos?id=3RCIUXFu5O8&part=statistics%2Csnippet&key=AIzaSyAL-IoAeDByzMkpWOFinuRWq0Izc0B8zS4
Fetching https://www.google
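
Each `/search` request above returns a single page of at most 50 results. If more were needed, list responses from this API include a `nextPageToken` field that can be sent back as the `pageToken` parameter to request the next page. Below is a hedged sketch of that loop; `fetch_all_pages` is a hypothetical helper, and `fake_fetch` with its made-up responses stands in for the real fetch function so that the example runs offline.

```python
def fetch_all_pages(fetch_page, endpoint, params, max_pages=5):
    """Collect 'items' across pages by following nextPageToken.

    fetch_page is any callable taking (endpoint, params), such as the
    notebook's fetch function. max_pages caps the number of requests.
    """
    items = []
    page_params = dict(params)
    for _ in range(max_pages):
        data = fetch_page(endpoint, page_params)
        items.extend(data.get("items", []))
        token = data.get("nextPageToken")
        if not token:
            break  # no further pages
        page_params["pageToken"] = token
    return items

# Made-up responses keyed by page token, used instead of the real API
fake_pages = {
    None: {"items": [1, 2], "nextPageToken": "p2"},
    "p2": {"items": [3]},
}

def fake_fetch(endpoint, params):
    return fake_pages[params.get("pageToken")]

print(fetch_all_pages(fake_fetch, "/search", {"maxResults": 2}))
```

The `max_pages` cap is a deliberate design choice here, since every extra page is another request counted against the API quota.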