# YouTube Data API Overview

## Building our Queries

First we need to set up our imports. As is standard, we'll import numpy and pandas for data handling, but we'll also include os and sys. The os module lets us read our YouTube API key from an environment variable instead of writing it into the notebook, and sys lets us add the project's source directory to the import path, so we can load pre-written code illustrating how I used the API for the majority of the project. Finally, build from apiclient.discovery creates the client object through which we actually send requests to the API.

In [1]:
import os
import sys
import numpy as np
import pandas as pd
from apiclient.discovery import build
sys.path.insert(0, '../YTCollab/src')

import extract_data

Now we build our query object, which we should be able to use for all requests for the duration of the overview.

In [2]:
DEVELOPER_KEY = os.environ["COLLAB_CLIENT_KEY"]
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)

Using this build object, we should be able to request any information available to the public on YouTube. Let's try it out on a sample channel: my own.

## Channels

To do this we need a YouTube channel ID, which typically appears at the end of a channel's URL, though not always, as some channels use custom URLs. Channel IDs can also be requested through the API, but in this case I already know mine. Note that without authorization for a particular account or channel, information can only be read, not updated.
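
When the ID does sit at the end of the URL, it can be pulled out with a small helper. This is a minimal sketch, not part of the project code: channel_id_from_url is a name invented here, the custom URL is a made-up example, and the pattern ("UC" followed by 22 URL-safe characters) is an assumption based on the IDs seen in this overview rather than a documented guarantee.

```python
import re

# Assumed shape of a channel ID: "UC" plus 22 URL-safe characters.
CHANNEL_ID_PATTERN = re.compile(r'/channel/(UC[0-9A-Za-z_-]{22})(?:[/?#]|$)')


def channel_id_from_url(url):
    """Return the channel ID embedded in a /channel/ URL, or None otherwise."""
    match = CHANNEL_ID_PATTERN.search(url)
    return match.group(1) if match else None


# A standard channel URL carries the ID directly.
print(channel_id_from_url('https://www.youtube.com/channel/UCcx01rLXJ4dCtXqW3kPmBzg'))
# A custom URL (hypothetical example) does not, so the helper returns None.
print(channel_id_from_url('https://www.youtube.com/user/SomeCustomName'))
```

For URLs where this returns None, the ID has to come from the API instead. For legacy usernames, for example, channels().list accepts a forUsername parameter.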

Let's request the most readily available information now. Setting the part parameter to 'snippet' returns most of the information visible on a channel's main page, such as the channel name, country, description, creation date, and even avatar URLs. This is returned to us in JSON format.

In [3]:
lichwickid = 'UCcx01rLXJ4dCtXqW3kPmBzg'

LichWick = youtube.channels().list(part='snippet', id=lichwickid).execute()
LichWick

{u'etag': u'"Ys-tbHJFobljHLVY8LWdvmlIJ3Q/h12BCee15OHkG0pcpT5bI02DcFo"',
 u'items': [{u'etag': u'"Ys-tbHJFobljHLVY8LWdvmlIJ3Q/kmpD0OQXXKhAjVbjws_SUGX-jGU"',
   u'id': u'UCcx01rLXJ4dCtXqW3kPmBzg',
   u'kind': u'youtube#channel',
   u'snippet': {u'country': u'US',
    u'description': u"Hello, I'm LichWick. I like playing games and I like being weird. So I thought why not combine the two and put it on the internet for all to enjoy~ Or not. I can't tell you what to do. \n\nIf you like the flavor of my personality or the content of my videos or even if you have nothing else to do, be sure to subscribe for more!",
    u'localized': {u'description': u"Hello, I'm LichWick. I like playing games and I like being weird. So I thought why not combine the two and put it on the internet for all to enjoy~ Or not. I can't tell you what to do. \n\nIf you like the flavor of my personality or the content of my videos or even if you have nothing else to do, be sure to subscribe for more!",
     u'title': u'
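
Since the response is a nested dictionary, it is often easier to work with a flattened version. The sketch below is one possible way to do that, assuming the response shape shown above. summarize_channel and its output keys are names chosen for illustration, and every field is read with .get because optional fields such as 'country' may be absent.

```python
def summarize_channel(response):
    """Flatten the first item of a channels().list(part='snippet') response."""
    items = response.get('items', [])
    if not items:
        return None  # the response contained no matching channel
    item = items[0]
    snippet = item.get('snippet', {})
    return {
        'id': item.get('id'),
        'title': snippet.get('title'),
        'country': snippet.get('country'),  # optional, so this may be None
        'description': snippet.get('description'),
        'published_at': snippet.get('publishedAt'),
    }
```

Calling summarize_channel(LichWick) on the response above would give a single flat dictionary, which is also a convenient shape for building a pandas DataFrame row.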

## PlaylistItems

Similarly, if we use 'statistics' instead of 'snippet', we receive a channel's various public statistics, such as subscriber, view, video, and comment counts. However, for the next stage of the API exploration, we will need to use 'contentDetails' to find a channel's videos.
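
As a short aside on the 'statistics' part: the API returns its counters as strings, so they usually need converting before any arithmetic. A minimal sketch, where parse_channel_statistics is a name made up for this illustration and only three of the counters are handled:

```python
def parse_channel_statistics(statistics):
    """Convert the string counters in a channel's 'statistics' part to ints."""
    fields = ('viewCount', 'subscriberCount', 'videoCount')
    # Skip any counter missing from the response instead of raising a KeyError.
    return {name: int(statistics[name]) for name in fields if name in statistics}
```

The argument would be the 'statistics' dictionary of the first item in a channels().list(part='statistics', id=lichwickid) response. With that noted, we continue with 'contentDetails'.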

In [4]:
LichWickContents = youtube.channels().list(part='contentDetails', id=lichwickid).execute()
uploads = LichWickContents['items'][0]['contentDetails']['relatedPlaylists']['uploads']
print(uploads)

UUcx01rLXJ4dCtXqW3kPmBzg


This is the playlist ID for our channel's upload list. If we query playlistItems instead of channels, we receive the videos in a specific playlist rather than the data for a specific channel. In this case, the upload playlist is a full list of every video the owning channel has uploaded, generally ordered by publishing date, descending. In other words, the more recently a video was published, the earlier it occurs in the playlist.

This rule doesn't apply to all playlists of course, but it is the default setting of the upload playlist for each channel so we can more or less make this assumption for any channel we extract from. Moving on, let's see what's in our channel's upload list.

In [5]:
LichWickUploads = youtube.playlistItems().list(part='snippet,contentDetails', playlistId=uploads, maxResults=1).execute()
LichWickUploads

{u'etag': u'"Ys-tbHJFobljHLVY8LWdvmlIJ3Q/OZH0iOh6gjmjetlTzomD-4SNJP8"',
 u'items': [{u'contentDetails': {u'videoId': u'rQJ-7cvRs5w'},
   u'etag': u'"Ys-tbHJFobljHLVY8LWdvmlIJ3Q/FoM8-O1J-B4KKUJvL2TYP1Vy5q0"',
   u'id': u'VVVjeDAxckxYSjRkQ3RYcVcza1BtQnpnLnJRSi03Y3ZSczV3',
   u'kind': u'youtube#playlistItem',
   u'snippet': {u'channelId': u'UCcx01rLXJ4dCtXqW3kPmBzg',
    u'channelTitle': u'LichWick',
    u'description': u'Today, Grandma goes back to tie up a few loose ends before venturing out into the bleak dark unknown...\n\nDark Souls 3 is an Action RPG by From Software following the Unkindled, an undead hero from an unknown land. As they journey across the ashen land of Lothric, they will meet an assortment of depressing characters and slay a veritable bounty of wretched fiends and horrifying monstrosities on a quest to unite the Lords of Cinder and bring them back to their moulding thrones. Can the Unkindled complete their task before the Age of Fire comes to a withering end, or will

To save space, we've only requested the first item in the list. Once again, 'snippet' provides a lot of general information, but if we're actually going to take a look at the video's data, we need to know its ID, given by 'contentDetails'. So let's store that and move on to the next stage.

In [6]:
video = LichWickUploads['items'][0]['contentDetails']['videoId']
print(video)

rQJ-7cvRs5w
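
The request above asked for a single item. To walk a whole upload list, the playlistItems response includes a 'nextPageToken' whenever more pages remain, and that token can be passed back as pageToken. The generator below is a sketch of that loop, not the project's own extraction code: iter_playlist_video_ids is a hypothetical name, and it assumes a client object like the youtube one built earlier.

```python
def iter_playlist_video_ids(youtube, playlist_id, page_size=50):
    """Yield every video ID in a playlist, following nextPageToken page by page."""
    request_args = {
        'part': 'contentDetails',
        'playlistId': playlist_id,
        'maxResults': page_size,  # the API allows at most 50 items per page
    }
    while True:
        response = youtube.playlistItems().list(**request_args).execute()
        for item in response.get('items', []):
            yield item['contentDetails']['videoId']
        next_token = response.get('nextPageToken')
        if not next_token:
            break  # no further pages
        request_args['pageToken'] = next_token
```

With the objects from this overview, list(iter_playlist_video_ids(youtube, uploads)) would collect the IDs of the whole upload list. Each page costs one request against the API quota.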


## Videos

In this project, the primary use of the videos query is to extract topic data, so I shall also give a demonstration of my method. For now though, let's get our query going using the ID we've just extracted from our upload list.

In [7]:
LichWickVideo = youtube.videos().list(part='snippet,topicDetails', id=video).execute()
LichWickVideo

{u'etag': u'"Ys-tbHJFobljHLVY8LWdvmlIJ3Q/gskN6kkAmeo3KG7sbKGqc3nnqx4"',
 u'items': [{u'etag': u'"Ys-tbHJFobljHLVY8LWdvmlIJ3Q/XZ6h1oOmjUH4ip4l8GgNKHvLIPI"',
   u'id': u'rQJ-7cvRs5w',
   u'kind': u'youtube#video',
   u'snippet': {u'categoryId': u'20',
    u'channelId': u'UCcx01rLXJ4dCtXqW3kPmBzg',
    u'channelTitle': u'LichWick',
    u'defaultAudioLanguage': u'en',
    u'description': u'Today, Grandma goes back to tie up a few loose ends before venturing out into the bleak dark unknown...\n\nDark Souls 3 is an Action RPG by From Software following the Unkindled, an undead hero from an unknown land. As they journey across the ashen land of Lothric, they will meet an assortment of depressing characters and slay a veritable bounty of wretched fiends and horrifying monstrosities on a quest to unite the Lords of Cinder and bring them back to their moulding thrones. Can the Unkindled complete their task before the Age of Fire comes to a withering end, or will the world be once again swallowed

Our topics sit near the bottom of the JSON, under 'topicDetails': they are the strings in the '/m/0xxxxxx' format. The JSON already structures them as lists, so we'll combine them into a new list that we can easily work with.

We extract the 'topicIds' field first, as it tends to contain the topics most relevant to the video, if any. Not all videos have a 'topicIds' field, so our program has to account for this. Every video has a list of 'relevantTopicIds', however, so we'll never walk away empty-handed.

In [8]:
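topic_details = LichWickVideo['items'][0].get('topicDetails', {})
# Copy 'topicIds' (which may be missing) so the API response itself is not modified
topics = list(topic_details.get('topicIds', []))
topics.extend(topic_details.get('relevantTopicIds', []))
print(topics)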

[u'/m/013f7bmb', u'/m/03bt1gh', u'/m/06zm8z', u'/m/0403l3g', u'/m/0dgs3gt', u'/m/025fwn', u'/m/01mw1', u'/m/0p8xwrr', u'/m/025zzc', u'/m/0bzvm2', u'/m/0403l3g', u'/m/025zzc', u'/m/06zm8z']


If we build a single list from the topics lists of several videos, we can run a Counter over it to determine how frequently each topic occurs, which is a big help in determining the cosine similarity between two channels.
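
As a rough illustration of that idea, the sketch below counts topics across several videos and compares two channels' counts. It is not the method from the cosine similarity report, which may differ in detail. topic_counts and cosine_similarity are names chosen here, and the formula is the standard one (dot product divided by the product of the vector lengths).

```python
from collections import Counter
from math import sqrt


def topic_counts(topic_lists):
    """Count how often each topic ID occurs across several per-video topic lists."""
    counts = Counter()
    for topics in topic_lists:
        counts.update(topics)
    return counts


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two topic-frequency Counters (0.0 if either is empty)."""
    # Only topics present in both channels contribute to the dot product.
    dot = sum(counts_a[topic] * counts_b[topic] for topic in counts_a if topic in counts_b)
    norm_a = sqrt(sum(value * value for value in counts_a.values()))
    norm_b = sqrt(sum(value * value for value in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two channels whose videos share no topics score 0.0, while two channels with identical topic proportions score 1.0 (up to floating-point rounding).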

For more information on cosine similarity, be sure to check out the cosine similarity report.