# 1. Alex Jones Videos Data Collection 

On his YouTube channel, Alex Jones often posts videos attacking the credibility of mainstream media. We want to run a small analysis of this phenomenon to see whether we can characterize these attacks.

First, we need to collect the metadata for every video posted to the channel between January 1st, 2015 and May 4th, 2018, the period specified for this analysis.

We'll use the pandas library because it makes it quick and convenient to store, save, and manipulate tabular data like this.

In [None]:
import requests
import urllib
import pandas as pd
pd.set_option('display.max_columns', None)  

First, we'll write a helper function that queries the YouTube Data API. We'll reuse it every time we need to request information.
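To see what a request URL looks like without spending API quota, here is a minimal offline sketch. The helper name build_url is hypothetical (it is not part of this notebook); it uses urllib.parse.urlencode from the Python 3 standard library and "mykey" is a placeholder, not a real key.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.googleapis.com/youtube/v3/'

def build_url(resource, parameters, key):
    '''Hypothetical helper: build a YouTube Data API v3 request URL (no network access).'''
    # Copy the list so the caller's parameters are not modified
    params = list(parameters) + [('key', key)]
    # urlencode turns [('part', 'snippet'), ...] into 'part=snippet&...'
    return BASE_URL + resource + '?' + urlencode(params)

url = build_url('channels',
                [('part', 'contentDetails,id'), ('forUsername', 'TheAlexJonesChannel')],
                'mykey')
print(url)
# https://www.googleapis.com/youtube/v3/channels?part=contentDetails%2Cid&forUsername=TheAlexJonesChannel&key=mykey
```

Note that the comma in "contentDetails,id" is percent-encoded as %2C; the API decodes it back to a comma.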

In [None]:
def get_response(query_type, parameters):
    '''
    Query the YouTube Data API and return the JSON response as a dict.

    inputs: query_type - the resource type that we need information for, followed by '?'
                         (e.g. 'playlists?', 'channels?', 'videos?')
            parameters - list of tuples corresponding to the parameter names and their values in our query
    '''
    base_url = 'https://www.googleapis.com/youtube/v3/'
    key = "mykey"  # replace with your own API key

    # We always need to pass our key. Build a new list rather than calling
    # parameters.append(), which returns None and would also modify the caller's list.
    parameters = parameters + [('key', key)]

    # requests encodes the list of tuples into a query string like:
    # 'part=contentDetails&forUsername=theAlexJonesChannel&key=mykey'
    http_endpoint = base_url + query_type.rstrip('?')
    response = requests.get(http_endpoint, params=parameters)

    # Fail loudly on HTTP errors (e.g. an invalid key or an exhausted quota)
    response.raise_for_status()
    return response.json()

To get a user's uploaded videos, we first look up the channel's "uploads" playlist and then go through that playlist to collect the relevant information for each video.
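The lookup step can be sketched on its own as a function that pulls the uploads playlist ID and the channel ID out of a channels response. The function name uploads_playlist_id and the sample response below are hypothetical: the IDs are made up and the response is trimmed to the fields we read.

```python
def uploads_playlist_id(channels_response):
    '''Hypothetical helper: return (uploads_playlist_id, channel_id) from a channels response.'''
    items = channels_response.get('items', [])
    if not items:
        # forUsername matched no channel
        raise ValueError('No channel found for this username')
    channel = items[0]
    return channel['contentDetails']['relatedPlaylists']['uploads'], channel['id']

# Hypothetical, trimmed-down response with made-up IDs
sample = {
    'items': [{
        'id': 'CHANNEL123',
        'contentDetails': {'relatedPlaylists': {'uploads': 'PLAYLIST123'}},
    }]
}
playlist_id, channel_id = uploads_playlist_id(sample)
print(playlist_id, channel_id)  # PLAYLIST123 CHANNEL123
```

Raising a clear error when items is empty is friendlier than the IndexError that indexing items[0] directly would produce.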

The YouTube API returns at most 50 results per call, so we need to page through the results, passing the nextPageToken returned with each page as the pageToken of the next request.
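A minimal sketch of this pagination pattern follows, written as a generator that takes any page-fetching function. It assumes, as the loop in this notebook does, that the last page simply has no nextPageToken. The names iterate_pages and fake_fetch are hypothetical, and the fake three-page "API" exists only so the example runs offline.

```python
def iterate_pages(fetch_page, params):
    '''Yield every item across all pages, following nextPageToken.

    fetch_page(params) is any function returning one page of results as a dict,
    for example a wrapper around get_response.
    '''
    page = fetch_page(params)
    while True:
        for item in page.get('items', []):
            yield item
        token = page.get('nextPageToken')
        if token is None:  # assumed: the last page carries no nextPageToken
            return
        page = fetch_page(params + [('pageToken', token)])

# A fake three-page "API", for illustration only
fake_pages = {
    None: {'items': [1, 2], 'nextPageToken': 'p2'},
    'p2': {'items': [3, 4], 'nextPageToken': 'p3'},
    'p3': {'items': [5]},
}

def fake_fetch(params):
    token = dict(params).get('pageToken')
    return fake_pages[token]

print(list(iterate_pages(fake_fetch, [('maxResults', 50)])))  # [1, 2, 3, 4, 5]
```

Because it is a generator, the caller can stop consuming items (for example, once videos get older than the start date) and no further pages are requested.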

In [None]:
import os

def get_all_videos(username):
    '''
    Make a pandas dataframe with all the videos uploaded by this user
    between January 1st, 2015 and May 4th, 2018.

    Inputs: username - the user whose videos we want to list
    Returns: a pandas DataFrame with one row per video
    '''
    # The information we'll be collecting for each video
    headers = ['video_id', 'channel_title', 'channel_id', 'video_publish_date',
               'video_title', 'video_view_count', 'video_like_count', 'video_dislike_count',
               'video_comment_count', 'video_description']
    video_data = []

    # Find the id of the UPLOADS playlist, and the channel_id
    playlist_params = [('part', 'contentDetails,id'), ('forUsername', username)]
    playlist_info = get_response('channels?', playlist_params)

    playlist_ID = playlist_info['items'][0]['contentDetails']['relatedPlaylists']['uploads']
    channel_Id = playlist_info['items'][0]['id']

    # Get the first page of videos in the playlist
    video_params = [('part', 'snippet'), ('playlistId', playlist_ID), ('maxResults', 50)]
    video_search = get_response('playlistItems?', video_params)

    # Make sure the folder for our checkpoint files exists
    os.makedirs('data', exist_ok=True)

    while True:
        reached_start = False
        for video in video_search['items']:
            video_snippet = video['snippet']
            published = video_snippet['publishedAt']

            # The YouTube API does not let us filter playlist items by date,
            # so we check each video against our time frame ourselves.
            # Timestamps in the same ISO 8601 format compare correctly as strings.
            if published > '2018-05-05T00:00:00Z':
                continue
            if published < '2015-01-01T00:00:00Z':
                # Uploads are largely in descending date order, so once we
                # pass January 1st, 2015 we can stop searching.
                reached_start = True
                break

            # Get the information we need and add it to our list of data
            video_data.append(get_video_data(video_snippet, channel_Id))

        # Save our data so far just to be safe
        video_df = pd.DataFrame(video_data, columns=headers)
        video_df.to_pickle("data/video_data.pkl")

        # Stop if we've gone past our start date or there are no more pages
        # (the last page has no nextPageToken)
        next_token = video_search.get('nextPageToken')
        if reached_start or next_token is None:
            return video_df

        # Request the next page of results
        video_search = get_response('playlistItems?', video_params + [('pageToken', next_token)])
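The date check above relies on comparing timestamps as plain strings. A small sketch of why that works, assuming every timestamp has the same fixed-width "YYYY-MM-DDTHH:MM:SSZ" layout (a value with fractional seconds would break that assumption). The function in_window and the sample timestamps are hypothetical; the window here is half-open, so midnight on May 5th, 2018 is excluded.

```python
from datetime import datetime

START = '2015-01-01T00:00:00Z'  # inclusive
END = '2018-05-05T00:00:00Z'    # exclusive, so all of May 4th, 2018 is kept

def in_window(published_at, start=START, end=END):
    '''Return True if a fixed-width ISO 8601 UTC timestamp falls in [start, end).'''
    return start <= published_at < end

def parse(ts):
    # 'Z' is matched literally; all of these timestamps are UTC
    return datetime.strptime(ts, '%Y-%m-%dT%H:%M:%SZ')

stamps = ['2018-05-04T18:30:00Z', '2014-12-31T23:59:59Z',
          '2018-05-05T00:00:00Z', '2015-01-01T00:00:00Z']
for ts in stamps:
    print(ts, in_window(ts))
# 2018-05-04T18:30:00Z True
# 2014-12-31T23:59:59Z False
# 2018-05-05T00:00:00Z False
# 2015-01-01T00:00:00Z True

# String order matches chronological order for this format
print(sorted(stamps) == sorted(stamps, key=parse))  # True
```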

For each video, we extract the fields we want. To get the view, like, dislike, and comment counts, we need one more query per video, this time to the videos endpoint of the YouTube API.
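Two details of the statistics block are worth handling explicitly. As far as we know, the API encodes these counts as JSON strings, and some keys can be missing (for example, dislikeCount, which YouTube stopped returning publicly in late 2021, or commentCount when comments are disabled). Here is a minimal sketch that converts the counts to integers and maps missing keys to None. The name parse_statistics and the sample numbers are hypothetical.

```python
def parse_statistics(stats):
    '''Hypothetical helper: convert a video's statistics dict into integer counts.

    Missing keys become None instead of raising KeyError.
    '''
    fields = ['viewCount', 'likeCount', 'dislikeCount', 'commentCount']
    return {f: int(stats[f]) if f in stats else None for f in fields}

# Hypothetical statistics block with made-up numbers; dislike and comment counts are absent
sample_stats = {'viewCount': '1024', 'likeCount': '87', 'favoriteCount': '0'}
print(parse_statistics(sample_stats))
# {'viewCount': 1024, 'likeCount': 87, 'dislikeCount': None, 'commentCount': None}
```

Keeping None for missing values, as get_value below also does, lets pandas treat them as missing data rather than as zero.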

In [None]:
def get_video_data(video_snippet, channel_Id):
    '''
    Takes a snippet of information about a video and turns it into a list of attributes
    that we can add to our dataset. We query the YouTube API using the video ID to get the
    relevant statistics for that video.

    Inputs: video_snippet - JSON object (dict) containing information about the video
            channel_Id - the channel ID of our user
    Returns: a list of values in the same order as the headers in get_all_videos
    '''
    video_id = video_snippet['resourceId']['videoId']

    video_channel = video_snippet['channelTitle']
    video_date = video_snippet['publishedAt']
    video_title = video_snippet['title']
    # Keep only the first line of the description
    video_description = video_snippet['description'].split('\n')[0].strip()

    # Another query to find the statistics that we need for the video
    video_params = [('part', 'statistics'), ('id', video_id)]
    video_json = get_response('videos?', video_params)

    # Private or deleted videos can come back with no items, so fall back to empty stats
    items = video_json.get('items', [])
    stats = items[0]['statistics'] if items else {}

    video_view_count = get_value(stats, 'viewCount')
    video_like_count = get_value(stats, 'likeCount')
    video_dislike_count = get_value(stats, 'dislikeCount')
    video_comment_count = get_value(stats, 'commentCount')

    # Arrange all our data into a list and return it
    this_video = [video_id, video_channel, channel_Id, video_date, video_title, video_view_count,
                  video_like_count, video_dislike_count, video_comment_count, video_description]

    return this_video

def get_value(dictionary, key):
    # Sometimes, keys don't exist so we write these as None in our data.  
    if key in dictionary:
        return dictionary[key]
    else:
        return None

In [None]:
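def main():
    # The API key itself is set inside get_response
    username = "TheAlexJonesChannel"

    # If we haven't yet extracted the data, query the API
    video_data = get_all_videos(username)

    # If we already have the dataset pickled, load it instead:
    # video_data = pd.read_pickle("data/video_data.pkl")

    # Write our data to a CSV (file name chosen to mirror the pickle checkpoint)
    video_data.to_csv("data/video_data.csv", index=False)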

In [None]:
if __name__ == "__main__":
    main()