# YouTube API: Printing the view count of a keyword in a pandas DataFrame

This Jupyter notebook walks through how to use the YouTube Data API to collect the view counts of videos matching a keyword and display them in a pandas DataFrame.

In [1]:
import requests
import json
import pandas as pd
import numpy as np
api_key = "" ### Enter your own api key here

`requests` sends HTTP requests to the YouTube Data API, which replies with JSON, a text format for nested key/value data. `json` parses that text into Python dictionaries and lists. `pandas` and `numpy` hold the results in a table and handle the numeric types.

You must enter your own API key in the cell above.
Use one of the links below to create a project on Google and get the credentials needed to access YouTube data.

https://developers.google.com/youtube/
or 
https://developers.google.com/youtube/v3/getting-started
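
Rather than pasting the key into the notebook, one option is to read it from an environment variable. This is a minimal sketch, assuming the key has been exported under the name `YOUTUBE_API_KEY` (a hypothetical name chosen for this example, not something the API requires). `load_api_key` is likewise an illustrative helper.

```python
import os

def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name you exported the key under.
    """
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With the variable set, `api_key = load_api_key()` would replace the hard-coded string.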


In [2]:
parameters = {"part": "snippet",
              # maximum number of results returned per page (the API allows 0 to 50)
              "maxResults": 50,
              # sort the results by upload date, newest first
              "order": "date",
              # timestamps use the RFC 3339 format, e.g. 2017-12-05T00:00:00Z
              "publishedAfter": "2017-12-05T00:00:00Z",
              "publishedBefore": "2018-11-04T00:00:00Z",
              # search term, filled in below
              "q": "",
              "key": api_key,
              }
# these parameters are sent as the query string of the search request
url = "https://www.googleapis.com/youtube/v3/search"

This URL is the search endpoint of the YouTube Data API. The parameters follow the YouTube specification, which can be found at https://developers.google.com/youtube/v3/docs/search/list.

The endpoint searches for videos, channels, and playlists matching any keyword.
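
To see what such a request looks like as a single URL, the sketch below builds the query string by hand with the standard library. It only illustrates the encoding: the key is a placeholder, and in the notebook itself `requests` does this step for you when it is given `params`.

```python
from urllib.parse import urlencode

base_url = "https://www.googleapis.com/youtube/v3/search"
example_params = {
    "part": "snippet",
    "maxResults": 50,
    "order": "date",
    "publishedAfter": "2017-12-05T00:00:00Z",
    "q": "Lebron James",
    "key": "YOUR_API_KEY",  # placeholder, not a real key
}

# urlencode escapes spaces and colons so the values are safe inside a URL.
full_url = base_url + "?" + urlencode(example_params)
print(full_url)
```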

In [3]:
parameters["q"] = "Lebron James"
# "q" is the search term passed to the search endpoint
page = requests.get(url, params=parameters)  # parameters are sent as the query string
page.text

'{\n "kind": "youtube#searchListResponse",\n "etag": "\\"7991kDR-QPaa9r0pePmDjBEa2h8/pFLJNAO6KBU0Gp_U24h3ETLrwrw\\"",\n "nextPageToken": "CDIQAA",\n "regionCode": "US",\n "pageInfo": {\n  "totalResults": 1000000,\n  "resultsPerPage": 50\n },\n "items": [\n  {\n   "kind": "youtube#searchResult",\n   "etag": "\\"7991kDR-QPaa9r0pePmDjBEa2h8/rwoRxSHe9clkJIchwxReNlOWFsE\\"",\n   "id": {\n    "kind": "youtube#video",\n    "videoId": "CEqxMuaiHDY"\n   },\n   "snippet": {\n    "publishedAt": "2017-12-13T04:02:24.000Z",\n    "channelId": "UCHW0HD1SsRPWkk6NQ62qCdQ",\n    "title": "Cleveland-Atlanta | Lebron James\'ten Cedi Osman\'a asist | NBA | 13.12.2017",\n    "description": "",\n    "thumbnails": {\n     "default": {\n      "url": "https://i.ytimg.com/vi/CEqxMuaiHDY/default.jpg",\n      "width": 120,\n      "height": 90\n     },\n     "medium": {\n      "url": "https://i.ytimg.com/vi/CEqxMuaiHDY/mqdefault.jpg",\n      "width": 320,\n      "height": 180\n     },\n     "high": {\n      "url": 

Let's clean that up a bit. Calling `print` renders the `\n` escapes as real line breaks.

In [4]:
print(page.text)

{
 "kind": "youtube#searchListResponse",
 "etag": "\"7991kDR-QPaa9r0pePmDjBEa2h8/pFLJNAO6KBU0Gp_U24h3ETLrwrw\"",
 "nextPageToken": "CDIQAA",
 "regionCode": "US",
 "pageInfo": {
  "totalResults": 1000000,
  "resultsPerPage": 50
 },
 "items": [
  {
   "kind": "youtube#searchResult",
   "etag": "\"7991kDR-QPaa9r0pePmDjBEa2h8/rwoRxSHe9clkJIchwxReNlOWFsE\"",
   "id": {
    "kind": "youtube#video",
    "videoId": "CEqxMuaiHDY"
   },
   "snippet": {
    "publishedAt": "2017-12-13T04:02:24.000Z",
    "channelId": "UCHW0HD1SsRPWkk6NQ62qCdQ",
    "title": "Cleveland-Atlanta | Lebron James'ten Cedi Osman'a asist | NBA | 13.12.2017",
    "description": "",
    "thumbnails": {
     "default": {
      "url": "https://i.ytimg.com/vi/CEqxMuaiHDY/default.jpg",
      "width": 120,
      "height": 90
     },
     "medium": {
      "url": "https://i.ytimg.com/vi/CEqxMuaiHDY/mqdefault.jpg",
      "width": 320,
      "height": 180
     },
     "high": {
      "url": "https://i.ytimg.com/vi/CEqxMuaiHDY/hqdef

The next part takes a single video and looks up its statistics.

In [5]:
parameters = {"part": "statistics", # This gives us the statistics
              "id": "8nn-YiUMqmI", #Enter the id of the video
              "key": api_key,
              }

The `videos` endpoint used in the next cell returns the statistics for an individual video given its ID: viewCount, likeCount, dislikeCount, favoriteCount, and commentCount. (Since December 2021 the API returns dislikeCount only to the video's owner, so it may be absent from your own output.)

In [6]:
page = requests.get("https://www.googleapis.com/youtube/v3/videos", params=parameters)
print(page.text)

{
 "kind": "youtube#videoListResponse",
 "etag": "\"7991kDR-QPaa9r0pePmDjBEa2h8/wP1sx6WolrEJZ4SaIcPLCjkE83Q\"",
 "pageInfo": {
  "totalResults": 1,
  "resultsPerPage": 1
 },
 "items": [
  {
   "kind": "youtube#video",
   "etag": "\"7991kDR-QPaa9r0pePmDjBEa2h8/j30tenCOOU8kDTqORaFkvBWKdL0\"",
   "id": "8nn-YiUMqmI",
   "statistics": {
    "viewCount": "22657",
    "likeCount": "239",
    "dislikeCount": "42",
    "favoriteCount": "0",
    "commentCount": "170"
   }
  }
 ]
}
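
Note that the counts in this response are strings, not numbers. The following is a minimal sketch of converting them, using a trimmed copy of the response above. `parse_statistics` is a hypothetical helper name introduced for this example.

```python
import json

# Trimmed copy of the videos response shown above.
sample_text = """
{
 "items": [
  {
   "id": "8nn-YiUMqmI",
   "statistics": {
    "viewCount": "22657",
    "likeCount": "239",
    "dislikeCount": "42",
    "favoriteCount": "0",
    "commentCount": "170"
   }
  }
 ]
}
"""

def parse_statistics(response_text):
    """Map each video ID to its statistics, with counts converted to int."""
    data = json.loads(response_text)
    return {
        item["id"]: {name: int(value) for name, value in item["statistics"].items()}
        for item in data.get("items", [])
    }

stats = parse_statistics(sample_text)
print(stats["8nn-YiUMqmI"]["viewCount"])
```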



list_check runs one search request for the keyword q within the given date range and returns the parsed JSON for a single page of results (up to 50 video IDs), ordered by view count.

In [7]:
def list_check(q, publishedAfter, publishedBefore, pageToken):
    # request one page of search results; only the video IDs are needed
    parameters = {"part": "id",
                  "maxResults": 50,
                  "order": "viewCount",
                  "pageToken": pageToken,
                  "q": q,
                  "type": "video",
                  "key": api_key,
                  "publishedAfter": publishedAfter,
                  "publishedBefore": publishedBefore
                  }
    page = requests.get("https://www.googleapis.com/youtube/v3/search", params=parameters)
    # parse the JSON body into a Python dictionary
    return page.json()
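
The dictionary that list_check returns mirrors the search response shown earlier. As a sketch of pulling out the two pieces the later cells rely on (the video IDs and the nextPageToken), the code below parses a hand-shortened copy of that response. `extract_video_ids` is an illustrative helper, not part of the notebook.

```python
import json

# A trimmed copy of the search response shown earlier, kept short for illustration.
sample_text = """
{
 "kind": "youtube#searchListResponse",
 "nextPageToken": "CDIQAA",
 "items": [
  {
   "kind": "youtube#searchResult",
   "id": {"kind": "youtube#video", "videoId": "CEqxMuaiHDY"}
  }
 ]
}
"""

def extract_video_ids(result):
    """Return the video IDs found in a parsed search response."""
    # Channel and playlist results carry no "videoId", so skip them.
    return [item["id"]["videoId"]
            for item in result.get("items", [])
            if "videoId" in item.get("id", {})]

result = json.loads(sample_text)
print(extract_video_ids(result))
print(result.get("nextPageToken"))
```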

searchList calls list_check repeatedly, following each nextPageToken, and collects the video IDs from every page. It stops after max_requests pages or when there are no more pages.

In [8]:
def searchList(q, publishedAfter, publishedBefore, max_requests=50):
    next_Page_Token = ""
    final = []
    for counter in range(max_requests):
        j_results = list_check(q=q, publishedAfter=publishedAfter, publishedBefore=publishedBefore, pageToken=next_Page_Token)
        # an error response has no "items" key, so fall back to an empty list
        items = j_results.get("items", [])
        final += [item["id"]["videoId"] for item in items]
        if "nextPageToken" in j_results:
            next_Page_Token = j_results["nextPageToken"]
        else:
            # no further pages
            break
    return final
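
The pagination logic can be tried out without spending API quota by swapping the real request for canned pages. This is only a sketch: `collect_all_ids` and `fake_pages` are hypothetical names, and the page contents are made up.

```python
def collect_all_ids(fetch_page, max_requests=50):
    """Follow nextPageToken until it disappears or max_requests is reached."""
    token = ""
    collected = []
    for _ in range(max_requests):
        result = fetch_page(token)
        collected += [item["id"]["videoId"] for item in result.get("items", [])]
        token = result.get("nextPageToken")
        if not token:
            break
    return collected

# Hypothetical canned pages standing in for real API responses.
fake_pages = {
    "": {"items": [{"id": {"videoId": "a"}}, {"id": {"videoId": "b"}}],
         "nextPageToken": "p2"},
    "p2": {"items": [{"id": {"videoId": "c"}}]},
}

print(collect_all_ids(lambda token: fake_pages[token]))
```

Passing `max_requests=1` stops after the first page, which is a cheap way to limit quota use while experimenting.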

videoList requests the statistics (viewCount, likeCount, commentCount, and so on) for a batch of video IDs and returns them as a DataFrame.

In [9]:
def videoList(video_id_list):
    parameters = {"part": "statistics",
                  "id": ",".join(video_id_list),  # comma-separated list of video IDs
                  "key": api_key,
                  }
    url = "https://www.googleapis.com/youtube/v3/videos"
    page = requests.get(url, params=parameters)
    j_results = page.json()
    df = pd.DataFrame([item["statistics"] for item in j_results.get("items", [])])
    # counts arrive as strings; convert them to numbers, leaving missing values as NaN
    return df.apply(pd.to_numeric, errors="coerce")

video_list splits the full list of IDs into batches of 50, calls videoList on each batch, and concatenates the results.

In [10]:
def video_list(video_id_list):
    values = []
    # step through the IDs 50 at a time
    for t_index in range(0, len(video_id_list), 50):
        values.append(videoList(video_id_list[t_index:t_index+50]))
    return pd.concat(values)
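
The batching step on its own can be sketched as a small generator. `chunked` is a hypothetical helper and the IDs are made up; the batch size of 50 simply matches the one used above.

```python
def chunked(ids, size=50):
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

example_ids = [f"video{i}" for i in range(120)]  # made-up IDs
print([len(batch) for batch in chunked(example_ids)])
```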

In [11]:
def get_data(keywords, publishedAfter, publishedBefore):
    results_list = []
    for q in keywords:
        results = searchList(q=q,
                              publishedAfter=publishedAfter,
                              publishedBefore=publishedBefore,
                              max_requests=50)

        stat_data_set = video_list(results)
        stat_data_set["keyword"] = q
        results_list.append(stat_data_set)
    data_set = pd.concat(results_list)
    return data_set

key_word_statisitics takes the start and end dates you specify and returns the statistics, including the view count, of the videos found for each keyword in that interval.

In [12]:
def key_word_statisitics(keywords,
                         year_begin,
                         year_end,
                         month_begin,
                         month_end,
                         day_begin,
                         day_end):
    # convert the start and end dates to API timestamps, then fetch the data
    return get_data(keywords,
                    youtube_date(year_begin, month_begin, day_begin),
                    youtube_date(year_end, month_end, day_end))

Helper function that puts a date into the timestamp format the YouTube API expects.

In [13]:
def youtube_date(year, month, day):
    # zero-pad month and day to two digits, e.g. 2017-01-01T00:00:00Z
    return f"{year}-{month:02d}-{day:02d}T00:00:00Z"
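
As an alternative sketch, the standard library's `datetime.date` can produce the same string while also rejecting impossible dates. `youtube_date_checked` is a hypothetical name used only for this example.

```python
from datetime import date

def youtube_date_checked(year, month, day):
    """Format a calendar date as midnight UTC in the form the API expects."""
    # date() raises ValueError for impossible dates such as February 30.
    return date(year, month, day).strftime("%Y-%m-%dT00:00:00Z")

print(youtube_date_checked(2017, 1, 1))
```

The validation is the main reason to prefer this form: a typo such as month 13 fails immediately instead of producing a timestamp the API would reject.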

The next cell displays a pandas DataFrame with the statistics of all the videos found. Note that searchList makes at most 50 requests per keyword, and each request returns at most 50 videos. In addition, likeCount, dislikeCount, and commentCount are not available for every video (access may require further authorization), so some cells in the table are empty.

In [15]:
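keyword = ["python","javascript","java","fidget spinners","php","sorting algorithms","Machine Learning"]
# arguments: year_begin, year_end, month_begin, month_end, day_begin, day_end
# i.e. videos published from 2017-01-01 through 2017-12-12
data = key_word_statisitics(keyword,2017,2017,1,12,1,12)
data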

| | commentCount | dislikeCount | favoriteCount | likeCount | viewCount | keyword |
|---|---|---|---|---|---|---|
| 0 | 916 | | 0 | | 13431961 | python |
| 1 | 605 | 4267 | 0 | 7540 | 10307549 | python |
| 2 | 649 | 5695 | 0 | 12677 | 9955663 | python |
| 3 | 564 | 2460 | 0 | 6492 | 8266132 | python |
| 4 | 8314 | 4284 | 0 | 17988 | 7778529 | python |
| 5 | 278 | 1813 | 0 | 4128 | 4911752 | python |
| 6 | 291 | 3382 | 0 | 5280 | 3855636 | python |
| 7 | 1724 | 2816 | 0 | 6504 | 3467276 | python |
| 8 | 147 | | 0 | | 2869879 | python |
| 9 | 286 | 1531 | 0 | 1304 | 2851312 | python |


In [16]:
pd.pivot_table(data, values=["viewCount",], aggfunc='sum', index="keyword")

| keyword | viewCount |
|---|---|
| Machine Learning | 10463129 |
| fidget spinners | 1444092661 |
| java | 18316154 |
| javascript | 8025141 |
| php | 9336629 |
| python | 111780361 |
| sorting algorithms | 2073713 |
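
The same totals can also be computed with `groupby`, which makes it easy to sort the keywords by total views. This is a sketch on a toy DataFrame whose numbers are made up for illustration; it is not real YouTube data.

```python
import pandas as pd

# Made-up numbers for illustration only, not real YouTube data.
toy = pd.DataFrame({
    "viewCount": [100, 250, 40, 60],
    "keyword": ["python", "python", "java", "java"],
})

# The pivot table used above and a groupby sum give the same per-keyword totals.
pivot = pd.pivot_table(toy, values=["viewCount"], aggfunc="sum", index="keyword")
grouped = toy.groupby("keyword")["viewCount"].sum()

print(pivot["viewCount"].to_dict())
print(grouped.sort_values(ascending=False))
```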
