In [1]:
import requests
import pandas as pd
import time

In [2]:
# Read the API key from the first line of API.txt, dropping the trailing newline
with open('API.txt') as f:
    API_KEY = f.readline().strip()
CHANNEL_ID = "UCq0hKkwnW5Cw1wQqu455WrA"

In [3]:
# The search endpoint accepts maxResults between 0 and 50, so request the maximum per page
url_videos = "https://www.googleapis.com/youtube/v3/search?key="+API_KEY+"&channelId="+CHANNEL_ID+"&part=snippet,id&order=date&maxResults=50"
response = requests.get(url_videos).json()
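
The request URL in the cell above is built by string concatenation. As a minimal sketch, the same query string could be assembled with the standard library's urlencode, which escapes each value. The helper name build_search_url and the placeholder key are hypothetical and only for illustration.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"

def build_search_url(api_key, channel_id, page_token=None):
    """Return a search URL for one page of a channel's results, newest first.

    Hypothetical helper: it mirrors the parameters used in the notebook.
    """
    params = {
        "key": api_key,
        "channelId": channel_id,
        "part": "snippet,id",
        "order": "date",
        "maxResults": 50,  # upper bound accepted by the search endpoint
    }
    if page_token:
        params["pageToken"] = page_token
    return SEARCH_ENDPOINT + "?" + urlencode(params)

# "YOUR_API_KEY" is a placeholder, not a real key
print(build_search_url("YOUR_API_KEY", "UCq0hKkwnW5Cw1wQqu455WrA"))
```

Passing the resulting URL to requests.get works the same way as the concatenated string.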

Looking through the JSON response, all the relevant information sits under the 'items' key, so I will pull 
only that to be used in our dataframe.

To see what this search API call gives us for a single video, I'm going to take the first (most recent) video and pull some data from it.

In [4]:
# The parsed JSON is a dictionary, so values are looked up by key name;
# response['items'] is a list, and index 0 selects its first entry
response['items'][0]

{'kind': 'youtube#searchResult',
 'etag': 'hNZibq3nSrZ4aogxT-AMKGhJn98',
 'id': {'kind': 'youtube#video', 'videoId': '0V_B20DqkT0'},
 'snippet': {'publishedAt': '2022-03-19T00:27:03Z',
  'channelId': 'UCq0hKkwnW5Cw1wQqu455WrA',
  'title': 'What to Do After a Successful Reverse Diet, &amp; More (Listener Live Coaching) - 1774',
  'description': '00:00 MAPS Aesthetic Giveaway 02:25 Mind Pump Fit Tip: You want to get a bigger bench press? GET STRONGER at the ...',
  'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/0V_B20DqkT0/default.jpg',
    'width': 120,
    'height': 90},
   'medium': {'url': 'https://i.ytimg.com/vi/0V_B20DqkT0/mqdefault.jpg',
    'width': 320,
    'height': 180},
   'high': {'url': 'https://i.ytimg.com/vi/0V_B20DqkT0/hqdefault.jpg',
    'width': 480,
    'height': 360}},
  'channelTitle': 'Mind Pump Show',
  'liveBroadcastContent': 'none',
  'publishTime': '2022-03-19T00:27:03Z'}}

In [5]:
print(response['items'][0]['id']['videoId'],
response['items'][0]['snippet']['title'], 
response['items'][0]['snippet']['publishedAt'].split("T")[0])

0V_B20DqkT0 What to Do After a Successful Reverse Diet, &amp; More (Listener Live Coaching) - 1774 2022-03-19


So what we want first is the videoId, so that we can pinpoint each video and find some values to 
analyze per video.

Next we want the title of each video, to check for clickbait and controversial views and compare them against the values we will obtain per video.

Finally, I want to include the date of each video so we have a reference point for the growth of the channel.
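
The three fields described above can be pulled out by a small helper. This is only a sketch: extract_basic_fields is a hypothetical name, and the HTML unescaping is an addition prompted by the '&amp;' visible in the sample title. It also assumes that non-video results (playlists or channels) may lack a videoId.

```python
import html

def extract_basic_fields(item):
    """Return (video_id, title, date) from one search result item."""
    # .get returns None when the result has no videoId (assumed for non-video results)
    video_id = item["id"].get("videoId")
    # Titles arrive HTML-escaped, e.g. '&amp;' instead of '&'
    title = html.unescape(item["snippet"]["title"])
    # publishedAt looks like '2022-03-19T00:27:03Z'; keep only the date part
    date = item["snippet"]["publishedAt"].split("T")[0]
    return video_id, title, date

# Trimmed copy of the item shown earlier in the notebook
sample = {
    "id": {"kind": "youtube#video", "videoId": "0V_B20DqkT0"},
    "snippet": {
        "publishedAt": "2022-03-19T00:27:03Z",
        "title": "What to Do After a Successful Reverse Diet, &amp; More (Listener Live Coaching) - 1774",
    },
}
print(extract_basic_fields(sample))
```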

In [6]:
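# Each pass overwrites the three variables, so after the loop they hold
# the values of the last video in the response
for i in response['items']:
    ids = i['id']['videoId']
    titles = i['snippet']['title']
    dates = i['snippet']['publishedAt'].split("T")[0]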

In [7]:
ids

'a7_pf5CgRko'

Now that all the basic information needed from the videos is gathered, I want to start finding the specific details of each video. The initial API call was for searching for videos; to gather the statistics for each video I need to use a different endpoint.
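
Since the statistics come from a separate endpoint, one request per video adds up quickly. As a rough sketch, the videos endpoint's id parameter takes a comma-separated list, so several IDs can share one request. The batch size of 50 is an assumption here, and chunked and build_stats_urls are hypothetical helper names.

```python
from urllib.parse import urlencode

VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"

def chunked(seq, size):
    """Yield consecutive slices of seq with at most `size` elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def build_stats_urls(video_ids, api_key, batch_size=50):
    """Return one statistics URL per batch of video IDs.

    batch_size=50 is an assumed per-request limit, not taken from the notebook.
    """
    urls = []
    for batch in chunked(video_ids, batch_size):
        params = {"id": ",".join(batch), "part": "statistics", "key": api_key}
        urls.append(VIDEOS_ENDPOINT + "?" + urlencode(params))
    return urls

# 120 made-up IDs need three requests instead of 120
fake_ids = ["id%03d" % n for n in range(120)]
print(len(build_stats_urls(fake_ids, "YOUR_API_KEY")))
```

Each response would then carry one entry in 'items' per video, rather than a single entry at index 0.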

In [8]:
url_metrics = "https://www.googleapis.com/youtube/v3/videos?id="+ids+"&part=statistics&key="+API_KEY
response_metrics = requests.get(url_metrics).json()

In [9]:
print(response_metrics['items'][0]['statistics']['viewCount'],
response_metrics['items'][0]['statistics']['likeCount'],
response_metrics['items'][0]['statistics']['commentCount'])

2117 85 4
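
The counts printed above come back from the API as strings. A small sketch of converting them to integers before analysis follows; parse_stats is a hypothetical helper, and treating a missing field as None is an assumption for cases where a count is not returned for a video.

```python
def parse_stats(statistics):
    """Convert the string counts in a 'statistics' object to integers."""
    fields = {"Views": "viewCount", "Likes": "likeCount", "Comments": "commentCount"}
    parsed = {}
    for column, key in fields.items():
        value = statistics.get(key)
        # Keep None when the API did not return this count (assumed possible)
        parsed[column] = int(value) if value is not None else None
    return parsed

print(parse_stats({"viewCount": "2117", "likeCount": "85", "commentCount": "4"}))
```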


Now, to tie all the data together in a dataframe, I want a function that can gather this information and is not restricted to this channel. The majority of the work is complete, since we know which lines will output what.
All that needs to be done is: 
    1. Create a dataframe
    2. Place data in dataframe
    3. Transfer to a csv

The function takes the channel_id as a parameter, so if another analysis of a YouTube channel is required in the future, I already have the majority of the stats that I want.

In [10]:
def get_information(channel_id):
    """Collect ID, title, date, views, likes and comments for a channel's videos."""
    page_token = ""
    categories = ("ID", "Title", "Date", "Views", "Likes", "Comments")
    # DataFrame.append was removed in pandas 2.0, so rows are collected in a list
    rows = []
    while True:
        # maxResults is capped at 50 per page by the search endpoint
        url_videos = "https://www.googleapis.com/youtube/v3/search?key="+API_KEY+"&channelId="+channel_id+"&part=snippet,id&order=date&maxResults=50&"+page_token
        response = requests.get(url_videos).json()
        # Brief pause between page requests to avoid hammering the API
        time.sleep(1)

        # Looping through the JSON, pulling the values needed for each video
        for i in response['items']:
            # Using our initial call to find basic information of videos
            ids = i['id']['videoId']
            title = i['snippet']['title']
            date = i['snippet']['publishedAt'].split("T")[0]

            # Second API call to find statistics, using the ID gathered in the first call
            url_metrics = "https://www.googleapis.com/youtube/v3/videos?id="+ids+"&part=statistics&key="+API_KEY
            response_metrics = requests.get(url_metrics).json()

            # .get leaves a count as None if the API omits it for this video
            stats = response_metrics['items'][0]['statistics']
            views = stats.get('viewCount')
            likes = stats.get('likeCount')
            comments = stats.get('commentCount')

            # One dictionary per video, keyed by the dataframe column names
            rows.append({"ID": ids, "Title": title, "Date": date, "Views": views,
                         "Likes": likes, "Comments": comments})

        # Follow nextPageToken through the pages, otherwise our data is stuck at 50
        next_token = response.get('nextPageToken')
        if next_token is None:
            break
        page_token = "pageToken=" + next_token

    return pd.DataFrame(rows, columns=categories)
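
The page-following logic in the function above can be looked at in isolation. The sketch below uses a hypothetical iter_pages generator and a fake fetch function instead of real API calls, so the stopping condition can be checked without a key.

```python
def iter_pages(fetch_page):
    """Yield each page, following nextPageToken until it is absent.

    fetch_page is any callable that takes a token (None for the first page)
    and returns the parsed response as a dictionary.
    """
    token = None
    while True:
        page = fetch_page(token)
        yield page
        token = page.get("nextPageToken")
        if not token:
            break

# Fake three-page response keyed by page token (made-up data)
fake_pages = {
    None: {"items": [1, 2], "nextPageToken": "p2"},
    "p2": {"items": [3], "nextPageToken": "p3"},
    "p3": {"items": [4]},
}

items = [item for page in iter_pages(fake_pages.get) for item in page["items"]]
print(items)  # [1, 2, 3, 4]
```

In the real function, fetch_page would wrap the requests.get call for the search URL.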

In [11]:
table = get_information("UCq0hKkwnW5Cw1wQqu455WrA")

In [12]:
table.size

3018

Note that table.size counts cells (rows times columns), so 3018 cells across 6 columns corresponds to 503 rows. They have around 1700 podcasts that include video as of this date; however, they also have separate uploads for controversial topics, Q&A sections, etc., so each upload of the podcast is accompanied by around 4-5 extra smaller clips. Not including other uploads of 

In [13]:
table.to_csv("MindPumpStats.csv")

Now we have all the data the API returned for Mind Pump's YouTube channel, up to March 19, 2022. 
Since the data is already in a csv file, the analysis will go in a new notebook.