## Overview
- The functions used here are probably going to be used the most, but there are some other functions that I have also written in the py file. There is some documentation on each function in the py file as well. 
- Although finicky, the Youtube API is extremely powerful data collection tool and can give you access to other data I might not have included in my functions. Feel free to look at this documentation, https://developers.google.com/youtube/v3/docs, if you want to learn more! It is pretty simple to alter the code I have already written to pull other data, especially if you take a look at some of the smaller functions I used to build the functions utilized in this template
- I initially followed this tutorial to learn more about the API if you wanted to learn how I wrote these functions! https://www.youtube.com/watch?v=2mSwcRb3KjQ

In [1]:
import numpy as np
import pandas as pd
import re
from sam_fun import *

## 1) Get your own YT API key
- Use the directions in this link to get your own API key you'll need to access the YT API
- https://blog.hubspot.com/website/how-to-get-youtube-api-key

In [2]:
#add your api key
api_key = " "
from googleapiclient.discovery import build
youtube = build("youtube","v3",developerKey=api_key)

## 2) Getting Channel IDs
- Each channel on Youtube has a unique channel ID
- the function "trending_creators_by_country" takes in a country code, and returns a list of channel IDs from the specified country's trending page
- If you want to get a specifc Youtube channel's ID, 
     1. Go to their Youtube Page
     2. Right click and hit view page sources
     3. Command_F browse_id, ID is stored in value key next to it (ex. Bon Appetit's is UCbpMy0Fg74eXXkvxJrtEn3w)
- To find other country codes, look at this link https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 
    - IN is India

In [3]:
#top creators from india's trending video page
#leave youtube as your parameter, this is just to access the API
in_creator_ids= trending_creators_by_country(youtube,"US")
in_creator_ids[0:5]

['UCq0OueAsdxH6b8nyAspwViw',
 'UCET00YnetHT7tOpu12v8jxg',
 'UCy6D16zE_mMEm1HVD20WFxA',
 'UC2-BeLxzUBSs0uSrmzWhJuQ',
 'UClyGlKOhDUooPJFy4v_mqPg']

## 3) Channel Stats
- "channel_stats" will return a data frame containing overall channel stats, including total view count across all their videos, sub count, and video count
- the function takes in a list of channel IDs

In [4]:
#dataframe of stats from creators, made with the list of ids
in_stats_df = channels_stats(youtube,in_creator_ids)
in_stats_df = add_emails(in_stats_df)

## 3) Filtering Channels
- the function "top_channels" takes in a dataframe made in the previous function, and finds the top channels based on a specified column
- In this case, the top 5% of channels based on their number of subs

In [6]:
# gets top 5% of channels based off subscribers
in_95 = top_channels(in_stats_df,"subscriberCount",95)

In [70]:
# Takes a list of one or more topics and returns channels that are listed with at least one of the topics
in_category = categorize_channels(["Simulation_video_game","Film"],in_stats_df)



In [88]:
#Returns channels with subscribers between two limits
in_between_subs = between_subs(250000,400000,in_stats_df)

## 4) Video IDs
- Similar to how each channel has a unique ID, each video also has a unique ID. From the video ID, you can get video title, publish date, description, tags, view counts for that specifc video, dislike count (if not made private), comment count, whether it was made for kids, and much more (that I have not included in this function). 
- get_videoID_list takes in a channel ID and returns their entire video collection in the form of a list of video IDs

In [7]:
# gets id of every video on khordha toka channel
video_ids = get_videoID_list(youtube,"UClyGlKOhDUooPJFy4v_mqPg") # or in_stats_df[2] 


In [8]:
video_ids[0:5]

['XQfgbQfojb4', '156sdsDz66o', 'zLAn7Qp69yA', 'LnL55NBi1wk', '8sAG5PQTMQ0']

## 5) Video Stats
- the function "get_video_details" takes in a list of video IDs and returns a data frame on stats from every video. 
- In this example, I used the list from above to get stats on every video a channel has posted
- From here you can get things like average like count or most popular tags.

In [12]:

def get_video_details(youtube,video_list):
    '''
    Grabs data for all videos in a channel
    
    youtube: youtube API build()
    video_list: list containing all unique video IDs of data you want to grab
    
    returns: a list of dictionaries, each dictionary contains stats for a unique video
    
    '''
    stats_list = []
    all_stats = []
    #because YT only lets you grab 50 videos at a time
    #need to jump 0-49, 50-99 etc (count by 50)

    for i in range(0,len(video_list),50):
        request = youtube.videos().list(
            part = "snippet,contentDetails,status,statistics",
            id = video_list[i:i+50] #non inclusive, will grab 0-49
        )
        
        data = request.execute()
        
        for video in data["items"]:
            
            title = video["snippet"]['title']
            published = video['snippet']['publishedAt']
            description = video['snippet']['description']
            tags = video["snippet"].get('tags',[]) #how many tags video has bc 'tags' is a list
            postingDate = video['snippet'].get('publishedAt',None)
            description = video['snippet'].get('description',None)
            
            
            # .get ensures that if the info is unavailable (private etc), it won't throw an error, but put 0
            viewCount = video["statistics"].get("viewCount",0)
            likeCount = video["statistics"].get("likeCount",0)
            #dislike count is always private to public I think 
            dislikeCount = video["statistics"].get("dislikeCount","private")
            commentCount = video["statistics"].get("commentCount",0)
            
            duration = video["contentDetails"].get("duration",0)
           
            
            made_for_kids = video['status'].get('madeForKids',None)
            
            #makes dictionary for each video with stas
            stats_dictionary = dict(title=title, 
                                    published=published,
                                    description = description,
                                    tags = tags,
                                    viewCount = viewCount,
                                    likeCount = likeCount,
                                    dislikeCount = dislikeCount,
                                    commentCount = commentCount,
                                    duration = duration,
                                    postingDate = postingDate,
                                    made_for_kids = made_for_kids
                                    
                                    
            )
            stats_list.append(stats_dictionary)
            
    return pd.DataFrame(stats_list)


In [13]:
# dataframe containing stats of all videos of a channel
get_video_details(youtube,video_ids)

Unnamed: 0,title,published,description,tags,viewCount,likeCount,dislikeCount,commentCount,duration,postingDate,made_for_kids
0,The Great Battle of Twitch Chat vs Youtube Chat,2023-05-17T19:00:14Z,Only Twitch Chat is allowed to be good at vide...,"[DougDoug, DougDoug Youtube Channel, Channel Y...",596961,48032,private,1651,PT41M36S,2023-05-17T19:00:14Z,False
1,DougDoug buys groceries for Twitch Chat,2023-05-15T19:00:00Z,You've all gotta help me carry these bags from...,"[DougDoug, DougDoug Youtube Channel, Channel Y...",433319,45746,private,239,PT58S,2023-05-15T19:00:00Z,False
2,Can A.I. teach me to pass a real College Histo...,2023-05-09T19:00:32Z,"I will get the highest test scores of anybody,...","[DougDoug, DougDoug Youtube Channel, Channel Y...",838314,38909,private,1765,PT33M30S,2023-05-09T19:00:32Z,False
3,"GTA 5's most chaotic mod, but if I break the l...",2023-04-12T19:00:08Z,"I follow the law, no matter what. \n\nStreamin...","[DougDoug, DougDoug Youtube Channel, Channel Y...",2079901,117574,private,3806,PT25M27S,2023-04-12T19:00:08Z,False
4,"City Skylines, but I elected Twitch Chat as Mayor",2023-04-05T19:31:27Z,We're gonna need more trains. (DISCLAIMER: PAR...,"[DougDoug, DougDoug Youtube Channel, Channel Y...",2241610,94595,private,2443,PT28M36S,2023-04-05T19:31:27Z,False
...,...,...,...,...,...,...,...,...,...,...,...
150,Hearthstone Sounds in Serious Situations,2015-10-21T17:30:00Z,Made a quick little video to do something a bi...,"[Hearthstone: Heroes Of Warcraft (Video Game),...",292728,6497,private,165,PT1M18S,2015-10-21T17:30:00Z,False
151,[Hearthstone] The Story of Rappy the Raptor,2015-10-19T17:30:00Z,Come hear the story of my best friend Rappy.\n...,"[Hearthstone: Heroes Of Warcraft (Video Game),...",77360,2227,private,154,PT2M32S,2015-10-19T17:30:00Z,False
152,[Hearthstone] Jaina and Anduin get back together,2015-10-12T17:30:01Z,this shouldn't exist\n\n\nMusic: https://www.y...,"[Hearthstone: Heroes Of Warcraft (Video Game),...",75633,1798,private,125,PT2M4S,2015-10-12T17:30:01Z,False
153,[Hearthstone] Secret Card Ideas - Episode 1,2015-10-05T17:00:00Z,Check out my stupid concepts for some brand ne...,"[Hearthstone: Heroes Of Warcraft (Video Game),...",55772,1972,private,120,PT3M18S,2015-10-05T17:00:00Z,False
