<h1>Exploratory Analysis of YouTube Channels and Videos Based on Educational and Entertainment Content</h1>

# 1. Introduction | Objectives | Approach | Data Source

## 1.1. Introduction

---YouTube Background info---

YouTube had its humble beginnings starting out in

The first video uploaded was [Me at the zoo](https://www.youtube.com/watch?v=jNQXAC9IVRw). 

---YouTube Accomplishments---

---My Motivations---

I have used YouTube as a main source of entertainment and educational content for over a decade. This project is my first deeper look at the statistics of top content creators I have followed, whether previously, occasionally, or currently. It explores the educational and entertainment content of 10-20 successful YouTube channels.


## 1.2. Objectives

This project strengthens my understanding of, and explores, the following:

- Learning the YouTube API | Navigating its documentation | Obtaining video data
- Analyzing common misconceptions about becoming <b>"successful"</b> on YouTube
- Identifying trending topics through Natural Language Processing (NLP) approaches


## 1.3. Approach
1. Obtain metadata with the YouTube API from 10-15 channels in the entertainment and education niches
2. Clean the raw data and develop additional features for analysis (Pandas)
3. Perform exploratory analysis with data visualizations (Seaborn | Matplotlib)
4. Conclusions / Findings


## 1.4. Data Source
- Existing datasets online do not contain the information needed for the exploratory analysis in this project. Reasons include:
  - Outdated information
- Currently, the YouTube API service retains information from only the past 30 days. The only way to obtain statistics beyond 30 days is to continuously collect data until the information is needed in the future.
  - The downside of this approach is an ever-growing collection of data.
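Collecting data continuously, as described above, can be as simple as saving a timestamped snapshot of each API pull on a schedule. The sketch below is one possible approach, not the method used in this notebook: the helper name `append_snapshot`, the `collectedAt` column, and the CSV file layout are all assumptions made for illustration.

```python
import os
from datetime import datetime, timezone

import pandas as pd


def append_snapshot(stats_df, csv_path, collected_at=None):
    """Append a timestamped copy of stats_df to csv_path (hypothetical helper).

    Running this on a schedule (e.g., once a day) builds a growing history
    of statistics that can be analyzed later.
    """
    if collected_at is None:
        collected_at = datetime.now(timezone.utc)
    snapshot = stats_df.copy()
    # Record when this pull happened so snapshots can be told apart later
    snapshot['collectedAt'] = collected_at.isoformat()
    # Write the header only when the file is first created
    write_header = not os.path.exists(csv_path)
    snapshot.to_csv(csv_path, mode='a', header=write_header, index=False)
    return snapshot
```

For example, `append_snapshot(quick_channels_stats, 'channel_stats_history.csv')` could be run daily. Appending to one CSV keeps the sketch simple, but it also illustrates the downside noted above: the file grows with every run.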


In [1]:
# Imports
from googleapiclient.discovery import build
from dateutil import parser
import pandas as pd
from IPython.display import JSON


# Data visualization packages
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker


# Loading environment variable
import os
from dotenv import load_dotenv
load_dotenv()

# API_Key
api_key = os.environ.get('YOUTUBE_API_KEY')


# Example list of channel IDs
CHANNEL_IDS = [
  "UCX6OQ3DkcsbYNE6H8uQQuVA",     # Mr Beast
  "UC-lHJZR3Gqxm24_Vd_AJ5Yw",     # PewDiePie
  "UCINb0wqPz-A0dV9nARjJlOQ",     # The Dodo
  "UCshoKvlZGZ20rVgazZp5vnQ",     # CaptainSparklez
  "UCY1kMZp36IQSyNx_9h4mpCg",     # Mark Rober
  "UC6nSFpj9HTCZ5t-N3Rm3-HA",     # Vsauce
  "UCiDJtJKMICpb9B1qf7qjEOA",     # Adam Savage's Tested
]


# Example playlist of all videos 
# Tip: an "all uploads" playlist ID is the channel ID with its leading "UC" replaced by "UU"
# https://www.youtube.com/playlist?list=
PLAYLIST_IDS = [
  'UUX6OQ3DkcsbYNE6H8uQQuVA',               # Mr Beast (missing comma previously merged this ID with the next string)
  'PLoSWVnSA9vG9qV0CVCpg5WwEy3LiP7udY',     # Mr Beast (new uploads)
  'UULF-lHJZR3Gqxm24_Vd_AJ5Yw',             # PewDiePie
  'UUINb0wqPz-A0dV9nARjJlOQ',               # The Dodo
  'UUshoKvlZGZ20rVgazZp5vnQ',               # CaptainSparklez
  "UUY1kMZp36IQSyNx_9h4mpCg",               # Mark Rober
  "UU6nSFpj9HTCZ5t-N3Rm3-HA",               # Vsauce
  "UUiDJtJKMICpb9B1qf7qjEOA",               # Adam Savage's Tested
]

api_service_name = "youtube"
api_version = "v3"
youtube = build(api_service_name, api_version, developerKey=api_key)

In [2]:
def get_channel_stats(youtube, channel_ids):
  all_channel_data = []

  # https://developers.google.com/youtube/v3/docs/channels/list
  request = youtube.channels().list(
    part='snippet,contentDetails,statistics',
    id=','.join(channel_ids)
  )
  response = request.execute()

  # https://developers.google.com/youtube/v3/docs/channels
  for item in response['items']:
    data = {
      'channelName': item['snippet']['title'],
      'creationDate': item['snippet']['publishedAt'],
      'subscribers': item['statistics']['subscriberCount'],
      'channelViews': item['statistics']['viewCount'],
      'totalVideos': item['statistics']['videoCount'],
      'playlistId': item['contentDetails']['relatedPlaylists']['uploads'],
    }

    all_channel_data.append(data)

  return pd.DataFrame(all_channel_data)


In [3]:
quick_channels_stats = get_channel_stats(youtube, CHANNEL_IDS)
quick_channels_stats

| | channelName | creationDate | subscribers | channelViews | totalVideos | playlistId |
|---|---|---|---|---|---|---|
| 0 | CaptainSparklez | 2010-07-20T19:38:14Z | 11400000 | 4049339101 | 5750 | UUshoKvlZGZ20rVgazZp5vnQ |
| 1 | The Dodo | 2014-03-21T20:50:16Z | 14500000 | 10262637224 | 8064 | UUINb0wqPz-A0dV9nARjJlOQ |
| 2 | Adam Savage’s Tested | 2010-03-08T19:17:09Z | 6360000 | 1405882070 | 6399 | UUiDJtJKMICpb9B1qf7qjEOA |
| 3 | Vsauce | 2007-07-30T20:43:33Z | 19700000 | 3081653807 | 474 | UU6nSFpj9HTCZ5t-N3Rm3-HA |
| 4 | Mark Rober | 2011-10-20T06:17:58Z | 25100000 | 3737344529 | 121 | UUY1kMZp36IQSyNx_9h4mpCg |
| 5 | MrBeast | 2012-02-20T00:43:50Z | 180000000 | 31292066713 | 746 | UUX6OQ3DkcsbYNE6H8uQQuVA |
| 6 | PewDiePie | 2010-04-29T10:54:00Z | 111000000 | 29108143374 | 4718 | UU-lHJZR3Gqxm24_Vd_AJ5Yw |
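The API returns the counts in this table as strings, so they need converting before any plotting or comparison. The following is a minimal cleaning sketch for the Pandas step of the approach; the helper name `clean_channel_stats` and the derived `viewsPerVideo` column are illustrative assumptions rather than part of the original notebook.

```python
import pandas as pd


def clean_channel_stats(channel_df):
    """Convert string counts to numbers, parse dates, and add a simple feature (hypothetical helper)."""
    df = channel_df.copy()
    numeric_cols = ['subscribers', 'channelViews', 'totalVideos']
    # The API delivers counts as strings such as '25100000'
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
    df['creationDate'] = pd.to_datetime(df['creationDate'], utc=True)
    # Average lifetime views per uploaded video
    df['viewsPerVideo'] = df['channelViews'] / df['totalVideos']
    return df.sort_values('subscribers', ascending=False).reset_index(drop=True)
```

Applied as `clean_channel_stats(quick_channels_stats)`, the new column already hints at a "success" question: Mark Rober's 121 uploads average roughly 30.9 million views each, versus roughly 6.5 million for Vsauce's 474.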


## Obtaining Video IDs
This step is crucial because it narrows down the scope of the analysis. Before collecting video IDs, we need to obtain a playlist ID.

On YouTube, a channel owner can create playlists containing videos they uploaded or videos uploaded by other channels. In this analysis, we use the default "all uploads" playlist. Its ID can be obtained by replacing the leading "UC" of the channel ID with "UU" [<b>REFER TO CELL 1 ABOVE</b>]; it is also returned as the playlistId column in the channel statistics above.
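The "UC" to "UU" swap can be written as a tiny helper. This is a sketch built on the convention described above, which appears to be an observed pattern rather than a documented guarantee; the function name `uploads_playlist_id` is hypothetical, and `contentDetails.relatedPlaylists.uploads` from `channels.list` remains the authoritative source.

```python
def uploads_playlist_id(channel_id):
    """Derive a channel's "all uploads" playlist ID by replacing the leading "UC" with "UU"."""
    if not channel_id.startswith('UC'):
        raise ValueError(f'Expected a channel ID starting with "UC", got {channel_id!r}')
    return 'UU' + channel_id[2:]
```

For instance, `uploads_playlist_id("UCY1kMZp36IQSyNx_9h4mpCg")` yields `"UUY1kMZp36IQSyNx_9h4mpCg"`, the Mark Rober playlist ID shown in the channel statistics.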

Sometimes we do not want the full uploads playlist. Instead, we can use other playlists, provided the owner created them and has not made them private.
<br>*---Important: a channel owner can add videos uploaded by other channels to their playlists, so carefully check the uploader of each video.---*
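When a custom playlist is used, the uploader check above can be automated. This sketch assumes each item comes from `playlistItems.list` with `part='snippet,contentDetails'` and that its snippet carries a `videoOwnerChannelId` field identifying the uploader; the helper name `items_uploaded_by` is hypothetical. Items without that field (for example, deleted or private videos) are dropped.

```python
def items_uploaded_by(playlist_items, channel_id):
    """Return video IDs from playlist_items whose uploader is channel_id (hypothetical helper)."""
    return [
        item['contentDetails']['videoId']
        for item in playlist_items
        # Assumed field: the channel that uploaded the video, not the playlist owner
        if item.get('snippet', {}).get('videoOwnerChannelId') == channel_id
    ]
```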

After choosing a playlist, we can finally obtain the video IDs. Each video has its own unique ID that identifies it conclusively. Using the uploads playlist ensures that every video in it was uploaded by, and belongs to, the channel owner.


In [4]:
playlist_id = "UUY1kMZp36IQSyNx_9h4mpCg"      # Mark Rober (all uploads)

def get_video_ids(youtube, playlist_id):
  video_ids = []
  next_page_token = None

  # Each response holds at most 50 items; keep requesting pages
  # until the API stops returning a nextPageToken
  while True:
    request = youtube.playlistItems().list(
      part='snippet,contentDetails',
      playlistId=playlist_id,
      maxResults=50,                # default is 5, maximum is 50
      pageToken=next_page_token     # None on the first request (omitted by the client)
    )
    response = request.execute()

    for item in response['items']:
      video_ids.append(item['contentDetails']['videoId'])

    next_page_token = response.get('nextPageToken')
    if next_page_token is None:
      break

  return video_ids

In [5]:
video_ids = get_video_ids(youtube, playlist_id)
len(video_ids)

121
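The page-token loop in `get_video_ids` is a general pattern for any paginated YouTube API list call. The generator below factors it out so it can be reused and tested without network access; `iter_pages` and its `fetch_page` callable are hypothetical names, and `fetch_page` is assumed to take a page token (`None` for the first page) and return the parsed response dict.

```python
def iter_pages(fetch_page):
    """Yield each response page, following nextPageToken until none is returned."""
    page_token = None
    while True:
        response = fetch_page(page_token)
        yield response
        page_token = response.get('nextPageToken')
        if page_token is None:
            break


# Possible use with the client from cell 1 (not executed here):
# video_ids = [
#     item['contentDetails']['videoId']
#     for page in iter_pages(lambda token: youtube.playlistItems().list(
#         part='contentDetails', playlistId=playlist_id,
#         maxResults=50, pageToken=token).execute())
#     for item in page['items']
# ]
```

As an alternative, google-api-python-client also generates a `list_next(previous_request, previous_response)` method on paginated collections, which returns `None` once there are no more pages.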

## Obtaining Video Information
After collecting the video IDs, we can dive deeper into the videos themselves and look at the finer details.

Each piece of metadata falls under a resource part, such as snippet, contentDetails, or statistics.
Using the following: [--Insert link--], we can specify which parts and fields to request through the YouTube API.
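To make the idea of parts concrete, the sketch below flattens selected fields from a trimmed videos resource. The sample dict is hand-built from the first row of the `video_info_df` output in cell 7 and is far smaller than a real response; `extract_fields` is a hypothetical helper that mirrors what `get_video_details` does, returning `None` for fields that are missing (here, `tags`).

```python
def extract_fields(video, fields_by_part):
    """Flatten selected fields from each part of a videos resource, using None when a field is missing."""
    row = {'video_id': video['id']}
    for part, fields in fields_by_part.items():
        for field in fields:
            row[field] = video.get(part, {}).get(field)
    return row


fields_by_part = {
    'snippet': ['title', 'tags', 'publishedAt'],
    'contentDetails': ['duration', 'definition'],
    'statistics': ['viewCount', 'commentCount'],
}

# Trimmed, hand-built resource; note the API returns counts as strings
sample_video = {
    'id': 'lcIObyvI3uw',
    'snippet': {'title': 'This Ball Is Impossible To Hit', 'publishedAt': '2023-08-11T22:06:05Z'},
    'contentDetails': {'duration': 'PT43S', 'definition': 'hd'},
    'statistics': {'viewCount': '6063782', 'commentCount': '453'},
}

row = extract_fields(sample_video, fields_by_part)
```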


In [6]:

def get_video_details(youtube, video_ids):
  all_video_info = []

  # Fields to keep from each part of the videos resource
  intended_stats = {
    'snippet': ['channelTitle', 'title', 'description', 'tags', 'publishedAt'],
    'contentDetails': ['duration', 'definition', 'caption'],
    'statistics': ['viewCount', 'likeCount', 'favoriteCount', 'commentCount']
  }

  # videos.list accepts at most 50 IDs per request, so query in batches of 50
  for i in range(0, len(video_ids), 50):
    request = youtube.videos().list(
      part='snippet,contentDetails,statistics',
      id=','.join(video_ids[i:i+50])
    )
    response = request.execute()

    for video in response['items']:
      video_info = {'video_id': video['id']}

      for part, fields in intended_stats.items():
        for field in fields:
          # Some fields are absent for certain videos (e.g., tags, or commentCount
          # when comments are disabled), so fall back to None
          video_info[field] = video.get(part, {}).get(field)

      all_video_info.append(video_info)

  return pd.DataFrame(all_video_info)

In [7]:
video_info_df = get_video_details(youtube, video_ids)
video_info_df

| | video_id | channelTitle | title | description | tags | publishedAt | duration | definition | caption | viewCount | likeCount | favoriteCount | commentCount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | lcIObyvI3uw | Mark Rober | This Ball Is Impossible To Hit | | | 2023-08-11T22:06:05Z | PT43S | hd | false | 6063782 | 316344 | 0 | 453 |
| 1 | 0ENZe0ckmxA | Mark Rober | I Cured @MrBeast’s Fear Of Heights | | | 2023-06-29T20:47:19Z | PT1M | hd | false | 40467144 | 3038556 | 0 | 4602 |
| 2 | md75n8cyenA | Mark Rober | How to Escape a Police Sniffing Dog | Scent trailing dogs are indistinguishable from... | | 2023-06-21T21:04:23Z | PT27M37S | hd | true | 13464253 | 420977 | 0 | 12130 |
| 3 | -K8xL6laeEU | Mark Rober | Public shaming at a Sharks game | | | 2023-06-11T18:55:15Z | PT1M1S | hd | false | 24526331 | 1586367 | 0 | 6510 |
| 4 | 1UTjWy-vnOo | Mark Rober | I Gave the 2023 MIT Commencement Speech | | | 2023-06-10T14:00:02Z | PT19M31S | hd | true | 4710917 | 191388 | 0 | 4219 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 116 | ZaOw8B_2kWY | Mark Rober | Gorilla lured by iPhone- how-to demo | A simple trick to get you some AWESOME zoo foo... | [Gorilla, Lured, by, iPhone, How To Video, mir... | 2012-02-14T20:16:04Z | PT2M29S | hd | false | 1742101 | 21498 | 0 | 178 |
| 117 | O1P6K_PaS0A | Mark Rober | Make a Gorilla cam- HOW TO | A simple $8 rig that takes 10 minutes and will... | [How To Video, mirror, gorilla, iphone, monkey... | 2012-02-14T19:49:49Z | PT1M27S | hd | false | 406438 | 4734 | 0 | 203 |
| 118 | YdJr1FCB0P4 | Mark Rober | Always win at heads/tails- BEST METHOD | A simple trick for flipping a coin that can en... | [flip a coin, flip a quarter, tails every time... | 2012-01-18T09:59:11Z | PT5M54S | hd | false | 3195449 | 67364 | 0 | 2119 |
| 119 | 7sj6Gpk3ab4 | Mark Rober | Whiteboard Office Darts (using BuckyBalls) | A simple alternative to using real darts and p... | [Office, Whiteboard, darts, neoballs, zen magn... | 2011-12-19T08:35:20Z | PT1M50S | hd | false | 1305123 | 20868 | 0 | 424 |
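Before visualizing this table, the string counts, timestamps, and ISO 8601 durations (PT43S, PT27M37S, ...) need converting. The sketch below is one way to do that with Pandas and a small regular expression; it only covers the day/hour/minute/second subset of ISO 8601 seen in these durations (the third-party `isodate` package's `parse_duration` is a fuller alternative), and the helper names and the `durationSecs` and `tagCount` columns are assumptions.

```python
import re

import pandas as pd

# Matches the ISO 8601 duration subset seen here, e.g. PT43S, PT27M37S, P1DT2H3M4S
_ISO_DURATION = re.compile(
    r'^P(?:(?P<days>\d+)D)?'
    r'(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$'
)


def iso_duration_to_seconds(duration):
    """Convert an ISO 8601 duration string such as 'PT27M37S' to whole seconds."""
    if duration is None:
        return None
    match = _ISO_DURATION.match(duration)
    if match is None:
        raise ValueError(f'Unsupported duration: {duration!r}')
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return parts['days'] * 86400 + parts['hours'] * 3600 + parts['minutes'] * 60 + parts['seconds']


def clean_video_df(video_df):
    """Return a copy of video_df with numeric counts, parsed dates, and derived columns."""
    df = video_df.copy()
    count_cols = ['viewCount', 'likeCount', 'favoriteCount', 'commentCount']
    # errors='coerce' turns missing counts (e.g., hidden likes) into NaN instead of raising
    df[count_cols] = df[count_cols].apply(pd.to_numeric, errors='coerce')
    df['publishedAt'] = pd.to_datetime(df['publishedAt'], utc=True)
    df['durationSecs'] = df['duration'].map(iso_duration_to_seconds)
    # tags is None for videos without tags
    df['tagCount'] = df['tags'].map(lambda tags: len(tags) if isinstance(tags, list) else 0)
    return df
```

With a numeric `durationSecs`, short uploads such as "This Ball Is Impossible To Hit" (43 seconds) can be separated from long-form videos like "How to Escape a Police Sniffing Dog" (1657 seconds) when comparing views.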
