## Data Scraping from YouTube using Data API

In [1]:
import pandas as pd
import requests
import json

### Objective
#### For a given YouTube channel ID, return the fields listed below for each video on that channel.
#### The final data should contain:
        1. video_id
        2. channel_id
        3. published_date 
        4. video_title 
        5. video_description 
        6. likes 
        7. dislikes 
        8. views
        9. comment_count
        
#### Note: the data must be collected legally, so we use the official Data API rather than scraping web pages.

In [2]:
# The final data will be stored in this DataFrame
data_df = pd.DataFrame(columns=['video_id','channel_id','published_date',
                             'video_title','video_description',
                             'likes','dislikes','views','comment_count'])
data_df.head()

Unnamed: 0,video_id,channel_id,published_date,video_title,video_description,likes,dislikes,views,comment_count






### To access this API
#### 1. We need an API key (free of cost)
(Go to: https://console.developers.google.com/apis/)


In [None]:
api_key = 'AIzaSyCI9-pNO5gYGPT8u2VUY709ALdD_2GX-5s' #Replace with your own API KEY 
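
A key pasted into a notebook cell is easy to leak when the notebook is shared. As a minimal sketch, the key could be read from an environment variable instead. The variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are assumptions made for illustration; they are not part of the original notebook.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical choice; use whatever name you export.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With the variable exported in the shell that starts Jupyter, the cell above could become `api_key = load_api_key()`.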

#### 2. We need to call the API (as per our needs)

(Go to: https://developers.google.com/youtube/v3)

In [None]:
# First, check the API response for a channel ID
channel_Id = 'UCpVm7bg6pXKo1Pr6k5kxG9A'
url = f"https://www.googleapis.com/youtube/v3/channels?part=statistics&key={api_key}&id={channel_Id}"
print(url)

https://www.googleapis.com/youtube/v3/channels?part=statistics&key=AIzaSyCI9-pNO5gYGPT8u2VUY709ALdD_2GX-5s&id=UCpVm7bg6pXKo1Pr6k5kxG9A
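
The URL above is assembled by hand with an f-string. A small sketch of building the same kind of URL from a dictionary of parameters, so that values are percent-encoded automatically, might look like this. The helper `build_url` and the constant `API_BASE` are names introduced here for illustration, and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"


def build_url(resource, **params):
    """Build a request URL for a Data API resource such as 'channels' or 'videos'."""
    # urlencode percent-encodes each value and joins the pairs with '&'.
    return f"{API_BASE}/{resource}?{urlencode(params)}"


example = build_url("channels", part="statistics", key="YOUR_API_KEY",
                    id="UCpVm7bg6pXKo1Pr6k5kxG9A")
print(example)
```

An alternative with the same effect is to pass the dictionary directly to `requests.get(url, params=...)` and let `requests` do the encoding.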


In [None]:
response_data = requests.get(url)
json_data = json.loads(response_data.text)
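
The cell above parses the response without checking whether the request succeeded. A hedged sketch of a guard follows; it assumes that a failed request returns a JSON body with a top-level `error` object holding `code` and `message`, which is the shape Google APIs generally use. The helper name `check_payload` and the sample payloads are made up for illustration.

```python
def check_payload(payload):
    """Return payload['items'], or raise if the API reported an error."""
    if "error" in payload:
        err = payload["error"]
        raise RuntimeError(f"API error {err.get('code')}: {err.get('message')}")
    # A successful list response carries its results under 'items'.
    return payload.get("items", [])


# Illustrative payloads (not real API output).
ok_payload = {"kind": "youtube#channelListResponse", "items": [{"id": "UC123"}]}
bad_payload = {"error": {"code": 403, "message": "quota exceeded"}}

print(check_payload(ok_payload))
```

Calling `check_payload(json_data)` right after parsing would surface an invalid key or an exhausted quota as a clear exception instead of a later `KeyError`.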

In [None]:
# Now check the API response for a video ID
video_Id = "RYJX_ovILjk"
url = f"https://www.googleapis.com/youtube/v3/videos?part=statistics,snippet&key={api_key}&id={video_Id}"
print(url)

https://www.googleapis.com/youtube/v3/videos?part=statistics,snippet&key=AIzaSyCI9-pNO5gYGPT8u2VUY709ALdD_2GX-5s&id=RYJX_ovILjk


In [None]:
response_data = requests.get(url)
json_data = json.loads(response_data.text)

In [None]:
json_data

{'kind': 'youtube#videoListResponse',
 'etag': '1SKocShdaa-Qw1eELVDMH0RLgyU',
 'items': [{'kind': 'youtube#video',
   'etag': 'h-xzadUS0Wey5mTzDNEBz-JI2wU',
   'id': 'RYJX_ovILjk',
   'snippet': {'publishedAt': '2020-08-17T15:00:11Z',
    'channelId': 'UCpVm7bg6pXKo1Pr6k5kxG9A',
    'title': 'A Case of Mistaken Identity | Shark vs Surfer',
    'description': "A surfer off the coast of Oahu was attacked by a tiger shark in 2017, but why did this attack occur? Professor Stephen Kajiura explains how it may have been a case of mistaken identity.\n➡ Subscribe: http://bit.ly/NatGeoSubscribe\n➡ Get more SharkFest: https://on.natgeo.com/2kISTAt\n\nAbout SharkFest:\nPut on your wetsuit to swim with the sharpest of the sea. This is the one-stop-show-shop for all things shark. No fuss, no muss, just killer episodes.\n\nAbout National Geographic:\nNational Geographic is the world's premium destination for science, exploration, and adventure. Through their world-class scientists, photographers, jou
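
The response nests the fields we need under `snippet` and `statistics`. As a sketch, one element of `items` could be flattened into the nine target fields as shown below. The helper name `flatten_video` is an assumption; the sample reuses the snippet values printed above, while the statistics counts are placeholders, not real figures.

```python
def flatten_video(item):
    """Turn one element of a videos response's `items` list into a flat dict."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    return {
        "video_id": item.get("id"),
        "channel_id": snippet.get("channelId"),
        "published_date": snippet.get("publishedAt"),
        "video_title": snippet.get("title"),
        "video_description": snippet.get("description"),
        # Any count missing from the response becomes None.
        "likes": stats.get("likeCount"),
        "dislikes": stats.get("dislikeCount"),
        "views": stats.get("viewCount"),
        "comment_count": stats.get("commentCount"),
    }


sample_item = {
    "id": "RYJX_ovILjk",
    "snippet": {"publishedAt": "2020-08-17T15:00:11Z",
                "channelId": "UCpVm7bg6pXKo1Pr6k5kxG9A",
                "title": "A Case of Mistaken Identity | Shark vs Surfer"},
    # Placeholder counts for illustration only, not real figures.
    "statistics": {"viewCount": "100", "likeCount": "10"},
}
row = flatten_video(sample_item)
print(row["video_title"], row["dislikes"])
```

Using `.get()` throughout means a video with a missing field still produces a complete row, with `None` in the gaps.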

### Now let's work on our objective

In [None]:
url = f"https://www.googleapis.com/youtube/v3/search?key={api_key}&part=snippet&channelId={channel_Id}&maxResults=50"
print(url)
response_data = requests.get(url)
json_data = json.loads(response_data.text)

https://www.googleapis.com/youtube/v3/search?key=AIzaSyCI9-pNO5gYGPT8u2VUY709ALdD_2GX-5s&part=snippet&channelId=UCpVm7bg6pXKo1Pr6k5kxG9A&maxResults=50


In [None]:
json_data

{'kind': 'youtube#searchListResponse',
 'etag': 'wFoGjqRhgqFTmhL3JukwB7R0LU8',
 'nextPageToken': 'CDIQAA',
 'regionCode': 'IN',
 'pageInfo': {'totalResults': 9778, 'resultsPerPage': 50},
 'items': [{'kind': 'youtube#searchResult',
   'etag': 'xC_-rVaYCFv38fKVi6kUFc0l_a4',
   'id': {'kind': 'youtube#video', 'videoId': 'BL4dnvBytLA'},
   'snippet': {'publishedAt': '2018-02-12T16:00:01Z',
    'channelId': 'UCpVm7bg6pXKo1Pr6k5kxG9A',
    'title': 'Behind-the-Scenes: See How Elon Musk Celebrated the Falcon Heavy Launch | National Geographic',
    'description': 'This exclusive behind-the-scenes clip follows the SpaceX CEO and his team as they witness and celebrate the first launch of Falcon Heavy. ➡ Subscribe: ...',
    'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/BL4dnvBytLA/default.jpg',
      'width': 120,
      'height': 90},
     'medium': {'url': 'https://i.ytimg.com/vi/BL4dnvBytLA/mqdefault.jpg',
      'width': 320,
      'height': 180},
     'high': {'url': 'https://i.y

#### First, we need to collect the video IDs

In [None]:
limit = 5  # maximum number of result pages to fetch (50 results per page)
video_Ids = []
nextPageToken = ""  # empty for the first request
for i in range(limit):
    # type=video restricts results to videos; channel and playlist results have no 'videoId'
    url = f"https://www.googleapis.com/youtube/v3/search?key={api_key}&part=snippet&channelId={channel_Id}&type=video&maxResults=50&pageToken={nextPageToken}"
    data = json.loads(requests.get(url).text)
    for item in data.get('items', []):
        video_Ids.append(item['id']['videoId'])
    nextPageToken = data.get('nextPageToken')
    if not nextPageToken:  # no further pages
        break
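
The paging logic can also be separated from the HTTP call, which makes it testable without network access. The sketch below takes a caller-supplied function that returns one parsed search response per page token. The names `iter_video_ids` and `fetch_page` are introduced here for illustration, and the two fake pages are made-up data.

```python
def iter_video_ids(fetch_page, max_pages=5):
    """Yield video IDs page by page.

    fetch_page(token) must return one parsed search response (a dict);
    token is None for the first page.
    """
    token = None
    for _ in range(max_pages):
        page = fetch_page(token)
        for item in page.get("items", []):
            video_id = item.get("id", {}).get("videoId")
            if video_id:  # skip channel and playlist results
                yield video_id
        token = page.get("nextPageToken")
        if not token:  # last page reached
            break


# Offline demonstration with two fake pages (made-up data).
fake_pages = {
    None: {"items": [{"id": {"kind": "youtube#video", "videoId": "a1"}},
                     {"id": {"kind": "youtube#channel", "channelId": "c1"}}],
           "nextPageToken": "T2"},
    "T2": {"items": [{"id": {"kind": "youtube#video", "videoId": "b2"}}]},
}
print(list(iter_video_ids(fake_pages.get)))  # ['a1', 'b2']
```

In the notebook, `fetch_page` would wrap the `requests.get` call for the search URL and add the `pageToken` parameter only when a token is present.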

#### Now record the required fields for each video

In [None]:
for i, video_Id in enumerate(video_Ids):
    url = f"https://www.googleapis.com/youtube/v3/videos?part=statistics,snippet&key={api_key}&id={video_Id}"
    data = json.loads(requests.get(url).text)
    if not data.get('items'):  # video removed or request failed
        continue
    snippet = data['items'][0]['snippet']
    statistics = data['items'][0]['statistics']
    channel_id = snippet['channelId']
    published_date = snippet['publishedAt']
    video_title = snippet['title']
    video_description = snippet['description']
    # Counts can be absent (e.g. likes or comments disabled; public responses
    # no longer include dislikeCount), so .get() returns None instead of raising KeyError.
    likes = statistics.get('likeCount')
    dislikes = statistics.get('dislikeCount')
    views = statistics.get('viewCount')
    comment_count = statistics.get('commentCount')
    row = [video_Id, channel_id, published_date,
           video_title, video_description,
           likes, dislikes, views, comment_count]
    data_df.loc[i] = row
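
The loop above sends one request per video. The `id` parameter of the `videos` endpoint also accepts a comma-separated list of IDs, so the IDs can be grouped to cut the number of requests. The sketch below assumes a batch size of 50 per request, which is worth confirming against the API reference. The helper names `chunked` and `batched_id_params` are introduced here for illustration.

```python
def chunked(ids, size=50):
    """Split a list of IDs into consecutive lists of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def batched_id_params(ids, size=50):
    """Return one comma-joined `id` value per batch, ready for the videos endpoint."""
    return [",".join(batch) for batch in chunked(ids, size)]


# Three IDs taken from the results table, batched two at a time for the demo.
print(batched_id_params(["BL4dnvBytLA", "7tKZB2k14iY", "tnvbVIcZZHc"], size=2))
```

With batching, each response holds several entries under `items`, so the extraction step has to iterate over `data['items']` rather than reading only `data['items'][0]`.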

In [None]:
data_df

Unnamed: 0,video_id,channel_id,published_date,video_title,video_description,likes,dislikes,views,comment_count
0,BL4dnvBytLA,UCpVm7bg6pXKo1Pr6k5kxG9A,2018-02-12T16:00:01Z,Behind-the-Scenes: See How Elon Musk Celebrate...,This exclusive behind-the-scenes clip follows ...,78419,901,2804302,2165
1,7tKZB2k14iY,UCpVm7bg6pXKo1Pr6k5kxG9A,2015-06-27T19:00:01Z,Jealousy Bites | Brain Games,There’s a clear distinction between a human br...,101602,2175,5369026,6948
2,tnvbVIcZZHc,UCpVm7bg6pXKo1Pr6k5kxG9A,2012-01-09T14:02:33Z,Killer Cuckoo Catfish | National Geographic,A mother cichlid fish protects her young by ke...,27191,877,3863697,2096
3,ZyYqyYAKGC0,UCpVm7bg6pXKo1Pr6k5kxG9A,2017-03-24T17:00:07Z,Time Is But a Stubborn Illusion - Sneak Peek |...,Watch an exclusive sneak peek from the first e...,114180,1849,6465115,4580
4,_TuGK7IS8sY,UCpVm7bg6pXKo1Pr6k5kxG9A,2013-05-07T16:07:36Z,Apollo Robbins on What You Don't Know | Brain ...,Deception specialist Apollo Robbins has many w...,3444,54,536464,119
...,...,...,...,...,...,...,...,...,...
245,ITlo2ZBJOWU,UCpVm7bg6pXKo1Pr6k5kxG9A,2019-05-15T12:00:03Z,Inside the Dark World of Captive Wildlife Tour...,"Cages, speed-breeding, fear-based training. Bl...",21983,769,706823,2295
246,AGWiZLy0YuI,UCpVm7bg6pXKo1Pr6k5kxG9A,2014-05-27T14:40:18Z,Dean Potter BASE Jumps With His Dog | National...,Filmmaker and adventurer Dean Potter doesn't a...,15917,1600,2290474,2438
247,aRIYP-HWqBU,UCpVm7bg6pXKo1Pr6k5kxG9A,2018-03-24T17:00:00Z,Take a Look inside China’s Giant Communal Home...,Tucked in the rolling subtropical mountains of...,5009,187,291619,255
248,ZRN-Lu-m-oY,UCpVm7bg6pXKo1Pr6k5kxG9A,2017-06-15T12:00:02Z,Meet a Beautiful Beetle That Loves to Eat Poop...,Watch an entomologist search beneath piles of ...,2449,61,150896,245
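
The API delivers counts as strings and dates as ISO 8601 text, so the columns need converting before any numeric or time-based analysis. A minimal standard-library sketch follows; it assumes timestamps always follow the `YYYY-MM-DDTHH:MM:SSZ` form seen in the table above. The helper names `parse_published` and `to_int` are introduced here for illustration.

```python
from datetime import datetime, timezone


def parse_published(value):
    """Parse a timestamp like '2018-02-12T16:00:01Z' into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def to_int(value):
    """Convert a count string to int; return None when the field is missing."""
    return int(value) if value is not None else None


# Values from the first row of the table above.
published = parse_published("2018-02-12T16:00:01Z")
likes = to_int("78419")
print(published.year, likes)
```

The same conversions could be applied to whole DataFrame columns, for example with `data_df['likes'].map(to_int)`.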
