## Scraping YouTube Data using YouTube API

### Install Required Libraries

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

### Import Modules

In [1]:
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

import urllib.parse as p
import re
import os
import pickle

SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]

### Authenticate YouTube API

In [2]:
def youtube_authenticate():
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"
    api_service_name = "youtube"
    api_version = "v3"
    client_secrets_file = "credentials.json"
    creds = None
    # the file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first time
    if os.path.exists("token.pickle"):
        with open("token.pickle", "rb") as token:
            creds = pickle.load(token)
    # if there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(client_secrets_file, SCOPES)
            creds = flow.run_local_server(port=0)
        # save the credentials for the next run
        with open("token.pickle", "wb") as token:
            pickle.dump(creds, token)

    return build(api_service_name, api_version, credentials=creds)

# authenticate to YouTube API
youtube = youtube_authenticate()

#### Extract Video ID

In [3]:
def get_video_id_by_url(url):
    """
    Return the Video ID from the video `url`
    """
    # split URL parts
    parsed_url = p.urlparse(url)
    # get the video ID by parsing the query of the URL
    video_id = p.parse_qs(parsed_url.query).get("v")
    if video_id:
        return video_id[0]
    else:
        raise Exception(f"Wasn't able to parse video URL: {url}")
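
The function above only reads the `v` query parameter, so it covers standard watch links. As a rough sketch, the variant below also accepts short `youtu.be` links and the `/embed/` and `/shorts/` path forms. The name `extract_video_id` and the list of URL shapes are assumptions made for illustration, not part of the original notebook.

```python
import urllib.parse as p


def extract_video_id(url):
    """Return the video ID from a few common YouTube URL shapes, or None."""
    parsed = p.urlparse(url)
    host = (parsed.hostname or "").lower()
    # short links look like https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # standard watch links carry the ID in the `v` query parameter
        ids = p.parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
        # path-based links: /embed/<id> and /shorts/<id>
        parts = [s for s in parsed.path.split("/") if s]
        if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            return parts[1]
    return None


print(extract_video_id("https://www.youtube.com/watch?v=jNQXAC9IVRw&ab_channel=jawed"))
print(extract_video_id("https://youtu.be/jNQXAC9IVRw"))
```

Returning `None` instead of raising leaves the decision about unrecognized URLs to the caller.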

#### Get Video Details

The function below takes a YouTube service object (returned by youtube_authenticate()), along with any keyword arguments accepted by the API, and returns the API response for a specific video.

We request the **snippet, contentDetails and statistics** parts, as these hold the most useful fields of the API response.

In [4]:
def get_video_details(youtube, **kwargs):
    return youtube.videos().list(
        part="snippet,contentDetails,statistics",
        **kwargs
    ).execute()
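
The `id` parameter of videos().list is documented as a comma-separated list of video IDs, so several videos can be requested in one call instead of one call per video. The sketch below groups IDs into comma-joined batches. The helper name `chunk_ids` and the batch size of 50 are assumptions, so check the limit against the API reference before relying on it.

```python
def chunk_ids(video_ids, size=50):
    """Yield comma-joined groups of at most `size` video IDs."""
    for start in range(0, len(video_ids), size):
        yield ",".join(video_ids[start:start + size])


batches = list(chunk_ids(["id1", "id2", "id3"], size=2))
print(batches)  # ['id1,id2', 'id3']

# Each batch could then be passed to the function defined above, e.g.:
#   response = get_video_details(youtube, id=batch)
# and the videos read from response["items"].
```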

Next, let's define a function that takes a response returned by the above get_video_details() function and prints the most useful information about a video.

In [5]:
def print_video_infos(video_response):
    items = video_response.get("items")[0]
    # get the snippet, statistics & content details from the video response
    snippet         = items["snippet"]
    statistics      = items["statistics"]
    content_details = items["contentDetails"]
    # get infos from the snippet
    channel_title = snippet["channelTitle"]
    title         = snippet["title"]
    description   = snippet["description"]
    publish_time  = snippet["publishedAt"]
    # get stats infos
    comment_count = statistics["commentCount"]
    like_count    = statistics["likeCount"]
    view_count    = statistics["viewCount"]
    # get duration from content details
    duration = content_details["duration"]
    # duration in the form of something like 'PT5H50M15S'
    # parsing it to be something like '5:50:15'
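    # raw string so the \d escapes reach the regex engine; the seconds group
    # is optional because durations such as 'PT5M' carry no seconds component
    parsed_duration = re.search(r"PT(\d+H)?(\d+M)?(\d+S)?", duration).groups()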
    duration_str = ""
    for d in parsed_duration:
        if d:
            duration_str += f"{d[:-1]}:"
    duration_str = duration_str.strip(":")
    print(f"""\
    Title: {title}
    Description: {description}
    Channel Title: {channel_title}
    Publish time: {publish_time}
    Duration: {duration_str}
    Number of comments: {comment_count}
    Number of likes: {like_count}
    Number of views: {view_count}
    """)

#### Finally, let's use these functions to extract information from a demo video

In [6]:
video_url = "https://www.youtube.com/watch?v=jNQXAC9IVRw&ab_channel=jawed"
# parse video ID from URL
video_id = get_video_id_by_url(video_url)
# make API call to get video info
response = get_video_details(youtube, id=video_id)
# print extracted video infos
print_video_infos(response)

    Title: Me at the zoo
    Description: 
    Channel Title: jawed
    Publish time: 2005-04-24T03:31:52Z
    Duration: 19
    Number of comments: 11257889
    Number of likes: 12393490
    Number of views: 241604046
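
The loop in print_video_infos() drops any duration component that is absent, so 'PT19S' prints as '19' and a value such as 'PT1H5S' would print as '1:5'. A minimal sketch of a zero-padded alternative follows. The function name `format_duration` is hypothetical, and support for a day component (for example 'P1DT2H') is an assumption about how very long videos are reported.

```python
import re

# optional days, then an optional time part with hours, minutes and seconds
_DURATION_RE = re.compile(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?")


def format_duration(iso_duration):
    """Convert an ISO 8601 duration such as 'PT5H50M15S' to 'H:MM:SS'."""
    match = _DURATION_RE.fullmatch(iso_duration)
    if not match:
        raise ValueError(f"Unrecognized duration: {iso_duration!r}")
    # missing components count as zero
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    hours += days * 24
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(format_duration("PT5H50M15S"))  # 5:50:15
print(format_duration("PT19S"))       # 0:00:19
```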
    


### Searching By Keyword

Searching with the YouTube API is straightforward: we pass the **q** parameter, which takes the same query we would type into the YouTube search bar.

This time we only need the snippet part, and we call search() instead of the videos() used in get_video_details().

In [7]:
def search(youtube, **kwargs):
    return youtube.search().list(
        part="snippet",
        **kwargs
    ).execute()

For example, let's search for "python" and limit the results to two:

In [8]:
# search for the query 'python' and retrieve 2 video results only
# (type="video" keeps out channels and playlists, which have no videoId)
response = search(youtube, q="python", maxResults=2, type="video")
items = response.get("items")
for item in items:
    # get the video ID
    video_id = item["id"]["videoId"]
    # get the video details
    video_response = get_video_details(youtube, id=video_id)
    # print the video details
    print_video_infos(video_response)
    print("="*50)

    Title: Python Tutorial - Python Full Course for Beginners
    Description: Python tutorial - Python full course for beginners - Go from Zero to Hero with Python (includes machine learning & web development projects).
🔥 Want to master Python? Get my Python mastery course: http://bit.ly/35BLHHP
👍 Subscribe for more Python tutorials like this: https://goo.gl/6PYaGF

👉 Watch the new edition: https://youtu.be/kqtD5dpn9C8

📕 Get my FREE Python cheat sheet: http://bit.ly/2Gp80s6

Want to learn more from me? 

Courses: https://codewithmosh.com
Twitter: https://twitter.com/moshhamedani
Facebook: https://www.facebook.com/programmingwithmosh/
Blog: http://programmingwithmosh.com

#Python, #MachineLearning, #WebDevelopment

🔗 Supplementary Materials (Spreadsheet): https://bit.ly/3cb2YNo

📔 Python Exercises for Beginners: https://goo.gl/1XnQB1

⭐ My Favorite Python Books
- Python Crash Course: https://amzn.to/2GqMdjG
- Automate the Boring Stuff with Python: https://amzn.to/2N71d6S
- A Smarter W

You can also pass the order parameter to search() to sort the results; it accepts 'date', 'rating', 'viewCount', 'relevance' (the default), 'title', and 'videoCount'.

Another useful parameter is type, which can be 'channel', 'playlist' or 'video'; by default, all three are returned.
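
As an illustration of these two parameters, the sketch below builds the keyword arguments for search() and rejects unknown values before any request is made. The helper name `build_search_params` and its defaults are assumptions. Joining several types with commas follows the API's description of `type` as a comma-separated list.

```python
VALID_ORDERS = {"date", "rating", "relevance", "title", "videoCount", "viewCount"}
VALID_TYPES = {"channel", "playlist", "video"}


def build_search_params(query, order="relevance", types=("video",), max_results=5):
    """Return keyword arguments for search(), rejecting unknown values early."""
    if order not in VALID_ORDERS:
        raise ValueError(f"Unknown order: {order!r}")
    unknown = set(types) - VALID_TYPES
    if unknown:
        raise ValueError(f"Unknown type(s): {sorted(unknown)}")
    return {
        "q": query,
        "order": order,
        "type": ",".join(types),
        "maxResults": max_results,
    }


params = build_search_params("python", order="viewCount", types=("video", "playlist"))
print(params)

# With an authenticated service object, the call would look like:
#   response = search(youtube, **params)
```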

### Getting YouTube Channel Details

This section takes a channel URL and extracts channel information using the YouTube API.

First, we need helper functions to parse the channel URL:

In [9]:
def parse_channel_url(url):
    """
    This function takes channel `url` to check whether it includes a
    channel ID, user ID or channel name
    """
    path = p.urlparse(url).path
    id = path.split("/")[-1]
    if "/c/" in path:
        return "c", id
    elif "/channel/" in path:
        return "channel", id
    elif "/user/" in path:
        return "user", id
    # fail loudly instead of returning None, which the caller cannot unpack
    raise ValueError(f"Unrecognized channel URL: {url}")

def get_channel_id_by_url(youtube, url):
    """
    Returns the channel ID for a given channel `url`. The URL is parsed
    into a `method` and an `id`:
    - `method` (str): can be 'c', 'channel', 'user'
    - `id` (str): if method is 'c', then `id` is display name
        if method is 'channel', then it's channel id
        if method is 'user', then it's username
    """
    # parse the channel URL
    method, id = parse_channel_url(url)
    if method == "channel":
        # if it's a channel ID, then just return it
        return id
    elif method == "user":
        # if it's a user ID, make a request to get the channel ID
        response = get_channel_details(youtube, forUsername=id)
        items = response.get("items")
        if items:
            channel_id = items[0].get("id")
            return channel_id
    elif method == "c":
        # if it's a channel name, search for the channel using the name
        # may be inaccurate
        response = search(youtube, q=id, maxResults=1)
        items = response.get("items")
        if items:
            channel_id = items[0]["snippet"]["channelId"]
            return channel_id
    raise Exception(f"Cannot find ID:{id} with {method} method")
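
parse_channel_url() reads the identifier from the last path segment, so a URL with a trailing slash or a suffix such as `/videos` yields the wrong value. The sketch below reads the segment that follows the marker instead. It also recognizes `/@handle` URLs, which is an extension beyond the original. The name `classify_channel_url` and the "handle" kind are hypothetical. Resolving a handle to a channel ID would need a further API call (channels().list is understood to accept a `forHandle` parameter, but treat that as an assumption to verify).

```python
import urllib.parse as p


def classify_channel_url(url):
    """Return a (kind, value) pair for a channel URL, or raise ValueError."""
    # drop empty segments so leading and trailing slashes do not matter
    segments = [s for s in p.urlparse(url).path.split("/") if s]
    if segments and segments[0].startswith("@"):
        return "handle", segments[0]
    if len(segments) >= 2 and segments[0] in ("c", "channel", "user"):
        return segments[0], segments[1]
    raise ValueError(f"Unrecognized channel URL: {url}")


print(classify_channel_url("https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ"))
print(classify_channel_url("https://www.youtube.com/c/Freecodecamp/videos"))
```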

Now that we can parse the channel URL, let's define the functions that call the YouTube API:

In [10]:
def get_channel_videos(youtube, **kwargs):
    return youtube.search().list(
        **kwargs
    ).execute()


def get_channel_details(youtube, **kwargs):
    return youtube.channels().list(
        part="statistics,snippet,contentDetails",
        **kwargs
    ).execute()

In [11]:
channel_url = "https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ"
# get the channel ID from the URL
channel_id = get_channel_id_by_url(youtube, channel_url)
# get the channel details
response = get_channel_details(youtube, id=channel_id)
# extract channel infos
snippet = response["items"][0]["snippet"]
statistics = response["items"][0]["statistics"]
channel_country = snippet.get("country", "N/A")  # not every channel sets a country
channel_description = snippet["description"]
channel_creation_date = snippet["publishedAt"]
channel_title = snippet["title"]
channel_subscriber_count = statistics["subscriberCount"]
channel_video_count = statistics["videoCount"]
channel_view_count  = statistics["viewCount"]
print(f"""
Title: {channel_title}
Published At: {channel_creation_date}
Description: {channel_description}
Country: {channel_country}
Number of videos: {channel_video_count}
Number of subscribers: {channel_subscriber_count}
Total views: {channel_view_count}
""")
# the following is grabbing channel videos
# number of pages you want to get
n_pages = 2
# counting number of videos grabbed
n_videos = 0
next_page_token = None
for i in range(n_pages):
    params = {
        'part': 'snippet',
        'q': '',
        'channelId': channel_id,
        'type': 'video',
    }
    if next_page_token:
        params['pageToken'] = next_page_token
    res = get_channel_videos(youtube, **params)
    channel_videos = res.get("items")
    for video in channel_videos:
        n_videos += 1
        video_id = video["id"]["videoId"]
        # easily construct video URL by its ID
        video_url = f"https://www.youtube.com/watch?v={video_id}"
        video_response = get_video_details(youtube, id=video_id)
        print(f"================Video #{n_videos}================")
        # print the video details
        print_video_infos(video_response)
        print(f"Video URL: {video_url}")
        print("="*40)
    print("*"*100)
    # if there is a next page, then add it to our parameters
    # to proceed to the next page; otherwise stop, so the same
    # page is not requested again
    if "nextPageToken" in res:
        next_page_token = res["nextPageToken"]
    else:
        break


Title: freeCodeCamp.org
Published At: 2014-12-16T21:18:48Z
Description: Learn to code for free.
Country: US
Number of videos: 1336
Number of subscribers: 6060000
Total views: 404538542

    Title: Intro to Object Oriented Programming - Crash Course
    Description: Learn the basics of object-oriented programming all in one video.

✏️ Course created by Steven from NullPointer Exception. Check out their channel: https://www.youtube.com/channel/UCmWDlvMYYEbW42B8JyxFBcA

🎥 Introduction to Programming: https://www.youtube.com/watch?v=zOjov-2OZ0E

⭐️ Course Contents ⭐️
⌨️ (00:00) Introduction
⌨️ (07:37) Encapsulation
⌨️ (12:45) Abstraction
⌨️ (17:49) Inheritance
⌨️ (22:47) Polymorphism

⭐️ Sources ⭐️
🔗 https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
🔗 https://stackify.com/oop-concept-for-beginners-what-is-encapsulation/#:~:text=Encapsulation%20is%20one%20of%20the,an%20object%20from%20the%20outside
🔗 https://press.rebus.community/programmingfundamentals/chapter/encap

    Title: HTML Full Course - Build a Website Tutorial
    Description: Learn the basics of HTML5 and web development in this awesome course for beginners. 

Want more from Mike? He's starting a coding RPG/Bootcamp - https://simulator.dev/

⭐️ Contents ⭐️
⌨️ (0:00:00) Introduction
⌨️ (0:01:54) Choosing a Text Editor
⌨️ (0:08:13) Creating an HTML file
⌨️ (0:20:31) Basic Tags
⌨️ (0:36:47) Comments
⌨️ (0:42:13) Style & Color
⌨️ (0:48:07) Formatting a Page
⌨️ (0:59:16) Links
⌨️ (1:07:33) Images
⌨️ (1:16:12) Videos & Youtube iFrames
⌨️ (1:23:00) Lists
⌨️ (1:28:53) Tables
⌨️ (1:37:21) Divs & Spans
⌨️ (1:44:54) Input & Forms
⌨️ (1:53:44) iFrames
⌨️ (1:57:21) Meta Tags

Course developed by Mike Dane. Check out his YouTube channel for more great programming courses: https://www.youtube.com/channel/UCvmINlrza7JHB1zkIOuXEbw

🐦Follow Mike on Twitter - https://twitter.com/mike_dane

🔗The Mike's website: https://www.mikedane.com/

⭐️Other full courses by Mike Dane on our channel ⭐️
💻Python: https://

****************************************************************************************************


We first get the channel ID from the URL, and then we make an API call to get channel details and print them.

After that, we specify the number of pages of videos we want to extract. The API returns five results per page by default (the maximum is 50), and we can change that by passing the maxResults parameter.

We iterate on each video and make an API call to get various information about the video, and we use our predefined print_video_infos() to print the video information.
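
The paging logic described above can be factored into a small generator. This is a sketch under assumptions: `iter_pages` is a hypothetical helper, and `fake_fetch` is a stand-in for a real API call so the example runs without credentials.

```python
def iter_pages(fetch_page, max_pages, **params):
    """Yield up to `max_pages` responses, following nextPageToken between calls."""
    params = dict(params)
    for _ in range(max_pages):
        response = fetch_page(**params)
        yield response
        token = response.get("nextPageToken")
        if not token:
            # no further pages, so stop instead of repeating the last request
            break
        params["pageToken"] = token


def fake_fetch(pageToken=None, **_):
    """Stand-in for an API call: two pages, the second one being the last."""
    pages = {
        None: {"items": ["a", "b"], "nextPageToken": "p2"},
        "p2": {"items": ["c"]},
    }
    return pages[pageToken]


for page in iter_pages(fake_fetch, max_pages=5):
    print(page["items"])
# ['a', 'b']
# ['c']
```

With a real service object, `fetch_page` could be something like `lambda **kw: get_channel_videos(youtube, **kw)`, called with `part="snippet"`, `channelId=channel_id` and `type="video"`.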

### Extracting YouTube Comments

The YouTube API also lets us extract comments, which is useful if you need comment text for a text classification project or something similar.

The below function takes care of making an API call to commentThreads():



In [12]:
def get_comments(youtube, **kwargs):
    return youtube.commentThreads().list(
        part="snippet",
        **kwargs
    ).execute()

#### The below code extracts comments from a YouTube video:

In [13]:
# URL can be a channel or a video, to extract comments
url = "https://www.youtube.com/watch?v=jNQXAC9IVRw&ab_channel=jawed"
if "watch" in url:
    # that's a video
    video_id = get_video_id_by_url(url)
    params = {
        'videoId': video_id, 
        'maxResults': 2,
        'order': 'relevance', # default is 'time' (newest)
    }
else:
    # should be a channel
    channel_id = get_channel_id_by_url(youtube, url)
    params = {
        'allThreadsRelatedToChannelId': channel_id, 
        'maxResults': 2,
        'order': 'relevance', # default is 'time' (newest)
    }
# get the first 2 pages (2 API requests)
n_pages = 2
for i in range(n_pages):
    # make API call to get one page of comment threads for the video or channel
    response = get_comments(youtube, **params)
    items = response.get("items")
    # if items is empty, break out of the loop
    if not items:
        break
    for item in items:
        comment = item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        updated_at = item["snippet"]["topLevelComment"]["snippet"]["updatedAt"]
        like_count = item["snippet"]["topLevelComment"]["snippet"]["likeCount"]
        comment_id = item["snippet"]["topLevelComment"]["id"]
        print(f"""\
        Comment: {comment}
        Likes: {like_count}
        Updated At: {updated_at}
        ==================================\
        """)
    if "nextPageToken" in response:
        # if there is a next page
        # add next page token to the params we pass to the function
        params["pageToken"] =  response["nextPageToken"]
    else:
        # must be end of comments!!!!
        break
    print("*"*70)

        Comment: We&#39;re so honored that the first ever YouTube video was filmed here!
        Likes: 2581700
        Updated At: 2020-02-17T18:58:15Z
        Comment: Happy Anniversary to the first YouTube video! I bet you had no idea how momentous this content would be &amp; how culturally significant this platform would become! Thank you for creating something that helped entertain, inform &amp; connected the world since its inception in 2005! we Love you jawed 😉 👻...
        Likes: 299
        Updated At: 2022-08-07T13:30:30Z
**********************************************************************
        Comment: Feliz aniversário para o primeiro vídeo do YouTube! Aposto que você não tinha ideia de quão importante esse conteúdo seria e quão culturalmente significativo essa plataforma se tornaria! Obrigado por criar algo que ajudou a entreter, informar e conectar o mundo desde a sua criação em 2005!
        Likes: 831
        Updated At: 2022-08-08T02:27:27Z
<br>subir este primer v

You can also set the url variable to a YouTube channel URL, in which case the code passes allThreadsRelatedToChannelId instead of videoId to commentThreads().list().
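
The textDisplay field is HTML, which is why the output above contains entities such as `&#39;` and `&amp;`. For a text classification dataset it can help to flatten each comment thread into a plain record and decode those entities. The sketch below does this with the standard library. `flatten_comment_thread` is a hypothetical helper, and `sample_item` is a hand-made dictionary shaped like the fields used in the loop above, not real API output.

```python
import html


def flatten_comment_thread(item):
    """Turn one commentThreads item into a flat dict with decoded text."""
    top = item["snippet"]["topLevelComment"]
    snippet = top["snippet"]
    return {
        "comment_id": top["id"],
        # decode HTML entities; tags such as <br> are left untouched
        "text": html.unescape(snippet["textDisplay"]),
        "likes": snippet["likeCount"],
        "updated_at": snippet["updatedAt"],
    }


# hand-made sample with the same nesting as the fields read above
sample_item = {
    "snippet": {
        "topLevelComment": {
            "id": "example-id",
            "snippet": {
                "textDisplay": "We&#39;re so honored &amp; grateful",
                "likeCount": 3,
                "updatedAt": "2020-02-17T18:58:15Z",
            },
        }
    }
}

print(flatten_comment_thread(sample_item)["text"])  # We're so honored & grateful
```

Passing `textFormat="plainText"` to get_comments() is another option worth checking in the API reference, since it asks the API for plain text instead of HTML.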