# Data Extraction from YouTube API

## **PLEASE NOTE**:
**All examples in this notebook use public data from the YouTube website and are intended for educational, non-commercial use only.  
Where the data reflects people's opinions, such as YouTube comments, the content itself is not shown in this notebook.  
This tutorial only demonstrates how to access the data you need.**

In this task, the goal is to extract data from the YouTube Data API. The data can be, for example, the video titles of a channel, or the comments on a specific video together with their like and reply counts.

To access this API, we first need to:
1. Go to the Google Cloud Console.
2. Create a new project.
3. In the API Library, search for 'YouTube Data API v3' and enable it.
4. In the APIs & Services section, go to Credentials --> CREATE CREDENTIALS --> API key.
5. Copy the newly created API key and open a Jupyter notebook.
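Rather than pasting the key directly into the notebook, you can keep it in an environment variable so it does not end up in shared or saved notebooks. The sketch below assumes a variable named `YOUTUBE_API_KEY`; that name and the `load_api_key` helper are hypothetical choices for this example, not something the API requires.

```python
import os

def load_api_key(var_name='YOUTUBE_API_KEY'):
    """Return the API key stored in an environment variable (hypothetical name)."""
    api_key = os.environ.get(var_name)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key to the API
        raise RuntimeError(f'Set the {var_name} environment variable to your API key')
    return api_key

# Usage (assumes the variable was set before starting Jupyter):
# api_key = load_api_key()
```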

Let's start by retrieving the video titles of a YouTube channel, here the Allianz channel.  
**Note**: If needed, first install the client library with `pip install google-api-python-client` (it provides the `googleapiclient` module).

```python
# Import the necessary module for making API requests
from googleapiclient.discovery import build

# API key obtained from Google Cloud Console, required for authentication
api_key = 'YOUR_API_KEY_HERE'

# Create a YouTube API client using the developer key
youtube = build('youtube', 'v3', developerKey=api_key)

# Look up the channel by its legacy username; 'contentDetails' already
# contains the ID of the playlist that holds all uploaded videos
channel_request = youtube.channels().list(
    part='contentDetails',  # We want to retrieve content details
    forUsername='Allianz'   # Replace with the desired channel's username
)

# Execute the channel request and store the response
channel_response = channel_request.execute()

# 'items' is missing or empty if no channel matches the username
if not channel_response.get('items'):
    raise ValueError('No channel found for this username')

# Extract the playlist ID for the uploaded videos from the channel
uploads_playlist_id = channel_response['items'][0]['contentDetails']['relatedPlaylists']['uploads']

# Construct a request to retrieve videos from the uploads playlist
videos_request = youtube.playlistItems().list(
    part='snippet',                  # We want to retrieve video snippet data
    playlistId=uploads_playlist_id,  # Use the extracted playlist ID
    maxResults=50                    # 50 is the largest page size the API allows
)

# Execute the videos request and store the response
videos_response = videos_request.execute()

# Extract video titles from the response
video_titles = [item['snippet']['title'] for item in videos_response['items']]

print(f'Number of video titles: {len(video_titles)}')
video_titles
```

```
Number of video titles: 50

['Diversity, Equity and Inclusion at Allianz',
 'Allianz Global Benefits',
 'MoveNow Camp – Esports Edition 2023 @ Olympic Esports Week',
 'Allianz Financial Results 2Q 2023: Media Conference Call',
 'Allianz Financial Results 2Q 2023: Analyst Call',
 'Allianz Financial Results 2Q 2023',
 'Get ready for that big break',
 'Eine Minute zum Ankommen',
 'A Minute To Arrive',
 'Tomorrow - A podcast by Allianz Research: Walking the talk on green monetary policy',
 'The Squared Ball: A symbol for what women must overcome to play professional football',
 'Pass The Squared Ball',
 'Get ready for a new roommate',
 'Get ready to move',
 'Future Workout Insurance Intervals – Learning Chapter 1',
 'Future Workout Insurance Intervals – Learning Chapter 2',
 'Future Workout Insurance Intervals – Learning Chapter 3',
 'Allianz Future Workout with Thomas Röhler – Introduction',
 'Allianz Future Workout with Thomas Röhler – Starting Blocks',
 'Allianz Future Workout with Thomas Röhler – Investment',
 ...]
```
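The request above returns only the first page of results: `maxResults` is capped at 50, so a channel with more uploads needs several requests chained through `nextPageToken`. The following is a minimal sketch of that loop; `fetch_all_video_titles` is a hypothetical helper that takes the `youtube` client and `uploads_playlist_id` built above as arguments.

```python
def fetch_all_video_titles(youtube, playlist_id, page_size=50):
    """Collect the titles of every item in a playlist by following nextPageToken."""
    titles = []
    page_token = None  # No token on the first request
    while True:
        response = youtube.playlistItems().list(
            part='snippet',
            playlistId=playlist_id,
            maxResults=page_size,  # 50 is the largest page size the API accepts
            pageToken=page_token,
        ).execute()
        titles.extend(item['snippet']['title'] for item in response['items'])
        # The last page of results carries no nextPageToken
        page_token = response.get('nextPageToken')
        if not page_token:
            return titles

# Usage, with the client and playlist ID from the cell above:
# all_titles = fetch_all_video_titles(youtube, uploads_playlist_id)
# print(len(all_titles))
```

Passing the client in as an argument keeps the helper independent of how the client was created, which also makes it easy to try out with a stand-in object.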

# Extracting Comments, Likes and Replies

Pick a video on YouTube and paste its URL into the `video_url` variable in the following code.

```python
# Import necessary libraries
import time
from urllib.parse import urlparse, parse_qs

import pandas as pd
from googleapiclient.discovery import build

# Replace with your own API key
api_key = 'YOUR_API_KEY_HERE'

# Create a YouTube API client using the developer key
youtube = build('youtube', 'v3', developerKey=api_key)

# Extract the video ID from a standard watch URL or a youtu.be short link
def extract_video_id(video_url):
    parsed = urlparse(video_url)
    if parsed.hostname == 'youtu.be':
        return parsed.path.lstrip('/')
    # parse_qs ignores extra parameters such as '&t=42s'
    return parse_qs(parsed.query)['v'][0]

# Function to retrieve comments from a YouTube video
def retrieve_comments(video_id, max_comments=500):
    comments_data = []      # List to store comments and related data
    next_page_token = None  # Token for paginated API responses

    while True:
        # Construct a request to retrieve comment threads for the video
        comments_request = youtube.commentThreads().list(
            part='snippet',
            videoId=video_id,
            textFormat='plainText',
            maxResults=100,            # Maximum number of comment threads per page
            pageToken=next_page_token  # Use token for pagination
        )
        # Execute the comments request and store the response
        comments_response = comments_request.execute()

        # Iterate through comments in the response
        for item in comments_response['items']:
            thread_snippet = item['snippet']
            comment_snippet = thread_snippet['topLevelComment']['snippet']
            comment_text = comment_snippet['textDisplay']  # Text of the comment
            comment_likes = comment_snippet['likeCount']   # Number of likes on the comment

            replies_data = []  # List to store reply texts
            # Only request replies when the comment actually has some (saves quota)
            if thread_snippet['totalReplyCount'] > 0:
                reply_page_token = None
                while True:
                    reply_response = youtube.comments().list(
                        part='snippet',
                        parentId=item['id'],  # ID of the top-level comment
                        textFormat='plainText',
                        maxResults=100,       # Maximum number of replies per page
                        pageToken=reply_page_token
                    ).execute()
                    for reply_item in reply_response['items']:
                        replies_data.append(reply_item['snippet']['textDisplay'])
                    # Follow reply pages until none are left
                    reply_page_token = reply_response.get('nextPageToken')
                    if not reply_page_token:
                        break

            # Store comment data and related replies in the comments_data list
            comments_data.append({
                'comment': comment_text,
                'likes': comment_likes,
                'replies': replies_data
            })

        # Retrieve the next page token from the response
        next_page_token = comments_response.get('nextPageToken')
        # Stop when there are no more pages or at least max_comments have been collected
        if not next_page_token or len(comments_data) >= max_comments:
            break

        time.sleep(2)  # Short pause between pages to space out API requests

    return comments_data

# Main function
def main():
    video_url = 'Paste the video url here'
    video_id = extract_video_id(video_url)  # Extract video ID from the URL

    # Call the retrieve_comments function to get comments data
    comments_data = retrieve_comments(video_id)

    # Create a DataFrame using the comments data
    return pd.DataFrame(comments_data)

# Entry point of the script
if __name__ == "__main__":
    df = main()
```
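Most of the cost of retrieving comments comes from the replies: on top of one `commentThreads().list` call per page of comments, every comment whose replies are requested triggers its own `comments().list` call. The sketch below gives a rough estimate of the quota units a run consumes. It assumes each list call costs 1 unit (the cost Google's quota documentation listed for these methods at the time of writing; check the current figures) and that no comment has more than one page of replies. `estimate_quota_units` is a hypothetical helper, not part of the API.

```python
import math

def estimate_quota_units(num_comments, reply_requests, page_size=100, unit_cost=1):
    """Rough quota estimate (assumed cost: unit_cost per list call)."""
    # One commentThreads().list call per page of top-level comments
    thread_pages = math.ceil(num_comments / page_size)
    # Plus one comments().list call per comment whose replies are fetched,
    # assuming each comment's replies fit on a single page
    return (thread_pages + reply_requests) * unit_cost

# 500 comments, with replies requested for every one of them:
print(estimate_quota_units(500, 500))  # 505
```

With the default daily allocation (10,000 units at the time of writing), that worst case uses roughly 5% of a day's quota per video. A pause such as `time.sleep(2)` between pages spaces out the requests but does not lower this cost.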


The result is stored in a DataFrame called 'df', which has 3 columns: ['comment', 'likes', 'replies'].  
'comment' contains the text of each comment  
'likes' contains the number of likes each comment received  
'replies' contains a list of all replies each comment received  
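Because 'replies' holds Python lists, a couple of pandas calls are enough to turn it into numbers. The sketch below adds a hypothetical `num_replies` column and ranks comments by likes; it works on any DataFrame with the 'likes' and 'replies' columns described above and returns only counts, so no comment text is displayed. `summarize_comments` is a hypothetical helper.

```python
import pandas as pd

def summarize_comments(df, top_n=5):
    """Return like and reply counts for the top_n most-liked comments."""
    # len() of each list in 'replies' gives the number of retrieved replies
    summary = df.assign(num_replies=df['replies'].apply(len))
    # Keep only the numeric columns so that no comment text is shown
    return summary.nlargest(top_n, 'likes')[['likes', 'num_replies']]

# Usage with the DataFrame built above:
# summarize_comments(df)
```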

Finally, let's save the DataFrame as a CSV file:

```python
df.to_csv('YouTube_comments_likes_replies.csv', index=False)
```
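One caveat: `to_csv` writes each list in the 'replies' column as its string representation, so reading the file back with a plain `pd.read_csv` yields strings rather than lists. A minimal sketch for restoring the lists, assuming the file was written as above, parses that column with `ast.literal_eval`; `load_comments_csv` is a hypothetical helper.

```python
import ast

import pandas as pd

def load_comments_csv(path_or_buffer):
    """Read the saved comments back, turning each 'replies' string into a list again."""
    # converters applies literal_eval to every cell of 'replies',
    # e.g. the string "['first', 'second']" becomes ['first', 'second']
    return pd.read_csv(path_or_buffer, converters={'replies': ast.literal_eval})

# Usage:
# df = load_comments_csv('YouTube_comments_likes_replies.csv')
```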

Reference:  
https://developers.google.com/youtube/v3