<a href="https://colab.research.google.com/github/anezm12/GoogleApi/blob/main/etl_extracting_youtube_comments.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Getting comments from a YouTube Video with Google API</h1>

<h2>Installation of libraries and packages</h2>

In [None]:
pip install --upgrade google-api-python-client


Collecting google-api-python-client
  Downloading google_api_python_client-2.105.0-py2.py3-none-any.whl (12.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.6/12.6 MB[0m [31m50.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: google-api-python-client
  Attempting uninstall: google-api-python-client
    Found existing installation: google-api-python-client 2.84.0
    Uninstalling google-api-python-client-2.84.0:
      Successfully uninstalled google-api-python-client-2.84.0
Successfully installed google-api-python-client-2.105.0


In [None]:
pip install pandas



In [1]:
from googleapiclient.discovery import build

import pandas as pd

import os

<h2>Script</h2>

<h3>Defining the API key</h3>
<p>
The API key is created in the Google Cloud Console (https://console.cloud.google.com/), as described in the Google API documentation.

</p>

In [None]:
# For security, it is good practice to store the key in an environment variable instead of hard-coding it

api_key = 'YOUR_API_KEY'
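
<p>
The comment above recommends keeping the key in an environment variable. A minimal sketch of that idea follows; the variable name <code>YOUTUBE_API_KEY</code> and the helper <code>load_api_key</code> are assumptions made for illustration, not part of the original notebook.
</p>

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable.

    The default name YOUTUBE_API_KEY is an assumption for this sketch;
    use whatever name the key was exported under.
    """
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

<p>
With the variable set, <code>api_key = load_api_key()</code> could replace the hard-coded placeholder.
</p>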

<h3>Getting the comments as JSON</h3>

In [None]:
def get_youtube_video_comments(api_key, video_id):
    """
    Retrieves comments for a YouTube video using the YouTube Data API.

    Args:
        api_key (str): Your YouTube Data API key.
        video_id (str): The ID of the YouTube video for which you want to retrieve comments.

    Returns:
        dict: The response from the YouTube Data API, including comments for the video.
    """
    # Build a client to access the YouTube Data API
    youtube = build('youtube', 'v3', developerKey=api_key)

    # Create a request to fetch video comments
    request = youtube.commentThreads().list(
        part="snippet,replies",
        maxResults=100,  # Maximum number of comment threads per page (the API allows 1 to 100)
        moderationStatus="published",  # Only retrieve published comments
        order="time",  # Order comments by time
        videoId=video_id  # Specify the video by its ID
    )

    # Execute the request and get the response
    response = request.execute()

    # Return the response to the caller
    return response

# Usage example
api_key = 'YOUR_API_KEY'  # Replace with your YouTube Data API key
video_id = 'VIDEO_ID'  # Replace with the ID of the YouTube video you're interested in

# Call the function to get video comments and print the result
result = get_youtube_video_comments(api_key, video_id)
print(result)
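
<p>
A single request returns at most one page of comment threads. When a response contains a <code>nextPageToken</code>, passing it back as <code>pageToken</code> fetches the next page. The sketch below shows one way to loop over pages; the function name <code>get_all_comment_threads</code> and the <code>max_pages</code> cap are hypothetical additions, and <code>youtube</code> is assumed to be a client created with <code>build('youtube', 'v3', developerKey=api_key)</code>.
</p>

```python
def get_all_comment_threads(youtube, video_id, max_pages=5):
    """Collect comment threads across several result pages.

    `youtube` is assumed to be a client built with
    googleapiclient.discovery.build. `max_pages` is a hypothetical
    safety cap so a very popular video does not exhaust the API quota.
    """
    items = []
    page_token = None

    for _ in range(max_pages):
        params = {
            "part": "snippet,replies",
            "maxResults": 100,
            "order": "time",
            "videoId": video_id,
        }
        # Only send pageToken once the API has handed one back
        if page_token:
            params["pageToken"] = page_token

        response = youtube.commentThreads().list(**params).execute()
        items.extend(response.get("items", []))

        # No nextPageToken means this was the last page
        page_token = response.get("nextPageToken")
        if not page_token:
            break

    return items
```

<p>
The returned list has the same shape as <code>response["items"]</code>, so it can be wrapped as <code>{"items": items}</code> and passed to code that expects a single response.
</p>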

<h4>View JSON</h4>

<p>
To understand the structure of the JSON response, it helps to print it with indentation so the nested keys are easy to read.
</p>

In [None]:
import json

# Format the JSON response for readability
formatted_response = json.dumps(result, indent=4)

# Print the formatted JSON response
print(formatted_response)
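
<p>
For orientation, here is a trimmed, hand-written sample shaped like a <code>commentThreads.list</code> response. All values are invented placeholders and a real response carries more fields; the point is only to show where the comment text sits in the nesting.
</p>

```python
# Hand-written sample shaped like a commentThreads.list response.
# All values are invented placeholders; real responses carry more fields.
sample_response = {
    "kind": "youtube#commentThreadListResponse",
    "nextPageToken": "PLACEHOLDER_TOKEN",
    "pageInfo": {"totalResults": 1, "resultsPerPage": 100},
    "items": [
        {
            "id": "PLACEHOLDER_THREAD_ID",
            "snippet": {
                "videoId": "PLACEHOLDER_VIDEO_ID",
                "totalReplyCount": 0,
                "topLevelComment": {
                    "snippet": {
                        "authorDisplayName": "Example Author",
                        "textDisplay": "Example comment text",
                        "publishedAt": "2023-10-25T14:03:21Z",
                    }
                },
            },
        }
    ],
}

# Top-level keys describe the page; the comments live under "items"
print(sorted(sample_response.keys()))

# Path from one item down to the comment fields
first_item = sample_response["items"][0]
comment = first_item["snippet"]["topLevelComment"]["snippet"]
print(comment["authorDisplayName"], "-", comment["textDisplay"])
```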

<h3>Refining the desired key values</h3>

In [None]:
def process_youtube_comments(response):
    """
    Process and refine comments from a YouTube API response.

    Args:
        response (dict): The JSON response from the YouTube API containing comment data.

    Returns:
        list: A list of dictionaries, each representing a refined comment with author, text, and date.
    """
    comments_list = []

    # Iterate through items in the API response
    for item in response["items"]:
        topLevelComment = item["snippet"]["topLevelComment"]["snippet"]
        textDisplay = topLevelComment["textDisplay"]
        authorDisplayName = topLevelComment["authorDisplayName"]
        publishedAt = topLevelComment["publishedAt"]

        # Create a refined comment dictionary
        refined_comment = {
            'author': authorDisplayName,
            'text': textDisplay,
            'date': publishedAt
        }

        # Append the refined comment to the list
        comments_list.append(refined_comment)

    # Return the list of refined comments
    return comments_list
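
<p>
The request asks for <code>part="snippet,replies"</code>, but the function above keeps only top-level comments. A possible companion, sketched here under the assumption that replies sit under <code>item["replies"]["comments"]</code>, collects the replies as well. The name <code>flatten_replies</code> and the sample data are hypothetical, and a thread may include only some of its replies.
</p>

```python
def flatten_replies(response):
    """Return one dictionary per reply found in a commentThreads response.

    Threads without replies have no "replies" key, so .get() is used
    to skip them instead of raising KeyError.
    """
    rows = []
    for item in response.get("items", []):
        for reply in item.get("replies", {}).get("comments", []):
            snippet = reply["snippet"]
            rows.append({
                "parent_id": snippet.get("parentId"),
                "author": snippet["authorDisplayName"],
                "text": snippet["textDisplay"],
                "date": snippet["publishedAt"],
            })
    return rows


# Invented sample: one thread with a reply, one thread without
sample = {
    "items": [
        {"replies": {"comments": [{"snippet": {
            "parentId": "PLACEHOLDER_PARENT_ID",
            "authorDisplayName": "Example Replier",
            "textDisplay": "Example reply",
            "publishedAt": "2023-10-26T09:00:00Z",
        }}]}},
        {"snippet": {}},
    ]
}
reply_rows = flatten_replies(sample)
print(reply_rows)
```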


<h3>Working with Pandas Library</h3>

In [None]:
# Refine the API response, then convert the list of comments into a DataFrame
comments_list = process_youtube_comments(result)
df = pd.DataFrame(comments_list)

# Parse the "date" column as datetimes; format='ISO8601' requires pandas 2.0 or later
df["date"] = pd.to_datetime(df["date"], format='ISO8601')

# Keep only the calendar date, formatted as a "YYYY-MM-DD" string
df["date"] = df["date"].dt.strftime("%Y-%m-%d")

# Save the DataFrame to a CSV file named "YouTube video comments.csv" without an index column
df.to_csv("YouTube video comments.csv", index=False)
