# Divyansh's Project for Open Source Programming club
The chosen problem statement is to extract the comments on a YouTube video using the YouTube Data API, and then to filter out the comments made by a particular user.


Note: The original problem statement was to identify all comments made by a specified account on YouTube, which isn't possible using the API. Instead, we fix one video that has multiple comments from different users, fetch the comments on that video, and then filter the comments of a particular user.



## Step 1. Fetching the API key from YouTube

We need to create a new project in Google Cloud Platform, search for "YouTube Data API v3", and enable that API for the project.

We then generate an API key from there. The API key generated for Divyansh's account is given below and stored in a variable named API_KEY.

In [137]:
API_KEY = "AIzaSyDpFMjUhexMay_yN0KVptSSH70Md-VYZr0"
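Hard-coding a key in a notebook exposes it to anyone who can read the file. A minimal sketch of an alternative is shown below, assuming the key has been exported in an environment variable; the variable name `YOUTUBE_API_KEY` and the helper `load_api_key` are hypothetical and not part of the original notebook.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

With the variable set, `API_KEY = load_api_key()` could stand in for the literal assignment above.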

## Step 2. Importing the necessary Python libraries


In [138]:
import googleapiclient.discovery
import pandas as pd

## Step 3. Creating the YouTube service object from the API key

In [139]:
api_service_name = "youtube"
api_version = "v3"
DEVELOPER_KEY = API_KEY

youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=DEVELOPER_KEY)

The YouTube service object has been created from the API key and is accessible using the variable "youtube".

## Step 4. Noting down the Video ID of the video whose comments we want to list

To get the Video ID, copy the value that follows "v=" in the YouTube video link.

Example: "dK1wXQogTUU" is the Video ID from the youtube video link https://www.youtube.com/watch?v=dK1wXQogTUU
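Copying the ID by hand can also be done in code. The sketch below parses the link with the standard library; the helper name `extract_video_id` is hypothetical, and it only handles the standard watch link shown above and the short `youtu.be` form.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


example_id = extract_video_id("https://www.youtube.com/watch?v=dK1wXQogTUU")
```

Here `example_id` holds the same ID as in the example above.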

In [140]:
video_id = "aP-XK5fxdcs"

## Step 5. Creating and executing a request to fetch the comments on the video from YouTube through the API

In [141]:
# Returns a list of comment threads that match the API request parameters.
# maxResults accepts values from 1 to 100, so 100 is the largest page size.
request = youtube.commentThreads().list(
    part="snippet",
    videoId=video_id,
    maxResults=100
)

# Execute the request.
response = request.execute()

In [142]:
response

{'kind': 'youtube#commentThreadListResponse',
 'etag': '2O66ML0SIhdRiHOaL4WPChgQvOE',
 'nextPageToken': 'Z2V0X25ld2VzdF9maXJzdC0tQ2dnSWdBUVZGN2ZST0JJRkNJa2dHQUFTQlFpZElCZ0JFZ1VJaHlBWUFCSUZDSWdnR0FBWUFDSU9DZ3dJd1lfc3F3WVE2T0RJNWdF',
 'pageInfo': {'totalResults': 100, 'resultsPerPage': 100},
 'items': [{'kind': 'youtube#commentThread',
   'etag': 'arW-PHuPyL9yKyY28IiLJ0CMRk4',
   'id': 'UgwEnEQVmg3fMILDhHJ4AaABAg',
   'snippet': {'channelId': 'UC_aEa8K-EOJ3D6gOs7HcyNg',
    'videoId': 'aP-XK5fxdcs',
    'topLevelComment': {'kind': 'youtube#comment',
     'etag': 'SuiDFzvg54lOFRzk7jbl_1TSigY',
     'id': 'UgwEnEQVmg3fMILDhHJ4AaABAg',
     'snippet': {'channelId': 'UC_aEa8K-EOJ3D6gOs7HcyNg',
      'videoId': 'aP-XK5fxdcs',
      'textDisplay': '🎉🎉❤❤',
      'textOriginal': '🎉🎉❤❤',
      'authorDisplayName': '@zaheen_',
      'authorProfileImageUrl': 'https://yt3.ggpht.com/AW7rW_CBFJQY8RNykOKreE2hkYEAuwndTEYEP_b7nPCyYV6uxCdcINxEr7-jbPr-b-nhkeMqUw=s48-c-k-c0x00ffffff-no-rj',
      'authorCha

## Step 7. Parsing the JSON and shortening the column names so the DataFrame is easier to read

In [143]:
df1 = pd.json_normalize(response['items'])
df1.columns = df1.columns.str.removeprefix('snippet.topLevelComment.').str.removeprefix('snippet.').str.removesuffix('.value').str.removesuffix('.comments')

df = df1
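`pd.json_normalize` flattens nested dictionaries into dot-separated column names, which is why the prefixes and suffixes are stripped afterwards. The standard-library sketch below imitates that flattening on a small hand-written item to show how the names are formed and shortened; the `flatten` and `shorten` helpers and the sample values are illustrative only, and `str.removeprefix` needs Python 3.9 or later.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def shorten(name):
    """Apply the same prefix and suffix removals as the notebook cell."""
    return (name.removeprefix("snippet.topLevelComment.")
                .removeprefix("snippet.")
                .removesuffix(".value")
                .removesuffix(".comments"))


# Hand-written sample item; the text and channel ID values are placeholders.
item = {
    "kind": "youtube#commentThread",
    "snippet": {
        "videoId": "aP-XK5fxdcs",
        "topLevelComment": {
            "snippet": {
                "textOriginal": "sample text",
                "authorChannelId": {"value": "sample-channel-id"},
            }
        },
    },
}

flat = flatten(item)
short_names = [shorten(name) for name in flat]
```

Note that a thread-level key and a comment-level key can shorten to the same name (for example `videoId`), which is one reason those columns are dropped later.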

In [144]:
# Keep fetching pages for as long as the response carries a nextPageToken.
while 'nextPageToken' in response:
    # Create a new request object with the next page token.
    nextRequest = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id,
        maxResults=100,
        pageToken=response['nextPageToken'])
    # Execute the next request.
    response = nextRequest.execute()
    df1 = pd.json_normalize(response['items'])
    df1.columns = df1.columns.str.removeprefix('snippet.topLevelComment.').str.removeprefix('snippet.').str.removesuffix('.value').str.removesuffix('.comments')

    # Add each page exactly once. DataFrame.append was removed in pandas 2.0,
    # so pd.concat is used instead.
    df = pd.concat([df, df1], ignore_index=True)
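The paging pattern used here can be tried without calling the API. The sketch below runs the same "follow `nextPageToken` until it disappears" loop against a dictionary of fake pages; `fetch_all_items` and `FAKE_PAGES` are hypothetical names made up for this illustration.

```python
def fetch_all_items(fetch_page):
    """Collect the items from every page.

    fetch_page(page_token) must return a dict shaped like the API response:
    {'items': [...], 'nextPageToken': '...'}, with the token absent on the
    last page. The first call receives None as the token.
    """
    items = []
    page_token = None
    while True:
        response = fetch_page(page_token)
        items.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return items


# Stand-in for the real API: three fake pages keyed by page token.
FAKE_PAGES = {
    None: {"items": [1, 2], "nextPageToken": "p2"},
    "p2": {"items": [3, 4], "nextPageToken": "p3"},
    "p3": {"items": [5]},
}

all_items = fetch_all_items(lambda token: FAKE_PAGES[token])
```

With the real service, `fetch_page` would wrap the `commentThreads().list(...).execute()` call and pass `pageToken` only when a token is available.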


In [145]:
df = df.drop(columns=['kind', 'etag', 'id', 'channelId', 'videoId','canRate','viewerRating','canReply','totalReplyCount','isPublic'], errors='ignore')

In [146]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 231 entries, 0 to 230
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   textDisplay            231 non-null    object
 1   textOriginal           231 non-null    object
 2   authorDisplayName      231 non-null    object
 3   authorProfileImageUrl  231 non-null    object
 4   authorChannelUrl       231 non-null    object
 5   authorChannelId        231 non-null    object
 6   likeCount              231 non-null    int64 
 7   publishedAt            231 non-null    object
 8   updatedAt              231 non-null    object
dtypes: int64(1), object(8)
memory usage: 16.4+ KB


In [147]:
from IPython.display import display, HTML

def display_comments(dataframe):
    for index, row in dataframe.iterrows():
        display(HTML(f"""
            <div style="margin-bottom: 20px; padding: 10px; border: 5px solid #ccc;">
                <img src="{row['authorProfileImageUrl']}" style="width: 30px; height: 30px; border-radius: 50%;">
                <strong>{row['authorDisplayName']}</strong>
                <p>{row['textOriginal']}</p>
                <p>Likes: {row['likeCount']}</p>
                <p>Published At: {row['publishedAt']}</p>
            </div>
        """))

print('All comments on this video: ', end='\n\n')
display_comments(df)


All comments on this video: 
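Comment text and display names are written by users, so they may contain characters such as `<` or `&` that the browser would read as markup. A hedged sketch of escaping those fields before building the HTML is shown below; `comment_to_html` is a hypothetical helper, not the notebook's `display_comments`, and the sample row is made up.

```python
from html import escape


def comment_to_html(row):
    """Build the HTML card for one comment, escaping user-supplied fields."""
    return (
        '<div style="margin-bottom: 20px; padding: 10px; border: 5px solid #ccc;">'
        f'<strong>{escape(str(row["authorDisplayName"]))}</strong>'
        f'<p>{escape(str(row["textOriginal"]))}</p>'
        f'<p>Likes: {int(row["likeCount"])}</p>'
        '</div>'
    )


# Made-up row with characters that would otherwise be read as HTML.
sample = {"authorDisplayName": "@demo", "textOriginal": "<b>hi</b> & bye", "likeCount": 3}
card = comment_to_html(sample)
```

The returned string could be passed to `HTML(...)` in the same way as the f-string inside `display_comments`.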



In [148]:
def process_author_channel_name(channel_id):
    if channel_id.startswith('@'):
        return channel_id
    else:
        return '@' + channel_id

In [149]:
author_channel_name = input("Enter the comment author name: ")

Enter the comment author name: brrecrds1723


In [150]:
processed_channel_name = process_author_channel_name(author_channel_name)

filtered_df = df[df['authorDisplayName'] == processed_channel_name]
print(f'All comments of user {processed_channel_name} on the video are as follows: ', end='\n\n')

display_comments(filtered_df)

All comments of user @brrecrds1723 on the video are as follows: 
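The filter above matches the name exactly, so stray whitespace or a different letter case in the input returns nothing. The sketch below shows a more forgiving match on plain dictionaries; treating names as case-insensitive is an assumption of this sketch, and the helper names and sample comments are hypothetical.

```python
def normalize_handle(name):
    """Strip surrounding whitespace and make sure the name starts with '@'."""
    name = name.strip()
    return name if name.startswith("@") else "@" + name


def comments_by_author(comments, name):
    """Return the comments whose authorDisplayName matches, ignoring case."""
    handle = normalize_handle(name).casefold()
    return [c for c in comments if c["authorDisplayName"].casefold() == handle]


# Made-up comments for illustration only.
sample_comments = [
    {"authorDisplayName": "@alice", "textOriginal": "first"},
    {"authorDisplayName": "@Bob", "textOriginal": "second"},
    {"authorDisplayName": "@alice", "textOriginal": "third"},
]

alice_comments = comments_by_author(sample_comments, "  Alice ")
```

The same normalization could be applied to the `authorDisplayName` column with `str.casefold()` before comparing in pandas.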



# So we found out the comments of a particular user on a specific video.

To save this user's comments, we can also export them to a CSV file:

In [151]:
# Export only the selected user's comments, not the full DataFrame.
filtered_df.to_csv(f'comments-{processed_channel_name}.csv', index=False)