## Project Phase 1: Data Collection

#### Title: YouTube Gaming Comments Toxicity
#### Team name:
#### Team members: Chesie Yu, Hongfan Lu, Bella Wei
#### Problem Description: Toxicity is a prevalent problem in the gaming community and hinders the healthy development of the gaming industry. Our objective is to examine whether game category (Action vs. Non-Action) is a primary determinant of toxicity levels in YouTube video comments. This study takes the observer's perspective (viewers commenting on videos) rather than the player's. If the relationship holds, the findings can offer valuable insights for gaming community management, game design, and the design of social media platforms.

#### RQ1: Do videos of action games attract significantly more toxic comments than videos of non-action games on YouTube?

#### RQ2: Which kinds of gaming videos attract the most toxic comments, and is there a pattern behind them?

### Data collection steps:
#### 1. Using the YouTube API to access YouTube comments
#### 2. Selecting [] as keywords for ActionGames, and [] as keywords for NonActionGames
#### 3. Collecting the top 2000 most relevant videos in each category
#### 4. Randomly sampling 200 videos from each category
#### 5. Saving the comments from these 200 videos into separate CSV files
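
Step 4 can be sketched as follows. This is a minimal illustration rather than the team's code: the helper name `sample_videos`, the seed value, and the placeholder video IDs are all hypothetical. It assumes each collected video is a dict with a `Video ID` key, as in the cells further down.

```python
import random


def sample_videos(videos, k=200, seed=42):
    """Return up to k videos chosen uniformly at random, without replacement.

    The fixed seed (an assumption) keeps the sample reproducible across runs.
    """
    rng = random.Random(seed)
    if len(videos) <= k:
        return list(videos)
    return rng.sample(videos, k)


# Hypothetical stand-in for the 2000 videos collected in step 3.
candidates = [{"Video ID": f"vid{i:04d}"} for i in range(2000)]
sampled = sample_videos(candidates, k=200)
print(len(sampled))
```

Seeding a private `random.Random` instance rather than the global generator means other cells that use `random` do not change which videos are drawn.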

In [None]:
import os

# Read the key from an environment variable instead of hardcoding it in the notebook;
# the variable name YOUTUBE_API_KEY is an assumption.
API_KEY = os.environ["YOUTUBE_API_KEY"]

In [None]:
!pip install --upgrade google-api-python-client --quiet

In [None]:
import json

import googleapiclient
import googleapiclient.discovery
import googleapiclient.errors
import pandas as pd

In [None]:
youtube = googleapiclient.discovery.build("youtube", "v3", developerKey=API_KEY)

In [None]:
youtube

<googleapiclient.discovery.Resource at 0x114aab290>

In [None]:
channel_name_list=['FaZe Rug', 'Ninja', 'DUDU e CAROL','Jelly','invictor',
'SSundee','FGTeeV','IShowSpeed','Minecraft - Topic','LazarBeam']

In [None]:
ActionGame_keyword_list = ['Call of Duty','GTA','The Last of Us','God of War','Batman','Red Dead Redemption',
                           "Assassin's Creed",'Star Wars Jedi','Resident Evil','Cyberpunk','Fallout','Tomb Raider','Elden Ring']


In [None]:

# Probe the first channel only, to inspect the shape of the response.
for username in channel_name_list[:1]:
    request = youtube.channels().list(
        part="contentDetails",
        forUsername=username
    )
    keyword_res = request.execute()
    print(json.dumps(keyword_res, indent=2))



{
  "kind": "youtube#channelListResponse",
  "etag": "RuuXzTIr0OoDqI4S0RU6n4FqKEM",
  "pageInfo": {
    "totalResults": 0,
    "resultsPerPage": 5
  }
}
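
The response above reports `totalResults` of 0 and has no `items` key. `forUsername` expects a channel's legacy username, so a display name such as 'FaZe Rug' may not match anything. A small defensive helper for reading such responses might look like the sketch below. The function name `uploads_playlist_id` and the sample playlist ID are illustrative assumptions; the `contentDetails.relatedPlaylists.uploads` path is the one used by the YouTube Data API v3 channel resource.

```python
def uploads_playlist_id(channel_response):
    """Return the uploads playlist ID from a channels().list response, or None."""
    items = channel_response.get("items", [])
    if not items:
        return None
    return items[0]["contentDetails"]["relatedPlaylists"]["uploads"]


# The empty response printed above (etag omitted).
empty = {"kind": "youtube#channelListResponse",
         "pageInfo": {"totalResults": 0, "resultsPerPage": 5}}
# Hypothetical non-empty response with a made-up playlist ID.
found = {"items": [{"contentDetails": {"relatedPlaylists": {"uploads": "UU_example"}}}]}

print(uploads_playlist_id(empty))
print(uploads_playlist_id(found))
```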


In [None]:
import re

import nltk
from nltk.stem import PorterStemmer

# Download the punkt tokenizer data and create the stemmer.
nltk.download("punkt")
porter = PorterStemmer()

[nltk_data] Downloading package punkt to /Users/weiyue/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [None]:
def search_keyword_videos(keyword_list, channel_name):
    """Search YouTube for `channel_name` and keep videos whose title contains a keyword."""
    relevant_videos = []
    next_page_token = None
    # Stem and compile the keyword patterns once, not once per title.
    # Note: porter.stem lowercases its input and treats a multi-word keyword as one token.
    patterns = [
        re.compile(r'\b{}\b'.format(re.escape(porter.stem(keyword))))
        for keyword in keyword_list
    ]
    while len(relevant_videos) < 50:
        request = youtube.search().list(
            part='snippet,id',
            q=channel_name,  # free-text query, so results are not limited to that channel
            type='video',    # ensures every result has item['id']['videoId']
            maxResults=50,
            pageToken=next_page_token,
            order='date'
        )
        response = request.execute()

        for item in response.get('items', []):
            video_title = item['snippet']['title'].lower()
            if any(pattern.search(video_title) for pattern in patterns):
                # View counts are not stored; if needed, fetch them here with
                # youtube.videos().list(id=..., part='statistics').
                video_info = {
                    'Channel Name': channel_name,
                    'Video ID': item['id']['videoId'],
                    'Video title': item['snippet']['title'],
                    'Video creation time': item['snippet']['publishedAt'],
                }
                relevant_videos.append(video_info)
            if len(relevant_videos) >= 50:
                break
        next_page_token = response.get('nextPageToken')
        if not next_page_token:
            break

    print(f"Found {len(relevant_videos)} keyword-related videos from {channel_name}.")
    return relevant_videos
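
The title filter can be checked in isolation, without calling the API. The sketch below is a simplified variant, not the notebook's method: it skips stemming and matches each keyword as a whole, case-insensitive phrase. The helper name `build_keyword_pattern` and the sample titles marked as hypothetical are assumptions made for illustration.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole phrase."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:{})\b".format(alternatives), re.IGNORECASE)


pattern = build_keyword_pattern(["Call of Duty", "GTA", "Batman"])

titles = [
    "From Sonic To NINJA SONIC In GTA 5!",
    "Call of Duty gameplay",      # hypothetical title
    "Minecraft survival, day 1",  # hypothetical title
    "Batmanga review",            # hypothetical; "Batman" is only part of a longer word
]
matches = [t for t in titles if pattern.search(t)]
print(matches)
```

The `\b` word boundaries keep a keyword from matching inside a longer word, which is why the last title is rejected.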

In [None]:
relevant_videos

{'FaZe Rug': [],
 'Ninja': [{'Channel Name': 'Ninja',
   'Video ID': 'VeyvUwnK_SM',
   'Video title': '#automobile #gta #ninja #gaming #games ninja h2 top speed new tranding#rider',
   'Video creation time': '2024-02-17T01:49:25Z'},
  {'Channel Name': 'Ninja',
   'Video ID': 'zGOn_qhZlBo',
   'Video title': 'From Sonic To NINJA SONIC In GTA 5!',
   'Video creation time': '2023-04-04T16:44:13Z'},
  {'Channel Name': 'Ninja',
   'Video ID': 'FDaZssDxWMo',
   'Video title': 'Animated Ninja Superbike Vs. 7 Fast POLICE Cars!! *INTENSE* | GTA 5',
   'Video creation time': '2022-08-01T11:05:07Z'},
  {'Channel Name': 'Ninja',
   'Video ID': 'J4eFe_bTM7Y',
   'Video title': 'Ninja SUPERBIKE vs Drag BIKE sa GTA 5! (TOP SPEED)',
   'Video creation time': '2021-09-05T06:07:44Z'},
  {'Channel Name': 'Ninja',
   'Video ID': 'UgS0juE-l5w',
   'Video title': 'Ninja Kidz Team Up With Batman and Robin Movie Remastered!',
   'Video creation time': '2021-08-28T12:15:01Z'},
  {'Channel Name': 'Ninja',
   'V

In [None]:
print(channel_name_list)
relevant_videos.keys()

['FaZe Rug', 'Ninja', 'DUDU e CAROL', 'Jelly', 'invictor', 'SSundee', 'FGTeeV', 'IShowSpeed', 'Minecraft - Topic', 'LazarBeam']


dict_keys(['FaZe Rug', 'Ninja', 'DUDU e CAROL', 'Jelly', 'invictor', 'SSundee', 'FGTeeV', 'IShowSpeed', 'Minecraft - Topic', 'LazarBeam'])

In [None]:
for username in channel_name_list:
    if username not in relevant_videos:
        relevant_videos[username]=search_keyword_videos(ActionGame_keyword_list, username)


Found 1 keyword-related videos from IShowSpeed.
Found 0 keyword-related videos from Minecraft - Topic.
Found 11 keyword-related videos from LazarBeam.



In [None]:
def extract_100_comments(video_id_list):
    """Fetch up to 100 of the newest top-level comments (with replies) for each video."""
    comments_per_video = []
    for vid in video_id_list:
        try:
            request = youtube.commentThreads().list(
                videoId=vid,
                part="id,snippet,replies",
                textFormat="plainText",
                order="time",
                maxResults=100
            )
            response = request.execute()
        except googleapiclient.errors.HttpError as e:
            # A 403 often means comments are disabled; skip the video either way.
            print(f"Skipping video {vid}: HTTP {e.resp.status}")
            continue

        for item in response["items"]:
            top_comment = item["snippet"]["topLevelComment"]
            snippet = top_comment["snippet"]
            # Collect reply texts when the thread has any; otherwise store None.
            if item["snippet"]["totalReplyCount"] != 0:
                replies = [reply["snippet"]["textDisplay"]
                           for reply in item.get("replies", {}).get("comments", [])]
            else:
                replies = None
            comment_info = {
                'Video ID': vid,
                'Comment id': top_comment['id'],
                'Comment title': snippet['textOriginal'],
                'Comment creation time': snippet['publishedAt'],
                'Comment number of likes': snippet['likeCount'],
                'Comment content': snippet['textDisplay'],
                'Replies': replies
            }
            comments_per_video.append(comment_info)
    return comments_per_video
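
`extract_100_comments` reads only the first page of results for each video. If more than 100 comments per video were ever needed, the same `nextPageToken` loop used in `search_keyword_videos` would apply. The sketch below shows that loop in a generic form. The helper name `iter_items` and the fake two-page response are assumptions made so the example runs without network access.

```python
def iter_items(fetch_page, max_items):
    """Yield up to max_items items, following nextPageToken between pages.

    fetch_page(page_token) must return one API response as a dict.
    """
    token = None
    yielded = 0
    while yielded < max_items:
        response = fetch_page(token)
        for item in response.get("items", []):
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        token = response.get("nextPageToken")
        if not token:
            return


# Fake two-page response standing in for the API, for illustration only.
fake_pages = {
    None: {"items": [1, 2, 3], "nextPageToken": "p2"},
    "p2": {"items": [4, 5]},
}
print(list(iter_items(fake_pages.get, max_items=4)))
```

With the real client, `fetch_page` could be a small function that calls `youtube.commentThreads().list(..., pageToken=token).execute()`. Each extra page is a separate request and counts against the API quota.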

In [None]:
video_comments = {}

In [None]:
video_comments.keys()

dict_keys(['FaZe Rug', 'Ninja', 'DUDU e CAROL', 'Jelly', 'invictor', 'SSundee', 'FGTeeV', 'IShowSpeed', 'Minecraft - Topic', 'LazarBeam'])

In [None]:
for channel_name in relevant_videos:
    if channel_name in video_comments:
        continue
    item_video_ids = [d['Video ID'] for d in relevant_videos[channel_name] if 'Video ID' in d]
    channel_video_comments = extract_100_comments(item_video_ids)
    video_comments[channel_name] = channel_video_comments

In [None]:
# Use a context manager so the file is flushed and closed after writing.
with open("channel_video_comments.json", "w") as f:
    json.dump(video_comments, f, indent=2)
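
Step 5 of the plan mentions CSV output, while the cell above writes JSON. One possible way to flatten the nested dictionary into CSV rows is sketched below. The helper name `flatten_comments`, the `" | "` reply separator, and the sample comments are assumptions; the example writes to an in-memory buffer so that it runs without touching the file system.

```python
import csv
import io


def flatten_comments(video_comments):
    """Turn {channel: [comment dicts]} into a flat list of rows, one per comment."""
    rows = []
    for channel, comments in video_comments.items():
        for comment in comments:
            row = {"Channel Name": channel, **comment}
            # Join replies into one cell; " | " is an arbitrary separator.
            row["Replies"] = " | ".join(comment.get("Replies") or [])
            rows.append(row)
    return rows


# Hypothetical sample in the shape produced by extract_100_comments.
sample = {"Ninja": [{"Video ID": "zGOn_qhZlBo", "Comment id": "c1",
                     "Comment content": "nice", "Replies": ["thanks", "ok"]},
                    {"Video ID": "zGOn_qhZlBo", "Comment id": "c2",
                     "Comment content": "first", "Replies": None}]}
rows = flatten_comments(sample)

buffer = io.StringIO()  # swap for open("comments.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```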