# Emotional Consistency among Political Ideologies: An Approach to Address Polarization on Youtube

Group 5:
- Chance Landis (ChancL), Hanna Lee (Lee10), Jason Sun (YongXs), Andy Wong (WongA22)

## Data Collection

### Sources of Information
- **AllSides**: A media bias tool that provides a rating based on "multi-partisan Editorial Reviews by trained experts and Blind Bias Surveys™ in which participants rate content without knowing the source." We used this tool to determine how we should classify the most popular (based on subscriber count) YouTube channels we found. (Source: https://www.allsides.com/media-bias/media-bias-rating-methods)
- **HypeAuditor**: A company that takes a data-driven approach to influencer marketing. As part of that work, they compile lists of YouTube channels by category, subscriber count, and country. This allowed us to find the news and politics channels with the most subscribers. (Source: https://hypeauditor.com/about/company/, https://hypeauditor.com/top-youtube-news-politics-united-states/)
- **Pew Research Center**: A nonpartisan, nonprofit organization that conducts research on public opinion, demographic trends, and social issues. It provides data-driven insights into various aspects of social science issues, explicitly stating they do not take a stance on political issues. For our research, we relied on their studies on political ideologies and alignment with political parties as a reference. (Source: https://www.pewresearch.org/about/, https://www.pewresearch.org/politics/2016/06/22/5-views-of-parties-positions-on-issues-ideologies/)
- **YouTube**: As a group, we expanded our collection of YouTube videos by selecting additional keywords associated with the ideologies we are studying. We then gathered the comments on these videos to conduct our research.
    - We used a combination of Andy and Hanna's code to get the comments from YouTube channels.

### Top 5 Democratic YouTube Channels
Vice, Vox, MSNBC, The Daily Show, The Young Turks

In [1]:
pip install --upgrade pip



In [20]:
!pip install --upgrade google-api-python-client --quiet

In [21]:
!pip install nltk



In [22]:
# imports
import json
import pandas as pd
from collections import defaultdict

import nltk

import googleapiclient
import googleapiclient.discovery
import googleapiclient.errors
from googleapiclient.errors import HttpError

import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.tokenize import casual

from nrclex import NRCLex

### Define API / Lists / Dictionaries

In [155]:
# API calls

# The API key is read from an environment variable so that it is not stored in the notebook.
# The variable name YOUTUBE_API_KEY is an assumption; use whatever name fits your setup.
# Separate keys were used for the different channels (Vice, Vox, MSNBC, Daily Show, Young Turks).
import os

API_KEY = os.environ["YOUTUBE_API_KEY"]

youtube = googleapiclient.discovery.build("youtube", "v3", developerKey=API_KEY)

In [4]:
# Define channels
channels = ["Vice", "Vox", "msnbc", "thedailyshow", "TheYoungTurks"]

In [5]:
# Define Republican YouTube channels
channels_right = ["BenShapiro", "StevenCrowder", "FoxNews", "DailyWirePlus", "dailymail"]

In [6]:
# Define keywords
keyword_lists = {
    "isis": ["ISIS", "Radicalist", "Islamic State", "Jihadist", "Syria conflict", "Iraq insurgency", "Al-Qaeda", "Radical Islam", "Suicide bombings", "Mosul"],
    "guns": ["Gun", "Shooting", "School shooting", "Firearm", "Gun control", "NRA", "Second Amendment"],
    "immigration": ["Immigration", "Border control", "Mexico", "Visa", "Citizenship", "Asylum", "Deportation", "Refugee"],
    "economy": ["Economy", "Budget deficit", "Unemployment rate", "Inflation", "Interest rate", "Federal reserve", "Recession", "GDP", "Consumer Price Index", "Trade Balance", "Stock Exchange", "Central bank", "Consumer spending", "NASDAQ", "Dow Jones", "S&P", "currency exchange", "Financial crisis", "Investment strategies", "Credit rating", "Commodities", "Real estate market", "Banking sector"],
    "healthcare": ["Health care", "Medicaid", "Covid", "Obamacare", "Public health", "Insurance", "Universal healthcare", "Private healthcare", "Medicare", "Patient rights", "Vaccination", "Pandemics"], 
    "socioeco": ["Socio-economic", "Rich", "Poor", "Income inequality", "Poverty", "Wealth distribution", "Minimum Wage", "Financial Insecurity", "Welfare", "Homelessness", "Financial Literacy"],
    "abortion": ["Abortion", "Pregnancy", "Unwanted Pregnancy", "Roe", "Wade", "Pro-life", "Planned Parenthood", "Fetal rights", "Life of mother", "Reproductive", "Women's health", "Gestational", "Late-term abortion", "Post-abortion syndrome", "Safe haven laws", "Mifepristone", "Misoprostol", "Dobbs", "Pro-choice", "Anti-abortion"],
    "climate": ["Climate change", "Global Warming", "Carbon", "Alternative Energy", "Climate", "Methane", "Emissions", "Gas", "Greenhouse", "Renewable energy", "Fossil fuels", "Deforestation", "Carbon footprint"]
}

In [7]:
# Keyword dictionary for the categories that did not require additional videos (used for MSNBC below)
keyword_lists2 = {
    "guns": ["Gun", "Shooting", "School shooting", "Firearm", "Gun control", "NRA", "Second Amendment"],
    "immigration": ["Immigration", "Border control", "Mexico", "Visa", "Citizenship", "Asylum", "Deportation", "Refugee"],
    "healthcare": ["Health care", "Medicaid", "Covid", "Obamacare", "Public health", "Insurance", "Universal healthcare", "Private healthcare", "Medicare", "Patient rights", "Vaccination", "Pandemics"], 
    "climate": ["Climate change", "Global Warming", "Carbon", "Alternative Energy", "Climate", "Methane", "Emissions", "Gas", "Greenhouse", "Renewable energy", "Fossil fuels", "Deforestation", "Carbon footprint"]
}

In [8]:
# Keyword dictionary limited to the categories that did not require additional videos for a specific channel
keyword_lists3 = {
    "guns": ["Gun", "Shooting", "School shooting", "Firearm", "Gun control", "NRA", "Second Amendment"],
    "immigration": ["Immigration", "Border control", "Mexico", "Visa", "Citizenship", "Asylum", "Deportation", "Refugee"],
    "healthcare": ["Health care", "Medicaid", "Covid", "Obamacare", "Public health", "Insurance", "Universal healthcare", "Private healthcare", "Medicare", "Patient rights", "Vaccination", "Pandemics"], 
    "abortion": ["Abortion", "Pregnancy", "Unwanted Pregnancy", "Roe", "Wade", "Pro-life", "Planned Parenthood", "Fetal rights", "Life of mother", "Reproductive", "Women's health", "Gestational", "Late-term abortion", "Post-abortion syndrome", "Safe haven laws", "Mifepristone", "Misoprostol", "Dobbs", "Pro-choice", "Anti-abortion"],
    "climate": ["Climate change", "Global Warming", "Carbon", "Alternative Energy", "Climate", "Methane", "Emissions", "Gas", "Greenhouse", "Renewable energy", "Fossil fuels", "Deforestation", "Carbon footprint"]
}

In [9]:
keyword_isis = {
    "isis": ["ISIS", "Radicalist", "Islamic State", "Jihadist", "Syria conflict", "Iraq insurgency", "Al-Qaeda", "Radical Islam", "Suicide bombings", "Mosul"]
}

In [10]:
keyword_economy = {
    "economy": ["Economy", "Budget deficit", "Unemployment rate", "Inflation", "Interest rate", "Federal reserve", "Recession", "GDP", "Consumer Price Index", "Trade Balance", "Stock Exchange", "Central bank", "Consumer spending", "NASDAQ", "Dow Jones", "S&P", "currency exchange", "Financial crisis", "Investment strategies", "Credit rating", "Commodities", "Real estate market", "Banking sector"]
}

In [11]:
keyword_socioeco = {
    "socioeco": ["Socio-economic", "Rich", "Poor", "Income inequality", "Poverty", "Wealth distribution", "Minimum Wage", "Financial Insecurity", "Welfare", "Homelessness", "Financial Literacy"],
}

In [12]:
keyword_abortion = {
    "abortion": ["Abortion", "Pregnancy", "Unwanted Pregnancy", "Roe", "Wade", "Pro-life", "Planned Parenthood", "Fetal rights", "Life of mother", "Reproductive", "Women's health", "Gestational", "Late-term abortion", "Post-abortion syndrome", "Safe haven laws", "Mifepristone", "Misoprostol", "Dobbs", "Pro-choice", "Anti-abortion"]
}
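
The per-category dictionaries above repeat lists that are already stored in `keyword_lists`. A minimal sketch of one way to derive such subsets from a single master dictionary is shown here. `subset_keywords` is a hypothetical helper, not part of the original notebook, and the `keyword_lists` used in the sketch is shortened for illustration.

```python
# Shortened stand-in for the full keyword_lists dictionary defined earlier
keyword_lists = {
    "isis": ["ISIS", "Islamic State"],
    "guns": ["Gun", "Gun control"],
    "economy": ["Economy", "Inflation"],
}


def subset_keywords(all_keywords, categories):
    """Return a new dict holding only the requested categories (hypothetical helper)."""
    # Copy each list so that later edits to a subset do not change the master dict
    return {category: list(all_keywords[category]) for category in categories}


keyword_isis = subset_keywords(keyword_lists, ["isis"])
keyword_no_isis = subset_keywords(keyword_lists, ["guns", "economy"])

print(keyword_isis)
print(list(keyword_no_isis))
```

Deriving the subsets this way keeps a single source of truth, so a keyword added to one category does not need to be copied into several dictionaries by hand.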

### Define Functions

In [24]:
# Function for getting channel id based on name
def get_channel_id(channel):  
    channel_id = youtube.search().list(
        part="snippet",
        type="channel",
        q=channel
    )

    res_channel = channel_id.execute()
    chan_id = res_channel["items"][0]["id"]["channelId"]

    return chan_id

In [25]:
# Function for retrieving the upload playlist id using channel id
def get_upload_id(channel):
    request = youtube.channels().list(
        part="contentDetails",
        id=channel
    )

    res = request.execute()
    uploads_playlist_id = res["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]

    return uploads_playlist_id

In [26]:
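# Initialize PorterStemmer
ps = PorterStemmer()

# Function to check if a video title contains any of the keywords.
# Returns the first matching keyword, or None if nothing matches.
def contains_keyword(title, keywords):
    # Tokenize and stem each word in the title
    stemmed_words = [ps.stem(word) for word in word_tokenize(title.lower())]

    for keyword in keywords:
        # Tokenize and stem the keyword as well, so that multi-word keywords
        # such as "School shooting" are compared word by word
        keyword_stems = [ps.stem(word) for word in word_tokenize(keyword.lower())]
        n = len(keyword_stems)
        # Look for the keyword's stems as a contiguous run inside the title
        for i in range(len(stemmed_words) - n + 1):
            if stemmed_words[i:i + n] == keyword_stems:
                return keyword
    return None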

In [39]:
# Function to fetch videos from a playlist and keep those whose title contains a keyword
def keyword_videos(playlist_id, keyword_list, channel_name, category, limit):
    videos_info = []
    next_page_token = None

    while True:
        # Make the next API request using the nextPageToken
        request = youtube.playlistItems().list(
            part="snippet",
            playlistId=playlist_id,
            pageToken=next_page_token
        ) 
        res = request.execute()

        # Process the response and save video info
        for v in res["items"]:
            video_title = v["snippet"]["title"]
            detected_word = contains_keyword(video_title, keyword_list)
            if detected_word:
                # Separate Resource Call to retrieve video views
                views = youtube.videos().list(id=v['snippet']['resourceId']['videoId'], part="snippet,contentDetails,statistics")
                view_temp = views.execute()
                video_views = view_temp['items'][0]['statistics'].get('viewCount', 'Not Available')

                # Append video information with views to videos_info list
                videos_info.append({
                    "channel": channel_name,
                    "id": v["snippet"]["resourceId"]["videoId"],
                    "title": video_title,
                    "keyword": detected_word,
                    "category": category,
                    "published_at": v["snippet"]["publishedAt"],
                    "VideoViews": video_views
                })
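        # Update the nextPageToken for the next iteration
        next_page_token = res.get('nextPageToken')

        # Stop when there are no more pages or once more than `limit` videos were found.
        # The check runs only after a full page, so the result can hold more than `limit` videos.
        if not next_page_token or (len(videos_info) > limit):
            break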
    return videos_info
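
The paging loop in `keyword_videos` checks the limit only after a whole page has been processed, and it uses `>` rather than `>=`, which is why the runs below report 11 videos for a limit of 10. The following is a minimal sketch of a variant that stops at exactly `limit` matches. `collect_matches`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names, and the fake pages stand in for real `playlistItems().list(...).execute()` responses so the example runs without an API key.

```python
def collect_matches(fetch_page, is_match, limit):
    """Collect matching items page by page, stopping at exactly `limit` matches."""
    matches = []
    token = None
    while True:
        page = fetch_page(token)
        for item in page["items"]:
            if is_match(item):
                matches.append(item)
                # Check inside the loop so the result never exceeds the limit
                if len(matches) >= limit:
                    return matches
        token = page.get("nextPageToken")
        if not token:
            return matches


# Hypothetical stand-in for paged API responses (titles are made up)
FAKE_PAGES = {
    None: {"items": ["gun law", "cats", "gun show"], "nextPageToken": "p2"},
    "p2": {"items": ["gun range", "dogs"], "nextPageToken": "p3"},
    "p3": {"items": ["gun safety"]},
}


def fake_fetch(token):
    return FAKE_PAGES[token]


first_two = collect_matches(fake_fetch, lambda title: "gun" in title, limit=2)
everything = collect_matches(fake_fetch, lambda title: "gun" in title, limit=10)
print(first_two)
print(everything)
```

Whether to stop at exactly `limit` or to allow a small overshoot is a design choice; the overshoot in the original loop is harmless as long as later steps do not assume a fixed number of videos per category.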

In [52]:
# Function for getting up to `limit` top-level comments, ordered by relevance, for a single video
def get_vid_comments(vid_id, category, limit):
    vids_final = []

    # Fetch the comments page by page; HTTP errors are handled at the end
    try:
        # Retrieve comments for the video
        request = youtube.commentThreads().list(
            videoId=vid_id,
            part='id,snippet,replies',
            textFormat='plainText',
            order='relevance',
            maxResults=50)
        res = request.execute()

        # Iterate through each comment
        for v in res["items"]:
            # Extract comment information and add to the final list
            comment_info = {
                'VideoId': vid_id,
                'category': category,
                'CommentId': v['id'],
                'CommentTitle': v['snippet']['topLevelComment']['snippet']['textOriginal'],
                'CommentCreationTime': v['snippet']['topLevelComment']['snippet']['publishedAt'],
                'CommentLikes': v['snippet']['topLevelComment']['snippet']['likeCount']
            }
            vids_final.append(comment_info)

            # Check if the number of saved comments exceeds the limit
            if len(vids_final) >= limit:
                return vids_final

        nextPageToken = res.get('nextPageToken')

        # Retrieve additional pages of comments if available
        while nextPageToken:
            try:
                request = youtube.commentThreads().list(
                    videoId=vid_id,
                    part='id,snippet,replies',
                    textFormat='plainText',
                    order='relevance',
                    maxResults=50,
                    pageToken=nextPageToken)
                res = request.execute()

                nextPageToken = res.get('nextPageToken')

                # Iterate through additional comments and add to the final list
                for v in res["items"]:
                    comment_info = {
                        'VideoId': vid_id,
                        'category': category,
                        'CommentId': v['id'],
                        'CommentTitle': v['snippet']['topLevelComment']['snippet']['textOriginal'],
                        'CommentCreationTime': v['snippet']['topLevelComment']['snippet']['publishedAt'],
                        'CommentLikes': v['snippet']['topLevelComment']['snippet']['likeCount']
                    }
                    vids_final.append(comment_info)

                    # Check if the number of saved comments exceeds the limit
                    if len(vids_final) >= limit:
                        return vids_final
            except KeyError:
                break

    # Error handling for videos with disabled comments
    # (the API also returns status 403 for other causes, such as an exceeded quota)
    except HttpError as e:
        if e.resp.status == 403:
            print(f"Comments are disabled for the video with videoId: {vid_id}")

        else:
            print("An HTTP error occurred:", e)

    return vids_final
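
`get_vid_comments` builds the same comment dictionary in two places, once for the first page and once for later pages. A small sketch of factoring that step into one helper is shown here. `comment_record` is a hypothetical name, and `sample_thread` is a hand-written dictionary with made-up values that only mimics the fields the notebook reads from a `commentThreads` item.

```python
def comment_record(thread, vid_id, category):
    """Flatten one commentThreads item into the flat dict used in this notebook."""
    top = thread["snippet"]["topLevelComment"]["snippet"]
    return {
        "VideoId": vid_id,
        "category": category,
        "CommentId": thread["id"],
        "CommentTitle": top["textOriginal"],
        "CommentCreationTime": top["publishedAt"],
        "CommentLikes": top["likeCount"],
    }


# Hand-written sample shaped like a commentThreads item (all values are made up)
sample_thread = {
    "id": "comment-1",
    "snippet": {
        "topLevelComment": {
            "snippet": {
                "textOriginal": "Great reporting.",
                "publishedAt": "2022-01-01T00:00:00Z",
                "likeCount": 3,
            }
        }
    },
}

record = comment_record(sample_thread, "video-1", "guns")
print(record)
```

With a helper like this, both loops in the function could call `comment_record(v, vid_id, category)` and a change to the collected fields would only need to be made once.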

#### Get Channel IDs

In [29]:
dem_up_id = []

for channel in channels:
    print(channel)
    chan_id = get_channel_id(channel)
    upload_id = get_upload_id(chan_id)
    dem_up_id.append(upload_id)

Vice
Vox
msnbc
thedailyshow
TheYoungTurks


In [30]:
dem_up_id

['UUn8zNIfYAQNdrFRrr8oibKw',
 'UULXo7UDZvByw2ixzpQCufnA',
 'UUaXkIU1QidjPwiAYu6GcHjg',
 'UUwWhs_6x42TyRM4Wstoq8HA',
 'UU1yBKRuGpC1tSM73A0ZjYjQ']

#### Vice

In [44]:
# Run function to get information about relevant videos
vice_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUn8zNIfYAQNdrFRrr8oibKw", keywords, "Vice", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    vice_vid_info.append(vid_info)

Fetching videos for category: isis
Found 11 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 4 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 11 videos for category socioeco
Fetching videos for category: abortion
Found 7 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [46]:
vice_vid_info[:50]

[[{'channel': 'Vice',
   'id': 'SwoRx3tstxY',
   'title': 'We Uncovered an ISIS Mass Grave | Super Users',
   'keyword': 'ISIS',
   'category': 'isis',
   'published_at': '2022-04-11T15:00:12Z',
   'VideoViews': '349475'},
  {'channel': 'Vice',
   'id': 'LttCr8rudxQ',
   'title': 'How ISIS Makes Millions From Stolen Antiques | The Business of Crime',
   'keyword': 'ISIS',
   'category': 'isis',
   'published_at': '2022-02-24T16:00:17Z',
   'VideoViews': '404183'},
  {'channel': 'Vice',
   'id': 'm54L1jusS5w',
   'title': 'Safer at War than at Home | Diary of a Combat Medic Fighting ISIS (Part 3/3)',
   'keyword': 'ISIS',
   'category': 'isis',
   'published_at': '2019-04-05T16:00:09Z',
   'VideoViews': '330460'},
  {'channel': 'Vice',
   'id': '_xO5dsGEL_E',
   'title': 'Interrogating Enemy Fighters | Diary of a Combat Medic Fighting ISIS (Part 2/3)',
   'keyword': 'ISIS',
   'category': 'isis',
   'published_at': '2019-04-04T16:00:03Z',
   'VideoViews': '240256'},
  {'channel': 'Vice'

In [49]:
# Get the video IDs of the relevant videos
vice_vid_ids = set()

# Extracting all ids
for sublist in vice_vid_info:
    for video in sublist:
        vice_vid_ids.add((video['category'], video['id']))
        
len(vice_vid_ids)

77
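
Because the set stores `(category, id)` pairs, a video whose title matches keywords from two categories is kept twice, once per category, and its comments are later fetched twice. The sketch below shows one way to list such videos. The helper name `videos_in_multiple_categories` and the sample pairs are assumptions made for illustration.

```python
from collections import defaultdict

# Made-up (category, video id) pairs in the same shape as the set built above
vid_ids = {("guns", "vid1"), ("healthcare", "vid2"), ("economy", "vid1")}


def videos_in_multiple_categories(pairs):
    """Map each video id that appears under more than one category to its categories."""
    categories_by_video = defaultdict(set)
    for category, video_id in pairs:
        categories_by_video[video_id].add(category)
    return {
        video_id: sorted(categories)
        for video_id, categories in categories_by_video.items()
        if len(categories) > 1
    }


duplicates = videos_in_multiple_categories(vid_ids)
print(duplicates)
```

Running a check like this before fetching comments makes it clear whether a per-video comment count above the limit comes from such a double listing.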

In [51]:
vice_vid_ids

{('abortion', '6S9oUu0R3sY'),
 ('abortion', 'EEIvWNhuL8U'),
 ('abortion', 'FHlI6Vjc0tI'),
 ('abortion', 'GMdymyLNC1s'),
 ('abortion', 'jUFOECOZ1fg'),
 ('abortion', 'q2roLP6HIQA'),
 ('abortion', 'xnS2CTmdA0A'),
 ('climate', '9Okkpmbdn_o'),
 ('climate', 'Dkk1rMARYOY'),
 ('climate', 'IT8XsE0If0g'),
 ('climate', 'PfbNY2G64G8'),
 ('climate', 'RpJDwQSvXNs'),
 ('climate', 'fU3C8o8I6GQ'),
 ('climate', 'gXBR5ZrKdws'),
 ('climate', 'gdhdAktIHtg'),
 ('climate', 'hxdwbZ3Oeyc'),
 ('climate', 'tNwkY_V_BPI'),
 ('climate', 'y20Tx2eCw28'),
 ('economy', 'QX3M8Ka9vUA'),
 ('economy', 'X3ySrcI2mEA'),
 ('economy', 'd9K96fZGY64'),
 ('economy', 'dckjk1V-KRM'),
 ('guns', '-1A9v5bQDqk'),
 ('guns', '388wlVWxGz0'),
 ('guns', '4PfZlxhvdkM'),
 ('guns', '61trVLZv1-w'),
 ('guns', 'Jj-3kBi49eg'),
 ('guns', 'Jp0nqJ1yrrg'),
 ('guns', 'UnETVMI4tY8'),
 ('guns', 'V3KGKQd_4tk'),
 ('guns', 'Zh2e8nY8VJ0'),
 ('guns', 'gs26R56d3ww'),
 ('guns', 'lFIro2Dnfj8'),
 ('healthcare', '07lsXkWmpz8'),
 ('healthcare', '18_KBggvIZM'),
 ('he

In [53]:
# Get the top 30 relevant comments of each video
vice_comments = []

for category, ids in vice_vid_ids:
    vice_comm = get_vid_comments(ids, category, 30)
    vice_comments.append(vice_comm)

Comments are disabled for the video with videoId: EEIvWNhuL8U


In [54]:
vice_comments

[[{'VideoId': 'SwoRx3tstxY',
   'category': 'isis',
   'CommentId': 'Ugws1dFQrp7AovnexrB4AaABAg',
   'CommentTitle': 'Bless the hard work of journalists! Seeing the deplorable and terrible things done by monstrous groups like ISIS in one spot must be so difficult. We’re with you!',
   'CommentCreationTime': '2022-04-12T22:53:43Z',
   'CommentLikes': 146},
  {'VideoId': 'SwoRx3tstxY',
   'category': 'isis',
   'CommentId': 'UgxcNLZW2rAeMBklWD14AaABAg',
   'CommentTitle': "Also I can't imagine  the amount mental trauma this work puts these journalists and their teams  undergo having to file through hours of footage of some of the most horrific acts enacted upon people in order to try and piece together what really happened.  If they can uncover even some of truth that is a very big step forward to those victims who are still alive and hopefullt it can highlight those respsonsible",
   'CommentCreationTime': '2022-04-11T16:49:08Z',
   'CommentLikes': 726},
  {'VideoId': 'SwoRx3tstxY',
   

In [34]:
# Count comments per videoId to ensure there are enough comments
comment_count = defaultdict(int)

for video_comments in vice_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        comment_count[video_id] += 1

print("Number of comments per videoId:")
for video_id, count in comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: 07lsXkWmpz8, Number of Comments: 30
Video ID: 9Okkpmbdn_o, Number of Comments: 13
Video ID: V3KGKQd_4tk, Number of Comments: 30
Video ID: 7fJpRa7o_fQ, Number of Comments: 30
Video ID: gXTeg5LAvN8, Number of Comments: 30
Video ID: Dkk1rMARYOY, Number of Comments: 6
Video ID: egEbJ2gukxU, Number of Comments: 30
Video ID: fD93qXcMwGA, Number of Comments: 5
Video ID: dckjk1V-KRM, Number of Comments: 30
Video ID: uaDZ0bVresE, Number of Comments: 30
Video ID: FDYqe5I35KA, Number of Comments: 30
Video ID: cdZy4balvB8, Number of Comments: 30
Video ID: IT8XsE0If0g, Number of Comments: 5
Video ID: LttCr8rudxQ, Number of Comments: 30
Video ID: m54L1jusS5w, Number of Comments: 30
Video ID: PdJJxwP8NaU, Number of Comments: 30
Video ID: PfbNY2G64G8, Number of Comments: 30
Video ID: fU3C8o8I6GQ, Number of Comments: 30
Video ID: hyk5YXnag9E, Number of Comments: 6
Video ID: UnETVMI4tY8, Number of Comments: 30
Video ID: M4BOdBJFkgA, Number of Comments: 30
Video 

In [55]:
# Count comments per category/ideology as a check
comment_count_per_category = defaultdict(int)

for video_comments in vice_comments:
    for comment in video_comments:
        video_category = comment['category']
        comment_count_per_category[video_category] += 1

print("Number of comments per video category:")
for category, count in comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")

Number of comments per video category:
Video Category: isis, Number of Comments: 243
Video Category: guns, Number of Comments: 330
Video Category: economy, Number of Comments: 120
Video Category: healthcare, Number of Comments: 302
Video Category: climate, Number of Comments: 242
Video Category: socioeco, Number of Comments: 315
Video Category: abortion, Number of Comments: 130
Video Category: immigration, Number of Comments: 255
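
The two counting cells above follow the same pattern: flatten the nested comment lists and tally one field. As a possible alternative, `collections.Counter` can do this in one expression per field. The sketch below uses a small made-up `comments` list in the same nested shape as `vice_comments`.

```python
from collections import Counter

# Made-up nested list in the same shape as vice_comments (one inner list per video)
comments = [
    [{"VideoId": "a", "category": "guns"}, {"VideoId": "a", "category": "guns"}],
    [{"VideoId": "b", "category": "isis"}],
    [],  # a video with disabled comments yields an empty list
]

# Tally comments per video and per category
per_video = Counter(c["VideoId"] for video in comments for c in video)
per_category = Counter(c["category"] for video in comments for c in video)

for video_id, count in per_video.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")
for category, count in per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")
```

Note that videos with no retrievable comments do not appear in either tally, which matches the behavior of the `defaultdict` version.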


In [66]:
# Create a dict to map video ids to their corresponding details
vid_details = {vid['id']: vid for sublist in vice_vid_info for vid in sublist}

# Combine video details with comments
vice_result = []

for sublist in vice_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in vid_details:
            details = vid_details[vid_id].copy()
            details.update(item)
            vice_result.append(details)
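
The lookup dictionary above is keyed by video id alone, so if a video is listed under two categories only the last entry's `keyword` survives, even though each comment keeps its own `category`. The following sketch keys the lookup by `(category, id)` instead. All sample values are made up, and this is one possible variant rather than the approach used in the notebook.

```python
# Made-up video details: the same video matched keywords in two categories
vid_info = [
    [{"id": "vid1", "category": "guns", "keyword": "Gun", "title": "Guns and the economy"}],
    [{"id": "vid1", "category": "economy", "keyword": "Economy", "title": "Guns and the economy"}],
]

# Made-up comments, one list per (category, video) pair
comments = [
    [{"VideoId": "vid1", "category": "guns", "CommentId": "c1"}],
    [{"VideoId": "vid1", "category": "economy", "CommentId": "c2"}],
]

# Key the lookup by (category, id) so each listing keeps its own details
details_by_key = {(v["category"], v["id"]): v for sublist in vid_info for v in sublist}

merged = []
for sublist in comments:
    for comment in sublist:
        key = (comment["category"], comment["VideoId"])
        if key in details_by_key:
            # Later keys win, so comment fields overwrite any shared names
            merged.append({**details_by_key[key], **comment})

for row in merged:
    print(row["CommentId"], row["category"], row["keyword"])
```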

In [67]:
# Transform into DF
vice_comments_df = pd.DataFrame(vice_result)

vice_comments_df.head()
vice_comments_df.shape

(1937, 12)

#### Vox

In [56]:
# Run function to get information about relevant videos
vox_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UULXo7UDZvByw2ixzpQCufnA", keywords, "Vox", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    vox_vid_info.append(vid_info)

Fetching videos for category: isis
Found 9 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 4 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 9 videos for category socioeco
Fetching videos for category: abortion
Found 10 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [58]:
# Get the video IDs of the relevant videos
vox_vid_ids = set()

# Extracting all ids
for sublist in vox_vid_info:
    for video in sublist:
        vox_vid_ids.add((video['category'], video['id']))
        
len(vox_vid_ids)

76

In [62]:
# Get the top 30 relevant comments of each video
vox_comments = []

for category, ids in vox_vid_ids:
    vox_comm = get_vid_comments(ids, category, 30)
    vox_comments.append(vox_comm)

In [63]:
# Count comments per videoId to ensure there are enough comments
vox_comment_count = defaultdict(int)

for video_comments in vox_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        vox_comment_count[video_id] += 1

for video_id, count in vox_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Video ID: oRUjKZOhV6E, Number of Comments: 30
Video ID: Al0rBxHuVk4, Number of Comments: 30
Video ID: EyAdby3hMRM, Number of Comments: 30
Video ID: _0TCrGtTEQM, Number of Comments: 30
Video ID: -S_f-huz-EU, Number of Comments: 30
Video ID: ylLTMYt24lA, Number of Comments: 30
Video ID: UI4g_amOTSg, Number of Comments: 30
Video ID: RWSaceaSpRI, Number of Comments: 30
Video ID: tfZYfYbONmI, Number of Comments: 30
Video ID: 1xbt0ACMbiA, Number of Comments: 30
Video ID: GWH5vyi3lTk, Number of Comments: 30
Video ID: riWh6Ljgu_M, Number of Comments: 30
Video ID: pTwPHuE_HrU, Number of Comments: 30
Video ID: RBKhpV6MYto, Number of Comments: 30
Video ID: Bzuk13Ftxgo, Number of Comments: 60
Video ID: t6V9i8fFADI, Number of Comments: 30
Video ID: K3odScka55A, Number of Comments: 30
Video ID: iKHl__BEsD0, Number of Comments: 30
Video ID: Z9gQLELtbhg, Number of Comments: 30
Video ID: LJjo1kJW6To, Number of Comments: 30
Video ID: FBwFJeleth0, Number of Comments: 30
Video ID: Ry5jTjBhZpA, Number of C

In [65]:
# Count number of comments per ideology/category
vox_comment_count_per_category = defaultdict(int)

for video_comments in vox_comments:
    for comment in video_comments:
        video_category = comment['category']
        vox_comment_count_per_category[video_category] += 1

print("Number of comments per video category:")
for category, count in vox_comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")

Number of comments per video category:
Video Category: climate, Number of Comments: 330
Video Category: abortion, Number of Comments: 284
Video Category: healthcare, Number of Comments: 330
Video Category: socioeco, Number of Comments: 270
Video Category: guns, Number of Comments: 313
Video Category: immigration, Number of Comments: 330
Video Category: economy, Number of Comments: 120
Video Category: isis, Number of Comments: 263


In [68]:
# Create a dict to map video ids to their corresponding details
vox_vid_details = {vid['id']: vid for sublist in vox_vid_info for vid in sublist}

# Combine video details with comments
vox_result = []

for sublist in vox_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in vox_vid_details:
            details = vox_vid_details[vid_id].copy()
            details.update(item)
            vox_result.append(details)

In [69]:
# Transform into DF
vox_comments_df = pd.DataFrame(vox_result)

vox_comments_df.head()
vox_comments_df.shape

(2240, 12)

#### MSNBC

Important Notes:
- MSNBC has a more diverse set of videos, so to compensate for the shortage of videos from other channels, more videos were collected from MSNBC for certain ideologies.

In [70]:
# Run function to get information about relevant videos
msnbc_vid_info = []

In [72]:
for category, keywords in keyword_lists2.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: climate
Found 11 videos for category climate


In [73]:
for category, keywords in keyword_isis.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", category, 24)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: isis
Found 6 videos for category isis


In [74]:
for category, keywords in keyword_economy.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", category, 26)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: economy
Found 27 videos for category economy


In [75]:
for category, keywords in keyword_socioeco.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", category, 13)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: socioeco
Found 14 videos for category socioeco


In [76]:
for category, keywords in keyword_abortion.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", category, 16)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: abortion
Found 17 videos for category abortion
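
The MSNBC cells above run the same loop five times, changing only the keyword dictionary and the limit. A minimal sketch of driving one loop from a per-category limit table is shown below. The limits are the ones used in the cells above, while `fetch_all_categories` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for `keyword_videos` so the example runs without API access.

```python
DEFAULT_LIMIT = 10

# Limits taken from the MSNBC cells above; other categories use the default
category_limits = {"isis": 24, "economy": 26, "socioeco": 13, "abortion": 16}


def fetch_all_categories(keyword_lists, fetch, limits, default_limit=DEFAULT_LIMIT):
    """Call `fetch` once per category, using a category-specific limit when given."""
    results = []
    for category, keywords in keyword_lists.items():
        limit = limits.get(category, default_limit)
        results.append(fetch(keywords, category, limit))
    return results


# Hypothetical stand-in for keyword_videos that only records what was requested
def fake_fetch(keywords, category, limit):
    return [{"category": category, "limit": limit}]


# Shortened keyword dictionary for illustration
sample_keywords = {"guns": ["Gun"], "isis": ["ISIS"], "economy": ["Economy"]}

results = fetch_all_categories(sample_keywords, fake_fetch, category_limits)
for sublist in results:
    print(sublist[0]["category"], sublist[0]["limit"])
```

In the notebook, `fetch` could be a small wrapper that calls `keyword_videos` with the MSNBC uploads playlist id and channel name filled in.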


In [77]:
# Get the video IDs of the relevant videos
msnbc_vid_ids = set()

# Extracting all ids
for sublist in msnbc_vid_info:
    for video in sublist:
        msnbc_vid_ids.add((video['category'], video['id']))
        
len(msnbc_vid_ids)

108

In [78]:
# Get the top 30 relevant comments of each video
msnbc_comments = []

for category, ids in msnbc_vid_ids:
    msnbc_comm = get_vid_comments(ids, category, 30)
    msnbc_comments.append(msnbc_comm)

In [79]:
# Count comments per videoId to ensure there are enough comments
msnbc_comment_count = defaultdict(int)

for video_comments in msnbc_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        msnbc_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in msnbc_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: it8tZK5CSyo, Number of Comments: 30
Video ID: LyxCoBKfooI, Number of Comments: 30
Video ID: E292SryC4nk, Number of Comments: 30
Video ID: sYht9MF2QLU, Number of Comments: 30
Video ID: _-RKAvNziy4, Number of Comments: 30
Video ID: 3qiJp65MDBI, Number of Comments: 30
Video ID: r0V3FmLjIqc, Number of Comments: 26
Video ID: jdX6qpeRybc, Number of Comments: 30
Video ID: mp_-PoqnqGw, Number of Comments: 30
Video ID: wV8WU3eks2k, Number of Comments: 9
Video ID: zEB5-LYlx24, Number of Comments: 30
Video ID: M8ArLCxek5k, Number of Comments: 30
Video ID: TypI8XY50Ho, Number of Comments: 30
Video ID: dHLU4Z3D3ZY, Number of Comments: 30
Video ID: FV7KLybtNhU, Number of Comments: 30
Video ID: 3fF8MfxFgQg, Number of Comments: 7
Video ID: MgKBusMm5xM, Number of Comments: 30
Video ID: nC4ztHc515E, Number of Comments: 24
Video ID: 3OaWVIw0j9w, Number of Comments: 30
Video ID: 1VR6Se0fET8, Number of Comments: 30
Video ID: LZRgMwS7B8o, Number of Comments: 30
Vide

In [80]:
# Count comments per category/ideology as a check
msnbc_comment_count_per_category = defaultdict(int)

for video_comments in msnbc_comments:
    for comment in video_comments:
        video_category = comment['category']
        msnbc_comment_count_per_category[video_category] += 1

print("Number of comments per video category:")
for category, count in msnbc_comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")


Number of comments per video category:
Video Category: immigration, Number of Comments: 307
Video Category: abortion, Number of Comments: 510
Video Category: economy, Number of Comments: 806
Video Category: socioeco, Number of Comments: 379
Video Category: healthcare, Number of Comments: 320
Video Category: climate, Number of Comments: 330
Video Category: isis, Number of Comments: 172
Video Category: guns, Number of Comments: 330


In [81]:
# Create a dict to map video ids to their corresponding details
msnbc_vid_details = {vid['id']: vid for sublist in msnbc_vid_info for vid in sublist}

# Combine video details with comments
msnbc_result = []

for sublist in msnbc_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in msnbc_vid_details:
            details = msnbc_vid_details[vid_id].copy()
            details.update(item)
            msnbc_result.append(details)

In [82]:
# Transform into DF
msnbc_comments_df = pd.DataFrame(msnbc_result)

msnbc_comments_df.head()
msnbc_comments_df.shape

(3154, 12)
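
The combine step above (map each video ID to its details, then fold every comment into a copy of those details) is repeated for each channel. A minimal sketch of how it could be factored into one helper follows. The function name `merge_details_with_comments` and the sample records are hypothetical; the key names (`id`, `VideoId`, `category`) are taken from the cells above.

```python
def merge_details_with_comments(vid_info, comments):
    """Attach video details to every comment, mirroring the per-channel cells.

    vid_info: list of lists of video dicts, each with an 'id' key.
    comments: list of lists of comment dicts, each with a 'VideoId' key.
    """
    # If a video id appears more than once, the last record wins,
    # exactly as in the dict comprehension used in the notebook.
    details_by_id = {vid["id"]: vid for sublist in vid_info for vid in sublist}

    merged = []
    for sublist in comments:
        for comment in sublist:
            details = details_by_id.get(comment["VideoId"])
            if details is None:
                continue  # no details known for this video, skip the comment
            row = details.copy()
            row.update(comment)  # comment fields win on key clashes
            merged.append(row)
    return merged


# Hypothetical sample data for illustration only.
sample_vid_info = [[{"id": "abc", "title": "Example video", "category": "economy"}]]
sample_comments = [[
    {"VideoId": "abc", "CommentTitle": "first", "category": "economy"},
    {"VideoId": "zzz", "CommentTitle": "orphan", "category": "economy"},
]]
sample_rows = merge_details_with_comments(sample_vid_info, sample_comments)
print(len(sample_rows))
```

With a helper like this, each channel section could build its frame with a call such as `pd.DataFrame(merge_details_with_comments(msnbc_vid_info, msnbc_comments))`.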

#### The Daily Show

In [85]:
# Run function to get information about relevant videos
daily_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUwWhs_6x42TyRM4Wstoq8HA", keywords, "Daily Show", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    daily_vid_info.append(vid_info)

Fetching videos for category: isis
Found 6 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 10 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 11 videos for category socioeco
Fetching videos for category: abortion
Found 11 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [86]:
# Get the video IDs of the relevant videos
daily_vid_ids = set()

# Extracting all ids
for sublist in daily_vid_info:
    for video in sublist:
        daily_vid_ids.add((video['category'], video['id']))
        
len(daily_vid_ids)

82

In [87]:
# Get the top 30 relevant comments of each video
daily_comments = []

for category, ids in daily_vid_ids:
    daily_comm = get_vid_comments(ids, category, 30)
    daily_comments.append(daily_comm)

In [88]:
# Count comments per videoId to ensure there are enough comments
daily_comment_count = defaultdict(int)

for video_comments in daily_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        daily_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in daily_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: TtziF8sgZ0I, Number of Comments: 30
Video ID: Ocx4ogv1tDI, Number of Comments: 30
Video ID: 4YMPEK1pwtQ, Number of Comments: 30
Video ID: jUcE3h2A_Cw, Number of Comments: 30
Video ID: B5t12TDa618, Number of Comments: 30
Video ID: TN5-1QX8XZI, Number of Comments: 30
Video ID: hESoqv2AwWA, Number of Comments: 30
Video ID: vbCSiHc-szY, Number of Comments: 30
Video ID: yp5HylZ05kM, Number of Comments: 30
Video ID: 5FdnVetOKkw, Number of Comments: 30
Video ID: JdWWWD29vlk, Number of Comments: 30
Video ID: 6J55FgWN85o, Number of Comments: 30
Video ID: bOAnCTvq55Y, Number of Comments: 30
Video ID: aOVKzKahIOg, Number of Comments: 30
Video ID: v2bIyik6JUI, Number of Comments: 30
Video ID: 4IIwK-ZMScA, Number of Comments: 30
Video ID: 5byqAsAqWsk, Number of Comments: 60
Video ID: F_O1_SXIdlA, Number of Comments: 30
Video ID: eMSHNS0OOA8, Number of Comments: 30
Video ID: 0e1hpD7ForE, Number of Comments: 30
Video ID: XwxIqBCrvyk, Number of Comments: 30
Vi

In [89]:
# Count comments per category/ideology as a sanity check
daily_comment_count_per_category = defaultdict(int)

for video_comments in daily_comments:
    for comment in video_comments:
        video_category = comment['category']
        daily_comment_count_per_category[video_category] += 1

print("Number of comments per video category:")

for category, count in daily_comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")


Number of comments per video category:
Video Category: socioeco, Number of Comments: 327
Video Category: isis, Number of Comments: 180
Video Category: economy, Number of Comments: 300
Video Category: immigration, Number of Comments: 330
Video Category: guns, Number of Comments: 330
Video Category: healthcare, Number of Comments: 330
Video Category: abortion, Number of Comments: 327
Video Category: climate, Number of Comments: 330


In [90]:
# Create a dict to map video ids to their corresponding details
daily_vid_details = {vid['id']: vid for sublist in daily_vid_info for vid in sublist}

# Combine video details with comments
daily_result = []

for sublist in daily_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in daily_vid_details:
            details = daily_vid_details[vid_id].copy()
            details.update(item)
            daily_result.append(details)

In [91]:
# Transform into DF
daily_comments_df = pd.DataFrame(daily_result)

daily_comments_df.head()
daily_comments_df.shape

(2454, 12)
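
One video in the per-video counts above (`5byqAsAqWsk`) shows 60 comments rather than 30. A likely cause, given that `daily_vid_ids` stores `(category, id)` pairs, is that the same video matched keywords in two categories and was fetched once per category. The sketch below shows one way to list such videos. The helper name and the sample pairs are hypothetical.

```python
from collections import defaultdict


def videos_in_multiple_categories(vid_ids):
    """Return {video_id: sorted categories} for ids paired with more than one category."""
    categories_by_id = defaultdict(set)
    for category, video_id in vid_ids:
        categories_by_id[video_id].add(category)
    return {
        video_id: sorted(categories)
        for video_id, categories in categories_by_id.items()
        if len(categories) > 1
    }


# Hypothetical pairs for illustration; real input would be a set like daily_vid_ids.
sample_ids = {("guns", "vid1"), ("economy", "vid2"), ("socioeco", "vid2")}
overlaps = videos_in_multiple_categories(sample_ids)
print(overlaps)
```

Whether such comments should be kept under both categories or deduplicated by `CommentId` is an analysis decision the notebook does not state.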

#### The Young Turks

In [94]:
# Run function to get information about relevant videos
yturk_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UU1yBKRuGpC1tSM73A0ZjYjQ", keywords, "Young Turks", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    yturk_vid_info.append(vid_info)

Fetching videos for category: isis
Found 5 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 11 videos for category economy
Fetching videos for category: healthcare
Found 12 videos for category healthcare
Fetching videos for category: socioeco
Found 11 videos for category socioeco
Fetching videos for category: abortion
Found 11 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [96]:
# Get the video IDs of the relevant videos
yturk_vid_ids = set()

# Extracting all ids
for sublist in yturk_vid_info:
    for video in sublist:
        yturk_vid_ids.add((video['category'], video['id']))
        
len(yturk_vid_ids)

83

In [97]:
# Get the top 30 relevant comments of each video
yturk_comments = []

for category, ids in yturk_vid_ids:
    yturk_comm = get_vid_comments(ids, category, 30)
    yturk_comments.append(yturk_comm)

In [98]:
# Count comments per videoId to ensure there are enough comments
yturk_comment_count = defaultdict(int)

for video_comments in yturk_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        yturk_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in yturk_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: tTJMvR-NcO8, Number of Comments: 30
Video ID: EOj5JsjOlL0, Number of Comments: 30
Video ID: FMBKS5JFAsw, Number of Comments: 30
Video ID: tZu91Y_l8dc, Number of Comments: 30
Video ID: OV3wyVxqsDU, Number of Comments: 30
Video ID: _dMtdutY2f4, Number of Comments: 30
Video ID: SkFJTsEsMVY, Number of Comments: 30
Video ID: _7nnyMHUUys, Number of Comments: 30
Video ID: chTEF4JFWls, Number of Comments: 30
Video ID: xRq0SCJCAPs, Number of Comments: 30
Video ID: DxM57pMaUOg, Number of Comments: 30
Video ID: 2Eb2kPjP60c, Number of Comments: 1
Video ID: 9p7-IdpcjVg, Number of Comments: 30
Video ID: 6l5hDmCHAKU, Number of Comments: 30
Video ID: yvfBnOLe4U8, Number of Comments: 30
Video ID: zM_1-BmuJ5U, Number of Comments: 30
Video ID: 5WbS9ekzx90, Number of Comments: 30
Video ID: vpq4A85hi74, Number of Comments: 30
Video ID: vusQXYuEhVc, Number of Comments: 30
Video ID: 61sLAMDu6-Y, Number of Comments: 30
Video ID: DjC7DAKSGhY, Number of Comments: 30
Vid

In [99]:
# Count comments per category/ideology as a sanity check
yturk_comment_count_per_category = defaultdict(int)

for video_comments in yturk_comments:
    for comment in video_comments:
        video_category = comment['category']
        yturk_comment_count_per_category[video_category] += 1

print("Number of comments per video category:")

for category, count in yturk_comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")

Number of comments per video category:
Video Category: healthcare, Number of Comments: 360
Video Category: socioeco, Number of Comments: 330
Video Category: guns, Number of Comments: 300
Video Category: isis, Number of Comments: 121
Video Category: economy, Number of Comments: 330
Video Category: abortion, Number of Comments: 300
Video Category: immigration, Number of Comments: 330
Video Category: climate, Number of Comments: 330


In [100]:
# Create a dict to map video ids to their corresponding details
yturk_vid_details = {vid['id']: vid for sublist in yturk_vid_info for vid in sublist}

# Combine video details with comments
yturk_result = []

for sublist in yturk_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in yturk_vid_details:
            details = yturk_vid_details[vid_id].copy()
            details.update(item)
            yturk_result.append(details)

In [101]:
# Transform into DF
yturk_comments_df = pd.DataFrame(yturk_result)

yturk_comments_df.head()
yturk_comments_df.shape

(2401, 12)
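
The per-video counts for The Young Turks include one video with a single comment (`2Eb2kPjP60c`). Since the stated goal of that check is to confirm each video has enough comments, a sketch like the following could list the videos that fall short of the requested 30. `underfilled_videos` is a hypothetical helper, and the sample counts are a small excerpt of the output above.

```python
def underfilled_videos(comment_counts, target=30):
    """Return {video_id: count} for videos with fewer than `target` comments."""
    return {vid: n for vid, n in comment_counts.items() if n < target}


# Excerpt of the Young Turks per-video counts shown above.
sample_counts = {"tTJMvR-NcO8": 30, "2Eb2kPjP60c": 1, "9p7-IdpcjVg": 30}
short = underfilled_videos(sample_counts)
print(short)
```

In the notebook, the same function could be applied to `yturk_comment_count` directly, since it is already a mapping from video ID to comment count.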

### Combine All DataFrames and Save as CSV

In [102]:
dem_comment_df = pd.concat([vice_comments_df, vox_comments_df, msnbc_comments_df, daily_comments_df, yturk_comments_df], ignore_index=True)

In [103]:
# Save df to a CSV file
dem_comment_df.to_csv("democratice_comments_final.csv", index=False)

### Top 5 Republican YouTube Channels

#### Get Playlist ID

In [104]:
rep_up_id = []

for channel in channels_right:
    print(channel)
    chan_id = get_channel_id(channel)
    upload_id = get_upload_id(chan_id)
    rep_up_id.append(upload_id)

BenShapiro
StevenCrowder
FoxNews
DailyWirePlus
dailymail


In [105]:
rep_up_id

['UUnQC_G5Xsjhp9fEJKuIcrSw',
 'UUIveFvW-ARp_B_RckhweNJw',
 'UUXIJgqnII2ZOINSWNOGFThA',
 'UUaeO5vkdj5xOQHp4UmIN6dw',
 'UUw3fku0sH3qA3c3pZeJwdAw']
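
Every ID in the output above starts with `UU`. A commonly observed convention is that a channel's uploads playlist ID equals its channel ID with the leading `UC` replaced by `UU`. The sketch below illustrates that convention only. `uploads_playlist_id` is a hypothetical helper, the sample channel ID is an assumption derived by reversing the first ID above, and the notebook's own `get_upload_id` may obtain the value differently. The uploads playlist reported by the YouTube Data API for a channel remains the authoritative source.

```python
def uploads_playlist_id(channel_id):
    """Derive the uploads playlist ID from a channel ID using the UC -> UU convention."""
    if not channel_id.startswith("UC"):
        raise ValueError(f"Expected a channel ID starting with 'UC', got {channel_id!r}")
    return "UU" + channel_id[2:]


# Channel ID derived from the first playlist ID in the output above (an assumption).
derived = uploads_playlist_id("UCnQC_G5Xsjhp9fEJKuIcrSw")
print(derived)
```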

#### Ben Shapiro

In [109]:
# Run function to get information about relevant videos
ben_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUnQC_G5Xsjhp9fEJKuIcrSw", keywords, "Ben Shapiro", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    ben_vid_info.append(vid_info)

Fetching videos for category: isis
Found 0 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 12 videos for category immigration
Fetching videos for category: economy
Found 11 videos for category economy
Fetching videos for category: healthcare
Found 12 videos for category healthcare
Fetching videos for category: socioeco
Found 11 videos for category socioeco
Fetching videos for category: abortion
Found 11 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [22]:
ben_vid_info[2]

[{'channel': 'Ben Shapiro',
  'id': 'VOkkGuOqQVY',
  'title': 'Illegal Immigrants Pummel Cops, Walk Free',
  'keyword': 'Immigration',
  'published_at': '2024-02-01T18:00:11Z',
  'VideoViews': '191633'},
 {'channel': 'Ben Shapiro',
  'id': 're1nbhsUCE4',
  'title': 'The "Magic Word" for Immigrants',
  'keyword': 'Immigration',
  'published_at': '2024-01-31T20:00:21Z',
  'VideoViews': '108036'},
 {'channel': 'Ben Shapiro',
  'id': 'm0An8qb5jSs',
  'title': "Biden's Immigration Policy",
  'keyword': 'Immigration',
  'published_at': '2024-01-31T00:30:32Z',
  'VideoViews': '260351'},
 {'channel': 'Ben Shapiro',
  'id': 'aWBLOqqAUTM',
  'title': 'Hamas Supporters Should Be Deported | @YAFTV',
  'keyword': 'Deportation',
  'published_at': '2023-11-11T20:00:00Z',
  'VideoViews': '157983'},
 {'channel': 'Ben Shapiro',
  'id': 'rkDsCwbcKzo',
  'title': 'Caravan Of Illegal Immigrants CHANT "Biden!"',
  'keyword': 'Immigration',
  'published_at': '2023-11-11T01:00:27Z',
  'VideoViews': '48541'},


In [110]:
# Get the video IDs of the relevant videos
ben_vid_ids = set()

# Extracting all ids
for sublist in ben_vid_info:
    for video in sublist:
        ben_vid_ids.add((video['category'], video['id']))
        
len(ben_vid_ids)

79

In [24]:
ben_vid_ids

{'-wn3RWl1WUI',
 '0MokC95_ijI',
 '0hL06Zvvipw',
 '1c2jdkCsOok',
 '1jOPgFBqN8g',
 '1uFvrqKmMEQ',
 '6Te-0iSaccg',
 '8aVSOXfBOhE',
 '9YKacT3t5YY',
 'BGaHSP0GYc8',
 'BYIe2fLLEYE',
 'D4o2b4owypQ',
 'DYPlXrqT5-E',
 'DsetH7jBLj0',
 'DuxDkxOJTNc',
 'E8EQzIrjigg',
 'FmqVgE4MKCk',
 'G9z9RnAgShg',
 'GmeYUrGiI7Q',
 'IuuYl_zXjUw',
 'JffURyCMJCk',
 'JvUmxPGMpc4',
 'LHnOZHbvigE',
 'Nz6yVnYiAgI',
 'PpEUXwFdtAU',
 'Q35mm2J-K6I',
 'Q5ExLplIevA',
 'Q97kvTp55b4',
 'Qo0ionyh4nE',
 'R8mUAySVaRA',
 'RhLbz5hGwjc',
 'S4UaolRwqig',
 'SOUXgnXHxoE',
 'TKz4C5vR3UA',
 'TudMlddmerM',
 'U3Mg30xp16w',
 'UkPXHWdZ3jY',
 'VOkkGuOqQVY',
 'VbWdz9L8PQc',
 'ZlbgNVetuA8',
 'aWBLOqqAUTM',
 'aYBy8bq_Y-w',
 'cB_2J8tePvg',
 'cF24uzD9EMI',
 'cevBEXhytbs',
 'cnh9dYmAPfY',
 'cxrGfJfwOuA',
 'd9WK5FfTiaE',
 'dFugydLI2Pk',
 'evZmi1ASvGg',
 'fe5wVWrLAlg',
 'foydspxRJ0s',
 'g7rn_74UG-s',
 'hQblL2zpuGE',
 'j9sicDl-X8Y',
 'jCntwYIHy5k',
 'k4hga3Ahh08',
 'lAazJOk6e-I',
 'lOjGSHvwmEM',
 'lit6n-nEKjI',
 'llvhELwoymw',
 'm0An8qb5jSs',
 'mBahW_

In [111]:
# Get the top 30 relevant comments of each video
ben_comments = []

for category, ids in ben_vid_ids:
    ben_comm = get_vid_comments(ids, category, 30)
    ben_comments.append(ben_comm)

In [26]:
ben_comments

[[{'VideoId': 'BGaHSP0GYc8',
   'CommentId': 'UgzItfcMxvyQB-4sHZx4AaABAg',
   'CommentTitle': 'Give a liberal what they want and they’ll be miserable',
   'CommentCreationTime': '2023-09-09T02:19:37Z',
   'CommentLikes': 866},
  {'VideoId': 'BGaHSP0GYc8',
   'CommentId': 'UgzzXWs_8jtDGI1Woz94AaABAg',
   'CommentTitle': "Be careful what you wish for. But realize that he STILL doesn't think that he is any part of the problem - he's busy blaming others...",
   'CommentCreationTime': '2023-09-09T08:17:08Z',
   'CommentLikes': 28},
  {'VideoId': 'BGaHSP0GYc8',
   'CommentId': 'UgxaOU9KT6KXpIedMIF4AaABAg',
   'CommentTitle': 'They got what they voted for. No sympathy.',
   'CommentCreationTime': '2023-09-09T04:07:48Z',
   'CommentLikes': 327},
  {'VideoId': 'BGaHSP0GYc8',
   'CommentId': 'UgzV0YdUfmYsOvnU7JF4AaABAg',
   'CommentTitle': 'It was not a crisis until it was in your back yard',
   'CommentCreationTime': '2023-09-09T01:33:10Z',
   'CommentLikes': 531},
  {'VideoId': 'BGaHSP0GYc8',


In [112]:
# Count comments per videoId to ensure there are enough comments
ben_comment_count = defaultdict(int)

for video_comments in ben_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        ben_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in ben_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: wDcTRo0DvzU, Number of Comments: 30
Video ID: cnh9dYmAPfY, Number of Comments: 30
Video ID: mHKrE8nUXu8, Number of Comments: 30
Video ID: LHnOZHbvigE, Number of Comments: 30
Video ID: uJa0Tw9H_9g, Number of Comments: 30
Video ID: 6Te-0iSaccg, Number of Comments: 30
Video ID: UkPXHWdZ3jY, Number of Comments: 30
Video ID: IuuYl_zXjUw, Number of Comments: 30
Video ID: VOkkGuOqQVY, Number of Comments: 30
Video ID: 0hL06Zvvipw, Number of Comments: 30
Video ID: DsetH7jBLj0, Number of Comments: 30
Video ID: RhLbz5hGwjc, Number of Comments: 30
Video ID: JffURyCMJCk, Number of Comments: 30
Video ID: d9WK5FfTiaE, Number of Comments: 30
Video ID: ZlbgNVetuA8, Number of Comments: 30
Video ID: dFugydLI2Pk, Number of Comments: 30
Video ID: GmeYUrGiI7Q, Number of Comments: 30
Video ID: hQblL2zpuGE, Number of Comments: 30
Video ID: rkDsCwbcKzo, Number of Comments: 30
Video ID: SOUXgnXHxoE, Number of Comments: 30
Video ID: D4o2b4owypQ, Number of Comments: 30
Vi

In [113]:
# Count comments per category/ideology as a sanity check
ben_comment_count_per_category = defaultdict(int)

for video_comments in ben_comments:
    for comment in video_comments:
        video_category = comment['category']
        ben_comment_count_per_category[video_category] += 1

print("Number of comments per video category:")

for category, count in ben_comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")

Number of comments per video category:
Video Category: climate, Number of Comments: 330
Video Category: economy, Number of Comments: 330
Video Category: immigration, Number of Comments: 360
Video Category: guns, Number of Comments: 330
Video Category: socioeco, Number of Comments: 330
Video Category: healthcare, Number of Comments: 360
Video Category: abortion, Number of Comments: 330
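
The counts above list seven categories; `isis` is absent because no Ben Shapiro videos matched its keywords. A small check like the following could surface missing categories explicitly. It is a sketch under assumptions: `missing_categories` is a hypothetical helper, and the expected category names are copied from the fetch log above.

```python
def missing_categories(expected, counts):
    """Return the expected categories that have no comments, in the order given."""
    return [category for category in expected if counts.get(category, 0) == 0]


expected_categories = [
    "isis", "guns", "immigration", "economy",
    "healthcare", "socioeco", "abortion", "climate",
]
# Counts copied from the Ben Shapiro output above.
ben_counts = {
    "climate": 330, "economy": 330, "immigration": 360, "guns": 330,
    "socioeco": 330, "healthcare": 360, "abortion": 330,
}
missing = missing_categories(expected_categories, ben_counts)
print(missing)
```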


In [114]:
# Create a dict to map video ids to their corresponding details
ben_vid_details = {vid['id']: vid for sublist in ben_vid_info for vid in sublist}

# Combine video details with comments
ben_result = []

for sublist in ben_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in ben_vid_details:
            details = ben_vid_details[vid_id].copy()
            details.update(item)
            ben_result.append(details)

In [115]:
len(ben_result)

2370

In [116]:
ben_result

[{'channel': 'Ben Shapiro',
  'id': 'wDcTRo0DvzU',
  'title': 'Climate Change vs. Nuclear Warfare',
  'keyword': 'Climate',
  'category': 'climate',
  'published_at': '2023-12-31T20:00:21Z',
  'VideoViews': '653648',
  'VideoId': 'wDcTRo0DvzU',
  'CommentId': 'UgwGhq5aTTOvBekLJVN4AaABAg',
  'CommentTitle': 'The biggest threat is these people.',
  'CommentCreationTime': '2023-12-31T20:23:40Z',
  'CommentLikes': 3190},
 {'channel': 'Ben Shapiro',
  'id': 'wDcTRo0DvzU',
  'title': 'Climate Change vs. Nuclear Warfare',
  'keyword': 'Climate',
  'category': 'climate',
  'published_at': '2023-12-31T20:00:21Z',
  'VideoViews': '653648',
  'VideoId': 'wDcTRo0DvzU',
  'CommentId': 'Ugx_CORLfVd-IQZif3Z4AaABAg',
  'CommentTitle': 'You call it climate change I call it the seasons change.',
  'CommentCreationTime': '2024-01-30T14:46:19Z',
  'CommentLikes': 16},
 {'channel': 'Ben Shapiro',
  'id': 'wDcTRo0DvzU',
  'title': 'Climate Change vs. Nuclear Warfare',
  'keyword': 'Climate',
  'category': '

In [117]:
ben_comments_df = pd.DataFrame(ben_result)

ben_comments_df.head()
ben_comments_df.shape

(2370, 12)

#### Steven Crowder

In [118]:
# Run function to get information about relevant videos
steven_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUIveFvW-ARp_B_RckhweNJw", keywords, "Steven Crowder", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    steven_vid_info.append(vid_info)

Fetching videos for category: isis
Found 2 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 3 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 1 videos for category socioeco
Fetching videos for category: abortion
Found 11 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [34]:
steven_vid_info

[[{'channel': 'Steven Crowder',
   'id': '-25FgQnq2rY',
   'title': 'Is America Any Better Than ISIS? | Louder With Crowder',
   'keyword': 'ISIS',
   'published_at': '2016-03-28T22:58:03Z',
   'VideoViews': '234329'},
  {'channel': 'Steven Crowder',
   'id': 'UtPoRGvlRWA',
   'title': 'Famous Imam Praises ISIS, Condemns Christians! || Louder With Crowder',
   'keyword': 'ISIS',
   'published_at': '2015-01-29T00:31:16Z',
   'VideoViews': '473413'}],
 [{'channel': 'Steven Crowder',
   'id': 'M_BtwL83Sd0',
   'title': 'Super Bowl Parade Mass Shooting Cover Up & Putin Claims Biden Better Than Trump!',
   'keyword': 'Shooting',
   'published_at': '2024-02-15T16:23:24Z',
   'VideoViews': '184955'},
  {'channel': 'Steven Crowder',
   'id': 'nTirW5M8LW4',
   'title': '“You Should Probably Get a Gun.”',
   'keyword': 'Gun',
   'published_at': '2023-11-15T00:00:30Z',
   'VideoViews': '3354988'},
  {'channel': 'Steven Crowder',
   'id': 'ayVmDY-U4p8',
   'title': 'GUN WEEK w/ Mrgunsngear | Ep 5.

In [122]:
steven_vid_ids = set()

# Extracting all ids
for sublist in steven_vid_info:
    for video in sublist:
        steven_vid_ids.add((video['category'], video['id']))
        
len(steven_vid_ids)

61

In [125]:
steven_vid_ids

{('abortion', '2XIthI3y54A'),
 ('abortion', '9kX-Sbq1mQA'),
 ('abortion', 'CZzNEbhJnlU'),
 ('abortion', 'EvJwa5wAg4g'),
 ('abortion', 'NxrlTq_HOqI'),
 ('abortion', 'QpCY2aMtLCI'),
 ('abortion', 'Yl9Vrbe6l2Y'),
 ('abortion', 'b0gkLtNsGKI'),
 ('abortion', 'hOVe73tTKSk'),
 ('abortion', 'huqBj8o_Z-A'),
 ('abortion', 'tHubbUQHWzQ'),
 ('climate', '8fK_9p6ZFHc'),
 ('climate', '9Vqlq3wsWrI'),
 ('climate', 'CJBrJRCXJmA'),
 ('climate', 'KZRY1nYV51s'),
 ('climate', 'XmxPPx4S6zI'),
 ('climate', 'ZDK1aCqqZkQ'),
 ('climate', 'cVOOMyYde0c'),
 ('climate', 'hCoya-34xrQ'),
 ('climate', 'k5qumG-bBCo'),
 ('climate', 'keZnn4-Ec3Q'),
 ('climate', 'tuV3a-kcMaw'),
 ('economy', '2wly3eAr6Ko'),
 ('economy', 'XUT4QU7rTJs'),
 ('economy', 'wkx86NocjZc'),
 ('guns', '6cOlSIy8SDI'),
 ('guns', '7qFVyJitrYU'),
 ('guns', 'CJ-T8tFjnhc'),
 ('guns', 'COeI1ZHnVWY'),
 ('guns', 'IZ_iL3ra7a8'),
 ('guns', 'M_BtwL83Sd0'),
 ('guns', 'SRaTFGzOnSU'),
 ('guns', 'ayVmDY-U4p8'),
 ('guns', 'ddIv414Lu5Q'),
 ('guns', 'ifCw92mxLQo'),
 ('g

In [128]:
# Get the top 30 relevant comments of each video
steven_comments = []

for category, ids in steven_vid_ids:
    steven_comm = get_vid_comments(ids, category, 30)
    steven_comments.append(steven_comm)

In [129]:
len(steven_comments)

61

In [130]:
# Count comments per videoId to ensure there are enough comments
steven_comment_count = defaultdict(int)

for video_comments in steven_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        steven_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in steven_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: j9zIzIePjpo, Number of Comments: 30
Video ID: keZnn4-Ec3Q, Number of Comments: 30
Video ID: CJBrJRCXJmA, Number of Comments: 30
Video ID: YJGvoX73ylo, Number of Comments: 30
Video ID: 2XIthI3y54A, Number of Comments: 30
Video ID: COeI1ZHnVWY, Number of Comments: 30
Video ID: NxXwl4E9NcU, Number of Comments: 30
Video ID: hOVe73tTKSk, Number of Comments: 30
Video ID: 7qFVyJitrYU, Number of Comments: 30
Video ID: 6cOlSIy8SDI, Number of Comments: 60
Video ID: ddIv414Lu5Q, Number of Comments: 30
Video ID: onzWFIqKbaU, Number of Comments: 30
Video ID: 2wly3eAr6Ko, Number of Comments: 30
Video ID: -25FgQnq2rY, Number of Comments: 30
Video ID: ICnflzCW2GU, Number of Comments: 30
Video ID: huqBj8o_Z-A, Number of Comments: 30
Video ID: hCoya-34xrQ, Number of Comments: 30
Video ID: XmxPPx4S6zI, Number of Comments: 60
Video ID: DW-wJ1XA8dg, Number of Comments: 30
Video ID: J0-9alCN8wg, Number of Comments: 30
Video ID: sOH9v3300Ys, Number of Comments: 30
Vi

In [131]:
# Count comments per category/ideology as a sanity check
steven_comment_count_per_category = defaultdict(int)

for video_comments in steven_comments:
    for comment in video_comments:
        video_category = comment['category']
        steven_comment_count_per_category[video_category] += 1

print("Number of comments per video category:")

for category, count in steven_comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")

Number of comments per video category:
Video Category: immigration, Number of Comments: 330
Video Category: climate, Number of Comments: 330
Video Category: abortion, Number of Comments: 330
Video Category: guns, Number of Comments: 330
Video Category: healthcare, Number of Comments: 330
Video Category: economy, Number of Comments: 90
Video Category: isis, Number of Comments: 60
Video Category: socioeco, Number of Comments: 30


In [132]:
# Create a dict to map video ids to their corresponding details
steven_vid_details = {vid['id']: vid for sublist in steven_vid_info for vid in sublist}

# Combine video details with comments
steven_result = []

for sublist in steven_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in steven_vid_details:
            details = steven_vid_details[vid_id].copy()
            details.update(item)
            steven_result.append(details)

In [40]:
len(steven_result)

1770

In [133]:
steven_comments_df = pd.DataFrame(steven_result)

steven_comments_df.head()
steven_comments_df.shape

(1830, 12)

#### Fox News

Important Notes:
- Fox News has a more diverse set of videos, so to compensate for the lack of videos from the other channels, more videos were taken from it for certain ideologies.

In [135]:
fox_vid_info = []

In [136]:
# Run function to get information about relevant videos
for category, keywords in keyword_lists3.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUXIJgqnII2ZOINSWNOGFThA", keywords, "Fox News", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    fox_vid_info.append(vid_info)

Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: abortion
Found 11 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [137]:
for category, keywords in keyword_isis.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUXIJgqnII2ZOINSWNOGFThA", keywords, "Fox News", category, 45)
    print(f"Found {len(vid_info)} videos for category {category}")
    fox_vid_info.append(vid_info)

Fetching videos for category: isis
Found 17 videos for category isis


In [138]:
for category, keywords in keyword_economy.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUXIJgqnII2ZOINSWNOGFThA", keywords, "Fox News", category, 20)
    print(f"Found {len(vid_info)} videos for category {category}")
    fox_vid_info.append(vid_info)

Fetching videos for category: economy
Found 21 videos for category economy


In [141]:
for category, keywords in keyword_socioeco.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUXIJgqnII2ZOINSWNOGFThA", keywords, "Fox News", category, 24)
    print(f"Found {len(vid_info)} videos for category {category}")
    fox_vid_info.append(vid_info)

Fetching videos for category: socioeco
Found 25 videos for category socioeco
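
The four Fox News cells above differ only in the keyword dictionary and the per-category video limit (10 by default, 45 for `isis`, 20 for `economy`, 24 for `socioeco`). A sketch of how they could be folded into one loop is shown below. `fetch_videos_with_limits` and the stub fetcher are hypothetical; in the notebook the fetcher would be `keyword_videos`, whose argument order is assumed from the calls above.

```python
def fetch_videos_with_limits(playlist_id, channel, keyword_lists, limits, fetch, default_limit=10):
    """Call `fetch` once per category, using a per-category limit when one is given."""
    vid_info = []
    for category, keywords in keyword_lists.items():
        limit = limits.get(category, default_limit)
        vid_info.append(fetch(playlist_id, keywords, channel, category, limit))
    return vid_info


def stub_fetch(playlist_id, keywords, channel, category, limit):
    """Stand-in for keyword_videos: returns `limit` placeholder records, no API call."""
    return [
        {"channel": channel, "id": f"{category}-{i}", "category": category}
        for i in range(limit)
    ]


# Hypothetical keyword lists; the limits mirror the Fox News cells above.
sample_keywords = {"guns": ["Gun"], "isis": ["ISIS"], "economy": ["Economy"]}
fox_limits = {"isis": 45, "economy": 20, "socioeco": 24}
result = fetch_videos_with_limits(
    "PLAYLIST_ID", "Fox News", sample_keywords, fox_limits, stub_fetch
)
print([len(videos) for videos in result])
```

Keeping the limits in one dictionary makes the oversampled categories visible in a single place rather than spread across separate cells.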


In [91]:
fox_vid_info

[[{'channel': 'Fox News',
   'id': 'HfjRhwbs6TE',
   'title': 'Two minors charged for Chiefs parade shooting',
   'keyword': 'Shooting',
   'published_at': '2024-02-17T00:45:02Z',
   'VideoViews': '74413'},
  {'channel': 'Fox News',
   'id': 'l-4VF0ZqHCM',
   'title': 'Heroes stand out in Kansas City parade shooting | Will Cain Show',
   'keyword': 'Shooting',
   'published_at': '2024-02-15T18:08:42Z',
   'VideoViews': '11479'},
  {'channel': 'Fox News',
   'id': 'XOAyAUhuzLk',
   'title': 'Kansas City police give update on Chiefs parade shooting',
   'keyword': 'Shooting',
   'published_at': '2024-02-15T16:50:11Z',
   'VideoViews': '18998'},
  {'channel': 'Fox News',
   'id': 'GxZzruQK-GI',
   'title': 'Two suspects detained after Chiefs Super Bowl rally shooting',
   'keyword': 'Shooting',
   'published_at': '2024-02-14T22:36:24Z',
   'VideoViews': '45528'},
  {'channel': 'Fox News',
   'id': '-j0leLmhmCk',
   'title': "Police give update on Chiefs' Super Bowl rally shooting",
   'ke

In [142]:
# Get the video IDs of the relevant videos
fox_vid_ids = set()

# Extracting all ids
for sublist in fox_vid_info:
    for video in sublist:
        fox_vid_ids.add((video['category'], video['id']))
        
len(fox_vid_ids)

118

In [93]:
fox_vid_ids

{'-j0leLmhmCk',
 '0CDi9VE1ZfQ',
 '0UyURovQgLY',
 '0YCEZ6pkY0E',
 '10OXhm_rIEs',
 '1C5R-51-dec',
 '1d_Yq-qrPCg',
 '4-1rbmYDFNc',
 '4YBV_Mln6Jc',
 '4dySfZHaXdo',
 '4ewv6K8AegQ',
 '54I7g42B3g4',
 '54YeD03PrRE',
 '59W8eQDPtY0',
 '6-HOr_l-E6g',
 '6XdVo1wLU24',
 '76oJy8RaOaI',
 '7LdTuL8pD4w',
 '8ZqnrAcEpf8',
 '8gtuxrKPixk',
 '9GkC0cZ4ulM',
 '9ueobQdZvQY',
 'AA1O2Gme21g',
 'AKWr2kKHMwU',
 'AcfFsOpmrcU',
 'BwA5Ux3x8IQ',
 'C4uqtWyEdvU',
 'CxxwpCXuyNM',
 'D3OhKAH7e_k',
 'Ezchzw3IQsg',
 'FU-pweYiKFo',
 'FcmOqAyHrkU',
 'FmHUnRky3tE',
 'GV4OhZmNE4g',
 'GVkQzOVBpYQ',
 'Ga0zbSORBaQ',
 'GxZzruQK-GI',
 'H228eTsUsqE',
 'Hb0t7ufHJQI',
 'HfjRhwbs6TE',
 'IqE4duWrLMQ',
 'JJ1lVWSzc_Q',
 'JRYUmI8B9Vk',
 'Jw6-nZALSyc',
 'KbpdolRdrq0',
 'LETI51vmuds',
 'L_ZnuoZ2fUU',
 'LpYXsqTJPqI',
 'MNRhBQu6E94',
 'M_e1M9nKv0U',
 'NQUhmeDzmYI',
 'NcNArL1mdMQ',
 'OMPvJ3FGbUA',
 'PHla4wwFjVo',
 'PIUC0a3Qrh8',
 'PnZZU4YXhU4',
 'PzybJfdetNU',
 'QaHLt6Y886w',
 'RY1ltDb0CBA',
 'Rh3Fj6TbxKw',
 'RyGGK-ZvVj4',
 'S8JsCYET4qk',
 'S9XIHK

In [143]:
# Get the top 30 relevant comments of each video
fox_comments = []

for category, ids in fox_vid_ids:
    fox_comm = get_vid_comments(ids, category, 30)
    fox_comments.append(fox_comm)

Comments are disabled for the video with videoId: XOAyAUhuzLk
Comments are disabled for the video with videoId: NQUhmeDzmYI
Comments are disabled for the video with videoId: zbWfFpeRHoA
Comments are disabled for the video with videoId: LETI51vmuds
Comments are disabled for the video with videoId: FF585zJva30
Comments are disabled for the video with videoId: 8ZqnrAcEpf8


In [97]:
len(fox_comments)

120

In [144]:
# Count comments per videoId to ensure there are enough comments
fox_comment_count = defaultdict(int)

for video_comments in fox_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        fox_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in fox_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: z8d_DImqB3Q, Number of Comments: 30
Video ID: WJy0Y3pmHKQ, Number of Comments: 30
Video ID: wksImnIpb8s, Number of Comments: 30
Video ID: krmym8PeRjA, Number of Comments: 30
Video ID: ebPA9D92MY0, Number of Comments: 30
Video ID: LpYXsqTJPqI, Number of Comments: 30
Video ID: u5DWiQnRbso, Number of Comments: 30
Video ID: S9XIHKGw3PI, Number of Comments: 30
Video ID: AcfFsOpmrcU, Number of Comments: 30
Video ID: m4C5-uZv0kM, Number of Comments: 30
Video ID: AgP4aoQ2uMc, Number of Comments: 30
Video ID: 9GkC0cZ4ulM, Number of Comments: 30
Video ID: kw15JRoqMMs, Number of Comments: 30
Video ID: gnW4F-C7h0k, Number of Comments: 30
Video ID: JJ1lVWSzc_Q, Number of Comments: 30
Video ID: nc0LOj6-Yx4, Number of Comments: 30
Video ID: 10OXhm_rIEs, Number of Comments: 30
Video ID: tjlUVGJtUEE, Number of Comments: 30
Video ID: kkoM5hqrp9s, Number of Comments: 30
Video ID: yFmIaWsIABs, Number of Comments: 30
Video ID: l-4VF0ZqHCM, Number of Comments: 7
Vid

In [145]:
# Count comments per category/ideology as a sanity check
fox_comment_count_per_category = defaultdict(int)

for video_comments in fox_comments:
    for comment in video_comments:
        video_category = comment['category']
        fox_comment_count_per_category[video_category] += 1

print("Number of comments per video category:")

for category, count in fox_comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")


Number of comments per video category:
Video Category: socioeco, Number of Comments: 747
Video Category: economy, Number of Comments: 630
Video Category: isis, Number of Comments: 450
Video Category: healthcare, Number of Comments: 330
Video Category: guns, Number of Comments: 277
Video Category: climate, Number of Comments: 316
Video Category: immigration, Number of Comments: 330
Video Category: abortion, Number of Comments: 240
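
The per-video and per-category checks share one pattern, so they could be folded into a single helper. The following is only a sketch: `count_by` and the sample records are hypothetical, and it assumes a video with no retrievable comments shows up as an empty list.

```python
from collections import Counter

def count_by(nested_comments, key):
    """Count comment dicts by `key` across a list of per-video comment lists."""
    return Counter(
        comment[key]
        for video_comments in nested_comments
        for comment in video_comments
    )

# Made-up records that mimic the shape of fox_comments (not real data)
sample_comments = [
    [{"VideoId": "vid_a", "category": "guns"}, {"VideoId": "vid_a", "category": "guns"}],
    [{"VideoId": "vid_b", "category": "climate"}],
    [],  # assumed shape for a video whose comments could not be fetched
]

per_video = count_by(sample_comments, "VideoId")
per_category = count_by(sample_comments, "category")

# most_common() lists the largest groups first, which makes short videos easy to spot
print(per_video.most_common())
print(per_category.most_common())
```

The same helper would apply unchanged to the other channels' comment lists, since only the `key` argument differs between the two checks.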


In [146]:
# Create a dict to map video ids to their corresponding details
fox_vid_details = {vid['id']: vid for sublist in fox_vid_info for vid in sublist}

# Combine video details with comments
fox_result = []

for sublist in fox_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in fox_vid_details:
            details = fox_vid_details[vid_id].copy()
            details.update(item)
            fox_result.append(details)
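
Since this combining step is repeated for every channel, it could be wrapped in a function. The sketch below is an illustration under assumptions: `combine_details_with_comments` is a hypothetical name, the sample records are made up, and the extra `skipped` counter is an addition that reports how many comments had no matching video details.

```python
def combine_details_with_comments(vid_info, comments):
    """Attach each video's metadata to every one of its comments."""
    # If an id appears in several sublists, the last occurrence wins
    details_by_id = {vid["id"]: vid for sublist in vid_info for vid in sublist}

    combined = []
    skipped = 0
    for sublist in comments:
        for item in sublist:
            details = details_by_id.get(item["VideoId"])
            if details is None:
                skipped += 1
                continue
            # Comment fields override video fields on a key clash, like dict.update
            combined.append({**details, **item})
    return combined, skipped

# Made-up records in the same shape as the notebook's lists
sample_vid_info = [[{"id": "vid_a", "title": "Example title", "channel": "Example"}]]
sample_comments = [
    [{"VideoId": "vid_a", "CommentId": "c1"}, {"VideoId": "vid_a", "CommentId": "c2"}],
    [{"VideoId": "vid_missing", "CommentId": "c3"}],
]

combined, skipped = combine_details_with_comments(sample_vid_info, sample_comments)
print(len(combined), skipped)
```

Returning the number of skipped comments makes it easier to see whether any rows were silently dropped before the DataFrame is built.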

In [99]:
len(fox_result)

3381

In [100]:
fox_result

[{'channel': 'Fox News',
  'id': '4YBV_Mln6Jc',
  'title': 'Hunter Biden tries to get gun indictment dismissed',
  'keyword': 'Gun',
  'published_at': '2024-01-30T20:41:34Z',
  'VideoViews': '71229',
  'VideoId': '4YBV_Mln6Jc',
  'CommentId': 'Ugz1MXz2EMzjVp51Xtp4AaABAg',
  'CommentTitle': "Hunter committed a class II felony by lying on a federal firearm application to purchase a firearm illegally. In my state that's automatic jail time. The prosecutors have the evidence, the application with his signature, case closed !",
  'CommentCreationTime': '2024-01-30T21:14:36Z',
  'CommentLikes': 187},
 {'channel': 'Fox News',
  'id': '4YBV_Mln6Jc',
  'title': 'Hunter Biden tries to get gun indictment dismissed',
  'keyword': 'Gun',
  'published_at': '2024-01-30T20:41:34Z',
  'VideoViews': '71229',
  'VideoId': '4YBV_Mln6Jc',
  'CommentId': 'UgyYiCOPqUo4K6Yb5Et4AaABAg',
  'CommentTitle': 'Calling Hunter "the smartest man I know" is like calling him a tax paying citizen.',
  'CommentCreationTim

In [147]:
fox_comments_df = pd.DataFrame(fox_result)

fox_comments_df.head()
fox_comments_df.shape

(3320, 12)

#### Daily Wire Plus

In [148]:
# Run function to get information about relevant videos
dwire_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaeO5vkdj5xOQHp4UmIN6dw", keywords, "Daily Wire Plus", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    dwire_vid_info.append(vid_info)

Fetching videos for category: isis
Found 5 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 11 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 11 videos for category socioeco
Fetching videos for category: abortion
Found 11 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [48]:
dwire_vid_info

[[{'channel': 'Daily Wire Plus',
   'id': 'a2njhiDJLL4',
   'title': 'The Left Targets Trump For Killing ISIS Leader',
   'keyword': 'ISIS',
   'published_at': '2019-10-28T21:48:28Z',
   'VideoViews': '88498'},
  {'channel': 'Daily Wire Plus',
   'id': 'UBS2X7N1J5E',
   'title': 'ISIS Is Dead, Kanye Is Alive | The Michael Knowles Show Ep. 439',
   'keyword': 'ISIS',
   'published_at': '2019-10-28T16:40:11Z',
   'VideoViews': '64681'},
  {'channel': 'Daily Wire Plus',
   'id': 'dvaSkAPjhO8',
   'title': 'ISIS Brides Now Want To Return To The USA',
   'keyword': 'ISIS',
   'published_at': '2019-02-22T16:10:18Z',
   'VideoViews': '15098'},
  {'channel': 'Daily Wire Plus',
   'id': 'e_6G1OAzEPk',
   'title': 'WINNING: ISIS Is Over | The Michael Knowles Show Ep. 62',
   'keyword': 'ISIS',
   'published_at': '2017-12-04T16:48:59Z',
   'VideoViews': '90249'},
  {'channel': 'Daily Wire Plus',
   'id': '9nVXMkqckjk',
   'title': 'Trump Ended ISIS In 11 Months',
   'keyword': 'ISIS',
   'publish

In [149]:
# Get the video IDs of the relevant videos
dwire_vid_ids = set()

# Extracting all ids
for sublist in dwire_vid_info:
    for video in sublist:
        dwire_vid_ids.add((video['category'], video['id']))
        
len(dwire_vid_ids)

82
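
Because the set stores `(category, id)` tuples, a video that matches keywords in two categories is kept once per category, whereas a set of bare IDs would keep it only once. A small sketch with made-up IDs shows the difference:

```python
# Made-up video records; "vid_2" matches two categories and is also listed twice
vid_info = [
    [{"category": "guns", "id": "vid_1"}, {"category": "guns", "id": "vid_2"}],
    [{"category": "economy", "id": "vid_2"}, {"category": "economy", "id": "vid_2"}],
]

# Deduplicate on the (category, id) pair, as in the cell above
pairs = {(v["category"], v["id"]) for sub in vid_info for v in sub}

# Deduplicate on the id alone
ids_only = {v["id"] for sub in vid_info for v in sub}

print(len(pairs))     # 3
print(len(ids_only))  # 2
```

Exact duplicates within a category collapse either way; only the cross-category case behaves differently, which matters when comment counts are later compared per category.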

In [50]:
dwire_vid_ids

{'20IRkFIeADU',
 '2mrdFcaPh7c',
 '4qUwGj3lXWM',
 '5NZDS9GUHJk',
 '5bT5BgojL2Y',
 '6nQtWfsTNns',
 '7RF35XKV0zs',
 '7yrwm-h_xsE',
 '8RVooYlyl20',
 '8Y92lXJOGuU',
 '8YFJRRcd20o',
 '9EKwY8M5UvQ',
 '9nVXMkqckjk',
 '9ySzcI1POFQ',
 'A63t0c2N8t4',
 'Aph1n8Ac6Po',
 'Av-PTU3uHhE',
 'AyqDFoO2Hsg',
 'B6W11MN3UqI',
 'DYJfH2TrhP0',
 'E6nh6dKlhhw',
 'F_pRtd0q_4k',
 'GBCVBn14nK0',
 'HhK-9YsGWtw',
 'HmPn02UGVvc',
 'JN9L1fN_Z74',
 'JVQeicBftmQ',
 'JppVNkU_hlE',
 'KSowkZo1YTg',
 'KUZYcJa3ESE',
 'L32m5VVuqnQ',
 'LNz7FbC-mGQ',
 'LeUfc0Lmn5A',
 'M6SnrPMJt9c',
 'MLGGzsiONZk',
 'MQd3PXJRbk0',
 'OFEe7r5y6vk',
 'OPW_l91e5Vw',
 'OsKo5sUQmPM',
 'P21rUAVZ37k',
 'Po5bnZevXF8',
 'Q8n5oxUi9bU',
 'Qze9fGnqmSg',
 'S3YpKXBMUI0',
 'TsctxEtXML0',
 'UBS2X7N1J5E',
 'UKjZu04EKaY',
 'UffQZLSlKy8',
 'VaM33tLIBWU',
 'Vg8mMGZKhiM',
 'W7P2E9yStx4',
 'W9gQovB9Az8',
 'Y02OPZoLWTI',
 'Y0NxK6PkwEc',
 'Y8WlxjQtBUw',
 'Z1vyU5qVPTE',
 'Z6oi-dHemkc',
 '_0gxO2m39UY',
 '_sVxAocfNIA',
 'a2njhiDJLL4',
 'c-7ef9Zv2fE',
 'cBEmK39XFYQ',
 'dLLJlk

In [150]:
# Get the top 30 relevant comments of each video
dwire_comments = []

for category, ids in dwire_vid_ids:
    dwire_comm = get_vid_comments(ids, category, 30)
    dwire_comments.append(dwire_comm)

In [52]:
dwire_comments

[[{'VideoId': 'kL6X6OYN4fg',
   'CommentId': 'UgyPyboikxu0i4IRKct4AaABAg',
   'CommentTitle': 'I would like to thank BLM for re-electing Donald J Trump as our President of law and order.',
   'CommentCreationTime': '2020-08-29T01:17:45Z',
   'CommentLikes': 1411},
  {'VideoId': 'kL6X6OYN4fg',
   'CommentId': 'UgzP7JGQ-yXBlJIiTh14AaABAg',
   'CommentTitle': 'When anything is reported on the mainstream news I always immediately think to myself "I wonder what really happened?"',
   'CommentCreationTime': '2020-08-29T02:37:15Z',
   'CommentLikes': 280},
  {'VideoId': 'kL6X6OYN4fg',
   'CommentId': 'UgzuC_tJn6N3XGRZe-J4AaABAg',
   'CommentTitle': 'There\'s something extremely perverse to me about only being outraged if the "victim" was of a certain skin tone.',
   'CommentCreationTime': '2020-08-29T06:11:55Z',
   'CommentLikes': 388},
  {'VideoId': 'kL6X6OYN4fg',
   'CommentId': 'Ugxt9ieNug9YhM2TedV4AaABAg',
   'CommentTitle': 'Y’all are sayin the name of a rapist and raising money for him,

In [151]:
# Count comments per videoId to ensure there are enough comments
dwire_comment_count = defaultdict(int)

for video_comments in dwire_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        dwire_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in dwire_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: 7yrwm-h_xsE, Number of Comments: 30
Video ID: qRtVgsHiWyA, Number of Comments: 30
Video ID: 4qUwGj3lXWM, Number of Comments: 30
Video ID: pLkx96GwZps, Number of Comments: 30
Video ID: uvBpx7wtdrc, Number of Comments: 30
Video ID: E6nh6dKlhhw, Number of Comments: 30
Video ID: uC0_G1WF8-s, Number of Comments: 30
Video ID: M6SnrPMJt9c, Number of Comments: 30
Video ID: UKjZu04EKaY, Number of Comments: 30
Video ID: Po5bnZevXF8, Number of Comments: 30
Video ID: MQd3PXJRbk0, Number of Comments: 30
Video ID: 2mrdFcaPh7c, Number of Comments: 30
Video ID: dLLJlk6b9RA, Number of Comments: 30
Video ID: Y8WlxjQtBUw, Number of Comments: 30
Video ID: A63t0c2N8t4, Number of Comments: 30
Video ID: cBEmK39XFYQ, Number of Comments: 30
Video ID: B6W11MN3UqI, Number of Comments: 30
Video ID: Aph1n8Ac6Po, Number of Comments: 30
Video ID: Av-PTU3uHhE, Number of Comments: 30
Video ID: k9P3Evu0mFQ, Number of Comments: 30
Video ID: W9gQovB9Az8, Number of Comments: 30
Vi

In [152]:
# Count comments per category/ideology for a check
dwire_comment_count_per_category = defaultdict(int)

for video_comments in dwire_comments:
    for comment in video_comments:
        video_category = comment['category']
        dwire_comment_count_per_category[video_category] += 1

print("Number of comments per video category:")

for category, count in dwire_comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")


Number of comments per video category:
Video Category: guns, Number of Comments: 330
Video Category: climate, Number of Comments: 330
Video Category: abortion, Number of Comments: 319
Video Category: healthcare, Number of Comments: 330
Video Category: economy, Number of Comments: 330
Video Category: socioeco, Number of Comments: 316
Video Category: immigration, Number of Comments: 308
Video Category: isis, Number of Comments: 144


In [153]:
# Create a dict to map video ids to their corresponding details
dwire_vid_details = {vid['id']: vid for sublist in dwire_vid_info for vid in sublist}

# Combine video details with comments
dwire_result = []

for sublist in dwire_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in dwire_vid_details:
            details = dwire_vid_details[vid_id].copy()
            details.update(item)
            dwire_result.append(details)

In [57]:
len(dwire_result)

2407

In [58]:
dwire_result

[{'channel': 'Daily Wire Plus',
  'id': 'kL6X6OYN4fg',
  'title': 'Narrative-Busting Details Emerge in Kenosha Shooting',
  'keyword': 'Shooting',
  'published_at': '2020-08-29T01:13:20Z',
  'VideoViews': '828292',
  'VideoId': 'kL6X6OYN4fg',
  'CommentId': 'UgyPyboikxu0i4IRKct4AaABAg',
  'CommentTitle': 'I would like to thank BLM for re-electing Donald J Trump as our President of law and order.',
  'CommentCreationTime': '2020-08-29T01:17:45Z',
  'CommentLikes': 1411},
 {'channel': 'Daily Wire Plus',
  'id': 'kL6X6OYN4fg',
  'title': 'Narrative-Busting Details Emerge in Kenosha Shooting',
  'keyword': 'Shooting',
  'published_at': '2020-08-29T01:13:20Z',
  'VideoViews': '828292',
  'VideoId': 'kL6X6OYN4fg',
  'CommentId': 'UgzP7JGQ-yXBlJIiTh14AaABAg',
  'CommentTitle': 'When anything is reported on the mainstream news I always immediately think to myself "I wonder what really happened?"',
  'CommentCreationTime': '2020-08-29T02:37:15Z',
  'CommentLikes': 280},
 {'channel': 'Daily Wire

In [154]:
dwire_comments_df = pd.DataFrame(dwire_result)

dwire_comments_df.head()
dwire_comments_df.shape

(2407, 12)

#### Daily Mail

In [156]:
# Run function to get information about relevant videos
dmail_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUw3fku0sH3qA3c3pZeJwdAw", keywords, "Daily Mail", category, 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    dmail_vid_info.append(vid_info)

Fetching videos for category: isis
Found 2 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 11 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 11 videos for category socioeco
Fetching videos for category: abortion
Found 12 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [70]:
dmail_vid_info

[[{'channel': 'Daily Mail',
   'id': 'RWNTKCX3R48',
   'title': '"Isis bride" Shamima Begum loses challenge over British citizenship',
   'keyword': 'ISIS',
   'published_at': '2023-02-22T13:15:00Z',
   'VideoViews': '1606'},
  {'channel': 'Daily Mail',
   'id': 'SugmVXfr61Y',
   'title': "Biden reveals 'cowardly' ISIS leader is dead in US Commandos raid",
   'keyword': 'ISIS',
   'published_at': '2022-02-03T17:41:52Z',
   'VideoViews': '32245'}],
 [{'channel': 'Daily Mail',
   'id': '8drtj3SPW1A',
   'title': "Israel's Iron Dome shoots down Hezbollah rockets fired from Lebanon",
   'keyword': 'Shooting',
   'published_at': '2024-02-12T12:14:21Z',
   'VideoViews': '17386'},
  {'channel': 'Daily Mail',
   'id': 'vSBCTxBLSdc',
   'title': 'Ukrainian soldiers fire at Russian targets with heavy machine guns',
   'keyword': 'Gun',
   'published_at': '2024-02-07T16:34:54Z',
   'VideoViews': '47307'},
  {'channel': 'Daily Mail',
   'id': 'Rq-qbGcIKHs',
   'title': 'Ukraine brigades charge at 

In [157]:
# Get the video IDs of the relevant videos
dmail_vid_ids = set()

# Extracting all ids
for sublist in dmail_vid_info:
    for video in sublist:
        dmail_vid_ids.add((video['category'], video['id']))
        
len(dmail_vid_ids)

80

In [72]:
dmail_vid_ids

{'0CNdU-5MK2A',
 '1Y9TcMSucJY',
 '1hnQl7yaj8U',
 '2z-ROR4dsZU',
 '3MCWL6FRw-g',
 '3_9N-v7peR8',
 '4qYOA0q2YYY',
 '51ORO-mrWls',
 '5Vc9zPldmws',
 '5aI3cgVmUr0',
 '5mDmcre_D4Y',
 '5uxlWd_WxfE',
 '7V0tjdZV7Fo',
 '8EVhYMW0CrU',
 '8YjEbzdljOo',
 '8drtj3SPW1A',
 '9bBA5JiwvrA',
 'AL04SgvxisQ',
 'BxtMkctwVWI',
 'DkJ1pznb4vM',
 'E6-Y0RFcjxw',
 'E9LhGW57xhQ',
 'H8UGpQqZwu0',
 'IbtUtNUqMh0',
 'JUMfXIgd8gA',
 'JuuJ6zNWM0M',
 'LHM_SBxVmEA',
 'LMes4dOjBJs',
 'LohK0pnqyW8',
 'MhN5Jo-byEE',
 'NcPFKEp3-8U',
 'NyMQo8Yrr2A',
 'O92bsxHx0cI',
 'PJeX75_j7nI',
 'PqvjFUAjbk8',
 'RWNTKCX3R48',
 'RbpZbP_30BA',
 'Rq-qbGcIKHs',
 'SugmVXfr61Y',
 'TanWlYE85Z0',
 'U3dOyBCMxY0',
 'Ucwinxq8-I0',
 'VWXsze99UgE',
 'VZttJ3OzBH0',
 'W5eUEUOE_IA',
 'WILTO-XGWzQ',
 'XKXWpmQZzH8',
 'Y8dRxzQdx8g',
 'ZW55qqa89BU',
 '_95SNvMUm88',
 '_Gi9WpSh1OY',
 '_vw2Q-yQ9VA',
 'aAmZuXt4nlQ',
 'azLtJS3YOTc',
 'bwa1ezCOMew',
 'cC8cguUeRQg',
 'cRa3mYk7Sfs',
 'cYyXnLyLodQ',
 'ec2LUct5rVc',
 'f-LQKgWnd1c',
 'fFFoK0gOt7I',
 'hrHn_TZdnn8',
 'k_p0Yl

In [158]:
# Get the top 30 relevant comments of each video
dmail_comments = []

for category, ids in dmail_vid_ids:
    dmail_comm = get_vid_comments(ids, category, 30)
    dmail_comments.append(dmail_comm)

Comments are disabled for the video with videoId: ZW55qqa89BU


In [74]:
dmail_comments

[[{'VideoId': 'RWNTKCX3R48',
   'CommentId': 'UgxP34s9rNgTXlzs5Ax4AaABAg',
   'CommentTitle': 'Who is paying  Her   Lawyers  I hope it is not the British taxpayer Someone should find out',
   'CommentCreationTime': '2023-02-22T22:22:39Z',
   'CommentLikes': 7},
  {'VideoId': 'RWNTKCX3R48',
   'CommentId': 'UgwslEkG_bdg8B49iPp4AaABAg',
   'CommentTitle': 'Why does it say 5 comments but only see 1',
   'CommentCreationTime': '2023-02-23T00:27:44Z',
   'CommentLikes': 1},
  {'VideoId': 'RWNTKCX3R48',
   'CommentId': 'UgwlDvnAulA-0O7J7a54AaABAg',
   'CommentTitle': 'Hiding comments that’s shocking man',
   'CommentCreationTime': '2023-02-23T00:28:18Z',
   'CommentLikes': 0},
  {'VideoId': 'RWNTKCX3R48',
   'CommentId': 'UgwYRaHf2YviaoNtqiB4AaABAg',
   'CommentTitle': 'Its probably the british government paying for the lawyers hahaha full circle as fk',
   'CommentCreationTime': '2023-02-23T05:43:43Z',
   'CommentLikes': 2}],
 [{'VideoId': 'k_p0YlvlAHc',
   'CommentId': 'UgwZhcLuwjDRfsMrHMt

In [159]:
# Count comments per videoId to ensure there are enough comments
dmail_comment_count = defaultdict(int)

for video_comments in dmail_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        dmail_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in dmail_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: 5mDmcre_D4Y, Number of Comments: 17
Video ID: Ucwinxq8-I0, Number of Comments: 30
Video ID: ze6Vcc6BRck, Number of Comments: 5
Video ID: 9bBA5JiwvrA, Number of Comments: 30
Video ID: WILTO-XGWzQ, Number of Comments: 30
Video ID: sob8qbvx6rQ, Number of Comments: 6
Video ID: aAmZuXt4nlQ, Number of Comments: 30
Video ID: cYyXnLyLodQ, Number of Comments: 13
Video ID: DkJ1pznb4vM, Number of Comments: 2
Video ID: W5eUEUOE_IA, Number of Comments: 1
Video ID: Rq-qbGcIKHs, Number of Comments: 30
Video ID: azLtJS3YOTc, Number of Comments: 30
Video ID: xgbtAA8sGcY, Number of Comments: 1
Video ID: 0CNdU-5MK2A, Number of Comments: 9
Video ID: 8drtj3SPW1A, Number of Comments: 18
Video ID: JUMfXIgd8gA, Number of Comments: 1
Video ID: E6-Y0RFcjxw, Number of Comments: 9
Video ID: qj8Z0YempXY, Number of Comments: 9
Video ID: 8EVhYMW0CrU, Number of Comments: 7
Video ID: JuuJ6zNWM0M, Number of Comments: 29
Video ID: E9LhGW57xhQ, Number of Comments: 30
Video ID: 1h

In [160]:
# Count comments per category/ideology for a check
dmail_comment_count_per_category = defaultdict(int)

for video_comments in dmail_comments:
    for comment in video_comments:
        video_category = comment['category']
        dmail_comment_count_per_category[video_category] += 1

print("Number of comments per video category:")

for category, count in dmail_comment_count_per_category.items():
    print(f"Video Category: {category}, Number of Comments: {count}")

Number of comments per video category:
Video Category: socioeco, Number of Comments: 225
Video Category: climate, Number of Comments: 116
Video Category: guns, Number of Comments: 254
Video Category: immigration, Number of Comments: 135
Video Category: abortion, Number of Comments: 192
Video Category: healthcare, Number of Comments: 121
Video Category: economy, Number of Comments: 67
Video Category: isis, Number of Comments: 34


In [161]:
# Create a dict to map video ids to their corresponding details
dmail_vid_details = {vid['id']: vid for sublist in dmail_vid_info for vid in sublist}

# Combine video details with comments
dmail_result = []

for sublist in dmail_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in dmail_vid_details:
            details = dmail_vid_details[vid_id].copy()
            details.update(item)
            dmail_result.append(details)

In [77]:
len(dmail_result)

1116

In [78]:
dmail_result

[{'channel': 'Daily Mail',
  'id': 'RWNTKCX3R48',
  'title': '"Isis bride" Shamima Begum loses challenge over British citizenship',
  'keyword': 'ISIS',
  'published_at': '2023-02-22T13:15:00Z',
  'VideoViews': '1606',
  'VideoId': 'RWNTKCX3R48',
  'CommentId': 'UgxP34s9rNgTXlzs5Ax4AaABAg',
  'CommentTitle': 'Who is paying  Her   Lawyers  I hope it is not the British taxpayer Someone should find out',
  'CommentCreationTime': '2023-02-22T22:22:39Z',
  'CommentLikes': 7},
 {'channel': 'Daily Mail',
  'id': 'RWNTKCX3R48',
  'title': '"Isis bride" Shamima Begum loses challenge over British citizenship',
  'keyword': 'ISIS',
  'published_at': '2023-02-22T13:15:00Z',
  'VideoViews': '1606',
  'VideoId': 'RWNTKCX3R48',
  'CommentId': 'UgwslEkG_bdg8B49iPp4AaABAg',
  'CommentTitle': 'Why does it say 5 comments but only see 1',
  'CommentCreationTime': '2023-02-23T00:27:44Z',
  'CommentLikes': 1},
 {'channel': 'Daily Mail',
  'id': 'RWNTKCX3R48',
  'title': '"Isis bride" Shamima Begum loses cha

In [162]:
dmail_comments_df = pd.DataFrame(dmail_result)

dmail_comments_df.head()
dmail_comments_df.shape

(1144, 12)

### Combine ALL DF and Save as CSV

In [163]:
repub_comment_df = pd.concat([ben_comments_df, steven_comments_df, fox_comments_df, dwire_comments_df, dmail_comments_df], ignore_index=True)

In [164]:
repub_comment_df.shape

(11071, 12)

In [165]:
# Save df to a CSV file
repub_comment_df.to_csv("republican_comments.csv", index=False)

In [171]:
vid_count_df = repub_comment_df.groupby(['category']).agg({'id': 'nunique'})

In [172]:
comment_count_df = repub_comment_df.groupby(['category']).agg({'CommentId': 'count'})

In [175]:
# Combine tables on the 'category' index
repub_count_df = pd.concat([vid_count_df, comment_count_df], axis=1)
repub_count_df.columns = ['Video_Count', 'Comment_Count']
repub_count_df.index.name = 'ideology'

In [176]:
repub_count_df

| ideology | Video_Count | Comment_Count |
|---|---|---|
| abortion | 53 | 1411 |
| climate | 55 | 1422 |
| economy | 55 | 1447 |
| guns | 54 | 1521 |
| healthcare | 56 | 1471 |
| immigration | 54 | 1463 |
| isis | 24 | 688 |
| socioeco | 59 | 1648 |


In [177]:
dem_vid_count_df = dem_comment_df.groupby(['category']).agg({'id': 'nunique'})

In [178]:
dem_comment_count_df = dem_comment_df.groupby(['category']).agg({'CommentId': 'count'})

In [179]:
# Combine tables on the 'category' index
dem_count_df = pd.concat([dem_vid_count_df, dem_comment_count_df], axis=1)
dem_count_df.columns = ['Video_Count', 'Comment_Count']
dem_count_df.index.name = 'ideology'

In [181]:
dem_count_df

| ideology | Video_Count | Comment_Count |
|---|---|---|
| abortion | 54 | 1551 |
| climate | 55 | 1562 |
| economy | 56 | 1676 |
| guns | 54 | 1603 |
| healthcare | 56 | 1642 |
| immigration | 55 | 1552 |
| isis | 36 | 979 |
| socioeco | 56 | 1621 |


# Step 3

In [11]:
right_comment_df = pd.read_csv('Project_yt_comments.csv')
right_title_df = pd.read_csv('Project_yt_titles.csv')
demo_df = pd.read_csv('combine_democ_comments.csv')



In [12]:
def textcleaner(row):
    row = str(row)
    row = row.lower()
    # remove urls (before punctuation is stripped, so the whole link is dropped)
    row = re.sub(r'http\S+', '', row)
    # remove mentions (must run while the '@' is still present)
    row = re.sub(r"(?<![@\w])@(\w{1,25})", '', row)
    # remove hashtags (must run while the '#' is still present)
    row = re.sub(r"(?<![#\w])#(\w{1,25})", '', row)
    # remove punctuation
    row = re.sub(r'[^\w\s]', '', row)
    # remove other special characters
    row = re.sub(r'[^A-Za-z .-]+', '', row)
    # remove digits
    row = re.sub(r'\d+', '', row)
    row = row.strip(" ")
    # collapse repeated whitespace
    row = re.sub(r'\s+', ' ', row)
    return row
    
stopeng = set(stopwords.words('english'))
def remove_stop(text):
    try:
        words = text.split(' ')
        valid = [x for x in words if x not in stopeng]
        return(' '.join(valid))
    except AttributeError:
        return('')
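
The order of the substitutions in a cleaner like `textcleaner` matters: once punctuation has been stripped, the `@` that the mention pattern looks for is gone, so the username survives as an ordinary word. The sketch below compares the two orderings on a made-up comment; `punct_first` and `mention_first` are hypothetical helper names used only for this illustration.

```python
import re

MENTION = r"(?<![@\w])@(\w{1,25})"   # same mention pattern as in textcleaner
PUNCT = r"[^\w\s]"                   # same punctuation pattern as in textcleaner

def punct_first(text):
    # Punctuation is removed first, so the '@' is already gone
    text = re.sub(PUNCT, "", text)
    return re.sub(MENTION, "", text)

def mention_first(text):
    # The mention is removed while the '@' is still in the text
    text = re.sub(MENTION, "", text)
    return re.sub(PUNCT, "", text)

sample = "@someuser great point!"
print(repr(punct_first(sample)))    # 'someuser great point'
print(repr(mention_first(sample)))  # ' great point'
```

The same reasoning applies to hashtags and the `#` character.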

In [36]:
# Drop NaN
right_comment_df = right_comment_df.dropna()
right_title_df = right_title_df.dropna()
demo_df = demo_df.dropna()

In [37]:
# Change from datetime to date
right_comment_df['CommentCreationTime'] = right_comment_df['CommentCreationTime'].apply(lambda x: datetime.strptime(str(x)[0:10], '%Y-%m-%d').date())
right_title_df['published_at'] = right_title_df['published_at'].apply(lambda x: datetime.strptime(str(x)[0:10], '%Y-%m-%d').date())
demo_df['CommentCreationTime'] = demo_df['CommentCreationTime'].apply(lambda x: datetime.strptime(str(x)[0:10], '%Y-%m-%d').date())
demo_df['published_at'] = demo_df['published_at'].apply(lambda x: datetime.strptime(str(x)[0:10], '%Y-%m-%d').date())
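
The conversion above keeps only the first ten characters of each timestamp. A minimal sketch of that step on a single value, assuming the timestamps have the `YYYY-MM-DDTHH:MM:SSZ` shape seen in the collected data (`to_date` is a hypothetical helper name):

```python
from datetime import date, datetime

def to_date(timestamp):
    """Keep only the calendar date of a 'YYYY-MM-DDTHH:MM:SSZ' string."""
    return datetime.strptime(str(timestamp)[:10], "%Y-%m-%d").date()

ts = "2024-01-30T20:41:34Z"
print(to_date(ts))

# date.fromisoformat gives the same result for the sliced string
print(to_date(ts) == date.fromisoformat(ts[:10]))
```

Because the trailing `Z` marks UTC, the resulting date is the UTC calendar day, which can differ from a commenter's local day near midnight.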

In [40]:
# Tokenize
right_comment_df['TweetToken'] = right_comment_df['CommentTitle'].apply(lambda x: casual.TweetTokenizer().tokenize(x))
right_title_df['TweetToken'] = right_title_df['title'].apply(lambda x: casual.TweetTokenizer().tokenize(x))
demo_df['TweetTokenTitle'] = demo_df['title'].apply(lambda x: casual.TweetTokenizer().tokenize(x))
demo_df['TweetTokenComment'] = demo_df['CommentTitle'].apply(lambda x: casual.TweetTokenizer().tokenize(x))


In [41]:
# Clean text
right_comment_df['CommentCleaned'] = right_comment_df['TweetToken'].apply(lambda x: remove_stop(textcleaner(x)))
right_title_df['TitleCleaned'] = right_title_df['TweetToken'].apply(lambda x: remove_stop(textcleaner(x)))
demo_df['TitleCleaned'] = demo_df['TweetTokenTitle'].apply(lambda x: remove_stop(textcleaner(x)))
demo_df['CommentCleaned'] = demo_df['TweetTokenComment'].apply(lambda x: remove_stop(textcleaner(x)))


In [44]:
def nrc_sen(text, cat):
    sen = NRCLex(text)
    if cat == 'pos':
        return sen.affect_frequencies['positive']
    else:
        return sen.affect_frequencies['negative']
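
To make the scores easier to interpret, here is a toy re-implementation of the frequency idea. It is a sketch under an assumption, not NRCLex itself: we assume each affect's frequency is its count divided by the total number of affect matches in the text, and the three-word lexicon is invented for illustration.

```python
from collections import Counter

# Hypothetical mini-lexicon; the real NRC lexicon is far larger
TOY_LEXICON = {
    "grave": ["fear", "negative", "sadness"],
    "trust": ["trust", "positive"],
    "amazing": ["positive"],
}

def affect_frequencies(text):
    """Share of each affect among all affect matches in `text`."""
    counts = Counter(
        affect
        for word in text.split()
        for affect in TOY_LEXICON.get(word, [])
    )
    total = sum(counts.values())
    # This sketch returns an empty dict when no word is in the lexicon
    return {affect: n / total for affect, n in counts.items()} if total else {}

freqs = affect_frequencies("amazing grave trust")
print(freqs)
```

Under this assumption the scores are shares of matched affect labels rather than shares of all words, so a short text with a single lexicon hit can reach a score of 1.0.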

In [45]:
right_comment_df['PositiveScore'] = right_comment_df['CommentCleaned'].apply(lambda x: nrc_sen(x, 'pos'))
right_comment_df['NegativeScore'] = right_comment_df['CommentCleaned'].apply(lambda x: nrc_sen(x, 'neg'))        
right_title_df['PositiveScore'] = right_title_df['TitleCleaned'].apply(lambda x: nrc_sen(x, 'pos'))
right_title_df['NegativeScore'] = right_title_df['TitleCleaned'].apply(lambda x: nrc_sen(x, 'neg'))        

demo_df['PositiveScoreTitle'] = demo_df['TitleCleaned'].apply(lambda x: nrc_sen(x, 'pos'))
demo_df['NegativeScoreTitle'] = demo_df['TitleCleaned'].apply(lambda x: nrc_sen(x, 'neg'))    
demo_df['PositiveScoreComment'] = demo_df['CommentCleaned'].apply(lambda x: nrc_sen(x, 'pos'))    
demo_df['NegativeScoreComment'] = demo_df['CommentCleaned'].apply(lambda x: nrc_sen(x, 'neg'))        

In [48]:
def nrc_emo(text, ver):
    emo = NRCLex(text).affect_frequencies
    max_emo = max(emo, key=emo.get)
    max_score = emo[max_emo]
    if ver == 'score':
        return max_score
    else:
        return max_emo
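
One detail of `max(emo, key=emo.get)` is worth keeping in mind: when several affects share the top score, Python returns the first one encountered, which for a dict is the first one in insertion order. The scores below are made up to show the effect:

```python
# Three affects tie for the highest score
emo = {"fear": 1 / 3, "negative": 1 / 3, "sadness": 1 / 3, "positive": 0.0}
first_winner = max(emo, key=emo.get)

# Same scores, different insertion order
reordered = {"sadness": 1 / 3, "fear": 1 / 3, "negative": 1 / 3, "positive": 0.0}
second_winner = max(reordered, key=reordered.get)

print(first_winner, second_winner)  # fear sadness
```

The reported dominant emotion for tied texts therefore depends on key order rather than on the text itself. Note also that `max` raises `ValueError` on an empty dict.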

In [52]:
right_comment_df['Emotion'] = right_comment_df['CommentCleaned'].apply(lambda x: nrc_emo(x, 'emo'))
right_comment_df['EmotionScore'] = right_comment_df['CommentCleaned'].apply(lambda x: nrc_emo(x, 'score'))        
right_title_df['Emotion'] = right_title_df['TitleCleaned'].apply(lambda x: nrc_emo(x, 'emo'))
right_title_df['EmotionScore'] = right_title_df['TitleCleaned'].apply(lambda x: nrc_emo(x, 'score'))        

demo_df['EmotionTitle'] = demo_df['TitleCleaned'].apply(lambda x: nrc_emo(x, 'emo'))
demo_df['EmotionScoreTitle'] = demo_df['TitleCleaned'].apply(lambda x: nrc_emo(x, 'score'))    
demo_df['EmotionComment'] = demo_df['CommentCleaned'].apply(lambda x: nrc_emo(x, 'emo'))
demo_df['EmotionScoreComment'] = demo_df['CommentCleaned'].apply(lambda x: nrc_emo(x, 'score'))    

In [54]:
demo_df

Unnamed: 0,channel,video_id,title,keyword,published_at,CommentId,CommentTitle,CommentCreationTime,CommentLikes,TitleCleaned,...,PositiveScoreTitle,NegativeScoreTitle,PositiveScoreComment,NegativeScoreComment,Emotion,EmotionScore,EmotionTitle,EmotionScoreTitle,EmotionComment,EmotionScoreComment
0,Vice,SwoRx3tstxY,We Uncovered an ISIS Mass Grave | Super Users,ISIS,2022-04-11,Ugws1dFQrp7AovnexrB4AaABAg,Bless the hard work of journalists! Seeing the...,2022-04-12,146,uncovered isis mass grave super users,...,0.000000,0.333333,0.066667,0.133333,fear,0.333333,fear,0.333333,fear,0.200000
1,Vice,SwoRx3tstxY,We Uncovered an ISIS Mass Grave | Super Users,ISIS,2022-04-11,UgxcNLZW2rAeMBklWD14AaABAg,Also I can't imagine the amount mental trauma...,2022-04-11,726,uncovered isis mass grave super users,...,0.000000,0.333333,0.230769,0.076923,fear,0.333333,fear,0.333333,positive,0.230769
2,Vice,SwoRx3tstxY,We Uncovered an ISIS Mass Grave | Super Users,ISIS,2022-04-11,UgzFcqbEHILJ93hjvqh4AaABAg,This is so heartbreaking. What a horrific disp...,2022-04-11,251,uncovered isis mass grave super users,...,0.000000,0.333333,0.000000,0.222222,fear,0.333333,fear,0.333333,fear,0.222222
3,Vice,SwoRx3tstxY,We Uncovered an ISIS Mass Grave | Super Users,ISIS,2022-04-11,UgxYrGf2RJZeKbFEap14AaABAg,7:35 Social-media shouldn't be just summarily ...,2022-04-11,219,uncovered isis mass grave super users,...,0.000000,0.333333,0.166667,0.166667,fear,0.333333,fear,0.333333,fear,0.166667
4,Vice,SwoRx3tstxY,We Uncovered an ISIS Mass Grave | Super Users,ISIS,2022-04-11,Ugx5rmcME6ua2pMkojt4AaABAg,VICE NEVER Dissapoints! Amazing documentaries!...,2022-04-11,127,uncovered isis mass grave super users,...,0.000000,0.333333,0.285714,0.071429,fear,0.333333,fear,0.333333,positive,0.285714
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2416,MSNBC,BZ_f66aoZ0I,Kimberly Atkins Stohr: GA District Attorney Fa...,Gas,2024-02-16,UgzBO4V2k1_ayMG_1tF4AaABAg,We don't trust bank s,2024-02-18,1,kimberly atkins stohr ga district attorney fan...,...,0.333333,0.000000,0.000000,0.000000,fear,0.333333,fear,0.333333,trust,1.000000
2417,MSNBC,BZ_f66aoZ0I,Kimberly Atkins Stohr: GA District Attorney Fa...,Gas,2024-02-16,UgxJiE3xA5crSIatswx4AaABAg,MSNBC at its worst. Fani Willis the new atm ca...,2024-02-18,3,kimberly atkins stohr ga district attorney fan...,...,0.333333,0.000000,0.142857,0.000000,fear,0.333333,fear,0.333333,trust,0.285714
2418,MSNBC,BZ_f66aoZ0I,Kimberly Atkins Stohr: GA District Attorney Fa...,Gas,2024-02-16,UgxXK_3Y17CdewRaIMJ4AaABAg,I bet Farni is not the only prosecutor in Ga. ...,2024-02-17,9,kimberly atkins stohr ga district attorney fan...,...,0.333333,0.000000,0.000000,1.000000,fear,0.333333,fear,0.333333,negative,1.000000
2419,MSNBC,BZ_f66aoZ0I,Kimberly Atkins Stohr: GA District Attorney Fa...,Gas,2024-02-16,Ugzd5fBC6pIR8k7JI214AaABAg,Exactly what defence can she use and according...,2024-02-16,6,kimberly atkins stohr ga district attorney fan...,...,0.333333,0.000000,0.074074,0.111111,fear,0.333333,fear,0.333333,anger,0.222222


## Andy's Section

In [None]:
# Function for retrieving the upload playlist id of a channel
def get_upload_id(channel):
    request = youtube.channels().list(part='contentDetails', forUsername=channel)
    res = request.execute()
    return res["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]

# Function for retrieving all vids within the upload playlist of a channel, stopping once a limit INT has been reached
def get_vids(channel, limit, keywords, ideology):
    
    # Output list
    vid_lst=[]

    request = youtube.playlistItems().list(part='snippet',playlistId=get_upload_id(channel),maxResults=50)
        
    res = request.execute()
    nextPageToken = res['nextPageToken']

    # Iterate through each video in the playlist
    for v in res["items"]:

        # Normalization of video title to check for keywords
        title = v['snippet']['title']
        title = title.lower()
        title = re.sub(r'[^\w\s]','', title)

        # Check for key words. If key word detected, then counter +1. If counter > 0, then the post will be flagged and added.
        counter = 0
        for word in title.split():
            counter = 0
            if word in keywords:
                counter += 1
        if counter == 0:
            continue

        # Create temp dictionary per video, and add video-specific information to dictionary
        vid_dict = {}
        vid_dict['ChannelName'] = v['snippet']['channelTitle']
        vid_dict['VideoId'] = v['snippet']['resourceId']['videoId']
        vid_dict['VideoTitle'] = v['snippet']['title']
        vid_dict['Ideology'] = ideology

        # Separate Resource Call to retrieve video views
        views = youtube.videos().list(id=v['snippet']['resourceId']['videoId'], part="snippet,contentDetails,statistics")
        view_temp = views.execute()
        vid_dict['VideoViews'] = view_temp['items'][0]['statistics']['viewCount']

        # Append dictionary to greater list
        vid_lst.append(vid_dict)

    # Iterate until no more next page
    while nextPageToken:
        try:
            request = youtube.playlistItems().list(part='snippet', playlistId=get_upload_id(channel), maxResults=50, pageToken = res['nextPageToken'])                
            res = request.execute()

            # Redefine next page token to check @ next iteration
            nextPageToken = res['nextPageToken']

            # Iterate through each video
            for v in res["items"]:

                # Normalization of video title to check for keywords
                title = v['snippet']['title']
                title = title.lower()
                title = re.sub(r'[^\w\s]','', title)

                # Check for key words. If key word detected, then counter +1. If counter > 0, then the post will be flagged and added.
                counter = 0
                for word in title.split():
                    if word in keywords:
                        counter += 1
                if counter == 0:
                    continue

                # Create temp dictionary per video, and add video-specific information to dictionary
                vid_dict = {}
                vid_dict['ChannelName'] = v['snippet']['channelTitle']
                vid_dict['VideoId'] = v['snippet']['resourceId']['videoId']
                vid_dict['VideoTitle'] = v['snippet']['title']
                vid_dict['Ideology'] = ideology

                # Separate Resource Call to retrieve video views and the publish date (needed later by df_clean_process)
                views = youtube.videos().list(id=v['snippet']['resourceId']['videoId'], part="snippet,contentDetails,statistics")
                view_temp = views.execute()
                vid_dict['VideoViews'] = view_temp['items'][0]['statistics']['viewCount']
                vid_dict['VideoPublishedDate'] = view_temp['items'][0]['snippet']['publishedAt']

                # Append dictionary to greater list
                vid_lst.append(vid_dict)

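            # If the number of saved videos reaches the self-defined limit, stop and return the list of videos
            if len(vid_lst) >= limit:
                return vid_lst

        # Error case handling: a missing key in the response ends pagination
        except KeyError:
            break

    # Return whatever was collected when the pages run out before the limit is reached
    return vid_lst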

# Function for getting the most relevant top-level comments for a list of videos
def get_vid_comments(vid_lst, limit):
    vids_final = []

    # Iterate through each video in the video list
    for vid in vid_lst:

        request = youtube.commentThreads().list(videoId=vid['VideoId'], part='id,snippet,replies', textFormat='plainText', order='relevance', maxResults=50)
        res = request.execute()

        # Token for the next page of comments (None when the video has a single page)
        nextPageToken = res.get('nextPageToken')

        # Iterate through each comment
        for v in res["items"]:

            # Copy the dictionary of the current video, because each comment row also carries the video data
            vid_temp = copy.copy(vid)
            vid_temp.update({'CommentId':v['id']})
            vid_temp.update({'CommentTitle':v['snippet']['topLevelComment']['snippet']['textOriginal']})
            vid_temp.update({'CommentCreationTime':v['snippet']['topLevelComment']['snippet']['publishedAt']})
            vid_temp.update({'CommentLikes':v['snippet']['topLevelComment']['snippet']['likeCount']})
            vids_final.append(vid_temp)

        while nextPageToken:
            try:
                # Pass the page token so the next page is fetched instead of the first page again
                request = youtube.commentThreads().list(videoId=vid['VideoId'], part='id,snippet,replies', textFormat='plainText', order='relevance', maxResults=50, pageToken=nextPageToken)
                res = request.execute()

                nextPageToken = res.get('nextPageToken')

                for v in res["items"]:
                    # Copy the dictionary of the current video, because each comment row also carries the video data
                    vid_temp = copy.copy(vid)
                    vid_temp.update({'CommentId':v['id']})
                    vid_temp.update({'CommentTitle':v['snippet']['topLevelComment']['snippet']['textOriginal']})
                    vid_temp.update({'CommentCreationTime':v['snippet']['topLevelComment']['snippet']['publishedAt']})
                    vid_temp.update({'CommentLikes':v['snippet']['topLevelComment']['snippet']['likeCount']})
                    vids_final.append(vid_temp)

                # If the number of saved comments reaches the self-defined limit, stop and return them
                if len(vids_final) >= limit:
                    return vids_final
            except KeyError:
                break

    return vids_final

# from Lab9
def textcleaner(row):
    row = str(row)
    row = row.lower()
    # remove urls, mentions and hashtags first, while '@' and '#' are still present
    # (stripping punctuation first would leave the mention/hashtag patterns nothing to match)
    row = re.sub(r'http\S+', '', row)
    row = re.sub(r"(?<![@\w])@(\w{1,25})", '', row)
    row = re.sub(r"(?<![#\w])#(\w{1,25})", '', row)
    # remove punctuation
    row = re.sub(r'[^\w\s]', '', row)
    # remove other special characters
    row = re.sub(r'[^A-Za-z .-]+', '', row)
    # remove digits
    row = re.sub(r'\d+', '', row)
    row = row.strip(" ")
    row = re.sub(r'\s+', ' ', row)
    return row
    
stopeng = set(stopwords.words('english'))
def remove_stop(text):
    try:
        words = text.split(' ')
        valid = [x for x in words if x not in stopeng]
        return(' '.join(valid))
    except AttributeError:
        return('')

def df_clean_process(df):

    # Change datetime to date
    df['VideoPublishedDate'] = df['VideoPublishedDate'].apply(lambda x: datetime.strptime(x[0:10], '%Y-%m-%d').date())
    df['CommentCreationTime'] = df['CommentCreationTime'].apply(lambda x: datetime.strptime(x[0:10], '%Y-%m-%d').date())

    # Check NaN, if < 10% of total dataset, drop NaN
    if df.isnull().values.any():
        if len(df[df.isna().any(axis=1)]) < len(df) * 0.1:
            df = df.dropna()

    # Split into separate df for computational load reduction (explicit copies avoid pandas' SettingWithCopyWarning)
    title_df = df[['ChannelName', 'VideoTitle', 'VideoPublishedDate', 'VideoViews', 'Ideology']].drop_duplicates().copy()
    comment_df = df[['ChannelName', 'VideoViews', 'CommentTitle', 'CommentCreationTime', 'CommentLikes', 'Ideology']].copy()

    # tokenize (one tokenizer instance is reused for every row)
    tokenizer = casual.TweetTokenizer()
    title_df['TweetToken'] = title_df['VideoTitle'].apply(tokenizer.tokenize)
    comment_df['TweetToken'] = comment_df['CommentTitle'].apply(tokenizer.tokenize)

    # clean
    title_df['Cleaned'] = title_df['TweetToken'].apply(lambda x: remove_stop(textcleaner(x)))
    comment_df['Cleaned'] = comment_df['TweetToken'].apply(lambda x: remove_stop(textcleaner(x)))

    return (title_df, comment_df)

    # Sentiment analysis
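
The order of the regex substitutions matters in a cleaner like `textcleaner`: the mention and hashtag patterns can only match while `@` and `#` are still in the text, so they need to run before punctuation is stripped. The following is a minimal sketch of that ordering, not the notebook's own function; `clean_comment` and the sample string are hypothetical names introduced for illustration.

```python
import re

def clean_comment(text):
    """Hypothetical cleaner: drop URLs, mentions and hashtags before punctuation."""
    text = str(text).lower()
    # URLs, mentions and hashtags go first, while their marker characters still exist
    text = re.sub(r'http\S+', '', text)
    text = re.sub(r'(?<![@\w])@(\w{1,25})', '', text)
    text = re.sub(r'(?<![#\w])#(\w{1,25})', '', text)
    # Then keep only letters and whitespace, and collapse repeated whitespace
    text = re.sub(r'[^a-z\s]', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

sample = "Thanks @JohnDoe! See https://example.com #GunControl 2024"
print(clean_comment(sample))  # thanks see
```

If punctuation were removed first, `@JohnDoe` would become `johndoe` and stay in the cleaned text as an ordinary word.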

In [None]:
# define channels
channels_left = ['VICE', 'Vox', 'MSNBC', 'The Daily Show', 'TheYoungTurks']
channels_right = ['Fox News', 'Ben Shapiro', 'StevenCrowder', 'Daily Mail', 'DailyWire+']

# define key ideologies/associated keywords to look for in title
isis_keywords = ['terrorism', 'terrorist', 'extremism', 'radicalist', 'radicalism']
guns_keywords = ['shooting', 'shootings', 'school shooting', 'school shootings', 'firearms', 'firearm', 'gun', 'gun control', 'guns', 'nra', 'second amendment']
immigration_keywords = ['border control', 'mexico', 'visa', 'citizenship', 'asylum', 'deportation', 'refugee']
economy_keywords = ['budget', 'budget deficit', 'unemployed', 'inflation', 'interest rate', 'federal reserve', 'market', 'employment']
health_care_keywords = ['medicaid', 'covid', 'obamacare', 'public health', 'insurance']
socioeconomic_keywords = ['rich', 'poor', 'income inequality', 'poverty', 'wealth distribution']
abortion_keywords = ['pregnancy', 'unwanted pregnancy', 'roe', 'wade', 'abortion', 'pro-life', 'rape', 'incest', 'life of mother', 'religion']
climate_change_keywords = ['global warming', 'carbon', 'alternative energy', 'climate', 'methane', 'emissions','gas','greenhouse']

# Define for iteration
keywords = [isis_keywords, guns_keywords, immigration_keywords, economy_keywords, health_care_keywords, socioeconomic_keywords, abortion_keywords, climate_change_keywords]

# Pre-define empty df
left_df = pd.DataFrame(columns=['ChannelName', 'VideoId', 'VideoTitle', 'Ideology', 'VideoPublishedDate', 'VideoViews', 'CommentId', 'CommentTitle', 'CommentCreationTime', 'CommentLikes'])

# Loop through all left channels
for channel in channels_left:

    # Loop through all keywords/ideologies
    for keyword, ideology in zip(keywords, ['ISIS', 'GUNS', 'IMMIGRATION', 'ECONOMY', 'HEALTH CARE', 'SOCIOECONOMIC', 'ABORTION', 'CLIMATE CHANGE']):

        # Return temp df for one ideology for one channel
        temp_df = pd.DataFrame(get_vid_comments(get_vids(channel, 50, keyword, ideology)[0:50], 150))

        # Append temp df to master df
        left_df = pd.concat([left_df,temp_df])

# Pre-define empty df
right_df = pd.DataFrame(columns=['ChannelName', 'VideoId', 'VideoTitle', 'Ideology', 'VideoPublishedDate', 'VideoViews', 'CommentId', 'CommentTitle', 'CommentCreationTime', 'CommentLikes'])

# Loop through all right channels
for channel in channels_right:

    # Loop through all keywords/ideologies
    for keyword, ideology in zip(keywords, ['ISIS', 'GUNS', 'IMMIGRATION', 'ECONOMY', 'HEALTH CARE', 'SOCIOECONOMIC', 'ABORTION', 'CLIMATE CHANGE']):

        # Return temp df for one ideology for one channel
        temp_df = pd.DataFrame(get_vid_comments(get_vids(channel, 50, keyword, ideology)[0:50], 150))

        # Append temp df to master df
        right_df = pd.concat([right_df,temp_df])

(left_title_df, left_comment_df) = df_clean_process(left_df)
(right_title_df, right_comment_df) = df_clean_process(right_df)
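
Several of the keyword lists contain multi-word phrases such as 'gun control' or 'income inequality', and 'pro-life' contains punctuation. A word-by-word membership test on `title.split()` can only match single tokens, so these entries never fire. One possible way to cover them is a whole-word regex search on the normalized title, sketched below. `normalize` and `title_matches` are hypothetical helpers, not part of the notebook, and the normalization is assumed to mirror the one used for titles in `get_vids`.

```python
import re

def normalize(text):
    """Lowercase and drop punctuation (assumed to mirror the title normalization)."""
    return re.sub(r'[^\w\s]', '', text.lower())

def title_matches(title, keywords):
    """Return True if any keyword or phrase occurs as whole words in the title."""
    norm_title = normalize(title)
    for kw in keywords:
        # Normalize the keyword the same way, so 'pro-life' becomes 'prolife'
        norm_kw = normalize(kw).strip()
        if norm_kw and re.search(r'\b' + re.escape(norm_kw) + r'\b', norm_title):
            return True
    return False

sample_keywords = ['gun control', 'nra', 'pro-life']
print(title_matches("Why Gun Control Fails", sample_keywords))    # True
print(title_matches("Rich Pro-Life Donors", sample_keywords))     # True
print(title_matches("Talks have begun, controls stay", sample_keywords))  # False
```

The word boundaries keep short keywords such as 'nra' or 'gun' from matching inside longer words.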