<a href="https://colab.research.google.com/github/RahadianRizky/MessiVSRonaldo_TextAnalysis_FinalProject/blob/main/final_project_RizkyNoor_Messi_VS_Ronaldo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **The GOAT Debate: A YouTube Sentiment Analysis of Messi and Ronaldo Using NLTK VADER**

# **1. Introduction**

The rivalry between Lionel Messi and Cristiano Ronaldo is one of the most recognizable cultural debates in global sports. Their competition extends beyond football performance and influences media narratives, online behavior, commercial value, and global fan identity (Messner, 2022). Digital platforms amplify this rivalry, allowing fans to shape narratives through comments, reactions, and social engagement.

<br>

Recent studies show that online communities often display emotional loyalty and group-based competition, which sometimes leads to polarized reactions (Zhou, 2024). In this environment, athletes function as symbolic figures where identity, meaning, and cultural narratives matter as much as athletic performance (Simsek & Bozdag, 2024).

<br>

This project examines how public sentiment toward Messi and Ronaldo changed between two important periods: 2019, following Ronaldo’s UEFA Nations League win, and 2022–2023, following Messi’s World Cup victory.

<br>

**Research Question:**
How did YouTube sentiment toward Messi and Ronaldo change between 2019 and 2022–2023, and what can this tell us about online rivalry culture?

<br>

**Thesis:**
The findings suggest that sentiment toward Messi and Ronaldo is shaped by identity, symbolism, and fan narratives rather than performance alone. Shifts in sentiment follow cultural storylines and commercial visibility, which demonstrates that the GOAT debate operates not only as a social conversation but also as a form of sports marketing.

# **2. Methodology and Data**

This project uses computational text analysis, a form of Natural Language Processing (NLP), to analyze YouTube comments. The project is built in Python in a Google Colab notebook, which allows text (Markdown) and code to be combined.

<br>

# **2.1 Data Collection**

YouTube comments were collected using the YouTube Data API and a Python script inside Google Colab. The search term “Messi vs Ronaldo” was applied to two time windows:

*   2019: after Ronaldo’s UEFA Nations League victory (videos published between 9 June and 9 December 2019)
*   2022–2023: after Messi’s World Cup victory (videos published between 1 January and 30 June 2023)

The script downloaded video metadata and comment threads. Only English comments were included because VADER is optimized for informal English expressions (Hutto & Gilbert, 2014). After cleaning and filtering, a usable dataset of comments from both periods was obtained.

Dataset Summary

*    100 YouTube videos matching “Messi VS Ronaldo”
*    The 50 most viewed videos for each time frame
*    29,397 processed comments, used to compare sentiment shifts

These comments represent natural public reactions around each time window.


------------------------------------------------
CODE BLOCK 1: YouTube API Request & Collect Videos
------------------------------------------------

In [1]:
# @title
import os
from getpass import getpass

# Paste your API key when prompted (input will be hidden in Colab)
os.environ["YOUTUBE_API_KEY"] = getpass("Paste your API Key: ")

# Quick sanity check
assert os.environ.get("YOUTUBE_API_KEY"), "API key not set. Please run the cell and paste your key."


Paste your API Key: ··········


In [2]:
# @title

!pip -q install requests tqdm

import os
import json
from urllib.parse import urlencode

import requests  # lets us call the YouTube Data API over HTTP
import pandas as pd
from tqdm import tqdm


In [3]:
# @title

API_KEY = os.environ.get("YOUTUBE_API_KEY")
BASE_URL = "https://www.googleapis.com/youtube/v3"

if not API_KEY:
    raise ValueError("Missing API key. Set os.environ['YOUTUBE_API_KEY'] first.")

def yt_get(resource: str, params: dict) -> dict:
    """Call YouTube Data API v3.
    - resource: e.g., 'search', 'videos', 'commentThreads'
    - params: dict of query params (we append the API key here)
    Returns parsed JSON as a Python dict.
    """
    q = {**params, "key": API_KEY}
    url = f"{BASE_URL}/{resource}?{urlencode(q)}"
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # raise an HTTPError if the request failed
    return r.json()


In [4]:
# @title
import pandas as pd
import requests
from tqdm import tqdm
from urllib.parse import urlencode

# Timeframes for 2019 and 2022-2023 (adjust as needed)
time_frame_1 = {'publishedAfter': '2019-06-09T00:00:00Z', 'publishedBefore': '2019-12-09T00:00:00Z'}  # After Ronaldo's UEFA Nations League win
time_frame_2 = {'publishedAfter': '2023-01-01T00:00:00Z', 'publishedBefore': '2023-06-30T00:00:00Z'}  # After Messi's World Cup win

# Function to fetch videos for a specific topic and timeframe
def fetch_videos(query, timeframe, max_results=200): # Increased max_results to get a larger pool
    video_hits = []  # Will hold basic search results
    page_token = None  # Used for pagination

    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": min(max_results, 50),  # search.list accepts at most 50 results per page
        "order": "viewCount",  # Ordered by viewCount in the search phase
        "publishedAfter": timeframe['publishedAfter'],
        "publishedBefore": timeframe['publishedBefore'],
    }

    # Search videos with pagination
    with tqdm(total=max_results, desc="Searching videos") as pbar:
        while len(video_hits) < max_results:
            if page_token:
                params["pageToken"] = page_token

            data = yt_get("search", params)
            items = data.get('items', [])

            for item in items:
                vid = item.get('id', {}).get('videoId')
                if not vid:
                    continue
                snip = item.get('snippet', {})
                video_hits.append({
                    'video_id': vid,
                    'publishedAt': snip.get('publishedAt'),
                    'title': snip.get('title'),
                    'channelTitle': snip.get('channelTitle'),
                    'timeframe': timeframe # Add timeframe to identify later
                })

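            pbar.update(len(items))
            if len(video_hits) >= max_results:
                break
            page_token = data.get('nextPageToken')
            if not page_token:
                break  # no further pages; without this the loop would refetch the first page forever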

    return pd.DataFrame(video_hits)

# Example query: "Messi VS Ronaldo"
QUERY = "Messi VS Ronaldo"

# Fetch videos from the two timeframes (now fetching more and ordered by viewCount)
videos_2019_df = fetch_videos(QUERY, time_frame_1, max_results=200)
videos_2022_2023_df = fetch_videos(QUERY, time_frame_2, max_results=200)

# Display the first few rows of the resulting DataFrames
print("Videos from 2019 Timeframe (Ronaldo UEFA Nations League):")
print(videos_2019_df.head(20))

print("Videos from 2022-2023 Timeframe (Messi World Cup Win):")
print(videos_2022_2023_df.head(20))

Searching videos: 100%|██████████| 200/200 [00:01<00:00, 128.08it/s]
Searching videos: 100%|██████████| 200/200 [00:01<00:00, 120.06it/s]

Videos from 2019 Timeframe (Ronaldo UEFA Nations League):
       video_id           publishedAt  \
0   2mh_ZICm-EU  2019-08-29T17:25:40Z   
1   S3B7Bvidrjo  2019-06-13T14:05:50Z   
2   nUU4T2Yqrgk  2019-09-21T17:43:21Z   
3   ucQaPfjufIQ  2019-12-02T21:19:09Z   
4   sIh0mzHHK40  2019-06-28T16:08:22Z   
5   MM_PM7BPcSU  2019-08-19T20:41:59Z   
6   p693u53Q10U  2019-10-06T17:33:03Z   
7   NrGWmrmXfFo  2019-08-29T17:42:04Z   
8   OSozNLdWT_w  2019-08-27T14:15:00Z   
9   30ZqT1wc50o  2019-10-25T19:00:09Z   
10  qM73ung-iz0  2019-12-07T21:53:32Z   
11  ZuCUrAjhyts  2019-08-17T18:31:21Z   
12  TfCZt_1qsqI  2019-07-15T13:00:04Z   
13  3X-p2pJ8UK4  2019-09-29T09:55:44Z   
14  jV0PbW8fJKk  2019-09-29T01:53:30Z   
15  v7IICNP1ElQ  2019-07-30T12:15:01Z   
16  4CHEvEJb_CU  2019-10-26T18:11:45Z   
17  Oa609nLRYgE  2019-11-02T19:00:53Z   
18  tJ1ixFIN7Tw  2019-09-24T15:03:23Z   
19  EE_OWwr6EeQ  2019-06-23T09:00:14Z   


In [5]:
# @title
# We'll call 'videos.list' to fetch details for batches of IDs (up to 50 per call)
def chunked(seq, size):
    for i in range(0, len(seq), size):
        yield seq[i:i+size]

# Combine videos from both timeframes for detailed fetching
videos_df = pd.concat([videos_2019_df, videos_2022_2023_df])
video_ids = videos_df["video_id"].dropna().unique().tolist()

video_details = []
for batch in tqdm(list(chunked(video_ids, 50)), desc="Fetching video details"):
    params = {
        "part": "snippet,statistics",
        "id": ",".join(batch),
        "maxResults": 50,
    }
    data = yt_get("videos", params)
    for it in data.get("items", []):
        snip = it.get("snippet", {})
        stats = it.get("statistics", {})
        video_details.append({
            "video_id": it.get("id"),
            "title": snip.get("title"),
            "description": snip.get("description"),
            "publishedAt": snip.get("publishedAt"),
            "channelTitle": snip.get("channelTitle"),
            # Cast numeric strings to integers when possible
            "viewCount": int(stats.get("viewCount", 0) or 0),
            "likeCount": int(stats.get("likeCount", 0) or 0),
            "commentCount": int(stats.get("commentCount", 0) or 0),
            "timeframe": videos_df[videos_df["video_id"] == it.get("id")]["timeframe"].iloc[0] # Add timeframe back
        })

video_details_df_raw = pd.DataFrame(video_details)

# Filter for top 50 most viewed videos per timeframe
top_50_2019 = video_details_df_raw[video_details_df_raw['timeframe'] == time_frame_1].sort_values(by='viewCount', ascending=False).head(50)
top_50_2022_2023 = video_details_df_raw[video_details_df_raw['timeframe'] == time_frame_2].sort_values(by='viewCount', ascending=False).head(50)

video_details_df = pd.concat([top_50_2019, top_50_2022_2023])
video_details_df.head(20)

Fetching video details: 100%|██████████| 8/8 [00:00<00:00, 10.37it/s]


Unnamed: 0,video_id,title,description,publishedAt,channelTitle,viewCount,likeCount,commentCount,timeframe
0,2mh_ZICm-EU,"""I want to have dinner with Messi!"" Cristiano ...",Cristiano Ronaldo and Lionel Messi sat next to...,2019-08-29T17:25:40Z,TNT Sports Football,64976876,1071276,34100,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."
1,S3B7Bvidrjo,10 MOST POWERFUL GOALS IN FOOTBALL,10 MOST POWERFUL GOALS IN FOOTBALL\n\nBe amaze...,2019-06-13T14:05:50Z,CREATIVE,40435510,296813,5788,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."
2,nUU4T2Yqrgk,Kid MESSI vs Kid RONALDO,KIDS Football challenge for Kid footballers! b...,2019-09-21T17:43:21Z,SV2,29171638,302791,0,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."
3,ucQaPfjufIQ,"Leo Messi, six-time Ballon d'Or winner",Leo Messi claims another Ballon d’Or. The Arge...,2019-12-02T21:19:09Z,FC Barcelona,21985722,553878,21107,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."
4,sIh0mzHHK40,Ronaldinho VS Cristiano Ronaldo ► Splendid Dri...,📌 Ronaldinho ''R10'' (mr bundesteam part) vs C...,2019-06-28T16:08:22Z,SLIZHENKOV l HD,21434057,175568,6498,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."
5,MM_PM7BPcSU,Real Madrid vs Barcelona 5-1 Goals & Highlight...,Real Madrid vs Barcelona 5-1 Goals & Highlight...,2019-08-19T20:41:59Z,MM FOOTBALL,17851787,149691,4085,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."
6,p693u53Q10U,Lionel Messi - King Of Football,▷ Lionel Messi - King Of Football ● Amazing Dr...,2019-10-06T17:33:03Z,Football AG,15868963,201451,5694,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."
7,NrGWmrmXfFo,Lionel Messi & Cristiano Ronaldo Joke At UEFA ...,Subscribe to our YouTube channel 👉 https://bit...,2019-08-29T17:42:04Z,DAZN Canada,15548503,283974,11192,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."
8,OSozNLdWT_w,PES 2020 | goalkeeper L.MESSI vs goalkeeper C....,Subscribe Please))) http://www.youtube.com/c/N...,2019-08-27T14:15:00Z,Niyaz Gamer,12781108,120308,1471,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."
9,30ZqT1wc50o,Ronaldo Old Town Road VS Messi Señorita VS Ney...,"Buy top quality latest football jerseys, acces...",2019-10-25T19:00:09Z,Beshoy Nady,12557078,108676,4357,"{'publishedAfter': '2019-06-09T00:00:00Z', 'pu..."


In [6]:
# @title
print(f"Total number of videos in video_details_df after filtering: {video_details_df.shape[0]}")

Total number of videos in video_details_df after filtering: 100


# **2.2 Data Cleaning**

Comments were cleaned to reduce noise before applying sentiment analysis. Cleaning steps included:

*   converting all text to lowercase

*   removing punctuation

*   removing stopwords

*   tokenizing each comment

*   removing duplicates

This process ensured that the sentiment analysis was not affected by noise such as symbols or repetitive content.
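The cleaning steps listed above can be sketched as follows. This is a minimal illustration, not the notebook's actual cleaning code: the function names `clean_comment` and `clean_corpus` are hypothetical, and the tiny `STOPWORDS` set is an assumption standing in for a full list such as NLTK's English stopwords.

```python
import string

# Assumption: a tiny illustrative stopword set; a real run would more likely
# use a complete list such as NLTK's English stopwords.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to", "in"}


def clean_comment(text):
    """Lowercase, strip ASCII punctuation, tokenize on whitespace, drop stopwords."""
    lowered = str(text).lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOPWORDS]


def clean_corpus(comments):
    """Clean every comment and drop duplicates, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for comment in comments:
        tokens = tuple(clean_comment(comment))
        # Skip comments that are empty after cleaning or already seen
        if tokens and tokens not in seen:
            seen.add(tokens)
            cleaned.append(list(tokens))
    return cleaned


sample = ["Messi is the GOAT!!!", "messi is the goat", "Ronaldo, the best of 2019."]
print(clean_corpus(sample))
# [['messi', 'goat'], ['ronaldo', 'best', '2019']]
```

One design point to keep in mind: VADER treats capitalization and exclamation marks as intensity cues, so cleaned tokens of this kind are better suited to deduplication and word counts than to scoring. The scoring cell later in this notebook applies VADER to the `text` column directly.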

------------------------------------------------
CODE BLOCK 2: Comment Collection (Input to Text Cleaning)
------------------------------------------------

In [7]:
# @title
# Some videos disable comments. We'll handle HTTP errors gracefully and cap per-video volume.
all_comments = []

for vid in tqdm(video_details_df["video_id"].tolist(), desc="Fetching comments"):
    page_token = None
    fetched = 0
    try:
        while True:
            params = {
                "part": "snippet",
                "videoId": vid,
                "maxResults": 100,  # API max per page for commentThreads
                "order": "relevance",  # try 'time' if you want chronological
                # 'textFormat': 'plainText' is default
            }
            if page_token:
                params["pageToken"] = page_token

            data = yt_get("commentThreads", params)
            items = data.get("items", [])

            for it in items:
                top = it.get("snippet", {}).get("topLevelComment", {})
                s = top.get("snippet", {})
                all_comments.append({
                    "video_id": vid,
                    "comment_id": top.get("id"),
                    "author": s.get("authorDisplayName"),
                    "publishedAt": s.get("publishedAt"),
                    "likeCount": s.get("likeCount", 0),
                    "text": s.get("textOriginal", ""),
                })
                fetched += 1

            page_token = data.get("nextPageToken")
            if not page_token:
                break  # no more pages

            if fetched >= 300:
                break  # safety cap so a single video doesn't eat your quota

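    except requests.HTTPError as e:
        # Print only the status code: str(e) includes the request URL, which embeds the API key
        status = e.response.status_code if e.response is not None else "unknown"
        print(f"Skipping {vid} due to HTTP error: {status}")
        continue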

comments_df = pd.DataFrame(all_comments)
comments_df.head(10)

Fetching comments:   2%|▏         | 2/100 [00:01<01:38,  1.00s/it]

Skipping nUU4T2Yqrgk due to HTTP error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=nUU4T2Yqrgk&maxResults=100&order=relevance&key=<REDACTED>


Fetching comments:  90%|█████████ | 90/100 [01:18<00:08,  1.14it/s]

Skipping HJ1r7MU6jOI due to HTTP error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=HJ1r7MU6jOI&maxResults=100&order=relevance&key=<REDACTED>


Fetching comments: 100%|██████████| 100/100 [01:27<00:00,  1.14it/s]


Unnamed: 0,video_id,comment_id,author,publishedAt,likeCount,text
0,2mh_ZICm-EU,Ugy3Paw40WgN54ISiJJ4AaABAg,@TNTSportsFootball,2019-09-02T10:43:33Z,45098,Now the most viewed clip on our YouTube channel 🤯
1,2mh_ZICm-EU,UgyCG2eGqKbLDsCeBux4AaABAg,@shayan9741,2019-08-29T17:30:51Z,22375,This is probably the best messi ronaldo moment...
2,2mh_ZICm-EU,UgzEkYpIqGPULt0h5d14AaABAg,@ShivDhudh,2020-04-10T15:43:59Z,30819,Imagine both had a dinner together and next da...
3,2mh_ZICm-EU,Ugx64IhffX68brpVMsF4AaABAg,@amrkassab291,2019-08-29T20:48:06Z,14664,This is a rare occasion where the comment sect...
4,2mh_ZICm-EU,UgzqIlj31Wq-g5eIdKl4AaABAg,@leviguerra6414,2025-03-02T03:56:14Z,194,This is hands down the greatest interview in t...
5,2mh_ZICm-EU,UgzPyxO3xNMlbXFbPtd4AaABAg,@aaronaldo14,2019-08-30T00:16:09Z,11454,Messi and Ronaldo sitting together is all Than...
6,2mh_ZICm-EU,Ugwm7iWbOIRghPJWR8x4AaABAg,@kapilsjoshi,2020-07-16T02:56:44Z,7483,We're going to come back to this video one day...
7,2mh_ZICm-EU,UgxDUI4lPU7_9rZsjdF4AaABAg,@altarmizzialwi1904,2019-08-29T19:34:29Z,2929,WTF This is turning into a Bromance now Love it!
8,2mh_ZICm-EU,Ugw_5jb9LAJ4jYh23V54AaABAg,@guccigaming87,2024-08-03T18:55:03Z,464,The fact that Cristiano made every smile with ...
9,2mh_ZICm-EU,Ugx8Ww2TdqvRGy2YB_J4AaABAg,@baros387,2019-08-30T02:55:42Z,6479,Ronaldo and Messi compliment each other\nVan D...


In [8]:
# @title
print(f"The `comments_df` DataFrame contains {comments_df.shape[0]} entries (comments).")

The `comments_df` DataFrame contains 29598 entries (comments).


# **2.3 Sentiment Analysis Using VADER**

Sentiment scores were computed using VADER, a lexicon- and rule-based tool designed for social media text and short informal messages (Hutto & Gilbert, 2014). It handles informal language, emojis, emphasis, and common online expressions. Every comment receives four scores (negative, neutral, positive, and compound); the compound score ranges from −1 (very negative) to +1 (very positive). Average compound scores were then calculated for Messi-related and Ronaldo-related comments in each time period.
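As a rough illustration of where the compound score comes from: VADER sums the valence of the words in a comment (after its rule-based adjustments) and squashes that sum into the −1 to +1 range. The sketch below reproduces only that final normalization step, using the constant `alpha = 15` from the reference implementation, plus the ±0.05 labeling thresholds used later in this notebook. The function names `normalize` and `label` are illustrative and are not part of NLTK's public API.

```python
import math


def normalize(raw_score, alpha=15):
    """Squash an unbounded summed valence into the range [-1, 1]."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    # Clamp defensively in case of floating-point overshoot
    return max(-1.0, min(1.0, norm))


def label(compound, pos=0.05, neg=-0.05):
    """Map a compound score to 'pos', 'neg', or 'neu' using fixed thresholds."""
    if compound > pos:
        return "pos"
    if compound < neg:
        return "neg"
    return "neu"


# Larger summed valences approach, but never exceed, +1 or -1
for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):
    compound = normalize(raw)
    print(f"raw={raw:+.1f} -> compound={compound:+.4f} ({label(compound)})")
```

Because the curve flattens quickly, a comment with several strongly positive words scores close to +1 regardless of its length, which is one reason mean compound scores are compared across groups rather than read as absolute intensities.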


------------------------------------------------
CODE BLOCK 3: VADER Sentiment Analysis
------------------------------------------------

In [9]:
# @title

# Already upgraded earlier; safe to re-run if needed
!pip -q install --upgrade nltk
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()


[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [10]:
# @title

# Helper to score a text string and return only the 'compound' score ([-1, 1])
def compound_score(text):
    return sia.polarity_scores(text or "")["compound"]

# Video titles & descriptions
video_details_df["title_compound"] = video_details_df["title"].fillna("").apply(compound_score)
video_details_df["description_compound"] = video_details_df["description"].fillna("").apply(compound_score)

# Comments (if any)
if not comments_df.empty:
    comments_df["compound"] = comments_df["text"].fillna("").apply(compound_score)


In [11]:
# @title
# @ Aggregate to video level

# Common VADER thresholds
POS, NEG = 0.05, -0.05

if not comments_df.empty:
    comments_df["sentiment_label"] = comments_df["compound"].apply(
        lambda c: "pos" if c > POS else ("neg" if c < NEG else "neu")
    )

    agg = (comments_df.groupby("video_id").agg(
        n_comments=("comment_id", "count"),
        mean_compound=("compound", "mean"),
        pct_pos=("sentiment_label", lambda s: (s == "pos").mean()),
        pct_neg=("sentiment_label", lambda s: (s == "neg").mean()),
        pct_neu=("sentiment_label", lambda s: (s == "neu").mean()),
    ).reset_index())
else:
    # Empty placeholder so the merge below still works
    agg = pd.DataFrame(columns=["video_id", "n_comments", "mean_compound", "pct_pos", "pct_neg", "pct_neu"])

summary = (
    video_details_df.merge(agg, on="video_id", how="left")
    .assign(
        title_compound=lambda d: d["title_compound"].round(3),
        description_compound=lambda d: d["description_compound"].round(3),
        mean_compound=lambda d: d["mean_compound"].round(3),
        pct_pos=lambda d: (d["pct_pos"]*100).round(1),
        pct_neg=lambda d: (d["pct_neg"]*100).round(1),
        pct_neu=lambda d: (d["pct_neu"]*100).round(1),
    )
)

summary_cols = [
    "video_id", "channelTitle", "publishedAt", "viewCount", "likeCount", "commentCount",
    "title_compound", "description_compound", "n_comments", "mean_compound", "pct_pos", "pct_neg", "pct_neu", "title"
]

summary[summary_cols].sort_values(by=["mean_compound"], ascending=False).head(10)


Unnamed: 0,video_id,channelTitle,publishedAt,viewCount,likeCount,commentCount,title_compound,description_compound,n_comments,mean_compound,pct_pos,pct_neg,pct_neu,title
79,nXm_mT-Se2U,Learn with Asaad,2023-06-15T14:30:25Z,23682549,1306986,11640,-0.422,0.992,300.0,0.455,76.0,9.0,15.0,Cristiano Ronaldo telling about his struggling...
44,D2UT1AmyZFE,Zio Legend,2019-08-19T19:14:42Z,1658603,60276,24482,0.493,0.944,300.0,0.386,68.0,12.7,19.3,Why Messi Is better than ronaldo !
19,EE_OWwr6EeQ,Dhruvraj Singh Jadeja,2019-06-23T09:00:14Z,5716921,73139,1765,0.0,0.361,300.0,0.375,60.0,7.7,32.3,Cristiano Ronaldo #RESPECT Moment●
7,NrGWmrmXfFo,DAZN Canada,2019-08-29T17:42:04Z,15548503,283974,11192,0.681,0.0,300.0,0.373,67.0,11.3,21.7,Lionel Messi & Cristiano Ronaldo Joke At UEFA ...
3,ucQaPfjufIQ,FC Barcelona,2019-12-02T21:19:09Z,21985722,553878,21107,0.586,0.942,300.0,0.358,63.3,9.0,27.7,"Leo Messi, six-time Ballon d'Or winner"
14,jV0PbW8fJKk,Messi Magic™,2019-09-29T01:53:30Z,7527434,78424,2908,0.0,0.0,300.0,0.344,61.3,10.0,28.7,Lionel Messi ● 12 Most LEGENDARY Moments Ever ...
17,Oa609nLRYgE,SV2,2019-11-02T19:00:53Z,6713875,100267,3057,0.25,0.997,300.0,0.334,62.7,6.3,31.0,I Created A Football Tournament ft. KID MESSI ...
20,LPiUsCQj-Ik,SnS,2019-10-26T10:55:09Z,5511789,63562,1199,0.674,0.963,300.0,0.332,63.7,4.7,31.7,8 YEAR OLD KID MESSI IS UNBELIEVABLE.. AMAZING...
16,4CHEvEJb_CU,SV2,2019-10-26T18:11:45Z,6955232,155662,2049,-0.103,0.991,300.0,0.327,65.7,10.3,24.0,I Challenged KID MESSI To A Football Competition
42,Qy94U4qxLYM,SV2,2019-08-29T16:54:35Z,1667295,155789,1067,0.669,0.994,300.0,0.325,60.3,7.7,32.0,F2FREESTYLERS vs MAGIC FOOTBALL.. Play Like ME...


In [12]:
# @title
!apt-get update && apt-get install -y libnss3 libatk-bridge2.0-0 libcups2 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libxkbcommon0 libpango-1.0-0 libcairo2 libasound2

# Re-install plotly and kaleido to ensure everything is aligned after system dependency install
!pip -q install --upgrade plotly kaleido



In [13]:
# @title

# Keep pandas (already in Colab). Ensure latest NLTK.
!pip -q install --upgrade nltk

import pandas as pd
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon (only needs to run once per runtime)
nltk.download('vader_lexicon')

sia = SentimentIntensityAnalyzer()


[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!



# **2.4 Theoretical Framework**

Academic literature offers several perspectives that help explain changes in online sentiment:<br>
<br>

*   **Fan Psychology and Online Extremism** <br>
Research finds that fan groups may display competitive group identity, emotional expression, and sometimes hostile behavior (Zhou, 2024). These reactions do not always follow performance-based logic.

<br>

*  **Visual Media Framing**  <br>
Visual semiotics research shows how iconic images, such as the Louis Vuitton chess photo featuring Messi and Ronaldo, shape meaning, culture, and emotional responses to rivalry (Simsek & Bozdag, 2024). Media framing can influence how audiences interpret competition.

<br>

*   **Human Brand Theory**  <br>
Athlete-branding studies highlight that Messi and Ronaldo function as global human brands whose images are built through personality, lifestyle, and public storytelling (Messner, 2022). Fan reactions are linked not only to sport performance but also to brand identity.

<br>

These theories support the interpretation of the sentiment findings.

# **3. Analysis and Findings**

The analysis combines three main visual outputs: (1) the top videos ranked by mean sentiment, (2) the scatterplot comparing view counts and sentiment, and (3) the grouped bar chart comparing Messi and Ronaldo’s average sentiment across both timeframes. Together, these visuals help explain how online public reactions shift depending on the period and the cultural narrative around each player.


In [19]:
# @title

# Install Plotly + Kaleido (for saving static PNGs)
!pip -q install --upgrade plotly kaleido

import plotly.express as px
import plotly.io as pio
import kaleido
kaleido.get_chrome_sync() # Install Chrome for Kaleido

# Set a renderer suitable for Colab. Alternatives: 'notebook_connected', 'svg', 'png'
pio.renderers.default = "colab"

# --- 1) Bar chart: Top 10 videos by mean comment sentiment (requires comments) ---
import pandas as pd

if 'summary' in globals() and not summary.empty and summary['mean_compound'].notna().any():
    top10 = summary.sort_values("mean_compound", ascending=False).head(10).copy()
    # Truncate long titles for readability
    top10["title_short"] = top10["title"].str.slice(0, 60) + top10["title"].apply(lambda t: "…" if len(str(t)) > 60 else "")

    fig_bar = px.bar(
        top10,
        x="title_short",
        y="mean_compound",
        hover_data=["title", "channelTitle", "viewCount", "likeCount", "n_comments"],
        title="Top 10 videos by mean comment sentiment (compound)",
        labels={"title_short": "Video title (truncated)", "mean_compound": "Mean compound sentiment"},
    )
    fig_bar.update_layout(xaxis_tickangle=-30)
    fig_bar.show()

    # Save interactive HTML (self-contained) and PNG (static preview-friendly)
    fig_bar.write_html("plot_top10_sentiment.html", include_plotlyjs="cdn", full_html=True)
    fig_bar.write_image("plot_top10_sentiment.png")

else:
    print("No comment sentiment available to plot. Make sure you fetched comments and computed 'mean_compound'.")


In [20]:
# @title

# --- 2) Scatter: Relationship between viewCount and mean comment sentiment ---
import kaleido
kaleido.get_chrome_sync() # Install Chrome for Kaleido

if 'summary' in globals() and not summary.empty and summary['mean_compound'].notna().any():
    scatter_df = summary.dropna(subset=["mean_compound"]).copy()
    # Use log scale for views if counts vary widely
    fig_scatter = px.scatter(
        scatter_df,
        x="viewCount",
        y="mean_compound",
        hover_name="title",
        hover_data=["channelTitle", "likeCount", "n_comments"],
        title="View count vs. mean comment sentiment",
        labels={"viewCount": "Views", "mean_compound": "Mean compound sentiment"},
    )
    fig_scatter.update_xaxes(type="log")

    fig_scatter.show()

    # Save HTML + PNG
    fig_scatter.write_html("plot_views_vs_sentiment.html", include_plotlyjs="cdn", full_html=True)
    fig_scatter.write_image("plot_views_vs_sentiment.png")
else:
    print("No sentiment summary to plot. Ensure the aggregation step ran successfully.")


# **3.1 Sentiment Patterns in 2019**

The 2019 results show a clear and positive emotional tone for both Messi and Ronaldo. Figure 1 (Top 10 videos by mean sentiment), in which nine of the ten videos come from the 2019 window, indicates that many of the highest-scoring videos contain uplifting or nostalgic themes, such as stories about personal struggles or moments of respect. These videos have mean compound VADER scores between roughly **0.33 and 0.46**, which suggests a strong positive tone in many comment sections.

The scatterplot of view counts and sentiment also shows that sentiment in 2019 does not strongly depend on the number of views. Videos with both low and high view counts display similar ranges of positive sentiment, suggesting that audience size did not significantly distort tone.
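One way to put a number on the visual impression from the scatterplot is a correlation between log-scaled views and mean sentiment. The sketch below is an assumption about how such a check could look, not part of the original analysis: the `pearson` helper is hypothetical, and the five (views, sentiment) pairs are copied from the top of the summary table printed earlier, so the resulting value is purely illustrative and says nothing about the full set of 100 videos.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative rows only: viewCount and mean_compound for five videos
# taken from the summary table shown earlier in this notebook.
views = [23682549, 1658603, 5716921, 15548503, 21985722]
sentiment = [0.455, 0.386, 0.375, 0.373, 0.358]

# Log-scale the views, matching the log x-axis used in the scatterplot
log_views = [math.log10(v) for v in views]
r = pearson(log_views, sentiment)
print(f"Pearson r (log10 views vs. mean compound): {r:.2f}")
```

In the notebook itself, the same calculation would be run on the `viewCount` and `mean_compound` columns of `summary` after dropping rows without comments. A coefficient near zero would be consistent with the claim that audience size does not drive tone.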

However, the most important result appears in the grouped bar chart comparing average sentiment between the players. In the **2019 timeframe**, Messi’s mean sentiment score is **approximately 0.25**, while Ronaldo’s is **around 0.15**. Both are clearly positive, but Messi’s comments show a stronger positive tone. At this time, the rivalry still follows a traditional pattern based on skills, admiration, and on-field performance (Messner, 2022). Public discussion remains relatively balanced and respectful.

------------------------------------------------
CODE BLOCK 4: Prepare Comments DataFrame with Timeframe
------------------------------------------------

In [15]:
# @title

video_details_df['period_label'] = video_details_df['timeframe'].apply(lambda x: '2019 Timeframe' if x['publishedAfter'].startswith('2019') else '2022-2023 Timeframe')

video_periods = video_details_df[['video_id', 'period_label']]

comments_with_period_df = comments_df.merge(video_periods, on='video_id', how='left')

comments_with_period_df.head()

Unnamed: 0,video_id,comment_id,author,publishedAt,likeCount,text,compound,sentiment_label,period_label
0,2mh_ZICm-EU,Ugy3Paw40WgN54ISiJJ4AaABAg,@TNTSportsFootball,2019-09-02T10:43:33Z,45098,Now the most viewed clip on our YouTube channel 🤯,0.0,neu,2019 Timeframe
1,2mh_ZICm-EU,UgyCG2eGqKbLDsCeBux4AaABAg,@shayan9741,2019-08-29T17:30:51Z,22375,This is probably the best messi ronaldo moment...,0.7906,pos,2019 Timeframe
2,2mh_ZICm-EU,UgzEkYpIqGPULt0h5d14AaABAg,@ShivDhudh,2020-04-10T15:43:59Z,30819,Imagine both had a dinner together and next da...,-0.3612,neg,2019 Timeframe
3,2mh_ZICm-EU,Ugx64IhffX68brpVMsF4AaABAg,@amrkassab291,2019-08-29T20:48:06Z,14664,This is a rare occasion where the comment sect...,0.0,neu,2019 Timeframe
4,2mh_ZICm-EU,UgzqIlj31Wq-g5eIdKl4AaABAg,@leviguerra6414,2025-03-02T03:56:14Z,194,This is hands down the greatest interview in t...,0.8442,pos,2019 Timeframe


In [16]:
# @title
def categorize_player_preference(comment_text):
    """Label a comment 'Messi' or 'Ronaldo' when it names exactly one player.

    Everything else (both names, or neither) is 'Both/Neutral'. Note that
    nicknames alone (e.g. 'cr7') and generic 'GOAT' phrases do not assign a
    player, so such comments also fall into 'Both/Neutral'.
    """
    comment_text_lower = str(comment_text).lower()
    has_messi = 'messi' in comment_text_lower
    has_ronaldo = 'ronaldo' in comment_text_lower

    if has_messi and not has_ronaldo:
        return 'Messi'
    if has_ronaldo and not has_messi:
        return 'Ronaldo'
    return 'Both/Neutral'

comments_with_period_df['player_category'] = comments_with_period_df['text'].apply(categorize_player_preference)
comments_with_period_df.head(100)

Unnamed: 0,video_id,comment_id,author,publishedAt,likeCount,text,compound,sentiment_label,period_label,player_category
0,2mh_ZICm-EU,Ugy3Paw40WgN54ISiJJ4AaABAg,@TNTSportsFootball,2019-09-02T10:43:33Z,45098,Now the most viewed clip on our YouTube channel 🤯,0.0000,neu,2019 Timeframe,Both/Neutral
1,2mh_ZICm-EU,UgyCG2eGqKbLDsCeBux4AaABAg,@shayan9741,2019-08-29T17:30:51Z,22375,This is probably the best messi ronaldo moment...,0.7906,pos,2019 Timeframe,Both/Neutral
2,2mh_ZICm-EU,UgzEkYpIqGPULt0h5d14AaABAg,@ShivDhudh,2020-04-10T15:43:59Z,30819,Imagine both had a dinner together and next da...,-0.3612,neg,2019 Timeframe,Both/Neutral
3,2mh_ZICm-EU,Ugx64IhffX68brpVMsF4AaABAg,@amrkassab291,2019-08-29T20:48:06Z,14664,This is a rare occasion where the comment sect...,0.0000,neu,2019 Timeframe,Both/Neutral
4,2mh_ZICm-EU,UgzqIlj31Wq-g5eIdKl4AaABAg,@leviguerra6414,2025-03-02T03:56:14Z,194,This is hands down the greatest interview in t...,0.8442,pos,2019 Timeframe,Both/Neutral
...,...,...,...,...,...,...,...,...,...,...
95,2mh_ZICm-EU,UgzqhFdY7xg6G5LDDZN4AaABAg,@arc_0.01,2024-07-31T08:14:14Z,4,I'd request the fans stop trolling each other ...,0.0772,pos,2019 Timeframe,Both/Neutral
96,2mh_ZICm-EU,UgyX7hrwNyskrj3rv5t4AaABAg,@Antpaok,2019-08-29T19:24:30Z,1515,I was really hoping Messi would start speaking...,0.4754,pos,2019 Timeframe,Messi
97,2mh_ZICm-EU,UgzaU4cG6p9se3FS7mp4AaABAg,@mattinmajid9206,2019-08-29T20:23:29Z,1353,We want to see Messi and Ronaldo play in a cha...,0.8622,pos,2019 Timeframe,Both/Neutral
98,2mh_ZICm-EU,UgxWf0z4GOsaIkvmaCJ4AaABAg,@joanneadamidou6218,2019-09-02T21:56:50Z,3393,I want a Cristiano and Lionel selfie while at ...,0.0772,pos,2019 Timeframe,Both/Neutral
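
The categorizer above relies on plain substring checks for `messi` and `ronaldo`, so a comment that uses only a nickname or first name is never attributed to a player. As a minimal sketch of one possible refinement (not the method used for the results in this notebook), the function below matches whole words against a small alias list. The alias lists and the name `categorize_by_alias` are assumptions made for illustration, and applying it would change the group means reported in the comparative table.

```python
import re

# Hypothetical alias lists; the notebook's own function matches only
# the substrings 'messi' and 'ronaldo'.
PLAYER_PATTERNS = {
    "Messi": re.compile(r"\b(messi|lionel|leo)\b", re.IGNORECASE),
    "Ronaldo": re.compile(r"\b(ronaldo|cristiano|cr7)\b", re.IGNORECASE),
}


def categorize_by_alias(comment_text):
    """Return 'Messi' or 'Ronaldo' when exactly one player is named, else 'Both/Neutral'."""
    text = str(comment_text)
    mentioned = [
        player for player, pattern in PLAYER_PATTERNS.items() if pattern.search(text)
    ]
    if len(mentioned) == 1:
        return mentioned[0]
    # Both players, or neither, fall into the combined bucket.
    return "Both/Neutral"


for sample in ["CR7 is the goat", "I want a Cristiano and Lionel selfie", "what a messiah"]:
    print(sample, "->", categorize_by_alias(sample))
```

Whole-word matching is stricter than substring matching (for example, "messiah" no longer counts as a Messi mention), while the aliases make it broader. Whether that trade-off is appropriate depends on how fans actually write in the collected comments.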


In [17]:
# @title
comparative_sentiment_df = comments_with_period_df.groupby(['period_label', 'player_category'])['compound'].mean().reset_index()
comparative_sentiment_df.rename(columns={'compound': 'mean_compound'}, inplace=True)
comparative_sentiment_df.head(6)

Unnamed: 0,period_label,player_category,mean_compound
0,2019 Timeframe,Both/Neutral,0.182451
1,2019 Timeframe,Messi,0.247086
2,2019 Timeframe,Ronaldo,0.151514
3,2022-2023 Timeframe,Both/Neutral,0.089868
4,2022-2023 Timeframe,Messi,0.093605
5,2022-2023 Timeframe,Ronaldo,0.136003
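
A group mean is easier to interpret next to the number of comments behind it, since a small group can swing on a few strongly worded comments. The sketch below shows one way to report both using a named aggregation. The toy rows and the names `toy_df`, `summarize_sentiment`, and `n_comments` are hypothetical and only stand in for `comments_with_period_df`.

```python
import pandas as pd

# Hypothetical toy rows; the real notebook would pass comments_with_period_df.
toy_df = pd.DataFrame({
    "period_label": ["2019 Timeframe"] * 3 + ["2022-2023 Timeframe"] * 3,
    "player_category": ["Messi", "Messi", "Ronaldo", "Messi", "Ronaldo", "Ronaldo"],
    "compound": [0.5, 0.1, 0.2, -0.2, 0.4, 0.0],
})


def summarize_sentiment(df):
    """Mean compound score and comment count per (period, player) group."""
    return (
        df.groupby(["period_label", "player_category"])["compound"]
        .agg(mean_compound="mean", n_comments="count")
        .reset_index()
    )


summary = summarize_sentiment(toy_df)
print(summary)
```

Reporting `n_comments` alongside `mean_compound` makes it clear how much weight each bar in a later chart can bear.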


------------------------------------------------
CODE BLOCK 5: Visualize Comparative Sentiment
------------------------------------------------

In [21]:
# @title
# Filter out the 'Both/Neutral' category
filtered_comparative_sentiment_df = comparative_sentiment_df[
    comparative_sentiment_df['player_category'].isin(['Messi', 'Ronaldo'])
].copy()

fig_filtered_bar = px.bar(
    filtered_comparative_sentiment_df,
    x='period_label',
    y='mean_compound',
    color='player_category',
    barmode='group',
    title='Mean Comment Sentiment: Messi vs. Ronaldo',
    labels={
        'period_label': 'Timeframe',
        'mean_compound': 'Mean Compound Sentiment',
        'player_category': 'Player Preference'
    },
    color_discrete_map={'Messi': 'blue', 'Ronaldo': 'red'},
    hover_data=['period_label', 'player_category', 'mean_compound'],
    text='player_category' # Use player_category for text on bars
)

fig_filtered_bar.update_traces(texttemplate='<b>%{text}</b><br>%{y:.2f}', textposition='inside', insidetextanchor='middle')
fig_filtered_bar.update_layout(
    xaxis_title='Timeframe',
    yaxis_title='Mean Compound Sentiment',
    showlegend=False, # Hide the legend
    barmode='group'
)

fig_filtered_bar.show()

# Save the plot as an interactive HTML file
fig_filtered_bar.write_html("sentiment_comparison_messi_ronaldo_filtered.html", include_plotlyjs="cdn", full_html=True)

# Save the plot as a static PNG image. Static export relies on optional
# dependencies (e.g. kaleido), so skip it gracefully if they are missing.
try:
    fig_filtered_bar.write_image("sentiment_comparison_messi_ronaldo_filtered.png")
except Exception as export_error:
    print(f"Static PNG export skipped: {export_error}")

# **3.2 Sentiment Patterns in 2022–2023**

The tone shifts more dramatically in the **2022–2023 timeframe**: Messi’s mean sentiment drops from **0.25** to **0.09**, while Ronaldo’s decreases only slightly, from **0.15** to **0.14**. The grouped bar chart in Figure 3 shows this change clearly.

This reversal is striking because the 2022–2023 period includes Messi’s World Cup victory, which is widely viewed as the peak of his career. Despite this, the comments mentioning Messi become less positive. At the same time, Ronaldo’s sentiment holds nearly steady even though his on-field results decline during this period, which places him above Messi in the later timeframe.

The screenshot of the Top 10 videos also helps explain this: many high-sentiment videos in 2022–2023 focus on emotional or humanizing narratives about Ronaldo rather than gameplay alone. This may generate sympathy and produce a warmer tone in the comments.

Meanwhile, the scatterplot shows that highly viewed videos in 2022–2023 tend to have more mixed sentiment. This suggests that when public attention increases, online debate becomes more polarized and emotionally charged (Zhou, 2024).

Overall, these patterns support the idea that online reactions reflect identity and narrative rather than pure athletic evaluation. Fans respond not only to football results but also to cultural symbolism, media framing, and personal loyalty.
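
The shift discussed in this section can be checked with simple arithmetic on the group means from the comparative table. The sketch below hard-codes those four values (copied from the `comparative_sentiment_df` output); the helper names `period_change` and `messi_minus_ronaldo` are illustrative only.

```python
# Mean compound scores copied from the comparative_sentiment_df output.
mean_compound = {
    ("2019 Timeframe", "Messi"): 0.247086,
    ("2019 Timeframe", "Ronaldo"): 0.151514,
    ("2022-2023 Timeframe", "Messi"): 0.093605,
    ("2022-2023 Timeframe", "Ronaldo"): 0.136003,
}


def period_change(player):
    """Change in a player's mean compound score from 2019 to 2022-2023."""
    later = mean_compound[("2022-2023 Timeframe", player)]
    earlier = mean_compound[("2019 Timeframe", player)]
    return round(later - earlier, 6)


def messi_minus_ronaldo(period):
    """Gap between the two players' means within one period."""
    return round(mean_compound[(period, "Messi")] - mean_compound[(period, "Ronaldo")], 6)


for player in ("Messi", "Ronaldo"):
    print(f"{player}: change = {period_change(player):+.3f}")

for period in ("2019 Timeframe", "2022-2023 Timeframe"):
    print(f"{period}: Messi minus Ronaldo = {messi_minus_ronaldo(period):+.3f}")
```

With these values, Messi’s mean falls by about 0.153 and Ronaldo’s by about 0.016, so the Messi-minus-Ronaldo gap flips from roughly +0.096 in 2019 to roughly -0.042 in 2022–2023. The reversal therefore comes from Messi’s decline rather than from a rise in Ronaldo’s score.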


# **3.3 Interpretation Using Theory**

**Fan identity and online behavior** <br>
Research shows that digital fan communities often respond emotionally rather than rationally (Zhou, 2024). When Messi reached his greatest achievement, opposing fan groups may have reacted strongly, which would help explain the lower overall sentiment in comments mentioning him.

<br>

**Media framing and visual narratives** <br>
The viral Louis Vuitton chess image contributed to a symbolic representation of rivalry and prestige (Simsek & Bozdag, 2024). Such imagery influences public imagination and may reshape sympathy or criticism.

<br>

**Human branding** <br>
Messi’s “humble genius” image and Ronaldo’s “ambitious champion” image play important roles in how fans interpret events (Messner, 2022). During moments of controversy or triumph, these narratives shape sentiment more than raw statistics.

<br>

Overall, the sentiment changes reflect identity, symbolism, and narrative, not just football ability.

# **3.4 Policy Implications: Rivalry as an Attention Engine**

The sentiment results have implications for digital media strategy and sports marketing. Online rivalry fuels audience attention, and attention drives economic value across several domains:

<br>

**1. Social Media Traffic**

Messi and Ronaldo have a combined following of over 1.8 billion across platforms, making them the two largest individual digital brands in the world (Statista, 2024). Spikes in engagement occur whenever the rivalry intensifies, increasing platform activity and advertising value.

<br>

**2. Merchandise & Football Economy**

Jersey sales associated with Messi and Ronaldo consistently rank among the highest globally. In 2022, Messi’s World Cup jersey became one of the best-selling football shirts in history, contributing to Adidas’ revenue growth (Deloitte Football Money League, 2023). Ronaldo’s CR7 brand continues to exceed hundreds of millions of dollars annually, demonstrating the commercial strength of personal branding.

<br>

**3. Broadcasting & Event Revenue**

Media rights for the 2026 World Cup in the United States exceed $1.5 billion, and audience interest is influenced by star narratives (Nielsen Sports, 2023). Hosts and broadcasters benefit from rivalry-driven attention because it generates higher ratings and advertising demand.

<br>

**Real-world examples supporting attention incentives**

*   FIFA added Ronaldo's image to the 2026 promotional poster following public debate, reflecting awareness of the rivalry’s commercial appeal.

*   Ronaldo’s visit to the White House in 2023 indicates how political and cultural institutions engage with sports icons to attract public interest.

These examples show how digital sentiment patterns signal commercial relevance. Although the GOAT debate is cultural, it has measurable economic outcomes for sports organizations, broadcasters, sponsors, and event hosts.

# **3.5 Additional Observations**



*   The Top 10 videos suggest that **emotion-driven content** (e.g., personal stories, respect moments, childhood clips) consistently produces the highest sentiment scores.
*   The scatterplot shows no strong relationship between views and sentiment, meaning audience size alone does not determine tone.
*   The grouped bar chart confirms the main outcome: **the public expresses more positive sentiment for Messi in 2019, but more positive sentiment for Ronaldo in 2022–2023**.
*   This reversal highlights that the GOAT debate functions as a cultural phenomenon shaped by emotion, identity, and narrative more than statistical performance.

# **4. Conclusion**

This study shows that sentiment toward Messi and Ronaldo is shaped by cultural narratives, identity, and attention dynamics rather than performance alone. Messi reached his highest sporting achievement in 2022, but his mean sentiment score declined from 0.25 to 0.09. Ronaldo’s sentiment held nearly steady during the same period (0.15 to 0.14) and ended above Messi’s, a pattern consistent with emotional storytelling and sympathy-driven engagement.


<br>

The findings align with literature showing that online reaction follows group identity, media framing, and symbolic meaning (Zhou, 2024; Simsek & Bozdag, 2024; Messner, 2022).


<br>

From a policy and marketing perspective, understanding sentiment helps organizations predict engagement and design strategies that increase visibility. Since rivalry drives attention, and attention drives revenue, the GOAT debate becomes an economic asset for digital platforms, broadcasters, sponsors, and major events such as the World Cup.

<br>

Future studies can expand to multilingual datasets or apply more advanced models such as transformer-based sentiment classifiers.