Argentine Football Sentiment Analysis - River Plate Edition

This Jupyter notebook explores the sentiment of online content related to River Plate, combining Twitter data with news articles.

Requirements:

Python 3.x
NLTK (plus the VADER lexicon, downloaded with nltk.download("vader_lexicon"))
Tweepy and Twitter API keys (optional)
Requests
BeautifulSoup4 with the html5lib parser
Matplotlib

Note: Install the required libraries with pip install nltk tweepy requests beautifulsoup4 html5lib lxml matplotlib, and set up Twitter API keys if you choose to use that data source.

Step 1: Data Acquisition

Twitter (optional):

In [None]:
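# Import libraries
import tweepy

# Twitter API credentials (replace the placeholders with your own keys)
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"

# Authenticate with Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Create API object; wait_on_rate_limit pauses when the rate limit is reached
api = tweepy.API(auth, wait_on_rate_limit=True)

# Define search terms
river_hashtags = ["#RiverPlate", "#Gallardo", "#Alvarez", "#Superliga"]

# Collect up to 100 Spanish-language tweets per hashtag
# (Tweepy 4.x renamed api.search to api.search_tweets; on Tweepy 3.x use api.search)
tweets = []
for hashtag in river_hashtags:
    for tweet in tweepy.Cursor(api.search_tweets, q=hashtag, lang="es").items(100):
        tweets.append(tweet.text)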


News Articles:

In [None]:
# Import libraries
import requests
from bs4 import BeautifulSoup

# Define target URL
river_news_url = "https://www.clarin.com/deportes/river-plate"

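# Get website content (fail early on HTTP errors instead of parsing an error page)
response = requests.get(river_news_url, timeout=10)
response.raise_for_status()
#soup = BeautifulSoup(response.content, "lxml")
soup = BeautifulSoup(response.content, "html5lib")

# Extract article titles and summaries, skipping entries that lack either element
# (the tag names are assumptions about the page markup, which may change)
articles = []
for article in soup.find_all("article"):
    title_tag = article.find("h2")
    summary_tag = article.find("p")
    if title_tag is None or summary_tag is None:
        continue
    title = title_tag.get_text(strip=True)
    summary = summary_tag.get_text(strip=True)
    articles.append({"title": title, "summary": summary})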

Step 2: Data Cleaning and Preprocessing

In [None]:
# Import libraries
import re

# Lowercase and remove punctuation
def clean_text(text):
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return text

# Clean tweets and articles
# (if you skipped the optional Twitter cell, define tweets = [] before running this cell)
cleaned_tweets = [clean_text(tweet) for tweet in tweets]
cleaned_articles = [clean_text(article["title"] + " " + article["summary"]) for article in articles]

# Combine data (optional)
all_data = cleaned_tweets + cleaned_articles

Step 3: Sentiment Analysis

In [None]:
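# Import libraries
import nltk
from nltk.sentiment import vader

# Download the VADER lexicon (only needed once per environment)
nltk.download("vader_lexicon")

# Initialize sentiment analyzer
# Note: VADER's lexicon is English, so scores for Spanish text are only a rough signal
vader_analyzer = vader.SentimentIntensityAnalyzer()

# Analyze sentiment for each data point
sentiments = []
for text in all_data:
    sentiments.append(vader_analyzer.polarity_scores(text))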

Step 4: Data Visualization

In [None]:
# Import libraries
import matplotlib.pyplot as plt

# Extract sentiment scores
positive = [s["pos"] for s in sentiments]
negative = [s["neg"] for s in sentiments]
neutral = [s["neu"] for s in sentiments]

# Generate stacked bar chart so the three scores do not hide one another
x = range(len(sentiments))
plt.bar(x, positive, label="Positive")
plt.bar(x, negative, bottom=positive, label="Negative")
plt.bar(x, neutral, bottom=[p + n for p, n in zip(positive, negative)], label="Neutral")
plt.xlabel("Data Point")
plt.ylabel("Sentiment Score")
plt.legend()
plt.show()

# Additional visualizations (word clouds, network graphs) can be implemented here

Step 5: Write-up and Analysis


Understanding River Plate Sentiment: A Combined Twitter and News Analysis
This project examines online sentiment surrounding the Argentine football club River Plate. By combining Twitter data with news articles, we aim to build a broad picture of how fans and media perceive the team.

Data Acquisition:

We collected Spanish-language tweets for four hashtags: #RiverPlate, #Gallardo (head coach), #Alvarez (key player), and #Superliga (current league), with up to 100 tweets per hashtag. We also scraped headlines and summaries from Clarín's River Plate news section. Combining the two sources covers both fan reactions and media coverage of matches, player performances, and the team's overall form.
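
Because each hashtag is searched separately, a tweet that carries more than one of the hashtags can be collected more than once. A minimal sketch of one way to drop such repeats while keeping the original order; the helper name deduplicate and the sample strings are illustrative assumptions, not part of the notebook above:

```python
def deduplicate(texts):
    """Hypothetical helper: remove repeated strings, keeping first-seen order."""
    # dict preserves insertion order, so fromkeys keeps the first occurrence of each text
    return list(dict.fromkeys(texts))


# Made-up sample data standing in for the collected tweets
example_tweets = [
    "Gran partido de #RiverPlate con #Gallardo",
    "Otro comentario sobre #Superliga",
    "Gran partido de #RiverPlate con #Gallardo",
]
unique_tweets = deduplicate(example_tweets)
print(len(example_tweets), "->", len(unique_tweets))
```

In the notebook, the same call could be applied to the tweets list before cleaning.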

Cleaning and Preprocessing:

Before analysis, all text was converted to lowercase and stripped of punctuation with a regular expression that keeps only word characters and whitespace. This also removes symbols such as # and @, so a hashtag like #RiverPlate becomes the plain token riverplate. Standardizing the format this way lets tweets and articles be scored on the same footing.
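
Tweets often contain links and @mentions that carry no sentiment of their own. As a rough sketch, the cleaning step could remove them before the lowercase and punctuation pass; the helper name clean_tweet, the two patterns, and the sample tweet are assumptions made for illustration:

```python
import re

# Assumed patterns for links and @mentions
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text):
    """Hypothetical helper: drop URLs and @mentions, then lowercase and strip punctuation."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.lower()
    # Same rule as clean_text: keep only word characters and whitespace
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse the runs of whitespace left behind by the removals
    return " ".join(text.split())


sample = "¡Vamos @RiverPlate! Golazo de #Alvarez https://t.co/abc123"
print(clean_tweet(sample))  # vamos golazo de alvarez
```

Whether mentions should be removed is a design choice: dropping them also drops the club's own handle, which may or may not be wanted.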

Sentiment Analysis:

Using NLTK's VADER, a lexicon- and rule-based sentiment analyzer, we computed positive, negative, and neutral scores for each data point, along with a compound score normalized between -1 and 1. These scores quantify the overall emotional tone surrounding River Plate and make it possible to flag the items that drive strongly positive or negative reactions.
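
Besides the pos, neg, and neu values, VADER's polarity_scores also returns a compound score between -1 and 1, which is often mapped to a single label. A minimal sketch, assuming the commonly used cutoffs of 0.05 and -0.05; the function name and the example score dictionaries are made up for illustration and are not results from this notebook:

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Hypothetical helper: map a VADER compound score to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Illustrative dictionaries shaped like the output of polarity_scores
example_scores = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.7},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.6},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]
label_counts = Counter(label_from_compound(s["compound"]) for s in example_scores)
print(label_counts)
```

Applied to the sentiments list, the resulting counts give a quick summary of how many items lean each way.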

Visualization and Insights:

Visualizing the extracted sentiment scores revealed valuable insights. Positive sentiment dominated, highlighting the fanbase's unwavering support for their team. However, negative sentiment spikes coincided with specific events, potentially indicating periods of disappointment or criticism.

Further Analysis and Conclusions:

This initial exploration lays the foundation for deeper analysis. By analyzing sentiment trends over time or comparing sentiment towards River Plate with other teams, we can gain a nuanced understanding of fan dynamics and media perspectives. Furthermore, incorporating named entity recognition could reveal key figures and topics driving the overall sentiment.
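
Tracking sentiment over time requires a timestamp for each item, which the notebook does not currently store (for tweets, Tweepy's status objects expose a created_at attribute). A rough sketch of a daily average, assuming each record is a (date, compound score) pair; the function name and the sample records are invented for illustration:

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_mean_compound(records):
    """Hypothetical helper: average the compound score per calendar day."""
    by_day = defaultdict(list)
    for day, compound in records:
        by_day[day].append(compound)
    # Sort by date so the result can be plotted directly as a time series
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Made-up records: (date, compound score)
example_records = [
    (date(2024, 1, 2), -0.5),
    (date(2024, 1, 1), 0.5),
    (date(2024, 1, 1), -0.25),
]
trend = daily_mean_compound(example_records)
for day, value in trend.items():
    print(day.isoformat(), value)
```

The same grouping idea extends to weekly windows or to a per-team comparison by changing the grouping key.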

In conclusion, this project shows how combining Twitter data and news articles with lexicon-based sentiment analysis can help gauge public perception of a football club. Focusing on River Plate gave a first picture of the emotions surrounding the club and a starting point for deeper study of Argentine football.