In [10]:
'''
Brett Mitchell

2/8/2024 was a big day for Dallas-Fort Worth sports fans: the Cowboys hired a new defensive coordinator, the
Rangers re-signed one of their all-star players, and the Mavericks traded for two key players right before the NBA trade
deadline. I was curious about fan sentiment for all three teams, so I scraped the most recent Reddit comments from
each team's subreddit and compared the results to see which fan base is the most positive and which is the most
negative about their team.

I predict that each fan base will show mostly positive sentiment about their team, with Rangers fans the most positive.
The Rangers are coming off their first World Series win and, after months of waiting, finally re-signed arguably their
best playoff contributor. I predict that Cowboys fans will show the most negative sentiment of the three, as the team
suffered a huge playoff loss and the new defensive coordinator hire may be a bit controversial.

As for the VADER model, I predict that it will classify the sentiment of the scraped Reddit comments with a high level of
accuracy, since its lexicon and rules were built for social media text and it is widely used for sentiment analysis.
'''

import praw
import pandas as pd

# Reddit client credentials
reddit = praw.Reddit(
    client_id="client_id",
    client_secret="client_secret",
    user_agent="brtm23_sentiment_analysis"
)

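def sa_scraper(subreddits):
    """Scrape comments from the newest submissions of each subreddit into one DataFrame per subreddit."""
    subreddit_dfs = {}  # maps subreddit name -> DataFrame of comment bodies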
    
    for subreddit_name in subreddits:
        print(f"Collecting data from /r/{subreddit_name}")
        subreddit = reddit.subreddit(subreddit_name)
        comments_data = []  # collects comments
        
        for submission in subreddit.new(limit=330):
            # Remove "load more comments" placeholders without fetching them (limit=0)
            submission.comments.replace_more(limit=0)
            for comment in submission.comments.list():
                # Collect comment data
                comments_data.append({'comment_body': comment.body})
        
        # Create DataFrame for the current subreddit (the column exists even if no comments were found)
        subreddit_dfs[subreddit_name] = pd.DataFrame(comments_data, columns=['comment_body'])
    
    return subreddit_dfs

# Subreddits of interest
subreddits = ['TexasRangers','cowboys','Mavericks']

# Collecting data
subreddit_dataframes = sa_scraper(subreddits)

# Accessing each DataFrame
rangers_comments_df = subreddit_dataframes['TexasRangers']
cowboys_comments_df = subreddit_dataframes['cowboys']
mavericks_comments_df = subreddit_dataframes['Mavericks']

Collecting data from /r/TexasRangers
Collecting data from /r/cowboys
Collecting data from /r/Mavericks
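
The cell above passes placeholder strings for the Reddit client credentials. One way to keep real credentials out of the notebook is to read them from environment variables. The sketch below is a minimal example of that idea, not the method used here; the variable names (REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT) and the helper load_reddit_credentials are hypothetical.

```python
import os

# Hypothetical environment variable names; rename them to match your own setup.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")

def load_reddit_credentials(env=os.environ):
    """Return keyword arguments for praw.Reddit, read from a mapping of environment variables."""
    # Fail early with a clear message if any variable is unset or empty.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }

# Usage (requires praw and real credentials in the environment):
# reddit = praw.Reddit(**load_reddit_credentials())
```

Passing the mapping as a parameter makes the helper easy to check with a plain dictionary before any request is sent to Reddit.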


In [11]:
print(rangers_comments_df.head())

print(rangers_comments_df.shape)
print(cowboys_comments_df.shape)
print(mavericks_comments_df.shape)

                                        comment_body
0                         14 1/2 minutes of pure sex
1  14 minutes…I’m not going to wa- OH I REMEMBER ...
2                  Those Garcia HRs hit harder today
3  Crazy he is just 35. Older yes but to think he...
4  Man, it was fun watching this guy just throw c...
(9245, 1)
(29818, 1)
(21875, 1)


In [12]:
rangers_comments_df.to_csv("rangers_comments_sentiment_analysis", index=False)
cowboys_comments_df.to_csv("cowboys_comments_sentiment_analysis", index=False)
mavericks_comments_df.to_csv("mavericks_comments_sentiment_analysis", index=False)
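
The three DataFrames differ a lot in size (9245, 29818, and 21875 comments), so comparing fan bases by raw counts of positive or negative comments would favor the larger subreddits. The sketch below shows one possible way to compare them by share instead. It assumes each comment already has a VADER compound score, for example from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the vaderSentiment package, and it uses the conventional cutoffs of +0.05 and -0.05 suggested in the VADER documentation. The helper names label_from_compound and sentiment_shares are hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def sentiment_shares(compound_scores):
    """Return the fraction of positive, neutral, and negative scores."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for score in compound_scores:
        counts[label_from_compound(score)] += 1
    total = len(compound_scores)
    # Avoid dividing by zero when a subreddit returned no comments.
    return {label: (n / total if total else 0.0) for label, n in counts.items()}

# Example with made-up scores (not real results from the scraped data):
print(sentiment_shares([0.5, -0.5, 0.0, 0.9]))
```

Because the result is a set of fractions, the Rangers, Cowboys, and Mavericks numbers can be placed side by side even though the comment counts differ.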