# **Social Media Sentiment Analysis**

This notebook analyzes Reddit discussions to gauge market sentiment for a selected stock ticker over the past 30 days. It extracts posts and comments from relevant financial subreddits and applies FinBERT-based sentiment analysis to compute a daily sentiment score.

The final output provides a daily aggregated sentiment score, indicating whether Reddit discussions suggest a bullish (positive) or bearish (negative) outlook for the stock.

# **Full Description & Details**

**Data Collection Process**

**Stock Ticker Selection**

- Reads the Top 10 Ticker CSV to identify the most-discussed stock for analysis.
- Matches the ticker with its company name using the NASDAQ-listed companies file.

**Reddit Data Extraction**

Collects posts from relevant finance subreddits:

- r/wallstreetbets
- r/stocks
- r/investing
- r/securityanalysis
- r/stockmarket

Then:

- Only posts containing the selected ticker (e.g., TSLA) are retrieved.
- Filters out posts with fewer than 5 comments to ensure only active discussions are analyzed.
- Fetches top comments from each post for additional sentiment insights.
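
The two filters described above (ticker mention and a minimum comment count) could be sketched as follows. This is a minimal illustration rather than the notebook's own code: `mentions_ticker`, `is_active_discussion`, and the `min_comments=5` default are hypothetical names chosen for this example, and the word-boundary match is a stricter alternative to a plain substring check. The sample posts are made up.

```python
import re

def mentions_ticker(text, ticker):
    """Return True if `ticker` appears as a standalone token (optionally as $TICKER)."""
    # A letter on either side would mean the ticker is part of a longer word.
    pattern = rf"(?<![A-Za-z])\$?{re.escape(ticker)}(?![A-Za-z])"
    return re.search(pattern, text or "", flags=re.IGNORECASE) is not None

def is_active_discussion(num_comments, min_comments=5):
    """Keep only posts with at least `min_comments` comments (threshold is an assumption)."""
    return num_comments >= min_comments

# Made-up posts for illustration only.
posts = [
    {"title": "TSLA earnings thread", "num_comments": 42},
    {"title": "Is $tsla overvalued?", "num_comments": 3},
    {"title": "Tesla vs. BYD", "num_comments": 17},
]
selected = [
    p for p in posts
    if mentions_ticker(p["title"], "TSLA") and is_active_discussion(p["num_comments"])
]
print([p["title"] for p in selected])  # ['TSLA earnings thread']
```

A word-boundary match avoids counting strings where the ticker letters happen to occur inside another word, at the cost of a slightly more complex pattern.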
**Sentiment Analysis Using FinBERT**

Classifies comments into:

- Negative
- Neutral
- Positive

Calculates sentiment scores:

Sentiment Score = Positive Sentiment - Negative Sentiment

Aggregates daily average sentiment scores to track trends over time.
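
As a worked illustration of the score and its daily aggregation, the sketch below uses only the standard library. The function names and the sample values are assumptions made for this example; they are not taken from the notebook's data.

```python
from collections import defaultdict
from statistics import mean

def sentiment_score(positive, negative):
    """Score in [-1, 1]: positive sentiment minus negative sentiment."""
    return positive - negative

def daily_average(records):
    """Average the per-text scores for each date, returned in date order."""
    by_date = defaultdict(list)
    for date, positive, negative in records:
        by_date[date].append(sentiment_score(positive, negative))
    return {date: round(mean(scores), 4) for date, scores in sorted(by_date.items())}

# (date, positive, negative) triples; illustrative values only.
records = [
    ("2025-05-01", 0.80, 0.05),
    ("2025-05-01", 0.10, 0.70),
    ("2025-05-02", 0.60, 0.20),
]
print(daily_average(records))
```

On the first date a strongly positive and a strongly negative text nearly cancel out, which is why a daily average close to zero can hide a polarized discussion.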


**Final Output**

Daily Sentiment Scores (Exported as CSV):

- Negative Sentiment %
- Neutral Sentiment %
- Positive Sentiment %
- Overall Sentiment Score (daily average of positive - negative, rescaled to a 0-100 range)
- Daily post and comment counts
- Ticker for reference


In [1]:
pip install asyncpraw

Collecting asyncpraw
  Downloading asyncpraw-7.8.1-py3-none-any.whl.metadata (9.0 kB)
Collecting aiofiles (from asyncpraw)
  Downloading aiofiles-24.1.0-py3-none-any.whl.metadata (10 kB)
Collecting aiosqlite<=0.17.0 (from asyncpraw)
  Downloading aiosqlite-0.17.0-py3-none-any.whl.metadata (4.1 kB)
Collecting asyncprawcore<3,>=2.4 (from asyncpraw)
  Downloading asyncprawcore-2.4.0-py3-none-any.whl.metadata (5.5 kB)
Collecting update_checker>=0.18 (from asyncpraw)
  Downloading update_checker-0.18.0-py3-none-any.whl.metadata (2.3 kB)
Downloading asyncpraw-7.8.1-py3-none-any.whl (196 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.4/196.4 kB 3.8 MB/s eta 0:00:00
Downloading aiosqlite-0.17.0-py3-none-any.whl (15 kB)
Downloading asyncprawcore-2.4.0-py3-none-any.whl (19 kB)
Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Downloading aiofiles-24.1.0-py3-none-any.whl (15 kB)
Installing collected packages: aiosqlite, aiofiles, update

In [2]:
!pip install asyncpraw nest_asyncio



In [3]:
import asyncpraw
import nest_asyncio
import requests
import pandas as pd
from datetime import datetime, timedelta
from time import sleep
from transformers import pipeline
from google.colab import drive

In [4]:
drive.mount('/content/drive',force_remount=True)

Mounted at /content/drive


In [5]:
tickers_csv_path = "/content/drive/My Drive/StockDashboard_Automation/ExportToGitHub/Top Ten Tickers.csv"
df_tickers = pd.read_csv(tickers_csv_path)
selected_ticker = df_tickers.iloc[0, 0]

print(f"Selected Ticker: {selected_ticker}")

Selected Ticker: NVDA


In [6]:
def get_nasdaq_symbols():
    file_path = '/content/drive/My Drive/StockDashboard_Automation/nasdaqlisted.txt'
    df = pd.read_csv(file_path, delimiter='\t')
    symbols_df = df[['Symbol', 'Security Name']]
    symbols_df.columns = ['Ticker', 'Company Name']
    return symbols_df

nasdaq_symbols = get_nasdaq_symbols()
print(f"Loaded {len(nasdaq_symbols)} NASDAQ Companies.")

Loaded 4804 NASDAQ Companies.


In [7]:
company_name = nasdaq_symbols[nasdaq_symbols["Ticker"] == selected_ticker]["Company Name"].values[0]
print(f"Selected Company Name: {company_name}")

Selected Company Name: NVIDIA Corporation - Common Stock
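
The lookup above raises an `IndexError` when the selected ticker is missing from the NASDAQ file (for example, a stock listed on another exchange). A defensive variant could look like the sketch below. It uses the standard `csv` module on a tiny inline sample that mimics the two columns the notebook reads; `lookup_company_name` and `strip_security_suffix` are hypothetical helpers, not part of the notebook.

```python
import csv
import io

# Tiny tab-delimited sample mimicking the "Symbol" and "Security Name" columns.
SAMPLE = (
    "Symbol\tSecurity Name\n"
    "NVDA\tNVIDIA Corporation - Common Stock\n"
    "TSLA\tTesla, Inc. - Common Stock\n"
)

def lookup_company_name(table_text, ticker, default=None):
    """Return the security name for `ticker`, or `default` if it is not listed."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        if row["Symbol"] == ticker:
            return row["Security Name"]
    return default

def strip_security_suffix(name):
    """Drop a trailing ' - <security type>' part such as ' - Common Stock'."""
    return name.rsplit(" - ", 1)[0].strip()

name = lookup_company_name(SAMPLE, "NVDA")
print(strip_security_suffix(name))         # NVIDIA Corporation
print(lookup_company_name(SAMPLE, "IBM"))  # None
```

Returning a default instead of raising lets the caller decide whether to fall back to the raw ticker or stop the run.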


In [8]:
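import os

# Credentials are read from environment variables instead of being hardcoded.
# The variable names below are assumptions; set them to your own Reddit app values.
reddit = asyncpraw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_CLIENT_SECRET"],
    username=os.environ["REDDIT_USERNAME"],
    password=os.environ["REDDIT_PASSWORD"],
    user_agent="novicestockbot by u/novicestockbot"
)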

subreddits = ["wallstreetbets", "stocks", "investing", "securityanalysis", "stockmarket"]
days_back = 30
start_time = datetime.utcnow() - timedelta(days=days_back)
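
`datetime.utcnow()` and `datetime.utcfromtimestamp()` return naive datetimes and are deprecated as of Python 3.12. A timezone-aware version of the same cutoff logic could be sketched as below; `cutoff` and `is_recent` are hypothetical helper names, and the fixed date is used only to make the example reproducible.

```python
from datetime import datetime, timedelta, timezone

def cutoff(days_back, now=None):
    """Timezone-aware UTC cutoff `days_back` days before `now`."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days_back)

def is_recent(created_utc, start_time):
    """Compare a Unix timestamp (seconds since epoch, UTC) against an aware cutoff."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc) >= start_time

fixed_now = datetime(2025, 5, 5, tzinfo=timezone.utc)
start = cutoff(30, now=fixed_now)
print(start.date())                             # 2025-04-05
print(is_recent(fixed_now.timestamp(), start))  # True
```

Mixing naive and aware datetimes in a comparison raises a `TypeError`, so a switch like this would need to be applied to both the cutoff and every timestamp conversion.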

In [9]:
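async def get_reddit_data(ticker, reddit):
    """Collect recent posts and comments that mention `ticker`.

    Relies on the module-level `subreddits` list and `start_time` cutoff.
    Returns one DataFrame with the posts first, followed by the comments.
    """
    posts_data = []
    comments_data = []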

    for sub in subreddits:
        subreddit = await reddit.subreddit(sub)

        async for submission in subreddit.new(limit=500):
            if ticker.lower() in submission.title.lower() or ticker.lower() in (submission.selftext or "").lower():
                post_time = datetime.utcfromtimestamp(submission.created_utc)

                if post_time >= start_time:
                    post_id = submission.id
                    posts_data.append({
                        "Post ID": post_id,
                        "Title": submission.title,
                        "Text": submission.selftext,
                        "Date": post_time.strftime("%Y-%m-%d"),
                        "Score": submission.score,
                        "Comments": submission.num_comments,
                        "Upvote Ratio": submission.upvote_ratio,
                        "Subreddit Name": sub,
                        "URL": submission.url,
                        "Author": submission.author.name if submission.author else "Unknown",
                        "Type": "Post"
                    })

                    await submission.load()
                    await submission.comments.replace_more(limit=0)
                    top_comments = sorted(submission.comments.list(), key=lambda c: c.score, reverse=True)[:10]

                    for comment in top_comments:
                        comment_time = datetime.utcfromtimestamp(comment.created_utc)
                        if comment_time >= start_time:
                            comments_data.append({
                                "Post ID": post_id,
                                "Title": "",
                                "Text": comment.body,
                                "Date": comment_time.strftime("%Y-%m-%d"),
                                "Score": comment.score,
                                "Comments": None,
                                "Upvote Ratio": None,
                                "Subreddit Name": sub,
                                "URL": submission.url,
                                "Author": comment.author.name if comment.author else "Unknown",
                                "Type": "Top Comment (from relevant post)"
                            })

    for sub in subreddits:
        subreddit = await reddit.subreddit(sub)

        async for comment in subreddit.comments(limit=500):
            comment_time = datetime.utcfromtimestamp(comment.created_utc)
            if comment_time >= start_time and ticker.lower() in comment.body.lower():
                comments_data.append({
                    "Post ID": comment.parent_id,
                    "Title": "",
                    "Text": comment.body,
                    "Date": comment_time.strftime("%Y-%m-%d"),
                    "Score": comment.score,
                    "Comments": None,
                    "Upvote Ratio": None,
                    "Subreddit Name": sub,
                    "URL": f"https://www.reddit.com{comment.permalink}",
                    "Author": comment.author.name if comment.author else "Unknown",
                    "Type": "Comment (directly mentions ticker)"
                })

    return pd.DataFrame(posts_data + comments_data)
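
Because the function gathers comments in two passes (top comments of matching posts, then recent comments that mention the ticker), the same comment can plausibly be recorded twice. One way to remove such repeats before scoring is sketched below. `drop_duplicate_records` and the choice of key fields are assumptions for illustration; with a DataFrame, `df.drop_duplicates(subset=["Author", "Date", "Text"])` would be the pandas equivalent.

```python
def drop_duplicate_records(records, key_fields=("Author", "Date", "Text")):
    """Keep the first record for each combination of `key_fields`."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Made-up records using the same column names as the collected data.
records = [
    {"Author": "a", "Date": "2025-05-05", "Text": "NVDA 114", "Type": "Top Comment (from relevant post)"},
    {"Author": "a", "Date": "2025-05-05", "Text": "NVDA 114", "Type": "Comment (directly mentions ticker)"},
    {"Author": "b", "Date": "2025-05-05", "Text": "NVDA 114", "Type": "Comment (directly mentions ticker)"},
]
unique = drop_duplicate_records(records)
print(len(unique))  # 2
```

Keying on author, date, and text treats identical messages from different users as distinct, which keeps genuine repetition in the sentiment average.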

In [10]:
df = await get_reddit_data(selected_ticker, reddit)


In [11]:
if not df.empty:
    print(f"Retrieved {len(df)} records (Posts + Comments) for {selected_ticker}")
    display(df)
else:
    print(f"⚠ No posts found for {selected_ticker} in the last month.")


Retrieved 347 records (Posts + Comments) for NVDA


Unnamed: 0,Post ID,Title,Text,Date,Score,Comments,Upvote Ratio,Subreddit Name,URL,Author,Type
0,1kfd2s1,-$52K Unrealized. Jumped into calls too early....,Tried to catch the bounce too early after the ...,2025-05-05,172,68.0,0.94,wallstreetbets,https://i.redd.it/fffyxo4k4zye1.png,Ancient-Mud8359,Post
1,1kd3go4,"$43K Profit – My “Triple Call Stack” on SPY, N...",**🧠 The Thinking:**\n\nThis wasn’t some overen...,2025-05-02,47,9.0,0.90,wallstreetbets,https://i.redd.it/rli2oyymxdye1.jpeg,Ancient-Mud8359,Post
2,1kczwui,NVDA sitting at $113 premarket — What’s your t...,"Hey everyone, just wanted to get some thoughts...",2025-05-02,25,36.0,0.74,wallstreetbets,https://www.reddit.com/r/wallstreetbets/commen...,Connect_Stick_4035,Post
3,1kc9fms,NVDA returns at 130%,Despite the pullback in NVIDIA's stock price a...,2025-05-01,42,10.0,0.90,wallstreetbets,https://i.redd.it/p25wl6f4e6ye1.jpeg,Aluseda,Post
4,1kc95nr,"$HOOD 10K 1DTE YOLO 47P, Post Earnings Sell Off",A degenerate's intuition on a good earnings se...,2025-05-01,10,10.0,0.82,wallstreetbets,https://www.reddit.com/r/wallstreetbets/commen...,SuperBearPut,Post
...,...,...,...,...,...,...,...,...,...,...,...
342,t3_1kf2vm8,,"they have been in business for so long, and al...",2025-05-05,1,,,wallstreetbets,https://www.reddit.com/r/wallstreetbets/commen...,hil_ton,Comment (directly mentions ticker)
343,t3_1kf7s2h,,NVDA 🏰 114,2025-05-05,4,,,wallstreetbets,https://www.reddit.com/r/wallstreetbets/commen...,HND171,Comment (directly mentions ticker)
344,t3_1kf7s2h,,The crooked market is keeping nvda down. This ...,2025-05-05,2,,,wallstreetbets,https://www.reddit.com/r/wallstreetbets/commen...,DementiaDonaldTrump,Comment (directly mentions ticker)
345,t3_1kfemtf,,"Well you could, but the profit you made with c...",2025-05-05,-1,,,investing,https://www.reddit.com/r/investing/comments/1k...,pvnieuw,Comment (directly mentions ticker)


In [12]:
social_data_records_path = "/content/drive/My Drive/StockDashboard_Automation/ExportToGitHub/social_raw.csv"
df.to_csv(social_data_records_path, index=False)

In [13]:
post_counts = df[df["Type"] == "Post"].groupby("Date").size().reset_index(name="Post Count")

In [14]:
comment_counts = df[df["Type"] != "Post"].groupby("Date").size().reset_index(name="Comment Count")

In [15]:
sentiment_pipeline = pipeline("sentiment-analysis", model="ProsusAI/finbert")


Device set to use cpu


In [16]:
def analyze_sentiment(text):
    """Score one text with FinBERT; only the top label receives a non-zero score."""
    if not text.strip():
        # Nothing to score: return NaN so the daily means skip this row.
        nan = float("nan")
        return {"positive": nan, "neutral": nan, "negative": nan}

    # Cap the input length; truncation=True also enforces the model's token limit.
    result = sentiment_pipeline(text[:512], truncation=True)[0]

    return {
        "positive": round(result["score"] if result["label"] == "positive" else 0, 4),
        "neutral": round(result["score"] if result["label"] == "neutral" else 0, 4),
        "negative": round(result["score"] if result["label"] == "negative" else 0, 4)
    }

def compute_sentiment_score(row):
    """Net sentiment in [-1, 1]: (positive - negative) / (1 - neutral)."""
    pos = row["Positive_Sentiment_Social"]
    neg = row["Negative_Sentiment_Social"]
    neu = row["Neutral_Sentiment_Social"]

    # Rows with no scorable text stay missing.
    if pd.isna(pos) and pd.isna(neg) and pd.isna(neu):
        return float("nan")

    pos = 0 if pd.isna(pos) else pos
    neg = 0 if pd.isna(neg) else neg
    neu = 0 if pd.isna(neu) else neu

    return round((pos - neg) / (1 - neu) if (1 - neu) > 0 else 0, 4)

if not df.empty:
    # Score the title and body together (comments have an empty title).
    df["Sentiment"] = df.apply(lambda row: analyze_sentiment(
        (row["Title"] if pd.notna(row["Title"]) else "") + " " +
        (row["Text"] if pd.notna(row["Text"]) else "")
    ), axis=1)

    df["Negative_Sentiment_Social"] = df["Sentiment"].apply(lambda x: x["negative"])
    df["Neutral_Sentiment_Social"] = df["Sentiment"].apply(lambda x: x["neutral"])
    df["Positive_Sentiment_Social"] = df["Sentiment"].apply(lambda x: x["positive"])

    df["Social_Sentiment_Score"] = df.apply(compute_sentiment_score, axis=1)

    df.drop(columns=["Sentiment"], inplace=True)

    daily_sentiment = df.groupby("Date")[
        ["Negative_Sentiment_Social", "Neutral_Sentiment_Social", "Positive_Sentiment_Social", "Social_Sentiment_Score"]
    ].mean().reset_index()

    # Rescale the daily score from [-1, 1] to [0, 100].
    daily_sentiment["Social_Sentiment_Score"] = ((daily_sentiment["Social_Sentiment_Score"] + 1) / 2) * 100

    daily_sentiment = pd.merge(daily_sentiment, post_counts, on="Date", how="left")
    daily_sentiment = pd.merge(daily_sentiment, comment_counts, on="Date", how="left")
    daily_sentiment["Ticker"] = selected_ticker

    # Export only when there is data; missing values are written as "-".
    social_sentiment_path = "/content/drive/My Drive/StockDashboard_Automation/ExportToGitHub/social_sentiment.csv"
    daily_sentiment.to_csv(social_sentiment_path, index=False, na_rep="-")

    print(f"Exported Social Sentiment to {social_sentiment_path}")
else:
    print(f"No data to export for {selected_ticker}.")


Exported Social Sentiment to /content/drive/My Drive/StockDashboard_Automation/ExportToGitHub/social_sentiment.csv
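
The scoring cell keeps only the top label for each text, so two of the three sentiment columns are always zero for a given row. If the full probability distribution is wanted instead, recent `transformers` releases are expected to return every label when a text-classification pipeline is called with `top_k=None`; treat that as an assumption to verify against the installed version. The sketch below shows only how such a result could be mapped to the three columns and the net score. It runs on a hand-written stand-in rather than real model output, and both helper names are hypothetical.

```python
def scores_from_distribution(distribution):
    """Map a list of {'label', 'score'} dicts to the three sentiment columns."""
    scores = {"positive": 0.0, "neutral": 0.0, "negative": 0.0}
    for item in distribution:
        scores[item["label"]] = round(item["score"], 4)
    return scores

def net_score(scores):
    """(positive - negative) / (1 - neutral); 0.0 when all mass is neutral."""
    denom = 1 - scores["neutral"]
    if denom <= 0:
        return 0.0
    return round((scores["positive"] - scores["negative"]) / denom, 4)

# Hand-written stand-in for one pipeline result; not real model output.
fake_result = [
    {"label": "positive", "score": 0.70},
    {"label": "neutral", "score": 0.20},
    {"label": "negative", "score": 0.10},
]
scores = scores_from_distribution(fake_result)
print(scores, net_score(scores))
```

With full distributions, the division by `1 - neutral` starts to matter: it expresses the positive/negative balance among the non-neutral probability mass instead of simply echoing the top label's confidence.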
