# **Purpose**

This notebook identifies the ten tickers mentioned most often over the past 24 hours across five financial subreddits, using the Reddit API. Mentions are weighted by where they appear (title, post body, or comment), and the top-ranked ticker is then used to calculate both news and social media sentiment over the last 365 days.


# **Initial Setup**

In [1]:
!pip install praw




In [3]:
import pandas as pd
from collections import Counter
from datetime import datetime, timedelta
import praw


In [4]:
def get_nasdaq_symbols():
    file_path = r"C:\Users\amcco\Documents\stock-dashboard\app\data\nasdaqlisted.txt"
    df = pd.read_csv(file_path, delimiter='\t')
    symbols_df = df[['Symbol', 'Security Name']]
    symbols_df.columns = ['Ticker', 'Company Name']
    return symbols_df

nasdaq_symbols = get_nasdaq_symbols()
ticker_symbols = nasdaq_symbols['Ticker'].tolist()
company_names = nasdaq_symbols['Company Name'].tolist()

print(f"✅ Loaded {len(nasdaq_symbols)} NASDAQ companies.")

✅ Loaded 4804 NASDAQ companies.
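One caveat when loading a symbol list with pandas: by default, `read_csv` parses strings such as `NA` as missing values, so a ticker spelled `NA` would silently become NaN. The following is a minimal sketch of how `keep_default_na=False` avoids this. It uses a made-up two-row, tab-separated sample rather than the real file, and the company name in the first row is a placeholder.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample mimicking the two columns used above.
sample = "Symbol\tSecurity Name\nNA\tExample Holdings\nTSLA\tTesla, Inc.\n"

# Default parsing: the string "NA" is interpreted as a missing value.
default_df = pd.read_csv(io.StringIO(sample), delimiter="\t")

# With keep_default_na=False, "NA" is kept as an ordinary string.
safe_df = pd.read_csv(io.StringIO(sample), delimiter="\t", keep_default_na=False)

default_has_nan = bool(default_df["Symbol"].isna().any())
safe_symbols = safe_df["Symbol"].tolist()
print(default_has_nan, safe_symbols)
```

Whether this matters depends on the contents of the symbol file, which are not shown here.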


# **Connect to the Reddit API**

In [5]:
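import os

# Read credentials from environment variables rather than hard-coding them.
# The variable names REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET are assumptions.
reddit = praw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_CLIENT_SECRET"],
    user_agent="Mehul Satish Lad"
)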

# **Extract Tickers from Titles, Posts & Comments**

In [7]:
subreddits = ["wallstreetbets", "stocks", "investing", "securityanalysis", "stockmarket"]
start_time = datetime.utcnow() - timedelta(days=1)

def fetch_reddit_data():
    all_data = []  

    for subreddit in subreddits:
        for submission in reddit.subreddit(subreddit).new(limit=500):
            post_time = datetime.utcfromtimestamp(submission.created_utc)
            if post_time >= start_time:
                # Record the title, truncated to 500 characters.
                all_data.append({
                    "created_utc": post_time.strftime("%Y-%m-%d"),
                    "subreddit": subreddit,
                    "text": submission.title[:500],  
                    "source": "Title"
                })

                if submission.selftext:
                    all_data.append({
                        "created_utc": post_time.strftime("%Y-%m-%d"),
                        "subreddit": subreddit,
                        "text": submission.selftext[:500],  
                        "source": "Post"
                    })

                submission.comments.replace_more(limit=0)
                sorted_comments = sorted(submission.comments.list(), key=lambda c: c.score, reverse=True)[:5]  # Keep only top 5 comments
                for comment in sorted_comments:
                    comment_time = datetime.utcfromtimestamp(comment.created_utc)
                    if comment_time >= start_time:
                        all_data.append({
                            "created_utc": comment_time.strftime("%Y-%m-%d"),
                            "subreddit": subreddit,
                            "text": comment.body[:500],  
                            "source": "Comment"
                        })

    return pd.DataFrame(all_data)
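The 24-hour filter in `fetch_reddit_data` compares naive datetimes built with `datetime.utcnow()` and `datetime.utcfromtimestamp()`, both of which are deprecated as of Python 3.12. A minimal sketch of the same window check using timezone-aware datetimes is shown below. The helper name `is_within_window` and the fixed `now` value are illustrative assumptions, not part of the notebook.

```python
from datetime import datetime, timedelta, timezone

def is_within_window(created_utc, now, window=timedelta(days=1)):
    """Return True if a Unix timestamp falls inside the lookback window ending at `now`."""
    created = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return now - window <= created <= now

# Fixed reference time so the example is reproducible.
now = datetime(2025, 4, 22, 12, 0, tzinfo=timezone.utc)
recent = (now - timedelta(hours=3)).timestamp()
stale = (now - timedelta(days=2)).timestamp()

print(is_within_window(recent, now), is_within_window(stale, now))
```

Passing `now` explicitly keeps the check testable. In the notebook it would presumably be `datetime.now(timezone.utc)`.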

In [8]:
df = fetch_reddit_data()

In [9]:
if df.empty:
    print("⚠ No new Reddit posts/comments found in the last 24 hours.")
else:
    print(f"✅ Collected {len(df)} Reddit posts/comments.")

✅ Collected 1191 Reddit posts/comments.


In [10]:
print(df.head())

  created_utc       subreddit  \
0  2025-04-22  wallstreetbets   
1  2025-04-22  wallstreetbets   
2  2025-04-22  wallstreetbets   
3  2025-04-22  wallstreetbets   
4  2025-04-22  wallstreetbets   

                                                text   source  
0  $MP Materials Locks in Major Contracts With Tw...    Title  
1                            MP is about to print 💸💸     Post  
2  Everyone’s Panicking On Tariffs — I’m Loading ...    Title  
3  https://preview.redd.it/z014fm775awe1.png?widt...     Post  
4                                            Shut up  Comment  


# **Aggregate All Mentions**

In [11]:
import re

valid_tickers = set(nasdaq_symbols['Ticker'])  

def extract_tickers(text):
    potential_tickers = re.findall(r'\b[A-Z]{1,5}\b', str(text)) 
    return [ticker for ticker in potential_tickers if ticker in valid_tickers]

df["extracted_tickers"] = df["text"].apply(extract_tickers)
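Because the pattern above matches any run of one to five capital letters, ordinary abbreviations such as EU or OP can be counted as tickers whenever they coincide with a listed symbol. One possible mitigation, sketched below, is to accept ambiguous symbols only when they are written as cashtags (for example `$OP`). The `valid` set, the `ambiguous` exclusion list, and the function name are all hypothetical and would need tuning against real data.

```python
import re

# Hypothetical inputs for illustration: a tiny ticker set and an exclusion list.
valid = {"TSLA", "NVDA", "EU", "OP", "MP"}
ambiguous = {"EU", "OP"}

# Capture an optional leading "$" together with the candidate symbol.
TOKEN = re.compile(r"(\$?)\b([A-Z]{1,5})\b")

def extract_tickers_filtered(text):
    found = []
    for cashtag, symbol in TOKEN.findall(str(text)):
        if symbol not in valid:
            continue
        # Ambiguous symbols count only when written as a cashtag, e.g. "$OP".
        if symbol in ambiguous and not cashtag:
            continue
        found.append(symbol)
    return found

print(extract_tickers_filtered("The EU said OP should buy TSLA and $OP"))
```

This trades recall for precision: a genuine mention of an ambiguous ticker without the `$` prefix would be missed.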

In [12]:
df["extracted_tickers"] = df["extracted_tickers"].apply(lambda x: x if isinstance(x, list) else [])

df_exploded = df.explode("extracted_tickers").rename(columns={"extracted_tickers": "ticker_source"}).dropna()

ticker_counts = df_exploded.groupby(["ticker_source", "source"]).size().unstack(fill_value=0)

ticker_counts["title_weighted"] = ticker_counts.get("Title", 0) * 3
ticker_counts["post_weighted"] = ticker_counts.get("Post", 0) * 2
ticker_counts["comment_weighted"] = ticker_counts.get("Comment", 0) * 1

ticker_counts["weighted_score"] = ticker_counts["title_weighted"] + ticker_counts["post_weighted"] + ticker_counts["comment_weighted"]

top_10_tickers = ticker_counts[["weighted_score"]].sort_values(by="weighted_score", ascending=False).head(10)
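To make the weighting concrete, here is a small worked example on made-up data, using the same weights as above (title 3, post 2, comment 1). The `mentions` frame and the `weights` dictionary are illustrative. The `reindex` step is an assumption added so that every source column exists even when a source never appears.

```python
import pandas as pd

# Hypothetical mentions: one row per (ticker, source) occurrence.
mentions = pd.DataFrame({
    "ticker_source": ["TSLA", "TSLA", "TSLA", "NVDA", "NVDA"],
    "source": ["Title", "Post", "Comment", "Title", "Comment"],
})
weights = {"Title": 3, "Post": 2, "Comment": 1}

# Count mentions per ticker and source, one column per source.
counts = mentions.groupby(["ticker_source", "source"]).size().unstack(fill_value=0)
# Reindex so every source column exists even if a source never appears.
counts = counts.reindex(columns=list(weights), fill_value=0)

# Multiply each column by its weight and sum across sources.
scores = (counts * pd.Series(weights)).sum(axis=1).sort_values(ascending=False)
print(scores.to_dict())
```

In this sample TSLA scores 3 + 2 + 1 = 6 and NVDA scores 3 + 0 + 1 = 4.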

In [13]:
print(top_10_tickers)

source         weighted_score
ticker_source                
TSLA                       46
EU                         21
NVDA                       20
VXUS                       10
OP                          8
BBSI                        7
GOOG                        7
NFE                         6
SQQQ                        6
FTC                         6


In [14]:
top_10_tickers = top_10_tickers.reset_index()

In [15]:
display(top_10_tickers)

source ticker_source  weighted_score
0               TSLA              46
1                 EU              21
2               NVDA              20
3               VXUS              10
4                 OP               8
5               BBSI               7
6               GOOG               7
7                NFE               6
8               SQQQ               6
9                FTC               6


In [16]:
# Use a raw string so the backslashes in the Windows path are not parsed as escape sequences.
export_path = r"C:\Users\amcco\Documents\stock-dashboard\app\data\Top Ten Tickers.csv"

top_10_tickers.to_csv(export_path, index=False)

print(f"✅ Exported Top Ten Tickers to {export_path}")

In [None]:
print(top_10_tickers)

source ticker_source  weighted_score
0                 EU             264
1               TSLA              79
2                QQQ              76
3               NVDA              52
4                DJT              45
5                 OP              38
6                 UK              38
7               AAPL              35
8               AMZN              29
9                BND              21
