# Goals and Steps

**Goal:**

Use Reddit's Trending section to discover if of the topics that are trending are discusssing stocks. Able to figure out if the trending section includes stocks by looking for a ticker symbol or any other keywords.

From this, look to see if the topic is discussing if the stock will go up or down, or if it's positive or negative talks about said stock.

**Steps:**

1. Obtain reddit API and do a GET to obtain the Trending section words. Also able to just scrape the words on the Trending section.

2. Use the Trending Title to figure out if the topic involves stocks or not.

3. After targeting Trending Titles that discuss stocks, use the title words or topic texts to see if these are positives or negatives about the stock (sentiment analysis).

Extras: (remove exact same comments posted multiple times, the same account spamming with one ticker etc.) You can also find the most mentioned stock and pick from an array of those options and see the comments/sentiments around that stock. Will also need an api that grabs all stock tickers, this will be used to loop through Reddit comments to find any mentions of tickers.

**What to grab when working with the [Reddit API](https://www.reddit.com/dev/api/):**"
In order to obtain trending subreddits, will have to go to the listings section of the api documentation, hyperlink, and look a the [/api/trending_subreddits](https://www.reddit.com/dev/api#GET_api_trending_subreddits).

Able to use the [live threads](https://www.reddit.com/dev/api#section_live) section to grab live threads.

**The subreddit to focus on is r/wallstreetbets**


**Reddit Sections:**
- "Best" is the highest upvote to downvote ratio

- "Top" are the most votes, upvotes and downvotes

- "Hot" is the most upvotes recently.


**Reddit Type Prefixes**
- t1_	- Comment

- t2_	- Account

- t3_	- Link

- t4_	- Message

- t5_	- Subreddit

- t6_	- Award

# Imports

In [1]:
import pandas as pd
import numpy as np
import os

# API, scrappping, etc.
from bs4 import BeautifulSoup
import requests

In [2]:
# Reddit API Credentials
CLIENT_ID = os.environ.get("REDDIT_CLIENT_ID")
SECRET_KEY = os.environ.get("REDDIT_SECRET_KEY")
REDDIT_PASSWORD = os.environ.get("REDDIT_PASSWORD")

In [3]:
# Requests a temporary auth token from Reddit
auth = requests.auth.HTTPBasicAuth(CLIENT_ID, SECRET_KEY)

In [4]:
# Retrieving access token
data = {
    "grant_type": "password",
    "username": "Thisguycodes2",
    "password": REDDIT_PASSWORD
        }

In [5]:
headers = {'User-Agent': 'redditAPITrending:myredditapp:0.0.1 (by /u/Thisguycodes2)'}

In [6]:
# Send request for API token
res = requests.post("https://www.reddit.com/api/v1/access_token",
                   auth=auth, data=data, headers=headers)
ACCESS_TOKEN = res.json()["access_token"]  # Token will be put in header to access API
headers = {**headers, **{'Authorization': f"bearer {ACCESS_TOKEN}"}}  # Adding authorization to our headers dictionary

# Using Reddit API On Trending Stock /r/wallstreetbets

In [7]:
def get_trending_subreddit(subreddit="wallstreetbets", hot_new_or_rising="hot", limit=25):
    """
    Able to select which subreddit
    you'd like to view, and from there
    you're able to view all the data/text
    in said subreddit.
    
    The max limit is 100. The limit will
    return data points.
    
    Able to run/call this, and it will 
    update it with the latest information
    """
    trending_posts = requests.get(f"https://oauth.reddit.com/r/{subreddit}/{hot_new_or_rising}", headers=headers, params={"limit": limit})
    return trending_posts.json()["data"]["children"]

get_trending_subreddit()

[{'kind': 't3',
  'data': {'approved_at_utc': None,
   'subreddit': 'wallstreetbets',
   'selftext': 'Your daily trading discussion thread. Please keep the shitposting to a minimum. \n\n^Navigate ^WSB|^We ^recommend ^best ^daily ^DD\n:--|:--                                 \n**DD** | [All](https://reddit.com/r/wallstreetbets/search?sort=new&amp;restrict_sr=on&amp;q=flair%3ADD) / [**Best Daily**](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADD&amp;restrict_sr=on&amp;t=day) / [Best Weekly](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADD&amp;restrict_sr=on&amp;t=week)\n**Discussion** | [All](https://reddit.com/r/wallstreetbets/search?sort=new&amp;restrict_sr=on&amp;q=flair%3ADiscussion) / [**Best Daily**](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADiscussion&amp;restrict_sr=on&amp;t=day) / [Best Weekly](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADiscussion&amp;restrict_sr=on&amp;t=week)

In [8]:
subreddit = "wallstreetbets"
trending_posts = requests.get(f"https://oauth.reddit.com/r/{subreddit}/hot", headers=headers, params={"limit": 100})
stocks_df = pd.DataFrame()
for post in trending_posts.json()["data"]["children"]:
    stocks_df = stocks_df.append({
        "Subreddit": post["data"]["subreddit"],
        "Title": post["data"]["title"],
        "Text": post["data"]["selftext"],
        "Up Vote Ratio": post["data"]["upvote_ratio"],
        "Up Votes": post["data"]["ups"],
        "Down Votes": post["data"]["downs"],
        "Thread_id": post["kind"] + "_" + post["data"]["id"]  # Also available by doing post["name"]
                }, ignore_index=True, sort=False)

In [9]:
stocks_df

Unnamed: 0,Down Votes,Subreddit,Text,Thread_id,Title,Up Vote Ratio,Up Votes
0,0.0,wallstreetbets,Your daily trading discussion thread. Please k...,t3_lvkeek,"What Are Your Moves Tomorrow, March 02, 2021",0.97,713.0
1,0.0,wallstreetbets,"Long story short, unaffiliated people are impe...",t3_lvb3o9,I am once again asking for your Twitter support,0.97,12581.0
2,0.0,wallstreetbets,,t3_lvkqmu,Which one of you beautiful dipshits bought $12...,0.98,6601.0
3,0.0,wallstreetbets,,t3_lvh4ir,"A sign from God himself, HOLD MOTHERFUCKERS",0.96,12659.0
4,0.0,wallstreetbets,,t3_lvjii7,When GME starts rising!,0.98,5772.0
...,...,...,...,...,...,...,...
97,0.0,wallstreetbets,,t3_lvlawi,Still holding RKT,0.83,81.0
98,0.0,wallstreetbets,,t3_lvifj0,My parents have been calling me retarded my wh...,0.90,115.0
99,0.0,wallstreetbets,,t3_lv33d0,Loaded Up and Ready For Today ✊🏼,0.96,1178.0
100,0.0,wallstreetbets,,t3_lvnoww,RKT 🚬🧑‍🦯 YOLOd $52k so the_big_short_2020 woul...,0.87,58.0


## Getting Reddit Thread Comments

In [10]:
# Looking at subreddit comments
subreddit = "wallstreetbets"
article = stocks_df["Thread_id"][0][3:]  # Returning only the id, not the kind (e.g. t3)
trending_posts_comments = requests.get(f"https://oauth.reddit.com/r/{subreddit}/comments/{article}", 
                                       headers=headers, 
                                       params={"limit":100, "sort": "top"})

In [11]:
trending_posts_comments.json()

[{'kind': 'Listing',
  'data': {'modhash': None,
   'dist': 1,
   'children': [{'kind': 't3',
     'data': {'approved_at_utc': None,
      'subreddit': 'wallstreetbets',
      'selftext': 'Your daily trading discussion thread. Please keep the shitposting to a minimum. \n\n^Navigate ^WSB|^We ^recommend ^best ^daily ^DD\n:--|:--                                 \n**DD** | [All](https://reddit.com/r/wallstreetbets/search?sort=new&amp;restrict_sr=on&amp;q=flair%3ADD) / [**Best Daily**](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADD&amp;restrict_sr=on&amp;t=day) / [Best Weekly](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADD&amp;restrict_sr=on&amp;t=week)\n**Discussion** | [All](https://reddit.com/r/wallstreetbets/search?sort=new&amp;restrict_sr=on&amp;q=flair%3ADiscussion) / [**Best Daily**](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADiscussion&amp;restrict_sr=on&amp;t=day) / [Best Weekly](https://www.reddit.com/r

In [12]:
trending_posts_comments.json()[1]["data"]["children"][0]["data"]["body"]

'I just saw a man drink his own piss if that isn’t bullish AF for tomorrow I don’t know what is'

In [13]:
for comments in trending_posts_comments.json()[1]["data"]["children"]:
    print(comments["data"].get("body"))  # Retrieving the "body" key did not work, accessing the "body" key by using the .get method

I just saw a man drink his own piss if that isn’t bullish AF for tomorrow I don’t know what is
It’s the year 2031, I get on my Gamestop Airlines flight departing from New Gamestop and landing on Los Gamestop, as I drive my Tesla home. As soon as I get home I hop into my pool and once I get out my son Game Stop the third asks me about my “Deep Fucking Value” tattoo on my lower back. I look him deep in the eyes and tell him “Where he goes, I go, if he holds the line, I’m standing right next to him”.
HOLD THE GME LINE APES!!!
All my stonks are green.. I don't get it? aren't they supposed to be red?
GME: 105

Me: Nope

GME: 110

Me: Not happening

GME: 115

Me: I'm not doing this again!

GME: 120

Me: I'M DONE

GME: 130!!!

Me: OKAY FINE

GME: ...115

FUCK
Hedgefund managers, if you aren't willing to drink your own piss, do you think you really stand a chance?
🚨IF U LOST MONEY TODAY U R RETARDED🚨
If you didn’t make money today...

1) you truly are retarded
2) come back tomorrow, casino ope

In [14]:
# Looking at subreddit comments
comments_df = pd.DataFrame()  # Empty comment dataframe to store thread comments
subreddit = "wallstreetbets"
for article_id in stocks_df["Thread_id"]:
    trending_posts_comments = requests.get(f"https://oauth.reddit.com/r/{subreddit}/comments/{article_id[3:]}", 
                                       headers=headers, 
                                       params={"limit":100, "depth":5})
    for comments in trending_posts_comments.json()[1]["data"]["children"]:
        comments_df = comments_df.append({
                                        "Thread_id": comments["data"].get("parent_id"),
                                        "Comments": comments["data"].get("body")
                                        }, ignore_index=True)


In [15]:
comments_df

Unnamed: 0,Comments,Thread_id
0,Damn futures are boring tonight. Someone go tu...,t3_lvkeek
1,DPLS???,t3_lvkeek
2,Never thought I’d say this but: RKT🚀🚀🚀,t3_lvkeek
3,give me karma,t3_lvkeek
4,Apes pretty insecure about their bags tonight 🤣,t3_lvkeek
...,...,...
2909,"This was very simple yet, very entertaining. G...",t3_lur27s
2910,Great work ape!,t3_lur27s
2911,I have obtained a wrinkle,t3_lur27s
2912,Fantastic work! Love the story line,t3_lur27s


### Problem:

For some weird reason the key value is giving an error, saying that it is not there when infact it is.

### Solution:

Using the .get method, which is introduced when using dictionaries, was able to fix the problem.

In [16]:
# Merging comments_df with stocks_df using Thread_id
stocks_df = pd.merge(left=stocks_df, right=comments_df, how="inner", on="Thread_id")

In [17]:
stocks_df

Unnamed: 0,Down Votes,Subreddit,Text,Thread_id,Title,Up Vote Ratio,Up Votes,Comments
0,0.0,wallstreetbets,Your daily trading discussion thread. Please k...,t3_lvkeek,"What Are Your Moves Tomorrow, March 02, 2021",0.97,713.0,Damn futures are boring tonight. Someone go tu...
1,0.0,wallstreetbets,Your daily trading discussion thread. Please k...,t3_lvkeek,"What Are Your Moves Tomorrow, March 02, 2021",0.97,713.0,DPLS???
2,0.0,wallstreetbets,Your daily trading discussion thread. Please k...,t3_lvkeek,"What Are Your Moves Tomorrow, March 02, 2021",0.97,713.0,Never thought I’d say this but: RKT🚀🚀🚀
3,0.0,wallstreetbets,Your daily trading discussion thread. Please k...,t3_lvkeek,"What Are Your Moves Tomorrow, March 02, 2021",0.97,713.0,give me karma
4,0.0,wallstreetbets,Your daily trading discussion thread. Please k...,t3_lvkeek,"What Are Your Moves Tomorrow, March 02, 2021",0.97,713.0,Apes pretty insecure about their bags tonight 🤣
...,...,...,...,...,...,...,...,...
2909,0.0,wallstreetbets,,t3_lur27s,GME explained via Smash Ultimate,0.97,8954.0,"This was very simple yet, very entertaining. G..."
2910,0.0,wallstreetbets,,t3_lur27s,GME explained via Smash Ultimate,0.97,8954.0,Great work ape!
2911,0.0,wallstreetbets,,t3_lur27s,GME explained via Smash Ultimate,0.97,8954.0,I have obtained a wrinkle
2912,0.0,wallstreetbets,,t3_lur27s,GME explained via Smash Ultimate,0.97,8954.0,Fantastic work! Love the story line


# Sentiment Analysis

In [18]:
import nltk
from nltk.corpus import stopwords
import re
import string

In [19]:
def clean_text(df, column:str):
    """Make text lowercase, 
    remove square brackers,
    and remove punctuations"""
    df[column] = [names.lower() for names in df[column]]  # Convert all letter to lowercase
    df[column] = [names.encode("ascii", errors="ignore").decode() for names in df[column]] # Remove non ascii chars
    df[column] = [re.sub("\[.*?\]", "", names) for names in df[column]]  # Remove anything in brackets
    df[column] = [re.sub("[%s]" % re.escape(string.punctuation), "", names) for names in df[column]]  # Remove punctuations
    df[column] = [re.sub("\w*\d\w*", "", names) for names in df[column]]  # Remove words with numbers in them
    return df[column]

In [20]:
# Converting comments column to a string
stocks_df["Comments"] = [str(comments) for comments in stocks_df["Comments"]]

In [21]:
stocks_df["Comments"] = clean_text(stocks_df, "Comments")

In [22]:
# Tokenization - Breaking a corpus into smaller texts/words
stocks_df["Comments"] = stocks_df["Comments"].apply(lambda x: nltk.word_tokenize(x))

In [23]:
stocks_df["Comments"]

0       [damn, futures, are, boring, tonight, someone,...
1                                                  [dpls]
2               [never, thought, id, say, this, but, rkt]
3                                       [give, me, karma]
4       [apes, pretty, insecure, about, their, bags, t...
                              ...                        
2909    [this, was, very, simple, yet, very, entertain...
2910                                   [great, work, ape]
2911                      [i, have, obtained, a, wrinkle]
2912            [fantastic, work, love, the, story, line]
2913                                               [none]
Name: Comments, Length: 2914, dtype: object

In [25]:
# Removing stopwords
stocks_df["Comments"] = stocks_df["Comments"].apply(lambda x: [words for words in x if words not in nltk.corpus.stopwords.words("english")])

In [26]:
stocks_df["Comments"]

0       [damn, futures, boring, tonight, someone, go, ...
1                                                  [dpls]
2                          [never, thought, id, say, rkt]
3                                           [give, karma]
4                 [apes, pretty, insecure, bags, tonight]
                              ...                        
2909               [simple, yet, entertaining, good, job]
2910                                   [great, work, ape]
2911                                  [obtained, wrinkle]
2912                 [fantastic, work, love, story, line]
2913                                               [none]
Name: Comments, Length: 2914, dtype: object