# First exploration of the Reddit API

There is already a Python wrapper for the Reddit API called [PRAW](https://praw.readthedocs.io/en/latest/). The requirements for using it are outlined in its [quick start guide](https://praw.readthedocs.io/en/latest/getting_started/quick_start.html).

For our use case, we only need read-only access, which requires a client ID, a client secret and a user agent. They are kept out of the repo, but anyone interested can put their own credentials in a file like *config-personal.yml* and reproduce this notebook.
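As a rough sketch of what that file needs to contain, the check below mirrors the keys this notebook reads (`reddit.client_id`, `reddit.secret`, `reddit.user_name`). The helper `missing_reddit_keys` and the placeholder values are illustrative assumptions, not part of PRAW or of the repo.

```python
# Expected layout of config-personal.yml (placeholder values, not real credentials):
#
# reddit:
#   client_id: YOUR_CLIENT_ID
#   secret: YOUR_CLIENT_SECRET
#   user_name: your_reddit_username

REQUIRED_KEYS = ("client_id", "secret", "user_name")


def missing_reddit_keys(config):
    """Return the required keys that are absent or empty under config['reddit'].

    Hypothetical helper: it only inspects the already-parsed dictionary,
    so it works on whatever yaml.load returned.
    """
    section = config.get("reddit") or {}
    return [key for key in REQUIRED_KEYS if not section.get(key)]


# A complete configuration, as it would look after parsing the YAML file.
example_config = {
    "reddit": {
        "client_id": "YOUR_CLIENT_ID",
        "secret": "YOUR_CLIENT_SECRET",
        "user_name": "your_reddit_username",
    }
}

print(missing_reddit_keys(example_config))                     # []
print(missing_reddit_keys({"reddit": {"client_id": "abc"}}))   # ['secret', 'user_name']
```

Running a check like this right after loading the file gives a clearer error than a `KeyError` raised from inside the `praw.Reddit(...)` call.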

In [1]:
import pandas as pd
import praw
from pmaw import PushshiftAPI
import yaml
import pendulum
from yaml.loader import SafeLoader
from praw.models import MoreComments

with open('../config-personal.yml') as f:
    config = yaml.load(f, Loader=SafeLoader)

reddit = praw.Reddit(
    client_id=config['reddit']['client_id'],
    client_secret=config['reddit']['secret'],
    user_agent=f"testscript by u/{config['reddit']['user_name']}",
)

subreddit = reddit.subreddit("wallstreetbets")

In [31]:
api = PushshiftAPI(praw=reddit)
# Search window: all of 2021-01-30, expressed as epoch seconds.
day_start = pendulum.from_format('2021-01-30', 'YYYY-MM-DD')
gen = api.search_comments(since=int(day_start.timestamp()),
                          until=int(day_start.add(days=1).timestamp()),
                          subreddit='wallstreetbets', size=100)

Not all PushShift shards are active. Query results may be incomplete.
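The `since` and `until` arguments in the cell above are Unix timestamps in seconds. A minimal sketch of building a one-day window with only the standard library is shown below; `day_window_utc` is a hypothetical helper, and treating the day as a UTC calendar day whose upper bound is the start of the next day is an assumption, not something documented here for Pushshift.

```python
from datetime import datetime, timedelta, timezone


def day_window_utc(day):
    """Return (since, until) epoch seconds spanning one UTC calendar day.

    `day` is a 'YYYY-MM-DD' string. `until` is the first second of the
    following day, so the window is exactly 24 hours long.
    """
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


since, until = day_window_utc("2021-01-30")
print(since, until)    # 1611964800 1612051200
print(until - since)   # 86400 seconds, i.e. one day
```

If both bounds are built from the same date string, the window has zero length, so a query for a single day needs the upper bound moved forward as done here.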


In [2]:
for submission in subreddit.hot(limit=10):
    print(submission.title)
    # Output: the submission's title
    print(submission.score)
    # Output: the submission's score
    print(submission.id)
    # Output: the submission's ID
    print(submission.url)

Daily Discussion Thread for May 22, 2023
101
13omies
https://www.reddit.com/r/wallstreetbets/comments/13omies/daily_discussion_thread_for_may_22_2023/
This is the last time I try to help you all, after this I'm just here LOL with you Apes
3679
13oavvz
https://www.reddit.com/r/wallstreetbets/comments/13oavvz/this_is_the_last_time_i_try_to_help_you_all_after/
White House says if the US defaults on its debt, the stock market could fall 45%
7350
13op7nn
https://www.cnbc.com/2023/05/21/debt-ceiling-yellen-says-hard-choices-will-need-to-be-made-if-debt-ceiling-is-not-raised.html
13 years ago, this man paid 10,000 Bitcoin for two Papa John's pizzas. Today, 10,000 BTC is worth $268 million.
1030
13oomia
https://v.redd.it/9p6ieo82fd1b1
EU reportedly fines Meta $1.3B over user data transfers to US
779
13ol2ln
http://www.breakingthenews.net/news/details/60039458
Morgan Stanley sees signals of panic buying in S&P 500, ongoing rally 'a head fake' By Investing.com
120
13oqupw
https://www.investing.c

In [7]:
post_title = []
post_comments = []

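for submission in subreddit.hot(limit=10):
    print(submission.title)
    # Expand every "load more comments" placeholder so that .list()
    # returns the full comment tree (this can be slow on large threads).
    submission.comments.replace_more(limit=None)
    for comment in submission.comments.list():
        post_title.append(submission.title)
        post_comments.append(comment.body)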

What Are Your Moves Tomorrow, May 15, 2023
Loud and Clear 😬
I will integrate AI into my personal life
if a company says ai it goes 🚀🚀🚀🚀🚀🚀🚀
Debt Ceiling with Rising Rates, No Problem ! 🎪
Risk management king. Catch me behind the Wendys.
Gang gang.
Genuine question, what do all of you do for work that you’re fine with having $5k weekly loss porn
spotted in the wild
I’m in a toxic relationship


In [8]:
df = pd.DataFrame({'title': post_title, 'comments': post_comments})

Due to changes to the Reddit API terms of service on May 1st, 2023, [it is no longer possible to search posts or comments by date](https://old.reddit.com/r/RedditAPIAdvocacy/comments/13esznz/reddit_has_cut_off_historical_data_access_help_us/). Hence, to save time, it is more practical to use an already sourced dataset for our research topic.