# First exploration of the Reddit API

There is already a Python wrapper for the Reddit API called [PRAW](https://praw.readthedocs.io/en/latest/). The requirements to use it are outlined in its [quick start guide](https://praw.readthedocs.io/en/latest/getting_started/quick_start.html).

For our use case, we only need read-only access, which requires a client ID, a client secret and a user agent. They are all hidden from the repo, but anyone interested can set up their own in a file like *config-personal.yml* and use it to reproduce this notebook.
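The contents of *config-personal.yml* are not part of the repo, but the keys read in the next cell (`reddit.client_id`, `reddit.secret`, `reddit.user_name`) suggest a layout like the one below. This is a minimal sketch with placeholder values; the helper `check_reddit_config` is hypothetical and only illustrates failing early when a key is missing.

```python
# Assumed layout of config-personal.yml (placeholder values, not real credentials):
#
# reddit:
#   client_id: "YOUR_CLIENT_ID"
#   secret: "YOUR_CLIENT_SECRET"
#   user_name: "YOUR_REDDIT_USERNAME"

REQUIRED_KEYS = ("client_id", "secret", "user_name")


def check_reddit_config(config):
    """Return the 'reddit' section, raising KeyError if a required key is missing or empty."""
    section = (config or {}).get("reddit") or {}
    missing = [key for key in REQUIRED_KEYS if not section.get(key)]
    if missing:
        raise KeyError(f"config-personal.yml is missing reddit keys: {missing}")
    return section


# Stand-in for the dict that yaml.load would return for the file sketched above.
example_config = {
    "reddit": {
        "client_id": "YOUR_CLIENT_ID",
        "secret": "YOUR_CLIENT_SECRET",
        "user_name": "YOUR_REDDIT_USERNAME",
    }
}

reddit_section = check_reddit_config(example_config)
user_agent = f"testscript by u/{reddit_section['user_name']}"
```

Running a check like this before constructing `praw.Reddit` turns a missing key into a readable error instead of a failure deep inside the client setup.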

In [5]:
import pandas as pd
import praw
from pmaw import PushshiftAPI
import yaml
import pendulum
from yaml.loader import SafeLoader
from praw.models import MoreComments

with open('../config-personal.yml') as f:
    config = yaml.load(f, Loader=SafeLoader)

reddit = praw.Reddit(
    client_id=config['reddit']['client_id'],
    client_secret=config['reddit']['secret'],
    user_agent=f"testscript by u/{config['reddit']['user_name']}",
)

subreddit = reddit.subreddit("wallstreetbets")

In [31]:
api = PushshiftAPI(praw=reddit)
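# since/until are Unix timestamps in seconds; the window is assumed to cover one full day
since = int(pendulum.from_format('2021-01-30', 'YYYY-MM-DD').timestamp())
until = int(pendulum.from_format('2021-01-31', 'YYYY-MM-DD').timestamp())
gen = api.search_comments(since=since, until=until,
                          subreddit='wallstreetbets', size=100)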

Not all PushShift shards are active. Query results may be incomplete.
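The `since` and `until` arguments in the cell above are Unix timestamps in seconds. As a sketch of how a one-day window could be built with only the standard library (the helper name `day_window` is hypothetical, and UTC is assumed as the time zone):

```python
from datetime import datetime, timedelta, timezone


def day_window(day):
    """Return (start, end) epoch seconds for one calendar day, interpreted as UTC."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


since, until = day_window("2021-01-30")
print(since, until)
```

Note that a window whose two bounds are the same instant has zero width, so `until` needs to be strictly later than `since` for a query to match anything.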


In [6]:
# Print the title, score, ID and URL of each of the 10 hottest submissions
for submission in subreddit.hot(limit=10):
    print(submission.title)
    print(submission.score)
    print(submission.id)
    print(submission.url)

What Are Your Moves Tomorrow, May 15, 2023
17
13hlqjx
https://www.reddit.com/r/wallstreetbets/comments/13hlqjx/what_are_your_moves_tomorrow_may_15_2023/
Loud and Clear 😬
15410
13hfoas
https://i.redd.it/09sklqvv0vza1.jpg
I will integrate AI into my personal life
1089
13hdqkb
https://i.redd.it/5y1wbngnmuza1.jpg
if a company says ai it goes 🚀🚀🚀🚀🚀🚀🚀
693
13hf9hv
https://i.redd.it/2ngrpr9bgtza1.png
Debt Ceiling with Rising Rates, No Problem ! 🎪
422
13hg8n8
https://i.redd.it/dd7uzm8dntza1.png
Risk management king. Catch me behind the Wendys.
114
13hk8g8
https://i.redd.it/e3nt5di7yvza1.jpg
Gang gang.
93
13hkxqn
https://i.redd.it/znlijptg3wza1.jpg
Genuine question, what do all of you do for work that you’re fine with having $5k weekly loss porn
121
13hh9fp
https://www.reddit.com/r/wallstreetbets/comments/13hh9fp/genuine_question_what_do_all_of_you_do_for_work/
spotted in the wild
4752
13gt82e
https://i.redd.it/c6czsedtjpza1.jpg
I’m in a toxic relationship
108
13hdwex
https://i.redd.it/zqzsgvstn

In [7]:
post_title = []
post_comments = []

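for submission in subreddit.hot(limit=10):
    print(submission.title)
    # Resolve every "load more comments" placeholder so the list below holds only comments
    submission.comments.replace_more(limit=None)
    # Store one (title, comment body) pair per comment in the thread
    for comment in submission.comments.list():
        post_title.append(submission.title)
        post_comments.append(comment.body)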

What Are Your Moves Tomorrow, May 15, 2023
Loud and Clear 😬
I will integrate AI into my personal life
if a company says ai it goes 🚀🚀🚀🚀🚀🚀🚀
Debt Ceiling with Rising Rates, No Problem ! 🎪
Risk management king. Catch me behind the Wendys.
Gang gang.
Genuine question, what do all of you do for work that you’re fine with having $5k weekly loss porn
spotted in the wild
I’m in a toxic relationship
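In the loop above, `replace_more(limit=None)` resolves the "load more comments" placeholders, and `comments.list()` then returns every comment in the thread as one flat list rather than a tree. The flattening step can be illustrated without network access. The sketch below uses a hypothetical `FakeComment` stand-in (not a PRAW class) and a breadth-first walk; the ordering is an assumption made for illustration, not a statement about PRAW internals.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeComment:
    """Hypothetical stand-in for a comment: a body plus nested replies."""
    body: str
    replies: list = field(default_factory=list)


def flatten(top_level):
    """Return all comments in a thread as a flat list, visiting level by level."""
    flat = []
    queue = deque(top_level)
    while queue:
        comment = queue.popleft()
        flat.append(comment)
        # Replies are queued behind everything already waiting (breadth-first).
        queue.extend(comment.replies)
    return flat


thread = [
    FakeComment("a", [FakeComment("a1", [FakeComment("a1x")]), FakeComment("a2")]),
    FakeComment("b"),
]
bodies = [comment.body for comment in flatten(thread)]
```

Because the result is flat, each comment body can be paired with its submission title, which is what the two parallel lists `post_title` and `post_comments` rely on.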


In [8]:
df = pd.DataFrame({'title': post_title, 'comments': post_comments})

Following the changes to Reddit's API terms of service on May 1st, 2023, [it is no longer possible to search posts or comments by date](https://old.reddit.com/r/RedditAPIAdvocacy/comments/13esznz/reddit_has_cut_off_historical_data_access_help_us/). To save time, it is therefore more practical to use an already sourced dataset for our research topic.