## Content

1. Goal
2. Setup
3. Connect to API using PRAW
4. Retrieve Data from Subreddits
5. Store Data

## Goal

The following data is retrieved through the API:
- Name of subreddit
- Number of subscribers in subreddit
- Number of posts in subreddit
- Date of posts
- Number of comments in posts

Coins for which data is pulled:
- Audius
- Ecomi
- FTX

Coins that are interesting but have no subreddit yet:
- Convex Finance
- Dopex
- Rari Governance Token
- Spell Token
- Raydium
- GMX
- NFTX
- OCEAN
- Gro DAO Token
- Ribbon Finance

## Setup

In [1]:
# If you installed Python using Anaconda or Miniconda, then use conda:
# conda install -c conda-forge praw

# If you installed Python any other way, then use pip:
# !pip install praw

In [2]:
import praw          # Reddit API wrapper
import pprint        # Formats data structure outputs. https://docs.python.org/3/library/pprint.html
import pandas as pd  # For organising data in tables
from datetime import datetime  # Needed for date & datetime functions

In [3]:
# pd.options.display.max_rows = 4000
# pd.options.display.max_columns = 50

## Connect to API using PRAW
PRAW: Python Reddit API Wrapper

API documentation
https://praw.readthedocs.io/en/v7.4.0/code_overview/models/subreddit.html

Introduction guide
https://gilberttanner.com/blog/scraping-redditdata

In [4]:
my_client_id     = "qIZJdlnEWYTd4pUr6QtrhQ"
my_client_secret = "foKDZFlOxPkoQDjzzpRU4myPPs-kVQ"
my_user_agent    = "Schiggy"

reddit = praw.Reddit(client_id     = my_client_id,
                     client_secret = my_client_secret,
                     user_agent    = my_user_agent)
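
The connection above takes three values: `client_id`, `client_secret` and `user_agent`. As a minimal sketch, the same values could be read from environment variables instead of being typed into the notebook. The function name `load_reddit_credentials` and the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT` are assumptions made for this example, not part of PRAW or of the original notebook.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Collect the three PRAW arguments from environment variables.

    The variable names below are hypothetical; pick any names you like.
    """
    names = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "user_agent": "REDDIT_USER_AGENT",
    }
    # Fail early with a clear message if something is not set
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}


# Usage (the variables must be set before the notebook is started):
# import praw
# reddit = praw.Reddit(**load_reddit_credentials())
```

The returned dictionary uses the same keyword names as the `praw.Reddit(...)` call above, so it can be unpacked directly with `**`.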

## Retrieve Data from Subreddits

#### Get X newest posts

In [5]:
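# Note: Reddit listings usually return at most about 1000 posts,
# so a higher limit does not retrieve more data.
x_newest_posts = 100000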

subr_1 = "audius"
subr_2 = "ecomi"
subr_3 = "FTXOfficial"
subreddits = "+".join([subr_1, subr_2, subr_3])  # e.g. "audius+ecomi+FTXOfficial"
model = reddit.subreddit(subreddits)

posts = []

for post in model.new(limit=x_newest_posts):
    
    # Convert post creation date to a readable format
    unix_date     = int(post.created)
    readable_date = datetime.utcfromtimestamp(unix_date).strftime("%d-%m-%Y")
    
    posts.append([post.title,
                  post.num_comments,
                  post.subreddit,
                  post.subreddit_subscribers,
                  readable_date,
                  post.created])

        
df_today = pd.DataFrame(posts, columns = ['post_title',
                                          'num_comments',
                                          'subreddit',
                                          'subreddit_subscribers',
                                          'date',
                                          'date_unix'])

df_today.head(3)

|   | post_title | num_comments | subreddit | subreddit_subscribers | date | date_unix |
|---|---|---|---|---|---|---|
| 0 | Music NFT'S | 0 | ecomi | 22457 | 31-10-2021 | 1635690000.0 |
| 1 | Fresh music !! | 0 | audius | 8065 | 31-10-2021 | 1635690000.0 |
| 2 | The Toys that Made Us | 1 | ecomi | 22457 | 31-10-2021 | 1635685000.0 |
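
The table holds one row per post, while the goal lists the number of posts and comments per subreddit. A minimal sketch of how those numbers could be derived with a `groupby` is shown here. The function name `summarise_posts` and the output column names are assumptions for this example, and the sample rows are copied from the three rows of output shown for `df_today`. The sketch also assumes the `subreddit` column holds plain strings (for the real data, something like `df_today["subreddit"].astype(str)` may be needed first).

```python
import pandas as pd

# Sample rows shaped like df_today (values copied from the output shown)
sample = pd.DataFrame(
    [
        ["Music NFT'S", 0, "ecomi", 22457, "31-10-2021"],
        ["Fresh music !!", 0, "audius", 8065, "31-10-2021"],
        ["The Toys that Made Us", 1, "ecomi", 22457, "31-10-2021"],
    ],
    columns=["post_title", "num_comments", "subreddit",
             "subreddit_subscribers", "date"],
)


def summarise_posts(df):
    """Count posts and sum comments per subreddit and date."""
    return (
        df.groupby(["subreddit", "date"], as_index=False)
          .agg(num_posts=("post_title", "size"),               # posts per group
               total_comments=("num_comments", "sum"),         # comments per group
               subscribers=("subreddit_subscribers", "max"))   # subscriber count
    )


summary = summarise_posts(sample)
print(summary)
```

Grouping by both `subreddit` and `date` keeps one row per subreddit per day, which fits a dataset that is meant to grow over time.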


## Store Data

In [6]:
df_today.to_csv("./Datasets/historical_data_reddit.csv", index=False)
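
`to_csv` does not create missing folders, so the call above fails if `./Datasets` does not exist yet. A minimal sketch of a helper that creates the folder first is shown here. The name `save_posts` is an assumption for this example and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def save_posts(df, csv_path):
    """Write df to csv_path, creating the parent folder if it is missing."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    df.to_csv(path, index=False)
    return path


# Usage with the DataFrame from this notebook:
# save_posts(df_today, "./Datasets/historical_data_reddit.csv")
```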