# SCRAPING AsianAmerican SUBREDDIT

Referenced Link: https://towardsdatascience.com/scraping-reddit-data-1c0af3040768

---
---

# SETUP STEPS:

## Import env for API Keys

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()

True
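`load_dotenv()` returning `True` only indicates that a `.env` file was found and loaded; it does not confirm that every key the notebook needs is present. The following is a minimal sketch of a fail-fast check, assuming the `.env` file defines `my_client_id`, `my_client_secret`, and `my_user_agent` (the names used in the next cell). The helper `require_env` is hypothetical and not part of `python-dotenv`.

```python
import os


def require_env(*names):
    """Return the values of the named environment variables.

    Raises RuntimeError listing every variable that is unset or empty.
    """
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(os.environ[name] for name in names)


# Hypothetical usage in the notebook, after load_dotenv():
# client_id, client_secret, user_agent = require_env(
#     "my_client_id", "my_client_secret", "my_user_agent")
```

Checking up front turns a missing key into an immediate, readable error instead of a failure later when the Reddit client is created with `None` values.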

## Create PRAW instance

In [2]:
import praw

reddit = praw.Reddit(client_id=os.getenv("my_client_id"), 
                     client_secret=os.getenv("my_client_secret"), 
                     user_agent=os.getenv("my_user_agent"))


---
---
---


# CREATE TWO DATA FRAMES

- ## Top Posts dataframe
- ## Top-Level Comments from Top Posts dataframe

### Create top posts dataframe

In [3]:
import pandas as pd

# create empty list to gather raw post data
recorded_posts = []

# lazily list the subreddit's all-time top posts
# (limit=None asks for as many posts as Reddit will return)
top_posts = reddit.subreddit('AsianAmerican').top(time_filter="all",
                                                 limit=None)

# loop through the listing and record each post's fields
for post in top_posts:
    
    recorded_posts.append([post.title, 
                  post.score, 
                  post.id, 
                  post.subreddit, 
                  post.url, 
                  post.num_comments, 
                  post.selftext, 
                  post.created])

# create dataframe for posts
top_posts_df = pd.DataFrame(recorded_posts,
                     columns=['title', 
                              'score', 
                              'id', 
                              'subreddit', 
                              'url', 
                              'num_comments', 
                              'body', 
                              'created'])

print(top_posts_df.shape[0], "posts scraped.")
top_posts_df.head()

999 posts scraped.


|   | title | score | id | subreddit | url | num_comments | body | created |
|---|---|---|---|---|---|---|---|---|
| 0 | NBA All-Star Damian Lillard wears "Stop Asian... | 1604 | me5ef9 | asianamerican | https://i.redd.it/rey9pga6lhp61.jpg | 54 |  | 1616815000.0 |
| 1 | “Fuck you! We will stop the Hate!” NBA star Ba... | 1375 | m35vd1 | asianamerican | https://i.redd.it/sd9p46xlzhm61.jpg | 58 |  | 1615512000.0 |
| 2 | Accurate | 1285 | gkut94 | asianamerican | https://i.redd.it/gbzzxwnir4z41.jpg | 44 |  | 1589636000.0 |
| 3 | Naomi Osaka: "If people loved Asian people as ... | 1279 | mf1q4w | asianamerican | https://twitter.com/naomiosaka/status/13758652... | 87 |  | 1616942000.0 |
| 4 | My friend's mother was one of the victims of A... | 1194 | m885fx | asianamerican | https://gofund.me/6653b648 | 51 |  | 1616124000.0 |
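The `created` column holds Unix timestamps (seconds since 1970-01-01), which are hard to read directly. The sketch below converts one of them to a calendar date, assuming the value can be interpreted as UTC. PRAW documents `created_utc` as the submission's UTC creation time; whether `created` carries any offset is not shown in this notebook, so treat the result as approximate. The helper name `epoch_to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Value copied from the first row of the table above (as displayed, so rounded).
first_post_created = epoch_to_utc(1616815000.0)
print(first_post_created.date().isoformat())  # 2021-03-27
```

For the whole dataframe, `pd.to_datetime(top_posts_df['created'], unit='s')` performs the same conversion column-wise.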


### Create comments dataframe

In [5]:
from praw.models import MoreComments

# collect one [post_id, body] row per top-level comment
comment_rows = []

# loop through the post ids and gather each post's top-level comments
for i, post_id in enumerate(top_posts_df.id):

    # report progress every 10 posts
    if i % 10 == 0:
        print("Scraping post with index number:", i)

    submission = reddit.submission(id=post_id)

    for comment in submission.comments:

        # skip "load more comments" placeholders, which have no body
        if isinstance(comment, MoreComments):
            continue

        comment_rows.append([post_id, comment.body])

# build the dataframe once instead of concatenating inside the loop
comments_df = pd.DataFrame(comment_rows, columns=['post_id', 'body'])

print(comments_df.shape[0], "top comments scraped.\n")

# info() prints its summary itself; wrapping it in print() also prints "None"
comments_df.info()

comments_df.head()

Scraping post with index number: 0
Scraping post with index number: 10
Scraping post with index number: 20
Scraping post with index number: 30
Scraping post with index number: 40
Scraping post with index number: 50
Scraping post with index number: 60
Scraping post with index number: 70
Scraping post with index number: 80
Scraping post with index number: 90
Scraping post with index number: 100
Scraping post with index number: 110
Scraping post with index number: 120
Scraping post with index number: 130
Scraping post with index number: 140
Scraping post with index number: 150
Scraping post with index number: 160
Scraping post with index number: 170
Scraping post with index number: 180
Scraping post with index number: 190
Scraping post with index number: 200
Scraping post with index number: 210
Scraping post with index number: 220
Scraping post with index number: 230
Scraping post with index number: 240
Scraping post with index number: 250
Scraping post with index number: 260
Scraping pos

|   | post_id | body |
|---|---|---|
| 0 | me5ef9 | ✊✊🏿✊✊🏿✊✊🏿 |
| 1 | me5ef9 | The high school he went to in Oakland is 50% A... |
| 2 | me5ef9 | ✊✊✊✊✊✊ |
| 3 | me5ef9 | [deleted] |
| 4 | me5ef9 | Love to see the support. Anyone know where to ... |
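The comment loop skips `MoreComments` placeholders by checking each item's type. PRAW's comment forest also offers `replace_more(limit=0)`, which removes those placeholders up front without fetching the comments they hide. The following is a sketch of that alternative, not the notebook's own method; the helper name `top_level_rows` is hypothetical, and it expects a PRAW submission such as `reddit.submission(id=post_id)`.

```python
def top_level_rows(submission, post_id):
    """Return [post_id, body] rows for the top-level comments of one submission."""
    # limit=0 drops every MoreComments placeholder without fetching what it hides
    submission.comments.replace_more(limit=0)
    # iterating the comment forest yields only top-level comments
    return [[post_id, comment.body] for comment in submission.comments]


# Hypothetical usage inside the notebook's loop:
# rows = top_level_rows(reddit.submission(id=post_id), post_id)
```

Both approaches record the same comments; this one simply moves the filtering into a single call, which keeps the inner loop free of type checks.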


# EXPORT to CSVs

In [6]:
top_posts_df.to_csv("AsianAmerican_posts.csv", sep=',')

comments_df.to_csv("AsianAmerican_comments.csv", sep=',')
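By default, `to_csv` also writes the dataframe index as a first column with an empty header. The sketch below shows what that means when the files are read back, using a small stand-in frame and an in-memory buffer instead of the real CSV files; the two sample rows are copied from the comments output earlier in the notebook.

```python
import io

import pandas as pd

# Toy frame standing in for comments_df.
toy = pd.DataFrame({"post_id": ["me5ef9", "me5ef9"],
                    "body": ["[deleted]", "✊✊✊✊✊✊"]})

buffer = io.StringIO()
toy.to_csv(buffer)  # the index is written as an unnamed first column

# Reading naively turns the saved index into a data column.
buffer.seek(0)
naive = pd.read_csv(buffer)

# Passing index_col=0 restores it as the index instead.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(list(naive.columns))     # ['Unnamed: 0', 'post_id', 'body']
print(list(restored.columns))  # ['post_id', 'body']
```

If the index carries no meaning for later analysis, passing `index=False` to `to_csv` avoids the extra column altogether.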