# Scraping data from Reddit

Use the PRAW library in Python to scrape data from Reddit. For details, refer to this post: https://www.atoti.io/reddit-data-analytics-trilogy-1-data-scraping-with-praw/

API documentation: https://praw.readthedocs.io/en/stable/code_overview/models/subreddit.html

In [4]:
!pip install praw



In [1]:
import praw
import pandas as pd

In [2]:
# Create a Reddit connection.
# Register an app at https://www.reddit.com/prefs/apps/ to obtain the
# client ID and client secret, then fill them in below with a user agent string.
reddit = praw.Reddit(
    client_id='xxxx',
    client_secret='xxxx',
    user_agent='xxxx'
)
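
Hardcoding the client ID and secret in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative follows, assuming the credentials are stored in environment variables. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` and the helper `load_reddit_credentials` are hypothetical and not part of PRAW.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names mapped to the keyword
# arguments that praw.Reddit is called with in the cell above.
_ENV_TO_KWARG = {
    "REDDIT_CLIENT_ID": "client_id",
    "REDDIT_CLIENT_SECRET": "client_secret",
    "REDDIT_USER_AGENT": "user_agent",
}


def load_reddit_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the three connection settings from environment variables."""
    missing = [var for var in _ENV_TO_KWARG if not env.get(var)]
    if missing:
        # Fail early with a readable message instead of an opaque API error.
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for var, kwarg in _ENV_TO_KWARG.items()}
```

With the variables set, the connection could then be created with `reddit = praw.Reddit(**load_reddit_credentials())`.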

Version 7.4.0 of praw is outdated. Version 7.6.0 was released 2 days ago.


## Find the subreddit

Here is a tool for finding subreddits: https://anvaka.github.io/map-of-reddit/?x=145246&y=454606&z=3257.015516997539

### Create a function to get Reddit submissions (without comments)

In [3]:
def scrap_reddit(name):
    """
    Scrape the hot posts of the subreddit `name` and
    return them as a pandas DataFrame.
    """
    _posts = []
    # request up to 10000 hot posts from the given subreddit;
    # Reddit may return far fewer (the runs below yielded under 1000 rows each)
    hot_posts = reddit.subreddit(name).hot(limit=10000)
    # keep the attributes of interest for each post
    for post in hot_posts:
        _posts.append(
            [
                post.id,
                post.name,
                post.subreddit,
                post.title,
                post.score, # The number of upvotes for the submission.
                post.num_comments,
                post.selftext,
                post.created_utc,
                post.pinned,
                post.total_awards_received,
                post.upvote_ratio, # The percentage of upvotes from all votes on the submission
                post.url
            ]
        )
        
    # create a dataframe
    _posts = pd.DataFrame(
        _posts,
        columns=[
            "id",
            "name",
            "subreddit",
            "title",
            "score", # The number of upvotes for the submission.
            "num_comments",
            "selftext",
            "created_utc",
            "pinned",
            "total awards",
            "upvote_ratio", # The percentage of upvotes from all votes on the submission
            "url"
        ],
    )
    
    _posts["created"] = pd.to_datetime(_posts["created_utc"], unit="s")
    _posts["created date"] = pd.to_datetime(_posts["created_utc"], unit="s").dt.date
    _posts["created time"] = pd.to_datetime(_posts["created_utc"], unit="s").dt.time
    
    count_row = _posts.shape[0]  # Gives number of rows
    count_col = _posts.shape[1]  # Gives number of columns
    print(count_row)
    print(count_col)
    
    return _posts
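
The `created_utc` attribute is a Unix timestamp in seconds, which is why the function converts it with `unit="s"`. As a rough illustration of the same conversion using only the standard library, the sketch below splits one timestamp into a UTC date and time. The helper name `split_created_utc` and the sample value are illustrative assumptions.

```python
from datetime import datetime, timezone


def split_created_utc(created_utc: float):
    """Return (date, time) in UTC for a Unix timestamp given in seconds."""
    moment = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return moment.date(), moment.time()


# Illustrative timestamp, not taken from a live API call.
created_date, created_time = split_created_utc(1651306067)
print(created_date, created_time)  # 2022-04-30 08:07:47
```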

In [5]:
MensRights = scrap_reddit('MensRights')
MensRights.head()
# save the dataframe into a csv file
MensRights.to_csv('Reddit_MensRights.csv')

853
15


In [6]:
MensLib = scrap_reddit('MensLib')
MensLib.head()
# save the dataframe into a csv file
MensLib.to_csv('MensLib.csv')

826
15


In [7]:
Feminism = scrap_reddit('Feminism')
Feminism.head()
# save the dataframe into a csv file
Feminism.to_csv('Feminism.csv')

732
15


In [8]:
IncelTear = scrap_reddit('IncelTear')
IncelTear.head()
# save the dataframe into a csv file
IncelTear.to_csv('IncelTear.csv')

943
15


In [9]:
ainbow = scrap_reddit('ainbow')
ainbow.head()
# save the dataframe into a csv file
ainbow.to_csv('ainbow.csv')

512
15


In [11]:
TwoXChromosomes = scrap_reddit('TwoXChromosomes')
TwoXChromosomes.head()
# save the dataframe into a csv file
TwoXChromosomes.to_csv('TwoXChromosomes.csv')

927
15
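
The cells above repeat the same scrape-and-save steps for each subreddit, and copied cells make it easy to save a frame under the wrong file name. One possible way to express this as a loop is sketched below. `scrape_many` is a hypothetical helper, the `<name>.csv` naming scheme is an assumption, and the scraper is passed in as an argument so the sketch works with any function that returns an object with a `to_csv` method.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def scrape_many(
    names: Iterable[str],
    scraper: Callable[[str], Any],
    out_dir: str = ".",
) -> Dict[str, Any]:
    """Run `scraper` for every subreddit name and save each result as <name>.csv."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    frames = {}
    for name in names:
        frame = scraper(name)
        # The file name is derived from the same variable as the data,
        # so a frame cannot end up in another subreddit's file.
        frame.to_csv(target / f"{name}.csv")
        frames[name] = frame
    return frames
```

A call such as `frames = scrape_many(['MensLib', 'Feminism'], scrap_reddit)` would then replace the individual cells, with each DataFrame available as `frames['MensLib']` and so on.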


### Create a function to get Reddit submissions (with comments)

In [12]:
def scrap_reddit_submissioncomments(name):
    """
    Scrape the posts and comments of the subreddit `name`,
    save both to CSV files, and return the posts as a DataFrame.
    """
    # create a dictionary to hold posts
    post_dict = {
        "id":[],
        "name":[],
        "subreddit":[],
        "title":[],
        "score":[],
        "num_comments":[],
        "selftext":[],
        "created_utc":[],
        "pinned":[],
        "total_awards_received":[],
        "upvote_ratio":[],
        "url":[]
    }
    # create a dictionary to hold the comments of each post
    comment_dict = {
        "comment_id":[],
        "comment_parent_id":[],
        "comment_body":[],
        "comment_link_id":[]
    }
    
    # scrape posts from the subreddit and store them in the post_dict dictionary
    # raise or lower the limit to request more or fewer posts
    for post in reddit.subreddit(name).hot(limit=1000):
        post_dict["id"].append(post.id)
        post_dict["name"].append(post.name)
        post_dict["subreddit"].append(post.subreddit)
        post_dict["title"].append(post.title)
        post_dict["score"].append(post.score)
        post_dict["num_comments"].append(post.num_comments)
        post_dict["selftext"].append(post.selftext)
        post_dict["created_utc"].append(post.created_utc)
        post_dict["pinned"].append(post.pinned)
        post_dict["total_awards_received"].append(post.total_awards_received)
        post_dict["upvote_ratio"].append(post.upvote_ratio)
        post_dict["url"].append(post.url)
    
        # still inside the loop: expand the full comment tree of this post
        # and store every comment in the comment_dict dictionary
        post.comments.replace_more(limit=None)
        for comment in post.comments.list():
            comment_dict["comment_id"].append(comment.id)
            comment_dict["comment_parent_id"].append(comment.parent_id)
            comment_dict['comment_body'].append(comment.body)
            comment_dict['comment_link_id'].append(comment.link_id)
    # translate created_utc time into date and time    
    post_data = pd.DataFrame(post_dict)
    post_data["created date"] = pd.to_datetime(post_data["created_utc"], unit="s").dt.date
    post_data["created time"] = pd.to_datetime(post_data["created_utc"], unit="s").dt.time
    
    comment_data = pd.DataFrame(comment_dict)
    
    post_count_row = post_data.shape[0]  # Gives number of rows
    post_count_col = post_data.shape[1]  # Gives number of columns
    print("Number of posts scrapped from this subreddit is {}".format(post_count_row))
    comment_count_row = comment_data.shape[0]  # Gives number of rows
    comment_count_col = comment_data.shape[1]  # Gives number of columns
    print("Number of comments scrapped from this subreddit is {}".format(comment_count_row))
    
    comment_data.to_csv(name +"_comments_" + "subreddit.csv")
    post_data.to_csv(name+"_" + "subreddit.csv")
    
    # only the posts are returned; the comments are kept in the CSV file written above
    return post_data
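
Calling `replace_more(limit=None)` resolves every "load more comments" placeholder, which costs extra requests and can make large subreddits slow to scrape. The sketch below isolates the comment-collection step so the limit can be tuned. `collect_comments` is a hypothetical helper, and defaulting to `limit=0` (which drops the placeholders without fetching them) is a trade-off chosen here for speed, not what the function above does.

```python
from typing import Any, Dict, List, Optional


def collect_comments(submission: Any, more_limit: Optional[int] = 0) -> List[Dict[str, str]]:
    """Flatten the comment tree of one submission into a list of rows.

    more_limit=0 discards unresolved MoreComments placeholders;
    more_limit=None resolves all of them, as in the function above.
    """
    submission.comments.replace_more(limit=more_limit)
    rows = []
    for comment in submission.comments.list():
        rows.append(
            {
                "comment_id": comment.id,
                "comment_parent_id": comment.parent_id,
                "comment_body": comment.body,
                "comment_link_id": comment.link_id,
            }
        )
    return rows
```

The returned rows can be passed straight to `pd.DataFrame(rows)`, which yields the same four columns as `comment_dict`.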

In [13]:
MensRights = scrap_reddit_submissioncomments('MensRights')
MensRights.head()

Number of posts scrapped from this subreddit is 852
Number of comments scrapped from this subreddit is 40247


Unnamed: 0,id,name,subreddit,title,score,num_comments,selftext,created_utc,pinned,total_awards_received,upvote_ratio,url,created date,created time
0,uf6mqb,t3_uf6mqb,MensRights,"The feminist myth of ""Thousands of years of op...",749,145,[https://www.theguardian.com/science/2022/apr/...,1651306000.0,False,2,0.94,https://www.reddit.com/r/MensRights/comments/u...,2022-04-30,08:07:47
1,u95mh8,t3_u95mh8,MensRights,Applying best practice to men's human rights. ...,163,26,Here's my plan.\n\n1. Create a document with b...,1650600000.0,False,3,0.98,https://www.reddit.com/r/MensRights/comments/u...,2022-04-22,03:57:30
2,uocfiw,t3_uocfiw,MensRights,Wife filled husband’s phone with child porn in...,416,45,,1652393000.0,False,1,0.98,https://www.pennlive.com/crime/2022/05/wife-fi...,2022-05-12,22:00:24
3,uoel8t,t3_uoel8t,MensRights,men should qualify to retire earlier than women.,126,17,Men don't live as long as women so they shoul...,1652399000.0,False,0,0.88,https://www.reddit.com/r/MensRights/comments/u...,2022-05-12,23:47:04
4,uo2rux,t3_uo2rux,MensRights,Female And Non-Binary Uber Drivers Will Now Be...,578,193,,1652367000.0,False,0,0.96,https://www.ladbible.com/news/latest-female-an...,2022-05-12,14:47:26
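
The `name` column holds Reddit "fullnames" such as `t3_uf6mqb`: a type prefix, an underscore, and the base ID. The same scheme applies to the `comment_parent_id` and `comment_link_id` columns of the comment file, where a `t3_` parent means the comment replies directly to the post and a `t1_` parent means it replies to another comment. A small sketch for decoding them follows. The helper names are hypothetical and `t1_abc123` is a made-up comment fullname.

```python
from typing import Tuple

# Only the two prefixes relevant to this notebook are mapped.
KIND_BY_PREFIX = {"t1": "comment", "t3": "submission"}


def split_fullname(fullname: str) -> Tuple[str, str]:
    """Split a fullname like 't3_uf6mqb' into (kind, base_id)."""
    prefix, _, base_id = fullname.partition("_")
    return KIND_BY_PREFIX.get(prefix, "other"), base_id


def is_top_level(parent_id: str) -> bool:
    """True when a comment's parent is the submission itself."""
    return split_fullname(parent_id)[0] == "submission"


print(split_fullname("t3_uf6mqb"))  # ('submission', 'uf6mqb')
print(is_top_level("t1_abc123"))    # False
```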


In [14]:
TumblrInAction = scrap_reddit_submissioncomments('TumblrInAction')
TumblrInAction.head()

Number of posts scrapped from this subreddit is 904
Number of comments scrapped from this subreddit is 80980


Unnamed: 0,id,name,subreddit,title,score,num_comments,selftext,created_utc,pinned,total_awards_received,upvote_ratio,url,created date,created time
0,ujvkuf,t3_ujvkuf,TumblrInAction,"While we're still alive, it's time to give thi...",279,101,"Hey y'all. As you can tell, we're still alive....",1651866000.0,False,0,0.94,https://www.reddit.com/r/TumblrInAction/commen...,2022-05-06,19:45:24
1,uo0tgf,t3_uo0tgf,TumblrInAction,"You “misgendered” me, so I’m going to torture ...",2085,238,,1652361000.0,False,0,0.97,https://i.redd.it/8a31dskyo1z81.jpg,2022-05-12,13:14:55
2,uoea6e,t3_uoea6e,TumblrInAction,"In other news, how a woman's body during pregn...",287,34,,1652398000.0,False,0,0.99,https://www.reddit.com/gallery/uoea6e,2022-05-12,23:31:01
3,uo5mzn,t3_uo5mzn,TumblrInAction,DEAR GOD,902,276,,1652375000.0,False,0,0.94,https://i.redd.it/3mxhrq7js2z81.jpg,2022-05-12,16:56:46
4,uodiuh,t3_uodiuh,TumblrInAction,killing yourself is cultural appropriation,294,35,,1652396000.0,False,0,0.97,https://i.redd.it/hth4u4zyj4z81.png,2022-05-12,22:52:19


In [15]:
WhereAreAllTheGoodMen = scrap_reddit_submissioncomments('WhereAreAllTheGoodMen')
WhereAreAllTheGoodMen.head()

Number of posts scrapped from this subreddit is 942
Number of comments scrapped from this subreddit is 105060


Unnamed: 0,id,name,subreddit,title,score,num_comments,selftext,created_utc,pinned,total_awards_received,upvote_ratio,url,created date,created time
0,ug2ihx,t3_ug2ihx,WhereAreAllTheGoodMen,All posts must originate from forums.red to be...,11,40,Pursuant to our announcement [here](https://ww...,1651420000.0,False,0,0.59,https://www.reddit.com/r/WhereAreAllTheGoodMen...,2022-05-01,15:40:04
1,uj7563,t3_uj7563,WhereAreAllTheGoodMen,"WAATGM is moving, whether you like it or not.",0,129,"As a long-time moderator of WAATGM, I am disap...",1651784000.0,False,0,0.46,https://www.reddit.com/r/WhereAreAllTheGoodMen...,2022-05-05,20:59:36
2,ung6ww,t3_ung6ww,WhereAreAllTheGoodMen,Where oh where is her good person at? She NO L...,146,26,,1652293000.0,False,0,0.94,https://www.forums.red/p/whereareallthegoodmen...,2022-05-11,18:14:29
3,umfn37,t3_umfn37,WhereAreAllTheGoodMen,This fíorleadhb bhuile: “I fucking hate men!!!...,147,33,,1652179000.0,False,0,0.88,https://www.forums.red/p/whereareallthegoodmen...,2022-05-10,10:32:49
4,umf6sk,t3_umf6sk,WhereAreAllTheGoodMen,This post is for those with the urge to white ...,38,7,,1652177000.0,False,0,0.84,https://www.forums.red/p/whereareallthegoodmen...,2022-05-10,10:03:16


In [16]:
PussyPass = scrap_reddit_submissioncomments('PussyPass')
PussyPass.head()

Number of posts scrapped from this subreddit is 747
Number of comments scrapped from this subreddit is 14131


Unnamed: 0,id,name,subreddit,title,score,num_comments,selftext,created_utc,pinned,total_awards_received,upvote_ratio,url,created date,created time
0,uljvu7,t3_uljvu7,PussyPass,Movie recommendation: How to Lose a Guy in 10 ...,94,6,I don't know if this is allowed in this subred...,1652073000.0,False,0,0.82,https://www.reddit.com/r/PussyPass/comments/ul...,2022-05-09,05:14:38
1,ukub4t,t3_ukub4t,PussyPass,Murder charges dropped after woman murders hus...,42,26,Killing husbands just isn't a crime for women ...,1651985000.0,False,0,0.58,https://www.reddit.com/r/PussyPass/comments/uk...,2022-05-08,04:44:34
2,ui6si1,t3_ui6si1,PussyPass,'Perverse' woman who lured 15-year-old schoolb...,194,10,,1651671000.0,False,0,0.98,https://www.dailymail.co.uk/news/article-10781...,2022-05-04,13:38:17
3,uhq7zd,t3_uhq7zd,PussyPass,Mother who removed the feeding tube from her c...,337,18,,1651613000.0,False,0,0.96,https://i.redd.it/zqplm9imubx81.jpg,2022-05-03,21:16:13
4,ugumn0,t3_ugumn0,PussyPass,Sex worker gets 30 years for fatally drugging ...,146,7,,1651512000.0,False,0,0.95,https://www.washingtonpost.com/nation/2022/05/...,2022-05-02,17:17:03


In [17]:
TwoXChromosomes = scrap_reddit_submissioncomments('TwoXChromosomes')
TwoXChromosomes.head()

Number of posts scrapped from this subreddit is 919
Number of comments scrapped from this subreddit is 37573


Unnamed: 0,id,name,subreddit,title,score,num_comments,selftext,created_utc,pinned,total_awards_received,upvote_ratio,url,created date,created time
0,fejj7u,t3_fejj7u,TwoXChromosomes,[MINI FAQ] Do I have to be a woman to particip...,1737,0,#Do I have to be a woman to participate in thi...,1583526000.0,False,19,0.95,https://www.reddit.com/r/TwoXChromosomes/comme...,2020-03-06,20:21:40
1,uh61wn,t3_uh61wn,TwoXChromosomes,RED ALERT FOR WOMEN'S RIGHTS IN THE USA: ROE V...,5631,1078,It looks like the US Supreme Court is going to...,1651545000.0,False,40,0.96,https://www.reddit.com/r/TwoXChromosomes/comme...,2022-05-03,02:27:13
2,uo9k3u,t3_uo9k3u,TwoXChromosomes,If the right can be pro forced-birth then I am...,5596,256,What? It saves the life of another person! If ...,1652385000.0,False,3,0.93,https://www.reddit.com/r/TwoXChromosomes/comme...,2022-05-12,19:55:16
3,uogq20,t3_uogq20,TwoXChromosomes,Men do not understand the anatomical realities...,1115,196,I've been kind of sour on reddit for the past ...,1652406000.0,False,1,0.92,https://www.reddit.com/r/TwoXChromosomes/comme...,2022-05-13,01:42:47
4,uo6qoe,t3_uo6qoe,TwoXChromosomes,Cops are not on our side of the abortion debate,4794,328,A friend was escorting outside of a planned Pa...,1652378000.0,False,1,0.91,https://www.reddit.com/r/TwoXChromosomes/comme...,2022-05-12,17:47:00


In [18]:
MensLib = scrap_reddit_submissioncomments('MensLib')
MensLib.head()

Number of posts scrapped from this subreddit is 827
Number of comments scrapped from this subreddit is 80476


Unnamed: 0,id,name,subreddit,title,score,num_comments,selftext,created_utc,pinned,total_awards_received,upvote_ratio,url,created date,created time
0,u9o0jc,t3_u9o0jc,MensLib,White Privilege: what it is and what it isn't,384,92,In every conversation we have surrounding soci...,1650660000.0,False,5,0.91,https://www.reddit.com/r/MensLib/comments/u9o0...,2022-04-22,20:32:58
1,uol1z7,t3_uol1z7,MensLib,Weekly Free Talk Friday Thread!,5,8,Welcome to our weekly Free Talk Friday thread!...,1652422000.0,False,0,0.86,https://www.reddit.com/r/MensLib/comments/uol1...,2022-05-13,06:00:09
2,unz2ip,t3_unz2ip,MensLib,You are the good cause,358,28,,1652356000.0,False,1,0.97,https://nyteshadeblog.wordpress.com/2018/03/23...,2022-05-12,11:40:09
3,une854,t3_une854,MensLib,"""A major problem is that even those who expres...",1353,143,,1652288000.0,False,0,0.92,https://hbr.org/2022/04/stop-criticizing-women...,2022-05-11,16:47:26
4,umthgx,t3_umthgx,MensLib,"How Ben Got His Penis: ""Phalloplasty — the sur...",709,84,,1652219000.0,False,0,0.94,https://www.nytimes.com/2022/05/10/magazine/ph...,2022-05-10,21:38:02


In [19]:
Feminism = scrap_reddit_submissioncomments('Feminism')
Feminism.head()

Number of posts scrapped from this subreddit is 736
Number of comments scrapped from this subreddit is 6719


Unnamed: 0,id,name,subreddit,title,score,num_comments,selftext,created_utc,pinned,total_awards_received,upvote_ratio,url,created date,created time
0,phrcrn,t3_phrcrn,Feminism,This is a comprehensive list of resources for ...,1696,136,This is a list of resources I’m compiling for...,1630761000.0,False,20,0.99,https://www.reddit.com/r/Feminism/comments/phr...,2021-09-04,13:15:02
1,ov6ctg,t3_ov6ctg,Feminism,What's the best feminist book you've read?,963,687,,1627735000.0,False,10,0.98,https://www.reddit.com/r/Feminism/comments/ov6...,2021-07-31,12:34:43
2,uone53,t3_uone53,Feminism,Overturning Roe could have major repercussions...,20,1,,1652432000.0,False,0,1.0,https://amp.cnn.com/cnn/2022/05/11/politics/ro...,2022-05-13,08:53:54
3,unyqz0,t3_unyqz0,Feminism,Canada and Mexico prepare to accept Americans ...,895,46,,1652354000.0,False,0,0.98,https://www.theguardian.com/us-news/2022/may/0...,2022-05-12,11:19:51
4,uo18l2,t3_uo18l2,Feminism,"""In terms of sexual violence, the average Indi...",471,6,,1652362000.0,False,0,0.98,https://i.redd.it/ug4k1najs1z81.png,2022-05-12,13:35:00


In [20]:
IncelTear = scrap_reddit_submissioncomments('IncelTear')
IncelTear.head()

Number of posts scrapped from this subreddit is 944
Number of comments scrapped from this subreddit is 23765


Unnamed: 0,id,name,subreddit,title,score,num_comments,selftext,created_utc,pinned,total_awards_received,upvote_ratio,url,created date,created time
0,ulnqtu,t3_ulnqtu,IncelTear,"Weekly Advice Thread (May 09, 2022)",11,28,There's no strict limit over what types of ad...,1652090000.0,False,0,0.92,https://www.reddit.com/r/IncelTear/comments/ul...,2022-05-09,10:00:12
1,uo3o01,t3_uo3o01,IncelTear,Doesn't look right to me!,1127,143,,1652369000.0,False,0,0.99,https://i.redd.it/ga5dv6smc2z81.jpg,2022-05-12,15:27:36
2,uogf03,t3_uogf03,IncelTear,r/traditionalmuslims is an incel cesspool wher...,165,53,,1652405000.0,False,0,0.98,https://www.reddit.com/gallery/uogf03,2022-05-13,01:26:26
3,uomw53,t3_uomw53,IncelTear,Dude couldn’t accept the definition of misogyn...,15,0,,1652430000.0,False,0,0.9,https://i.redd.it/pdft5f45c7z81.jpg,2022-05-13,08:13:46
4,uo98s0,t3_uo98s0,IncelTear,Waaaaaaahhhh I'm being investigated by the pol...,130,12,,1652384000.0,False,0,0.95,https://i.redd.it/2q6osofgl3z81.png,2022-05-12,19:40:37


In [21]:
ainbow = scrap_reddit_submissioncomments('ainbow')
ainbow.head()

Number of posts scrapped from this subreddit is 512
Number of comments scrapped from this subreddit is 6778


Unnamed: 0,id,name,subreddit,title,score,num_comments,selftext,created_utc,pinned,total_awards_received,upvote_ratio,url,created date,created time
0,ujmzav,t3_ujmzav,ainbow,Regarding Abortion Rights,65,7,"Hello Folks,\n\nI feel it is important that we...",1651842000.0,False,0,0.95,https://www.reddit.com/r/ainbow/comments/ujmza...,2022-05-06,13:02:39
1,uoo85u,t3_uoo85u,ainbow,Not what Qatar promised as a host and what the...,105,8,,1652436000.0,False,0,0.99,https://i.redd.it/b0nw6swfu7z81.png,2022-05-13,09:56:36
2,uobjtz,t3_uobjtz,ainbow,"Got a haircut, feeling super invalidated rn",229,16,So I haven’t gone to a hair salon in a year or...,1652390000.0,False,0,0.97,https://www.reddit.com/r/ainbow/comments/uobjt...,2022-05-12,21:21:26
3,uoeob8,t3_uoeob8,ainbow,Gay media megathread,43,11,Recently I've come across a few people asking ...,1652400000.0,False,0,0.96,https://www.reddit.com/r/ainbow/comments/uoeob...,2022-05-12,23:51:40
4,uo2iut,t3_uo2iut,ainbow,worst hate crime I've received 😒,223,15,Ok so I've been called the f slurr my whole li...,1652366000.0,False,0,0.94,https://www.reddit.com/r/ainbow/comments/uo2iu...,2022-05-12,14:35:46
