# Use PRAW to Scrape Reddit Data

<p>Reddit has a powerful API, and PRAW is a Python wrapper for it. We can use PRAW to retrieve the submissions of a specific subreddit and scrape them. In practice, most posts are images, videos, or links and contain no text. For analysis, we should focus on the posts that include text. The title, author, and post time should also be scraped.</p>
<p>Here, we use the subreddit 'oddlydepressing' as an example. The process can be divided into the following two steps:</p>
<li>Select the submissions that include text, and get their title, text, and author.</li>
<li>For each author, get their own submissions. This corpus could be used to predict the age of the author.</li>
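
A minimal offline sketch of these two steps, written for Python 3. It assumes only that each submission object exposes `title`, `selftext`, and `author`, as PRAW submissions do. The function names `collect_text_posts` and `build_author_corpus` and the stub data are hypothetical, introduced for illustration. Unlike the PRAW cell below, this sketch skips empty bodies when building the corpus.

```python
from types import SimpleNamespace


def collect_text_posts(submissions):
    """Step 1: keep submissions with a non-empty body as [title, text, author]."""
    return [[s.title, s.selftext, s.author] for s in submissions if s.selftext]


def build_author_corpus(author_submissions):
    """Step 2: join the stripped, non-empty bodies of one author's submissions."""
    bodies = (s.selftext.strip() for s in author_submissions)
    return " ".join(body for body in bodies if body)


# Hypothetical stand-ins for PRAW submission objects (no network access needed).
posts = [
    SimpleNamespace(title="Hello?", selftext="Is anyone out there?", author="user_a"),
    SimpleNamespace(title="A picture", selftext="", author="user_b"),
    SimpleNamespace(title="Deleted author", selftext="Some text", author=None),
]
text_posts = collect_text_posts(posts)

# Hypothetical submission history of a single author.
history = [
    SimpleNamespace(selftext="  first post \n"),
    SimpleNamespace(selftext=""),
    SimpleNamespace(selftext="second post"),
]
corpus = build_author_corpus(history)

print(text_posts)
print(corpus)
```

Passing real PRAW submissions instead of the stubs should work the same way, since the functions rely only on those three attributes.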

In [1]:
import praw

# NOTE: this cell is written against the PRAW 3 API (get_subreddit,
# get_top_from_all, get_submitted). PRAW 4+ renamed these calls and
# requires OAuth credentials.
user_agent = "Mental Health 1.0 by /u/kakakuo"
r = praw.Reddit(user_agent=user_agent)
subreddit_name = 'oddlydepressing'

# get the top submissions of the subreddit (at most 1000)
submissions = r.get_subreddit(subreddit_name).get_top_from_all(limit=1000)

# step 1: keep only the submissions that have a text body
submit_list = []
for submission in submissions:
    if submission.selftext:
        submit_list.append([submission.title, submission.selftext, submission.author])

# step 2: replace each author object with the author name and append
# the concatenated text of the author's own submissions
for submission in submit_list:
    author = submission.pop()
    if author:
        submission.append(author._case_name)  # private attribute holding the username
        corpus = ""
        for user_submit in author.get_submitted():
            corpus += user_submit.selftext.strip()
            corpus += " "
        submission.append(corpus)
    else:
        # no author object (e.g. a deleted account): empty name and corpus
        submission.append("")
        submission.append("")

Here is the result. Only 6 submissions in this subreddit contain text.

In [2]:
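# print title, text, and author name of each text post
# (single-argument print(...) works on both Python 2 and Python 3)
for (N, submit) in enumerate(submit_list):
    print("post " + str(N + 1) + ":")
    print("title: " + submit[0])
    print("text: " + submit[1])
    print("author: " + submit[2])
    print("\n")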

post 1:
title: Hello?
text: Is anyone out there?
author: teamrocketgruntjosh


post 2:
title: Should /r/oddlydepressing join the Reddit Blackout? Let your voices be heard!
text: To the 175 subscribers of this subreddit, should we do it? I hope you guys won't be oddly depressed about it.. But then that's what we're here for. 
author: callmeWia


post 3:
title: I don't have many people to call "friends" or "best friends"
text: They'd have a sleepover with a group of people and not invite me. Time to be a hikikomori now... Jk I'm not Japanese 
author: 


post 4:
title: Where is everyone? :(
text: Come say hi! 
author: callmeWia


post 5:
title: Random sadness..
text: I feel sadness right now and I don't know why. I started crying for no reason. What's wrong with me? 
author: netabell


post 6:
title: Helpless
text: I'm 22 years old 5 kids I'm going crazy I feel like im all for me nobody cares about me but me I don't feel love I'm hopeless I so shame of myself I don't think god don't even 
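
The introduction lists post time among the fields to scrape, but the cells above do not collect it. A minimal sketch for Python 3, assuming each submission carries a `created_utc` Unix timestamp (PRAW submissions expose one). The helper name `post_time_utc` and the stub value are hypothetical.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def post_time_utc(submission):
    """Convert a submission's Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)


# Hypothetical stand-in for a PRAW submission; the timestamp is made up.
stub = SimpleNamespace(title="Hello?", created_utc=1436000000.0)
posted = post_time_utc(stub)
print(posted.isoformat())
```

The resulting datetime could be stored as a fourth field next to the title, text, and author of each post.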

Here is the corpus built from each post author's own submissions:

In [3]:
# print the author corpus (last element) of each text post
for (N, submit) in enumerate(submit_list):
    print("post " + str(N + 1) + ":")
    print(submit[-1])
    print("\n")

post 1:
 I know there's one for Spanish speakers, I'm just wondering if there's been any sort of discussion on developing one for English speakers.  Just looking for some new blogs to follow and curious to see what everyone here likes.   http://imgur.com/wjGMwsR

I thought it was an interesting little bug, does anyone else see this?  [removed]   #ZenithMC

*Can you and your faction members make it to the top?*

ZenithMC is the newest Factions server on the block! We were founded with a simple goal; to provide a fun & friendly Factions experience! Whether you enjoy building, PvPing, buying & selling, or more, ZenithMC is the perfect choice!

----------------------------------------------------------------------------------------------------------------------------------------------------
#Server Address:  zenithmc.net

#Subreddit: /r/zenithmc

#Website: [ZenithMC](http://www.zenithmc.net)

--------------------------------------------------------------------------------------------------