# Webscraping Reddit


<img src="https://miro.medium.com/max/640/1*3PBf6sHuFXsPec47pJ0mXQ.png" alt="logo" align="left" width="300"/>


This script uses a Python wrapper for the Reddit API to scrape data from one or more subreddits. The data is first parsed and stored in a dictionary. It can then be presented in more readable formats, such as a [Pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), which is easily converted into HTML tables and CSV files.

In [None]:
#! python3
import pandas as pd
import datetime as dt
from time import sleep
from IPython.display import display, HTML

## Getting Started with PRAW

The [Python Reddit API Wrapper (PRAW) module](https://praw.readthedocs.io/en/latest/) must be installed before it can be imported.

In [None]:
!pip install praw


In [None]:
import praw

## Register application with Reddit

Create a "script" type app at https://www.reddit.com/prefs/apps/ and note its client ID and client secret. PRAW's username/password login requires a script app.

In [None]:
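from getpass import getpass

# getpass hides what is typed, so secrets never appear in the notebook output.
# The username is not secret, so a plain input() prompt is enough.
user = input('Reddit username: ')
password = getpass('Reddit password: ')
app_id = getpass('App client ID: ')
app_secret = getpass('App client secret: ')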
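Typing four values on every run gets tedious. A minimal sketch of one alternative, assuming you are willing to keep credentials in environment variables: read them from the environment and fall back to a `getpass` prompt only for missing values. The variable names `REDDIT_USERNAME`, `REDDIT_PASSWORD`, `REDDIT_CLIENT_ID`, and `REDDIT_CLIENT_SECRET` and the helper `load_credentials` are hypothetical, not something PRAW defines.

```python
import os
from getpass import getpass

# Hypothetical environment variable names; choose whatever suits your setup.
CREDENTIAL_VARS = {
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
}


def load_credentials(environ=os.environ, prompt=getpass):
    """Return Reddit credentials, prompting only for values not in environ."""
    creds = {}
    for key, var in CREDENTIAL_VARS.items():
        value = environ.get(var)
        if not value:
            # Fall back to an interactive, non-echoing prompt.
            value = prompt(f"Enter {key}: ")
        creds[key] = value
    return creds
```

The dictionary keys deliberately match the `praw.Reddit` keyword arguments, so the result could be unpacked directly, e.g. `praw.Reddit(user_agent=..., **load_credentials())`.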


Create a `praw.Reddit` instance authenticated with these credentials and assign it to a variable.

In [None]:
reddit = praw.Reddit(
    # Reddit asks for a descriptive user agent; interpolate the actual username.
    user_agent=f"CS Resource Bot (by u/{user})",
    client_id=app_id,
    client_secret=app_secret,
    username=user,
    password=password,
)
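Reddit's API rules ask clients to send a unique, descriptive User-Agent, and, as we understand them, recommend the form `<platform>:<app ID>:<version string> (by /u/<reddit username>)`. A small sketch of a helper that builds that string; `make_user_agent` and the sample values (`"cs-resource-bot"`, `"v0.1"`) are hypothetical.

```python
def make_user_agent(platform: str, app_id: str, version: str, username: str) -> str:
    """Build a User-Agent in the format Reddit's API rules recommend:
    <platform>:<app ID>:<version string> (by /u/<reddit username>)
    """
    return f"{platform}:{app_id}:{version} (by /u/{username})"


agent = make_user_agent("python", "cs-resource-bot", "v0.1", "alice")
print(agent)  # python:cs-resource-bot:v0.1 (by /u/alice)
```

The result would be passed as `user_agent=` when constructing `praw.Reddit`.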

## Fetch top submissions and collect data

With the `praw.Reddit` instance we can call any of PRAW's API methods. For example, we can select the /r/learnprogramming subreddit and fetch its 50 top-scoring posts from the past week.

In [None]:
subreddit = reddit.subreddit("learnprogramming")

Set up a dictionary with one empty list per field, then fill it with data scraped from the Reddit API.

In [54]:
top_week_dict = { "id": [],
                 "title": [],
                 "score": [],
                 "subreddit": [],
                 "url": [],
                 "url_csv": [],
                 "num_comms": [],
                 "created": [],
                 "body": []
                 }
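The same empty structure can be built from a list of field names. A sketch illustrating one pitfall worth knowing: `dict.fromkeys(fields, [])` looks equivalent, but it reuses a single list object for every key. `FIELDS` and `empty_columns` are illustrative names, not part of the original notebook.

```python
FIELDS = ["id", "title", "score", "subreddit", "url", "url_csv",
          "num_comms", "created", "body"]


def empty_columns(fields):
    """Return {field: []} with a separate, independent list per field."""
    return {field: [] for field in fields}


top_week_dict = empty_columns(FIELDS)

# For comparison: dict.fromkeys shares ONE list among all keys,
# so an append under one key shows up under every key.
shared = dict.fromkeys(FIELDS, [])
shared["id"].append("abc")
print(shared["title"])  # ['abc']
```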

In [None]:
print("id | score | created (UTC) | subreddit | title | url")
for submission in subreddit.top(time_filter="week", limit=50):
    # PRAW already handles Reddit's rate limits; this pause is extra caution.
    sleep(1)
    # created_utc is seconds since the epoch; attach an explicit UTC timezone.
    created = dt.datetime.fromtimestamp(submission.created_utc, tz=dt.timezone.utc)
    # Drop the scheme ("https://" or "http://") rather than assuming 8 characters.
    short_url = submission.url.split("://", 1)[-1]
    top_week_dict["id"].append(submission.id)
    top_week_dict["created"].append(created)
    # Store the subreddit's name, not the praw Subreddit object.
    top_week_dict["subreddit"].append(submission.subreddit.display_name)
    top_week_dict["title"].append(submission.title)
    top_week_dict["body"].append(submission.selftext)
    top_week_dict["score"].append(submission.score)
    top_week_dict["num_comms"].append(submission.num_comments)
    # A clickable link for the HTML table, plain text for the CSV.
    top_week_dict["url"].append(f'<a href="{submission.url}" target="_blank">{short_url}</a>')
    top_week_dict["url_csv"].append(short_url)
    print(f"{submission.id} | {submission.score} | {created:%Y-%m-%d %H:%M:%S} | "
          f"{submission.subreddit.display_name} | {submission.title} | {short_url}")
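The loop body can also be factored into a function that accepts any iterable of submission-like objects. A sketch of that refactor, under the assumption that only plain attributes (`id`, `title`, `score`, `subreddit`, `url`, `num_comments`, `created_utc`, `selftext`) are read: this lets the logic be exercised offline with a stand-in object. `display_url`, `collect`, and the `SimpleNamespace` test object are hypothetical, not part of PRAW.

```python
import datetime as dt
import html
from types import SimpleNamespace

FIELDS = ["id", "title", "score", "subreddit", "url", "url_csv",
          "num_comms", "created", "body"]


def display_url(url: str) -> str:
    """Return the URL without its scheme, e.g. 'https://a.b/c' -> 'a.b/c'."""
    return url.split("://", 1)[-1]


def collect(submissions, columns):
    """Append one row per submission-like object to a dict of lists."""
    for s in submissions:
        short = display_url(s.url)
        columns["id"].append(s.id)
        columns["title"].append(s.title)
        columns["score"].append(s.score)
        # str() of a PRAW Subreddit is its display name.
        columns["subreddit"].append(str(s.subreddit))
        # Escape the URL so a stray quote or '<' cannot break the anchor tag.
        columns["url"].append(
            f'<a href="{html.escape(s.url)}" target="_blank">{html.escape(short)}</a>'
        )
        columns["url_csv"].append(short)
        columns["num_comms"].append(s.num_comments)
        columns["created"].append(
            dt.datetime.fromtimestamp(s.created_utc, tz=dt.timezone.utc)
        )
        columns["body"].append(s.selftext)
    return columns


# Offline usage with a hypothetical stand-in instead of a live submission.
fake = SimpleNamespace(
    id="abc123", title="Hello", score=10, subreddit="learnprogramming",
    url="https://www.reddit.com/r/learnprogramming/comments/abc123/hello/",
    num_comments=3, created_utc=0, selftext="Body text",
)
cols = collect([fake], {field: [] for field in FIELDS})
print(cols["url_csv"])  # ['www.reddit.com/r/learnprogramming/comments/abc123/hello/']
```

With a live connection, the same function would be called as `collect(subreddit.top(time_filter="week", limit=50), top_week_dict)`.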


## Create DataFrame

In [60]:
# df1 keeps the clickable HTML links for rendering; df2 keeps plain URLs for the CSV.
df1 = pd.DataFrame(top_week_dict, columns=['id', 'title', 'score', 'subreddit', 'url', 'num_comms', 'created', 'body'])
df2 = pd.DataFrame(top_week_dict, columns=['id', 'title', 'score', 'subreddit', 'url_csv', 'num_comms', 'created', 'body'])
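When a DataFrame is built from a dict of lists, `columns=` selects and orders a subset of the keys, which is how the two frames above differ only in their URL column. Every list must also have the same length. A small sketch with hypothetical two-row data shows both behaviours.

```python
import pandas as pd

# Hypothetical sample shaped like top_week_dict.
data = {
    "id": ["a1", "b2"],
    "title": ["First", "Second"],
    "url": ['<a href="https://x.test">x.test</a>', '<a href="https://y.test">y.test</a>'],
    "url_csv": ["x.test", "y.test"],
}

# columns= picks (and orders) a subset of the dict's keys; the rest are ignored.
for_html = pd.DataFrame(data, columns=["id", "title", "url"])
for_csv = pd.DataFrame(data, columns=["id", "title", "url_csv"])
print(list(for_csv.columns))  # ['id', 'title', 'url_csv']

# Lists of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"id": ["a1", "b2"], "title": ["only one"]})
except ValueError as exc:
    print("ValueError:", exc)
```

An alternative design is to build a single frame from the whole dict and derive the two views with `df.drop(columns="url_csv")` and `df.drop(columns="url")`.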

In [61]:
display(df2)

| | id | title | score | subreddit | url_csv | num_comms | created | body |
|---|---|---|---|---|---|---|---|---|
| 0 | ktfpfx | Use books instead of brief tutorials to learn ... | 1834 | learnprogramming | www.reddit.com/r/learnprogramming/comments/ktf... | 307 | 2021-01-09 08:22:57 | Fundamental and broad knowledge (which is impo... |
| 1 | kwhgia | I help inmates get resources behind bars, and ... | 1524 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kwh... | 159 | 2021-01-13 22:27:55 | The book would have to show examples of the co... |
| 2 | kvrk05 | How to answer "Tell me about yourself" in an i... | 1312 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kvr... | 185 | 2021-01-12 21:22:26 | So I recently got a call from a company for wh... |
| 3 | kv8u8u | I finally made a completed app in c++ | 1136 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kv8... | 105 | 2021-01-12 02:37:32 | First off I am only here to show off my projec... |
| 4 | kue1ti | Is project based learning more effective than ... | 818 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kue... | 115 | 2021-01-10 21:00:08 | I take notes as a simple reference to look bac... |
| 5 | kt1rml | Where do you write in html? | 634 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kt1... | 354 | 2021-01-08 20:44:59 | So i cant get a straight answer, i need the sp... |
| 6 | kvnfgi | I did it guys! | 614 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kvn... | 39 | 2021-01-12 16:03:13 | In this sub are so many people, who want to ch... |
| 7 | kuw7mu | What type of person do you recommend be a prog... | 498 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kuw... | 254 | 2021-01-11 13:32:52 | I start Harvard’s CS50 online course tomorrow,... |
| 8 | ku2e59 | What is the likely-hood of a beginner getting ... | 483 | learnprogramming | www.reddit.com/r/learnprogramming/comments/ku2... | 91 | 2021-01-10 07:42:46 | Hi! I'm currently a freshman majoring in CS. ... |
| 9 | kwpt6f | MIT Introduction to Computer Science and Progr... | 591 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kwp... | 35 | 2021-01-14 05:01:24 | MIT's popular Python course is open for enroll... |


## Render DataFrame as HTML

In [62]:
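# escape=False lets the <a> tags in the url column render as real links.
# Note that it also passes any markup in titles and bodies through unescaped.
html = df1.to_html(escape=False)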
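`escape=False` is what turns the stored anchor tags into clickable links, but it applies to every column, including user-written post text. A minimal sketch, using hypothetical one-row data, comparing the default escaping, `escape=False`, and one safer option: escape the untrusted text column yourself with the standard library's `html.escape`, then disable pandas' escaping.

```python
import html

import pandas as pd

df = pd.DataFrame({
    "url": ['<a href="https://x.test">x.test</a>'],
    # Hypothetical untrusted post text that happens to contain markup.
    "body": ["<script>alert(1)</script>"],
})

escaped = df.to_html()                # default: '<' is written as '&lt;'
unescaped = df.to_html(escape=False)  # real link, but also a live <script> tag

# Escape only the untrusted column, then let the link HTML through untouched.
safe = df.assign(body=df["body"].map(html.escape)).to_html(escape=False)
```

In the notebook, the same idea would mean escaping `title` and `body` before calling `to_html(escape=False)`, leaving only the `url` column as raw HTML.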

In [63]:
HTML(html)

| | id | title | score | subreddit | url | num_comms | created | body |
|---|---|---|---|---|---|---|---|---|
| 0 | ktfpfx | Use books instead of brief tutorials to learn programming | 1834 | learnprogramming | www.reddit.com/r/learnprogramming/comments/ktfpfx/use_books_instead_of_brief_tutorials_to_learn/ | 307 | 2021-01-09 08:22:57 | Fundamental and broad knowledge (which is important in programming) can only be gained from books. Tutorials (text/video) are more like cookbooks that will taught something particular and are good if used as a supplementation to a books. Also book can be used later as a reference were you can quickly look for a topic that you are interested in. If you have never program before be sure to pick a book that is intended for people that never have programed before. \n\nAlso its is important to write your code in parallel with book. Just anything, practice is very important.\n\nGood luck :) |
| 1 | kwhgia | I help inmates get resources behind bars, and I'm looking for a good coding book you think would be the most instructive for people who don't have access to computers | 1524 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kwhgia/i_help_inmates_get_resources_behind_bars_and_im/ | 159 | 2021-01-13 22:27:55 | The book would have to show examples of the code (because they don't have computers), and be written in simple enough terms for regular people to understand.\n\nAny ideas?\n\nEDIT: thank you for offering your advice. Someone mentioned learn level [https://towardsdatascience.com/how-to-teach-programming-to-people-in-prison-without-computers-c455baca7f19](https://towardsdatascience.com/how-to-teach-programming-to-people-in-prison-without-computers-c455baca7f19)\n\nI spoke with the director / co-founder there after ya'll mentioned it and if you want to help in this cause generally, please visit their website they are producing programming material that's boiled down in simple terms and easy to learn without computers [https://learnlevel.org/](https://learnlevel.org/) |
| 2 | kvrk05 | How to answer "Tell me about yourself" in an interview call | 1312 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kvrk05/how_to_answer_tell_me_about_yourself_in_an/ | 185 | 2021-01-12 21:22:26 | So I recently got a call from a company for which I had applied for a React.js intern position and the person after exchanging greetings asked me *"Okay tell me about yourself"*. I didnt really know how to respond other than - just telling my name, where I live and telling I mainly work with React.js and thats it - I went blank . It was my first time actually getting a call from a company and I don't think it was impressive.\n\nDoes anyone have any ideas or good tips on how I should respond to such a question (assuming that the person asking has already taken a look at your Resume) ??\n\nThanks in advance.\n\nEDIT 1: Goddamn it people of Reddit, you guys are providing such great pointers that I had no idea even existed. I am definitely gonna write down some of these in my notebook for my next interview. Huge thanks to all - Keep'em coming !\n\nEDIT 2: OKAY !! so this question kind of blew up - didnt expect this much response. It made me realise that many people actually go through the same shit as I do. I really hope its gonna help people get over the nervousness of answering this question. |
| 3 | kv8u8u | I finally made a completed app in c++ | 1136 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kv8u8u/i_finally_made_a_completed_app_in_c/ | 105 | 2021-01-12 02:37:32 | First off I am only here to show off my project so if you care keep reading lol.\n\nSo I am 15 and having been programming in c++ for a while now and I have started many projects however I rarely see them through to the end and even then have never been confidant in the final product. I finally built something cool that is finished and here it is on [github](https://github.com/ultimategamer309/Mass-Mailer). It is a gui based app built off of mailguns api to send email in mass. I was hoping to provide a default server and key in it but apparently I was banned on mailgun. Hopefully in the near future I can get this running on plain stmp however I would have to own a server. Feel free to post my code in r/programminghorror or r/badcode as long as you link it in the comments so i can learn lol. |
| 4 | kue1ti | Is project based learning more effective than taking notes?? | 818 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kue1ti/is_project_based_learning_more_effective_than/ | 115 | 2021-01-10 21:00:08 | I take notes as a simple reference to look back on, but I've recently just been too lathargec and tired to write anything down.\nI do remember what I've been learning (kind of) and I put that into projects to solidify my knowledge I'm just worried that maybe taking notes is a good way to structure knowledge since I can actually physically see my diagrams and text instead of having to visualize it in my mind. I feel as though I could visualize in my mind easier if I draw the diagram down and reference it\n\n(I understand that just straight visualizing would strengthen my brain, but what if I got some information wrong?? I'd have to to go searching on Google for the right awnser).\n\nJust wanted other people thoughts on the matter. Lmk if completly ditching note taking has worked for you. |
| 5 | kt1rml | Where do you write in html? | 634 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kt1rml/where_do_you_write_in_html/ | 354 | 2021-01-08 20:44:59 | So i cant get a straight answer, i need the space in which i can write everything in order to make a website. Im sorry if this is a dumb question, im brand new to this (started codecademy yesterday)\n\nOkay, so, thank you all so much for the help, there is so much to learn ive got VSCode downloaded but am kinda scared to open it, it looks daunting as anything. |
| 6 | kvnfgi | I did it guys! | 614 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kvnfgi/i_did_it_guys/ | 39 | 2021-01-12 16:03:13 | In this sub are so many people, who want to change there job and so many storys of people who made it, so i thought i write my own story here. Sorry for my english btw.\n\nI always wanted to work as a programmer and today I made a big step for my hopefully new job.\n\nI am currently working as a network engineer. But in my company that means you mostly do first and second level suppport. That's soooo fucking boring. 95% of the time you don't have to use your brain. Most of the time people just delete the internet and you have to put the wifi cable back in the computer. At the end it made me sick, so I decided to try something.\n\nSo after struggling very long and not knowing what to do... today I asked my boss, if i could work in another team as a programmer. He was totally fine with it and wants to help me changing teams asap. I thnik he was more afraid, that i leave the company, so it was nice for him, that i just want to change teams. But i'm so happy right now.\n\nOverall i'm still a bit afraid. I had many programming courses in college, but i never did a real project. I understand basics of programming and databases and i'm able to write simple things. I really hope thats enough for the first part.\n\nWish me luck guys! |
| 7 | kuw7mu | What type of person do you recommend be a programmer? | 498 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kuw7mu/what_type_of_person_do_you_recommend_be_a/ | 254 | 2021-01-11 13:32:52 | I start Harvard’s CS50 online course tomorrow, and want to dedicate my full time to it so I can begin to transition to programming. It’s currently free but you can pay $200 to receive a certificate of completion (thought it would be good to add to a resume and after completion I would begin a coding boot camp). My problem is I don’t want to dedicate months of my time and then realize that I don’t enjoy it at all. What are some things about learning coding that most people don’t realize (if there is any) and pros/cons to being a programmer? |
| 8 | ku2e59 | What is the likely-hood of a beginner getting an internship? | 483 | learnprogramming | www.reddit.com/r/learnprogramming/comments/ku2e59/what_is_the_likelyhood_of_a_beginner_getting_an/ | 91 | 2021-01-10 07:42:46 | Hi! I'm currently a freshman majoring in CS. I have beginners knowledge in Python and learned HTML&CSS over the winter break. At this point, should I even be trying to apply for internships or focus on improving my knowledge instead? |
| 9 | kwpt6f | MIT Introduction to Computer Science and Programming Using Python starts on January 27th 2021 | 591 | learnprogramming | www.reddit.com/r/learnprogramming/comments/kwpt6f/mit_introduction_to_computer_science_and/ | 35 | 2021-01-14 05:01:24 | MIT's popular Python course is open for enrollment. (learn Python 3.5). Over million people have taken this course, designed to help people with no prior exposure to computer science or programming learn to think computationally and write programs to tackle useful problems. Join for free. \n\- Credit to a post a year ago who mentioned it when it occured last year, just copied and pasted his tl;dr ([https://www.reddit.com/r/learnprogramming/comments/bk9zrc/mits\_introduction\_to\_computer\_science\_and/](https://www.reddit.com/r/learnprogramming/comments/bk9zrc/mits_introduction_to_computer_science_and/)) \n\n\n[https://www.edx.org/course/introduction-to-computer-science-and-programming-7](https://www.edx.org/course/introduction-to-computer-science-and-programming-7) |


Write the HTML table to `index.html`, preceded by a small stylesheet that shades every other row.

In [64]:
# utf-8 avoids encoding errors on characters such as ’ in post text.
with open("index.html", "w", encoding="utf-8") as f:
    f.write("""
<style>
  tr:nth-child(even) {background-color: #f2f2f2;}
</style>
""")
    f.write(html)
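The file written above is an HTML fragment: a `<style>` block followed by a table, with no `<html>`, `<head>`, or declared character set. Browsers usually render it anyway, but a sketch of writing a complete page with an explicit `<meta charset="utf-8">` may be more robust. `write_html_page`, `STYLE`, and the default title are hypothetical names chosen for illustration.

```python
import html
import tempfile
from pathlib import Path

import pandas as pd

STYLE = "tr:nth-child(even) {background-color: #f2f2f2;}"


def write_html_page(df, path, title="Top posts"):
    """Write df as a complete UTF-8 HTML page with zebra-striped rows."""
    table = df.to_html(escape=False, index=False)
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        f"<style>{STYLE}</style>\n"
        "</head>\n<body>\n"
        f"{table}\n"
        "</body>\n</html>\n"
    )
    Path(path).write_text(page, encoding="utf-8")


# Usage with a hypothetical one-row frame, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "index.html"
    write_html_page(pd.DataFrame({"title": ["Harvard’s CS50"]}), out)
    text = out.read_text(encoding="utf-8")
```

In the notebook this would be called as `write_html_page(df1, "index.html")`.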

## Export to CSV

In [65]:
df2.to_csv('v1.csv', index=False)
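Reddit post text often contains quotes, commas, and line breaks, which `to_csv` handles by quoting fields. A sketch, using a hypothetical one-row frame, that writes a CSV to memory and reads it back to confirm those awkward values survive; `parse_dates` restores the `created` column as datetimes instead of strings.

```python
import datetime as dt
import io

import pandas as pd

# Hypothetical row covering quotes, commas, embedded newlines, and a timestamp.
df2 = pd.DataFrame({
    "id": ["abc123"],
    "title": ['How to answer "Tell me about yourself"'],
    "created": [dt.datetime(2021, 1, 12, 21, 22, 26)],
    "body": ["First line, with a comma\nSecond line"],
})

buffer = io.StringIO()
df2.to_csv(buffer, index=False)  # same call as above, written to memory instead
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["created"])
```

If the CSV is meant to be opened in Excel, passing `encoding="utf-8-sig"` when writing to a file typically helps Excel detect UTF-8 characters such as ’.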