## Project 3: Webscraping subreddit r/Dreams 


Intro to Notebook 1

This notebook (Notebook 1) walks step by step through scraping the subreddit r/Dreams. The goal is to collect 1000 posts whose text will be analyzed and then used with classification models in the notebooks that follow.

#### Import Libraries

In [3]:
# HTTP library for making web requests
import requests
# For data manipulation
import pandas as pd
# Reading and writing CSV files
import csv
# Python Reddit API Wrapper, used to scrape Reddit data
import praw

### Step 1: Create Reddit instance

Before any data can be scraped, the user must be authenticated. To do this, create a Reddit instance.

1. Create a Reddit app here: https://www.reddit.com/prefs/apps
2. After pressing "create app", the page shows the authentication information needed to create the `praw.Reddit` instance.

In [4]:
# Input client_id, client_secret, and user_agent, which are shown after the "create app" action
reddit = praw.Reddit(client_id='inQ3U2b00SevdQ',
                     client_secret='mWYaNvAOx-0ilXOfMV82nHZ7R3U',
                     user_agent='reddit scrape')
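
Hardcoding the client ID and secret in a notebook exposes them to anyone who can read it. As a minimal sketch of one alternative, the credentials could be read from environment variables instead. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` and both helper functions are assumptions made for illustration; they are not part of PRAW or of the original notebook.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Collect the three praw.Reddit arguments from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    names = {
        'client_id': 'REDDIT_CLIENT_ID',
        'client_secret': 'REDDIT_CLIENT_SECRET',
        'user_agent': 'REDDIT_USER_AGENT',
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {arg: env[var] for arg, var in names.items()}


def make_reddit():
    """Build the Reddit instance from the environment (needs praw installed)."""
    import praw  # imported here so the helper above works without praw
    return praw.Reddit(**load_reddit_credentials())
```

With the three variables set in the shell that launches the notebook, `reddit = make_reddit()` would stand in for the hardcoded call.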

### Step 2: Scrape the URL

In [6]:
# Start with an empty list
posts = []
# Select subreddit r/Dreams
dreams_subreddit = reddit.subreddit('Dreams')

# Grab up to 1000 'hot' posts from subreddit r/Dreams
for post in dreams_subreddit.hot(limit=1000):
    posts.append([post.title, post.score, post.id, post.subreddit, post.url,
                  post.num_comments, post.selftext, post.created])
# Name the scraped fields; these become the columns of the dataframe
posts = pd.DataFrame(posts, columns=['title', 'score', 'id', 'subreddit', 'url',
                                     'num_comments', 'body', 'created'])
# Print 'posts' to confirm the data was scraped successfully
print(posts)

                                                 title  score      id  \
0                  Just dreamt of the end of the world    135  dhnta6   
1    I was accused of blugeoning Beyonce to death i...      9  dhqqyx   
2                           Morgan Freeman food advice      8  dhr09c   
3                              Violating the nightmare      3  dhucwz   
4    Dreamt was holding my rapist and crying and ap...      2  dhuokx   
..                                                 ...    ...     ...   
992                                                    🏞      1  dbxj1t   
993  Saw this fella a while ago when I had a sleep ...    125  dbe0bc   
994                    Dreams about a Lost Friendship.      1  dbx8xo   
995  Tooth (just one) falling out, scared when woke up      1  dbx6g4   
996               Four ways, murder, and demons oh my!      3  dbs1q3   

    subreddit                                                url  \
0      Dreams  https://www.reddit.com/r/Dreams/comme

Observations: The scrape returned 997 rows, slightly fewer than the 1000 requested, which is enough for the analysis.
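
Since the later notebooks analyze post text, it may be worth checking how many of the scraped rows actually have a body; image posts such as `dbxj1t` have an empty body in this scrape. The sketch below uses stand-in objects in place of real PRAW submissions so that it runs offline. `submission_to_row` and `has_text` are hypothetical helper names, and the sample values (including the shortened placeholder URL and the `(image post)` title) are illustrative rather than real scraped data.

```python
from types import SimpleNamespace


def submission_to_row(post):
    """Pull the same eight fields the scraping loop collects, in the same order."""
    return [post.title, post.score, post.id, post.subreddit, post.url,
            post.num_comments, post.selftext, post.created]


def has_text(row):
    """True when the 'body' field (index 6) is not empty or whitespace."""
    return bool(row[6].strip())


# Stand-ins for PRAW submissions, so no network access is needed here
sample_posts = [
    SimpleNamespace(title='Morgan Freeman food advice', score=8, id='dhr09c',
                    subreddit='Dreams', url='https://www.reddit.com/r/Dreams/',
                    num_comments=0, selftext='Part of my dream this morning',
                    created=1.571091e+09),
    SimpleNamespace(title='(image post)', score=1, id='dbxj1t',
                    subreddit='Dreams', url='https://i.redd.it/v3xatojx1zp31.jpg',
                    num_comments=0, selftext='', created=1.569984e+09),
]

rows = [submission_to_row(post) for post in sample_posts]
text_rows = [row for row in rows if has_text(row)]
print(len(rows), 'rows scraped,', len(text_rows), 'with text')
```

On the real dataframe, the equivalent check would be a count of rows whose `body` column is non-empty.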

### Step 3: Create a pandas DataFrame from list of subreddit posts

In [7]:
# Save the scraped r/Dreams data to a new dataframe
df_dreams = pd.DataFrame(posts)
# Display the dataframe to confirm it was created successfully
df_dreams

| | title | score | id | subreddit | url | num_comments | body | created |
|---|---|---|---|---|---|---|---|---|
| 0 | Just dreamt of the end of the world | 135 | dhnta6 | Dreams | https://www.reddit.com/r/Dreams/comments/dhnta... | 18 | I just woke from an insanely realistic dream a... | 1.571072e+09 |
| 1 | I was accused of blugeoning Beyonce to death i... | 9 | dhqqyx | Dreams | https://www.reddit.com/r/Dreams/comments/dhqqy... | 1 | I was in my local Kroger getting groceries, an... | 1.571089e+09 |
| 2 | Morgan Freeman food advice | 8 | dhr09c | Dreams | https://www.reddit.com/r/Dreams/comments/dhr09... | 0 | Part of my dream this morning had Morgan Freem... | 1.571091e+09 |
| 3 | Violating the nightmare | 3 | dhucwz | Dreams | https://www.reddit.com/r/Dreams/comments/dhucw... | 0 | Well uh.. where to begin...\n\nI found this su... | 1.571105e+09 |
| 4 | Dreamt was holding my rapist and crying and ap... | 2 | dhuokx | Dreams | https://www.reddit.com/r/Dreams/comments/dhuok... | 0 | I don't understand.. I was finally getting bet... | 1.571106e+09 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 992 | 🏞 | 1 | dbxj1t | Dreams | https://i.redd.it/v3xatojx1zp31.jpg | 0 | | 1.569984e+09 |
| 993 | Saw this fella a while ago when I had a sleep ... | 125 | dbe0bc | Dreams | https://i.redd.it/u7untosobrp31.jpg | 11 | | 1.569889e+09 |
| 994 | Dreams about a Lost Friendship. | 1 | dbx8xo | Dreams | https://www.reddit.com/r/Dreams/comments/dbx8x... | 0 | A little over a year ago, my relationship with... | 1.569982e+09 |
| 995 | Tooth (just one) falling out, scared when woke up | 1 | dbx6g4 | Dreams | https://www.reddit.com/r/Dreams/comments/dbx6g... | 4 | Hi everyone... This dream I had last night and... | 1.569982e+09 |
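
The `created` column holds numbers such as `1.571072e+09`. Assuming these are Unix timestamps in seconds (an assumption here; PRAW also exposes `created_utc`, which is documented as Unix time), they can be converted to readable dates with the standard library. `to_utc_datetime` is a hypothetical helper name used only for this sketch.

```python
from datetime import datetime, timezone


def to_utc_datetime(epoch_seconds):
    """Convert seconds since 1970-01-01 UTC into a timezone-aware datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Value copied from the 'created' column of row 0 (rounded as displayed)
created = 1.571072e+09
print(to_utc_datetime(created).isoformat())
```

For the whole column at once, `pd.to_datetime(df_dreams['created'], unit='s')` should give the same conversion in pandas.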


### Step 4: Export data to csv
Save the scraped data to a CSV file (written to the current directory by the code below) so it can be read and cleaned for further analysis in Notebook 3.

In [8]:
# Export dreams dataframe file
df_dreams.to_csv(r'./subreddit_Dreams.csv')
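
By default `to_csv` also writes the row index, and reading that file back with `pd.read_csv` yields an extra `Unnamed: 0` column. A minimal sketch of the difference follows, using an in-memory buffer and a two-row stand-in dataframe; both are assumptions for illustration, not the scraped data, and `round_trip` is a hypothetical helper.

```python
import io

import pandas as pd

# Tiny stand-in for df_dreams
sample = pd.DataFrame({'title': ['Morgan Freeman food advice',
                                 'Violating the nightmare'],
                       'score': [8, 3]})


def round_trip(df, **to_csv_kwargs):
    """Write df to an in-memory CSV and read it straight back."""
    buffer = io.StringIO()
    df.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = round_trip(sample)                  # default: index is written
without_index = round_trip(sample, index=False)  # index is left out

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to pass `index=False` in the export above depends on how the later notebooks read the file; passing `index_col=0` to `pd.read_csv` is another option.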