## Contents
- [Imports](#Imports)
    - [Credential Imports](#Credential-Imports)
- [Subreddit Submissions](#Subreddit-Submissions)
- [Functions](#Functions)
- [Uber Scraping](#Uber-Scraping)
- [Lyft Scraping](#Lyft-Scraping)

### Imports

In [1]:
import datetime
import json
import time

import pandas as pd
import praw
import requests

from praw.models import MoreComments

### Credential Imports

In [2]:
%ls ~/Documents/API_keys/

In [3]:
pwd

'/Users/Work/Documents/github/ga/projects/project_3'

In [4]:
# Load the Reddit API credentials; the context manager closes the file afterwards
with open('./reddit_creds.json', 'r') as creds_file:
    reddit_creds = json.load(creds_file)

In [5]:
reddit = praw.Reddit(
    client_id = reddit_creds['id'],
    client_secret = reddit_creds['secret'],
    username = reddit_creds['user'],
    password = reddit_creds['pass'],
    user_agent = 'aerika'
)

reddit.read_only

False

In [6]:
uber = reddit.subreddit('uber')
lyft = reddit.subreddit('Lyft')

### Subreddit Submissions

In [7]:
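# Newest submissions from each subreddit
# Reddit listings return at most about 1,000 items, so fewer than 1500 come back
uber_top_news = list(uber.new(limit = 1500))
lyft_top_news = list(lyft.new(limit = 1500))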

In [8]:
len(uber_top_news)

992

In [9]:
len(lyft_top_news)

995

### Functions

In [10]:
# Function to convert a UTC timestamp (in seconds) to days elapsed since the post

def days_since(utc_stamp):
    # seconds -> minutes -> days (1440 minutes per day)
    return ((time.time() - utc_stamp) / 60) / 1440
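
The conversion can be sanity-checked against a fixed clock. This is a minimal sketch, assuming a hypothetical `days_between` helper with an explicit `now` argument (neither is part of the notebook); it only illustrates the arithmetic.

```python
import time

SECONDS_PER_DAY = 86400  # 60 seconds * 1440 minutes

def days_between(utc_stamp, now=None):
    """Days elapsed between utc_stamp and now (both in epoch seconds)."""
    # Hypothetical variant of days_since: `now` can be pinned for testing
    if now is None:
        now = time.time()
    return (now - utc_stamp) / SECONDS_PER_DAY

# A post created exactly one and a half days before a fixed reference time
fixed_now = 1_570_000_000
created = fixed_now - 1.5 * SECONDS_PER_DAY
print(days_between(created, now=fixed_now))  # 1.5
```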

In [11]:
# Function to grab submission body text / crosspost text / label the post as an image

def body_txt(post):
    if post.selftext == '':
        try:
            # For a crosspost, the URL points at the original submission
            submission_id = post.url.split(sep = '/')[-3]
            return praw.models.Submission(reddit, id = submission_id).selftext
        except Exception:
            # The URL is not a Reddit submission, so treat the post as an image
            return 'Image'

    else:
        return post.selftext

In [12]:
# Function grabs the top-level comments of a submitted reddit post
# Input is a PRAW submission object ('post'), not a submission ID
# https://praw.readthedocs.io/en/latest/tutorials/comments.html
# Daniel Kim helped me create this

def comment_grabber(post):

    # if there are NO comments in the submission
    if len(post.comments) == 0:

        # check whether the post is a crosspost
        try:
            # Grab the crosspost submission ID from the URL
            crosspost_id = post.url.split(sep = '/')[-3]
            crosspost_stored_comments = praw.models.Submission(reddit, id = crosspost_id).comments

            # Skip MoreComments placeholders, which have no body
            return [comment.body for comment in crosspost_stored_comments
                    if not isinstance(comment, MoreComments)]

        # if the URL does not lead to a crosspost, return "No Comments"
        except Exception:
            return ["No Comments"]

    else:
        # Skip MoreComments placeholders here too; they have no body attribute
        return [comment.body for comment in post.comments
                if not isinstance(comment, MoreComments)]
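
The PRAW comments tutorial linked above also describes `replace_more(limit=0)` followed by `list()` as a way to drop `MoreComments` placeholders and flatten the comment tree. A minimal sketch of that pattern follows, assuming a hypothetical `all_comment_bodies` helper. Unlike `comment_grabber`, it also returns nested replies. The `Fake*` classes are stand-ins so the example runs without Reddit credentials; they are not part of PRAW.

```python
def all_comment_bodies(submission):
    """Return the body of every comment on a PRAW-style submission."""
    # Remove MoreComments placeholders without fetching what they hide
    submission.comments.replace_more(limit=0)
    # list() flattens the comment forest, including nested replies
    return [comment.body for comment in submission.comments.list()]


# --- Stand-ins for demonstration only (hypothetical, not PRAW classes) ---
class FakeComment:
    def __init__(self, body):
        self.body = body

class FakeForest:
    def __init__(self, comments):
        self._comments = comments

    def replace_more(self, limit=32):
        return []  # nothing to replace in the stand-in

    def list(self):
        return list(self._comments)

class FakeSubmission:
    def __init__(self, bodies):
        self.comments = FakeForest([FakeComment(b) for b in bodies])


print(all_comment_bodies(FakeSubmission(["first", "second"])))
```

With a real submission, the call would be `all_comment_bodies(post)` for any `post` in `uber_top_news` or `lyft_top_news`.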

In [13]:
# Function returns a dataframe with basic metadata for a list of submissions

def meta_data(threads):

    titles = [post.title for post in threads]

    num_comments = [post.num_comments for post in threads]

    time_col = [days_since(post.created_utc) for post in threads]

    # str() stores the subreddit name rather than the PRAW Subreddit object
    subreddit = [str(post.subreddit) for post in threads]

    body = [body_txt(post) for post in threads]

    threads_dict = {
        'title': titles,
        'num_comments': num_comments,
        'elapsed_time': time_col,
        'subreddit': subreddit,
        'body': body,
    }
    return pd.DataFrame(threads_dict)

### Uber Scraping

In [14]:
uber_df = meta_data(uber_top_news)

In [15]:
# Using comment_grabber function to get comments

all_uber_comments = [comment_grabber(submission_id) for submission_id in uber_top_news]

In [16]:
# Adding comment columns to dataframe

uber_df['comments'] = all_uber_comments

In [17]:
# Joining each post's comments into one long string for future NLP purposes

uber_df['comments'] = uber_df['comments'].apply(lambda x: ' '.join(x))

In [18]:
uber_df.head()

| | title | num_comments | elapsed_time | subreddit | body | comments |
|---|---|---|---|---|---|---|
| 0 | I wonder what all the pax are gonna be complai... | 10 | 0.183559 | uber | Pax on reddit: \n\njust gave my self driving U... | Can't find me! \n\n\nCar smells like cigarett... |
| 1 | Error adding a payment method | 0 | 0.337922 | uber | I just got a new card as my previous was expir... | |
| 2 | Cutting back on Uber due to the drivers I rece... | 12 | 0.37842 | uber | The amount of unprofessionalism I have ran int... | If you give 3 or less you'll never get them ag... |
| 3 | Rating doesn't increase at all. | 2 | 0.493524 | uber | Hey so I was wondering why isn't my rating inc... | Average of 500 rides last 3 months. |
| 4 | Uber scammed me, AGAIN, out of another cancel ... | 1 | 0.56114 | uber | Pax didn’t show up. I waited the full amount o... | Downvoted? Unbelievable. |


In [19]:
# https://www.saltycrane.com/blog/2008/06/how-to-get-current-date-and-time-in/
# Building the output file path, with the current date and time in the filename

now = datetime.datetime.now()

uber_path = "./data_csv/" + "uber" + now.strftime("%Y-%m-%d_%H_%M") + ".csv"
uber_path

'./data_csv/uber2019-10-13_14_44.csv'
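
The same path-building step is repeated for Lyft further down, so it could be factored into a helper. This is a minimal sketch, assuming a hypothetical `timestamped_csv_path` function built on `pathlib`; the `folder` default mirrors the `./data_csv/` directory used above.

```python
import datetime
from pathlib import Path

def timestamped_csv_path(prefix, when, folder="./data_csv"):
    """Build '<folder>/<prefix><YYYY-MM-DD_HH_MM>.csv' for a given datetime."""
    # Same timestamp format as the cell above
    stamp = when.strftime("%Y-%m-%d_%H_%M")
    return Path(folder) / f"{prefix}{stamp}.csv"

# Fixed datetime so the result is reproducible
example_time = datetime.datetime(2019, 10, 13, 14, 44)
print(timestamped_csv_path("uber", example_time).name)  # uber2019-10-13_14_44.csv
```

Passing `datetime.datetime.now()` as `when` reproduces the behavior of the cell above, and the resulting `Path` can be handed directly to `DataFrame.to_csv`.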

In [20]:
uber_df.to_csv(uber_path,index=False)

### Lyft Scraping

In [21]:
lyft_df = meta_data(lyft_top_news)

In [22]:
# Using comment_grabber function to get comments

all_lyft_comments = [comment_grabber(submission_id) for submission_id in lyft_top_news]

In [23]:
# Adding comment columns to dataframe

lyft_df['comments'] = all_lyft_comments

In [24]:
# Joining each post's comments into one long string for future NLP purposes

lyft_df['comments'] = lyft_df['comments'].apply(lambda x: ' '.join(x))

In [25]:
lyft_df.head()

| | title | num_comments | elapsed_time | subreddit | body | comments |
|---|---|---|---|---|---|---|
| 0 | Happen often? | 12 | 0.188484 | Lyft | To all the drivers, from a rider, thank you! N... | Yeah even if the driver is online, if someone ... |
| 1 | Lyft driver app thinks I'm in Chicago when I'm... | 4 | 0.720834 | Lyft | Image | This always happens to me when im doing my si... |
| 2 | I'm pissed... | 8 | 0.737269 | Lyft | Wth is going on? Am I bugged or what? Ive been... | This is purely a demand region, it does not ta... |
| 3 | Where is the earnings breakdown??? Why cant we... | 5 | 0.964086 | Lyft | Image | Because the new peons they are bringing on thr... |
| 4 | Where is the earnings breakdown??? Why cant we... | 1 | 0.976725 | Lyft | Image | There isn't any. U get paid for the mileage an... |


In [26]:
lyft_path = "./data_csv/" + "lyft" + now.strftime("%Y-%m-%d_%H_%M") + ".csv"
lyft_path

'./data_csv/lyft2019-10-13_14_44.csv'

In [27]:
lyft_df.to_csv(lyft_path,index=False)

In [28]:
now.strftime("%Y-%m-%d_%H:%M")

'2019-10-13_14:44'