# Exploring the Reddit API

---

Let's get the hang of using Reddit's API by following [Shropshire's article](https://towardsdatascience.com/exploring-reddits-ask-me-anything-using-the-praw-api-wrapper-129cf64c5d65).


Step 1: Install or Update PRAW in your Terminal
- Check.

---

Step 2: Create and/or Login to Your Reddit Account to begin Authenticating via OAuth
- Check.

---

## Libraries

In [1]:
import os             # file system stuff
import json           # digest json
import praw           # reddit API
import pandas as pd   # Dataframes
import pymongo        # MongoDB

In [2]:
import helper     # Custom helper functions

---

### Load dem keys

Step 3: Create your first Authorized Reddit Instance

In [11]:
# Define path to secret
secret_path = os.path.join(os.environ['HOME'], 'mia/.secret', 'reddit_api.json')

In [12]:
keys = helper.get_keys(secret_path)

In [13]:
reddit = praw.Reddit(client_id=keys['client_id'] 
                     ,client_secret=keys['api_key']
                     ,username=keys['username']
                     ,password=keys['password']
                     ,user_agent='reddit_research accessAPI:v0.0.1 (by /u/FlatDubs)')
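The `helper.get_keys` call above presumably just reads a JSON file. A slightly more defensive variant, sketched below, also checks that the four fields passed to `praw.Reddit` are present before any API call is attempted. The name `load_keys` and the `REQUIRED_FIELDS` tuple are illustrative and not part of the original helper module.

```python
import json

# Field names taken from the praw.Reddit(...) call in this notebook.
REQUIRED_FIELDS = ("client_id", "api_key", "username", "password")


def load_keys(path, required=REQUIRED_FIELDS):
    """Read a JSON secrets file and fail early if a field is missing."""
    with open(path) as f:
        keys = json.load(f)
    missing = [name for name in required if name not in keys]
    if missing:
        raise KeyError(f"missing fields in {path}: {', '.join(missing)}")
    return keys
```

Failing at load time gives a clearer message than an authentication error from Reddit later on.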

---

Step 4: Obtain a Subreddit Instance from your Reddit Instance

In [14]:
subreddit = reddit.subreddit('politics')

In [15]:
print(subreddit.display_name)  # Output: politics
print(subreddit.title)         
print(subreddit.description)
print(subreddit.subscribers)

politics
Politics
## **Welcome to /r/Politics! Please read [the wiki](/r/politics/w/index) before participating.** ||  [Voter Registration Resources](https://www.reddit.com/r/politics/comments/9irkg3/rpolitics_wants_you_to_register_to_vote_for/)
/r/politics is the subreddit for current and explicitly political U.S. news.

### [Our full rules](https://www.reddit.com/r/politics/wiki/index) [Reddiquette](https://www.reddit.com/wiki/reddiquette)

# [Comment Guidelines](/r/politics/w/index#wiki_be_civil):

 ||
:-:|:-:
Be civil|Treat others with basic decency. No personal attacks, shill accusations, hate-speech, flaming, baiting, trolling, witch-hunting, or unsubstantiated accusations. Threats of violence will result in a ban. [More Info.](/r/politics/wiki/index#wiki_be_civil)
Do not post users' personal information.|Users who violate this rule will be banned on sight. Witch-hunting and giving out private personal details of other people can result in unexpected and potentially serious conse

---

Step 5: Obtain a Submission Instance from your Subreddit Instance

In [72]:
master_list = ['kamala']
def add_term(term, start_term):
    # Join two search terms with a literal ' OR ' (a bare `or` keyword is a syntax error here)
    master_string = start_term + ' OR ' + term
    return master_string
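A minimal sketch of combining several search terms into one query string, assuming Reddit's search treats `OR` between terms as a union. The helper name `build_query` and the extra entries in `master_list` are made up for illustration.

```python
def build_query(terms):
    """Join search terms with OR, quoting any term that contains spaces."""
    cleaned = [t.strip() for t in terms if t and t.strip()]
    quoted = [f'"{t}"' if " " in t else t for t in cleaned]
    return " OR ".join(quoted)


# Hypothetical term list; only 'kamala' appears in the notebook itself.
master_list = ["kamala", "harris", "kamala harris"]
query = build_query(master_list)
print(query)
```

The resulting string could then be passed as the first argument to `subreddit.search(...)`.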

In [73]:
for submission in subreddit.search('kamala', sort='comments'): #union of the search terms
    print(submission.title)  # Output: the submission's title
    print(submission.score)  # Output: the submission's upvotes
    print(submission.id)     # Output: the submission's ID
    print(submission.url)    # Output: the URL

Megathread: AG Willam Barr releases his top line summary of the Mueller report
32750
b50gkr
https://www.reddit.com/r/politics/comments/b50gkr/megathread_ag_willam_barr_releases_his_top_line/
Megathread: President Trump delivers remarks on Charlottesville during Press Conference
39625
6tx8h7
https://www.reddit.com/r/politics/comments/6tx8h7/megathread_president_trump_delivers_remarks_on/
Megathread: Likely Explosive Devices Addressed to Multiple Political Figures, Suspect in Custody
15836
9rlm9p
https://www.reddit.com/r/politics/comments/9rlm9p/megathread_likely_explosive_devices_addressed_to/
Megathread: President Trump announces a deal to temporarily reopen the government for three weeks
44550
ajsubi
https://www.reddit.com/r/politics/comments/ajsubi/megathread_president_trump_announces_a_deal_to/
[Megathread] President Trump’s Address on Border Security and the Democratic Response (Part 2)
4327
ae2e7b
https://www.reddit.com/r/politics/comments/ae2e7b/megathread_president_trumps_addres

---

Step 6: Create a Pandas DataFrame of Basic Submission Stats Taken From the Subreddit

In [74]:
# Compile submission into list
title = []
time = []
num_upvotes = []
num_comments = []
upvote_ratio = []
link_flair = []
redditor = []
body = []
i=0

for submission in subreddit.top(limit=5):
    i+=1
    title.append(submission.title)
    time.append(submission.created_utc)
    num_upvotes.append(submission.score)
    num_comments.append(submission.num_comments)
    upvote_ratio.append(submission.upvote_ratio)
    link_flair.append(submission.link_flair_text)
    redditor.append(submission.author)
    body.append(submission.selftext)
    if i%5 == 0:
        print(f'{i} submissions completed')

5 submissions completed


In [75]:
df = pd.DataFrame(
    {'title': title,
     'time': time,
     'num_comments': num_comments,
     'num_upvotes': num_upvotes,
     'upvote_ratio': upvote_ratio,
     'link_flair': link_flair,
     'redditor': redditor
     ,'body': body
    })
df.head(10)

| | title | time | num_comments | num_upvotes | upvote_ratio | link_flair | redditor | body |
|---|---|---|---|---|---|---|---|---|
| 0 | Kim Davis, clerk who refused to sign marriage ... | 1541553000.0 | 2769 | 101995 | 0.83 | | FuegoFerdinand | |
| 1 | Trump revealed highly classified information t... | 1494882000.0 | 20858 | 99356 | 0.79 | | SQUEEEEEEEEEPS | |
| 2 | Trump Ordered Mueller Fired, but Backed Off Wh... | 1516929000.0 | 14547 | 95196 | 0.85 | | 71tsiser | |
| 3 | Ivanka Trump used a personal email account to ... | 1542671000.0 | 6518 | 89837 | 0.83 | | Bloodbath-McGrath | |
| 4 | A petition calling for FCC Chairman Ajit Pai t... | 1511730000.0 | 1945 | 88281 | 0.9 | | mixplate | |
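Instead of eight parallel lists, each submission could be turned into one dictionary. The sketch below assumes nothing beyond the attributes already used in this step; `submission_to_row` is a hypothetical helper, and a `SimpleNamespace` stands in for a real PRAW submission so the example runs without API access.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def submission_to_row(submission):
    """Collect the fields used above into one flat dictionary."""
    return {
        "title": submission.title,
        # created_utc is a Unix timestamp in seconds
        "time": datetime.fromtimestamp(submission.created_utc, tz=timezone.utc),
        "num_comments": submission.num_comments,
        "num_upvotes": submission.score,
        "upvote_ratio": submission.upvote_ratio,
        "link_flair": submission.link_flair_text,
        "redditor": str(submission.author),  # author is an object; keep its name
        "body": submission.selftext,
    }


# Stand-in for a PRAW submission, filled with values from the first row above.
fake = SimpleNamespace(
    title="Kim Davis, clerk who refused to sign marriage ...",
    created_utc=1541553000.0,
    num_comments=2769,
    score=101995,
    upvote_ratio=0.83,
    link_flair_text=None,
    author="FuegoFerdinand",
    selftext="",
)
rows = [submission_to_row(fake)]
print(rows[0]["time"].isoformat())
```

Passing `rows` to `pd.DataFrame(rows)` should give the same columns as the cell above, with `time` already converted to a datetime.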


---

### Can we put this in a MongoDB?

Instantiate MongoDB

In [None]:
# Mongo Prep
mc = pymongo.MongoClient(host='localhost', port=27017)
db = mc['got']
coll = db['test_collection']

In [None]:
# Initialize
i = 0
topics = []

for submission in subreddit.top(limit=50):
    i+=1
    topics.append({
                   'title': submission.title
                    ,'time': submission.created_utc
                    ,'num_upvotes': submission.score
                    ,'num_comments': submission.num_comments
                    ,'upvote_ratio': submission.upvote_ratio
                    ,'link_flair': submission.link_flair_text
#                     ,'redditor': submission.author
                    ,'body': submission.selftext
                 })
#    topics_dict['title'].append(submission.title)
#     time.append(submission.created_utc)
#     num_upvotes.append(submission.score)
#     num_comments.append(submission.num_comments)
#     upvote_ratio.append(submission.upvote_ratio)
#     link_flair.append(submission.link_flair_text)
#     redditor.append(submission.author)
#     body.append(submission.selftext)

    if i%5 == 0:
        print(f'{i} submissions completed')

In [None]:
topics

Try inserting into collection.

In [None]:
coll.insert_many(topics)

Yay! It worked.
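The `redditor` field is commented out in the loop above. One plausible reason (an assumption; the notebook doesn't say) is that `submission.author` is a PRAW object rather than a plain value, and a document store can only encode basic types. Below is a minimal sketch of a clean-up step that stringifies anything that is not a plain value. `to_plain` and the `Author` stand-in class are hypothetical.

```python
from datetime import datetime

# Types assumed safe to store as-is.
PLAIN_TYPES = (str, int, float, bool, type(None), datetime)


def to_plain(value):
    """Recursively convert a value into plain, storable types."""
    if isinstance(value, PLAIN_TYPES):
        return value
    if isinstance(value, dict):
        return {str(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_plain(v) for v in value]
    return str(value)  # fallback: keep a readable name for custom objects


class Author:
    """Stand-in for an API object whose str() is the user name."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


doc = to_plain({"title": "Example", "redditor": Author("mixplate"), "tags": ("a", "b")})
print(doc)
```

Running each entry of `topics` through such a function before `insert_many` would let the author name be stored too.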

### Can we put this in a MongoDB...in the cloud!?

In [None]:
# Set up connection string
mongo_user = 'werlindo'
mongo_pw = 'dsaf040119'

In [None]:
#cli = pymongo.MongoClient('mongodb+srv://werlindo:dsaf040119@dsaf-oy1s0.mongodb.net/test?retryWrites=true')

In [None]:
# Initialize
i = 0
topics = []

for submission in subreddit.top(limit=50):
    i+=1
    topics.append({
                   'title': submission.title
                    ,'time': submission.created_utc
                    ,'num_upvotes': submission.score
                    ,'num_comments': submission.num_comments
                    ,'upvote_ratio': submission.upvote_ratio
                    ,'link_flair': submission.link_flair_text
#                     ,'redditor': submission.author
                    ,'body': submission.selftext
                 })
#    topics_dict['title'].append(submission.title)
#     time.append(submission.created_utc)
#     num_upvotes.append(submission.score)
#     num_comments.append(submission.num_comments)
#     upvote_ratio.append(submission.upvote_ratio)
#     link_flair.append(submission.link_flair_text)
#     redditor.append(submission.author)
#     body.append(submission.selftext)

    if i%5 == 0:
        print(f'{i} submissions completed')

In [None]:
# Instantiate client
client = pymongo.MongoClient("mongodb+srv://" + mongo_user + ":" 
                         + mongo_pw 
                         + "@dsaf-oy1s0.mongodb.net/test?retryWrites=true")
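A minimal sketch of building the same connection string without typing the credentials into the notebook. The environment variable names `MONGO_USER` and `MONGO_PW` and the helper `build_atlas_uri` are assumptions. `quote_plus` percent-escapes characters such as `@` or `/` that would otherwise break the URI.

```python
import os
from urllib.parse import quote_plus


def build_atlas_uri(user, password, host, database="test"):
    """Assemble a mongodb+srv URI, percent-escaping the credentials."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?retryWrites=true"
    )


# Hypothetical variable names; fall back to placeholders when they are unset.
mongo_user = os.environ.get("MONGO_USER", "USER")
mongo_pw = os.environ.get("MONGO_PW", "PASSWORD")
uri = build_atlas_uri(mongo_user, mongo_pw, "dsaf-oy1s0.mongodb.net")
```

`pymongo.MongoClient(uri)` would then take the place of the concatenated string.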


In [None]:
db = client['got']
coll = db['test_collection']

In [None]:
coll.delete_many({})

In [None]:
coll.insert_many(topics)

In [None]:
# Look at DB names
cur = client.list_databases()

for item in cur:
    print(item)

In [None]:
# Look at everything in our collection!
cur = coll.find({})

for item in cur:
    print(item)

---
## What if we just dump the entire submission into a dataframe?

In [None]:
def serialize(post):
    """Helper function for converting PRAW objects to a Python dictionary.

    Source: https://www.reddit.com/r/redditdev/comments/90bdr4/subreddit_sentiment_analysis/
    (posted by f_k_a_g_n)
    """
    result = {}
    for k, v in post.__dict__.items():
        if k.startswith('_'):
            continue
        if k in {'author', 'subreddit'}:
            result[k] = str(v)
            continue
        if v is None:
            continue
        result[k] = v
    return result

In [None]:
submissions = subreddit.top(limit=10)

# load into pandas
subs = pd.DataFrame(serialize(post) for post in submissions)

# change the `created_utc` column to a datetime object
subs['created_utc'] = pd.to_datetime(subs.created_utc, unit='s')

In [None]:
subs.head()

Works, but I don't know if I like it.

How many docs in this here coll?

In [None]:
coll.count_documents({})

## How about we try to get all the comments?

In [None]:
submissions = subreddit.top(limit=2)

In [36]:
for sub in submissions:
    sub.comments.replace_more(limit=None)
    for comment in sub.comments.list():
        print(comment.body)
        print('what')

NameError: name 'submissions' is not defined

---
OK, that sucks: `submissions` was never defined in this session because the cell above wasn't run. Let's try the example from the docs first.

In [76]:
# Here's a thread (is that even the right term?)
# https://www.reddit.com/r/gameofthrones/comments/bqa2qd/spoilers_live_episode_discussion_season_8_episode/
submission = reddit.submission(id='bqa2qd')

In [77]:
# Instantiate list to hold comments
test_comments = []
comments_dicts = []

submission.comments.replace_more(limit=5)
for comment in submission.comments.list()[:100]:
#     print(comment.body)
    # List of comments, as strings
    test_comments.append(comment.body)

    # List of comments (dicts)
    comments_dicts.append({
        'comment': comment.body
    })
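`submission.comments.list()` hands back the whole comment tree as one flat list. The sketch below shows the idea on a toy tree using a breadth-first walk. The `Node` class and its `replies` attribute are stand-ins for illustration, and this is not PRAW's actual implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy comment: some text plus a list of direct replies."""

    body: str
    replies: list = field(default_factory=list)


def flatten(top_level):
    """Return every comment in the tree, shallowest comments first."""
    flat = []
    queue = deque(top_level)
    while queue:
        node = queue.popleft()
        flat.append(node)
        queue.extend(node.replies)  # visit replies after the current level
    return flat


tree = [
    Node("first", [Node("reply to first", [Node("nested reply")])]),
    Node("second"),
]
print([n.body for n in flatten(tree)])
```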
    

### Remove neutrals?

### How to attribute a document to a character?

### Can we tie these comments to likes or upvotes?
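On tying comments to upvotes: PRAW comment objects expose a `score` attribute alongside `body`, so one option is to store both in each document. Below is a minimal sketch that uses a `SimpleNamespace` stand-in instead of a live API call. The values are invented and `comment_to_doc` is a hypothetical helper.

```python
from types import SimpleNamespace


def comment_to_doc(comment):
    """Keep the comment text together with its id, parent and score."""
    return {
        "comment_id": comment.id,
        "parent_id": comment.parent_id,
        "comment": comment.body,
        "score": comment.score,
        "created_utc": comment.created_utc,
    }


# Invented example values, not real Reddit data.
fake_comment = SimpleNamespace(
    id="abc123",
    parent_id="t3_bqa2qd",
    body="Example comment text",
    score=42,
    created_utc=1558310400.0,
)
doc = comment_to_doc(fake_comment)
print(doc["score"])
```

Storing `parent_id` as well would make it possible to rebuild the reply structure later.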

In [78]:
# Check 
test_comments[:10]

["They don't want to make a new Iron Throne so they are gonna use the guy whos always in a chair already.",
 'Jaime Lannister: *Banged the coolest knight ever, Brienne of Tarth, just to leave her for that fugly bitch Cersei.* \n\n*Signed,\nBrienne*',
 'How are they supposed to deal with Drogon if Jon kills Dannny?\n\nDrogon: \\*literally fucks off\\*\n\n&#x200B;\n\nOh... that was easy',
 'Who knew Drogon would have the greatest character development out of them all',
 'Jon: How the fuck did you get up here Arya',
 'How the hell are there that many Dothraki still alive',
 'Anyone else wondering why Bran wants a master of whispers when he IS the master of whispers? Like he could say “a little bird told me” and mean it literally.',
 'what is the Nights Watch watching for now exactly?? They basically just sent Jon to summer camp',
 "So like...what are the Dothraki gonna do now? They don't seem like the farming type",
 'Gawd Emilia Clarke really knows how to deliver a war speech in a foreig

In [79]:
# Check
comments_dicts[:5]

[{'comment': "They don't want to make a new Iron Throne so they are gonna use the guy whos always in a chair already."},
 {'comment': 'Jaime Lannister: *Banged the coolest knight ever, Brienne of Tarth, just to leave her for that fugly bitch Cersei.* \n\n*Signed,\nBrienne*'},
 {'comment': 'How are they supposed to deal with Drogon if Jon kills Dannny?\n\nDrogon: \\*literally fucks off\\*\n\n&#x200B;\n\nOh... that was easy'},
 {'comment': 'Who knew Drogon would have the greatest character development out of them all'},
 {'comment': 'Jon: How the fuck did you get up here Arya'}]

In [60]:
# Put it in a dataframe, as POC
test_df = pd.DataFrame(test_comments, columns=['comment'])

test_df.head()

| | comment |
|---|---|
| 0 | They don't want to make a new Iron Throne so t... |
| 1 | Jaime Lannister: *Banged the coolest knight ev... |
| 2 | How are they supposed to deal with Drogon if J... |
| 3 | Who knew Drogon would have the greatest charac... |
| 4 | Jon: How the fuck did you get up here Arya |


## How about some Vader Sentiment Action?

In [25]:
from vaderSentiment import vaderSentiment

In [26]:
analyser = vaderSentiment.SentimentIntensityAnalyzer()

In [33]:
for comment in test_comments:
    print(comment)
    print(analyser.polarity_scores(comment))

They don't want to make a new Iron Throne so they are gonna use the guy whos always in a chair already.
{'neg': 0.06, 'neu': 0.94, 'pos': 0.0, 'compound': -0.0572}
Jaime Lannister: *Banged the coolest knight ever, Brienne of Tarth, just to leave her for that fugly bitch Cersei.* 

*Signed,
Brienne*
{'neg': 0.208, 'neu': 0.792, 'pos': 0.0, 'compound': -0.6124}
How are they supposed to deal with Drogon if Jon kills Dannny?

Drogon: \*literally fucks off\*

&#x200B;

Oh... that was easy
{'neg': 0.24, 'neu': 0.655, 'pos': 0.105, 'compound': -0.5719}
Who knew Drogon would have the greatest character development out of them all
{'neg': 0.0, 'neu': 0.741, 'pos': 0.259, 'compound': 0.6369}
Jon: How the fuck did you get up here Arya
{'neg': 0.28, 'neu': 0.72, 'pos': 0.0, 'compound': -0.5423}
How the hell are there that many Dothraki still alive
{'neg': 0.303, 'neu': 0.526, 'pos': 0.171, 'compound': -0.4588}
Anyone else wondering why Bran wants a master of whispers when he IS the master of whisp
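To act on the "remove neutrals" idea, the `compound` score can be bucketed. The VADER README suggests treating `compound >= 0.05` as positive and `compound <= -0.05` as negative, with everything in between neutral. Below is a minimal sketch using three of the scores printed above. `label_compound` is a hypothetical helper.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores copied from the first, second and fourth comments above.
scores = [-0.0572, -0.6124, 0.6369]
labels = [label_compound(s) for s in scores]

# Dropping neutrals is then a simple filter.
non_neutral = [s for s in scores if label_compound(s) != "neutral"]
print(labels)
```

The threshold is a parameter so a wider neutral band can be tried if too many borderline comments slip through.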

### Let's save it to MongoDB Atlas!

In [61]:
# Set up connection string
mongo_user = 'werlindo'
mongo_pw = 'dsaf040119'

In [62]:
# Instantiate client
client = pymongo.MongoClient("mongodb+srv://" + mongo_user + ":" 
                         + mongo_pw 
                         + "@dsaf-oy1s0.mongodb.net/test?retryWrites=true")


In [63]:
#cli = pymongo.MongoClient('mongodb+srv://werlindo:dsaf040119@dsaf-oy1s0.mongodb.net/test?retryWrites=true')

In [64]:
db = client['got']
coll = db['s8e6']

In [65]:
coll.delete_many({})

<pymongo.results.DeleteResult at 0x11de5bb48>

In [66]:
coll.insert_many(comments_dicts)

<pymongo.results.InsertManyResult at 0x10cc12908>

In [67]:
# Look at DB names
cur = client.list_databases()

for item in cur:
    print(item)

{'name': 'got', 'sizeOnDisk': 151552.0, 'empty': False}
{'name': 'sample_airbnb', 'sizeOnDisk': 57303040.0, 'empty': False}
{'name': 'sample_geospatial', 'sizeOnDisk': 1384448.0, 'empty': False}
{'name': 'sample_mflix', 'sizeOnDisk': 31514624.0, 'empty': False}
{'name': 'sample_supplies', 'sizeOnDisk': 1339392.0, 'empty': False}
{'name': 'sample_training', 'sizeOnDisk': 72982528.0, 'empty': False}
{'name': 'sample_weatherdata', 'sizeOnDisk': 4427776.0, 'empty': False}
{'name': 'admin', 'sizeOnDisk': 245760.0, 'empty': False}
{'name': 'local', 'sizeOnDisk': 1051865088.0, 'empty': False}


In [68]:
# Look at everything in our collection!
cur = coll.find({})

for item in cur:
    print(item)

{'_id': ObjectId('5ce7167456eec2e3b4e30c81'), 'comment': "They don't want to make a new Iron Throne so they are gonna use the guy whos always in a chair already."}
{'_id': ObjectId('5ce7167456eec2e3b4e30c82'), 'comment': 'Jaime Lannister: *Banged the coolest knight ever, Brienne of Tarth, just to leave her for that fugly bitch Cersei.* \n\n*Signed,\nBrienne*'}
{'_id': ObjectId('5ce7167456eec2e3b4e30c83'), 'comment': 'How are they supposed to deal with Drogon if Jon kills Dannny?\n\nDrogon: \\*literally fucks off\\*\n\n&#x200B;\n\nOh... that was easy'}
{'_id': ObjectId('5ce7167456eec2e3b4e30c84'), 'comment': 'Who knew Drogon would have the greatest character development out of them all'}
{'_id': ObjectId('5ce7167456eec2e3b4e30c85'), 'comment': 'Jon: How the fuck did you get up here Arya'}
{'_id': ObjectId('5ce7167456eec2e3b4e30c86'), 'comment': 'How the hell are there that many Dothraki still alive'}
{'_id': ObjectId('5ce7167456eec2e3b4e30c87'), 'comment': 'Anyone else wondering why Bra

# THINGS TO FIGURE OUT

- ## Extract data back out from MongoDB  
- ## ~~Use MongoDB Atlas?~~
- ## Build Corpus from Mongo'd data
- ## Sentiment Analysis from Corpus
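For the corpus item, here is a minimal sketch of one possible approach: treat each stored document's `comment` field as one text, tokenize it crudely with a regular expression, and count words. The `docs` list stands in for what `coll.find({})` would yield (two comments copied from the output above), and the tokenizer is deliberately naive.

```python
import re
from collections import Counter

# Stand-in for documents read back from the collection.
docs = [
    {"comment": "Who knew Drogon would have the greatest character development out of them all"},
    {"comment": "How the hell are there that many Dothraki still alive"},
]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


# One token list per document, plus overall word frequencies.
corpus = [tokenize(doc["comment"]) for doc in docs]
word_counts = Counter(word for tokens in corpus for word in tokens)
print(word_counts.most_common(3))
```

A real pipeline would likely swap in a proper tokenizer and stop-word list, but the shape (documents in, token lists out) stays the same.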

---

## Below here is island of old lame code

In [None]:
# database_names() was removed in PyMongo 4; list_database_names() replaces it
client.list_database_names()

In [None]:
cur = client.list_databases()

In [None]:
for item in cur:
    print(item)

In [None]:
# Mongo Prep
mc = pymongo.MongoClient(host='localhost', port=27017)
db = mc['got']
coll = db['test_collection']

In [None]:
dbee = client['got']
collee = dbee['reddit_test']

In [None]:
topics

Try inserting into collection.

In [None]:
collee.insert_many(topics)


client = pymongo.MongoClient("mongodb://USER:PASSWORD@ABC-cluster-shard-00-00-XYZ.mongodb.net:27017" + 
                            ",ABC-cluster-shard-00-01-XYZ.mongodb.net:27017," +
                            "ABC-cluster-shard-00-02-XYZ.mongodb.net:27017/" + 
                            "DATABASE?ssl=true&replicaSet=ABC-cluster-shard-0&authSource=admin")

---


In [None]:
# Alex's code
# Load secret keys from credentials.json
import json
url = 'https://www.reddit.com/'
with open('/Users/<Your CPUs User>/.secrets/credentials.json') as f:
    params = json.load(f)

In [None]:
def get_keys(path):
    with open(path) as f:
        return json.load(f)