# Exploring the Reddit API

---

Let's get the hang of using Reddit's API by following [Shropshire's article](https://towardsdatascience.com/exploring-reddits-ask-me-anything-using-the-praw-api-wrapper-129cf64c5d65).


Step 1: Install or Update PRAW in your Terminal
- Check.

---

Step 2: Create and/or Log In to Your Reddit Account to Begin Authenticating via OAuth
- Check.

---

## Libraries

In [1]:
import os             # file system stuff
import json           # digest json
import praw           # reddit API
import pandas as pd   # Dataframes
import pymongo        # MongoDB

In [2]:
import helper     # Custom helper functions

---

### Load dem keys

Step 3: Create your first Authorized Reddit Instance

In [3]:
# Define path to the secret file (expanduser also works where HOME is not set)
secret_path = os.path.join(os.path.expanduser('~'), '.secret', 'reddit.json')

In [4]:
keys = helper.get_keys(secret_path)
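
`helper.get_keys` appears to simply parse the JSON file (a `get_keys` with that behavior is defined at the end of this notebook). Below is a minimal sketch of a stricter loader, assuming the file holds the four fields read in the next cell. `load_reddit_keys` and `REQUIRED_FIELDS` are hypothetical names, not part of `helper`.

```python
import json

# Fields that the praw.Reddit(...) call in this notebook reads from the file.
REQUIRED_FIELDS = ("client_id", "api_key", "username", "password")


def load_reddit_keys(path):
    """Load the credentials JSON and fail early if a field is missing."""
    with open(path) as f:
        keys = json.load(f)
    # Collect every absent field so the error names all of them at once.
    missing = [name for name in REQUIRED_FIELDS if name not in keys]
    if missing:
        raise KeyError(f"missing fields in {path}: {', '.join(missing)}")
    return keys
```

Failing here gives a clearer message than a `KeyError` raised halfway through building the Reddit instance.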

In [5]:
reddit = praw.Reddit(client_id=keys['client_id'] 
                     ,client_secret=keys['api_key']
                     ,username=keys['username']
                     ,password=keys['password']
                     ,user_agent='reddit_research accessAPI:v0.0.1 (by /u/FlatDubs)')

Step 4: Obtain a Subreddit Instance from your Reddit Instance

In [8]:
subreddit = reddit.subreddit('gameofthrones')
print(subreddit.display_name)  # Output: gameofthrones
print(subreddit.title)         # Output: the subreddit's title
print(subreddit.description)
print(subreddit.subscribers)

iama


ResponseException: received 504 HTTP response
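
A 504 is a gateway timeout on the server side, so it is often worth simply trying again. Here is a minimal sketch of a retry helper with a doubling delay. `with_retries` and all of its parameters are hypothetical names for illustration, not part of PRAW.

```python
import time


def with_retries(fetch, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on a retry_on exception, wait and try again.

    The delay doubles after each failed attempt. If the last attempt
    also fails, its exception is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With PRAW this might be called as `with_retries(lambda: subreddit.subscribers, prawcore.exceptions.ResponseException)`. PRAW loads most subreddit attributes lazily on first access, which would explain why the display name printed before the request failed.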

---

Step 5: Obtain a Submission Instance from your Subreddit Instance

In [None]:
# Iterate through the first 3 submissions marked hot
for submission in subreddit.hot(limit=3):
    print(submission.title)  # Output: the submission's title
    print(submission.score)  # Output: the submission's upvotes
    print(submission.id)     # Output: the submission's ID
    print(submission.url)    # Output: the URL

---

Step 6: Create a Pandas DataFrame of Basic Submission Stats Taken From the Subreddit

In [None]:
# Collect each submission field into its own list
title = []
time = []
num_upvotes = []
num_comments = []
upvote_ratio = []
link_flair = []
redditor = []
body = []

for i, submission in enumerate(subreddit.top(limit=5), start=1):
    title.append(submission.title)
    time.append(submission.created_utc)
    num_upvotes.append(submission.score)
    num_comments.append(submission.num_comments)
    upvote_ratio.append(submission.upvote_ratio)
    link_flair.append(submission.link_flair_text)
    redditor.append(submission.author)
    body.append(submission.selftext)
    # Progress message every 5 submissions
    if i % 5 == 0:
        print(f'{i} submissions completed')

In [None]:
df = pd.DataFrame(
    {'title': title,
     'time': time,
     'num_comments': num_comments,
     'num_upvotes': num_upvotes,
     'upvote_ratio': upvote_ratio,
     'link_flair': link_flair,
     'redditor': redditor,
     'body': body
    })
df.head(10)
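
The `time` column holds `created_utc`, which Reddit reports as a Unix timestamp in seconds, so it is not very readable as is. Here is a small sketch of the conversion using only the standard library. `utc_from_epoch` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def utc_from_epoch(seconds):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(utc_from_epoch(0).isoformat())  # 1970-01-01T00:00:00+00:00
```

On the DataFrame itself, `pd.to_datetime(df['time'], unit='s', utc=True)` should do the same conversion for the whole column in one call.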

---

### Can we put this in a MongoDB?

Instantiate MongoDB

In [None]:
# Mongo Prep
mc = pymongo.MongoClient(host='localhost', port=27017)
db = mc['got']
coll = db['test_collection']

In [None]:
# Build one document (dict) per submission
topics = []

for i, submission in enumerate(subreddit.top(limit=5), start=1):
    topics.append({
                   'title': submission.title
                    ,'time': submission.created_utc
                    ,'num_upvotes': submission.score
                    ,'num_comments': submission.num_comments
                    ,'upvote_ratio': submission.upvote_ratio
                    ,'link_flair': submission.link_flair_text
                    # author is a praw Redditor object (None for deleted accounts),
                    # which pymongo cannot encode, so store the username string
                    ,'redditor': submission.author.name if submission.author else None
                    ,'body': submission.selftext
                 })

    # Progress message every 5 submissions
    if i % 5 == 0:
        print(f'{i} submissions completed')

In [None]:
topics

Try inserting into collection.

In [None]:
coll.insert_many(topics)

Yay! It worked.

## Things to Figure Out

- Extract data back out from MongoDB
- Use MongoDB Atlas?
- Build Corpus from Mongo'd data
- Sentiment Analysis from Corpus
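
For the "extract data back out" item, here is a minimal sketch of reading the submissions back, assuming `coll` is the pymongo collection populated earlier. `fetch_topics` and its `min_upvotes` parameter are hypothetical. The only pymongo call it relies on is `find` with a filter and a projection.

```python
def fetch_topics(coll, min_upvotes=0):
    """Return stored submissions with at least min_upvotes as plain dicts.

    coll is expected to be a pymongo Collection (or any object with a
    compatible find method). The projection drops MongoDB's _id field
    so the result can be passed straight to pd.DataFrame(...).
    """
    query = {"num_upvotes": {"$gte": min_upvotes}}
    projection = {"_id": 0}
    return list(coll.find(query, projection))
```

`pd.DataFrame(fetch_topics(coll))` would then rebuild a table much like the one from Step 6. Note that `insert_many` adds an `_id` key to each dict it stores, which is why the projection removes it on the way back out.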

In [None]:

client = pymongo.MongoClient("mongodb://USER:PASSWORD@ABC-cluster-shard-00-00-XYZ.mongodb.net:27017" + 
                            ",ABC-cluster-shard-00-01-XYZ.mongodb.net:27017," +
                            "ABC-cluster-shard-00-02-XYZ.mongodb.net:27017/" + 
                            "DATABASE?ssl=true&replicaSet=ABC-cluster-shard-0&authSource=admin")

In [None]:

# USER and PASSWORD are placeholders; real credentials do not belong in the notebook
client = pymongo.MongoClient("mongodb+srv://USER:PASSWORD@dsaf-oy1s0.mongodb.net/test?retryWrites=true")
db = client.test
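
Connection strings like these are easier to share when the credentials live outside the notebook. Here is a minimal sketch that builds the URI from environment variables. `atlas_uri` and the variable names `MONGO_USER` and `MONGO_PASSWORD` are assumptions for illustration.

```python
import os
from urllib.parse import quote_plus


def atlas_uri(host, database, env=os.environ):
    """Build a mongodb+srv URI from MONGO_USER and MONGO_PASSWORD.

    Both variable names are assumptions. quote_plus escapes characters
    such as '@' or ':' that would otherwise break the URI.
    """
    user = quote_plus(env["MONGO_USER"])
    password = quote_plus(env["MONGO_PASSWORD"])
    return f"mongodb+srv://{user}:{password}@{host}/{database}?retryWrites=true"
```

The result can be passed directly to `pymongo.MongoClient(...)` in place of a hard-coded string.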


---


In [None]:
# Alex's code
# Load secret keys from credentials.json
import json
url = 'https://www.reddit.com/'
with open('/Users/<Your CPUs User>/.secrets/credentials.json') as f:
    params = json.load(f)

In [None]:
def get_keys(path):
    """Return the parsed contents of the JSON credentials file at path."""
    with open(path) as f:
        return json.load(f)