# Lab 9 - APIs

Skills
- Authenticate in order to use API
- Execute queries using an API
- Return and process API data

In [1]:
import pandas as pd
import numpy as np
import requests
from pandas.io.json import json_normalize #special package in pandas
import json


# Part A: APIs

## Wiki Dog Info

In [65]:
# download the response from Wikipedia API endpoint https://en.wikipedia.org/w/api.php 
# to get general information about the Dog Wikipedia article
# ask for the output as json
# HINT: prop = info

# save output and print
wiki_dog_info = requests.get('https://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Dog&format=json')
print(wiki_dog_info)

<Response [200]>


In [69]:
# convert json string into python lists and dictionaries
wiki_info = requests.get("https://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Dog&format=json").json()
json_string_info = json.dumps(str(wiki_info))
json_list_info = json.loads(json_string_info)
json_list_info

"{'batchcomplete': '', 'query': {'pages': {'4269567': {'pageid': 4269567, 'ns': 0, 'title': 'Dog', 'contentmodel': 'wikitext', 'pagelanguage': 'en', 'pagelanguagehtmlcode': 'en', 'pagelanguagedir': 'ltr', 'touched': '2020-05-02T17:04:57Z', 'lastrevid': 954418223, 'length': 131920}}}}"

## Wiki Dog Revisions

In [67]:
# download the response from Wikipedia API endpoint https://en.wikipedia.org/w/api.php 
# to get revision information for the Dog Wikipedia article
# ask for the output as json
# HINT: prop = revisions

# save output and print
wiki_dog_revisions = requests.get('https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Dog&format=json')
print(wiki_dog_revisions)


<Response [200]>


In [28]:
# convert json string into python lists and dictionaries
wiki = requests.get("https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Dog&format=json").json()
json_string = json.dumps(str(wiki))
json_list = json.loads(json_string)

In [21]:
# flatten json data into a data frame where each row is a revision
df = pd.json_normalize(wiki,record_path=["query",["pages","4269567","revisions"]])
# print first few rows
df.head()

Unnamed: 0,revid,parentid,user,timestamp,comment
0,954894748,954711555,Pbrower2a,2020-05-04T21:05:33Z,/* Competitors and predators */


In [23]:
# download the last 15 revisions to the dog page
# HINT: using rvlimit
r15 = requests.get('https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Dog&rvlimit=15&format=json')
# save output and print
print(r15)


<Response [200]>


In [27]:
# convert json string into python lists and dictionaries
rr15 = requests.get('https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Dog&rvlimit=15&format=json').json()
json_stringr15 = json.dumps(str(rr15))
json_listr15 = json.loads(json_stringr15)

In [26]:
# flatten json data into a data frame where each row is a revision
dfr15 = pd.json_normalize(rr15,record_path=["query",["pages","4269567","revisions"]])
# count the number of rows  ["query",["pages",'4269567']]
dfr15.shape[0]
# print the first few rows
dfr15.head()

Unnamed: 0,revid,parentid,user,timestamp,comment,minor
0,954894748,954711555,Pbrower2a,2020-05-04T21:05:33Z,/* Competitors and predators */,
1,954711555,954710371,MarialeegRVT,2020-05-03T22:07:06Z,Found answer using source,
2,954710371,954418223,MarialeegRVT,2020-05-03T21:58:44Z,Typo,
3,954418223,954226928,William Harris,2020-05-02T08:49:21Z,/* top */,
4,954226928,952270565,William Harris,2020-05-01T09:25:43Z,/* Notes */ →References because {{note}} is a ...,


# Part B: Authentication

Get Reddit API access by creating two types of accounts
- create a regular user account on Reddit (https://www.reddit.com/)
- create a developer application account on Reddit (https://www.reddit.com/prefs/apps). Here is a longer explanation of creating an API access for Reddit (https://github.com/reddit-archive/reddit/wiki/OAuth2).
    - give your developer application account a good name (this is your user-agent)
    - mark that you are creating a script (THIS IS VERY IMPORTANT!)
    - describe the purpose of the script (e.g. "access reddit api for inst 447")
    - put in any URL you want (for example I put my website you can use http://localhost:8080)
    - save your client id (alphanumeric code in the last line in top left corner under the name you gave your developer account)
    - save your secret code (alphanumeric code next to the words secret)

In [2]:
import requests.auth

In [3]:
# add your reddit developer account information in order to authenticate
client_id =  '6GSYqIa3aILFgQ'# upper left corner on your application registration form
client_secret =  'U2w2dVcWBZghxClUhh-uZbxv4K4'# listed under secret on your application registration form
username =  'brianna279'# your reddit username
user_pass =  '112598'# your reddit password
user_agent =  'inst447lab9'# your developer application account name



In [4]:
# get your access token by passing your developer reddit account information using headers, post data
client_auth = requests.auth.HTTPBasicAuth(client_id, client_secret)
headers = {"UserAgent": user_agent}
post_data = {"grant_type":"password", "username": username, "password": user_pass}
response = requests.post("https://www.reddit.com/api/v1/access_token", auth=client_auth, data=post_data, headers=headers)
access_token = response.json()
access_token

{'message': 'Too Many Requests', 'error': 429}

Read through the Reddit API documentation to understand how to make API calls and the type of data that will be returned.
https://www.reddit.com/dev/api/

In [5]:
# use your access token to get information about your user account
# HINT: endpoint "https://oauth.reddit.com/api/v1/me"
headers2 = {"Authorization":"bearer"+str(access_token),"User-Agent":user_agent}
user_account = requests.get("https://oauth.reddit.com/api/v1/me",headers=headers2)

In [6]:
# use your access token to get information about popular subreddits
# HINT: endpoint "https://oauth.reddit.com/subreddits/popular"
popular = requests.get("https://oauth.reddit.com/subreddits/popular",headers=headers2)

In [38]:
# convert json string for popular subreddits into python lists and dictionaries
popular_string = json.dumps(str(popular))
popular_list = json.loads(popular_string)
# loop through the list of popular subreddits
# print the display_name for each of the popular subreddits
for x in popular:
    print(x)

b'<!doctype html><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><title>subreddits</title><meta name="key'
b'words" content=" reddit, reddit.com, vote, comment, submit " /><meta name="description" content="Reddit gives you the best of th'
b'e internet in one place. Get a constantly updating feed of breaking news, fun stories, pics, memes, and videos just for you. Pas'
b'sionate about something niche? Reddit has thousands of vibrant communities with people that share your interests. Alternatively,'
b' find out what\xe2\x80\x99s trending across all of Reddit on r/popular. Reddit is also anonymous so you can be yourself, with your Reddit'
b' profile and persona disconnected from your real-world identity." /><meta name="referrer" content="origin"><meta http-equiv="Con'
b'tent-Type" content="text/html; charset=UTF-8" /><link type="application/opensearchdescription+xml" rel="search" href="/static/op'
b'ensearch.xml"/><link rel="canonical" href="https://www.reddit.com

# Part C: Wrappers

- get PRAW a Python wrapper for Reddit API (conda install -c conda-forge praw)
- make use of PRAW to make API calls and return data
- You need to read the PRAW documentation to know what functions to call (https://praw.readthedocs.io/en/latest/)

In [9]:
import praw # python reddit api wrapper


In [10]:
# create a instance of python wrapper
# you'll need to pass it your account information for both your reddit user account
# and your developer application account created in Part B
reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret,
                     user_agent=user_agent,
                    username=username,
                    password=user_pass)
# verify that authentication worked by returning your username
reddit.subreddit(username)

Subreddit(display_name='brianna279')

In [24]:
# get a list of posts for umd subreddit
# HINT: See documentation here https://praw.readthedocs.io/en/latest/code_overview/models/subreddit.html
umd = reddit.subreddit("umd")

In [12]:
# for the umd subreddit get the date when it was created and print this date
umd.created_utc

1271308223.0

In [14]:
# get the number of subscribers to the umd subreddit and print this number
umd.subscribers

24325

In [27]:
# get a list of the top voted posts for the umd subreddit 
# see attributes about submissions https://praw.readthedocs.io/en/latest/code_overview/models/submission.html
# HINT: top
top = umd.top("all")
# print object
print(top)

<praw.models.listing.generator.ListingGenerator object at 0x117765828>


In [28]:
# loop over the top voted posts and print id, title, num_comments
for submission in top:
    print(submission.id)
    print(submission.title)
    print(submission.num_comments)

f8i34j
Thank you everyone!!! From the girl whose leg was run over by a Shuttle-UM bus yesterday
55
brcc2g
had to sacrifice my best friend to testudo :(
44
deaxj0
The duality of man
11
fe89xh
[COVID-19] Important Information You Need to Know About the Coronavirus
93
d5k9q8
The Beekeeping Club is having a bake sale
52
fv7c5r
Finally caught a snapping turtle in animal crossing, so I turned my museum into Mckeldin
16
9b4k03
This is a very rare Frosty Testudo. Donate upvotes to Frosty Testudo to end Maryland's heatwave.
19
gaviet
Absolutely terrible response by UMD Finance Department to a student requesting extra time due to her father passing away from COVID-19
61
fik5l1
Please don't take this pandemic lightly
43
g0n2ms
The loss of a fellow Terp
31
fwc4oq
Time to zoom
8
gb6g39
I'm doing great! Update from the girl who got run over by a Shuttle-UM bus
24
fl6zkh
Imagine being a senior in 2020
11
f0aqy4
*GRAND OPENING* 8:30am today President Loh announces the Grand opening of the McKeldin Wat

Questions
- Compare these posts to what you see during the website www.reddit.com/r/umd. Are they the same?

Answers
These posts are the same as the ones on the website.