### Calling APIs

Interfaces
- In computing, an interface is something that allows two different systems to interact
- A graphical user interface allows a person to interact with a computer

APIs
- An API is an application-programming interface
- It allows people or machines to talk to a computer system
- We have already seen databases, which offer programmers a SQL API
- In this unit, we will learn to call APIs across the web

Web APIs
- When you go to a website, you are already calling a sort of API
- You make a request to the server and the server returns HTML, Javascript and CSS
- Your browser render that code into a page that you can see 
- Other kinds of API return information that is really meant for computers instead of people

### Our first web API call

- Web APIs return information for machines. But you can still read them as a person. I find that this is an important and useful first step when using web APIs

- Let's take a look

- https://api.pushshift.io/reddit/search/submission/?subreddit=cuboulder

- This is a structure called json

### JSON

- Json has keys and values
- Values can also be json objects 
- We read JSON with Python using the json module
- JSON is called "javascript object notation" because you use this format to represent javascript objects. 
- In this class we will stick to Python

In [1]:
import json

# You can write json in Python like this 
some_json = {"name": "fred", "species": "dog"}

serialized = '{"name": "fred", "species": "dog"}'

other_json = json.loads(serialized)

other_json

{'name': 'fred', 'species': 'dog'}

In [2]:
### Looping over the keys in json
for key in other_json:
    print(key, other_json[key])

name fred
species dog


In [3]:
### Looping over the keys in json with .items
for key, value in other_json.items():
    print(key, value)

name fred
species dog


In [4]:
### JSON format allows lists of JSON objects

pets = [{"name": "fred", "species": "dog"}, 
        {"name": "george", "species": "cat"},
        {"name": "harry", "species": "fish"}]

for pet in pets:
    print("**")
    print(pet)
    print(pet["name"], pet["species"])  # use brackets and quotes to access a field in json 


**
{'name': 'fred', 'species': 'dog'}
fred dog
**
{'name': 'george', 'species': 'cat'}
george cat
**
{'name': 'harry', 'species': 'fish'}
harry fish


### Serialization

- When you have a list of JSON objects, sometimes you can store them in a file line-by-line
- This format is called jsonl. It it a common format for storing information
- Storing information in general is called "serialization".
- You already know and love the csv serialization format
- Think of jsonl as a serialization format, just like csv

In [5]:
pets = [{"name": "fred", "species": "dog"}, 
        {"name": "george", "species": "cat"},
        {"name": "harry", "species": "fish"}]

with open("/tmp/pets.jsonl", "w") as of:
    for pet in pets:
        print(json.dumps(pet)) # dumps = dump string
        of.write(json.dumps(pet) + "\n") # write the pet on a new line in our output file

{"name": "fred", "species": "dog"}
{"name": "george", "species": "cat"}
{"name": "harry", "species": "fish"}


In [6]:
! cat /tmp/pets.jsonl

{"name": "fred", "species": "dog"}
{"name": "george", "species": "cat"}
{"name": "harry", "species": "fish"}


### JSON allows for nesting 

In [7]:
bowl = {"size": "3 gallons", "material": "glass", "name":"bowl"}
bed = {"size": "6 feet", "material": "soft polyester", "name":"bed"}

In [8]:
print(pets[0])
pets[0]["habitat"] = bed

print(pets[0])

{'name': 'fred', 'species': 'dog'}
{'name': 'fred', 'species': 'dog', 'habitat': {'size': '6 feet', 'material': 'soft polyester', 'name': 'bed'}}


In [9]:
pets[2]["habitat"] = bowl
pets[1]['habitat'] = bed

In [10]:
### JSON is a poor man's pandas 

In [11]:
pets[0]["habitat"]

{'size': '6 feet', 'material': 'soft polyester', 'name': 'bed'}

In [12]:
[pet for pet in pets if pet["habitat"]["name"] == "bed"]

[{'name': 'fred',
  'species': 'dog',
  'habitat': {'size': '6 feet', 'material': 'soft polyester', 'name': 'bed'}},
 {'name': 'george',
  'species': 'cat',
  'habitat': {'size': '6 feet', 'material': 'soft polyester', 'name': 'bed'}}]

In [13]:
sum(1 for pet in pets if pet["habitat"]["name"] == "bed")

2

In [14]:
sum(1 for pet in pets if pet["habitat"]["name"] == "bed")/len(pets)

0.6666666666666666

In [15]:
import requests

req = requests.request('GET', 'https://api.pushshift.io/reddit/search/submission/?subreddit=cuboulder')

req.text

'{\n    "data": [\n        {\n            "all_awardings": [],\n            "allow_live_comments": false,\n            "author": "blackoutcherrytom",\n            "author_flair_css_class": null,\n            "author_flair_richtext": [],\n            "author_flair_text": null,\n            "author_flair_type": "text",\n            "author_fullname": "t2_783puaua",\n            "author_patreon_flair": false,\n            "author_premium": false,\n            "awarders": [],\n            "can_mod_post": false,\n            "contest_mode": false,\n            "created_utc": 1602383103,\n            "domain": "self.cuboulder",\n            "full_link": "https://www.reddit.com/r/cuboulder/comments/j8xbaa/question_about_physics_exam_structure/",\n            "gildings": {},\n            "id": "j8xbaa",\n            "is_crosspostable": true,\n            "is_meta": false,\n            "is_original_content": false,\n            "is_reddit_media_domain": false,\n            "is_robot_indexable":

In [16]:
import requests

req = requests.request('GET', 'https://api.pushshift.io/reddit/search/submission/?subreddit=cuboulder')

req

<Response [200]>

In [17]:
dir(req)

['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_content',
 '_content_consumed',
 '_next',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']

In [18]:
req.text

'{\n    "data": [\n        {\n            "all_awardings": [],\n            "allow_live_comments": false,\n            "author": "blackoutcherrytom",\n            "author_flair_css_class": null,\n            "author_flair_richtext": [],\n            "author_flair_text": null,\n            "author_flair_type": "text",\n            "author_fullname": "t2_783puaua",\n            "author_patreon_flair": false,\n            "author_premium": false,\n            "awarders": [],\n            "can_mod_post": false,\n            "contest_mode": false,\n            "created_utc": 1602383103,\n            "domain": "self.cuboulder",\n            "full_link": "https://www.reddit.com/r/cuboulder/comments/j8xbaa/question_about_physics_exam_structure/",\n            "gildings": {},\n            "id": "j8xbaa",\n            "is_crosspostable": true,\n            "is_meta": false,\n            "is_original_content": false,\n            "is_reddit_media_domain": false,\n            "is_robot_indexable":

In [19]:
web_json = json.loads(req.text) # opposite of json dumps

In [20]:
web_json.keys()

dict_keys(['data'])

In [21]:
type(web_json["data"])

list

In [22]:
for item in web_json["data"]:
    print("**")
    print(item.keys())

**
dict_keys(['all_awardings', 'allow_live_comments', 'author', 'author_flair_css_class', 'author_flair_richtext', 'author_flair_text', 'author_flair_type', 'author_fullname', 'author_patreon_flair', 'author_premium', 'awarders', 'can_mod_post', 'contest_mode', 'created_utc', 'domain', 'full_link', 'gildings', 'id', 'is_crosspostable', 'is_meta', 'is_original_content', 'is_reddit_media_domain', 'is_robot_indexable', 'is_self', 'is_video', 'link_flair_background_color', 'link_flair_richtext', 'link_flair_text_color', 'link_flair_type', 'locked', 'media_only', 'no_follow', 'num_comments', 'num_crossposts', 'over_18', 'parent_whitelist_status', 'permalink', 'pinned', 'pwls', 'retrieved_on', 'score', 'selftext', 'send_replies', 'spoiler', 'stickied', 'subreddit', 'subreddit_id', 'subreddit_subscribers', 'subreddit_type', 'thumbnail', 'title', 'total_awards_received', 'treatment_tags', 'upvote_ratio', 'url', 'whitelist_status', 'wls'])
**
dict_keys(['all_awardings', 'allow_live_comments', '

In [23]:
for item in web_json["data"]:
    print("**")
    print(item["title"])

**
Question about physics exam structure?
**
Apartment, cheap rent :)
**
Math1112 is trash and I can’t seem to pass any of the tests
**
Phonebank for Biden/Hickenlooper
**
Any big CFB fans on here?
**
Curious
**
CU bar scene
**
ATOC 1060 Our changing environment question.
**
9 things to check off your to-do list
**
The laptop question
**
ikon pass discount
**
Looking for Volunteer Tutors (Service Hours Rewarded!)
**
Playwright production courses at CU Boulder?
**
Any of y’all tried PES?
**
What’s the difference
**
Study spots on/close to campus?
**
An Age Old Dilemma
**
Boulder engineering grads: What kinds of engineering jobs are there?
**
Incoming MS in Aero student Spring 2021
**
PHYS1120 shivers?
**
yeehaw! 🤠🤠🤠
**
Saw some deer chilling out at the aero building today!
**
Hypothetically, what is the worse that could happen
**
Come join us tomorrow night for some conversations and games :)
**
Can canvas see if you opened the app on your phone?


In [24]:
for item in web_json["data"]:
    print("**")
    print(item["title"], item["url"])

**
Question about physics exam structure? https://www.reddit.com/r/cuboulder/comments/j8xbaa/question_about_physics_exam_structure/
**
Apartment, cheap rent :) https://www.reddit.com/r/cuboulder/comments/j8vyrg/apartment_cheap_rent/
**
Math1112 is trash and I can’t seem to pass any of the tests https://www.reddit.com/r/cuboulder/comments/j8rru2/math1112_is_trash_and_i_cant_seem_to_pass_any_of/
**
Phonebank for Biden/Hickenlooper https://www.reddit.com/r/cuboulder/comments/j8r79c/phonebank_for_bidenhickenlooper/
**
Any big CFB fans on here? https://www.reddit.com/r/cuboulder/comments/j8qims/any_big_cfb_fans_on_here/
**
Curious https://www.reddit.com/r/cuboulder/comments/j8q3wp/curious/
**
CU bar scene https://www.reddit.com/r/cuboulder/comments/j8p6mf/cu_bar_scene/
**
ATOC 1060 Our changing environment question. https://www.reddit.com/r/cuboulder/comments/j8ouyp/atoc_1060_our_changing_environment_question/
**
9 things to check off your to-do list https://www.colorado.edu/today/2020/10/0

In [25]:
for item in web_json["data"]:
    out = (item["title"], item["url"])

In [26]:
# pagination 

In [27]:
import requests

# sort desc

req = requests.request('GET', 'https://api.pushshift.io/reddit/search/submission/?subreddit=cuboulder&sort=desc')

responses = json.loads(req.text)

len(responses["data"])


# print out looking for times
for r in responses["data"]:
    print(r)

{'all_awardings': [], 'allow_live_comments': False, 'author': 'blackoutcherrytom', 'author_flair_css_class': None, 'author_flair_richtext': [], 'author_flair_text': None, 'author_flair_type': 'text', 'author_fullname': 't2_783puaua', 'author_patreon_flair': False, 'author_premium': False, 'awarders': [], 'can_mod_post': False, 'contest_mode': False, 'created_utc': 1602383103, 'domain': 'self.cuboulder', 'full_link': 'https://www.reddit.com/r/cuboulder/comments/j8xbaa/question_about_physics_exam_structure/', 'gildings': {}, 'id': 'j8xbaa', 'is_crosspostable': True, 'is_meta': False, 'is_original_content': False, 'is_reddit_media_domain': False, 'is_robot_indexable': True, 'is_self': True, 'is_video': False, 'link_flair_background_color': '', 'link_flair_richtext': [], 'link_flair_text_color': 'dark', 'link_flair_type': 'text', 'locked': False, 'media_only': False, 'no_follow': True, 'num_comments': 0, 'num_crossposts': 0, 'over_18': False, 'parent_whitelist_status': 'all_ads', 'permalin

In [34]:
import requests

# sort desc

req = requests.request('GET', 'https://api.pushshift.io/reddit/search/submission/?subreddit=cuboulder&sort=desc')

responses = json.loads(req.text)

len(responses["data"])

# look up https://www.reddit.com/r/redditdev/comments/3qsv97/whats_the_time_unit_for_created_utc_and_what_time/

# print out looking for times
for r in responses["data"]:
    print(r["created_utc"])
    last = r["created_utc"]

1602383103
1602377527
1602362069
1602360090
1602357708
1602356326
1602353226
1602352159
1602351658
1602349338
1602342625
1602339653
1602338422
1602335051
1602327203
1602317177
1602306342
1602298698
1602293622
1602290119
1602283866
1602281060
1602280593
1602275278
1602270807


In [None]:
req = requests.request('GET', 'https://api.pushshift.io/reddit/search/submission/?subreddit=cuboulder&sort=desc&before={}'.format(1602264427))

responses = json.loads(req.text)

len(responses["data"])

# look up https://www.reddit.com/r/redditdev/comments/3qsv97/whats_the_time_unit_for_created_utc_and_what_time/

# print out looking for times
for r in responses["data"]:
    print(r["created_utc"])

In [32]:
import requests

def get25(before_utc):

    req = requests.request('GET', 'https://api.pushshift.io/reddit/search/submission/?subreddit=cuboulder&sort=desc&before={}'.format(before_utc))
    responses = json.loads(req.text)
    out = []

    # print out looking for times
    for r in responses["data"]:
        out.append(r)
        
    return out

next25 = get25(10000000000)

KeyboardInterrupt: 

In [None]:
next25.sort(key=lambda x:x["created_utc"], reverse=True)  # poor man's pandas

In [None]:
for n in next25:
    print(n["created_utc"])

In [36]:
for i in range(10):
    print(i)

    next25 = get25(last)
    
    last = next25[-1]["created_utc"]
    
    next25.sort(key=lambda x:x["created_utc"], reverse=True)  # poor man's pandas
    
    next25 = get25(last)
    
    print(last)

0
1602179642
1
1602099429
2


KeyboardInterrupt: 

In [60]:
data = []

for o in next25[0:3]:
    data.append({"url": o["title"], "created_utc": str(o["created_utc"])})

In [61]:
import pandas as pd  # adding to pandas

pd.read_json(json.dumps(data), orient="records")

Unnamed: 0,url,created_utc
0,test,1602099387
1,"""If you are exhausted, you are not alone""",1602096219
2,All of my bear creek roommates moved out. Will...,1602094507


In [64]:
df = pd.read_json(json.dumps(data), orient="records")


In [87]:
len(df)

3

In [82]:
df[df["created_utc"].idxmax()]

KeyError: 0