In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set()
matplotlib.rcParams['figure.dpi'] = 144

# Consuming APIs (and JSON)
<!-- requirement: secrets/twitter_secrets.json.sample -->


Consuming APIs is supposed to be easy (that's the point of having an API).  

Let's look at a simple example of consuming a JSON API.  The example we'll look at is a *geocoder*: That is, a service for converting between addresses and normalized geographic information (e.g. latitude and longitude).  Going from addresses to normalized form is "forward geocoding" and going the other way is "reverse geocoding".

We'll interact with a free (and non-authenticated) geocoder run by OpenStreetMap.  The geocoded information is available by sending a GET request to <tt>http:&#8203;//nominatim.openstreetmap.org/search?q=<i>address</i>&addressdetails=1&format=json</tt>.  The portion before the question mark (`http://nominatim.openstreetmap.org/search`) is the endpoint on the server, while the portion following, known as the *query string*, contains the data being sent to the server.  (Thus, a GET request can be repeated simply by requesting the same URL again.  In contrast, the data sent in a POST request is contained in the request body, not in the URL.)
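
To make the GET/POST distinction concrete, here is a minimal offline sketch. It assumes Python 3 (where the functionality of `urllib2` lives in `urllib.request` and `urllib.parse`) and only builds the request objects without sending anything. The POST request is purely illustrative: whether a given endpoint accepts POST is up to the server.

```python
from urllib.parse import urlencode
from urllib.request import Request

endpoint = "http://nominatim.openstreetmap.org/search"
params = {"q": "1600 Pennsylvania Avenue, Washington, DC", "format": "json"}
encoded = urlencode(params)

# GET: the parameters travel in the URL itself, after the question mark.
get_request = Request(endpoint + "?" + encoded)

# POST: the same data travels in the request body; the URL stays bare.
# (Illustration only: nothing is sent, and the server may not accept POST.)
post_request = Request(endpoint, data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Nothing here touches the network, so it is safe to run repeatedly while experimenting.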

As is typical, the query string consists of several key=value pairs, separated by ampersands.  The requested address is specified with the `q` key in this case.  Some characters, like spaces and commas, cannot be used in a URL, so they must be encoded with the `urllib2.quote()` function.

In [1]:
import urllib2

address = "1600 Pennsylvania Avenue, Washington, DC"
urllib2.quote(address)

'1600%20Pennsylvania%20Avenue%2C%20Washington%2C%20DC'

In [2]:
url = "http://nominatim.openstreetmap.org/search?q=%s&addressdetails=1&format=json" % urllib2.quote(address)
url

'http://nominatim.openstreetmap.org/search?q=1600%20Pennsylvania%20Avenue%2C%20Washington%2C%20DC&addressdetails=1&format=json'
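
Rather than formatting the query string by hand, the whole set of key=value pairs can be encoded in one step. The sketch below assumes Python 3, where `urlencode` and `quote` live in `urllib.parse`; it rebuilds the same URL and then parses it back to check that nothing was lost.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

address = "1600 Pennsylvania Avenue, Washington, DC"
params = {"q": address, "addressdetails": 1, "format": "json"}

# quote_via=quote encodes spaces as %20 (the default, quote_plus, uses "+").
query = urlencode(params, quote_via=quote)
url = "http://nominatim.openstreetmap.org/search?" + query

# Splitting the URL and parsing the query string recovers the original values.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["q"][0])
```

Note that `parse_qs` returns every value as a list of strings, since a key may appear more than once in a query string.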

We can request this URL with the `urlopen()` function, which returns a stream we can read from.

In [3]:
data = urllib2.urlopen(url).read()
data

'[{"place_id":"228527596","licence":"Data \xc2\xa9 OpenStreetMap contributors, ODbL 1.0. https:\\/\\/osm.org\\/copyright","osm_type":"way","osm_id":"564931814","boundingbox":["38.8957842","38.895924","-77.0309688","-77.0304609"],"lat":"38.8958536","lon":"-77.0307129","display_name":"Pennsylvania Ave, Penn Quarter, Washington, District of Columbia, 20004, United States of America","class":"highway","type":"path","importance":0.22875,"address":{"path":"Pennsylvania Ave","suburb":"Penn Quarter","city":"Washington","state":"District of Columbia","postcode":"20004","country":"United States of America","country_code":"us"}},{"place_id":"158306366","licence":"Data \xc2\xa9 OpenStreetMap contributors, ODbL 1.0. https:\\/\\/osm.org\\/copyright","osm_type":"way","osm_id":"397325778","boundingbox":["38.8633822","38.8637409","-76.9467576","-76.945632"],"lat":"38.8636383","lon":"-76.9463651","display_name":"Pennsylvania Avenue, Coral Hills, Prince George\'s County, District of Columbia, 20020, Unit

The result is returned as JSON (JavaScript Object Notation), a human-readable, text-based format for transmitting key-value pairs, strings, numbers, and arrays. The `json` package (here the compatible `simplejson`, imported under that name) lets us convert between JSON text and Python's native dictionaries, lists, and so on.

In [4]:
import simplejson as json

json.loads(data)

[{'address': {'city': 'Washington',
   'country': 'United States of America',
   'country_code': 'us',
   'path': 'Pennsylvania Ave',
   'postcode': '20004',
   'state': 'District of Columbia',
   'suburb': 'Penn Quarter'},
  'boundingbox': ['38.8957842', '38.895924', '-77.0309688', '-77.0304609'],
  'class': 'highway',
  'display_name': 'Pennsylvania Ave, Penn Quarter, Washington, District of Columbia, 20004, United States of America',
  'importance': 0.22875,
  'lat': '38.8958536',
  'licence': u'Data \xa9 OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'lon': '-77.0307129',
  'osm_id': '564931814',
  'osm_type': 'way',
  'place_id': '228527596',
  'type': 'path'},
 {'address': {'country': 'United States of America',
   'country_code': 'us',
   'county': "Prince George's County",
   'locality': 'Coral Hills',
   'postcode': '20020',
   'road': 'Pennsylvania Avenue',
   'state': 'District of Columbia'},
  'boundingbox': ['38.8633822', '38.8637409', '-76.9467576', '

In [5]:
json.loads(data)[0]['boundingbox']

['38.8957842', '38.895924', '-77.0309688', '-77.0304609']
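
To see how JSON types map onto Python types, here is a small offline sketch using the standard library's `json` module (Python 3 assumed). The record is hand-trimmed from the geocoder response above rather than fetched live. Notice that this geocoder sends coordinates as strings, so they need an explicit conversion before any arithmetic.

```python
import json

# A hand-trimmed record modeled on the geocoder response shown above.
raw = (
    '[{"lat": "38.8958536", "lon": "-77.0307129", "importance": 0.22875,'
    ' "address": {"city": "Washington", "country_code": "us"}}]'
)

places = json.loads(raw)   # JSON array  -> Python list
first = places[0]          # JSON object -> Python dict

# The coordinates arrive as JSON strings, so convert them before doing math.
lat, lon = float(first["lat"]), float(first["lon"])

# json.dumps goes the other way; a round trip preserves the structure.
round_trip = json.loads(json.dumps(places))

print(type(places).__name__, type(first).__name__, type(first["importance"]).__name__)
print(lat, lon)
```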

Note that this was a public API, with no authentication.  We'll go through an example of the code for an authenticated API at the end -- the example will be the free Twitter stream.  (The reason we didn't do this up front is that you can't run the code without signing up for an API key, etc.)

## Handling URL parameters


The `urllib2` module requires a surprising amount of work to perform even simple tasks. The `requests` library provides a higher-level way to make web requests. It is already nicer in examples like the one above, where we need to encode parameters into the URL, and it is even more convenient when `POST` parameters (or cookies, or authentication, or...) are involved.  (Don't worry if you don't know what that means.)

In [None]:
import requests
def geocode(address):
    params = { 'format'        :'json', 
               'addressdetails': 1, 
               'q'             : address}
    return requests.get('http://nominatim.openstreetmap.org/search', params=params)

response = geocode("107 Page St., San Francisco")

The parameters are automatically encoded and assembled into the query string.

In [None]:
response.url

The raw response is available...

In [None]:
response.text

...but it can also be converted to JSON.

In [None]:
response.json()

In [None]:
response.json()[0]['boundingbox']

**Exercise:** The National Weather Service operates a free API for weather information.  A sample request looks like this: `http://forecast.weather.gov/MapClick.php?lat=37.7739&lon=-122.4225&FcstType=json`.

Use the geocoder to write a function

        def weather_at_address(address):
            ....
            
that gets the current weather (temperature, cloudy or not) from a human-entered address.

## Authenticated APIs


Lots of interesting APIs are free (or at least free for moderate use) but still require you to register first.  The `requests` library (together with some supporting ones, e.g. `requests_oauthlib`) makes it easy to consume these too.

**Exercise:** In order to access the Twitter API, you must first sign up: create an app on http://apps.twitter.com, get an access token, *et voilà*, you have your shiny new credentials -- consisting of four pieces of data. The file `secrets/twitter_secrets.json.sample` is a template showing the expected format: fill in your credentials, then rename the file to have a `.nogit` extension to prevent it from being tracked in a git repository.

In [11]:
from requests_oauthlib import OAuth1
import requests

with open("secrets/twitter_secrets.json.nogit") as fh:
    secrets = json.loads(fh.read())

# create an auth object
auth = OAuth1(
    secrets["api_key"],
    secrets["api_secret"],
    secrets["access_token"],
    secrets["access_token_secret"]
)
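
A missing or misspelled key in the secrets file otherwise only shows up as a `KeyError` (or a failed request) later on, so it can help to validate the file when loading it. This is a minimal sketch, assuming Python 3 and a flat JSON object holding the four keys used above; `load_secrets` is a hypothetical helper, and the demonstration writes placeholder values to a temporary file instead of using real credentials.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("api_key", "api_secret", "access_token", "access_token_secret")

def load_secrets(path):
    """Load a JSON credentials file and check that all four keys are filled in."""
    with open(path) as fh:
        secrets = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if not secrets.get(key)]
    if missing:
        raise ValueError("missing credentials: " + ", ".join(missing))
    return secrets

# Demonstration with placeholder values written to a temporary file.
placeholder = {key: "placeholder-" + key for key in REQUIRED_KEYS}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "twitter_secrets.json.nogit")
    with open(path, "w") as fh:
        json.dump(placeholder, fh)
    secrets = load_secrets(path)

print(sorted(secrets))
```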

Let's see all of Michael's friends.

In [12]:
r = requests.get(
    "https://api.twitter.com/1.1/friends/ids.json",
    auth=auth,
    params={'screen_name' : 'tianhuil'}
)
michaels_friends=r.json()

r2 = requests.post(
    'https://api.twitter.com/1.1/users/lookup.json',
    auth=auth,
    data={'user_id' : michaels_friends['ids'][:50]}
)
friends_info = r2.json()
[(f['screen_name'], f['name']) for f in friends_info]

[(u'AnokyeFrancis38', u'Francis Anokye'),
 (u'best_workers', u'Best Workers'),
 (u'singla_gourish', u'gourish singla'),
 (u'expertmodels', u'Expert Models'),
 (u'odlabenonline', u'ODOI-LARTEY Benjamin'),
 (u'realCle0patra', u'Cleo'),
 (u'TheTechnologyTm', u'The Technology\u2122'),
 (u'fedora_rapp', u'Fedora Rapp'),
 (u'DmeIntelligence', u'DME Intelligence'),
 (u'Edgar_Villegas', u'Edgar Villegas'),
 (u'Info_Data_Mgmt', u'Caroline Higgins'),
 (u'TheBigDataBot', u'The Big Data Bot'),
 (u'TaylorLevi6', u'Taylor Levi'),
 (u'ZaxUmar', u'Umar Zachary'),
 (u'Mracek', u'John Mracek'),
 (u'PeterSmails', u'Peter Smails'),
 (u'iotguide', u'IoT Guide'),
 (u'luxillo', u'Luis Chavez'),
 (u'noyonsazu', u'HotSpotSetup'),
 (u'JessicaMHill', u'Jessica M Hill'),
 (u'bmaltaverne', u'Bertrand Maltaverne'),
 (u'nimdvir', u'Nim Dvir'),
 (u'janzaibaloch786', u'Janzaib Masud Baloch'),
 (u'sumomomonomomo', u'Aymeric'),
 (u'TatumBa06465803', u'Tatum Barber'),
 (u'J3nTyrell', u'Jen Tyrell'),
 (u'M_E_Cat', u'Rober

Requests also makes it easy to deal with simple streaming APIs.  Let's stream 100 tweets from the Twitter feed.

In [13]:
import sys
r_stream = requests.get('https://stream.twitter.com/1.1/statuses/sample.json', auth=auth, stream=True)
counter = 0
for line in r_stream.iter_lines():
    # filter out keep-alive new lines
    if not line:
        continue
    tweet = json.loads(line)
    if 'text' in tweet:
        counter +=1
        print tweet['text']
    sys.stdout.flush()
    if counter >= 100:
        break

RT @SHINee: #SHINee #샤이니 #SHINee_TheStoryofLight

💿The Story of Light EP.3 : 2018.06.25 #네가남겨둔말 #OURPAGE https://t.co/XhNXfwFjMQ
Alla voy  #KarolSevilla #MichaelRonda #RuggeroPasquarelli #SoyLuna #KCAMexico
RT @aries_hn: #Aries Sientes que siempre escuchas problemas, nunca soluciones, debes solucionar eres tú.
RT @arkhamoriginis: ÿ±Ÿàÿ≥Ÿäÿß ÿ≠ŸÑŸäŸÅ ÿßÿ≥ÿ™ÿ±ÿßÿ™Ÿäÿ¨Ÿä ŸàÿπŸÜÿØŸÜ ŸÉÿ™Ÿäÿ± ÿ∑Ÿäÿßÿ®
ÿßŸÜÿµÿ±ŸáŸÖ ÿπŸÑŸâ ÿßŸÑÿ∑ÿπŸÖŸäÿ© Ÿäÿß ÿ±ÿ®
RT @vinnybrack: I’m guessing this isn’t virgin airlines https://t.co/T8LpwahoEY
RT @sabqorg: ŸÉÿ±ÿ±Ÿáÿß ÿßŸÑŸÖŸÑŸÉ: "#ÿßÿπÿ™ÿØÿßŸÑ_ŸÑÿß_ÿßŸÜÿ≠ŸÑÿßŸÑ".. ŸÅÿ¨ÿßÿ° ŸÇÿ±ÿßÿ± #ÿ•ÿπŸÅÿßÿ°_ÿ±ÿ¶Ÿäÿ≥_ŸáŸäÿ¶ÿ©_ÿßŸÑÿ™ÿ±ŸÅŸäŸá. 
 
https://t.co/e8nbNE3sRa https://t.co/TRnlNc5aER
mais sonoridade pfv https://t.co/xyH1vj1CPO
Allah islami hareketin yardımcısıdır. #AlparslanKuytulaAdalet https://t.co/Q9UfwDmLKi
#Bakƒ±rk√∂y ‚ù§Ô∏è05369470329‚ù§Ô∏è
üëëAktif ‚ù§Pasifüëë 

#istanbulescort
#istanbulgay
#bahcelievler 
#bakirkoygay
#bagcilarg

RT @3808digital: Due to everything I've been through, I've learned not to take life for granted 💯👌
Midday Market Commentary (6/19) https://t.co/Vla6YbAjjC
Sg Americas Securities Boosted Wellcare Health Plans Com $WCG Holding; Artesian Resources $ARTNA Has 0.75 Sentiment https://t.co/KAvWTX6jmD
SLEEP PEACEFULLY, MY LUNGS ARE
RT @billboard: .@KaceyMusgraves wants "countrified" collabs with @PostMalone &amp; @LanaDelRey https://t.co/Ow43hTypxM https://t.co/TE42xk32Nx
RT @GivingUpSelf: Living in unrepentant sin hardens the spiritual arteries.

#TuesdayThoughts #Confession
#JesusSaves #GUS #JesusIsLord
#Tr…
RT @arnemx: @sabinaberman Más datos duros y menos teatro señora. https://t.co/SkAxPSFgnR
@AshleyBBR_HOT Ïûò ÏïàÎ∞õÎäî ÏùåÏãùÏù¥Ïöî? Î≠îÎç∞? (ÎÑ§Í∞Ä ÏãúÏÑ†ÏùÑ ÌîºÌñàÏßÄÎßå Í≥ÑÏÜç ÎÑ§Í∞Ä ÏãúÏÑ†ÏùÑ ÎßàÏ£ºÎ¥êÏ§ÑÎïåÍπåÏßÄ ÎÑàÎ•º Î∞îÎùºÎ≥∏Îã§) ÎßêÌï¥Ï£ºÏãúÎ©¥ Í∑∏Í±¥ ÏµúÎåÄÌïú ÌîºÌï¥Î≥ºÍ≤åÏöî~. Í∑∏Î¶¨Í≥† ÎãπÏó∞ÌïòÏ£†! Ïó¨Í∏∞ÏÑú ÎÇòÍ∞ÄÍ∏∞ÎßåÌïòÎ©¥...‚Ä¶ https://t.co/2rHzxvAAw0
Parece q

We can restrict the stream to a geographic bounding box (given as longitude and latitude bounds) to make English-language tweets more likely.

In [14]:
from itertools import islice  # Question: what does islice do?

r_stream = requests.post('https://stream.twitter.com/1.1/statuses/filter.json', auth=auth,
                          stream=True, data={"locations" : "-125,23,-70,50"} )
for line in islice(r_stream.iter_lines(), 100):
    # filter out keep-alive new lines
    if not line:
        continue
    tweet = json.loads(line)
    if 'text' in tweet:
        print tweet['text']
    sys.stdout.flush()

@marcorubio @KellyannePolls Or just send the whole family back and tell them to enter the country legally. Simple,… https://t.co/xkSftnCPLd
@alejandrozam199 @Televisa_Prensa @TD_Deportes Si me estaba dando cuentas saludos
Afternoon it’s #tiptuesday 
Don’t forget to get… https://t.co/EQIHBJjvuz
@nataliaaabrn Bet
if y’all going to pick idols y’all gotta pick better ones bc the ones y’all picking are trash and that’s just that.
Another day, another tech dollar :) (at @Staples in Totowa, NJ) https://t.co/4TsZWtwcv5
why are people saying he recorded this before he died? lol it’s still a good message even tho it wasn’t recorded “r… https://t.co/hTO3Er8LPS
@GiannoCaldwell What I’m saying is, acknowledging a man that doesn’t support you enough to properly condemn people… https://t.co/QjubnCWpHK
@BillKristol #WokeBillKristol is a stone cold assassin with the rhetoric
Sadio Mane is the only African in the world that cannot dance.
Hands down the best version of @Chri
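
The stream-handling logic can be tried out without credentials by feeding it a simulated stream. The sketch below assumes Python 3; `iter_tweet_texts` is a hypothetical helper, and `fake_stream` is made-up data standing in for `r_stream.iter_lines()`. It also shows one subtlety: `islice` applied directly to the raw lines counts keep-alives and text-less messages too, whereas applying it to the filtered generator yields exactly the requested number of tweets.

```python
import json
from itertools import islice

def iter_tweet_texts(lines):
    """Yield the text of each tweet in a stream of line-delimited JSON."""
    for line in lines:
        if not line:               # keep-alive: an empty line
            continue
        message = json.loads(line)
        if "text" in message:      # skip messages that carry no tweet text
            yield message["text"]

# Made-up stream standing in for r_stream.iter_lines(): bytes lines, some blank.
fake_stream = [
    b'{"text": "first"}',
    b"",
    b'{"notice": "a hypothetical message without text"}',
    b'{"text": "second"}',
    b'{"text": "third"}',
]

# islice stops after two *tweets*, however many other lines went by.
first_two = list(islice(iter_tweet_texts(fake_stream), 2))
print(first_two)  # ['first', 'second']
```

Because `iter_tweet_texts` is a generator, nothing beyond the second tweet is ever read from the stream.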

## API Request Limitations


Some authenticated APIs have hard limits on the total number of requests that one user can make in one day. A service with a freemium or paid model enforces such a limit to encourage high-volume users to pay for better data access. API providers also do this to force software developers to be disciplined and thoughtful in their use of the service.

Any API might also have soft limits based on some ambiguous definition of excessive use. Google, for example, will block your IP address if you make too many requests to their services too quickly. Presumably this is done with a machine learning algorithm built specifically for this purpose. Bloomberg has a Python API associated with their desktop terminal application. They will revoke access if you exceed daily or monthly hard limits, but unfortunately the specifics of those limits are not shared with their users.
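
One simple way to stay clear of "too many requests too quickly" is to enforce a minimum interval between calls on the client side. This is a sketch under stated assumptions, not any provider's recommended method: Python 3 is assumed, `Throttle` is a hypothetical helper, and the interval is an arbitrary value to be replaced with whatever the service's usage policy asks for.

```python
import time

class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

# Arbitrary, deliberately tiny interval so the demonstration finishes quickly.
throttle = Throttle(min_interval=0.01)
for address in ["107 Page St., San Francisco", "City Hall Park, New York, NY 10007"]:
    throttle.wait()
    # a call such as geocode(address) would go here
```

Throttling spaces requests out; it does not reduce how many are made.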

These limits create challenges for the cost-conscious data scientist. Happily, Python has tools to help. One of them is the [ediblepickle](https://pypi.python.org/pypi/ediblepickle/1.1.3) package, which provides a convenient facility for caching the results of function calls. This can help prevent unnecessary duplicate requests to an API.

In the example below, the earlier `geocode` function is rewritten as `geocode2` and wrapped with ediblepickle's `checkpoint` decorator, which caches the result of the first call in a pickle file. The name of that file depends on the function arguments.

If this function is called a second time with the same function arguments, the `checkpoint` decorator will intercept the call and retrieve the results from the cached pickle file.

It is important that the file name be a valid file name that is unique to the function arguments. In this example, we use `urllib2.quote` to escape characters and generate a proper file name.

In [None]:
from ediblepickle import checkpoint
import os

cache_dir = 'cache'
if not os.path.exists(cache_dir):
    os.mkdir(cache_dir)

@checkpoint(key=lambda args, kwargs: urllib2.quote(args[0]) + '.p', work_dir=cache_dir)
def geocode2(address):
    params = { 'format'        :'json', 
               'addressdetails': 1, 
               'q'             : address}
    print 'making API request...'
    result = requests.get('http://nominatim.openstreetmap.org/search', params=params)
    print 'API request complete.'
    return result
    
address = "City Hall Park, New York, NY 10007"

In [None]:
%%time

# this creates the cached file. observe the creation of a new pickle file in the cache directory.
response = geocode2(address)
print response.json()

In [None]:
%%time

# this reads the cached file. observe that this executes ~100x faster.
# the print statements in the geocode2 function do not appear because the function itself is not executed at all.
response = geocode2(address)
print response.json()
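
To see what a file-backed cache of this kind does under the hood, here is a minimal standard-library sketch (Python 3 assumed). It is not ediblepickle's implementation: `file_cache` and `slow_lookup` are hypothetical names, the decorator handles only single-argument functions, and a list append stands in for the API request. One detail worth noting is that `quote` leaves `/` unescaped by default, so the sketch passes `safe=""` to keep an argument containing slashes from being read as a path.

```python
import functools
import os
import pickle
import tempfile
from urllib.parse import quote

def file_cache(cache_dir):
    """Cache a one-argument function's results in pickle files under cache_dir."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(arg):
            # safe="" also escapes "/", which quote() leaves alone by default.
            path = os.path.join(cache_dir, quote(arg, safe="") + ".p")
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)   # cache hit: func is never called
            result = func(arg)
            with open(path, "wb") as fh:
                pickle.dump(result, fh)      # cache miss: store for next time
            return result
        return wrapper
    return decorator

calls = []
cache_dir = tempfile.mkdtemp()

@file_cache(cache_dir)
def slow_lookup(address):
    calls.append(address)                    # stands in for the API request
    return {"query": address}

first = slow_lookup("City Hall Park, New York, NY 10007")
second = slow_lookup("City Hall Park, New York, NY 10007")
print(len(calls))  # 1
```

As with any cache, stale files must be deleted by hand if the underlying data changes.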

### Exercises


1. Write a Python script that takes as input an address and outputs 50 tweets from within about 10 miles of it.
Now modify it to return the top 10 hashtags within that 10 mile range (based on, say, a 1000 tweet sample).
1. You can plot maps using this [Python Package](http://peak5390.wordpress.com/2012/12/08/matplotlib-basemap-tutorial-plotting-points-on-a-simple-map/).  Get geo-located tweets from the streaming API and plot them on the map.

### Further reading for this lecture


To learn more about JSON (there isn't much more to know!):
 - http://www.secretgeek.net/json_3mins.asp
 - http://en.wikipedia.org/wiki/JSON (esp. "Data types, syntax, and examples")
 - http://tools.ietf.org/html/rfc7159

A useful tool for playing with JSON on the command line is [`jq`](http://stedolan.github.io/jq/).

To learn more about the prevailing design pattern ("REST") for web-based APIs:
 - http://en.wikipedia.org/wiki/Representational_state_transfer
 
One wild card is the wide variety of authentication strategies employed ("basic auth", cookies, bearer token, OAuth, OAuth 2, etc.).  For several of these, the documentation at http://docs.python-requests.org/en/latest/user/authentication/ is helpful.

### Exit Tickets

1. Explain the difference between requests.get() and requests.post().
2. What Python data structures are JSON objects converted to?
3. Describe what the remote site is doing when it receives an API request from you.

*Copyright &copy; 2015 The Data Incubator.  All rights reserved.*