In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 2 * matplotlib.rcParams['savefig.dpi']

# Consuming APIs (and JSON)

Consuming APIs is supposed to be easy (that's the point of having an API).  

Let's look at a simple example of consuming a JSON API.  The example we'll look at is a *geocoder*: that is, a service for converting between addresses and normalized geographic information (e.g. latitude and longitude).  Going from addresses to normalized form is "forward geocoding" and going the other way is "reverse geocoding".

We'll interact with Nominatim, a free (and non-authenticated) geocoder run by OpenStreetMap:

In [None]:
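import urllib2
import json

def geocode(address):
    url = "https://nominatim.openstreetmap.org/search?q=%s&addressdetails=1&format=json" % (urllib2.quote(address),)
    # Nominatim's usage policy asks for a User-Agent that identifies your application
    # (stock library defaults may be refused); this value is a placeholder to replace
    request = urllib2.Request(url, headers={'User-Agent': 'datacourse-geocoding-example'})
    ret = urllib2.urlopen(request).read()
    return json.loads(ret)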

my_home = geocode("865 page st, san francisco, ca 94117")
my_home
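
The same service also does reverse geocoding through its `/reverse` endpoint, which takes a `lat` and `lon` instead of a free-text `q`. The sketch below (Python 3, standard library only; `reverse_geocode_url` is a hypothetical helper name) only builds the request URL and sends nothing.

```python
from urllib.parse import urlencode

def reverse_geocode_url(lat, lon):
    """Build a Nominatim reverse-geocoding URL (no request is sent)."""
    params = {'lat': lat, 'lon': lon, 'format': 'json', 'addressdetails': 1}
    # urlencode turns the dict into "key=value&key=value", escaping values as needed
    return 'https://nominatim.openstreetmap.org/reverse?' + urlencode(params)

print(reverse_geocode_url(39.6566765, -77.5708067))
```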

You can also select out elements of JSON blobs in "the natural way":

In [None]:
my_home[0]['boundingbox']
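
Note that Nominatim sends `lat`, `lon`, and the `boundingbox` entries as strings, so they need converting before any arithmetic; `boundingbox` is ordered [min latitude, max latitude, min longitude, max longitude]. A sketch (Python 3) that works on a canned response trimmed from the 1600 Pennsylvania Avenue result shown later in this notebook, so it runs without network access:

```python
import json

# A trimmed-down response in the shape Nominatim returns (values copied from
# the 1600 Pennsylvania Avenue example in this notebook)
raw = '''[{"lat": "39.6566765", "lon": "-77.5708067",
           "boundingbox": ["39.655891418457", "39.6572189331055",
                           "-77.5709609985352", "-77.5705108642578"],
           "address": {"town": "Smithsburg", "state": "Maryland"}}]'''

results = json.loads(raw)
best = results[0]                   # the top level is a list with one dict per match

# Coordinates arrive as strings, so convert them to floats
lat, lon = float(best['lat']), float(best['lon'])

# boundingbox is [min lat, max lat, min lon, max lon]
south, north, west, east = [float(v) for v in best['boundingbox']]

state = best['address']['state']    # nested dicts are indexed the same way
print(lat, lon, state)
```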

### Things to note:

1.  In this case, the request parameters were encoded in the URL itself, which is typical for simple "`GET`" queries.  Because our address contained characters like spaces and commas, we had to "URL encode" it (this is what `urllib2.quote` does).  Assembling URLs by hand like this is error-prone; below we'll talk about the `requests` library, which does the encoding for us.

2. The result was returned to us in the form of _JSON_.  JSON (JavaScript Object Notation) is a human-readable, text-based format for transmitting key-value pairs, along with strings, numbers, booleans, `null`, and arrays.  The `json` package converts between this format and Python's native dictionaries, lists, and so on.
 
3. This was a public API, with no authentication.  At the end we'll go through the code for an authenticated API, using the free Twitter stream as the example.  (We don't start with it because you can't run the code without signing up for an API key, etc.)
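
The notebook uses Python 2's `urllib2.quote`; in Python 3 the same helpers live in `urllib.parse`. A sketch (Python 3) contrasting encoding a single value with building a whole query string from a dict:

```python
from urllib.parse import quote, urlencode

address = "865 page st, san francisco, ca 94117"

# quote() percent-encodes one value: spaces become %20, commas %2C
encoded = quote(address)

# urlencode() builds the whole query string from a dict; it escapes values
# with quote_plus, so spaces become '+' instead of %20
query = urlencode({'q': address, 'format': 'json', 'addressdetails': 1})
url = "https://nominatim.openstreetmap.org/search?" + query

print(encoded)
print(url)
```

Letting `urlencode` assemble the query avoids forgetting to escape one of the values, which is the same job `requests` does with its `params` argument.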

In [None]:
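address = "1600 Pennsylvania Avenue, Washington, DC"
url = "https://nominatim.openstreetmap.org/search?q=%s&addressdetails=1&format=json" % (urllib2.quote(address),)

print address
print
print url
print
# Fetch once (with an identifying placeholder User-Agent) and reuse the raw text
request = urllib2.Request(url, headers={'User-Agent': 'datacourse-geocoding-example'})
raw = urllib2.urlopen(request).read()
print raw
print
json.loads(raw)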

To make it easier to see what's going on, let's pretty-print that JSON object:

    [
       {"place_id":"9163027846",
        "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright",
        "osm_type":"way",
        "osm_id":"11557939",
        "boundingbox": ["39.655891418457", "39.6572189331055", 
                        "-77.5709609985352", "-77.5705108642578"],
        "lat":"39.6566765",
        "lon":"-77.5708067",
        "display_name":"Pennsylvania Avenue, Smithsburg, Washington, Maryland, 21783, United States of America",
        "class":"highway",
        "type":"tertiary",
        "importance":0.41,
        "address": {"road":"Pennsylvania Avenue",
                    "town":"Smithsburg", 
                    "county":"Washington", 
                    "state":"Maryland", 
                    "postcode":"21783", 
                    "country":"United States of America", 
                    "country_code":"us"
                   }
       }
    ]
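
To produce this kind of layout yourself, `json.dumps` accepts an `indent` argument (and `sort_keys` for a stable key order); from a shell, `python -m json.tool` does the same for a file or piped input. A sketch (Python 3) on a small object with the same nesting as the response above:

```python
import json

# A small object with the same nesting as the response above
data = json.loads('[{"lat": "39.6566765", "address": {"town": "Smithsburg", "state": "Maryland"}}]')

# indent adds newlines and indentation; sort_keys orders keys alphabetically
pretty = json.dumps(data, indent=2, sort_keys=True)
print(pretty)
```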

Just like in Python, `[...]` denotes an array (a Python list) and `{...}` an object (a Python dictionary).  This is pretty much all there is to JSON.
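
The main differences from Python syntax are the literals `true`, `false`, and `null`, and the rule that object keys must be strings. A short sketch (Python 3 syntax; under the notebook's Python 2, strings come back as `unicode`) of how `json.loads` maps each JSON type, and how `json.dumps` maps back:

```python
import json

doc = ('{"name": "Smithsburg", "importance": 0.41, "count": 3, '
       '"tags": ["town", "us"], "active": true, "parent": null}')
obj = json.loads(doc)

# Show which Python type each JSON value became
for key in sorted(obj):
    print(key, type(obj[key]).__name__)

# The conversion runs both ways: True -> true, None -> null
print(json.dumps({'active': True, 'parent': None}))
```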


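**Exercise:** There's also a [free API](http://openweathermap.org) for weather information.  It requires signing up for a free API key, which is passed as the `appid` parameter.

A sample request might look something like `http://api.openweathermap.org/data/2.5/weather?lat=35&lon=139&appid=YOUR_API_KEY`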

Use the geocoder to write a function

        def weather_at_address(address):
            ....
            
that gets the current weather (temperature, cloudy or not) from a human-entered address.

## Handling URL parameters

The `urllib2` module makes even the simplest tasks laborious. The `requests` library provides a higher-level way to make web requests. This already helps in examples like the one above, where we need to encode parameters into the URL, and it is even more convenient when `POST` parameters (or cookies, or authentication, or...) are involved.  (Don't worry if you don't know what these are.)

In [None]:
import requests
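# Nominatim asks clients to send an identifying User-Agent; this one is a placeholder
HEADERS = {'User-Agent': 'datacourse-geocoding-example'}

def geocode(address):
    params = {'format': 'json',
              'addressdetails': 1,
              'q': address}
    # requests URL-encodes params into the query string for us
    r = requests.get('https://nominatim.openstreetmap.org/search',
                     params=params, headers=HEADERS)
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return r.json()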

In [None]:
x = geocode("107 Page St., San Francisco")

In [None]:
x[0]
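
When Nominatim can't match an address it returns an empty JSON array, so `x[0]` raises an `IndexError`. A minimal guard (Python 3; `first_result` is a hypothetical helper, not part of `requests`):

```python
import json

def first_result(results):
    """Return the top match from a parsed search response, or None if empty."""
    return results[0] if results else None

# Canned responses standing in for geocode(...) output
print(first_result(json.loads('[]')))
print(first_result(json.loads('[{"display_name": "Pennsylvania Avenue, Smithsburg"}]')))
```

With the `geocode` function above this would read `best = first_result(geocode("107 Page St., San Francisco"))`, followed by a check for `None` before indexing into `best`.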

## Authenticated APIs

Lots of interesting APIs are free (or at least free for moderate use) but still require you to register first.  The `requests` library (together with some supporting libraries, e.g. `requests_oauthlib`) makes it easy to consume these too.

**Exercise:** To access the Twitter API, you must first sign up: create an app on http://apps.twitter.com and get an access token, et voilà, you have your shiny new credentials, consisting of four pieces of data. The file `secrets/twitter_secrets.json.sample` in the datacourse repo shows the expected format; copy it, fill in your credentials, and save it as `secrets/twitter_secrets.json.nogit` (the `.nogit` extension prevents it from being tracked in the repository).

In [None]:
import json
from requests_oauthlib import OAuth1

with open("secrets/twitter_secrets.json.nogit") as fh:
    secrets = json.load(fh)

# create an auth object
auth = OAuth1(
    secrets["api_key"],
    secrets["api_secret"],
    secrets["access_token"],
    secrets["access_token_secret"]
)

In [None]:
# See all of Michael's friends
r = requests.get(
    "https://api.twitter.com/1.1/friends/ids.json",
    auth=auth,
    params={'screen_name' : 'tianhuil'}
)
michaels_friends=r.json()

r2 = requests.post(
    'https://api.twitter.com/1.1/users/lookup.json',
    auth=auth,
    # users/lookup expects a comma-separated string of IDs, not a Python list
    data={'user_id': ','.join(str(i) for i in michaels_friends['ids'][:50])}
)
friends_info = r2.json()
[(f['screen_name'], f['name']) for f in friends_info]
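
The cell above looks up only the first 50 friends. According to Twitter's v1.1 documentation, `users/lookup` accepts up to 100 comma-separated IDs per request, while `friends/ids` can return up to 5,000 IDs per page, so covering a whole friend list means batching. A sketch (Python 3; `id_batches` is a hypothetical helper, and the commented-out loop assumes the `auth` and `michaels_friends` objects defined above):

```python
def id_batches(ids, size=100):
    """Yield comma-separated strings of at most `size` IDs each."""
    for start in range(0, len(ids), size):
        yield ','.join(str(i) for i in ids[start:start + size])

batches = list(id_batches(list(range(1, 251))))
print(len(batches))                  # 3
print(len(batches[-1].split(',')))   # 50

# Hypothetical usage (requires network access and valid credentials):
# friends_info = []
# for batch in id_batches(michaels_friends['ids']):
#     r = requests.post('https://api.twitter.com/1.1/users/lookup.json',
#                       auth=auth, data={'user_id': batch})
#     friends_info.extend(r.json())
```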

In [None]:
## Requests also makes it easy to deal with simple streaming APIs.  Let's stream 100 tweets from the Twitter feed.

import json, sys
r_stream = requests.get('https://stream.twitter.com/1.1/statuses/sample.json', auth=auth, stream=True)
counter = 0
for line in r_stream.iter_lines():
    # filter out keep-alive new lines
    if not line:
        continue
    tweet = json.loads(line)
    if 'text' in tweet:
        counter +=1
        print tweet['text']
    sys.stdout.flush()
    if counter >= 100:
        break
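
The stream is newline-delimited JSON: one message per line, interleaved with blank keep-alive lines, and not every message is a tweet (deletion notices, for example, carry no `text` field). A sketch (Python 3; `tweets_from_lines` is a hypothetical helper) of the same filtering as a generator, tested on canned lines instead of a live connection:

```python
import json

def tweets_from_lines(lines, limit):
    """Yield up to `limit` parsed tweets from newline-delimited JSON lines."""
    if limit <= 0:
        return
    count = 0
    for line in lines:
        if not line.strip():          # keep-alive: a blank line, nothing to parse
            continue
        message = json.loads(line)
        if 'text' not in message:     # e.g. deletion notices carry no 'text'
            continue
        yield message
        count += 1
        if count >= limit:            # stop now rather than wait for another line
            return

# Canned lines standing in for r_stream.iter_lines()
sample = ['{"text": "first"}', '', '{"delete": {"status": {"id": 1}}}',
          '{"text": "second"}', '{"text": "third"}']
result = [t['text'] for t in tweets_from_lines(sample, 2)]
print(result)
```

Against the live stream this would read `for tweet in tweets_from_lines(r_stream.iter_lines(), 100): print(tweet['text'])`. Checking the limit right after yielding, rather than at the top of the loop, avoids blocking while the stream waits for one more message.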

In [None]:
## Here's a variant that's more US-centric.
## Question: what does islice do?

import json, sys
from itertools import islice
r_stream = requests.post('https://stream.twitter.com/1.1/statuses/filter.json', auth=auth,
                          stream=True, data={"locations" : "-125,23,-70,50"} )
for line in islice(r_stream.iter_lines(), 100):
    # filter out keep-alive new lines
    if not line:
        continue
    tweet = json.loads(line)
    if 'text' in tweet:
        print tweet['text']
    sys.stdout.flush()
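
The `locations` value is a bounding box written as comma-separated longitude,latitude pairs, southwest corner first, so `-125,23,-70,50` spans roughly the contiguous United States. The first exercise below needs a box around a geocoded point instead. A rough sketch (Python 3; `bbox_around` is hypothetical, and it assumes about 69 miles per degree of latitude, with degrees of longitude shrinking by cos(latitude)); it produces a square box, not a circle:

```python
import math

MILES_PER_DEGREE_LAT = 69.0   # rough average; fine for a coarse filter box

def bbox_around(lat, lon, miles):
    """Return 'west,south,east,north' (SW corner first, each corner as
    longitude,latitude) for a box extending about `miles` from the center."""
    dlat = miles / MILES_PER_DEGREE_LAT
    # a degree of longitude covers fewer miles away from the equator
    dlon = miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    corners = (lon - dlon, lat - dlat, lon + dlon, lat + dlat)
    return ','.join('%.4f' % c for c in corners)

print(bbox_around(0.0, 0.0, 69.0))
```

The result can be passed as `data={"locations": bbox_around(lat, lon, 10)}`, with `lat` and `lon` taken from the geocoder.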

### Exercises

1. Write a Python script that takes an address as input and outputs 50 tweets from within about 10 miles of it.
Now modify it to return the top 10 hashtags within that 10-mile radius (based on, say, a 1000-tweet sample).
1. You can plot maps in Python with matplotlib's Basemap toolkit (see this [tutorial](http://peak5390.wordpress.com/2012/12/08/matplotlib-basemap-tutorial-plotting-points-on-a-simple-map/)).  Get geo-located tweets from the streaming API and plot them on a map.

### Further reading for this lecture

To learn more about JSON (there isn't much more to know!):
 - http://www.secretgeek.net/json_3mins.asp
 - http://en.wikipedia.org/wiki/JSON (esp. "Data types, syntax, and examples")
 - http://tools.ietf.org/html/rfc7159

A useful tool for playing with JSON on the command line is [jq](http://stedolan.github.io/jq/).

To learn more about the prevailing design pattern ("REST") for web-based APIs:
 - http://en.wikipedia.org/wiki/Representational_state_transfer
 
One wildcard is the wide variety of authentication strategies employed ("basic auth", cookies, bearer token, OAuth, OAuth 2, etc.).  For several of these, the documentation at http://docs.python-requests.org/en/latest/user/authentication/ is helpful.
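
HTTP Basic auth is the simplest of these: the client sends an `Authorization` header containing `Basic` followed by the base64 encoding of `username:password` (RFC 7617). Base64 is an encoding, not encryption, so Basic auth should only be used over HTTPS. `requests` builds this header for you when you pass `auth=('user', 'password')`; the sketch below (Python 3; `basic_auth_header` is a hypothetical helper) just shows what goes over the wire, using the example credentials from the RFC.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(('%s:%s' % (username, password)).encode('utf-8'))
    return 'Basic ' + token.decode('ascii')

print(basic_auth_header('Aladdin', 'open sesame'))
```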

### Exit Tickets
1. Explain the difference between `requests.get()` and `requests.post()`.
2. Which Python data structures are used to represent JSON objects and arrays?
3. Describe what the remote site is doing when it receives an API request from you.

*Copyright &copy; 2015 The Data Incubator.  All rights reserved.*