Consuming APIs (and JSON)
================
Consuming APIs is supposed to be easy (that's the point of having an API).  

Let's look at a simple example of consuming a JSON API.  The example we'll look at is a *geocoder*: that is, a service for converting between addresses and normalized geographic information (e.g., latitude and longitude).  Going from addresses to normalized form is "forward geocoding" and going the other way is "reverse geocoding".

We'll interact with a free (and non-authenticated) geocoder run by OpenStreetMap:

In [1]:
import urllib2, json
def geocode(address):
    url = "http://nominatim.openstreetmap.org/search?q=%s&addressdetails=1&format=json" % (urllib2.quote(address),)
    ret = urllib2.urlopen(url).read()
    return json.loads(ret)

In [2]:
my_home = geocode("865 page st, san francisco, ca 94117")
my_home

[{u'address': {u'city': u'SF',
   u'country': u'United States of America',
   u'country_code': u'us',
   u'county': u'SF',
   u'house_number': u'865',
   u'neighbourhood': u'North of Panhandle',
   u'postcode': u'94117',
   u'road': u'Page Street',
   u'state': u'California'},
  u'boundingbox': [u'37.772362897959',
   u'37.772462897959',
   u'-122.43498053061',
   u'-122.43488053061'],
  u'class': u'place',
  u'display_name': u'865, Page Street, North of Panhandle, SF, California, 94117, United States of America',
  u'importance': 0.301,
  u'lat': u'37.7724128979592',
  u'licence': u'Data \xa9 OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright',
  u'lon': u'-122.434930530612',
  u'place_id': u'454066711',
  u'type': u'house'}]

You can also select elements out of JSON blobs in "the natural way":

In [3]:
my_home[0]['boundingbox']

[u'37.772362897959',
 u'37.772462897959',
 u'-122.43498053061',
 u'-122.43488053061']

In [4]:
my_home[0]['address']

{u'city': u'SF',
 u'country': u'United States of America',
 u'country_code': u'us',
 u'county': u'SF',
 u'house_number': u'865',
 u'neighbourhood': u'North of Panhandle',
 u'postcode': u'94117',
 u'road': u'Page Street',
 u'state': u'California'}

Things to note:
---------------

1.  In this case, the request parameters were encoded in the URL itself.  This is usually the case for simple "`GET`" queries.  Because our string contained characters like spaces, we had to "URL encode" it (this is what `urllib2.quote` does).  It's usually a bad idea to do your own encoding like this: below we'll talk about the `requests` library, which lets us avoid this.

2. The result was returned to us in the form of _JSON_.  JSON is JavaScript Object Notation -- it's a human-readable, text-based format for transmitting key-value pairs (and strings, numbers, and arrays).  The `json` package lets us convert between this and Python's native dictionaries (etc.).
 
3. This was a public API, with no authentication.  We'll go through an example of the code for an authenticated API at the end -- the example will be the free Twitter stream.  (The reason we didn't do this up front is that you can't run the code without signing up for an API key, etc.)
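As an illustration of note 1, the standard library can build the whole query string from a dictionary, so nothing has to be quoted by hand. The following is a minimal sketch, written for Python 3 (where the helpers from `urllib2` live in `urllib.parse`); the function name `build_search_url` is made up for this example.

```python
# Python 3 sketch: build the geocoder URL from a dict instead of string formatting.
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://nominatim.openstreetmap.org/search"  # same endpoint as in the cells above


def build_search_url(address):
    """Return the search URL with every parameter percent-encoded."""
    params = {"q": address, "addressdetails": 1, "format": "json"}
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("1600 Pennsylvania Avenue, Washington, DC")
print(url)

# Decoding the query string recovers the original address unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

Note that `urlencode` writes spaces as `+` rather than `%20`; both forms are commonly accepted inside query strings.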

In [5]:
address = "1600 Pennsylvania Avenue, Washington, DC"
urllib2.quote(address)
url = "http://nominatim.openstreetmap.org/search?q=%s&addressdetails=1&format=json" % (urllib2.quote(address),)

print address
print
print url
print
print urllib2.urlopen(url).read()
print
print json.loads(urllib2.urlopen(url).read())

1600 Pennsylvania Avenue, Washington, DC

http://nominatim.openstreetmap.org/search?q=1600%20Pennsylvania%20Avenue%2C%20Washington%2C%20DC&addressdetails=1&format=json

[{"place_id":"2577064361","licence":"Data © OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright","osm_type":"way","osm_id":"238241022","boundingbox":["38.8974898","38.897911","-77.0368539","-77.0362521"],"lat":"38.8976989","lon":"-77.036553192281","display_name":"The White House, 1600, Pennsylvania Avenue Northwest, Monumental Core, District of Columbia, 20500, United States of America","class":"building","type":"yes","importance":0.80767573872961,"address":{"building":"The White House","house_number":"1600","pedestrian":"Pennsylvania Avenue Northwest","neighbourhood":"Monumental Core","state":"District of Columbia","postcode":"20500","country":"United States of America","country_code":"us"}},{"place_id":"27049449","licence":"Data © OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstree

To make it easier to see what's going on, let's pretty-print that JSON object:

    [
       {"place_id":"9163027846",
        "licence":"Data \u00a9 OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright",
        "osm_type":"way",
        "osm_id":"11557939",
        "boundingbox": ["39.655891418457", "39.6572189331055", 
                        "-77.5709609985352", "-77.5705108642578"],
        "lat":"39.6566765",
        "lon":"-77.5708067",
        "display_name":"Pennsylvania Avenue, Smithsburg, Washington, Maryland, 21783, United States of America",
        "class":"highway",
        "type":"tertiary",
        "importance":0.41,
        "address": {"road":"Pennsylvania Avenue",
                    "town":"Smithsburg", 
                    "county":"Washington", 
                    "state":"Maryland", 
                    "postcode":"21783", 
                    "country":"United States of America", 
                    "country_code":"us"
                   }
       }
    ]

Just like in Python, `[..]` is for arrays (Python lists) and `{..}` is for objects (Python dictionaries).  This is pretty much all there is to JSON.
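To make the correspondence concrete, here is a small sketch (Python 3, standard library only) that parses a trimmed-down response in the shape shown above and then pretty-prints it again. The `raw` string is a shortened stand-in that reuses a few values from the example, not a full API response.

```python
import json

# A trimmed-down response in the shape shown above (values copied from the example).
raw = ('[{"lat": "39.6566765", "lon": "-77.5708067", "importance": 0.41, '
       '"address": {"town": "Smithsburg", "state": "Maryland"}}]')

results = json.loads(raw)  # JSON text -> Python objects

# JSON array -> list, JSON object -> dict, JSON string -> str, JSON number -> float
print(type(results).__name__)
print(type(results[0]).__name__)
print(type(results[0]["lat"]).__name__)
print(type(results[0]["importance"]).__name__)

# Python objects -> indented JSON text
print(json.dumps(results, indent=4, sort_keys=True))
```

The `indent` argument to `json.dumps` is what produces the one-key-per-line layout; `sort_keys` just makes the order predictable.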


**Exercise**: There's also a free API for weather information at http://api.openweathermap.org.  

A sample request might look something like
        http://api.openweathermap.org/data/2.5/weather?lat=35&lon=139
        
        
Use the geocoder to write a function

        def weather_at_address(address):
            ....
            
that gets the current weather (temperature, cloudy or not) from a human entered address.
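One detail that matters for this exercise: the geocoder returns `lat` and `lon` as strings, and its result is a list that may hold several matches or none. The sketch below (Python 3) shows one way to turn a result into numeric coordinates; the helper name `coordinates` is hypothetical and the sample data is copied from the output earlier in this section.

```python
def coordinates(results):
    """Return (lat, lon) as floats for the first geocoder match, or None if there is none."""
    if not results:
        return None
    first = results[0]
    return float(first["lat"]), float(first["lon"])


# Sample data in the shape of the geocoder output above (values copied from it).
sample = [{"lat": "37.7724128979592", "lon": "-122.434930530612"}]
print(coordinates(sample))
print(coordinates([]))
```

Taking the first match is only one possible choice; the rest of the exercise (calling the weather API with these coordinates) is left to you.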

Easier parameter handling
--------------------
The `urllib2` module requires an enormous amount of work to perform even the simplest tasks. The `requests` library provides a higher-level way to make web requests. This is already nice in examples like the one above, where we need to URL-encode parameters into the URL.  It is even more convenient when `POST` parameters (or cookies, or authentication, or...) are also involved.  (Don't worry if you don't know what that means.)

In [6]:
import requests
def geocode(address):
    params = { 'format'        :'json', 
               'addressdetails': 1, 
               'q'             : address}
    r = requests.get('http://nominatim.openstreetmap.org/search', params = params)
    return r.json()

In [7]:
x = geocode("107 Page St., San Francisco")

In [8]:
x[0]

{u'address': {u'city': u'SF',
  u'country': u'United States of America',
  u'country_code': u'us',
  u'county': u'SF',
  u'house_number': u'107',
  u'neighbourhood': u'Western Addition',
  u'postcode': u'94102',
  u'road': u'Page Street',
  u'state': u'California'},
 u'boundingbox': [u'37.773924413793',
  u'37.774024413793',
  u'-122.42266903448',
  u'-122.42256903448'],
 u'class': u'place',
 u'display_name': u'107, Page Street, Western Addition, SF, California, 94102, United States of America',
 u'importance': 0.201,
 u'lat': u'37.7739744137931',
 u'licence': u'Data \xa9 OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright',
 u'lon': u'-122.422619034483',
 u'place_id': u'454203115',
 u'type': u'house'}

Authenticated APIs
--------------------
Lots of interesting APIs are free (or at least free for moderate use) but still require you to register first.  The `requests` library (together with some supporting ones, e.g., `requests_oauthlib`) makes it easy to consume these too.

**Exercise**: In order to access the Twitter API, you must first sign up: create an app on http://apps.twitter.com, get an access token, et voilà, you have your shiny new credentials -- consisting of four pieces of data:

In [11]:
import simplejson
from requests_oauthlib import OAuth1

with open("../secrets/twitter_secrets.json.nogit") as fh:
    secrets = simplejson.loads(fh.read())
    
# create an auth object
auth = OAuth1(
    secrets["api_key"],
    secrets["api_secret"],
    secrets["access_token"],
    secrets["access_token_secret"]
)

IOError: [Errno 2] No such file or directory: 'secrets/twitter_secrets.json.sample'
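The four pieces of data are the keys read from the secrets file in the cell above. As a minimal sketch (Python 3, standard `json` module), a file like that could be loaded and checked as follows. The helper name `load_secrets` and the placeholder values are hypothetical; only the four key names come from the cell above.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("api_key", "api_secret", "access_token", "access_token_secret")


def load_secrets(path):
    """Read a JSON secrets file and fail early if a credential is missing."""
    with open(path) as fh:
        secrets = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if key not in secrets]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return secrets


# Demonstration with a throwaway file holding placeholder values (not real credentials).
placeholder = {key: "PLACEHOLDER" for key in REQUIRED_KEYS}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(placeholder, tmp)
try:
    secrets = load_secrets(tmp.name)
    print(sorted(secrets))
finally:
    os.remove(tmp.name)
```

Checking the keys up front gives a clearer error than a `KeyError` raised halfway through building the `OAuth1` object.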

In [12]:
# See all of Michael's friends
r = requests.get("https://api.twitter.com/1.1/friends/ids.json", auth=auth, params={'screen_name' : 'tianhuil'})
michaels_friends=r.json()

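# users/lookup expects a comma-separated list of IDs (at most 100 per request)
ids = ','.join(str(i) for i in michaels_friends['ids'][:100])
r2 = requests.post('https://api.twitter.com/1.1/users/lookup.json', auth=auth, data={'user_id' : ids})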
friends_info = r2.json()
[(f['screen_name'], f['name']) for f in friends_info]

NameError: name 'auth' is not defined

In [13]:
## Requests also makes it easy to deal with simple streaming APIs.  Let's stream 100 tweets from the Twitter feed.

import json, sys
r_stream = requests.get('https://stream.twitter.com/1.1/statuses/sample.json', auth=auth, stream=True)
counter = 0
for line in r_stream.iter_lines():
    # filter out keep-alive new lines
    if not line:
        continue
    tweet = json.loads(line)
    if 'text' in tweet:
        counter +=1
        print tweet['text']
    sys.stdout.flush()
    if counter >= 100:
        break

NameError: name 'auth' is not defined

In [None]:
## Here's a variant that's more US-centric.

import json, sys
r_stream = requests.post('https://stream.twitter.com/1.1/statuses/filter.json', auth=auth,
                          stream=True, data={"locations" : "-125,23,-70,50"} )
counter = 0
for line in r_stream.iter_lines():
    # filter out keep-alive new lines
    if not line:
        continue
    tweet = json.loads(line)
    if 'text' in tweet:
        counter +=1
        print tweet['text']
    sys.stdout.flush()
    if counter >= 100:
        break
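Both streaming cells share the same loop: skip blank keep-alive lines, parse each remaining line as JSON, and keep only messages that carry a `text` field. The sketch below (Python 3) pulls that loop into a function so it can be tried without credentials. The name `take_texts` is hypothetical, and `fake_lines` is made-up data standing in for `r_stream.iter_lines()`.

```python
import json


def take_texts(lines, limit):
    """Collect the 'text' field of up to `limit` tweets from an iterable of JSON lines."""
    texts = []
    for line in lines:
        if not line:              # skip keep-alive blank lines
            continue
        tweet = json.loads(line)
        if "text" in tweet:       # skip messages without text, such as delete notices
            texts.append(tweet["text"])
            if len(texts) >= limit:
                break
    return texts


# Stand-in for r_stream.iter_lines(): made-up messages, not real tweets.
fake_lines = [b'{"text": "first"}', b"", b'{"delete": {}}',
              b'{"text": "second"}', b'{"text": "third"}']
print(take_texts(fake_lines, 2))
```

Returning a list instead of printing inside the loop makes the result easy to reuse, for example when counting hashtags.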

Exercises
----------
1. Write a Python script that takes as input an address and outputs 50 tweets from within about 10 miles of it.
Now modify it to return the top 10 hashtags within that 10-mile range (based on, say, a 1000-tweet sample).
1. You can plot maps using this [Python package](http://peak5390.wordpress.com/2012/12/08/matplotlib-basemap-tutorial-plotting-points-on-a-simple-map/).  Get geo-located tweets from the streaming API and plot them on the map.
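As a hint for the first exercise, a point and a radius can be turned into a rough bounding box. This is a sketch under stated assumptions (Python 3): one degree of latitude is taken as about 69 miles, the box is a flat approximation that breaks down near the poles, and the output uses the same longitude,latitude ordering as the `locations` value in the streaming cell above (west, south, east, north). The function name `bounding_box` is hypothetical.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # rough average; an approximation


def bounding_box(lat, lon, radius_miles):
    """Return 'west,south,east,north' for a box roughly radius_miles around (lat, lon)."""
    dlat = radius_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    corners = (lon - dlon, lat - dlat, lon + dlon, lat + dlat)
    return ",".join("%.4f" % value for value in corners)


box = bounding_box(37.7724, -122.4349, 10)
print(box)
```

A box is not a circle, so tweets in the corners lie a little further away than the requested radius; filtering by true distance afterwards is one way to tighten this.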

Further reading for this lecture
------------
To learn more about JSON (there isn't much more to know!):
 - http://www.secretgeek.net/json_3mins.asp
 - http://en.wikipedia.org/wiki/JSON (esp., "Data types, syntax, and examples")
 - http://tools.ietf.org/html/rfc7159

A useful tool for playing with JSON on the command line is [jq](http://stedolan.github.io/jq/).

To learn more about the prevailing design pattern ("REST") for web-based APIs:
 - http://en.wikipedia.org/wiki/Representational_state_transfer
 
One wildcard is the wide variety of authentication strategies employed ("basic auth", cookies, bearer token, OAuth, OAuth 2, ...).  For several of these, the documentation at http://docs.python-requests.org/en/latest/user/authentication/ is helpful.

*Copyright &copy; 2014 The Data Incubator.  All rights reserved.*