# Introductory Geocoding and Mapping

Juan Shishido

School of Information

GSR, D-Lab

## Imports

In [1]:
import json
import requests
import pandas
from pprint import pprint

## Using the APIs

A function with options for both DSTK and Photon.

This function is for demonstration only. In most situations you probably won't want to combine the DSTK and Photon APIs in a single function, although that comes down to preference. Note that Photon returns more information than DSTK and, in some cases, multiple results.

In [2]:
def single_address(address, api='dstk'):
    '''
    Individual address lookup with
    either DSTK or Photon
    
    Default is DSTK's /street2coordinates
    For DSTK's Google-style: 'google'
    For Photon: 'photon'
    
    Address must be a string
    '''
    
    # API check
    assert api in ('dstk', 'google', 'photon')
    
    # Type check
    assert isinstance(address, str)
    
    # /street2coordinates
    dstk_dstk = 'http://www.datasciencetoolkit.org/street2coordinates/'
    
    # Google-style
    dstk_google = 'http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address='
    
    # Photon
    photon = 'http://photon.komoot.de/api/?q='
    
    # API
    if api == 'dstk':
        url_prefix = dstk_dstk
    elif api == 'google':
        url_prefix = dstk_google
    elif api == 'photon':
        url_prefix = photon
    
    # URL
    url = url_prefix + address.replace(' ', '+')
    
    # Response
    response = requests.get(url)
    return response.json()
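
Replacing spaces with `+` leaves other characters, such as commas or `#`, unescaped. Below is a minimal sketch of a more defensive way to build the URL, assuming Python 3 and the standard library's `urllib.parse.quote_plus`. The helper name `build_url` and the `URL_PREFIXES` dictionary are hypothetical and not part of the function above, and how each service treats percent-encoded commas has not been verified here.

```python
from urllib.parse import quote_plus

# Same three endpoints used by single_address
URL_PREFIXES = {
    'dstk': 'http://www.datasciencetoolkit.org/street2coordinates/',
    'google': 'http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=',
    'photon': 'http://photon.komoot.de/api/?q=',
}

def build_url(address, api='dstk'):
    '''Hypothetical helper: return the request URL for an address.'''
    if api not in URL_PREFIXES:
        raise ValueError('api must be one of: ' + ', '.join(sorted(URL_PREFIXES)))
    # quote_plus turns spaces into '+' and percent-encodes characters like ','
    return URL_PREFIXES[api] + quote_plus(address)

print(build_url('1600 Amphitheatre Pkwy, Mountain View, CA', 'photon'))
```

Looking the prefix up in a dictionary also removes the need for the `if`/`elif` chain, since an unknown API name fails in one place.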

## Data Science Toolkit

### Street Address to Coordinates

In [3]:
google_hq = single_address('1600 Amphitheatre Pkwy, Mountain View, CA')
pprint(google_hq)

{u'1600 Amphitheatre Pkwy, Mountain View, CA': {u'confidence': 0.902,
                                                u'country_code': u'US',
                                                u'country_code3': u'USA',
                                                u'country_name': u'United States',
                                                u'fips_county': u'06085',
                                                u'latitude': 37.423471,
                                                u'locality': u'Mountain View',
                                                u'longitude': -122.086546,
                                                u'region': u'CA',
                                                u'street_address': u'1600 Amphitheatre Pkwy',
                                                u'street_name': u'Amphitheatre Pkwy',
                                                u'street_number': u'1600'}}
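
The response is a dictionary keyed by the address string that was submitted, so getting the coordinates means looking one level down. Below is a minimal sketch, assuming Python 3 and using an abridged copy of the output above as sample data. The helper name `extract_coordinates` is hypothetical, and the `None` handling rests on the assumption that an address DSTK cannot geocode maps to a null value.

```python
# Sample response copied (abridged) from the output above
response = {
    '1600 Amphitheatre Pkwy, Mountain View, CA': {
        'confidence': 0.902,
        'latitude': 37.423471,
        'longitude': -122.086546,
    }
}

def extract_coordinates(response):
    '''Hypothetical helper: map each address to a (latitude, longitude) tuple.'''
    coordinates = {}
    for address, result in response.items():
        # Assumption: an address that could not be geocoded maps to None
        if result is None:
            coordinates[address] = None
        else:
            coordinates[address] = (result['latitude'], result['longitude'])
    return coordinates

print(extract_coordinates(response))
```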


### Google-style

In [4]:
google = single_address('1600 Amphitheatre Pkwy, Mountain View, CA', 'google')
pprint(google)

{u'results': [{u'address_components': [{u'long_name': u'1600',
                                        u'short_name': u'1600',
                                        u'types': [u'street_number']},
                                       {u'long_name': u'Amphitheatre Pkwy',
                                        u'short_name': u'Amphitheatre Pkwy',
                                        u'types': [u'route']},
                                       {u'long_name': u'Mountain View',
                                        u'short_name': u'Mountain View',
                                        u'types': [u'locality',
                                                   u'political']},
                                       {u'long_name': u'CA',
                                        u'short_name': u'CA',
                                        u'types': [u'administrative_area_level_1',
                                                   u'political']},
                                     

DSTK provides a Google-style option to make it easier for people already using Google's geocoding API. Simply replace `maps.googleapis.com` with `www.datasciencetoolkit.org`.
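
The host swap can be sketched in a few lines. This assumes Python 3 and the standard library's `urllib.parse`. The helper name `to_dstk` is hypothetical, and the sample URL simply mirrors the Google-style prefix used in `single_address`.

```python
from urllib.parse import urlsplit, urlunsplit

def to_dstk(google_url):
    '''Hypothetical helper: point a Google geocoding URL at DSTK instead.'''
    parts = urlsplit(google_url)
    if parts.netloc != 'maps.googleapis.com':
        raise ValueError('expected a maps.googleapis.com URL')
    # Keep the path and query string; change only the host
    return urlunsplit(parts._replace(netloc='www.datasciencetoolkit.org'))

google_url = ('http://maps.googleapis.com/maps/api/geocode/json'
              '?sensor=false&address=1600+Amphitheatre+Pkwy,+Mountain+View,+CA')
print(to_dstk(google_url))
```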

## Photon

In [5]:
photon_result = single_address('1600 Amphitheatre Pkwy, Mountain View, CA', 'photon')
pprint(photon_result)

{u'features': [{u'geometry': {u'coordinates': [-122.0850862, 37.4228139],
                              u'type': u'Point'},
                u'properties': {u'city': u'Mountain View',
                                u'country': u'United States of America',
                                u'housenumber': u'1600',
                                u'name': u'Google Headquaters',
                                u'osm_id': 2192620021,
                                u'osm_key': u'office',
                                u'osm_type': u'N',
                                u'osm_value': u'commercial',
                                u'postcode': u'94043',
                                u'state': u'California',
                                u'street': u'Amphitheatre Parkway'},
                u'type': u'Feature'}],
 u'type': u'FeatureCollection'}


Photon provides much more data than DSTK. It can also return multiple entries. In those cases, you'll need to parse the JSON to get what you need.
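
Below is a minimal sketch of that parsing, assuming Python 3 and using an abridged copy of the output above as sample data. The helper name `summarize_features` and the choice of fields to keep are hypothetical. Note that GeoJSON points list longitude first, then latitude.

```python
# Sample response copied (abridged) from the output above
photon_response = {
    'type': 'FeatureCollection',
    'features': [
        {
            'type': 'Feature',
            'geometry': {'type': 'Point',
                         'coordinates': [-122.0850862, 37.4228139]},
            'properties': {'name': 'Google Headquaters',
                           'city': 'Mountain View',
                           'state': 'California',
                           'postcode': '94043'},
        },
    ],
}

def summarize_features(response):
    '''Hypothetical helper: one small dict per returned feature.'''
    rows = []
    for feature in response.get('features', []):
        # GeoJSON stores points as [longitude, latitude]
        longitude, latitude = feature['geometry']['coordinates']
        properties = feature['properties']
        rows.append({
            'name': properties.get('name'),
            'city': properties.get('city'),
            'latitude': latitude,
            'longitude': longitude,
        })
    return rows

for row in summarize_features(photon_response):
    print(row)
```

Using `.get()` for the properties keeps the loop from failing when a feature lacks a field such as `city`, which can happen because the available properties vary by result.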

## Batch Geocoding

### Command-line Interface

> The standard IPython kernel allows running code in other languages using the %%magic syntax.

You can use cURL to access the DSTK API and even save the output to a file. First, invoke the `%%bash` cell magic. It applies only to the cell in which it's called.

#### Single address

With the code below, you're submitting a POST request to the DSTK server. cURL prints the JSON response, followed by its progress meter: a small table of transfer statistics.

In [6]:
%%bash

curl -d "1600 Amphitheatre Pkwy, Mountain View, CA" \
     http://www.datasciencetoolkit.org/street2coordinates

{
  "1600 Amphitheatre Pkwy, Mountain View, CA": {
    "country_code3": "USA",
    "latitude": 37.423471,
    "country_name": "United States",
    "longitude": -122.086546,
    "street_address": "1600 Amphitheatre Pkwy",
    "region": "CA",
    "confidence": 0.902,
    "street_number": "1600",
    "locality": "Mountain View",
    "street_name": "Amphitheatre Pkwy",
    "fips_county": "06085",
    "country_code": "US"
  }
}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   467  100   426  100    41   1148    110 --:--:-- --:--:-- --:--:--  1151


Note: The backslash in the command is simply to allow us to type the command across multiple lines.
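
The same POST request can be built from Python. This is a minimal sketch, assuming Python 3 and the standard library's `urllib.request` so that it runs without extra packages; with the `requests` library imported earlier, `requests.post(url, data=address)` would play the same role. The helper name `build_post_request` is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.request import Request

DSTK_URL = 'http://www.datasciencetoolkit.org/street2coordinates'

def build_post_request(address, url=DSTK_URL):
    '''Hypothetical helper: build (but do not send) the POST request.'''
    # The address is the request body, like the argument to curl's -d flag
    return Request(url, data=address.encode('utf-8'), method='POST')

request = build_post_request('1600 Amphitheatre Pkwy, Mountain View, CA')
print(request.get_method(), request.full_url)

# To actually send it (requires network access and a reachable server):
# from urllib.request import urlopen
# import json
# with urlopen(request) as response:
#     result = json.loads(response.read().decode('utf-8'))
```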

#### Addresses from a file

This command has three main components (listed from back-to-front):

    http://www.datasciencetoolkit.org/street2coordinates
    
    -d @data/bartaddresses.txt
    
    -o data/bartcoordinates.json

The first of these tells cURL the location of the API.

The second specifies the data to send to the API, using the `-d` flag. The `@` prefix tells cURL to read that data from a file.

The `-o` flag and the argument that follows it tell cURL to save the output to a file named `bartcoordinates.json`.
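
When `-d` reads from a file, cURL strips carriage returns and newlines, so a file with one address per line would be sent as a single run-together string. Below is a minimal sketch of preparing an input file, assuming Python 3 and assuming (the notebook does not show the file) that the input holds a JSON array of address strings. The helper name `write_address_file` and the sample addresses are hypothetical placeholders.

```python
import json
import os
import tempfile

def write_address_file(addresses, path):
    '''Hypothetical helper: save addresses as a single-line JSON array.'''
    with open(path, 'w') as f:
        json.dump(addresses, f)

# Placeholder addresses, not the contents of the real BART file
addresses = [
    '1600 Amphitheatre Pkwy, Mountain View, CA',
    '123 Example St, Springfield, CA',
]

# Written to a temporary directory here; the notebook's own input is data/bartaddresses.txt
path = os.path.join(tempfile.mkdtemp(), 'addresses.txt')
write_address_file(addresses, path)

with open(path) as f:
    print(f.read())
```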

In [7]:
%%bash

curl -o data/bartcoordinates.json -d @data/bartaddresses.txt \
     http://www.datasciencetoolkit.org/street2coordinates

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 15 20151    6  1250  100  1940   1883   2922  0:00:09 --:--:--  0:00:09  2921
100 20151  100 18211  100  1940  20524   2186 --:--:-- --:--:-- --:--:-- 20507


Look in `./data` to find your geocoded addresses.

## Processing the Data