# Exploring data - working with APIs

![](../figs/d2s1_website.png)

In [21]:
%matplotlib inline

import pandas as pd
import requests
import json

db = pd.read_csv('data/analysis_ready.csv.gz')\
       .set_index('id')\
       .loc[:, ['longitude', 'latitude']]
db.head()

| id | longitude | latitude |
|---|---|---|
| 15896822 | -0.306323 | 51.410036 |
| 4836957 | -0.290704 | 51.411482 |
| 13355982 | -0.286496 | 51.415851 |
| 13472704 | -0.292246 | 51.415723 |
| 17430865 | -0.275426 | 51.404285 |


![](../figs/d2s1_api.png)

For example:

> `https://data.police.uk/api/outcomes-at-location?date=2017-01&location_id=883498`

In [17]:
url = ('https://data.police.uk/api/outcomes-at-location?'\
       'date=2017-03&lat=53.40146&lng=-2.96459')
url

'https://data.police.uk/api/outcomes-at-location?date=2017-03&lat=53.40146&lng=-2.96459'

Try it in a browser!
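
The same address can also be assembled from its parts, which makes the query parameters explicit. This is a minimal sketch using only the standard library; the names `endpoint`, `params` and `url_from_params` are illustrative and not part of the original notebook:

```python
from urllib.parse import urlencode

# Base address of the API method, without any query parameters
endpoint = 'https://data.police.uk/api/outcomes-at-location'

# Query parameters as key/value pairs (same values as in the cell above)
params = {'date': '2017-03', 'lat': 53.40146, 'lng': -2.96459}

# urlencode joins the pairs as key=value, separated by '&'
url_from_params = endpoint + '?' + urlencode(params)
print(url_from_params)
```

Keeping the parameters in a dictionary makes it easier to swap in a different month or location later on.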

![](../figs/d2s1_json.png)

In [18]:
r = requests.get(url)

In [20]:
# Very large, showing only first 100 chars
r.content[:100]

b'[{"category":{"code":"court-result-unavailable","name":"Court result unavailable"},"date":"2017-09",'

In [25]:
crimes = json.loads(r.content)

To note:

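* We go from a raw string of bytes to Python objects (here, a list of dictionaries) thanks to the JSON format and the Python dictionary data structure (what are these?)
* Dictionaries work as a key/value system: each value is stored under, and retrieved by, its key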
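
To make the conversion concrete, here is a small sketch that needs no connection. The `raw` bytes are a shortened, hand-written stand-in for `r.content`, not an actual response:

```python
import json

# Shortened stand-in for `r.content` (bytes, as returned by requests)
raw = (b'[{"category": {"code": "court-result-unavailable", '
       b'"name": "Court result unavailable"}, '
       b'"date": "2017-09", "person_id": null}]')

records = json.loads(raw)   # JSON array  -> Python list
first = records[0]          # JSON object -> Python dictionary

print(type(records).__name__, type(first).__name__)
print(first['date'])                # value stored under the key 'date'
print(first['category']['code'])    # nested dictionaries: one key at a time
print(first['person_id'] is None)   # JSON null becomes Python None
```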

## Parsing a single crime

In [27]:
type(crimes)

list

In [28]:
len(crimes)

1032

In [29]:
cr = crimes[0]
cr

{'category': {'code': 'court-result-unavailable',
  'name': 'Court result unavailable'},
 'crime': {'category': 'shoplifting',
  'context': '',
  'id': 55659241,
  'location': {'latitude': '53.405686',
   'longitude': '-2.983495',
   'street': {'id': 910344, 'name': 'On or near Shopping Area'}},
  'location_subtype': 'DEPARTMENT STORES',
  'location_type': 'Force',
  'month': '2017-03',
  'persistent_id': 'afd54aeff44eea8dffb3cfcfb579b19774286c994a4ac5b89c75d443c90c3679'},
 'date': '2017-09',
 'person_id': None}

In [30]:
cr.keys()

dict_keys(['category', 'date', 'person_id', 'crime'])

In [31]:
cr.values()

dict_values([{'code': 'court-result-unavailable', 'name': 'Court result unavailable'}, '2017-09', None, {'category': 'shoplifting', 'location_type': 'Force', 'location': {'latitude': '53.405686', 'street': {'id': 910344, 'name': 'On or near Shopping Area'}, 'longitude': '-2.983495'}, 'context': '', 'persistent_id': 'afd54aeff44eea8dffb3cfcfb579b19774286c994a4ac5b89c75d443c90c3679', 'id': 55659241, 'location_subtype': 'DEPARTMENT STORES', 'month': '2017-03'}])

In [32]:
cr['category']

{'code': 'court-result-unavailable', 'name': 'Court result unavailable'}

Let's say we want to keep:

- Category code
- Crime ID, category, and location (lon/lat)
- Date
- Location type and subtype

In [35]:
# Pick the fields of interest out of the nested dictionary
cr_parsed = pd.Series({'category_code': cr['category']['code'],
                       'crime_category': cr['crime']['category'],
                       'crime_id': cr['crime']['id'],
                       'crime_location_subtype': cr['crime']['location_subtype'],
                       'crime_location_type': cr['crime']['location_type'],
                       'date': cr['crime']['month'],
                       'latitude': cr['crime']['location']['latitude'],
                       'longitude': cr['crime']['location']['longitude']
                      })
cr_parsed

category_code             court-result-unavailable
crime_category                         shoplifting
crime_id                                  55659241
crime_location_subtype           DEPARTMENT STORES
crime_location_type                          Force
date                                       2017-03
latitude                                 53.405686
longitude                                -2.983495
dtype: object

For advanced ninjas, here's the function version!

In [36]:
def parse_crime(cr):
    # Pick the fields of interest out of one crime record (a nested dictionary)
    cr_parsed = pd.Series({'category_code': cr['category']['code'],
                           'crime_category': cr['crime']['category'],
                           'crime_id': cr['crime']['id'],
                           'crime_location_subtype': cr['crime']['location_subtype'],
                           'crime_location_type': cr['crime']['location_type'],
                           'date': cr['crime']['month'],
                           'latitude': cr['crime']['location']['latitude'],
                           'longitude': cr['crime']['location']['longitude']
                          })
    return cr_parsed

In [38]:
parse_crime(crimes[2])

category_code             court-result-unavailable
crime_category                        public-order
crime_id                                  55655525
crime_location_subtype                        ROAD
crime_location_type                          Force
date                                       2017-03
latitude                                 53.397361
longitude                                -2.948081
dtype: object

To note:

* It works because the argument is named `cr` and that same name is used throughout the body of the function
* A function can be applied to any object, whatever it is called outside the function, as long as that object meets the function's expectations (ie. it is a dictionary with all the keys the function looks up)
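
Both points can be illustrated with a smaller function. In the sketch below, `parse_location`, `complete` and `incomplete` are made-up names and toy records, not part of the notebook:

```python
def parse_location(record):
    # Hypothetical helper: pull the coordinates out of one record
    loc = record['crime']['location']
    return {'longitude': loc['longitude'], 'latitude': loc['latitude']}

# The object passed in does not need to be called `record` (or `cr`)
complete = {'crime': {'location': {'longitude': '-2.983495',
                                   'latitude': '53.405686'}}}
print(parse_location(complete))

# A record lacking an expected key does not meet the expectations
incomplete = {'crime': {}}
try:
    parse_location(incomplete)
except KeyError as err:
    print('Missing key:', err)
```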

## Parsing all crimes from a response

In [43]:
parsed = []

for cr in crimes:
    pc = parse_crime(cr)
    parsed.append(pc)

parsed = pd.DataFrame(parsed)

To note:

* `for` loop
* Set up `parsed` first, then `append`
* Transform a list of `Series` objects into a `DataFrame`
* Reuse of name `parsed`

Alternative approach: `map`

In [47]:
parsed = pd.DataFrame(list(map(parse_crime, crimes)))
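
A list comprehension is a third way to write the same step. The sketch below uses a toy function and toy records rather than `parse_crime` and the API response (`get_code` and `toy` are made up for illustration) to show that the three forms build the same list:

```python
def get_code(record):
    # Toy stand-in for a parsing function
    return record['category']['code']

toy = [{'category': {'code': 'a'}}, {'category': {'code': 'b'}}]

# 1. Explicit loop: set up an empty list, then append
with_loop = []
for rec in toy:
    with_loop.append(get_code(rec))

# 2. map: apply the function to every element
with_map = list(map(get_code, toy))

# 3. List comprehension: loop and collect in a single expression
with_comprehension = [get_code(rec) for rec in toy]

print(with_loop == with_map == with_comprehension)
```

With the objects from this notebook, the comprehension form would read `pd.DataFrame([parse_crime(cr) for cr in crimes])`.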

## Parsing crimes from several locations

In [55]:
url_base = ('https://data.police.uk/api/outcomes-at-location?'\
            'date=2017-03&lat=XXXlatXXX&lng=XXXlonXXX')

In [None]:
%%time
all_crimes = []

for pid, xy in db.iloc[:5, :].iterrows():
    lon, lat = xy
    url = url_base.replace('XXXlatXXX', str(lat))\
                  .replace('XXXlonXXX', str(lon))
    crimes = requests.get(url)
    crimes = json.loads(crimes.content)
    parsed = pd.DataFrame(list(map(parse_crime, crimes)))
    all_crimes.append(parsed)

all_crimes = pd.concat(all_crimes)

To note:

- Loop over rows with `iterrows`, which returns each row as an (index, values) pair
- Unpacking on the fly (eg. `pid, xy`, `lon, lat`)
- Customise `url_base` with `replace`
- Use of `concat` to take a list of `DataFrame` objects and concatenate them into a single one
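
The unpacking and URL customisation steps can be tried without sending any request. This sketch swaps the `replace` calls for `str.format`, which is one possible alternative rather than the approach used above; `url_template` and `points` are illustrative names, and the two coordinate pairs are copied from the head of `db`:

```python
# Placeholders in curly braces are filled in by str.format
url_template = ('https://data.police.uk/api/outcomes-at-location?'
                'date={date}&lat={lat}&lng={lon}')

# (id, (longitude, latitude)) pairs, mimicking what iterrows yields
points = [(15896822, (-0.306323, 51.410036)),
          (4836957, (-0.290704, 51.411482))]

urls = []
for pid, xy in points:   # first unpacking: identifier and coordinates
    lon, lat = xy        # second unpacking: longitude and latitude
    urls.append(url_template.format(date='2017-03', lat=lat, lon=lon))

for u in urls:
    print(u)
```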

**NOTE**: The cell below takes some time to run, as it sends one request for every row in `db`!

In [None]:
%%time
all_crimes = []

for pid, xy in db.iterrows():
    lon, lat = xy
    url = url_base.replace('XXXlatXXX', str(lat))\
                  .replace('XXXlonXXX', str(lon))
    crimes = requests.get(url)
    crimes = json.loads(crimes.content)
    parsed = pd.DataFrame(list(map(parse_crime, crimes)))
    all_crimes.append(parsed)

all_crimes = pd.concat(all_crimes)

To do:

- Drop duplicates
- Write out
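
A possible way to tackle both steps is sketched below on a toy table, on the assumption that queries for nearby points may return some of the same records. The `toy` table, the output file name and the temporary folder are all made up for illustration; in practice the calls would be applied to `all_crimes` with a path of your choice:

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for `all_crimes`: the first two rows are identical
toy = pd.DataFrame({'category_code': ['a', 'a', 'b'],
                    'longitude': ['-2.98', '-2.98', '-2.94'],
                    'latitude': ['53.40', '53.40', '53.39'],
                    'date': ['2017-03', '2017-03', '2017-03']})

# Keep only the first occurrence of each fully identical row
deduped = toy.drop_duplicates()

# Write the result to a CSV file (here in a temporary folder)
out_path = os.path.join(tempfile.mkdtemp(), 'all_crimes.csv')
deduped.to_csv(out_path, index=False)

print(len(toy), len(deduped))
```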

**Exercise**

- Obtain crimes around Abercromby Square in Liverpool for the last 12 months