# Using APIs for data

An API is one way of getting data from a web resource. Typically you get data by forming a URL - the URL is basically your 'question' (the **query** or **request**), and the address that you send it to (the **endpoint**) delivers a 'response' containing the data, often in JSON format.

The [postcodes API](http://api.postcodes.io/), for example, can be queried by putting a postcode (without spaces) at the *end* of this URL:

`http://api.postcodes.io/postcodes/`

To ask about the postcode B42 2SU, then, you would add it to the end to form the URL:

`http://api.postcodes.io/postcodes/b422su`

If you go to that URL you will get a bunch of code in **JSON** - this is the data for that postcode. If you want it to look a bit easier to understand use the browser extension [JSONView](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc?hl=en).
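
The URL-building step described above can be sketched in a few lines of Python. This is only an illustration: the helper name `postcode_url` is made up for this example, and lowercasing the postcode is an assumption made to match the example URL shown above.

```python
# Base URL taken from the text above
BASE_URL = "http://api.postcodes.io/postcodes/"

def postcode_url(postcode):
    """Return the query URL for a postcode (hypothetical helper)."""
    # remove spaces and lowercase it so that 'B42 2SU' becomes 'b422su'
    cleaned = postcode.replace(" ", "").lower()
    # add the cleaned postcode to the end of the base URL
    return BASE_URL + cleaned

print(postcode_url("B42 2SU"))
```

The printed URL should match the one formed by hand earlier.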



## Importing `pandas` to fetch the data from the API

To import JSON data we will use the `pandas` library. This can load JSON directly from a URL - in this case, the URL that we generate to query the API.

In [None]:
#import the pandas library and call it 'pd' for the rest of the notebook
import pandas as pd

## Reading data from an online source

The `.read_json()` function from the `pandas` library can be used to import JSON from an online source - you just need to use the URL of the file.

Below we import JSON [from data.police.uk](https://data.police.uk/docs/method/forces/)

In [None]:
#read in the JSON at the specified URL
policedata = pd.read_json("https://data.police.uk/api/forces")
#check the type of object created - it's a pandas dataframe
print(type(policedata))
#show the first 3 rows
print(policedata.head(3))

<class 'pandas.core.frame.DataFrame'>
                  id                            name
0  avon-and-somerset  Avon and Somerset Constabulary
1       bedfordshire             Bedfordshire Police
2     cambridgeshire     Cambridgeshire Constabulary


This particular data 'request' is very simple: the police API [describes it](https://data.police.uk/docs/method/forces/) as:

> "A list of all the police forces available via the API except the British Transport Police, which is excluded from the list returned. Unique force identifiers obtained here are used in requests for force-specific data via other methods."

The request doesn't require any particularly specific information - just one URL fetches all the data. That's quite unusual, however - you'll notice most APIs require information about the data you want.

## Querying the postcodes API

Back to the postcodes API we introduced at the beginning. 

Let's store the URL that we mentioned in an object in Python - we'll call it `url`:

In [None]:
#store the url for the postcode we want data on
url = "https://api.postcodes.io/postcodes/np108xg"

Note that this is only a string of characters - it is *not* the contents that can be found *at* that URL. But now that we've stored that URL address, we are going to grab some data from it.

In [None]:
#fetch the json from the url we stored earlier
json = pd.read_json(url)
#show the keys - there are only 2 at the top level of the JSON
print(json.keys())
#print it - note there are only 2 columns (sub-branches are ignored)
print(json.head())

Index(['status', 'result'], dtype='object')
                status                                 result
admin_county       200                                   None
admin_district     200                                Newport
admin_ward         200                             Marshfield
ccg                200  Aneurin Bevan University Health Board
ced                200                                   None


## Drilling into the JSON

It's a good idea to have the URL open in a browser at the same time so you can see the structure and work out how to access the bit you're after. Again, you should use Chrome or Firefox with the extension [JSONView](https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc?hl=en) installed, as this makes it a lot easier to understand. (*Tip: hover over any element to see the 'path' to that element in the bottom left corner of the browser*).

The JSON itself has a tree-like structure with many different branches. Some parts are actually branches-of-branches. 

`pandas` handles those branches-of-branches by storing each one as a dictionary inside a single cell.

If we print the contents of that object, you can spot them by looking for curly brackets: the `codes` branch, for example, contains the data `{'admin_district': 'W06000022', 'admin_county'...}`

Let's try drilling down into the 'codes' part of the data frame to look further:


In [None]:
json['codes']

KeyError: 'codes'

### What does `KeyError` mean?

We get an error, specifically a `KeyError` for 'codes', meaning that it cannot find a key with that name. Why?

If you check the page of JSON we grabbed this data from, you will see that actually the first two branches of the JSON data are 'status' and 'result' - the 'codes' branch doesn't come until *within* the 'result' branch.

What has happened is that `pandas` has treated those first two branches as the two columns of data. That's why our data has two problems: first, a column full of `200` which we don't need; and secondly, a data structure which is not ideal: what we would like to be column headings are actually at the start of each row.
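
To see the same structure without `pandas` getting in the way, here is a minimal sketch using Python's built-in `json` module on a cut-down, hand-written payload. The payload only mimics the *shape* of the API response; the value `'example-code'` is a placeholder, not real data.

```python
# aliased so it does not clash with the dataframe called 'json' in this notebook
import json as jsonlib

# a hand-written stand-in for the API response (placeholder values)
sample_text = '{"status": 200, "result": {"country": "Wales", "codes": {"ccg": "example-code"}}}'

# convert the JSON text into nested Python dictionaries
payload = jsonlib.loads(sample_text)

# the top level only has two keys
print(list(payload.keys()))
# 'codes' is not one of them...
print('codes' in payload)
# ...because it sits one level down, inside 'result'
print('codes' in payload['result'])
```

This prints the two top-level keys, then `False`, then `True`: the 'codes' branch only exists inside 'result'.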

Let's try to drill down into the 'result' branch instead:

In [None]:
#show the 'result' branch of the dataframe 'json'
json['result']

admin_county                                                               None
admin_district                                                          Newport
admin_ward                                                           Marshfield
ccg                                       Aneurin Bevan University Health Board
ced                                                                        None
codes                         {'admin_district': 'W06000022', 'admin_county'...
country                                                                   Wales
eastings                                                                 328897
european_electoral_region                                                 Wales
incode                                                                      8XG
latitude                                                               51.56632
longitude                                                             -3.027217
lsoa                                    

### Showing the `.keys()` within the current branch

We can also add `.keys()` on to the end of that to see what fields (sub-branches) there are within *this* branch.

In [None]:
#show the keys of the 'result' branch 
print(json['result'].keys())

Index(['admin_county', 'admin_district', 'admin_ward', 'ccg', 'ced', 'codes',
       'country', 'eastings', 'european_electoral_region', 'incode',
       'latitude', 'longitude', 'lsoa', 'msoa', 'nhs_ha', 'northings', 'nuts',
       'outcode', 'parish', 'parliamentary_constituency', 'postcode',
       'primary_care_trust', 'quality', 'region'],
      dtype='object')


### Drilling down two levels

Then let's try to go from there to 'codes':

In [None]:
#drill down into the 'result' branch and then the 'codes' sub-branch
json['result']['codes']

{'admin_county': 'E99999999',
 'admin_district': 'E08000025',
 'admin_ward': 'E05011155',
 'ccg': 'E38000220',
 'ccg_id': '15E',
 'ced': 'E99999999',
 'lau2': 'E08000025',
 'lsoa': 'E01033561',
 'msoa': 'E02001876',
 'nuts': 'TLG31',
 'parish': 'E43000250',
 'parliamentary_constituency': 'E14000564'}

### Drilling down three levels

And further still into the CCG code stored in 'ccg':

In [None]:
#drill down into the 'result' branch and then the 'codes' sub-branch
#and then the 'ccg' sub-sub-branch!
json['result']['codes']['ccg']

'E38000220'

If we wanted to grab the CCG code for a bunch of postcodes, this is how we might do it:

* Loop through the postcodes
* Generate a URL by adding that postcode to the end of the 'base' API query
* Fetch the JSON generated at that URL
* Drill down into the 'result > codes > ccg' branch of that JSON to get the data we need
* Add it to a data frame alongside the postcode
* Repeat!
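
The steps above can be sketched roughly as follows. This is one possible approach rather than a definitive implementation: the function name `ccg_codes` and the `fetch` parameter are made up for this example, and `fake_fetch` is a stand-in that returns placeholder data so the sketch can run without an internet connection.

```python
import pandas as pd

# the 'base' API query used earlier in this notebook
BASE_URL = "https://api.postcodes.io/postcodes/"

def ccg_codes(postcodes, fetch=pd.read_json):
    """Return a dataframe of postcodes and CCG codes (hypothetical helper).

    'fetch' is any function that takes a URL and returns something we can
    drill into with ['result']['codes']['ccg'].
    """
    rows = []
    # loop through the postcodes
    for postcode in postcodes:
        # generate a URL by adding the postcode to the end of the base query
        url = BASE_URL + postcode.replace(" ", "").lower()
        # fetch the JSON at that URL
        data = fetch(url)
        # drill down into result > codes > ccg
        ccg = data['result']['codes']['ccg']
        # store it alongside the postcode
        rows.append({'postcode': postcode, 'ccg': ccg})
    # turn the collected rows into a dataframe
    return pd.DataFrame(rows, columns=['postcode', 'ccg'])

# a stand-in fetcher that returns placeholder data instead of calling the API
def fake_fetch(url):
    return {'status': 200,
            'result': {'codes': {'ccg': 'code-for-' + url.rsplit('/', 1)[-1]}}}

print(ccg_codes(['NP10 8XG', 'B42 2SU'], fetch=fake_fetch))
```

Leaving out the `fetch=` argument would make the function use `pd.read_json` and query the live API instead.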

## Forming a 'request' for the police data API

The [Police API documentation](https://data.police.uk/docs/) has a number of 'methods' that you can use to request data from their API. 

The '[crimes at location](https://data.police.uk/docs/method/crimes-at-location/)' method allows you to ask for data on crimes based on the location ID, or a latitude and longitude. An example is given for data from February 2017:

`https://data.police.uk/api/crimes-at-location?date=2017-02&lat=52.629729&lng=-1.131592`

However, that date is now so long ago that the URL doesn't actually work. Instead, change the date to a more recent month to see a working example (February 2022 worked at the time of writing):

`https://data.police.uk/api/crimes-at-location?date=2022-02&lat=52.629729&lng=-1.131592`
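
Because the date, latitude and longitude are just parameters in the URL, the request can be generated rather than typed by hand. The sketch below uses `urlencode` from Python's standard library; the helper name `crimes_at_location_url` is made up for this example.

```python
from urllib.parse import urlencode

# the endpoint for the 'crimes at location' method
CRIMES_AT_LOCATION = "https://data.police.uk/api/crimes-at-location"

def crimes_at_location_url(date, lat, lng):
    """Build a 'crimes at location' request URL (hypothetical helper)."""
    # turn the parameters into 'date=...&lat=...&lng=...'
    query = urlencode({'date': date, 'lat': lat, 'lng': lng})
    # join the endpoint and the query with a question mark
    return CRIMES_AT_LOCATION + "?" + query

# rebuild the example URL from the documentation
print(crimes_at_location_url("2017-02", "52.629729", "-1.131592"))
```

Changing the first argument is then all that is needed to ask about a different month.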

Let's fetch that data.

In [None]:
#store the URL
jsonurl = "https://data.police.uk/api/crimes-at-location?date=2022-02&lat=52.629729&lng=-1.131592"
#read the JSON at that URL into a variable
crimedata = pd.read_json(jsonurl)
#print it
print(crimedata)

                category location_type  \
0  possession-of-weapons         Force   
1           public-order         Force   
2           public-order         Force   

                                            location context  \
0  {'latitude': '52.629909', 'street': {'id': 883...           
1  {'latitude': '52.629909', 'street': {'id': 883...           
2  {'latitude': '52.629909', 'street': {'id': 883...           

                                      outcome_status  \
0  {'category': 'Unable to prosecute suspect', 'd...   
1  {'category': 'Under investigation', 'date': '2...   
2  {'category': 'Investigation complete; no suspe...   

                                       persistent_id        id  \
0  5c57f25d5a2ed17462e08584bb53afb3ac7476868e0918...  99557405   
1  69e04fe7c5e20a2fdb5cefb9c8045c88a8c62ec7220fd6...  99558269   
2  33337d87ccfef036ecbfb8f5a21de1c48c76a4aae74752...  99561304   

  location_subtype    month  
0                   2022-02  
1                   2022

In [None]:
#show the keys - note that these are only the top-level branches
print(crimedata.keys())

Index(['category', 'location_type', 'location', 'context', 'outcome_status',
       'persistent_id', 'id', 'location_subtype', 'month'],
      dtype='object')


### Drilling down into a branch of that

Now let's try to drill down into it to see the 'location' branch, because we know from the JSON at that URL it contains further sub-branches.

This branch is now a *column*.

In [None]:
#show the location branch
print(crimedata['location'])

0    {'latitude': '52.629909', 'street': {'id': 883...
1    {'latitude': '52.629909', 'street': {'id': 883...
2    {'latitude': '52.629909', 'street': {'id': 883...
Name: location, dtype: object


### Drilling down into a sub-branch - `KeyError`

I want to draw your attention to the fact that there are **multiple items** in this branch. 

This is important when we get an error if we try to drill down further...

In [None]:
#attempt to drill down into the 'street' sub-branch
print(crimedata['location']['street'])

KeyError: 'street'

We get an error here - a `KeyError`. Why? 

Let's use the `.keys()` method to see what the keys actually *are* in the `location` branch.

In [None]:
#show the keys for the 'location' column of 'crimedata'
crimedata['location'].keys()

RangeIndex(start=0, stop=3, step=1)

What we are seeing here is that the `'location'` branch doesn't have the keys that we might have expected. None of those keys is 'street', which is why we are getting an error, even though we *did* see 'street' as a branch in the JSON. 

Something has happened when we imported the JSON into a dataframe.

So what are we seeing?

This is telling us that when you get to `'location'` you are dealing with a **range** - put another way, a list of items, starting with item index 0, and stopping before 3. If we want to access those items we will have to use an index like `[0]` or `[1]` or `[2]`.

A clue to this was the output of `print(crimedata['location'])` earlier: it was a **series** of rows, each one starting with an index number (0, 1, 2) before the keys appeared.

So we should try an index instead.

In [None]:
#access the 'location' key of 'crimedata' and then the item at index 0
print(crimedata['location'][0])

{'latitude': '52.629909', 'street': {'id': 883345, 'name': 'On or near Marquis Street'}, 'longitude': '-1.132073'}


That seems to work. Now instead of seeing every row of data we just see the first one, and there is no index at the start of the line.

We also see something that looks more like a dictionary object, i.e. something with keys. 

And we can test this again by using `.keys()`

In [None]:
#show the keys of the item at index 0 in the 'location' branch
crimedata['location'][0].keys()

dict_keys(['latitude', 'street', 'longitude'])

This is what we expected to see earlier. Can we now drill down further into that 'street' branch *for that row*?

In [None]:
print(crimedata['location'][0]['street'])

{'id': 883345, 'name': 'On or near Marquis Street'}


Yes. And drill down once more into the final level of data?

In [None]:
print(crimedata['location'][0]['street']['name'])

On or near Marquis Street
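
Drilling down with an index only gives us one row at a time. One way to repeat the same drill-down for *every* row is the `.apply()` method of a pandas series. The sketch below rebuilds a small stand-in for `crimedata['location']` from the values printed above, so it can be run on its own.

```python
import pandas as pd

# a stand-in for crimedata['location']: a series of dictionaries
location = {'latitude': '52.629909',
            'street': {'id': 883345, 'name': 'On or near Marquis Street'},
            'longitude': '-1.132073'}
locations = pd.Series([location, location, location], name='location')

# the keys of the series are row indices, not 'latitude', 'street' etc.
print(list(locations.keys()))

# apply the same drill-down to each row's dictionary
streetnames = locations.apply(lambda loc: loc['street']['name'])
print(streetnames.tolist())
```

The result is a new series with one street name per row.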


## Using `json_normalize`

An alternative is to [use a function called `json_normalize`](https://hackersandslackers.com/json-into-pandas-dataframes/).

If we try to use it on the whole dataframe we get a dataframe with no columns - just a row index.

In [39]:
pd.json_normalize(crimedata)

0
1
2
3
4
5
6
7
8
9


Instead we need to drill down one level to extract the locations and try those instead.

In [None]:
#extract the data under the 'location' key
locations = crimedata['location']
#show what this branch looks like
locations

0    {'latitude': '52.629909', 'street': {'id': 883...
1    {'latitude': '52.629909', 'street': {'id': 883...
Name: location, dtype: object

You can see that the structure above still includes nested branches (e.g. 'street' has an 'id' and 'name').

But when we use `json_normalize()` that structure gets flattened: the two nested branches are brought to the same level, as `street.id` and `street.name`, the period indicating that in the original JSON they existed *within* 'street'.

In [38]:
#use json_normalize on the 'location' column/branches of our dataframe
pd.json_normalize(crimedata['location'])

Unnamed: 0,latitude,longitude,street.id,street.name
0,52.629909,-1.132073,883345,On or near Marquis Street
1,52.629909,-1.132073,883345,On or near Marquis Street
2,52.629909,-1.132073,883345,On or near Marquis Street


We can actually change that separator by adding a `sep=` argument.

In [41]:
#use json_normalize - specify we want to use an underscore as a separator
pd.json_normalize(crimedata['location'], sep="_")

Unnamed: 0,latitude,longitude,street_id,street_name
0,52.629909,-1.132073,883345,On or near Marquis Street
1,52.629909,-1.132073,883345,On or near Marquis Street
2,52.629909,-1.132073,883345,On or near Marquis Street


This will work for other fields but only where there's nested data/JSON/dictionaries.

Let's take a look at the dataframe to see which fields contain those telltale `{` brackets.

In [None]:
#show the data
crimedata

Unnamed: 0,category,location_type,location,context,outcome_status,persistent_id,id,location_subtype,month
0,drugs,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Local resolution', 'date': '2021...",21bf6ec31d7744b835eed277943ac5c2542f36429675b0...,90363437,,2021-02
1,violent-crime,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Unable to prosecute suspect', 'd...",a65a86d1f35bc06ede4039382f7c04844d9c01162c1a4a...,90361215,,2021-02


The only other column containing nested data is `outcome_status`:

In [40]:
#use json_normalize on the 'outcome_status' column
pd.json_normalize(crimedata['outcome_status'])

Unnamed: 0,category,date
0,Unable to prosecute suspect,2022-02
1,Under investigation,2022-02
2,Investigation complete; no suspect identified,2022-02


### Combining the results with our original dataframe using `.join()`

Of course there's a good chance we will want to combine the data from these sub-branches with the original dataframe that contains the main details. 

As long as all the dataframes have the same number of rows (and their indices match up) then we can combine them using `.join()`

This is a method of a pandas dataframe, so you need to name one of the dataframes first, then add `.join(` and in the brackets provide the other dataframe(s) that you want to join it to, closing the brackets once you're done. 

In [57]:
#store the results of flattening the 'location' branch
locationdata = pd.json_normalize(crimedata['location'])
#join that dataframe to the main crimedata one - and store in a new dataframe
crimedata_joined = crimedata.join(locationdata)
#show it
crimedata_joined

Unnamed: 0,category,location_type,location,context,outcome_status,persistent_id,id,location_subtype,month,streetname,latitude,longitude,street.id,street.name
0,possession-of-weapons,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Unable to prosecute suspect', 'd...",5c57f25d5a2ed17462e08584bb53afb3ac7476868e0918...,99557405,,2022-02,On or near Marquis Street,52.629909,-1.132073,883345,On or near Marquis Street
1,public-order,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Under investigation', 'date': '2...",69e04fe7c5e20a2fdb5cefb9c8045c88a8c62ec7220fd6...,99558269,,2022-02,On or near Marquis Street,52.629909,-1.132073,883345,On or near Marquis Street
2,public-order,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,{'category': 'Investigation complete; no suspe...,33337d87ccfef036ecbfb8f5a21de1c48c76a4aae74752...,99561304,,2022-02,On or near Marquis Street,52.629909,-1.132073,883345,On or near Marquis Street


### Rename any clashing column names

We will, however, encounter a problem when we try to repeat that with the other branch of the data.

In [50]:
#store the results of flattening the 'outcome_status' branch
outcomedata = pd.json_normalize(crimedata['outcome_status'])
#join that dataframe to the crimedata_joined one - and overwrite it
crimedata_joined = crimedata_joined.join(outcomedata)

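ValueError: ignored

The error occurs because both dataframes contain a column called `category`, and `.join()` will not combine dataframes with overlapping column names unless it is told how to tell them apart. Adding an `rsuffix=` argument attaches a suffix to the clashing column names from the right-hand dataframe (`lsuffix=` does the same for the left-hand one):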

In [58]:
#store the results of flattening the 'outcome_status' branch
outcomedata = pd.json_normalize(crimedata['outcome_status'])
#join that dataframe to the crimedata_joined one - and store in another variable
crimedata_normalized = crimedata_joined.join(outcomedata, rsuffix='_branch')
#show it
crimedata_normalized

Unnamed: 0,category,location_type,location,context,outcome_status,persistent_id,id,location_subtype,month,streetname,latitude,longitude,street.id,street.name,category_branch,date
0,possession-of-weapons,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Unable to prosecute suspect', 'd...",5c57f25d5a2ed17462e08584bb53afb3ac7476868e0918...,99557405,,2022-02,On or near Marquis Street,52.629909,-1.132073,883345,On or near Marquis Street,Unable to prosecute suspect,2022-02
1,public-order,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Under investigation', 'date': '2...",69e04fe7c5e20a2fdb5cefb9c8045c88a8c62ec7220fd6...,99558269,,2022-02,On or near Marquis Street,52.629909,-1.132073,883345,On or near Marquis Street,Under investigation,2022-02
2,public-order,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,{'category': 'Investigation complete; no suspe...,33337d87ccfef036ecbfb8f5a21de1c48c76a4aae74752...,99561304,,2022-02,On or near Marquis Street,52.629909,-1.132073,883345,On or near Marquis Street,Investigation complete; no suspect identified,2022-02
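
An alternative to a suffix is to rename the clashing columns *before* joining, using the `.rename()` method. The sketch below uses two small stand-in dataframes so that it can be run on its own; the new column names `outcome_category` and `outcome_date` are just suggestions.

```python
import pandas as pd

# small stand-ins for the main dataframe and the flattened 'outcome_status' branch
main = pd.DataFrame({'category': ['public-order'], 'month': ['2022-02']})
outcomes = pd.DataFrame({'category': ['Under investigation'], 'date': ['2022-02']})

# rename the columns so they cannot clash with those in the main dataframe
outcomes = outcomes.rename(columns={'category': 'outcome_category',
                                    'date': 'outcome_date'})

# the join now works without needing a suffix
combined = main.join(outcomes)
print(combined.columns.tolist())
```

This has the added benefit of making it clear where each column came from.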


## Drop the columns we've now flattened

We can now use `.drop()` to specify a list of columns we want to drop from the dataframe. 

In [63]:
#drop the specified columns
crimedata_normalized = crimedata_normalized.drop(columns=['location','outcome_status'])
#show the results
crimedata_normalized

Unnamed: 0,category,location_type,context,persistent_id,id,location_subtype,month,streetname,latitude,longitude,street.id,street.name,category_branch,date
0,possession-of-weapons,Force,,5c57f25d5a2ed17462e08584bb53afb3ac7476868e0918...,99557405,,2022-02,On or near Marquis Street,52.629909,-1.132073,883345,On or near Marquis Street,Unable to prosecute suspect,2022-02
1,public-order,Force,,69e04fe7c5e20a2fdb5cefb9c8045c88a8c62ec7220fd6...,99558269,,2022-02,On or near Marquis Street,52.629909,-1.132073,883345,On or near Marquis Street,Under investigation,2022-02
2,public-order,Force,,33337d87ccfef036ecbfb8f5a21de1c48c76a4aae74752...,99561304,,2022-02,On or near Marquis Street,52.629909,-1.132073,883345,On or near Marquis Street,Investigation complete; no suspect identified,2022-02


## Checking which fields are dictionaries

If we wanted to know which fields in a dataframe could be flattened, we could check which ones are dictionaries, using the `type()` function:

In [60]:
#loop through the first row
for i in crimedata.iloc[0]:
  #print the item
  print('data:', i)
  #print the type
  print('type:', type(i))
  #if it is a dictionary, and can therefore be flattened
  if type(i) is dict:
    #run json_normalize on it
    normali = pd.json_normalize(i)
    #and show the results
    print('normalized:', normali)

data: possession-of-weapons
type: <class 'str'>
data: Force
type: <class 'str'>
data: {'latitude': '52.629909', 'street': {'id': 883345, 'name': 'On or near Marquis Street'}, 'longitude': '-1.132073'}
type: <class 'dict'>
normalized:     latitude  longitude  street.id                street.name
0  52.629909  -1.132073     883345  On or near Marquis Street
data: 
type: <class 'str'>
data: {'category': 'Unable to prosecute suspect', 'date': '2022-02'}
type: <class 'dict'>
normalized:                       category     date
0  Unable to prosecute suspect  2022-02
data: 5c57f25d5a2ed17462e08584bb53afb3ac7476868e09184cee4ff617595ebc05
type: <class 'str'>
data: 99557405
type: <class 'numpy.int64'>
data: 
type: <class 'str'>
data: 2022-02
type: <class 'str'>
data: On or near Marquis Street
type: <class 'str'>
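
If we only want the *names* of the columns that can be flattened, the same check can be wrapped in a small function. This is a sketch under one assumption: it only looks at the first row, so it would miss a column whose first value happens to be empty. The function name `dict_columns` is made up for this example, and the sample dataframe is a cut-down stand-in for `crimedata`.

```python
import pandas as pd

def dict_columns(df):
    """Return the names of columns whose first-row value is a dictionary."""
    # grab the first row: a series with the column names as its index
    first_row = df.iloc[0]
    # keep the name of any column where the value is a dictionary
    return [name for name, value in first_row.items() if isinstance(value, dict)]

# a cut-down stand-in for the crimedata dataframe
sample = pd.DataFrame({
    'category': ['public-order'],
    'location': [{'latitude': '52.629909',
                  'street': {'id': 883345, 'name': 'On or near Marquis Street'}}],
    'outcome_status': [{'category': 'Under investigation', 'date': '2022-02'}],
})

print(dict_columns(sample))
```

The list that comes back could then be looped through, running `json_normalize()` on each column in turn.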


## Alternative approach 1: combining dataframes using `pd.concat()`

An alternative to starting with one dataframe and joining others to it using `.join()` is to use the pandas concatenation function `concat()`.

First we store all the results of `json_normalize()` in their own dataframes.

In [None]:
#store outcome_status
outcomesdf = pd.json_normalize(crimedata['outcome_status'], sep="_")
#store locations
locationsdf = pd.json_normalize(crimedata['location'], sep="_")
outcomesdf

Unnamed: 0,category,date
0,Local resolution,2021-03
1,Unable to prosecute suspect,2021-03


The `concat` function from `pandas` needs a list of dataframes to concatenate (as the `objs=` argument) and the `axis` on which you want to concatenate. This defaults to `0`, meaning it will concatenate vertically (e.g. different time periods with the same data structure), but we need to specify `1` if we want to concatenate horizontally (e.g. extra data on the same rows).
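
The difference between the two axes is easiest to see on some tiny dataframes. The sketch below uses made-up data (the dataframe names and street names are placeholders) and prints the shape of each result as (rows, columns).

```python
import pandas as pd

# made-up data: two months with the same structure, plus some extra columns
feb = pd.DataFrame({'category': ['drugs', 'public-order']})
mar = pd.DataFrame({'category': ['burglary']})
extra = pd.DataFrame({'street_name': ['Street A', 'Street B']})

# axis=0 stacks the dataframes on top of each other: more rows
stacked = pd.concat(objs=[feb, mar], axis=0)
# axis=1 places them side by side: more columns
side_by_side = pd.concat(objs=[feb, extra], axis=1)

print(stacked.shape)
print(side_by_side.shape)
```

The first result has 3 rows and 1 column; the second has 2 rows and 2 columns.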

In [None]:
#use the concat function to combine the dataframes
pd.concat(objs=[crimedata,outcomesdf,locationsdf],axis=1)

Unnamed: 0,category,location_type,location,context,outcome_status,persistent_id,id,location_subtype,month,category.1,date,latitude,longitude,street_id,street_name
0,drugs,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Local resolution', 'date': '2021...",21bf6ec31d7744b835eed277943ac5c2542f36429675b0...,90363437,,2021-02,Local resolution,2021-03,52.629909,-1.132073,883345,On or near Marquis Street
1,violent-crime,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Unable to prosecute suspect', 'd...",a65a86d1f35bc06ede4039382f7c04844d9c01162c1a4a...,90361215,,2021-02,Unable to prosecute suspect,2021-03,52.629909,-1.132073,883345,On or near Marquis Street


Note that the original unflattened columns will still be there, so you'll need to remove those.

In [None]:
#use the concat function to combine the dataframes
crimedataflat = pd.concat(objs=[crimedata,outcomesdf,locationsdf],axis=1)
crimedataflat

Unnamed: 0,category,location_type,location,context,outcome_status,persistent_id,id,location_subtype,month,category.1,date,latitude,longitude,street_id,street_name
0,drugs,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Local resolution', 'date': '2021...",21bf6ec31d7744b835eed277943ac5c2542f36429675b0...,90363437,,2021-02,Local resolution,2021-03,52.629909,-1.132073,883345,On or near Marquis Street
1,violent-crime,Force,"{'latitude': '52.629909', 'street': {'id': 883...",,"{'category': 'Unable to prosecute suspect', 'd...",a65a86d1f35bc06ede4039382f7c04844d9c01162c1a4a...,90361215,,2021-02,Unable to prosecute suspect,2021-03,52.629909,-1.132073,883345,On or near Marquis Street


In [None]:
crimedataflat.drop(columns=['location','outcome_status'])

Unnamed: 0,category,location_type,context,persistent_id,id,location_subtype,month,category.1,date,latitude,longitude,street_id,street_name
0,drugs,Force,,21bf6ec31d7744b835eed277943ac5c2542f36429675b0...,90363437,,2021-02,Local resolution,2021-03,52.629909,-1.132073,883345,On or near Marquis Street
1,violent-crime,Force,,a65a86d1f35bc06ede4039382f7c04844d9c01162c1a4a...,90361215,,2021-02,Unable to prosecute suspect,2021-03,52.629909,-1.132073,883345,On or near Marquis Street


## Alternative approach 2: using loops to move sub-branches into their own column

Another way to extract sub-branches of a JSON-based dataframe is to create a loop to go through each row and drill down into the relevant sub-branch.

To do that you need to generate a list of row indices, and then create a loop that goes through each, using it to access the sub-branch in the row with that index. 

The `range()` function is very useful here: it will generate a list of numbers based on the start and end numbers you provide. The resulting range will stop short of the last number, so `range(0,3)` for example, will generate the numbers 0, 1 and 2 (but not 3).

The `len()` function is the other tool we will need: we can use this to find out how many items are in a list, or rows in a dataframe, so we can use that to set the last number in our range. 

PS: One of the quirks of Python is this: on a list with 3 items `len()` will return `3`. But those items have the indices 0, 1 and 2 (because an index of 2 means 'the third item' in a zero-based index).
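
Here is that relationship between `len()` and `range()` on a simple list (the list itself is just made-up example data):

```python
# a list with 3 items
items = ['a', 'b', 'c']

# len() counts the items
print(len(items))
# range() stops before the end number, so we get the indices 0, 1 and 2
print(list(range(0, len(items))))
# the last item is therefore at index len(items) - 1
print(items[len(items) - 1])
```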

In [None]:
#how many items?
print(len(crimedata))
#create a list of indices, starting from 0 and ending before that number
indexrange = range(0,len(crimedata))
#this will be printed as a 'range object' with the start and end positions
print(indexrange)

3
range(0, 3)


In [None]:
#create an empty list to store the results
streetnamelist = []

#loop through a range of numbers from 0 to the number of items in the dataframe
for i in range(0,len(crimedata)):
  #access the street branch of that 'location' item, then the 'name' sub-sub-branch
  streetname = crimedata['location'][i]['street']['name']
  #print it
  print(streetname)
  #add it to the list
  streetnamelist.append(streetname)

#check it worked - we should now have a list with an item for each row
print(streetnamelist)

#add to the dataframe as a new column
crimedata['streetname'] = streetnamelist

print(crimedata)

On or near Marquis Street
On or near Marquis Street
On or near Marquis Street
['On or near Marquis Street', 'On or near Marquis Street', 'On or near Marquis Street']
                category location_type  \
0  possession-of-weapons         Force   
1           public-order         Force   
2           public-order         Force   

                                            location context  \
0  {'latitude': '52.629909', 'street': {'id': 883...           
1  {'latitude': '52.629909', 'street': {'id': 883...           
2  {'latitude': '52.629909', 'street': {'id': 883...           

                                      outcome_status  \
0  {'category': 'Unable to prosecute suspect', 'd...   
1  {'category': 'Under investigation', 'date': '2...   
2  {'category': 'Investigation complete; no suspe...   

                                       persistent_id        id  \
0  5c57f25d5a2ed17462e08584bb53afb3ac7476868e0918...  99557405   
1  69e04fe7c5e20a2fdb5cefb9c8045c88a8c62ec7220fd6..