# Explore UK Crime Data with Pandas and GeoPandas


## Table of Contents

1. [London boroughs](#boroughs)<br>
2. [Crime data](#crime)<br>
    2.1. [Load data](#load2)<br>
    2.2. [Explore data](#explore2)<br>

<div class="alert alert-danger" style="font-size:100%">
If you are running this workshop in <b>Watson Studio</b>, you need to add the project token you created earlier to your notebook so that it can access the shape files. 

* Click the 3 dots at the top of the notebook to insert the project token. This creates a new cell in the notebook that you need to run before continuing with the rest of the notebook. If you share this notebook, remove this cell first; otherwise anyone can use your Cloud Object Storage from this project.

If you cannot find the new cell it is probably at the top of this notebook. Scroll up, run the cell and continue with the rest of the notebook.

* Also add the following files from [this GitHub repo](https://github.com/IBMDeveloperUK/Python-Geopandas-Workshop/tree/master/data) to your Cloud Object Store (click the 1010 button at the top right if you do not see the menu on the right of the notebook):
    - 2018-1-metropolitan-street.zip
    - 2018-2-metropolitan-street.zip
    - 2018-metropolitan-stop-and-search.zip
* Then run the cell further down that defines the helper function `download_file_to_local`

</div> 

### Installing geopandas

geopandas depends on many other packages, so install it carefully to avoid breaking your environment!

* [geopandas installation instructions](https://geopandas.readthedocs.io/en/latest/getting_started/install.html)
* [geoplot installation instructions](https://residentmario.github.io/geoplot/installation.html)

In [None]:
!time conda install --freeze-installed mapclassify descartes geopandas

In [None]:
!time pip install geoplot

In [None]:
import requests
import numpy as np
import pandas as pd
import geopandas as gpd
import geoplot 
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
from datetime import datetime

%matplotlib inline

In [None]:
# define the helper function 
def download_file_to_local(project_filename, local_file_destination=None, project=None):
    """
    Uses project-lib to get a bytearray and then downloads this file to local.
    Requires a valid `project` object.
    
    Args:
        project_filename (str): the filename to be passed to get_file
        local_file_destination (str, optional): the filename for the local file if different
        project: the Watson Studio project object that provides get_file
        
    Returns:
        0 if everything worked
    """
    
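    # fail early with a clear message instead of an AttributeError below
    if project is None:
        raise ValueError("A valid `project` object is required")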
    
    # get the file
    print("Attempting to get file {}".format(project_filename))
    _bytes = project.get_file(project_filename).read()
    
    # check for new file name, download the file
    print("Downloading...")
    if local_file_destination is None:
        local_file_destination = project_filename
    
    with open(local_file_destination, 'wb') as f: 
        f.write(bytearray(_bytes))
        print("Completed writing to {}".format(local_file_destination))
        
    return 0
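
The helper only needs `project.get_file(name)` to return an object with a `.read()` method. As a rough sketch for trying the same fetch-and-write pattern outside Watson Studio, the block below uses a stand-in class. `FakeProject` and `save_project_file` are hypothetical names made up for this illustration and are not part of project-lib; `save_project_file` is a condensed restatement of the helper so that the example runs on its own.

```python
import io
import os
import tempfile


class FakeProject:
    """Hypothetical stand-in for the Watson Studio `project` object."""

    def __init__(self, files):
        self._files = files  # maps filename -> bytes

    def get_file(self, name):
        # Mimic the one call the helper needs: return a readable buffer.
        return io.BytesIO(self._files[name])


def save_project_file(project, project_filename, local_file_destination=None):
    """Condensed version of the helper: fetch bytes and write them locally."""
    if local_file_destination is None:
        local_file_destination = project_filename
    data = project.get_file(project_filename).read()
    with open(local_file_destination, "wb") as f:
        f.write(data)
    return local_file_destination


fake = FakeProject({"example.csv": b"a,b\n1,2\n"})
target = os.path.join(tempfile.mkdtemp(), "example.csv")
saved_path = save_project_file(fake, "example.csv", target)

# Read the file back to confirm the bytes arrived unchanged.
with open(saved_path, "rb") as f:
    saved_bytes = f.read()
print(saved_bytes)
```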

<a id="boroughs"></a>
## 1. London boroughs

In [None]:
# load data from a url
boroughs = gpd.read_file("https://skgrange.github.io/www/data/london_boroughs.json")

In [None]:
boroughs.head()

In [None]:
boroughs.plot(column='area_hectares');

In [None]:
boroughs['all'] = 1
allboroughs = boroughs.dissolve(by='all',aggfunc='sum')
allboroughs.plot();

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/IBMDeveloperUK/crime-data-workshop/master/data/london-borough-profiles.csv',encoding = 'unicode_escape')
df.head()

In [None]:
boroughs = boroughs.set_index('code').join(df.set_index('Code'))
boroughs.head()
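
`join` aligns rows on the index, which is why both tables are re-indexed on their code column first even though the column names differ in case. A minimal sketch with made-up tables (the codes and the `Population` column are invented for illustration) shows what happens to rows that do not match:

```python
import pandas as pd

# Made-up stand-ins for the borough shapes table and the profiles table.
shapes = pd.DataFrame({"code": ["E1", "E2", "E3"], "name": ["A", "B", "C"]})
profiles = pd.DataFrame({"Code": ["E1", "E2", "E9"], "Population": [100, 200, 900]})

# join() defaults to a left join on the index: every row of `shapes` is kept.
joined = shapes.set_index("code").join(profiles.set_index("Code"))
print(joined)
```

Rows that exist only in the right table (`E9` here) are dropped, and left rows without a match (`E3`) get `NaN` in the joined columns.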

In [None]:
boroughs.plot(column='Proportion_of_seats_won_by_Conservatives_in_2014_election', cmap='Blues');

In [None]:
boroughs.plot(column='Proportion_of_seats_won_by_Labour_in_2014_election', cmap='Reds');

In [None]:
boroughs['paygap'] = \
    ((boroughs['Gross_Annual_Pay_-_Male_(2016)'] - boroughs['Gross_Annual_Pay_-_Female_(2016)'])/ \
    boroughs['Gross_Annual_Pay_-_Male_(2016)']) * 100

[fig,ax] = plt.subplots(1, figsize=(12, 8))

boroughs.plot(ax=ax, color="lightgrey", edgecolor='black', linewidth=0.5)

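# only drop boroughs without a pay gap value; a bare dropna() also drops rows missing any other column
boroughs.dropna(subset=['paygap']).plot(column='paygap', cmap='Reds', edgecolor='black', linewidth=0.5,
               legend=True, ax=ax, scheme='equal_interval');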
ax.axis('off');
ax.set_title('Gender pay gap in London (2016)');
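
The pay gap used for the map is the difference between male and female pay, expressed as a percentage of male pay. A small worked example with invented salaries (the borough names and figures are not real data) makes the arithmetic explicit:

```python
import pandas as pd

# Invented salaries, only to illustrate the formula.
pay = pd.DataFrame(
    {"male": [50000.0, 30000.0], "female": [40000.0, 30000.0]},
    index=["Borough X", "Borough Y"],
)

# (male - female) / male * 100: percentage by which female pay is lower.
pay["paygap"] = (pay["male"] - pay["female"]) / pay["male"] * 100
print(pay)
```

A value of 0 means equal pay, and a negative value would mean that female pay is higher than male pay.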


<a id="crime"></a>
## 2. Crime data

#### Using the API

[A list of all available data sets](https://data.police.uk/docs/method/crimes-street-dates/)

In [None]:
data_list = requests.get('https://data.police.uk/api/crimes-street-dates')  
print(data_list.status_code)

In [None]:
data_list_json = data_list.json() 
data_list_df = pd.json_normalize(data_list_json)
data_list_df.head()

In [None]:
# months with data
data_months = data_list_df['date'].unique()
print(np.sort(data_months))

In [None]:
# force IDs
force_IDs = data_list_df['stop-and-search'][0]
print(force_IDs)
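
To try the same steps without calling the API, the sketch below uses a made-up payload that mirrors the `date` and `stop-and-search` fields used above. The force names are placeholders, not real identifiers.

```python
import pandas as pd

# Made-up payload: one record per month, each with a list of force IDs.
sample = [
    {"date": "2018-02", "stop-and-search": ["force-a", "force-b"]},
    {"date": "2018-01", "stop-and-search": ["force-a"]},
]

# json_normalize turns the list of records into one row per record;
# the list of forces stays as a Python list inside a single cell.
sample_df = pd.json_normalize(sample)

months = sorted(sample_df["date"].unique())
first_forces = sample_df["stop-and-search"][0]
print(months)
print(first_forces)
```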

<a id="load2"></a>
### 2.1. Load data

The rest of the API did not seem to work, so I downloaded the full [latest data](https://data.police.uk/data/archive/latest.zip) archive (21 GB!) from [here](https://data.police.uk/about/).

The crime data is pre-processed in this [notebook](https://github.com/IBMDeveloperUK/foss4g-geopandas/blob/master/notebooks/prepare-uk-crime-data.ipynb) so that it is easier to read here. We will only look at data from 2018, but feel free to also load the 2017 data that is provided in the repository, or adapt the pre-processing notebook to explore even more data.

This dataset cannot be loaded into a GeoDataFrame directly. Instead, the data is loaded into a DataFrame and then converted:

In [None]:
download_file_to_local('2018-1-metropolitan-street.zip', project=project)
download_file_to_local('2018-2-metropolitan-street.zip', project=project)
street = pd.read_csv("./2018-1-metropolitan-street.zip")
street2 = pd.read_csv("./2018-2-metropolitan-street.zip")
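# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two halves
street = pd.concat([street, street2], ignore_index=True)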

download_file_to_local('2018-metropolitan-stop-and-search.zip', project=project)
stop_search = pd.read_csv("./2018-metropolitan-stop-and-search.zip")
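
pandas infers zip compression from the `.zip` extension when the archive holds a single CSV file, and `pd.concat` stacks the parts into one table. The sketch below reproduces that pattern with two tiny archives written to a temporary folder; the file names and rows are made up.

```python
import os
import tempfile
import zipfile

import pandas as pd

tmpdir = tempfile.mkdtemp()
paths = []
for name, rows in [
    ("part1", "Month,Crime type\n2018-01,Burglary\n"),
    ("part2", "Month,Crime type\n2018-07,Robbery\n"),
]:
    path = os.path.join(tmpdir, name + ".zip")
    # Each archive contains exactly one CSV, which is what read_csv expects.
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr(name + ".csv", rows)
    paths.append(path)

parts = [pd.read_csv(p) for p in paths]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 0.
combined = pd.concat(parts, ignore_index=True)
print(combined)
```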

In [None]:
# convert Month to datetime
street['Month'] = pd.to_datetime(street['Month'], format='%Y-%m')

street.head()

In [None]:
# drop columns with same value for all rows
stop_search = stop_search.drop(columns=['Policing operation', 'Part of a policing operation'])

stop_search['Date'] = pd.to_datetime(stop_search['Date'], format='%Y-%m-%dT%H:%M:%S')
stop_search['Year'] = stop_search['Date'].dt.year
stop_search['Month'] = stop_search['Date'].dt.to_period('M')

stop_search.head()
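
The date handling can be checked on a couple of hand-written timestamps. This is a small sketch with invented values: `to_datetime` parses the strings, `.dt.year` extracts the year, and `.dt.to_period('M')` truncates each timestamp to its calendar month.

```python
import pandas as pd

# Invented timestamps in the same ISO-like layout as the format string.
events = pd.DataFrame({"Date": ["2018-03-05T14:30:00", "2018-11-20T08:15:00"]})

events["Date"] = pd.to_datetime(events["Date"], format="%Y-%m-%dT%H:%M:%S")
events["Year"] = events["Date"].dt.year
events["Month"] = events["Date"].dt.to_period("M")
print(events)
```

Periods sort chronologically and group cleanly, which is what the monthly plots later rely on.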

#### Convert to GeoDataFrames

In [None]:
street['coordinates'] = list(zip(street.Longitude, street.Latitude))
street['coordinates'] = street['coordinates'].apply(Point)
street = gpd.GeoDataFrame(street, geometry='coordinates')
street.head()

In [None]:
stop_search['coordinates'] = list(zip(stop_search.Longitude, stop_search.Latitude))
stop_search['coordinates'] = stop_search['coordinates'].apply(Point)
stop_search = gpd.GeoDataFrame(stop_search, geometry='coordinates')
stop_search.head()
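
As an alternative to building a list of `Point` objects by hand, `gpd.points_from_xy` creates the geometry column in one call. The sketch below uses two made-up coordinates and assumes they are WGS84 longitude/latitude (EPSG:4326); that CRS is an assumption of this example and is not stated in the data itself.

```python
import geopandas as gpd
import pandas as pd

# Two made-up longitude/latitude pairs.
pts = pd.DataFrame({"Longitude": [-0.1276, -0.0877], "Latitude": [51.5072, 51.5081]})

gdf = gpd.GeoDataFrame(
    pts,
    # x is longitude, y is latitude
    geometry=gpd.points_from_xy(pts["Longitude"], pts["Latitude"]),
    crs="EPSG:4326",  # assumption: coordinates are WGS84 lon/lat
)
print(gdf)
```

Setting the CRS at construction time avoids having to assign it separately before a spatial join.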

<a id="explore2"></a>
### 2.2. Explore data


In [None]:
# number of data points
print ('rows in street: '+str(len(street)))

# columns 
print ('Columns: '+str(street.columns))

In [None]:
print(street['Crime type'].unique())

In [None]:
print(street['Last outcome category'].unique())

In [None]:
fig = plt.figure();
street['Crime type'].groupby(street['Crime type']).count().plot.barh(figsize=(14,8));
plt.ylabel(None);
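
Grouping a column by itself and counting is one way to tally categories; `value_counts` gives the same numbers in a single call. A quick sketch on a made-up series:

```python
import pandas as pd

crimes = pd.Series(
    ["Burglary", "Robbery", "Burglary", "Drugs", "Burglary"], name="Crime type"
)

# Pattern used in the cell above: group the column by its own values.
by_group = crimes.groupby(crimes).count()

# Same tally; sort_index() puts the categories in alphabetical order
# so that the two results line up.
by_value = crimes.value_counts().sort_index()
print(by_value)
```

Without `sort_index()`, `value_counts` orders categories from most to least frequent, which is often the more useful order for a bar chart.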

In [None]:
fig = plt.figure();
street['Crime type'].groupby(street['Month']).count().plot(figsize=(14,6));

In [None]:
# group by crime type
street_type = street.groupby(['Month','Crime type'])['Location'].count().unstack(fill_value=0)
street_type.head()
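
The `groupby(...).count().unstack()` chain turns a long table into one row per month and one column per crime type. A toy example with invented rows shows the reshaping step by step:

```python
import pandas as pd

toy = pd.DataFrame({
    "Month": ["2018-01", "2018-01", "2018-01", "2018-02"],
    "Crime type": ["Burglary", "Burglary", "Robbery", "Burglary"],
    "Location": ["On or near A", "On or near B", "On or near C", "On or near D"],
})

# count() gives one number per (Month, Crime type) pair;
# unstack() moves the crime types into columns, filling gaps with 0.
wide = toy.groupby(["Month", "Crime type"])["Location"].count().unstack(fill_value=0)
print(wide)
```

Note that `count()` skips rows where `Location` is missing; `size()` would count every row regardless.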

In [None]:
fig = plt.figure();
street_type.plot(figsize=(14,6));
plt.ylabel('crimes / month', fontsize=16);
plt.xlabel(None);
plt.legend(bbox_to_anchor=(1.02, 1.0));

In [None]:
# number of data points
print ('rows in stop_search: '+str(len(stop_search)))

In [None]:
# columns 
print ('Columns: '+str(stop_search.columns))

In [None]:
# categories
print ('Legislation: '+str(stop_search['Legislation'].unique()))
print ('Object of search: '+str(stop_search['Object of search'].unique()))
print ('Outcome: '+str(stop_search['Outcome'].unique()))

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20,5))

stop_search['Object of search'].groupby(stop_search['Object of search']).count().plot.bar(ax=ax1);

stop_search['Outcome'].groupby(stop_search['Outcome']).count().plot.bar(ax=ax2);

stop_search['Month'] = stop_search.Date.dt.to_period("M")
stop_search['Object of search'].groupby(stop_search['Month']).count().plot(ax=ax3);


### Spatial join

> The below solution was found [here](https://gis.stackexchange.com/questions/306674/geopandas-spatial-join-and-count) after googling for 'geopandas count points in polygon'

The `crs` needs to be the same for both GeoDataFrames. 

In [None]:
print(boroughs.crs)
print(stop_search.crs)

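Add a borough to each point with a spatial join. Because `boroughs` is the left table, the result has one row per matching borough and point pair: it keeps the `geometry` and other columns from `boroughs` and appends the columns from `stop_search`. 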

In [None]:
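# label the points with the boroughs' CRS; this does not reproject any coordinates,
# so it assumes the longitude/latitude values are already in that CRS
stop_search.crs = boroughs.crs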
dfsjoin = gpd.sjoin(boroughs,stop_search) 
dfsjoin.head()
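
To see what the spatial join returns, here is a minimal sketch with two made-up square "boroughs" and four made-up points. All names and coordinates are invented, and both layers are given the same CRS up front.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two adjacent unit squares standing in for boroughs.
areas = gpd.GeoDataFrame(
    {"name": ["West", "East"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Two points in West, one in East, one outside both squares.
points = gpd.GeoDataFrame(
    {"Outcome": ["Arrest", "Nothing found", "Arrest", "Arrest"]},
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8), Point(1.5, 0.5), Point(3, 3)],
    crs="EPSG:4326",
)

# Default sjoin is an inner join on intersecting geometries:
# one output row per (area, point) pair, unmatched points are dropped.
joined = gpd.sjoin(areas, points)
counts = joined.groupby("name").size()
print(counts)
```

The point at (3, 3) falls outside both squares and therefore does not appear in the result.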

Then aggregate this table with a [pivot table](https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html) that counts, for each borough, how many times each category in `Object of search` occurs. Then drop the extra column level and reset the index, so that the new table can be merged back into `boroughs` to create `boroughs2`.

In [None]:
dfpivot = pd.pivot_table(dfsjoin,index='id',columns='Object of search',aggfunc={'Object of search':'count'})
dfpivot.columns = dfpivot.columns.droplevel()
dfpivot = dfpivot.reset_index()
dfpivot.head()
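
The shape of the resulting table (one row per borough `id`, one column per search category) can be reproduced on toy data. The sketch below uses `pd.crosstab`, which is a different route to the same kind of count table and not the call used in the cell above; the rows are invented.

```python
import pandas as pd

# Invented join result: borough id plus the category of each search.
joined = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "Object of search": [
        "Controlled drugs", "Controlled drugs", "Stolen goods", "Stolen goods",
    ],
})

# crosstab counts how often each (id, category) combination occurs.
table = pd.crosstab(joined["id"], joined["Object of search"]).reset_index()
table.columns.name = None  # drop the leftover "Object of search" label
print(table)
```

One difference to keep in mind: `crosstab` fills missing combinations with 0, whereas `pivot_table` without `fill_value` leaves them as `NaN`.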

In [None]:
boroughs2 = boroughs.merge(dfpivot, how='left',on='id')
boroughs2.head()
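
A left merge keeps every borough, including those with no matching counts, which then show up as `NaN`. If zeros are preferred for such boroughs, a `fillna(0)` after the merge is one option. This is a sketch on made-up tables and is not a step from the original workflow:

```python
import pandas as pd

areas = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
counts = pd.DataFrame({"id": [1, 2], "Controlled drugs": [5, 2]})

# how="left" keeps all rows of `areas`; id 3 has no counts and gets NaN.
merged = areas.merge(counts, how="left", on="id")

# Optional: treat "no searches recorded" as zero and restore integer dtype.
merged["Controlled drugs"] = merged["Controlled drugs"].fillna(0).astype(int)
print(merged)
```

Whether `NaN` or 0 is the better choice depends on the map: `NaN` leaves the area unshaded, while 0 colours it as the lowest class.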

Let's make some maps!

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(20,5))

p1=boroughs2.plot(column='Controlled drugs',ax=axs[0],cmap='Blues',legend=True);
axs[0].set_title('Controlled drugs', fontdict={'fontsize': '12', 'fontweight' : '5'});

p2=boroughs2.plot(column='Stolen goods',ax=axs[1], cmap='Reds',legend=True);
axs[1].set_title('Stolen goods', fontdict={'fontsize': '12', 'fontweight' : '5'});


In [None]:
dfsjoin2 = gpd.sjoin(boroughs,stop_search[stop_search['Outcome'] == 'Arrest']) 
dfpivot2 = pd.pivot_table(dfsjoin2,index='id',columns='Object of search',aggfunc={'Object of search':'count'})
dfpivot2.columns = dfpivot2.columns.droplevel()
dfpivot2 = dfpivot2.reset_index()
boroughs3 = boroughs.merge(dfpivot2, how='left',on='id')
boroughs3.head()

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(20,5))

p1=boroughs3.plot(column='Controlled drugs',ax=axs[0],cmap='Blues',legend=True);
p2=boroughs3.plot(column='Stolen goods',ax=axs[1], cmap='Reds',legend=True);

axs[0].set_title('Controlled drugs', fontdict={'fontsize': '12', 'fontweight' : '5'});
axs[1].set_title('Stolen goods', fontdict={'fontsize': '12', 'fontweight' : '5'});

Copyright © 2019-2020 IBM. This notebook and its source code are released under the terms of the MIT License.