# Explore UK Crime Data with Pandas and GeoPandas


## Table of Contents

1. [London boroughs](#boroughs)<br>
2. [Crime data](#crime)<br>
    2.1. [Load data](#load2)<br>
    2.2. [Explore data](#explore2)<br>

In [None]:
import pandas as pd
import geopandas as gpd
import geoplot 
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
from datetime import datetime

%matplotlib inline

<div class="alert alert-danger" style="font-size:100%">
When you use <b>Watson Studio</b> to run the workshop, you need to add the project token that you created earlier to your notebook to be able to access the shape files. 

* Click the 3 dots at the top of the notebook to insert the project token. This will create a new cell in the notebook that you need to run first before continuing with the rest of the notebook. If you share this notebook you should remove this cell, otherwise anyone can use your Cloud Object Storage from this project.

If you cannot find the new cell it is probably at the top of this notebook. Scroll up, run the cell and continue with the rest of the notebook.

* Also add the following files to your Cloud Object Storage (click the 1010 button at the top right if you do not see the menu on the right of the notebook):
    - 2018-1-metropolitan-street.zip
    - 2018-2-metropolitan-street.zip
    - 2018-metropolitan-stop-and-search.zip
* And run the following cell with the helper function

</div> 

In [None]:
# define the helper function 
def download_file_to_local(project_filename, local_file_destination=None, project=None):
    """
    Uses project-lib to get a bytearray and then downloads this file to local.
    Requires a valid `project` object.
    
    Args:
        project_filename (str): the filename to be passed to get_file
        local_file_destination (str): the filename for the local file if different
        project: the project object created by the project token cell
        
    Returns:
        0 if everything worked
    """
    
    # get the file
    print("Attempting to get file {}".format(project_filename))
    _bytes = project.get_file(project_filename).read()
    
    # use the project filename locally unless another name was given
    print("Downloading...")
    if local_file_destination is None:
        local_file_destination = project_filename
    
    with open(local_file_destination, 'wb') as f: 
        f.write(bytearray(_bytes))
        print("Completed writing to {}".format(local_file_destination))
        
    return 0

<a id="boroughs"></a>
## 1. London boroughs

In [None]:
# load data from a url
boroughs = gpd.read_file("https://skgrange.github.io/www/data/london_boroughs.json")

In [None]:
boroughs.head()

<a id="crime"></a>
## 2. Crime data

The crime data was pre-processed in this [notebook](https://github.com/IBMDeveloperUK/foss4g-geopandas/blob/master/notebooks/prepare-uk-crime-data.ipynb) so it is easier to read here. We will only look at data from 2018, but feel free to also load the 2017 data that is provided in the repository, or adapt the pre-processing notebook to explore even more data.

Data is downloaded from https://data.police.uk/ ([License](https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/))

<a id="load2"></a>
### 2.1. Load data

This dataset cannot be loaded into a GeoDataFrame directly. Instead, the data is loaded into a DataFrame and then converted:

In [None]:
download_file_to_local('2018-1-metropolitan-street.zip', project=project)
download_file_to_local('2018-2-metropolitan-street.zip', project=project)
street = pd.read_csv("./2018-1-metropolitan-street.zip")
street2 = pd.read_csv("./2018-2-metropolitan-street.zip")
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows of both files
street = pd.concat([street, street2])

download_file_to_local('2018-metropolitan-stop-and-search.zip', project=project)
stop_search = pd.read_csv("./2018-metropolitan-stop-and-search.zip")

In [None]:
street.head()

In [None]:
stop_search.head()

#### Convert to GeoDataFrames

In [None]:
street['coordinates'] = list(zip(street.Longitude, street.Latitude))
street['coordinates'] = street['coordinates'].apply(Point)
street = gpd.GeoDataFrame(street, geometry='coordinates')
street.head()

In [None]:
stop_search['coordinates'] = list(zip(stop_search.Longitude, stop_search.Latitude))
stop_search['coordinates'] = stop_search['coordinates'].apply(Point)
stop_search = gpd.GeoDataFrame(stop_search, geometry='coordinates')
stop_search.head()
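
As an aside, GeoPandas also provides `gpd.points_from_xy`, which builds the point geometries in one vectorised call instead of zipping the columns and applying `Point` row by row. Below is a minimal sketch on a made-up two-row table. The coordinate values and the `EPSG:4326` CRS are assumptions for illustration, so check the CRS before relying on it. This variant stores the points in a column called `geometry` rather than `coordinates`.

```python
import pandas as pd
import geopandas as gpd

# toy data standing in for the street DataFrame (values are made up)
toy = pd.DataFrame({
    "Longitude": [-0.1276, -0.0754],
    "Latitude": [51.5072, 51.5055],
    "Crime type": ["Burglary", "Robbery"],
})

# build the geometry column directly from the coordinate columns
toy_gdf = gpd.GeoDataFrame(
    toy,
    geometry=gpd.points_from_xy(toy["Longitude"], toy["Latitude"]),
    crs="EPSG:4326",  # assumed: plain longitude/latitude (WGS84)
)
print(toy_gdf.head())
```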

<a id="explore2"></a>
### 2.2. Explore data


<div class="alert alert-success">
 <b>EXERCISES</b> <br/> 
 Explore the data with Pandas. There are no right or wrong answers; the questions below give you some suggestions for what to look at. <br/> 

Did you notice anything odd about the latitudes and longitudes? Read how the data is anonymised here: https://data.police.uk/about/.
</div>  

<div class="alert alert-success">
 <b>QUESTION 1 - Explore the street DataFrame</b> <br/> 
  <ul>
  <li>How much data is there?</li>
  <li>Are there missing values? Should these rows be deleted?  </li>
  <li>Which columns of the datasets contain useful information? What kind of categories are there and are they all meaningful?</li>
  <li>Convert the Month column to datetime with <font face="Courier">.apply(lambda x: datetime.strptime(x, "%Y-%m"))</font> </li>     
  </ul> 
</div>  

In [None]:
# answer 1


In [None]:
# %load https://raw.githubusercontent.com/IBMDeveloperUK/foss4g-geopandas/master/answers/crime_answer1.py
# number of data points
print('rows in street: ' + str(len(street)))

# columns 
print('Columns: ' + str(street.columns))

# categories
print('Crime type: ' + str(street['Crime type'].unique()))
print('Last outcome category: ' + str(street['Last outcome category'].unique()))
print(street['Context'].unique())

# delete columns
street = street.drop(columns=['Unnamed: 0','Latitude', 'Longitude','Context'])

# convert Month to datetime
street['Month'] = street['Month'].apply(lambda x: datetime.strptime(x, "%Y-%m"))

street.head()
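
The questions about missing values and date conversion can also be approached with vectorised pandas calls. Here is a minimal sketch on a made-up table: the values are invented, and only the `Month` and `Crime type` column names come from the dataset. `isna().sum()` counts missing values per column, and `pd.to_datetime` with an explicit format is an alternative to `apply` with `strptime`.

```python
import pandas as pd

toy = pd.DataFrame({
    "Month": ["2018-01", "2018-02", "2018-02"],
    "Crime type": ["Burglary", None, "Robbery"],
})

# number of missing values in each column
missing = toy.isna().sum()
print(missing)

# parse the year-month strings; each value becomes the first day of that month
toy["Month"] = pd.to_datetime(toy["Month"], format="%Y-%m")
print(toy.dtypes)
```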


<div class="alert alert-success">
 <b>QUESTION 2</b> - Which crime type occurs most often? And near which location? <br/> 
</div> 

> Hints: use [groupby()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) to summarize the data and [sort_values()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html) to sort the data. 
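
To illustrate the hint, here is a minimal sketch of the `groupby()` and `sort_values()` pattern on a made-up table. The rows are invented, so the counts say nothing about the real data, and the `Location` column name is an assumption about the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "Crime type": ["Burglary", "Robbery", "Burglary", "Drugs", "Burglary", "Robbery"],
    "Location": ["On or near A", "On or near B", "On or near A",
                 "On or near C", "On or near B", "On or near B"],
})

# count rows per crime type and sort from most to least frequent
per_type = (
    toy.groupby("Crime type")
       .size()
       .sort_values(ascending=False)
)
print(per_type)

# count rows per (crime type, location) pair, most frequent first
per_type_location = (
    toy.groupby(["Crime type", "Location"])
       .size()
       .reset_index(name="count")
       .sort_values("count", ascending=False)
)
print(per_type_location.head())
```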

In [None]:
# answer 2


In [None]:
# %load https://raw.githubusercontent.com/IBMDeveloperUK/foss4g-geopandas/master/answers/crime_answer2.py

<div class="alert alert-success">
 <b>QUESTION 3 - Visualise the street DataFrame: </b> <br/> 
  <ul>
  <li> The total number of crimes per crime type </li>
  <li>The total number of crimes per month </li>
  </ul> 
</div> 

In [None]:
# answer 3


In [None]:
# %load https://raw.githubusercontent.com/IBMDeveloperUK/foss4g-geopandas/master/answers/crime_answer3.py

<div class="alert alert-success">
 <b>QUESTION 4 - Explore the stop_search DataFrame</b> <br/> 
  <ul>
  <li>How much data is there?</li>
  <li>Are there missing values? Should these rows be deleted?  </li>
  <li>Which columns of the datasets contain useful information? What kind of categories are there and are they all meaningful?</li>
  <li>Convert the Date to datetime with <font face="Courier">.apply(lambda x: datetime.strptime(x, "%Y-%m"))</font> </li>     
  </ul> 
</div> 

In [None]:
# answer 4


In [None]:
# %load https://raw.githubusercontent.com/IBMDeveloperUK/foss4g-geopandas/master/answers/crime_answer4.py

<div class="alert alert-success">
 <b>QUESTION 5 - Visualise the stop_search DataFrame: </b> <br/> 
  <ul>
  <li> The total number of stops per object of search </li>
  <li> The total number of stops per outcome </li>
  <li> The total number of stops per month (optional: split these out in categories)</li>
  </ul> 
</div>

In [None]:
# answer 5


In [None]:
# %load https://raw.githubusercontent.com/IBMDeveloperUK/foss4g-geopandas/master/answers/crime_answer5.py

### Some observations and ideas to further explore the data

* The number of stop and searches seems to go up. That is something you could investigate further. Are any of the categories increasing? 
* Another interesting question is how the object of search and the outcome are related. Are there types of searches where nothing is found more frequently? 
* How could you combine the two datasets?
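
For the second idea, `pd.crosstab` is one way to relate two categorical columns. Here is a minimal sketch on made-up rows. The category values are invented for illustration, while `Object of search` and `Outcome` are the column names used elsewhere in this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "Object of search": ["Controlled drugs", "Controlled drugs", "Stolen goods",
                         "Stolen goods", "Controlled drugs"],
    "Outcome": ["Nothing found", "Arrest", "Nothing found",
                "Nothing found", "Nothing found"],
})

# rows: object of search, columns: outcome,
# values: share of each outcome within a row (each row sums to 1)
shares = pd.crosstab(toy["Object of search"], toy["Outcome"], normalize="index")
print(shares.round(2))
```

With `normalize="index"` each row sums to 1, which makes search types with different numbers of stops comparable.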

### Spatial join

> The solution below was found [here](https://gis.stackexchange.com/questions/306674/geopandas-spatial-join-and-count) after googling for 'geopandas count points in polygon'

The `crs` needs to be the same for both GeoDataFrames. 

In [None]:
print(boroughs.crs)
print(stop_search.crs)
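
A GeoDataFrame built from plain coordinate columns has no CRS until one is assigned. Here is a minimal sketch of the difference between declaring a CRS with `set_crs` and reprojecting with `to_crs`. The point and the two EPSG codes are illustrative assumptions, not taken from the workshop data, and `set_crs` needs a reasonably recent GeoPandas version.

```python
import geopandas as gpd
from shapely.geometry import Point

# a single made-up point given as longitude/latitude, initially without a CRS
pts = gpd.GeoDataFrame({"name": ["a"]}, geometry=[Point(-0.1276, 51.5072)])
print(pts.crs)  # None

# set_crs only labels the existing coordinates; the numbers do not change
pts = pts.set_crs("EPSG:4326")

# to_crs transforms the coordinates, here to the British National Grid
pts_bng = pts.to_crs("EPSG:27700")
print(pts_bng.crs)
```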

Assign a borough to each point with a spatial join. Because `boroughs` is the left table, each row of the result carries the `geometry` and other columns of a borough together with the columns of one matching point in `stop_search`. 

In [None]:
stop_search.crs = boroughs.crs
dfsjoin = gpd.sjoin(boroughs,stop_search) 
dfsjoin.head()
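
To see what the join returns, here is a minimal sketch with two made-up squares and three made-up points (all names and coordinates are hypothetical). With the polygons as the left table, the result has one row per pair of polygon and matching point, so counting rows per `code` gives the number of points in each polygon.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# two made-up square "boroughs" side by side
areas = gpd.GeoDataFrame(
    {"code": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# three made-up points: two fall in A, one in B
points = gpd.GeoDataFrame(
    {"kind": ["x", "y", "x"]},
    geometry=[Point(0.2, 0.5), Point(0.8, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# polygons on the left: one output row per (polygon, matching point) pair,
# carrying the polygon geometry plus the columns of the point
joined = gpd.sjoin(areas, points)
counts = joined.groupby("code").size()
print(counts)
```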

Then aggregate this table by creating a [pivot table](https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html) that counts, for each borough, how often each category in `Object of search` occurs. Then drop the top column level and reset the index, so you can merge this new table back into the `boroughs` GeoDataFrame (the result is stored as `boroughs2`).

In [None]:
dfpivot = pd.pivot_table(dfsjoin,index='code',columns='Object of search',aggfunc={'Object of search':'count'})
dfpivot.columns = dfpivot.columns.droplevel()
dfpivot = dfpivot.reset_index()
dfpivot.head()
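
An alternative way to build the same kind of count table, sketched here on made-up rows (the `code` values are hypothetical), is to group by both columns and unstack the categories into columns. Unlike the pivot table, this fills combinations that never occur with 0 instead of leaving them missing.

```python
import pandas as pd

toy = pd.DataFrame({
    "code": ["E1", "E1", "E1", "E2", "E2"],
    "Object of search": ["Controlled drugs", "Stolen goods", "Controlled drugs",
                         "Controlled drugs", "Controlled drugs"],
})

# one row per code, one column per category, cells hold the number of rows
counts = (
    toy.groupby(["code", "Object of search"])
       .size()
       .unstack(fill_value=0)
       .reset_index()
)
counts.columns.name = None  # drop the leftover "Object of search" axis label
print(counts)
```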

In [None]:
boroughs2 = boroughs.merge(dfpivot, how='left',on='code')
boroughs2.head()

Let's make some maps!

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(20,5))

p1=boroughs2.plot(column='Controlled drugs',ax=axs[0],cmap='Blues',legend=True);
axs[0].set_title('Controlled drugs', fontdict={'fontsize': '12', 'fontweight' : '5'});

p2=boroughs2.plot(column='Stolen goods',ax=axs[1], cmap='Reds',legend=True);
axs[1].set_title('Stolen goods', fontdict={'fontsize': '12', 'fontweight' : '5'});


<div class="alert alert-success">
 <b>QUESTION 6 </b> - Improve the above maps. How many actual arrests are there in each borough? Use the above method but first select only the arrests using the column 'Outcome'. <br/> 
</div>  

In [None]:
# answer 6


In [None]:
# %load https://raw.githubusercontent.com/IBMDeveloperUK/foss4g-geopandas/master/answers/crime_answer6a.py

In [None]:
# %load https://raw.githubusercontent.com/IBMDeveloperUK/foss4g-geopandas/master/answers/crime_answer6b.py

<div class="alert alert-success">
 <b>QUESTION 7 </b> - Are there changes over time? Is there a difference between months? Use `street` and look at Westminster or another borough where the crime rate seems higher. Is there a difference in crimes committed by men and women? And what about the age groups? <br/> 
</div>  

<div class="alert alert-success">
 <b>QUESTION 8 </b> - Make heatmaps for different crime types or other classes (optionally for different periods in the year) <br/>
</div> 

### Author
Margriet Groenendijk is a Data & AI Developer Advocate for IBM. She develops and presents talks and workshops about data science and AI. She is active in the local developer communities through attending, presenting and organising meetups. She has a background in climate science where she explored large observational datasets of carbon uptake by forests during her PhD, and global scale weather and climate models as a postdoctoral fellow. 

Copyright © 2019 IBM. This notebook and its source code are released under the terms of the MIT License.