# Northumbria Police Crime Data 
### From May 2020 to April 2023

The data for this notebook is downloaded from [the police.uk website](https://data.police.uk/data), for every month from May 2020 up to and including April 2023. The downloaded .zip file contains one folder per month, named by year and month (for example '2020-05'). Each folder holds three .csv files:
- one for street crimes
- one for crime outcomes
- one for stop and search data
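The monthly files follow a predictable naming scheme, so the month and dataset type can be read straight from a file name. Below is a minimal sketch with a hypothetical helper, `parse_police_filename`. The street file name matches the one used later in this notebook, but the `outcomes` and `stop-and-search` suffixes for the other two files are assumptions to check against your own download.

```python
import re

# Assumed naming pattern: "YYYY-MM-<force>-<kind>.csv", e.g. "2020-05-northumbria-street.csv".
# The "outcomes" and "stop-and-search" kinds are assumptions about the other two files.
FILENAME_PATTERN = re.compile(
    r"^(?P<month>\d{4}-\d{2})-(?P<force>[a-z-]+?)-(?P<kind>street|outcomes|stop-and-search)\.csv$"
)


def parse_police_filename(filename):
    """Return (month, force, kind) for a police.uk CSV name, or None if it doesn't match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("month"), match.group("force"), match.group("kind")


print(parse_police_filename("2020-05-northumbria-street.csv"))
# ('2020-05', 'northumbria', 'street')
```

The lazy `[a-z-]+?` lets force names that themselves contain hyphens still split correctly from the dataset type.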

This notebook focuses primarily on the **street crime data**, with the goal of producing an interactive map on which each crime is pinned at its location and labelled.

In [None]:
import pandas as pd

## A single month

First, to practise the necessary code, we will work with the data for a single month, namely **May 2020**.

In [None]:
df_may2020 = pd.read_csv("2020-05/2020-05-northumbria-street.csv")
print("Dataset successfully loaded into pandas dataframe!")
print("Shape of dataset:", df_may2020.shape)

Now we can look at the first few rows of the dataframe to get a feel for the data.

In [None]:
df_may2020.head(5)

#### Cleaning the data

Firstly, the `LSOA code` and `LSOA name` columns describe the geographic area (Lower Layer Super Output Area), but they aren't necessary for this analysis, since we use only the latitude and longitude values. We can drop them straight away.

In [None]:
df_may2020.drop(['LSOA code', 'LSOA name'], axis=1, inplace=True)

The data still looks quite messy so we'll need to do some cleaning before an analysis can take place. To begin with, we'll look at NaN values.

In [None]:
missing_data = df_may2020.isnull()
missing_data.head(5)

We can count how many missing elements there are in each column, and then decide what to do with them.

In [None]:
# For each column, count how many entries are missing (True) and present (False)
for column in missing_data.columns.values.tolist():
    print(column)
    print(missing_data[column].value_counts())
    print("")
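The loop above prints a `True`/`False` count for every column. pandas can summarise the same information in a single call per statistic; the sketch below uses a tiny made-up dataframe, `example`, in place of `df_may2020`.

```python
import numpy as np
import pandas as pd

# A tiny made-up frame standing in for df_may2020 (values are illustrative only)
example = pd.DataFrame({
    "Crime ID": ["a1", None, "c3", None],
    "Month": ["2020-05"] * 4,
    "Context": [np.nan] * 4,
})

# isnull().sum() counts the missing entries per column (the "True" counts from the loop)
missing_counts = example.isnull().sum()
# isnull().mean() gives the fraction of missing entries per column
missing_fraction = example.isnull().mean()

print(missing_counts)
print(missing_fraction)
```

On the real data, `df_may2020.isnull().sum()` should match the `True` counts printed by the loop.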

From this, we can see that only about half of the entries in `Crime ID` and `Last outcome category` have values, so we'll drop these columns. We'll also drop the `Context` column, as it is empty.

In [None]:
df_may2020.drop(['Crime ID', 'Last outcome category', 'Context'], axis=1, inplace=True)
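Dropping columns by eye works here, but the same decision can also be written as a rule. This is a sketch, not the method used above: the hypothetical helper `drop_sparse_columns` removes any column whose fraction of missing values exceeds a threshold, and the 0.4 default is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_missing_fraction=0.4):
    """Return a copy of df without columns whose missing fraction exceeds the threshold.

    The 0.4 default is a hypothetical cut-off chosen only for illustration.
    """
    missing_fraction = df.isnull().mean()
    sparse_columns = missing_fraction[missing_fraction > max_missing_fraction].index
    return df.drop(columns=sparse_columns)


# Made-up data: "Crime ID" is half empty, "Context" is entirely empty
example = pd.DataFrame({
    "Crime ID": ["a1", None, "c3", None],
    "Crime type": ["Burglary", "Drugs", "Shoplifting", "Anti-social behaviour"],
    "Context": [np.nan] * 4,
})
print(drop_sparse_columns(example).columns.tolist())  # ['Crime type']
```

Because `drop` returns a new dataframe here, the original `example` is left unchanged.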

In [None]:
df_may2020.head()

It also looks like the columns `Reported by` and `Falls within` may be irrelevant, since this dataset should *only* contain crimes from Northumbria Police. We can check this.

In [None]:
print(df_may2020['Reported by'].value_counts())
print(df_may2020['Falls within'].value_counts())

As suspected, every entry in these columns is Northumbria Police, so they can be dropped too.

In [None]:
df_may2020.drop(['Reported by', 'Falls within'], axis=1, inplace=True)
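The `value_counts()` check can be applied to every column at once: a column with at most one distinct value carries no information for the analysis. A minimal sketch on made-up data (`example` and `constant_columns` are illustrative names):

```python
import pandas as pd

# Made-up rows mimicking the Northumbria street data
example = pd.DataFrame({
    "Reported by": ["Northumbria Police"] * 3,
    "Falls within": ["Northumbria Police"] * 3,
    "Crime type": ["Burglary", "Drugs", "Burglary"],
})

# nunique(dropna=False) counts distinct values per column, treating NaN as a value too
constant_columns = [col for col, n in example.nunique(dropna=False).items() if n <= 1]
print(constant_columns)  # ['Reported by', 'Falls within']
```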

In [None]:
df_may2020.head()

The last thing to check whilst cleaning the data is the data types. 

In [None]:
df_may2020.dtypes

These are all as we would expect, so we can continue with the analysis.

#### Analysis

To pin each of these crimes by their latitudes and longitudes on a map, we can use the `folium` package.

In [None]:
import folium

Let's pull up a map of Northumberland, centred on Newcastle.

In [None]:
northumberland_lat = 54.979
northumberland_long = -1.61

northumberland_map = folium.Map(location=[northumberland_lat, northumberland_long], zoom_start=13)
northumberland_map
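The centre coordinates above are hard-coded. As an alternative sketch, the centre could be taken from the data itself, for example as the median latitude and longitude of the crimes. Here `example` is a made-up stand-in for `df_may2020`, and the resulting `centre` list could then be passed as `location` to `folium.Map`.

```python
import pandas as pd

# Made-up coordinates around Tyneside, standing in for df_may2020's Latitude/Longitude
example = pd.DataFrame({
    "Latitude": [54.97, 54.98, 55.01, None],
    "Longitude": [-1.62, -1.60, -1.45, None],
})

# median() skips NaN by default and is less pulled around by outlying crimes than mean()
centre = [example["Latitude"].median(), example["Longitude"].median()]
print(centre)
```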

We can now superimpose the crime data onto the map using a folium `FeatureGroup`.

In [None]:
# Group all of the month's crimes together in a single feature group
street_crimes = folium.FeatureGroup(name='Street crimes')

# Crimes without coordinates cannot be placed on the map, so skip them
located = df_may2020.dropna(subset=['Latitude', 'Longitude'])

# Loop through each crime and add a circle marker, labelled with its crime type, to the feature group
for lat, lng, label in zip(located.Latitude, located.Longitude, located['Crime type']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,  # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6,
    ).add_to(street_crimes)

# Add the crimes to the map
street_crimes.add_to(northumberland_map)
northumberland_map

This map already lets you pan and zoom, and clicking on a marker shows its `Crime type`.

The original plan had been to superimpose all months of crime on top of one another, but as the map above shows, a single month is already quite cluttered. One solution is to group nearby markers together into clusters when zooming out.

In [None]:
from folium import plugins

# Start with a clean map
northumberland_map = folium.Map(location=[northumberland_lat, northumberland_long], zoom_start=13)

# Instantiate a marker cluster object for the street crimes in the dataframe
street_crimes = plugins.MarkerCluster().add_to(northumberland_map)

# Crimes without coordinates cannot be placed on the map, so skip them
located = df_may2020.dropna(subset=['Latitude', 'Longitude'])

# Loop through the dataframe and add each crime to the marker cluster,
# using that crime's own type as the popup text
for lat, lng, label in zip(located.Latitude, located.Longitude, located['Crime type']):
    folium.Marker(
        location=[lat, lng],
        popup=label,
    ).add_to(street_crimes)

# Display the map
northumberland_map

Clustering keeps the map readable, which leaves room for more data. I'd like to be able to change the months shown on the fly, so next I'll write a function that produces the above map for any of the available months.

To do this, it will first help to combine the crimes from every month into one large dataframe.

## Any month

Using the `pathlib` library, it is easy to find and read in all of the street crime .csv files and concatenate them into one large dataframe, which can then be cleaned in a single step.

In [None]:
from pathlib import Path  # for finding the files

# Recursively find every *street.csv file below the current directory
# (sorted so that the months are read in a predictable order)
dfs = []
for file in sorted(Path(".").glob("**/*street.csv")):
    dfs.append(pd.read_csv(file))

# Concatenate all dataframes, giving the result a fresh 0..n-1 index
df_all = pd.concat(dfs, ignore_index=True)

# As before, drop 'LSOA code', 'LSOA name', 'Crime ID', 'Reported by', 'Falls within', 'Last outcome category' and 'Context'
df_all.drop(['LSOA code', 'LSOA name', 'Crime ID', 'Reported by', 'Falls within', 'Last outcome category', 'Context'],
            axis=1, inplace=True)
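To check that the `**/*street.csv` pattern picks up only the street crime files, the sketch below builds a throwaway folder tree in a temporary directory and runs the same read-and-concatenate steps on it. The `outcomes` file name and the tiny CSV contents are made up for the test.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a throwaway folder tree mimicking the download: one folder per month, each with a
# street file and an (assumed-name) outcomes file that the glob should NOT pick up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for month in ["2020-05", "2020-06"]:
        folder = root / month
        folder.mkdir()
        (folder / f"{month}-northumbria-street.csv").write_text(f"Month,Crime type\n{month},Burglary\n")
        (folder / f"{month}-northumbria-outcomes.csv").write_text(f"Month,Outcome type\n{month},Under investigation\n")

    # sorted() makes the read order deterministic (glob order depends on the filesystem)
    street_files = sorted(root.glob("**/*street.csv"))
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    combined = pd.concat([pd.read_csv(f) for f in street_files], ignore_index=True)

print([f.name for f in street_files])
print(combined)
```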

In [None]:
df_all.head()

Next, we write a function that automatically produces a map for the months passed as a parameter. First, we should check again what data type `Month` is.

In [None]:
df_all.dtypes

`Month` has the `object` dtype here (it holds strings), but as long as the user enters months in the format `'YYYY-MM'`, matching on it will work. This will be reiterated when we use the function. I'd like the user to be able to pass a list of months, such as `['2020-05', '2020-06', '2023-01']`, and have the markers for all of these months overlaid at once.

In [None]:
def create_northumberland_crime_map(year_month):
    """Return a clustered folium map of the crimes in df_all for a list of 'YYYY-MM' months."""
    # Latitude and longitude values of Northumberland, centred on Newcastle
    northumberland_lat = 54.979
    northumberland_long = -1.61

    # Create a clean map
    new_northumberland_map = folium.Map(location=[northumberland_lat, northumberland_long], zoom_start=13)

    # Keep only the rows whose Month is one of the requested months,
    # dropping crimes without coordinates (folium cannot place those)
    df_subset = df_all[df_all['Month'].isin(year_month)].dropna(subset=['Latitude', 'Longitude'])
    print("Successfully created a subset dataframe with", len(df_subset), "crimes")

    # Popup labels combine the crime type with its month, e.g. "Burglary 2021-01"
    labels = df_subset['Crime type'] + " " + df_subset['Month']

    # Instantiate a marker cluster object for the street crimes in the subset
    street_crimes_ = plugins.MarkerCluster().add_to(new_northumberland_map)

    # Loop through the subset and add each crime to the marker cluster
    for lat, lng, label in zip(df_subset.Latitude, df_subset.Longitude, labels):
        folium.CircleMarker(
            location=[lat, lng],
            radius=5,  # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            popup=label,
            fill_color='blue',
            fill_opacity=0.6,
        ).add_to(street_crimes_)

    # Return the map; as the last expression in a cell, Jupyter displays it
    return new_northumberland_map
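The function relies on the months being `'YYYY-MM'` strings, and passing a single string such as `'2021-01'` instead of a list would not work as intended. A hypothetical helper, `validate_months`, could check the input first using the standard library's `datetime.strptime`:

```python
from datetime import datetime


def validate_months(year_months):
    """Return the months as a list, raising ValueError unless each is a 'YYYY-MM' string."""
    if isinstance(year_months, str):
        # Treat a single month such as '2021-01' as a one-element list
        year_months = [year_months]
    for entry in year_months:
        # strptime raises ValueError if the string doesn't match the format (or the month is invalid)
        datetime.strptime(entry, "%Y-%m")
        # %m also accepts a single digit ('2021-1'), so insist on the zero-padded form
        if len(entry) != 7:
            raise ValueError(f"Expected 'YYYY-MM', got {entry!r}")
    return list(year_months)
```

One way to use it would be to add `year_month = validate_months(year_month)` as the first line of `create_northumberland_crime_map`.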

Let's try using this function for the months of January, February and March in 2021.

In [None]:
desired_months = ['2021-01', '2021-02', '2021-03']
create_northumberland_crime_map(desired_months)

In this interactive map, each marker's popup shows the month as well as the crime type, which is useful now that several months are overlaid.

## Author

Ted Binns