# Spatial Analysis of Crime Patterns in the City of London to Identify and Predict Vulnerable Areas

**Data**

Crime data: https://data.police.uk/data/

London Boundary: https://data.london.gov.uk/dataset/london_boroughs

In [None]:
import pandas as pd
import os
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import pyproj
import contextily as cx
import folium
from folium.plugins import MarkerCluster
from sklearn.model_selection import train_test_split

## Data download and Exploratory data analysis

In [None]:
# Specify the folder containing the CSV files
folder_path = "Crime_data"

# List all files in the folder that match the pattern
file_paths = [os.path.join(folder_path, file) for file in os.listdir(folder_path) if file.endswith(".csv")]

# Read and concatenate all CSV files
data = [pd.read_csv(file) for file in file_paths]

# Combine all the data into a single DataFrame
Crime_data = pd.concat(data, ignore_index=True)

In [None]:
Crime_data.shape

There were 24,158 crimes reported in the City of London. To inspect the variables, the **.head()**, **.columns** and **.info()** commands are used. Together they show the first rows, the column names, and each column's data type and non-null count.

In [None]:
Crime_data.head()

In [None]:
Crime_data.columns

In [None]:
Crime_data.info()

In [None]:
Crime_data = Crime_data.drop(['Crime ID','Reported by', 'Falls within', 'Location', 'Last outcome category', 'Context'], axis=1)

In [None]:
Crime_data.isnull().sum()

In [None]:
Crime_data = Crime_data.dropna()
Crime_data.isnull().sum()

In [None]:
Crime_data["Crime type"].value_counts()

The **.value_counts()** call above summarises the types of crime in the City of London. The most common crime is theft: 7,672 cases of all kinds of theft were reported from 2021 to 2023, followed by violence and sexual offences with 3,931 cases. Anti-social behaviour, drugs, and public order offences were also significant.
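To put raw counts in context, it can help to view each crime type as a share of all records. The sketch below uses a small made-up Series (`toy_types` is hypothetical, not the police data) together with the `normalize=True` option of `value_counts()`.

```python
import pandas as pd

# Hypothetical toy data standing in for Crime_data["Crime type"]
toy_types = pd.Series(
    ["Other theft", "Other theft", "Shoplifting", "Drugs", "Other theft"],
    name="Crime type",
)

# Absolute counts per crime type, most frequent first
counts = toy_types.value_counts()

# Proportion of all records per crime type (the proportions sum to 1)
shares = toy_types.value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

Applied to the real column, the same call would give the proportion of each crime type instead of its count.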

In [None]:
Crime_data.info()

In [None]:
Crime_data[["Year", "Month"]] = Crime_data['Month'].str.split('-', expand=True)

In [None]:
# Convert 'Year' and 'Month' columns to numeric
Crime_data['Year'] = pd.to_numeric(Crime_data['Year'])
Crime_data['Month'] = pd.to_numeric(Crime_data['Month'])
# Preview the first rows to confirm the new columns
Crime_data.head(3)

In [None]:
Crime_data['Year'].value_counts()

In [None]:
# Plot crime type and year
plt.figure(figsize=(12, 6))# setting the width, height of the figure
# create a count plot using Seaborn
sns.countplot(x='Crime type', hue='Year', data=Crime_data)
# Adding the plot title
plt.title('Crime Type Distribution from 2021-2023')
# Rotate the x-axis labels so they do not overlap
plt.xticks(rotation= 90)
# Add legend
plt.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

In [None]:
# Count crimes per calendar month; groupby returns the months in ascending order
Crimes_monthly = Crime_data.groupby(['Month']).size().reset_index(name='Count')
Crimes_monthly

In [None]:
London = gpd.read_file('London_boundary/London_Borough_Excluding_MHW.shp')

In [None]:
London.head(3)

In [None]:
City_of_London = London.loc[London.NAME == 'City of London']
City_of_London.boundary.plot()

In [None]:
City_of_London.crs

In [None]:
# Define the source and target coordinate reference systems
source_crs = pyproj.CRS("EPSG:4326")  # WGS84, standard for latitude and longitude
target_crs = pyproj.CRS("EPSG:27700")  # British National Grid (OSGB36), in metres

# Create a Transformer
transformer = pyproj.Transformer.from_crs(source_crs, target_crs, always_xy=True)

# Transform all coordinates in one vectorised call (faster than a row-wise apply)
Crime_data['X'], Crime_data['Y'] = transformer.transform(
    Crime_data['Longitude'].to_numpy(), Crime_data['Latitude'].to_numpy()
)

In [None]:
crime_data_loc = gpd.GeoDataFrame(Crime_data, geometry=gpd.points_from_xy(Crime_data['X'], Crime_data['Y']))
crime_data_loc.head()

In [None]:
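# Declare the CRS of the points (this labels the data; it does not reproject them)
crime_data_loc = crime_data_loc.set_crs(City_of_London.crs)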

In [None]:
crime_data_loc.crs

In [None]:
# Plot boundary and points together
fig, ax = plt.subplots(figsize=(10, 10))
City_of_London.boundary.plot(ax=ax, color='blue', label='City of London Boundary')  # Plot the boundary
crime_data_loc.plot(ax=ax, markersize=5, color='red', label='Crime Data')  # Plot crime data locations

# Add legend and title
plt.legend()
plt.title('City of London Crime Locations')
# The data are in projected coordinates, so the axes are eastings and northings
plt.xlabel('Easting (m)')
plt.ylabel('Northing (m)')

plt.show()

In [None]:
London_city_crimes = gpd.clip(crime_data_loc, City_of_London)

In [None]:
# Plot boundary and points together
fig, ax = plt.subplots(figsize=(10, 10))
City_of_London.boundary.plot(ax=ax, color='black', label='City of London Boundary')  # Plot the boundary
London_city_crimes.plot(ax=ax, markersize=5, color='red', label='Crime Data')  # Plot crime data locations
cx.add_basemap(ax, crs=City_of_London.crs, source=cx.providers.OpenStreetMap.Mapnik)
# Add legend and title
plt.legend()
plt.title('City of London Crime Locations')
# The data are in projected coordinates, so the axes are eastings and northings
plt.xlabel('Easting (m)')
plt.ylabel('Northing (m)')

plt.show()

## Identifying Crime Hotspots

Crimes are not spread evenly across the city landscape (Braga et al., 2012). Areas with a high concentration of crimes are considered “hotspots” (Wang et al., 2013), or vulnerable areas. Here, locations with at least 50 and at least 150 recorded crimes are treated as crime hotspots. The locations are displayed with MarkerCluster, which groups nearby points on the crime map into clusters.
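The thresholding described above can be sketched on made-up data. The coordinates and counts below are hypothetical; only the cut-offs of 50 and 150 come from the text.

```python
import pandas as pd

# Hypothetical records: one row per reported crime at a given location
toy = pd.DataFrame({
    "Latitude":  [51.51] * 160 + [51.52] * 60 + [51.53] * 5,
    "Longitude": [-0.09] * 160 + [-0.08] * 60 + [-0.10] * 5,
})

# Count crimes per unique location
toy_counts = toy.groupby(["Latitude", "Longitude"]).size().reset_index(name="Count")

# Apply the two thresholds from the text (>= 50 and >= 150)
hot_50 = toy_counts[toy_counts["Count"] >= 50]
hot_150 = toy_counts[toy_counts["Count"] >= 150]

print(len(hot_50), len(hot_150))
```

Every location that passes the higher threshold also passes the lower one, so the two sets are nested rather than disjoint.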

In [None]:
# crime count for each location and crime type
London_crime_counts = London_city_crimes.groupby(['Latitude', 'Longitude', 'LSOA code' ]).size().reset_index(name='Count')
London_crime_counts 

**Analysing Areas with at Least 50 and at Least 150 Crimes**

In [None]:
threshold_1= 50
threshold_2= 150
crimes_50 = London_crime_counts[London_crime_counts['Count'] >= threshold_1]
crimes_150 = London_crime_counts[London_crime_counts['Count'] >= threshold_2]

In [None]:
# Subset the data by crime type (all theft-related types are grouped together)
Theft = Crime_data.loc[Crime_data["Crime type"].isin(["Other theft", "Theft from the person", "Shoplifting", 
                                                      "Bicycle theft", "Vehicle crime", "Robbery"])]

Violence_Sexual_offences = Crime_data.loc[Crime_data["Crime type"] == "Violence and sexual offences"]

Anti_social_Behavior = Crime_data.loc[Crime_data["Crime type"] == "Anti-social behaviour"]

Drugs = Crime_data.loc[Crime_data["Crime type"] == "Drugs"]

Public_order = Crime_data.loc[Crime_data["Crime type"] == "Public order"]

Criminal_damage = Crime_data.loc[Crime_data["Crime type"] == "Criminal damage and arson"]

Burglary = Crime_data.loc[Crime_data["Crime type"] == "Burglary"]

Other_crimes = Crime_data.loc[Crime_data["Crime type"] == "Other crime"]

Possession_of_weapons = Crime_data.loc[Crime_data["Crime type"] == "Possession of weapons"]

In [None]:
# Define the threshold
threshold = 10
# Function to find high crime hotspots
def get_high_crime_hotspots(data, threshold):
    # Group by location (Latitude, Longitude, LSOA code) and count crimes
    hotspots = data.groupby(["Latitude", "Longitude", 'LSOA code']).size().reset_index(name="Count")
    # Keep only locations with more than `threshold` crimes
    high_crime_hotspots = hotspots[hotspots["Count"] > threshold]
    return high_crime_hotspots
# Get high crime hotspots for each category
theft_hotspots = get_high_crime_hotspots(Theft, threshold)
violence_hotspots = get_high_crime_hotspots(Violence_Sexual_offences, threshold)
anti_social_hotspots = get_high_crime_hotspots(Anti_social_Behavior, threshold)
drugs_hotspots = get_high_crime_hotspots(Drugs, threshold)
public_order_hotspots = get_high_crime_hotspots(Public_order, threshold)
criminal_damage_hotspots = get_high_crime_hotspots(Criminal_damage, threshold)
burglary_hotspots = get_high_crime_hotspots(Burglary, threshold)
other_crime_hotspots = get_high_crime_hotspots(Other_crimes, threshold)
weapons_hotspots = get_high_crime_hotspots(Possession_of_weapons, threshold)

In [None]:
# Create a Folium map centred on the City of London
london_map = folium.Map(location=[51.5118200, -0.089299], zoom_start=15)

# Function to add crime hotspots to the map
def add_crime_points(data, map_object, color, group_name):
    marker_cluster = MarkerCluster(name=group_name).add_to(map_object)
    for _, row in data.iterrows():
        folium.CircleMarker(
            location=[row['Latitude'], row['Longitude']],
            radius=5,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.6,
            popup=f"Count: {row['Count']}"
        ).add_to(marker_cluster)

# Add crimes with threshold >= 50
add_crime_points(crimes_50, london_map, color='lightblue', group_name='Crimes >= 50')
# Add crimes with threshold >= 150
add_crime_points(crimes_150, london_map, color='darkred', group_name='Crimes >= 150')
# Add hotspots for each crime type
add_crime_points(theft_hotspots, london_map, color='blue', group_name='Theft Hotspots')
add_crime_points(violence_hotspots, london_map, color='red', group_name='Violence Hotspots')
add_crime_points(anti_social_hotspots, london_map, color='green', group_name='Anti-Social Behavior Hotspots')
add_crime_points(drugs_hotspots, london_map, color='purple', group_name='Drugs Hotspots')
add_crime_points(public_order_hotspots, london_map, color='orange', group_name='Public Order Hotspots')
add_crime_points(criminal_damage_hotspots, london_map, color='cyan', group_name='Criminal Damage Hotspots')
add_crime_points(burglary_hotspots, london_map, color='brown', group_name='Burglary Hotspots')
add_crime_points(other_crime_hotspots, london_map, color='pink', group_name='Other Crimes Hotspots')
add_crime_points(weapons_hotspots, london_map, color='yellow', group_name='Weapons Possession Hotspots')

In [None]:
city_boundary = City_of_London.to_crs("EPSG:4326")
city_boundary_ = city_boundary.to_json()
boundary_layer = folium.GeoJson(  # separate name so the London GeoDataFrame is not overwritten
    city_boundary_,
    name="City of London Boundary",
    style_function=lambda feature: {
        'fillColor': 'none',
        'color': 'black',
        'weight': 2,
    }
).add_to(london_map)

# Add a layer control to toggle groups and boundary
folium.LayerControl(collapsed=False).add_to(london_map)

In [None]:
# Add a title using HTML
title_html = '''
<div style="position: fixed; 
            top: 10px; left: 50%; transform: translate(-50%, 0);
            font-size: 24px; font-weight: bold; color: black;
            border: 2px solid grey; border-radius: 5px;">
    Crime Hotspots in the City of London
</div>
'''

london_map.get_root().html.add_child(folium.Element(title_html))

In [None]:
# Save and display the map
london_map.save("london_crime_map.html")
london_map

## Predicting Crime Hotspots in the City of London Using Random Forest Regression

**Predicting future hotspots**

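To predict crime hotspots, the crime data is grouped by spatial coordinates (latitude and longitude), month, and year of occurrence, giving one crime count per location and month. Latitude and longitude serve as the predictors and the count as the target. The data is then split into training (80%) and testing (20%) sets for modelling.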
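As a minimal sketch of the grouping step, assuming a toy table with the same column names (the values are made up): each unique combination of location, month, and year becomes one row whose `Count` is the modelling target.

```python
import pandas as pd

# Hypothetical crime records (values are illustrative only)
toy = pd.DataFrame({
    "Latitude":  [51.51, 51.51, 51.51, 51.52, 51.52],
    "Longitude": [-0.09, -0.09, -0.09, -0.08, -0.08],
    "Month":     [1, 1, 2, 1, 1],
    "Year":      [2021, 2021, 2021, 2022, 2022],
})

# One row per (location, month, year) with the number of crimes recorded
grouped = (
    toy.groupby(["Latitude", "Longitude", "Month", "Year"])
       .size()
       .reset_index(name="Count")
)

# Features and target selected the same way as for modelling
X_toy = grouped[["Latitude", "Longitude"]]
y_toy = grouped["Count"]

print(grouped)
```

Because the same coordinates can appear in several rows (one per month), a model that sees only latitude and longitude is effectively learning a typical monthly count for each location.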

In [None]:
London_crime_counts = London_city_crimes.groupby(['Latitude', 'Longitude', 'Month', 'Year']).size().reset_index(name='Count')

In [None]:
# Select the features (coordinates) and the target (crime count)
X = London_crime_counts[['Latitude', 'Longitude']]
y = London_crime_counts['Count']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score, explained_variance_score

In [None]:
Rf_model = RandomForestRegressor(n_estimators=100, random_state=42)

In [None]:
Rf_model.fit(X_train, y_train)

In [None]:
Rf_predictions = Rf_model.predict(X_test)

In [None]:
Rf_mse = mean_squared_error(y_test, Rf_predictions)
Rf_mse

In [None]:
Rf_mae = mean_absolute_error(y_test, Rf_predictions)
Rf_mae

In [None]:
Rf_r2 = r2_score(y_test, Rf_predictions)
Rf_r2 

In [None]:
Rf_explained_variance = explained_variance_score(y_test, Rf_predictions)
Rf_explained_variance

In [None]:
# Combine predicted counts with corresponding locations
Rf_predicted = pd.DataFrame({'Latitude': X_test['Latitude'], 'Longitude': X_test['Longitude'], 'RF_Predicted_Count': Rf_predictions})

# Visualize the predicted hotspots using Folium
Rf_crimemap = folium.Map(location=[51.5118200, -0.089299], zoom_start=15.1)
Rf_marker_cluster = MarkerCluster().add_to(Rf_crimemap)

# Iterate through each predicted record and add a CircleMarker to the MarkerCluster
for index, row in Rf_predicted.iterrows():
    radius = 10
    color = 'green'  # Using green for Random Forest predictions
    
    folium.CircleMarker(location=[row['Latitude'], row['Longitude']],
                        radius=radius,
                        color=color,
                        fill=True,
                        fill_color=color,
                        fill_opacity=0.5,
                        # Popups are rendered as HTML, so use <br> for the line break
                        popup=f"RF Predicted Hotspot: {row['Latitude']}, {row['Longitude']}<br>Crime Count: {row['RF_Predicted_Count']}").add_to(Rf_marker_cluster)

In [None]:
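# Add predictions to a copy of the test data so X_test keeps only the model features
predicted = X_test.copy()
predicted['Predicted Count'] = Rf_predictions

threshold = 5
# Keep only locations whose predicted count exceeds the threshold
hotspots = predicted[predicted['Predicted Count'] > threshold]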

# Plot in Folium
london_map = folium.Map(location=[51.5118200, -0.089299], zoom_start=15)
for _, row in hotspots.iterrows():
    folium.CircleMarker(
        location=[row['Latitude'], row['Longitude']],
        radius=row['Predicted Count'] / 5,  # Scaled radius
        color='red',
        fill=True,
        fill_opacity=0.6,
        popup=f"Predicted Crimes: {row['Predicted Count']}"
    ).add_to(london_map)

In [None]:
boundary_layer = folium.GeoJson(  # separate name so the London GeoDataFrame is not overwritten
    city_boundary_,
    name="City of London Boundary",
    style_function=lambda feature: {
        'fillColor': 'none',
        'color': 'black',
        'weight': 2,
    }
).add_to(london_map)

# Add a layer control to toggle groups and boundary
folium.LayerControl(collapsed=False).add_to(london_map)

In [None]:
london_map

In [None]:
london_map.save("Locations_of_predicted_highCrimes.html")