# GG4257 IRP: Staying Safe on Glasgow's Roads

* **Authors:** Sean Healy

* **Student ID:** 200016001
* **Date:** 04/04/2024

**Abstract:**

This study aims to build a system that effectively sorts different kinds of car accidents in Glasgow. It uses data from police reports on 10,000 accidents in the city. The sorting will look at things like weather, road conditions, speed limits, how serious the accidents were, how many people were hurt, and the age and sex of the drivers. It will also look into how important certain points in the road network are in understanding the accidents in Glasgow. By figuring out where accidents happen most in Glasgow, the study wants to help those who make policies come up with ways to lower the number and severity of car accidents.

# GitHub Repository
- **GitHub Link:** https://github.com/SeanHealy/GG4257_IRP

## Declaration

> In submitting this assignment, I hereby confirm that I have read the University's statement on Good Academic Practice. The following work is my own. Significant academic debts and borrowings have been properly acknowledged and referenced.


**Data structure**

Files required for this dataset are located in the repo under IRP_data and Glasgow_OA

**Table of Contents:** 

1. Introduction
2. Methodology
   - 2A. Node Degree Centrality Calculation
   - 2B. Data Recategorisation and Standardisation
   - 2C. K-Means Clustering
3. Results
   - 3A. Visualising Crashes and Road Connectivity
   - 3B. K-mean Clusters
4. Discussion
   - 4A. Limitations
   - 4B. Conclusions
5. Appendix

## Introduction

This report advocates for a detailed, data-driven approach to road safety. Tailored responses could offer better and more cost-effective solutions. However, pinpointing specific measures can be difficult due to the complexity of each accident. Research indicates that a mix of environmental, infrastructure, and driver error factors contribute to the likelihood and severity of a car accident (Pembuain et al., 2019). The combination of these factors makes each accident unique, complicating the prediction of crashes.

The Scottish government has set a high goal to achieve 'the best road safety performance in the world'. It plans to reduce the number and severity of road accidents by focusing on various social and infrastructure policies. These include making roads safer and more 'predictable and forgiving of mistakes' and improving post-accident responses to provide 'effective and appropriate reaction to collisions' (Transport Scotland, nd). Edinburgh, the country's capital, has taken strong measures against road accidents, implementing widespread 20mph speed limits on 80% of its streets (Nightingale et al., 2021). These measures have led to an 8% drop in accidents and a 10% decrease in injuries on roads that reduced speed limits from 30mph to 20mph (Hunter et al., 2022).

Worldwide, traffic accidents are increasing, costing many countries about 3% of their GDP on average (WHO, 2018). They also pose a major threat to public health, with traffic-related deaths reaching 1.35 million in 2016 (ibid). Addressing road safety is a significant challenge for public health and complex for infrastructure planning. Cities must balance, ensuring their residents' safety while keeping the city connected and accessible.

The model will draw on a detailed database containing information on vehicles and drivers involved in over 10,000 crashes in Glasgow from 2005 to 2016. This dataset includes a wide range of factors such as weather conditions, road conditions, speed limits, the severity of the crash, the number of people hurt, and the age and gender of the driver. These elements provide a thorough perspective for analyzing the patterns of road accidents in Glasgow.

Additionally, this study will investigate the impact of node centrality on road accidents. Node degree centrality measures the importance of a point in a network based on how well it is connected to other nodes. This connectivity is a crucial factor in determining the likelihood of accidents. According to the European Commission, 40-60% of accidents occur at intersections in most countries. Therefore, by including measures of node centrality in the analysis, it will be possible to pinpoint weaknesses in the road network and patterns that can inform targeted interventions.

This report aims to facilitate local-level, focused efforts to address road accidents and fatalities. It will use machine learning algorithms to detect patterns of accidents that happen under similar conditions and map these patterns across Glasgow's 5,486 output areas. This approach will enable policymakers to identify specific areas prone to certain types of road accidents. Spotting these local problem spots can not only point out where interventions are needed but also guide the specific changes required to lower the rates of road accidents.

## Methodology

This section includes data collection, pre-processing, and model building.

In [3]:
# import libraries
import pandas as pd 
import geopandas as gpd
import matplotlib.pyplot as plt
from folium import plugins
import numpy as np
from sklearn.preprocessing import StandardScaler
import networkx as nx
import osmnx as ox
import folium
from scipy.spatial import cKDTree
from shapely.geometry import Point
import seaborn as sns
from sklearn.cluster import KMeans
import warnings
from scipy.spatial.distance import cdist, pdist
import plotly.express as px
import matplotlib.pyplot as plt
from lonboard import Map, ScatterplotLayer, SolidPolygonLayer
from sklearn.decomposition import PCA

ModuleNotFoundError: No module named 'lonboard'

### Loading & Preprocessing the Data

The dataset was downloaded from a compiled data source from the UK Department of Transport.

In [None]:
# read crash csv data
crash_data = pd.read_csv("IRP_data/Merged_Crash_Data.csv")

# read Gaslgow output area
glasgow_oa = gpd.read_file("Glasgow_OA/scotland_oa_2011.shp")

# convert crash data to geodataframe w/ lat & long crash data
crash_gdf = gpd.GeoDataFrame(crash_data, crs = 'EPSG:4326', geometry=gpd.points_from_xy(crash_data['Longitude'], crash_data['Latitude']))

# project shapefile to match crash data
glasgow_oa = glasgow_oa.to_crs('EPSG:4326')

#  join shape files keep data within Glasgow
glasgow_data = gpd.sjoin(crash_gdf, glasgow_oa, how="inner", predicate="within")

In [None]:
# download Glasgow street network
glas = ox.graph_from_place("Glasgow, Scotland", network_type="drive")

# set degree centrality as node attribute
nx.set_node_attributes(glas, nx.degree_centrality(glas),name= "degree_centrality")

# convert to GeoDataFrame
nodes, edges = ox.graph_to_gdfs(glas)

In [None]:
# find nearest point points between two data frames

def ckdnearest(gdA, gdB):

    nA = np.array(list(gdA.geometry.apply(lambda x: (x.x, x.y))))
    nB = np.array(list(gdB.geometry.apply(lambda x: (x.x, x.y))))
    btree = cKDTree(nB)
    dist, idx = btree.query(nA, k=1)
    gdB_nearest = gdB.iloc[idx].drop(columns="geometry").reset_index(drop=True)
    gdf = pd.concat(
        [
            gdA.reset_index(drop=True),
            gdB_nearest,
            pd.Series(dist, name='dist')
        ], 
        axis=1)

    return gdf

# new dataframe with crash data linked to closest nodes 
crashes_and_centrality = ckdnearest(glasgow_data, nodes)

In [None]:
# filter data
keep_cols= [
    'Accident_Index',
    'code',
    'council',
    'Latitude',
    'Longitude',
    'geometry',
    'Accident_Severity',
    'Number_of_Vehicles',
    'Number_of_Casualties',
    'Road_Surface_Conditions',
    'Weather_Conditions',
    'Light_Conditions',
    'Age_Band_of_Driver',
    'Age_of_Vehicle',
    'Sex_of_Driver',
    'Speed_limit',
    'Engine_Capacity_.CC.',
    'degree_centrality'
]
filter_data = crashes_and_centrality[keep_cols]

### Data Recategorisation and Standardisation 

Variables are recategorised since there are too many unique variables to make observations that are meaningful. They will then be consolidated to larger groups which will reflect how each variable is ditributed. This will be done in two steps.

1. Recategorisation of variables into the following categories:

| Variable         | Categories        | Values                           |
|------------------|-------------------|----------------------------------|
| Weather conditions | Fine            | - Fine no high winds             |
|                  |                   | - Raining no high winds          |
||-------------------|----------------------------------|
|                  | Lower Visibility  | - Fog or mist                    |
|                  |                   | - Snowing no high winds          |
||-------------------|----------------------------------|
|                  | High Winds        | - Fine high winds                |
|                  |                   | - Raining high winds             |
|                  |                   | - Snowing high winds             |
|------------------|-------------------|----------------------------------|
| Light Conditions | Daylight          | - Daylight                       |
||-------------------|----------------------------------|
|                  | Dark but lit      | - Darkness: streetlight lit      |
||-------------------|----------------------------------|
|                  | Complete Darkness | - Darkness: No Street lighting   |
|                  |                   | - Darkness: Streetlights not lit |
|                  |                   | - Darkness: Streetlighting unknown |
|------------------|-------------------|----------------------------------|
| Road Conditions  | Dry               | - Dry                            |
||-------------------|----------------------------------|
|                  | Wet               | - Wet/Damp                       |
|                  |                   | - Flood                          |
||-------------------|----------------------------------|
|                  | Icy               | - Frost/Ice                      |
|                  |                   | - Snow                           |
|------------------|-------------------|----------------------------------|
| No. of Vehicles  | 1 vehicle         |                                  |
|                  | 2 vehicles        |                                  |
|                  | 3 or more vehicles|                                  |
|------------------|-------------------|----------------------------------|
| No. of Casualties| 1 casualty        |                                  |
|                  | 2-3 casualties    |                                  |
|                  | 3 or more casualties |                              |
|------------------|-------------------|----------------------------------|
| Crash Severity   | Slight            |                                  |
|                  | Serious           |                                  |
|                  | Fatal             |                                  |
|------------------|-------------------|----------------------------------|
| Speed Limit      | 20-30             |                                  |
|                  | 40-50             |                                  |
|                  | 60 or more        |                                  |
|------------------|-------------------|----------------------------------|
| Driver Sex       | Male              |                                  |
|                  | Female            |                                  |
|------------------|-------------------|----------------------------------|r

 
2. One-Hot-Encoding will be used to pivot the categories and convert the dataframe to binary.This is necessary since all the data will now be categorical, and allow every variables to have a standardised count.


In [None]:
# 1. Recategorising Variables

warnings.filterwarnings('ignore')

# function recategorising light conditions
def categorize_light(row):
    if pd.isna(row['Light_Conditions']):
        return np.nan
    elif row['Light_Conditions'] == 'Daylight':
        return 'Daylight'
    elif row['Light_Conditions'] in ['Darkness - lights lit']:
        return 'Dark_but_Lit'
    elif row['Light_Conditions'] in ['Darkness - lights unlit', 'Darkness - no lighting', 'Darkness: Street lighting unknown']:
        return 'Complete_Darkness'
    else:
        return np.nan
filter_data['Light_Category'] = filter_data.apply(categorize_light, axis=1)

# function recategorising weather conditiions
def categorize_weather(row):
    if row['Weather_Conditions'] == 'Fine no high winds':
        return 'Fine'
    elif row['Weather_Conditions'] in ['Snowing no high winds', 'Raining no high winds', 'Fog or mist']:
        return 'Lower_Visability'
    elif row['Weather_Conditions'] in ['Fine + high winds', 'Raining + high winds', 'Snowing + high winds']:
        return 'High_Winds'
    else:
        return np.nan        
filter_data['Weather_Category'] = filter_data.apply(categorize_weather, axis=1)

# function recategorising casualties
def categorize_casualties(row):
    if pd.isna(row['Number_of_Casualties']):
        return np.nan
    elif row['Number_of_Casualties'] == 1:
        return '1_Casualty'
    elif 2 <= row['Number_of_Casualties'] <= 3:
        return '2_to_3_Casualties'
    else: 
        return 'More_than_3_Casualties'
filter_data['Casualty_Category'] = filter_data.apply(categorize_casualties, axis=1)

# function recategorising vehicle numbers
def categorize_vehicles(row):
    if pd.isna(row['Number_of_Vehicles']):
        return np.nan
    elif row['Number_of_Vehicles'] == 1:
        return '1_Vehicle'
    elif row['Number_of_Vehicles'] == 2:
        return '2_Vehicles'
    else: 
        return '3_or_more_Vehicles'
filter_data['Vehicle_Category'] = filter_data.apply(categorize_vehicles, axis=1)

# function recategorising road conditions
def categorize_road_conditions(row):
    if pd.isna(row['Road_Surface_Conditions']):
        return np.nan
    elif row['Road_Surface_Conditions'] == 'Dry':
        return 'Road_Dry'
    elif row['Road_Surface_Conditions'] in ['Wet or damp', 'Flood over 3cm. deep']:
        return 'Road_Wet'
    elif row['Road_Surface_Conditions'] in ['Frost or ice', 'Snow']:
        return 'Road_Icy'
    else:
        return np.nan
filter_data['Surface_Condition_Category'] = filter_data.apply(categorize_road_conditions, axis=1)

# function recategorising speed limits
def categorize_speed(row):
    if pd.isna(row['Speed_limit']):
        return np.nan
    elif 20 <= row['Speed_limit'] <= 30:
        return 'Speed_20_to_30'
    elif 40 <= row['Speed_limit'] <= 50:
        return 'Speed_40_to_50'
    elif 60 <= row['Speed_limit'] <= 70:
        return 'Speed_60_Plus'
    else:  
        return np.nan
filter_data['Speed_Category'] = filter_data.apply(categorize_speed, axis=1)

# function recategorising driver age
def categorize_age(row):
    if pd.isna(row['Age_Band_of_Driver']):
        return np.nan
    elif row['Age_Band_of_Driver'] in ['0 - 5', '6 - 10', '11 - 15', '16 - 20', '21 - 25', '26 - 35']:
        return 'Young'
    elif row['Age_Band_of_Driver'] in  ['36 - 45', '46 - 55', '56 - 65']:
        return 'Middle_Aged'
    elif row['Age_Band_of_Driver'] in ['66 - 75', 'Over 75']:
        return 'Senior'
    else:
        return np.nan
filter_data['Age_Category'] = filter_data.apply(categorize_age, axis=1)

# function recategorising driver sex
def categorize_sex(row):
    if pd.isna(row['Sex_of_Driver']):
        return np.nan
    elif row['Sex_of_Driver'] == 'Female':
        return 'Female'
    elif row['Sex_of_Driver'] == 'Male':
        return 'Male'
    else:  
        return np.nan
filter_data['Driver_Sex'] = filter_data.apply(categorize_sex, axis=1)

# function recategorising degree centrality
def categorize_centrality(row):
    if pd.isna(row['degree_centrality']):
        return np.nan
    elif row['degree_centrality'] < 0.0003:
        return 'Low_Centrality'
    elif row['degree_centrality'] < 0.0005:
        return 'Medium_Centrality'
    else:
        return 'High_Centrality'
filter_data['centrality_category'] = filter_data.apply(categorize_centrality, axis=1)

In [None]:
# subset dataframe
keep_cols = ['code',
             'Accident_Index',
             'centrality_category',
             'Driver_Sex',
             'Age_Category',
             'Speed_Category',
             'Surface_Condition_Category',
             'Vehicle_Category',
             'Casualty_Category',
             'Light_Category',
             'Weather_Category',
             'Accident_Severity',
            ]

categorised_data = filter_data[keep_cols]

In [None]:
# 2. One-Hot_Encoding

# create unique ids for crash data
categorised_data['Unique_id'] = categorised_data.index

# pivot the categorised data
variables = [
    'centrality_category',
             'Driver_Sex',
             'Age_Category',
             'Speed_Category',
             'Surface_Condition_Category',
             'Vehicle_Category',
             'Casualty_Category',
             'Light_Category',
             'Weather_Category',
             'Accident_Severity',
]

def pivot_accidents(data):
    # melt dataframe to long format
    melted = pd.melt(data, id_vars=['Unique_id'], value_vars=variables)
    
    # pivot to wide format with binary indicators
    pivoted = melted.pivot_table(index='Unique_id', 
                                 columns='value', 
                                 aggfunc=len, 
                                 fill_value=0)
    
    # flatten MultiIndex to columns
    pivoted.columns = ['_'.join(col).strip() for col in pivoted.columns.values]
    
    return pivoted.reset_index()

# apply function to categorised data
pivoted_data = pivot_accidents(categorised_data)

In [None]:
# merge with categorised data to retain output areas 'code'
pivoted_merged = pivoted_data.merge(categorised_data, on = 'Unique_id')

# subset to include pivoted data and output area
keep_cols =['code',
            'Accident_Index',
            'variable_1_Casualty',
            'variable_2_to_3_Casualties',
            'variable_More_than_3_Casualties',
            'variable_1_Vehicle',
            'variable_2_Vehicles',
            'variable_3_or_more_Vehicles',
            'variable_Complete_Darkness',
            'variable_Dark_but_Lit',
            'variable_Daylight',
            'variable_Fatal',
            'variable_Slight',
            'variable_Serious',
            'variable_Female',
            'variable_Male',
            'variable_Fine',
            'variable_High_Winds',
            'variable_Lower_Visability',
            'variable_Low_Centrality',
            'variable_Medium_Centrality',
            'variable_High_Centrality',
            'variable_Young',
            'variable_Middle_Aged',
            'variable_Senior',
            'variable_Road_Dry',
            'variable_Road_Icy',
            'variable_Road_Wet',
            'variable_Speed_20_to_30',
            'variable_Speed_40_to_50',
            'variable_Speed_60_Plus',
]

pivoted_merged = pivoted_merged[keep_cols]

### K-means clustering preperation

Data will be prepared for k-means cluster analysis in this section.

#### Calculating Counts and Standardizing between each output area.

**1. Variable counts:** Occurances of every variable will be tallied in every output area. A series of trial and error is used to select the variables. 

**2. Total Crash Count:** The total number of crashes will be included.

**3. Standardization between Output Areas:** The variables counts of different output areas will be standardised to make the data comparable. The total number of crashes for each output area will be converted to percentages. 

**4. Standardization between variables:** Input variables will be standardised so that random variables do not contribute to the weighting of the clusters. Each variable will have their z-score calculated, which is the number of standard deviations the value is from the data mean.

In [None]:
# variable to provide total crash count
pivoted_merged['Total Count'] = 1

# drop 'Accident_Index' to count variable occurences by output area
oa_counts = pivoted_merged.drop('Accident_Index', axis=1).groupby('code').sum()

# new dataframe of standardised counts
df_percentages = oa_counts.copy()

# run for every column other than code and total count
for col in df_percentages.columns:
    if col not in ['code', 'Total Count']:
        df_percentages[col] = (df_percentages[col] / df_percentages['Total Count']) * 100

# variables for k-means clustering
keep_cols = ['variable_Complete_Darkness', 'variable_Dark_but_Lit', 'variable_Speed_20_to_30',
       'variable_Speed_40_to_50', 'variable_Speed_60_Plus', 'variable_Female', 'variable_Male', 'variable_Low_Centrality',
       'variable_Medium_Centrality', 'variable_High_Centrality', 'variable_Fatal', 'variable_Serious', 'variable_Slight','variable_Road_Icy', 'variable_Road_Dry', 'variable_Road_Wet']

final_df = df_percentages[keep_cols]

In [None]:
# standardization between variables using z-score
numeric_columns = final_df
z_score_df = (numeric_columns - numeric_columns.mean()) / numeric_columns.std(ddof=0)

#### Measuring variables for correlation

Variables are now tested against each other to see their correlation. If variables are highly correlated, they influence clusters by providing undue weightings to particular phenomena. This occurs since the same distribution will be shown by highly correlated variables. 

To counter this, Pearsons coefficient matrix is used for the data frame. Then, variables with a greater score than 0.8 are removed.

In [None]:
# create correlation matrix
corr = z_score_df.corr()
corr.style.background_gradient(cmap='coolwarm')

In [None]:
# set threshold
threshold = 0.8

highly_correlated = (corr.abs() > threshold) & (corr.abs() < 1.0)

# heat map to differentiate between correlations
plt.figure(figsize=(10, 8))
sns.heatmap(highly_correlated, cmap='coolwarm', cbar=False, annot=True)

plt.title('Correlated Variables')
plt.show()

The matrix above shows features that are correlated. These are:

* variable_Speed_40_to_50 & variable_Speed_20_to_30
* variable_Male & variable_Female
* variable_Serious & variable_Slight
* variable_Road_Dry & variable_Road_Wet

To reduce the impact this will have on the results, half of them are removed

In [None]:
z_score_df.drop(['variable_Speed_40_to_50', 'variable_Serious', 'variable_Male','variable_Road_Dry'], axis=1, inplace=True)

### K-Means Clustering

K-means clustering is a type of machine learning algorithm used for spotting patterns in extensive datasets.

This algorithm operates by initially placing k centroids randomly across the dataset. Then, it assigns each datapoint to the nearest centroid. After assigning the points, it calculates the average of the points in each cluster to form a new centroid. The algorithm repeats this process until it minimizes the distance between the datapoints and their respective centroids, or it reaches a predetermined number of iterations.

One of the reasons for choosing K-means clustering is its ability to efficiently handle large datasets without significant computational costs.

For our analysis, we determine the ideal number of clusters by employing the Elbow method. This technique involves running the K-means algorithm several times with an increasing count of clusters and measuring the within-cluster sum of squares (WCSS), which assesses how tight the clusters are. The 'Elbow' point, where the reduction in WCSS slows down markedly, suggests that adding more clusters beyond this number does not meaningfully enhance the fit of the model.

In [None]:
# plot WCSS for clusters
Sum_of_squared_distances = []
K_range = range(1,25)

for k in K_range:
 km = KMeans(n_clusters=k, random_state=45)
 km = km.fit(z_score_df)
 Sum_of_squared_distances.append(km.inertia_)
    
plt.plot(K_range, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()

The above plot shows the elbow at 9 clusters. To confirm, we examine the between-cluster sum of squares. When different numbers of groups are produced, the next plot will show how discriminatory each model is. 

In [None]:
def elbow(dataframe, n):
    kMeansVar = [KMeans(n_clusters=k).fit(dataframe.values) for k in range(1, n)]
    centroids = [X.cluster_centers_ for X in kMeansVar]
    k_euclid = [cdist(dataframe.values, cent) for cent in centroids]
    dist = [np.min(ke, axis=1) for ke in k_euclid]
    wcss = [sum(d**2) for d in dist]
    tss = sum(pdist(dataframe.values)**2)/dataframe.values.shape[0]
    bss = tss - wcss
    plt.plot(bss)
    plt.show()
 
elbow(z_score_df,25)

This confirms the elbow to be between 9 & 11, so the analysis will use 9 cluster groups.

In [None]:
# define number of clusters and random state
kmeans = KMeans(n_clusters=9, random_state = 45)

In [None]:
# apply k-means clustering on the dataframe
kmeans.fit(z_score_df)
labels = kmeans.predict(z_score_df)
cluster_centres = kmeans.cluster_centers_
z_score_df['Cluster'] = kmeans.labels_ 

plt.hist(labels)

In [None]:
kmeans.fit(z_score_df)
labels = kmeans.predict(z_score_df)
cluster_centres = kmeans.cluster_centers_

# new collumn defining the cluster of output areas
z_score_df['Cluster'] = kmeans.labels_

#### Principle Component Analysis (PCA)

The relationship between various clusters and principle components will be visualised with the PCA. This helps since it:

1. Simplifies Complexity: By narrowing down the dataset to its most significant components that account for the bulk of the variance, we simplify the visualization of complex data.
2. Enhances Cluster Visibility: Utilizing PCA to plot the clusters helps in clearly seeing how the clusters form and differentiate from one another.
3. Highlights Key Variables: When we plot the data according to its principal components, it gives us insight into the factors that play a pivotal role in defining the clusters.


In [None]:
plt.figure(figsize=(12, 8))

# predict cluster labels
clusters = kmeans.fit_predict(z_score_df)

z_score_df['Cluster'] = clusters

# remove mean and scales
scaler = StandardScaler()

# standardise z_score_df 
stand_data_scaled = scaler.fit_transform(z_score_df)

# PCA analysis: 2 principle components
pca = PCA(n_components=2).fit(stand_data_scaled)
pca_result = pca.transform(stand_data_scaled)

# percent variance explained by selected components
variance_ratio = pca.explained_variance_ratio_

# create scatter plot
fig = px.scatter(x=pca_result[:, 0], y=pca_result[:, 1], color=clusters,
                 labels={'color': 'Cluster'},
                 #title='Cluster Plot against 1st 2 Principal Components',
                 opacity=0.7,
                 width=800, 
                 height=800)

plt.tight_layout()
fig.show()

In [None]:
# apply PCA to reduce dimensionality to 2, and visualise cluster relationship to 2 principle components
clusters = kmeans.fit_predict(z_score_df)
z_score_df['Cluster'] = clusters

# standardise z_score_df 
scaler = StandardScaler()
stand_data_scaled = scaler.fit_transform(z_score_df)

# PCA analysis: reduce to 2 principle components
pca = PCA(n_components=2).fit(stand_data_scaled)
pca_result = pca.transform(stand_data_scaled)

# percent variance explained by selected components.
variance_ratio = pca.explained_variance_ratio_

plt.figure(figsize=(10, 6))
sns.scatterplot(x=pca_result[:, 0], y=pca_result[:, 1], hue=clusters, palette='viridis', s=50, alpha=0.7)
plt.title('Cluster Plot against 1st 2 Principal Components')
plt.xlabel(f'Principal Component 1 variation: {variance_ratio[0]*100:.2f}%')
plt.ylabel(f'Principal Component 2 variation: {variance_ratio[1]*100:.2f}%')
plt.legend(title='Clusters')
plt.show()

The PCA plot reveals clear separations into different clusters, indicating the effectiveness of the k-means algorithm in identifying distinct categories within the data. However, the presence of some overlapping points between clusters shows that certain datapoints have similarities across these groups.

From the plot, it's evident that the first two principal components account for 34.73% of the variance in the data, with the first component contributing 16.42% and the second 17.53%.

This suggests that additional dimensions within the dataset could play a crucial role in explaining the variance, highlighting the complexity of the data.

The observed overlaps among clusters and the moderate percentage of explained variance are attributed to the multifaceted nature of car accidents. As previously mentioned, categorizing car accidents is challenging because each incident can result from a combination of factors, including environmental conditions, infrastructure, and driver errors.

#### Understanding Cluster Centers

Having successfully identified the clusters, it's crucial to delve into the specifics of the variables characterizing each cluster.

The most straightforward method to comprehend the essence of these cluster centers involves examining the Z-score distribution for each group.

Through radial plotting of these distributions, we gain insight into how each cluster differs in comparison to the overall sample.

In [None]:
# clustering
clusters = kmeans.fit_predict(z_score_df)

# cluster centers
cluster_centers = kmeans.cluster_centers_

# cluster centers
cluster_centers = pd.DataFrame(kmeans.cluster_centers_, columns=z_score_df.columns)

In [None]:
# clustering
clusters = kmeans.fit_predict(z_score_df)

# retrieve cluster centres
cluster_centers = kmeans.cluster_centers_

# convert cluster centres
cluster_centers = pd.DataFrame(kmeans.cluster_centers_, columns=z_score_df.columns)

We are now able to depict the Z-score distribution for each cluster through a radial plot, with an example for one cluster displayed here. Additional plots for the remaining clusters are provided in the Appendix.

In these plots, the red line represents the average value of each variable. The area above and below this line shows the deviation of each variable from the mean, measured in standard deviations.

In [None]:
# first cluster
first_row_centers = cluster_centers.iloc[0, :-1]

# len of features
num_features = len(first_row_centers)

# polar coordinates
theta = np.linspace(0, 2 * np.pi, num_features, endpoint=False)

fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, first_row_centers, linewidth=1, color='blue', marker='o', label='Centers')

# extra red line at the 0.0.
ax.plot(theta, np.zeros_like(first_row_centers), color='red', linestyle='--', label='Average')

# close the plot
ax.plot(np.append(theta, theta[0]), np.append(first_row_centers, first_row_centers[0]), linewidth=1, color='blue', marker='o')

ax.set_xticks(theta)
ax.set_xticklabels(cluster_centers.columns[:-1], rotation=45, ha='right')

# plot
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

**Observations from Cluster 0**

The following variables are defined by cluster 0

* Medium Centrality
* Wet Roads
* Female
* Slight Severity
* 20-30 speed limit
* Dark but lit

This will be repeated for each cluster center. The final cluster map will be renamed by their key variables for ease of visualisation.

#### Consolidation

The clusters have been created and defined and will be cleaned and consolidated. 

In [None]:
# retain cluster and oa codes
z_score_df.drop([
'variable_Complete_Darkness', 'variable_Dark_but_Lit',
       'variable_Speed_20_to_30', 'variable_Speed_60_Plus', 'variable_Female',
       'variable_Low_Centrality', 'variable_Medium_Centrality',
       'variable_High_Centrality', 'variable_Fatal', 'variable_Slight',
       'variable_Road_Icy', 'variable_Road_Wet',], axis=1, inplace=True)

In [None]:
# merge cluster data and ouput areas
results = z_score_df.merge(glasgow_oa, on='code')

# merge standardised data
results_2 = results.merge(df_percentages, on='code')

# drop excess data
results_2.drop(['objectid', 'hhcount', 'popcount', 'sqkm', 'hect', 'masterpc', 'easting', 'northing' 
                , 'council','shape_1_le','shape_1_ar', 'label', 'code', 
], axis=1, inplace=True)

In [None]:
# rename clusters
results_2['Cluster'] = results_2['Cluster'].astype('string')

#create funcion that renames the clusters
def rename_column(x): 
    x = x.replace('0', "0: Slow, Medium Centrality, Slight Severity, Female, Wet Roads")
    x = x.replace('1', "1: Slight Severity, Slow Speed, Medium Centrality")
    x = x.replace('2', "2: Slow Speeds, High Centrality")
    x = x.replace('3', "3: Slow Speeds, Low Centrality")
    x = x.replace('4', "4: High Speeds, Low Centrlity")
    x = x.replace('5', "5: Medium Centrality, Fatal")
    x = x.replace('6', "6: Dark but Street Lit, Icy Roads")
    x = x.replace('7', "7: Medium Centrality, Complete Darkness")
    x = x.replace('8', "8: Darkness, Low Centrality, High Speeds")
    return x

# run function through cluster column
results_2['Cluster'] = results_2['Cluster'].apply(rename_column)

In [None]:
# convert to geodataframe
edin_map = gpd.GeoDataFrame(results_2, crs = 'EPSG:4326', geometry=results['geometry'])

## Results

In [None]:
# folium heat map to show Glasgow crashes
map_center = [filter_data['Latitude'].median(), filter_data['Longitude'].median()]
crash_m = folium.Map(location=map_center, zoom_start=12, tiles='CartoDB positron')
location=filter_data[["Latitude","Longitude"]]
marker_cluster = plugins.MarkerCluster(location).add_to(crash_m)
heat_map = plugins.HeatMap(location,
                               radius=20,
                               blur=15,
                               ).add_to(crash_m)

folium.LayerControl().add_to(crash_m)
crash_m

The heat map presented illustrates the disproportionate occurrence of vehicular accidents throughout Glasgow. It highlights a concentration of accidents along principal roads and expressways leading into and out of the city, as well as around the southern bypass. These areas, experiencing higher volumes of traffic, are consequently more prone to accidents.

Furthermore, the map indicates that Glasgow's city center records the highest number of accidents, while the outer regions of the city show considerably fewer incidents.

A detailed examination reveals that accident hotspots in Glasgow frequently develop at intersections, suggesting that the importance of a road junction, or its degree of centrality, plays a crucial role in the prevalence of road accidents.

In [None]:
nodes.explore(tiles="cartodbdarkmatter", column="degree_centrality", marker_kwds={"radius": 3})

The diagram showcases the road network nodes in Glasgow, with each node color-coded according to its centrality level. The distribution of node centralities throughout the city appears fairly uniform. While the majority of nodes have low to medium levels of centrality, a few stand out with significantly higher centrality.

In [None]:
# distribution of node degree centrality
counts_per_node_degree = filter_data.groupby('centrality_category')['Accident_Index'].count()
counts_per_node_degree.plot(kind='bar', figsize=(5, 5))
plt.title('Crashes by Node Centrality')
plt.xlabel('Centrality Category')
plt.ylabel('# of Accidents')
plt.xticks(rotation=90)
plt.show()

In [None]:
# counts for number of crashes and severity per degree centrality
counts_severity_node = filter_data.groupby(['centrality_category', 'Accident_Severity'])['Accident_Index'].count()
counts_severity_node.plot(kind='bar', stacked = True, figsize=(5, 5))
plt.title('# of Crashes by Node Centrality')
plt.xlabel('Centrality Category')
plt.ylabel('# of Accidents')
plt.xticks(rotation=90)
plt.show()

The bar chart depicted reveals how accident occurrences correlate with node centrality within the road network, categorized by the severity of crashes. It indicates a higher frequency of accidents near nodes with medium centrality and fewer incidents adjacent to nodes of high centrality.

Furthermore, the data points out that the proximity to high centrality nodes does not equate to more severe accidents. The majority of accidents, regardless of the centrality level, lead to minor injuries. This suggests that there's no direct link between the centrality of a node and the severity of crashes that occur near it.

In [None]:
# distribution of gender
counts_per_node_degree = filter_data.groupby('Driver_Sex')['Accident_Index'].count()
counts_per_node_degree.plot(kind='bar', figsize=(5, 5))
plt.title('Number of Crashes by Gender')
plt.xlabel('Driver Gender')
plt.ylabel('Number of Accidents')
plt.xticks(rotation=90)
plt.show()

In [None]:
# Glasgow output areas coloured by cluster
glas_map.explore(column='Cluster', cmap='Set3', tiles='CartoDB positron')


The displayed map is a choropleth representation of Glasgow's output areas, with varying shades illustrating the 9 k-means clusters determined from the crash data collected within these areas. Not all output areas are marked with a cluster; this absence is due to a lack of crash data in these areas, leaving them uncategorized.

The k-means clustering approach has unveiled distinct patterns of clusters throughout Glasgow, highlighting a clear division between the central parts of Glasgow and the outskirts. The periphery of the city mainly falls under two large clusters that define the city's edges. One such cluster, identified as cluster 6, reaches into the city along major western access routes. 

In contrast, the city center of Glasgow features numerous smaller clusters, with some appearing sporadically across the map. This randomness often results from output areas recording very few crashes during the study, making them unique. Certain clusters, for instance, cluster 1, seem to trace specific pathways or mirror varying land uses. Central city clusters are significantly shaped by factors such as node centrality and speed limit regulations.

The distinct pattern of clustering observed towards the city's fringes suggests that accidents in these locations might be more straightforward to classify compared to those in the city center.

Further analysis of the map in Appendix C reveals that the distinction between Glasgow's inner and outer regions correlates with the city bypass. The nature of crashes on this larger road type could account for the differing categorization of accidents, with those on the bypass possibly being more homogeneous and predictable.

Clusters 5 and 6 stand out as they form most of the expansive output areas on the fringes of Glasgow. The large size of these output areas is a key factor in their prominent appearance on the map.

Cluster 5 is characterized by a higher-than-average fatality rate and medium node centrality. Its presence around Glasgow suggests that these areas are typically located far from the city center. One possible explanation for this pattern is their peripheral location, which might place them at a disadvantage in terms of access to medical services. Consequently, accidents in these zones could have a higher likelihood of resulting in fatalities.

On the other hand, cluster 6 predominantly features icy road conditions and, to a lesser extent, dark conditions with street lighting. This pattern may be attributed to these areas being on the city's outskirts, where they might not benefit from the same level of services as the central urban areas. For instance, priority may be given to gritting roads in the city center, increasing the risk of icing on roads near the city's edges. Additionally, the urban heat island effect could play a role. This effect causes built-up areas to retain heat more effectively than less developed areas, making them warmer (EPA, 2024). Hence, roads outside the urban center might be more prone to icing, raising the accident rate. This cluster also stretches into Glasgow along the main eastern roads and appears in scattered output areas across the city.

In Glasgow's city center, clustering of output areas tends to be more scattered compared to the more uniform patterns seen on the outskirts. Nonetheless, several distinct clusters can be observed, many of which align with specific infrastructure and land use trends. For instance, cluster 1 is prevalent in Glasgow's commercial and tourism heart, encompassing the core areas of Glasgow. These locations experience heavy pedestrian traffic and are defined by slow vehicle speeds, medium centrality, and minor injury severity. The presence of this cluster might reflect areas with extensive traffic control measures like traffic lights, roundabouts, and pedestrian crossings, due to the high volume of pedestrians.

Cluster 4 offers another insight, with a pattern of appearing along main arterial routes into the city. The accidents within these areas are marked by low centrality and high speeds, suggesting these might be roads with fewer complex intersections and higher speed limits.

Additionally, the map points out several more isolated crash clusters that occur in just one or two districts at a time. Cluster 0, identified by incidents involving female drivers, medium centrality, and wet road conditions, is dispersed throughout the city without a clear pattern of concentration. There are also sporadic appearances of cluster 8 (defined by high speeds, darkness, and low centrality) and cluster 7 (marked by darkness and medium centrality) throughout Glasgow, indicating less common but notable patterns of road accidents.

## Discussion


The k-means clustering analysis conducted in this study reveals the complex interplay of infrastructure, societal factors, and environmental conditions in shaping the diverse landscape of vehicular accidents. The detailed nature of the cluster analysis aids in pinpointing specific areas with recurring problems.

This research offers valuable insights for refining road safety strategies in Glasgow. It synthesizes key contributors to road accidents throughout the city and identifies zones requiring targeted interventions, whether infrastructural, societal, or in terms of emergency services, to effectively minimize vehicular accidents.

Heat maps of road accidents in Glasgow underline the central urban area as the primary focus for policymakers, given its high incidence of accidents. While Glasgow has already implemented widespread 20mph speed limits, this study suggests that a universal strategy may not address the distinct challenges presented by different parts of the city’s road network. Instead, it advocates for interventions tailored to the unique factors influencing accidents in specific locales.

For instance, the analysis points to a possible shortfall in road safety measures on the city’s outskirts, where accidents often result in higher fatalities and are more likely to occur on icy roads. In such regions, enhancing emergency service availability could be crucial in reducing accident severity and fatalities. As a targeted measure, placing additional emergency response units in areas identified by Cluster 5 could improve response times and, consequently, survival rates. This approach is corroborated by research from the European Commission, which indicates a correlation between the availability of ambulances and the incidence of fatal accidents. Such findings could also guide the strategic placement of future healthcare facilities.

In areas highlighted by Cluster 6, the focus should be on preventing icy conditions on roads, which significantly reduce vehicle traction and contribute to accidents. Solutions could include revising grit distribution routes, augmenting the availability of grit bins, and educating drivers about the increased risks during colder periods. A closer look at current gritting practices could further refine these strategies.

According to the analysis from the k-means clustering model, addressing road safety in Glasgow's city center demands more nuanced and specific solutions than those applied to suburban areas. The accident patterns in the city center involve a broader range of factors and occur in much denser clusters.

For instance, accidents within Cluster 1 typically involve lower speeds and occur near moderately connected road junctions, with generally less severe outcomes. The priority here might shift away from emergency response enhancements to preventive measures against frequent, minor accidents. Proposed solutions could include better signage, enhanced junction controls, and traffic calming strategies. Research supports that such measures can heighten driver awareness and minimize conflicts at intersections, thereby reducing accidents (Ewing, nd).

In contrast, Cluster 4's defining characteristics of high speeds and lower connectivity suggest a need to slow down traffic. Strategies might involve implementing lower speed limits, installing speed cameras, and bolstering police patrols. While such measures may be less feasible on main thoroughfares where high-speed flow is necessary, alternative traffic calming techniques could be employed to maintain driver alertness without significantly impeding traffic.

Cluster 0 stands out due to its lack of a clear pattern and being the sole cluster distinguished by driver gender. Despite data indicating that most accidents in Glasgow are caused by male drivers, the emergence of Cluster 0, with a higher incidence of accidents involving female drivers, signals underlying issues worth exploring. Identifying these factors could lead to targeted interventions aimed at addressing the specific risks encountered by female drivers in these areas.

## Conclusion

The increasing global trend of traffic incidents presents a significant challenge, leading to economic burdens and substantial pressures on public health infrastructures. In response, the Scottish government has taken active measures to address road safety issues, particularly in Glasgow, by enforcing widespread 20mph speed restrictions throughout the city. These measures have effectively reduced both the occurrence and impact of road accidents in the area.

The complexity of road accidents, influenced by a combination of environmental, societal, and infrastructural elements, calls for a more detailed strategy in addressing road safety. Through the application of machine learning techniques, this study has adeptly differentiated various accident types within Glasgow, shedding light on the specific and localized patterns of these incidents.

Our analysis reveals a stark contrast between accident patterns in the city's heart versus its outskirts. The outer areas of Glasgow spotlight deficiencies in service delivery, evident from the prevalence of more fatal accidents and conditions conducive to icy roads. These observations suggest a need for better emergency services and road maintenance in these zones. Conversely, the heart of Glasgow displays a series of smaller, distinct, and localized clusters, each with its specific challenges. These central areas demand targeted solutions, such as enhanced traffic calming initiatives and, in certain instances, improved street lighting.

Moreover, this research underscores the significant influence of node centrality on road accidents, with some areas marked by high centrality and others by low. Recognizing junctions with substantial connectivity could guide better infrastructure planning and accident mitigation efforts.

To sum up, the k-means clustering approach has proven effective in organizing and elucidating the diverse factors leading to road accidents. The identified clusters reveal spatial trends in these incidents, advocating for a nuanced, data-informed approach to road safety policymaking. Tailored interventions are essential to address the distinct needs of different communities effectively. Such a strategy could increase the effectiveness of road safety initiatives and lower the incidence of fatal accidents. Future policies must acknowledge the intricate nature of road safety challenges, employing a data-driven methodology to develop innovative and nuanced strategies for combating road accidents.

## Limitations

Dataset Limitations: The dataset for this analysis, encompassing over 15,000 accident records, showed an uneven distribution across Glasgow. This unevenness, especially noticeable in areas reporting a single accident, led to what appeared to be arbitrary clustering in certain locations.

Additionally, the dataset concludes with data up to 2016, missing out on the effects of Glasgow’s expansion of 20mph speed zones in 2018. As a result, the analysis does not account for changes following this implementation.

Complexity of Accident Factors: Echoing findings from the literature (Sinclair & Das, 2021), a myriad of factors influence the occurrence, location, and nature of vehicular accidents worldwide. Establishing definitive accident categories proves challenging, underscoring the need for further research and data gathering to identify more descriptive clusters of accidents.

Node Degree Centrality Limitations: The method used to assess the connectedness of nodes, while insightful, has its limitations. It solely gauges the connectivity of a node to adjacent roads but doesn’t account for its overall importance in the road network. A more comprehensive measure, degree betweenness, would have offered a deeper understanding by evaluating a node’s significance in linking all other nodes within the network. However, this study was constrained to using degree centrality due to the extensive computational resources required for calculating betweenness for Glasgow’s entire road network.

### Appendix

In [None]:
# cluster 1
first_row_centers = cluster_centers.iloc[1, :-1]
num_features = len(first_row_centers)
theta = np.linspace(0, 2 * np.pi, num_features, endpoint=False)
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, first_row_centers, linewidth=1, color='blue', marker='o', label='Centers')
ax.plot(theta, np.zeros_like(first_row_centers), color='red', linestyle='--', label='Average')
ax.plot(np.append(theta, theta[0]), np.append(first_row_centers, first_row_centers[0]), linewidth=1, color='blue', marker='o')
ax.set_xticks(theta)
ax.set_xticklabels(cluster_centers.columns[:-1], rotation=45, ha='right')
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

In [None]:
# cluster 2
first_row_centers = cluster_centers.iloc[2, :-1]
num_features = len(first_row_centers)
theta = np.linspace(0, 2 * np.pi, num_features, endpoint=False)
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, first_row_centers, linewidth=1, color='blue', marker='o', label='Centers')
ax.plot(theta, np.zeros_like(first_row_centers), color='red', linestyle='--', label='Average')
ax.plot(np.append(theta, theta[0]), np.append(first_row_centers, first_row_centers[0]), linewidth=1, color='blue', marker='o')
ax.set_xticks(theta)
ax.set_xticklabels(cluster_centers.columns[:-1], rotation=45, ha='right')
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

In [None]:
# cluster 3 
first_row_centers = cluster_centers.iloc[3, :-1]
num_features = len(first_row_centers)
theta = np.linspace(0, 2 * np.pi, num_features, endpoint=False)
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, first_row_centers, linewidth=1, color='blue', marker='o', label='Centers')
ax.plot(theta, np.zeros_like(first_row_centers), color='red', linestyle='--', label='Average')
ax.plot(np.append(theta, theta[0]), np.append(first_row_centers, first_row_centers[0]), linewidth=1, color='blue', marker='o')
ax.set_xticks(theta)
ax.set_xticklabels(cluster_centers.columns[:-1], rotation=45, ha='right')
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

In [None]:
# cluster 4
first_row_centers = cluster_centers.iloc[4, :-1]
num_features = len(first_row_centers)
theta = np.linspace(0, 2 * np.pi, num_features, endpoint=False)
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, first_row_centers, linewidth=1, color='blue', marker='o', label='Centers')
ax.plot(theta, np.zeros_like(first_row_centers), color='red', linestyle='--', label='Average')
ax.plot(np.append(theta, theta[0]), np.append(first_row_centers, first_row_centers[0]), linewidth=1, color='blue', marker='o')
ax.set_xticks(theta)
ax.set_xticklabels(cluster_centers.columns[:-1], rotation=45, ha='right')
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

In [None]:
# cluster 5
first_row_centers = cluster_centers.iloc[5, :-1]
num_features = len(first_row_centers)
theta = np.linspace(0, 2 * np.pi, num_features, endpoint=False)
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, first_row_centers, linewidth=1, color='blue', marker='o', label='Centers')
ax.plot(theta, np.zeros_like(first_row_centers), color='red', linestyle='--', label='Average')
ax.plot(np.append(theta, theta[0]), np.append(first_row_centers, first_row_centers[0]), linewidth=1, color='blue', marker='o')
ax.set_xticks(theta)
ax.set_xticklabels(cluster_centers.columns[:-1], rotation=45, ha='right')
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

In [None]:
# cluster 6
first_row_centers = cluster_centers.iloc[6, :-1]
num_features = len(first_row_centers)
theta = np.linspace(0, 2 * np.pi, num_features, endpoint=False)
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, first_row_centers, linewidth=1, color='blue', marker='o', label='Centers')
ax.plot(theta, np.zeros_like(first_row_centers), color='red', linestyle='--', label='Average')
ax.plot(np.append(theta, theta[0]), np.append(first_row_centers, first_row_centers[0]), linewidth=1, color='blue', marker='o')
ax.set_xticks(theta)
ax.set_xticklabels(cluster_centers.columns[:-1], rotation=45, ha='right')
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

In [None]:
# cluster 7 
first_row_centers = cluster_centers.iloc[7, :-1]
num_features = len(first_row_centers)
theta = np.linspace(0, 2 * np.pi, num_features, endpoint=False)
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, first_row_centers, linewidth=1, color='blue', marker='o', label='Centers')
ax.plot(theta, np.zeros_like(first_row_centers), color='red', linestyle='--', label='Average')
ax.plot(np.append(theta, theta[0]), np.append(first_row_centers, first_row_centers[0]), linewidth=1, color='blue', marker='o')
ax.set_xticks(theta)
ax.set_xticklabels(cluster_centers.columns[:-1], rotation=45, ha='right')
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

In [None]:
# cluster 8
first_row_centers = cluster_centers.iloc[8, :-1]
num_features = len(first_row_centers)
theta = np.linspace(0, 2 * np.pi, num_features, endpoint=False)
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, first_row_centers, linewidth=1, color='blue', marker='o', label='Centers')
ax.plot(theta, np.zeros_like(first_row_centers), color='red', linestyle='--', label='Average')
ax.plot(np.append(theta, theta[0]), np.append(first_row_centers, first_row_centers[0]), linewidth=1, color='blue', marker='o')
ax.set_xticks(theta)
ax.set_xticklabels(cluster_centers.columns[:-1], rotation=45, ha='right')
plt.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
plt.show()

## References

* Sinclair, C. and Das, S. (2021) ‘Analyzing UK Urban Traffic Accidents with K-means Clustering for Geospatial Mapping’, Proceedings of the 2021 International Conference on Sustainable Energy and Future Electric Transportation (SEFET). doi:10.1109/sefet48154.2021.9375817.

* Ewing, R. (No date) ‘Evaluating the Effects of Traffic Calming Measures on Urban Roads’, publication. Available at: https://nacto.org/docs/usdg/impacts_of_traffic_calming_ewing.pdf.

* Nightingale, G.F. et al. (2021) ‘Assessment of Edinburgh’s Citywide 20mph Speed Limit on Traffic Dynamics’, PLOS ONE, 16(12). doi:10.1371/journal.pone.0261383.

* Hunter, R.F. et al. (2022) ‘Exploring the Influence of a 20mph Speed Limit on Road Safety in Belfast, UK: A Three-Year Evaluation’, Journal of Epidemiology and Community Health, 77(1), pp. 17–25. doi:10.1136/jech-2022-219729.

* Pembuain, A., Priyanto, S., Suparma, L. (2019) ‘Influence of Road Infrastructure on Traffic Accident Occurrences’, Proceedings of the 11th Asia Pacific Conference on Transportation and the Environment (APTE 2018) [Preprint]. doi:10.2991/apte-18.2019.27.

* UK Road Safety: Accidents and Vehicles Dataset. Available at: https://www.kaggle.com/datasets/tsiaras/uk-road-safety-accidents-and-vehicles