# Road Accident Prediction


## Imports
This section contains all the required imports for this Jupyter Notebook. The data for this model can be found [here](https://www.kaggle.com/datasets/xavierberge/road-accident-dataset).

In [55]:
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, f1_score, recall_score, precision_score
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import OrdinalEncoder
import ipywidgets as widgets

## Data Pre-Processing
Once the data has been downloaded, we import it with pandas, reading every column as a string. We then drop every row that contains a NaN value, followed by every column that holds information irrelevant to the prediction.
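As a minimal sketch of these two clean-up steps, the example below applies `dropna` and `drop` to a small made-up DataFrame. The rows, including the `Accident_Index` values, are invented for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    'Accident_Index': ['A1', 'A2', 'A3'],
    'Light_Conditions': ['Daylight', np.nan, 'Daylight'],
    'Accident_Severity': ['Slight', 'Serious', 'Fatal'],
})

# Drop every row that contains at least one NaN value
toy_clean = toy.dropna(axis=0)

# Drop a column that carries no predictive information
toy_clean = toy_clean.drop(columns=['Accident_Index'])

print(toy_clean.shape)
```

The row with the missing `Light_Conditions` value is removed first, so the identifier column is dropped from the two remaining rows.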

In [56]:
df = pd.read_csv('Road Accident Data.csv', dtype = str)
df = df.dropna(axis=0)

# Drop columns that are not used for prediction
df = df.drop(columns=['Accident_Index', 'Accident Date', 'Month', 'Year',
                      'Day_of_Week', 'Junction_Control', 'Junction_Detail'])

# Get the number of rows and columns
rows, cols = df.shape

# Print the number of rows and columns
print("Number of Rows: " + str(rows))
print("Number of Columns: " + str(cols))


print(df['Light_Conditions'].value_counts())
print(df['Accident_Severity'].value_counts())

print(df['Road_Surface_Conditions'].value_counts())

Number of Rows: 5305
Number of Columns: 16
Light_Conditions
Daylight                       3286
Darkness - no lighting         1028
Darkness - lights lit           907
Darkness - lighting unknown      50
Darkness - lights unlit          34
Name: count, dtype: int64
Accident_Severity
Slight     4461
Serious     771
Fatal        73
Name: count, dtype: int64
Road_Surface_Conditions
Dry                     3248
Wet or damp             1665
Frost or ice             270
Snow                     104
Flood over 3cm. deep      18
Name: count, dtype: int64


## Model Fitting
Before fitting the model, we separate the data into our **dependent** and **independent** variables, that is, the value we wish to predict (`Accident_Severity`) and the values we use to make that prediction. We then encode the independent variables with an **ordinal encoder**, which maps each category to an integer code.

Because the value counts of the dependent variable differ so drastically, we use the **Synthetic Minority Oversampling Technique (SMOTE)** to generate new instances of the minority classes (Serious and Fatal accidents). This balances the class distribution, so that the accuracy of the model is not dominated by the majority class (Slight).

After the data has been encoded and SMOTE has been applied, we split it into **training** and **testing** data. The training data is used to train the model: it receives the independent variables together with their corresponding dependent variable. The testing data is used to check how well the model predicts: it receives only the independent variables and must predict the dependent variable. The split is 20% training and 80% testing (`test_size=0.8`). Finally, we fit the model on the training data and predict the severity for the test data.
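The encoding and splitting steps can be illustrated on a small scale. The sketch below uses ten made-up rows (invented for illustration, not taken from the dataset) and `toy_`-prefixed names that are assumptions of this example, to show how `OrdinalEncoder` assigns integer codes and how `test_size=0.8` divides the rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, invented for illustration only
toy_X = pd.DataFrame({
    'Road_Surface_Conditions': ['Dry', 'Wet or damp', 'Snow', 'Dry', 'Dry',
                                'Wet or damp', 'Dry', 'Snow', 'Dry', 'Wet or damp'],
})
toy_y = pd.Series(['Slight', 'Slight', 'Serious', 'Slight', 'Slight',
                   'Serious', 'Slight', 'Fatal', 'Slight', 'Slight'])

# Each category is replaced by an integer code; string categories are sorted alphabetically
toy_encoder = OrdinalEncoder()
toy_encoded = toy_encoder.fit_transform(toy_X)
print(toy_encoder.categories_[0])   # 'Dry', 'Snow', 'Wet or damp' map to 0, 1, 2
print(toy_encoded[:3].ravel())      # codes for 'Dry', 'Wet or damp', 'Snow'

# test_size=0.8 keeps 80% of the rows for testing and only 20% for training
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_encoded, toy_y, test_size=0.8, random_state=42)
print(len(toy_X_train), len(toy_X_test))
```

With ten rows, two go to training and eight to testing. The integer codes reflect alphabetical order only; they carry no notion of severity or ranking.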

In [57]:
X = df[['Light_Conditions', 'Road_Surface_Conditions', 'Road_Type', 'Speed_limit', 'Urban_or_Rural_Area', 'Weather_Conditions', 'Vehicle_Type']]
y = df['Accident_Severity']

# Encode data numerically to deal with categorical data values
encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(X)

# Synthetic Minority Oversampling Technique - Generates new minority instances to balance class distribution
smote = SMOTE(sampling_strategy="not majority", random_state=42) # Oversample every class except the majority class, which already has the most samples
X_resampled, y_resampled = smote.fit_resample(X_encoded, y)

X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.8, random_state=42)

gnb = CategoricalNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
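
The imports at the top of the notebook include several metrics from `sklearn.metrics`. As a hedged sketch of how predictions such as `y_pred` could be scored, the example below trains a `CategoricalNB` on synthetic integer-coded data and reports the accuracy and a per-class report. The data is generated at random purely for illustration; the `demo_` names, the two-feature layout, and the labelling rule are assumptions of this example, not properties of the accident dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB

# Synthetic integer-coded features, generated only for illustration
rng = np.random.RandomState(42)
demo_X = rng.randint(0, 3, size=(300, 2))
# Hypothetical rule: the label depends on the first feature only
demo_y = np.where(demo_X[:, 0] == 2, 'Serious', 'Slight')

demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.8, random_state=42, stratify=demo_y)

# min_categories guards against a category that is absent from the training split
demo_model = CategoricalNB(min_categories=3)
demo_model.fit(demo_X_train, demo_y_train)
demo_pred = demo_model.predict(demo_X_test)

demo_accuracy = accuracy_score(demo_y_test, demo_pred)
print("Accuracy:", demo_accuracy)
print(classification_report(demo_y_test, demo_pred, zero_division=0))
```

In the notebook itself, the same two calls would take `y_test` and `y_pred` as arguments. The per-class precision, recall, and F1 values in the report are usually more informative than accuracy alone when the original classes are imbalanced.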

## Predictions
In this section you can predict the severity of a car accident based on the independent variables. Each dropdown contains a list of values pertaining to a specific independent variable. Once all desired options have been selected, you may hit the "Predict Severity" button to see what the severity of an accident is predicted to be based on the given factors.

In [58]:
surface_conditions = df['Road_Surface_Conditions'].dropna().unique()
light_conditions = df['Light_Conditions'].dropna().unique()
speed_limits = df['Speed_limit'].dropna().unique()
urbanOrRural = df['Urban_or_Rural_Area'].dropna().unique()
weatherConditions = df['Weather_Conditions'].dropna().unique()
vehicleType = df['Vehicle_Type'].dropna().unique()

# Create dropdown widgets
surface_dropdown = widgets.Dropdown(
    options=surface_conditions,
    description="Road Surface:",
    value=surface_conditions[0]  # Default value
)

light_dropdown = widgets.Dropdown(
    options=light_conditions,
    description="Light Conditions:",
    value=light_conditions[0]
)

speed_dropdown = widgets.Dropdown(
    options=sorted(speed_limits, key=int),  # Sort numerically; the values are stored as strings
    description="Speed Limit:",
    value=speed_limits[0]
)

urban_dropdown = widgets.Dropdown(
    options=urbanOrRural,
    description="Urban or Rural Area:",
    value=urbanOrRural[0]
)

weather_dropdown = widgets.Dropdown(
    options=weatherConditions,
    description="Weather Conditions:",
    value=weatherConditions[0]
)

vehicleType_dropdown = widgets.Dropdown(
    options=vehicleType,
    description="Vehicle Type:",
    value=vehicleType[0]
)

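# Output area that shows the prediction below the button
prediction_output = widgets.Output()

def make_prediction(button):
    # Build a one-row frame with the same column order that was used to fit the encoder.
    # Assumption: there is no dropdown for Road_Type, so its most common value is used.
    sample = pd.DataFrame([{
        'Light_Conditions': light_dropdown.value,
        'Road_Surface_Conditions': surface_dropdown.value,
        'Road_Type': df['Road_Type'].mode()[0],
        'Speed_limit': speed_dropdown.value,
        'Urban_or_Rural_Area': urban_dropdown.value,
        'Weather_Conditions': weather_dropdown.value,
        'Vehicle_Type': vehicleType_dropdown.value,
    }])[X.columns]
    prediction = gnb.predict(encoder.transform(sample))[0]
    with prediction_output:
        prediction_output.clear_output()
        print("Predicted severity: " + str(prediction))

# Display the widgets
display(surface_dropdown, light_dropdown, speed_dropdown, urban_dropdown, weather_dropdown, vehicleType_dropdown)

predict_button = widgets.Button(description="Predict Severity")
predict_button.on_click(make_prediction)  # Pass the function itself; do not call it here
display(predict_button, prediction_output)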

Dropdown(description='Road Surface:', options=('Dry', 'Wet or damp', 'Frost or ice', 'Snow', 'Flood over 3cm. …

Dropdown(description='Light Conditions:', options=('Daylight', 'Darkness - lights lit', 'Darkness - no lightin…

Dropdown(description='Speed Limit:', index=1, options=('20', '30', '40', '50', '60', '70'), value='30')

Dropdown(description='Urban or Rural Area:', options=('Urban', 'Rural'), value='Urban')

Dropdown(description='Weather Conditions:', options=('Fine no high winds', 'Raining no high winds', 'Other', '…

Dropdown(description='Vehicle Type:', options=('Car', 'Bus or coach (17 or more pass seats)', 'Van / Goods 3.5…

Button(description='Predict Severity', style=ButtonStyle())