### Software Lab 4 Exercise: Neural Network

The dataset contains US automobile accidents that have been classified by their level of severity as no injuries, injuries, or fatality. A firm might be interested in developing a system for quickly classifying the severity of an accident and using it to assign emergency response team priorities.

In this exercise you will fit a neural network model to this accident dataset and see how well your model predicts accident severity.

#### Data Dictionary
**ALCHL_I**: Presence (1) or absence (2) of alcohol

**PROFIL_I_R**: Profile of the roadway: level (1) or other (0)

**SUR_COND**: Surface condition of the road: dry (1), wet (2), snow/slush (3), ice (4), unknown (9)

**VEH_INVL**: Number of vehicles involved

**MAX_SEV_IR**: Presence of injuries/fatalities: no injuries (0), injury (1), fatality (2)

Follow the steps below to finish the exercise and answer the questions on Canvas.



1. Read the data into your IDE, check the data types, and convert any predictors that are not of the correct type.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

# Define the file path (adjust the filename as needed)
file_path = "/content/drive/My Drive/Colab Notebooks/accidentsnn.csv"

# Read the CSV file
df = pd.read_csv(file_path)
# Display the first few rows
print("First 5 rows of the dataset:")
print(df.head())

# Check data types
print("\nData Types:")
print(df.dtypes)


Mounted at /content/drive
First 5 rows of the dataset:
   ALCHL_I  PROFIL_I_R  SUR_COND  VEH_INVL  MAX_SEV_IR
0        2           0         1         1           0
1        2           1         1         1           2
2        1           0         1         1           0
3        2           0         2         2           1
4        2           1         1         2           1

Data Types:
ALCHL_I       int64
PROFIL_I_R    int64
SUR_COND      int64
VEH_INVL      int64
MAX_SEV_IR    int64
dtype: object


2. Assign the predictors to X and the target variable to y. Then dummy encode the predictors using `pd.get_dummies` and label encode the target variable using `LabelEncoder()`.

In [16]:
from sklearn.preprocessing import LabelEncoder

# Convert ALCHL_I and SUR_COND to categorical
df['ALCHL_I'] = df['ALCHL_I'].astype('category')
df['SUR_COND'] = df['SUR_COND'].astype('category')

# Assign predictors and target variable
X = df.drop('MAX_SEV_IR', axis=1)
y = df['MAX_SEV_IR']

# Dummy encode the predictors (including ALCHL_I and SUR_COND)
X = pd.get_dummies(X, columns=['ALCHL_I', 'SUR_COND'], drop_first=True)

# Label encode the target variable
le = LabelEncoder()
y = le.fit_transform(y)
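
To see what the encoding step produces, here is a minimal sketch on a small made-up frame (the rows in `toy` are hypothetical and are not taken from accidentsnn.csv). With `drop_first=True`, the lowest category of each encoded column is dropped and acts as the baseline, which is why the encoded columns are named like `ALCHL_I_2` and there is no `ALCHL_I_1` or `SUR_COND_1`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "ALCHL_I": [1, 2, 2, 1],
    "SUR_COND": [1, 2, 4, 9],
    "VEH_INVL": [1, 2, 1, 3],
    "MAX_SEV_IR": [0, 1, 2, 0],
})

# Dummy encode the two categorical predictors; VEH_INVL stays numeric
X_toy = pd.get_dummies(
    toy.drop("MAX_SEV_IR", axis=1),
    columns=["ALCHL_I", "SUR_COND"],
    drop_first=True,
)
print(list(X_toy.columns))

# Label encode the target; classes_ lists the original labels in sorted order
le_toy = LabelEncoder()
y_toy = le_toy.fit_transform(toy["MAX_SEV_IR"])
print(list(le_toy.classes_))
```

Because the target is already coded 0, 1, 2, label encoding leaves its values unchanged here.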

3. Split the data into a training set and a test set. Use 20% of the data as the test set. **Set random_state = 0**.

In [17]:
from sklearn.model_selection import train_test_split

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
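
As a quick illustration of what `test_size=0.2` and a fixed `random_state` do, the sketch below splits a hypothetical 10-row array (the names `X_demo` and `y_demo` are made up for this example). It checks the resulting sizes and that repeating the call with the same seed gives the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])

# Two identical calls with the same seed
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)

# With the same random_state, every returned array matches
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)
```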

4. Create a neural network model using MLPClassifier with one hidden layer and one hidden node. **Set random_state = 0 and max_iter = 1000**. What is the model performance on the training set? Keep 3 digits after the decimal.

In [18]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Create the model
model = MLPClassifier(hidden_layer_sizes=(1,), random_state=0, max_iter=1000)

# Train the model
model.fit(X_train, y_train)

# Evaluate on the training set
y_train_pred = model.predict(X_train)
train_accuracy = accuracy_score(y_train, y_train_pred)
print(f"Training set accuracy: {train_accuracy:.3f}")

Training set accuracy: 0.855


5. Evaluate the model on the test set. What is the model performance on the test set? Keep 3 digits after the decimal.

In [19]:
# Evaluate on the test set
y_test_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_test_pred)
print(f"Test set accuracy: {test_accuracy:.3f}")

Test set accuracy: 0.835


6. Create a confusion matrix for your model on the test set.

How many accidents were no injuries but classified as fatality?

How many accidents were injuries but classified as fatality?

In [20]:
from sklearn.metrics import confusion_matrix

# Create confusion matrix
conf_matrix = confusion_matrix(y_test, y_test_pred)
print(conf_matrix)

# Answer the questions
# How many accidents were no injuries but classified as fatality?
no_injury_fatality = conf_matrix[0, 2]
print(f"No injuries classified as fatality: {no_injury_fatality}")

# How many accidents were injuries but classified as fatality?
injury_fatality = conf_matrix[1, 2]
print(f"Injuries classified as fatality: {injury_fatality}")

[[121   0   1]
 [  0  46   0]
 [ 11  21   0]]
No injuries classified as fatality: 1
Injuries classified as fatality: 0
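
The matrix can also be read row by row. In scikit-learn's `confusion_matrix`, rows are the actual classes and columns are the predicted classes. The sketch below copies the matrix printed above into a NumPy array and derives overall accuracy and per-class recall from it; the variable names are chosen for this example only.

```python
import numpy as np

# Confusion matrix copied from the output above
# rows = actual class, columns = predicted class
# order: 0 = no injuries, 1 = injury, 2 = fatality
cm = np.array([[121, 0, 1],
               [0, 46, 0],
               [11, 21, 0]])

# Accuracy: correct predictions (diagonal) over all test cases
accuracy = np.trace(cm) / cm.sum()

# Recall per class: correct predictions over the actual count in each row
recall = np.diag(cm) / cm.sum(axis=1)

print(f"Accuracy: {accuracy:.3f}")
for label, r in zip(["no injuries", "injury", "fatality"], recall):
    print(f"Recall for {label}: {r:.3f}")
```

The accuracy computed this way agrees with the test set accuracy reported in step 5. The last row shows that none of the 32 actual fatalities in the test set was predicted as a fatality, which the overall accuracy alone does not reveal.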


7. Create a dataframe to show the importance of the predictors (the weights are saved in `.coefs_`). Which variable is the most important one for classifying car accident severity according to your model with one hidden layer and one hidden node?

In [21]:
# Get the coefficients
coefs = model.coefs_[0]

# Create a DataFrame to show the importance of predictors
importance_df = pd.DataFrame(coefs, index=X.columns, columns=['Importance'])
print(importance_df)

# Identify the predictor with the largest signed weight (idxmax does not compare magnitudes)
most_important_predictor = importance_df.idxmax().values[0]
print(f"Most important predictor: {most_important_predictor}")

            Importance
PROFIL_I_R   -1.637798
VEH_INVL     -0.423860
ALCHL_I_2     0.529811
SUR_COND_2   -1.739258
SUR_COND_3   -1.722191
SUR_COND_4   -1.784499
SUR_COND_9   -1.128344
Most important predictor: ALCHL_I_2
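
Note that `idxmax()` picks the largest signed weight, so large negative weights are never selected. If importance is instead read as the magnitude of the weight, the ranking changes. The sketch below re-ranks the weights printed above by absolute value. Treat it as one possible reading, not the exercise's required method: the source does not say which reading is intended, and input weights of a single hidden node are only a rough proxy for importance.

```python
import pandas as pd

# Weights copied from the output above
weights = pd.Series({
    "PROFIL_I_R": -1.637798,
    "VEH_INVL": -0.423860,
    "ALCHL_I_2": 0.529811,
    "SUR_COND_2": -1.739258,
    "SUR_COND_3": -1.722191,
    "SUR_COND_4": -1.784499,
    "SUR_COND_9": -1.128344,
})

# Rank predictors by the absolute size of their weight
by_magnitude = weights.abs().sort_values(ascending=False)
print(by_magnitude)

print("Largest signed weight:", weights.idxmax())
print("Largest absolute weight:", by_magnitude.index[0])
```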
