# CSI 4142 Data Science
## Assignment 3 - Predictive analysis - Classification

### Identification
Name: Eli Wynn
Student Number: 300248135

Name: Jack Snelgrove
Student Number: 300247435

Our datasets are hosted in the following public repository:

https://github.com/eli-wynn/Datasets

### Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

### Dataset Overview
This dataset contains simulated road accident records focused on the factors that influence whether an individual survives. It combines demographic (age, gender), behavioral (helmet and seatbelt use), and situational (speed of impact) attributes, which makes it suitable for studying how each factor relates to the probability of surviving a road accident.

#### Dataset Shape
- Rows: 200
- Columns: 6

#### Features and Descriptions
Below is a list of features included in the dataset along with their descriptions:

1. `Age` (Numerical)
   - The age of the individual involved in the accident

2. `Gender` (Categorical: `Male`/`Female`)
   - The gender of the individual involved in the accident

3. `Speed_of_Impact` (Numerical)
   - The speed of the vehicle at the moment of impact

4. `Helmet_Used` (Categorical: `Yes`/`No`)
   - Whether a helmet was used

5. `Seatbelt_Used` (Categorical: `Yes`/`No`)
   - Whether a seatbelt was used

6. `Survived` (Numerical)
   - Whether the individual survived the accident (the classification target)
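
As a sketch of how the feature list above could be checked against a loaded frame, the snippet below encodes it as a small schema and reports missing columns or numerical columns with a non-numeric dtype. The column names follow the ones used in the cleaning code; `EXPECTED_SCHEMA`, `check_schema`, and the two-row `toy` frame are hypothetical illustrations, not part of the assignment.

```python
import pandas as pd

# Column names as used in the cleaning code; the kind labels mirror the feature list
EXPECTED_SCHEMA = {
    "Age": "numerical",
    "Gender": "categorical",
    "Speed_of_Impact": "numerical",
    "Helmet_Used": "categorical",
    "Seatbelt_Used": "categorical",
    "Survived": "numerical",
}


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of problems: missing columns or numerical columns with a non-numeric dtype."""
    problems = []
    for col, kind in EXPECTED_SCHEMA.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif kind == "numerical" and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"non-numeric column: {col}")
    return problems


# Hypothetical two-row frame used only to exercise the check
toy = pd.DataFrame({
    "Age": [34, 51],
    "Gender": ["Male", "Female"],
    "Speed_of_Impact": [60.0, 95.0],
    "Helmet_Used": ["No", "Yes"],
    "Seatbelt_Used": ["Yes", "No"],
    "Survived": [1, 0],
})

print(check_schema(toy))  # → []
print(check_schema(toy.drop(columns=["Survived"])))  # → ['missing column: Survived']
```

Categorical columns are only checked for presence here, since their raw values are strings until the encoding step in the cleaning cell.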

### Importing Dataset

In [None]:
# Load the accident dataset directly from the raw GitHub URL
crash = "https://raw.githubusercontent.com/eli-wynn/Datasets/refs/heads/main/accident.csv"
crashData = pd.read_csv(crash)
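
Before cleaning, it can help to confirm that the loaded frame matches the shape documented in the Dataset Shape section and to see where values are missing. A minimal sketch follows; `EXPECTED_SHAPE`, `profile`, and the `toy` frame are hypothetical, with `toy` standing in for `crashData` so the example runs offline.

```python
import numpy as np
import pandas as pd

# Shape stated in the "Dataset Shape" section
EXPECTED_SHAPE = (200, 6)


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return each column's dtype and its number of missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })


# Hypothetical three-row stand-in for crashData with two gaps
toy = pd.DataFrame({
    "Age": [34, np.nan, 51],
    "Gender": ["Male", None, "Female"],
})

print("Shape matches:", toy.shape == EXPECTED_SHAPE)  # → Shape matches: False
print(profile(toy))
```

On the real data the same calls would be `profile(crashData)` and `crashData.shape == EXPECTED_SHAPE`.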

#### Cleaning / Imputing Dataset

In [None]:
# Report which columns contain missing data
missing_cols = crashData.columns[crashData.isnull().any()]
print("Columns with missing values:", list(missing_cols))

# Impute missing Gender values with the mode (assigned back, since inplace=True on a
# single column is deprecated and stops updating the DataFrame under copy-on-write)
crashData['Gender'] = crashData['Gender'].fillna(crashData['Gender'].mode()[0])

# Remove rows with missing 'Speed_of_Impact' values: this feature is crucial to predicting the target and shouldn't be imputed
crashData = crashData.dropna(subset=['Speed_of_Impact'])

# Keep only plausible ages
crashData = crashData[(crashData['Age'] > 0) & (crashData['Age'] < 120)]

# Keep only valid gender labels
crashData = crashData[crashData['Gender'].isin(['Male', 'Female'])]

# Keep only valid Yes/No values; .copy() makes the filtered frame independent so the
# column assignments below do not raise SettingWithCopyWarning
crashData = crashData[crashData['Helmet_Used'].isin(['Yes', 'No'])]
crashData = crashData[crashData['Seatbelt_Used'].isin(['Yes', 'No'])].copy()

# Encode categorical values as 0/1 integers
crashData['Gender'] = crashData['Gender'].map({'Male': 0, 'Female': 1})
crashData['Helmet_Used'] = crashData['Helmet_Used'].map({'No': 0, 'Yes': 1})
crashData['Seatbelt_Used'] = crashData['Seatbelt_Used'].map({'No': 0, 'Yes': 1})
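
The cleaning steps above can also be wrapped in a function so they are applied identically to any copy of the data and can be tested on a small frame. This is a sketch assuming the same column names and label sets as the cleaning cell; `clean_crash_data` and the five-row `raw` frame are hypothetical.

```python
import numpy as np
import pandas as pd

GENDER = {"Male": 0, "Female": 1}
YES_NO = {"No": 0, "Yes": 1}


def clean_crash_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a copy of df and return the cleaned frame."""
    out = df.copy()
    # Mode imputation for Gender, assigned back rather than using inplace=True
    out["Gender"] = out["Gender"].fillna(out["Gender"].mode()[0])
    # Rows without a speed are dropped rather than imputed
    out = out.dropna(subset=["Speed_of_Impact"])
    # Keep plausible ages and recognised category labels only
    valid = (
        (out["Age"] > 0)
        & (out["Age"] < 120)
        & out["Gender"].isin(list(GENDER))
        & out["Helmet_Used"].isin(list(YES_NO))
        & out["Seatbelt_Used"].isin(list(YES_NO))
    )
    out = out[valid].copy()
    # Encode the categorical columns as 0/1 integers
    out["Gender"] = out["Gender"].map(GENDER)
    out["Helmet_Used"] = out["Helmet_Used"].map(YES_NO)
    out["Seatbelt_Used"] = out["Seatbelt_Used"].map(YES_NO)
    return out


# Hypothetical frame with one problem per row from index 1 onward
raw = pd.DataFrame({
    "Age": [25, 40, 150, 33, 60],
    "Gender": ["Male", None, "Female", "Female", "Other"],
    "Speed_of_Impact": [50.0, 70.0, 80.0, np.nan, 90.0],
    "Helmet_Used": ["Yes", "No", "No", "Yes", "No"],
    "Seatbelt_Used": ["No", "Yes", "Yes", "No", "Maybe"],
    "Survived": [1, 1, 0, 1, 0],
})

clean = clean_crash_data(raw)
print(len(clean))  # → 2
print(clean["Gender"].tolist())  # → [0, 1]
```

In the five-row example, the row with a missing speed, the row with age 150, and the row with an `Other` gender and `Maybe` seatbelt value are removed, and the missing gender is filled with the mode (`Female`). Working on `df.copy()` leaves the input frame unchanged, which keeps the raw data available for comparison.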
