## __Car Accident Severity__

The objective of this project is to develop a machine learning model that identifies the main factors affecting the severity of a car accident. The provided dataset includes, among other fields, the weather, the direction of the vehicles, the lighting conditions, and the number of people involved in each accident.

The target variable is the severity code (`SEVERITYCODE`), which takes the following values:

- 3: Fatal, at least one death.
- 2b: Serious Injury
- 2: Injury
- 1: Property damage
- 0: Unknown
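
For readable reports later on, the codes above can be kept in a small lookup table. This is a minimal sketch: the names `SEVERITY_LABELS` and `describe_severity` are illustrative and not part of the original notebook, and the codes are treated as strings because `2b` is not numeric.

```python
# Hypothetical lookup table built from the list above; codes are strings
# because "2b" cannot be stored as an integer.
SEVERITY_LABELS = {
    "3": "Fatal, at least one death",
    "2b": "Serious injury",
    "2": "Injury",
    "1": "Property damage",
    "0": "Unknown",
}


def describe_severity(code):
    """Return the label for a severity code, accepting ints or strings."""
    return SEVERITY_LABELS.get(str(code).strip(), "Unrecognized code")


print(describe_severity(2))
print(describe_severity("2b"))
```

Converting with `str(code)` lets the same helper handle a column that pandas has read as integers as well as one read as strings.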

### __Import Modules__

In [227]:
import warnings
warnings.simplefilter(action='ignore')

In [228]:
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline

In [230]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier 
from sklearn import metrics

### __Cleaning Data__

In [231]:
# Load the data and check each column for NaN values.
# A column is discarded if more than 40% of its values are NaN.
variables = [
    "SEVERITYCODE", "X", "Y", "ADDRTYPE", "HITPARKEDCAR", "COLLISIONTYPE", "PERSONCOUNT", "PEDCOUNT",
    "PEDCYLCOUNT", "VEHCOUNT", "UNDERINFL", "WEATHER", "ROADCOND", "LIGHTCOND"
]
data = pd.read_csv("datasets/Data-Collisions.csv")[variables]
# Keep only the columns whose NaN fraction does not exceed 0.4.
valid_columns = [col for col in data.columns if data[col].isna().mean() <= 0.4]
        
data = data[valid_columns]
data.dropna(inplace = True)
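
The column filter and row drop above can be checked on a toy frame. This is a minimal sketch that assumes the same 0.4 threshold; the frame `toy` and the helper `drop_sparse_columns` are made up for illustration and do not come from the dataset.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_nan_fraction=0.4):
    """Keep the columns whose NaN fraction is at most max_nan_fraction."""
    nan_fraction = df.isna().mean()  # per-column share of missing values
    return df.loc[:, nan_fraction <= max_nan_fraction]


# Illustrative data only, not taken from Data-Collisions.csv.
toy = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],           # no missing values
    "B": [1.0, np.nan, 3.0, 4.0, 5.0],        # 20% missing
    "C": [np.nan, np.nan, np.nan, 4.0, 5.0],  # 60% missing
})

kept = drop_sparse_columns(toy)
clean = kept.dropna()  # then drop the rows that still contain NaN
print(list(kept.columns), len(clean))
```

Filtering columns before calling `dropna()` matters: dropping rows first would discard every row that is missing a value in a mostly empty column.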

### __Data Engineering__

In [232]:
# Convert the categorical variables into dummy (one-hot) columns.
weather = pd.get_dummies(data.WEATHER)
road_con = pd.get_dummies(data.ROADCOND)
collision_d = pd.get_dummies(data.COLLISIONTYPE)
light_d = pd.get_dummies(data.LIGHTCOND)
add_d = pd.get_dummies(data.ADDRTYPE)

# Append all dummy columns to the original frame in a single concat.
data = pd.concat([data, weather, road_con, collision_d, light_d, add_d], axis=1)

data["HITPARKEDCAR"] = data["HITPARKEDCAR"].apply(lambda x: 0 if x== "N" else 1)

# Encode UNDERINFL as 0/1; the column mixes "N"/"Y" with "0"/"1".
data["UNDERINFL"] = data["UNDERINFL"].map({"N": 0, "0": 0, "Y": 1, "1": 1})

data.drop(columns = ["WEATHER", "ROADCOND", "COLLISIONTYPE", "LIGHTCOND", "ADDRTYPE"], inplace =True) 
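
If two source columns share a category label (for example `Unknown`), unprefixed dummies end up with duplicate column names after concatenation. The sketch below shows one possible variant that prefixes each dummy with its source column. The rows and category labels are invented for the example, and this is an alternative to the cell above, not the method the notebook used.

```python
import pandas as pd

# Illustrative rows only; the category labels are made up for the example.
toy = pd.DataFrame({
    "WEATHER": ["Clear", "Rain", "Unknown"],
    "ROADCOND": ["Dry", "Wet", "Unknown"],
    "UNDERINFL": ["N", "1", "Y"],
})

# Passing columns= prefixes each dummy with its source column name, so
# "WEATHER_Unknown" and "ROADCOND_Unknown" stay distinct, and the original
# categorical columns are removed automatically.
encoded = pd.get_dummies(toy, columns=["WEATHER", "ROADCOND"], dtype=int)

# Map the four spellings of the flag to 0/1 in one step.
encoded["UNDERINFL"] = encoded["UNDERINFL"].map({"N": 0, "0": 0, "Y": 1, "1": 1})

print(encoded.columns.tolist())
```

Unique column names also make it easier to trace a model's feature ranking back to the original variable.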

In [233]:
y = data["SEVERITYCODE"]
X = data.loc[:, data.columns != 'SEVERITYCODE']

In [234]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [235]:
clf = DecisionTreeClassifier()

# Train the decision tree classifier
clf = clf.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = clf.predict(X_test)
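
Since the stated goal is to identify the factors that most affect severity, one option is to read the fitted tree's `feature_importances_` attribute. The sketch below uses synthetic data: the frame `X_demo`, the labels `y_demo`, and the `NOISE` column are assumptions made for illustration. With the notebook's own objects, the same idea would apply to `clf` and `X.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the notebook's X and y (assumption, for illustration).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "PERSONCOUNT": rng.integers(1, 6, size=500),
    "VEHCOUNT": rng.integers(1, 4, size=500),
    "NOISE": rng.random(500),
})
# The label depends only on PERSONCOUNT, so it should dominate the ranking.
y_demo = (X_demo["PERSONCOUNT"] >= 3).astype(int)

tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Impurity-based importances: one value per column, summing to 1.
importances = pd.Series(tree.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```

Impurity-based importances tend to favor features with many distinct values, such as coordinates, so the ranking is best read as a first indication and not as a causal statement.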

In [236]:
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.6942166711919631
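
Accuracy alone can hide poor performance on rare classes, which matters if the more severe accidents make up a small share of the records. The sketch below compares against a majority-class baseline and prints per-class metrics. The labels are synthetic and the 80/20 class split is an assumption made for illustration, not a property of the dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Synthetic, imbalanced labels (assumption): 80% class 1, 20% class 2.
y_true = np.array([1] * 80 + [2] * 20)
X_dummy = np.zeros((len(y_true), 1))  # the baseline ignores the features

# Baseline that always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_true)
y_base = baseline.predict(X_dummy)

baseline_acc = accuracy_score(y_true, y_base)
cm = confusion_matrix(y_true, y_base, labels=[1, 2])

print("Baseline accuracy:", baseline_acc)
print(cm)  # rows are true classes, columns are predicted classes
print(classification_report(y_true, y_base, labels=[1, 2], zero_division=0))
```

In the notebook, `y_test` and `y_pred` would take the place of the synthetic arrays, and the reported accuracy could then be compared with the share of the most frequent class in `y_test`.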
