# Report

## Introduction to Business Problem

**The Background**

For 2016 specifically, National Highway Traffic Safety Administration (NHTSA) data shows that 37,461 people were killed in 34,436 motor vehicle crashes in the USA, an average of 102 deaths per day.
In 2010, there were an estimated 5,419,000 crashes, 30,296 of them fatal, killing 32,999 people and injuring 2,239,000. About 2,000 children under 16 die every year in traffic collisions. Records indicate that there were 3,613,732 motor vehicle fatalities in the United States from 1899 to 2013. (Source: https://en.wikipedia.org/wiki/Motor_vehicle_fatality_rate_in_U.S._by_year)

Every person killed or injured is one too many.

**The Problem**

Accidents happen for many reasons. Distracted driving is the leading cause of car accidents, but drunk driving and driving in bad weather conditions can also be a cause.

How can accidents be prevented?
By looking at data on accidents that happened in Seattle, I will try to detect patterns that could help warn drivers and thus prevent accidents from happening.
I will look especially at the weather, road and light conditions and how well they help predict the severity of an accident.

**The Opportunity**

The business opportunity could be a warning system that helps reduce accidents. Insurance companies could install such systems directly at accident black spots, or the warnings could be integrated into navigation systems like Google Maps or Waze.


## Description of the Data
The Seattle SDOT Traffic Management Division and Traffic Records Group have gathered all collisions provided by SPD and recorded by Traffic Records. This includes all types of collisions in Seattle. 

The data gives the following details per accident:
- Severity of the collision: the extent of the damage (property damage, injury or fatality)
- Collision type (head on, involving pedestrians or cyclists)
- Date and time of the accident (Do accidents occur on weekdays or weekends, and do they occur more often at night?)
- Affected persons (whether cyclists, pedestrians or vehicles were involved)
- Whether parked cars were hit
- Address type: alleys, blocks or intersections
- What is known about the driver's situation when the accident happened (attentive/inattentive, under the influence or speeding)
- Weather, road and light conditions (rain, dry or wet road etc.)

Overall, 194,673 accidents are recorded.

As mentioned above, I will especially look into the weather, road and light conditions and the corresponding severity of an accident.


# Methodology

## Cleaning of Data
### Dealing with missing data
Using the .isnull() function, I looked for missing data, which was found in the column Roadconditions.
I also filled in the missing data in columns like "PEDROWNOTGRNT", which contained only a 'Y' for 'yes' and no value otherwise. I assumed that a missing value meant 'no' and filled it with 'N'.
If a column like "JUNCTIONTYPE" had missing data, I filled it with "unknown" using the .fillna function.
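
The steps above can be sketched as follows. This is only a minimal illustration on a tiny made-up DataFrame, not the original notebook code; the sample values in "JUNCTIONTYPE" are assumptions chosen for the example.

```python
import pandas as pd

# Tiny made-up sample standing in for the Seattle collision data.
df = pd.DataFrame({
    "PEDROWNOTGRNT": ["Y", None, None, "Y"],
    "JUNCTIONTYPE": ["Mid-Block", None, "At Intersection", None],  # assumed values
})

# Count the missing values per column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Flag column that only ever contains 'Y': treat a missing value as 'N'.
df["PEDROWNOTGRNT"] = df["PEDROWNOTGRNT"].fillna("N")

# Descriptive column: mark missing values as "unknown".
df["JUNCTIONTYPE"] = df["JUNCTIONTYPE"].fillna("unknown")
```

After these two calls, neither column contains missing values any more.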

### Deleting unnecessary columns
With the drop function I deleted several columns that were not needed, like "EXCEPTRSNDESC" or "SDOTCOLNUM".
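
A minimal sketch of this step, assuming a DataFrame that contains the two columns named above. The "SEVERITYCODE" column name and all values are made up for the illustration.

```python
import pandas as pd

# Made-up sample; "SEVERITYCODE" is an assumed column name.
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2],
    "EXCEPTRSNDESC": [None, None],
    "SDOTCOLNUM": [1.0, 2.0],
})

# Remove the columns that are not needed for the analysis.
columns_to_drop = ["EXCEPTRSNDESC", "SDOTCOLNUM"]
df = df.drop(columns=columns_to_drop)
```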

### Transforming objects into integers
Columns like "PEDROWNOTGRNT" had 'Y' or 'N' as values. Using the replace function, I replaced those values with '0' and '1' and then converted them to integers so that they are easier to handle later on.
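
One way this conversion could look is shown below. The mapping of 'Y' to 1 and 'N' to 0 is an assumption, and the sample values are made up.

```python
import pandas as pd

# Made-up sample of a yes/no flag column.
df = pd.DataFrame({"PEDROWNOTGRNT": ["Y", "N", "N", "Y"]})

# Replace the letters with the strings '1' and '0' (assumed mapping),
# then convert the column to integers.
df["PEDROWNOTGRNT"] = (
    df["PEDROWNOTGRNT"].replace({"Y": "1", "N": "0"}).astype(int)
)
```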

### One-hot representation
To make it easier to analyse the columns that had strings/objects as values, I converted them into '0' or '1' columns using the .get_dummies function.
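
As a small illustration of one-hot encoding with pandas, the sketch below encodes an address-type column. The column name "ADDRTYPE" and its values are assumptions based on the data description, not taken from the original code.

```python
import pandas as pd

# Made-up sample; "ADDRTYPE" is an assumed column name.
df = pd.DataFrame({"ADDRTYPE": ["Intersection", "Block", "Alley", "Block"]})

# Each distinct string becomes its own 0/1 column.
dummies = pd.get_dummies(df, columns=["ADDRTYPE"], dtype=int)
print(dummies)
```

Each row has exactly one '1' among the new columns, marking which category it belonged to.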

## Modeling
### Looking at the correlation
First, I looked at the correlation of every column with the severitycode.
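
A rough sketch of such a correlation check is shown below. All column names and values are made up for the illustration, so the numbers say nothing about the real data.

```python
import pandas as pd

# Made-up, already numeric sample (hypothetical column names).
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 2, 1, 2],
    "ADDRTYPE_Intersection": [0, 1, 0, 1, 0, 1],
    "WEATHER_Raining": [1, 1, 0, 0, 1, 0],
})

# Pearson correlation of every column with the target, target itself removed.
correlations = df.corr()["SEVERITYCODE"].drop("SEVERITYCODE")

# Rank the columns by the strength of the correlation.
ranked = correlations.abs().sort_values(ascending=False)
print(ranked)
```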

### Using the most correlated to perform different Machine Learning methods
I used the columns with a slightly higher correlation (corr > 0.2) to apply the Machine Learning methods.
I split the dataframe into training and test sets and then trained Logistic Regression, Support Vector Machine, Decision Tree and Random Forest classifiers.
Then I printed the Accuracy Score, F1 Score, Precision and Recall.
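
The workflow described above could be sketched roughly like this with scikit-learn. The data is synthetic, the column names are hypothetical, and the hyperparameters, the 80/20 split and the choice of severity code 1 as the positive class are all assumptions, so the printed scores are not comparable to the results reported for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned data (hypothetical column names).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "at_intersection": rng.integers(0, 2, n),
    "pedestrian_involved": rng.integers(0, 2, n),
    "raining": rng.integers(0, 2, n),  # deliberately unrelated to severity
})
score = (
    0.4 * df["at_intersection"]
    + 0.4 * df["pedestrian_involved"]
    + 0.4 * rng.random(n)
)
df["severitycode"] = np.where(score > 0.5, 2, 1)

# Keep only the features whose correlation with the target exceeds 0.2.
correlations = df.corr()["severitycode"].drop("severitycode")
selected = correlations[correlations > 0.2].index.tolist()

# Split into training and test sets (assumed 80/20 split).
X_train, X_test, y_train, y_test = train_test_split(
    df[selected], df["severitycode"], test_size=0.2, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Support Vector Machine": SVC(),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, predicted),
        # Severity code 1 is treated as the positive class (an assumption).
        "F1 Score": f1_score(y_test, predicted, pos_label=1),
        "Precision": precision_score(y_test, predicted, pos_label=1),
        "Recall": recall_score(y_test, predicted, pos_label=1),
    })

results = pd.DataFrame(rows).set_index("Model")
print(results)
```

Collecting the scores in one DataFrame makes it easy to compare the models side by side.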

### Grouping Roadconditions and Weather
When looking at the correlation, the weather and roadcondition had not shown any high correlation with the severitycode at all.
But I wanted to see what would happen if, for instance, I grouped "rain", "snow", "ice" etc. as bad weather. This did not have a big impact on the correlation either.
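
The grouping idea can be illustrated as follows. The column names, the "clear" label and all values are made up, so the resulting number is purely illustrative.

```python
import pandas as pd

# Made-up sample (hypothetical column names and values).
df = pd.DataFrame({
    "WEATHER": ["rain", "clear", "snow", "clear", "clear", "ice"],
    "SEVERITYCODE": [2, 1, 1, 2, 1, 1],
})

# Collapse several weather values into one binary "bad weather" flag.
bad_weather_values = {"rain", "snow", "ice"}
df["BAD_WEATHER"] = df["WEATHER"].isin(bad_weather_values).astype(int)

# Correlation of the grouped flag with the severity.
corr = df["BAD_WEATHER"].corr(df["SEVERITYCODE"])
print(corr)
```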


# Results
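The columns with a slightly higher correlation (corr > 0.2) are accidents at intersections and accidents that involved parked cars, pedestrians or cyclists.

The results of the different Machine Learning methods are shown in the table below.

By accuracy, the Decision Tree is the best method to predict the severity of an accident (0.752023), although all four models score within 0.002 of each other, and the Support Vector Machine has the highest F1 score and recall.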

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Accuracy</th>
      <th>F1 Score</th>
      <th>Precision</th>
      <th>Recall</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Logistic Regression</td>
      <td>0.751637</td>
      <td>0.845567</td>
      <td>0.749243</td>
      <td>0.970311</td>
    </tr>
    <tr>
      <td>Decision Tree</td>
      <td>0.752023</td>
      <td>0.84704</td>
      <td>0.74594</td>
      <td>0.979841</td>
    </tr>
    <tr>
      <td>Random Forest</td>
      <td>0.751946</td>
      <td>0.847004</td>
      <td>0.745864</td>
      <td>0.979878</td>
    </tr>
    <tr>
      <td>Support Vector Machine</td>
      <td>0.75061</td>
      <td>0.84761</td>
      <td>0.741155</td>
      <td>0.989774</td>
    </tr>
  </tbody>
</table>

# Discussion

When I started this project, I assumed that bad weather would have a big influence on the probability of an accident. The large Seattle data set does not support this. Other factors are much more important to look at if you want to prevent accidents from happening.

# Conclusion
If you as a driver approach an intersection, parked cars, pedestrians or cyclists: be more careful.

Those responsible for designing intersections, pedestrian crossings, cycle lanes and parking lots should also be mindful when designing them. They could improve the signals at intersections for the different participants in traffic (cars, pedestrians and cyclists) so that it is clearer who walks or drives where.
