# Extracting Cause and Effect to Predict Automotive Accidents
In this notebook, we build several machine learning models to predict car accident severity, as measured by the accident's impact on traffic.

### The Problem:
Car accidents come at a great cost to the drivers and passengers involved, to the people who must wait on congested highways, and to the transportation industry at large. These are immense tragedies with costly repercussions. This project seeks to develop a model that tells us where and under what conditions severe car accidents are likely to occur. That information can then inform solutions that minimize the total cost of car accidents and, hopefully, save lives.

The model developed herein is proposed as a source of valuable information for the transportation company Lyft, Inc., which could use it to minimize insurance claim costs by warning drivers when they are in high-risk situations or by directing them toward lower-risk routes.

### The Data
Data Source: https://osu.app.box.com/v/us-accidents-june20

Metadata: https://smoosavi.org/datasets/us_accidents

#### Description
This is a countrywide traffic accident dataset covering 49 states of the United States. The data was collected continuously from February 2016 through June 2020 using several data providers, including two APIs that provide streaming traffic event data. These APIs broadcast traffic events captured by a variety of entities, such as the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road networks. Currently, there are about 3.5 million accident records in this dataset.

#### Discussion
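This data contains 49 total features, most of which fall into three overarching categories: (1) Location, (2) Weather, and (3) Time of Day. The features in each category are listed below by column number, followed by the target variables and some extra information.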

I will incorporate the features that produce the most accurate model possible, while staying focused on the features that can inform practical solutions to the problem of severe car accidents (such as location, weather, and time of day). Since I am trying to discover which features are highly correlated with severe car accidents (a correlation that suggests, but does not establish, causality), I will remove collinear features from the data so that the contribution of each remaining feature is easier to interpret.
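
One common way to remove collinear features is to drop one column from every pair whose absolute Pearson correlation exceeds a cutoff. The sketch below is a minimal illustration of that idea, not necessarily the method used later in this notebook: the helper name `drop_collinear`, the 0.9 threshold, and the toy values are all assumptions; only the column names come from the dataset.

```python
import numpy as np
import pandas as pd


def drop_collinear(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later column of each numeric pair whose absolute Pearson
    correlation exceeds `threshold` (hypothetical helper, assumed cutoff)."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Toy frame with made-up values; the column names follow the dataset.
toy = pd.DataFrame({
    "Temperature(F)": [30.0, 45.0, 60.0, 75.0, 90.0],
    "Wind_Chill(F)": [24.0, 41.0, 58.0, 75.0, 90.0],
    "Humidity(%)": [80.0, 35.0, 60.0, 90.0, 40.0],
})
reduced = drop_collinear(toy)
print(list(reduced.columns))
```

In this toy frame, wind chill tracks temperature almost exactly, so it is the column that gets dropped. Which member of a pair to keep is a modeling choice; here the earlier column wins simply because of column order.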

The dataset lets us investigate the following questions:
- Where are severe car accidents most likely to occur?
- When are severe car accidents most likely to occur?
- Under what conditions are severe car accidents most likely to occur?
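
Each of these questions can be approached as a grouped rate: the share of accidents in a group that are severe. The following is a rough sketch on made-up rows, assuming that "severe" means `Severity >= 3` (the notebook does not fix a cutoff); `is_severe`, `hour`, and `severe_share` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Made-up rows for illustration; only the column names follow the dataset.
toy = pd.DataFrame({
    "State": ["OH", "OH", "CA", "CA", "CA", "TX"],
    "Start_Time": pd.to_datetime([
        "2016-02-08 05:46:00", "2016-02-08 06:07:59", "2016-02-08 17:10:00",
        "2016-02-09 06:30:00", "2016-02-09 17:45:00", "2016-02-10 05:15:00",
    ]),
    "Weather_Condition": ["Light Snow", "Clear", "Rain", "Clear", "Clear", "Rain"],
    "Severity": [3, 2, 4, 2, 2, 3],
})

# Assumption: an accident counts as severe when Severity is 3 or higher.
toy["is_severe"] = toy["Severity"] >= 3
toy["hour"] = toy["Start_Time"].dt.hour


def severe_share(df: pd.DataFrame, by: str) -> pd.Series:
    """Share of severe accidents within each group, highest first."""
    return df.groupby(by)["is_severe"].mean().sort_values(ascending=False)


by_state = severe_share(toy, "State")                 # where
by_hour = severe_share(toy, "hour")                   # when
by_weather = severe_share(toy, "Weather_Condition")   # under what conditions
print(by_state, by_hour, by_weather, sep="\n\n")
```

On the full dataset, groups with very few accidents would produce noisy shares, so a minimum group size is worth considering before ranking.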

(1) Location:
- 7-10 - "Exact Location"
  - 7 - Start_Lat
  - 8 - Start_Lng
  - 9 - End_Lat
  - 10 - End_Lng
- 13-20 - "Address Data"
  - 13 - Number
  - 14 - Street
  - 15 - Side
  - 16 - City
  - 17 - County
  - 18 - State
  - 19 - Zipcode
  - 20 - Country
- 33-45 - "What exists nearby"
  - 33 - Amenity
  - 34 - Bump
  - 35 - Crossing
  - 36 - Give_Way
  - 37 - Junction
  - 38 - No_Exit
  - 39 - Railway
  - 40 - Roundabout
  - 41 - Station
  - 42 - Stop
  - 43 - Traffic_Calming
  - 44 - Traffic_Signal
  - 45 - Turning_Loop

(2) Weather:
- 23 - Weather_Timestamp
- 24 - Temperature(F)
- 25 - Wind_Chill(F)
- 26 - Humidity(%)
- 27 - Pressure(in)
- 28 - Visibility(mi)
- 29 - Wind_Direction
- 30 - Wind_Speed(mph)
- 31 - Precipitation(in)
- 32 - Weather_Condition

(3) Time of Day:
- 5 - Start_Time
- 6 - End_Time
- 21 - Timezone
- 23 - Weather_Timestamp
- 46 - Sunrise_Sunset
- 47 - Civil_Twilight
- 48 - Nautical_Twilight
- 49 - Astronomical_Twilight

Target Variable(s):
- 4 - Severity
- 11 - Distance(mi)

Extra Information:
- 12 - Description
- 3 - TMC ("Traffic Message Channel" code)
- 2 - Source
- 22 - Airport_Code
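
The grouping above can be written down once as a dictionary so that later cells can select a whole category by name. This is a sketch under assumptions: `FEATURE_GROUPS` and `select_group` are hypothetical names, and the visibility column is spelled `Visibility(mi)` as in the published CSV.

```python
import pandas as pd

# Column groups copied from the lists above (hypothetical constant name).
FEATURE_GROUPS = {
    "location": [
        "Start_Lat", "Start_Lng", "End_Lat", "End_Lng",
        "Number", "Street", "Side", "City", "County", "State", "Zipcode", "Country",
        "Amenity", "Bump", "Crossing", "Give_Way", "Junction", "No_Exit", "Railway",
        "Roundabout", "Station", "Stop", "Traffic_Calming", "Traffic_Signal",
        "Turning_Loop",
    ],
    "weather": [
        "Weather_Timestamp", "Temperature(F)", "Wind_Chill(F)", "Humidity(%)",
        "Pressure(in)", "Visibility(mi)", "Wind_Direction", "Wind_Speed(mph)",
        "Precipitation(in)", "Weather_Condition",
    ],
    "time": [
        "Start_Time", "End_Time", "Timezone", "Weather_Timestamp",
        "Sunrise_Sunset", "Civil_Twilight", "Nautical_Twilight",
        "Astronomical_Twilight",
    ],
    "target": ["Severity", "Distance(mi)"],
    "extra": ["Description", "TMC", "Source", "Airport_Code"],
}


def select_group(df: pd.DataFrame, group: str) -> pd.DataFrame:
    """Return the columns of `group` that are actually present in `df`."""
    cols = [c for c in FEATURE_GROUPS[group] if c in df.columns]
    return df[cols]


# Tiny made-up frame to show the helper; "Foo" is not a dataset column.
toy = pd.DataFrame({"Severity": [2], "Start_Lat": [39.9], "Humidity(%)": [91.0], "Foo": [0]})
print(list(select_group(toy, "weather").columns))
```

Note that `Weather_Timestamp` appears in both the weather and time groups, and that `ID` (column 1) belongs to none of them, so the groups cover 48 distinct columns.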

##### Acknowledgements:
Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. “A Countrywide Traffic Accident Dataset.” arXiv preprint arXiv:1906.05409 (2019).

Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. “Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights.” In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2019.

### Data Wrangling

import pandas as pd

data = pd.read_csv("US_Accidents.csv")
data.head()

| Unnamed: 0 | ID | Source | TMC | Severity | Start_Time | End_Time | Start_Lat | Start_Lng | End_Lat | End_Lng | ... | Roundabout | Station | Stop | Traffic_Calming | Traffic_Signal | Turning_Loop | Sunrise_Sunset | Civil_Twilight | Nautical_Twilight | Astronomical_Twilight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A-1 | MapQuest | 201.0 | 3 | 2016-02-08 05:46:00 | 2016-02-08 11:00:00 | 39.865147 | -84.058723 |  |  | ... | False | False | False | False | False | False | Night | Night | Night | Night |
| 1 | A-2 | MapQuest | 201.0 | 2 | 2016-02-08 06:07:59 | 2016-02-08 06:37:59 | 39.928059 | -82.831184 |  |  | ... | False | False | False | False | False | False | Night | Night | Night | Day |
| 2 | A-3 | MapQuest | 201.0 | 2 | 2016-02-08 06:49:27 | 2016-02-08 07:19:27 | 39.063148 | -84.032608 |  |  | ... | False | False | False | False | True | False | Night | Night | Day | Day |
| 3 | A-4 | MapQuest | 201.0 | 3 | 2016-02-08 07:23:34 | 2016-02-08 07:53:34 | 39.747753 | -84.205582 |  |  | ... | False | False | False | False | False | False | Night | Day | Day | Day |
| 4 | A-5 | MapQuest | 201.0 | 2 | 2016-02-08 07:39:07 | 2016-02-08 08:09:07 | 39.627781 | -84.188354 |  |  | ... | False | False | False | False | True | False | Day | Day | Day | Day |
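
The first rows already hint at two wrangling tasks: the timestamps load as plain strings unless parsed, and `End_Lat`/`End_Lng` are empty for these records. A minimal sketch of both checks follows, using an inline copy of three of the rows above (a subset of columns) so that it runs without the full file. The derived column name `Duration_min` is an assumption for illustration.

```python
import io

import pandas as pd

# Three of the rows shown above, restricted to a few columns.
sample_csv = io.StringIO(
    "ID,Severity,Start_Time,End_Time,Start_Lat,Start_Lng,End_Lat,End_Lng\n"
    "A-1,3,2016-02-08 05:46:00,2016-02-08 11:00:00,39.865147,-84.058723,,\n"
    "A-2,2,2016-02-08 06:07:59,2016-02-08 06:37:59,39.928059,-82.831184,,\n"
    "A-3,2,2016-02-08 06:49:27,2016-02-08 07:19:27,39.063148,-84.032608,,\n"
)

# Parse the two timestamp columns into datetime64 while reading.
sample = pd.read_csv(sample_csv, parse_dates=["Start_Time", "End_Time"])

# Fraction of missing values per column, largest first.
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)

# How long each accident affected traffic, in minutes (hypothetical column).
sample["Duration_min"] = (
    sample["End_Time"] - sample["Start_Time"]
).dt.total_seconds() / 60
print(sample[["ID", "Duration_min"]])
```

The same `parse_dates` argument can be passed when reading `US_Accidents.csv`, at the price of a slower load on a file of this size.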


### Exploratory Data Analysis

### Maps

### Feature Set

### Train Test Split

### Normalize the Data

### Plots

### Machine Learning Models

### Results, Evaluations, and Discussion

### Conclusion and Recommendation