# Capstone project

This Jupyter notebook will be used mainly for the capstone project.

In [1]:
import pandas as pd
import numpy as np

In [2]:
print('Hello Capstone Project Course!')

Hello Capstone Project Course!


# Introduction/Business Problem

I found a dataset called **'US_Accidents_June20.csv'** that meets the following requirements:

1. It has 'Severity' as an attribute.
2. It is large enough (about 3.5 million accident records).
3. The main objective of the dataset is to predict the severity.

This is the description on Kaggle:

This is a countrywide car accident dataset, which covers 49 states of the USA. The accident data are collected from February 2016 to June 2020, using two APIs that provide streaming traffic incident (or event) data. These APIs broadcast traffic data captured by a variety of entities, such as the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road-networks. Currently, there are about 3.5 million accident records in this dataset.

### Main problem

With this dataset I have enough data. The problem is to predict the severity of a car accident from attributes such as temperature, wind, and humidity.
But first, we have to check whether the data is clean and choose the best method to model it.
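
The problem statement above can be sketched as a split into feature columns and a target column. This is only a minimal illustration: the toy rows are made up, the choice of three weather columns is an assumption, and `split_features_target` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; column names follow the dataset's attribute list.
toy = pd.DataFrame({
    "Temperature(F)": [36.9, 37.9, 36.0, 35.1],
    "Humidity(%)": [91.0, 100.0, 100.0, 96.0],
    "Wind_Speed(mph)": [np.nan, np.nan, 3.5, 4.6],
    "Severity": [3, 2, 2, 3],
})

FEATURES = ["Temperature(F)", "Humidity(%)", "Wind_Speed(mph)"]  # assumed subset
TARGET = "Severity"

def split_features_target(df, features=FEATURES, target=TARGET):
    """Drop rows with missing values in the chosen columns, then return (X, y)."""
    clean = df.dropna(subset=features + [target])
    return clean[features], clean[target]

X, y = split_features_target(toy)
print(X.shape, y.tolist())
```

Dropping incomplete rows is the simplest option; imputing missing values may be a better choice once the amount of missing data is known.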

#### Bibliographical citations

* Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. "A Countrywide Traffic Accident Dataset." 2019.

* Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. "Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights." In proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2019.

# Data understanding

To better understand the data, I built a table listing each attribute and its description.

| Attribute  | Description |
|  ------    |   ------    |
|   ID       | This is a unique identifier of the accident record.       |
|   Source   | Indicates source of the accident report (i.e. the API which reported the accident.)       |
|   TMC      | A traffic accident may have a Traffic Message Channel (TMC) code which provides more detailed description of the event.       |
|   Severity       | Shows the severity of the accident, a number between 1 and 4, where 1 indicates the least impact on traffic (i.e., short delay as a result of the accident) and 4 indicates a significant impact on traffic (i.e., long delay). Note that severity reported by different sources may differ in their underlying impact on traffic, so please separate data from different sources when doing severity-based analysis.        |
|   Start_Time       | Shows start time of the accident in local time zone.       |
|   End_Time       | Shows end time of the accident in local time zone. End time here refers to when the impact of accident on traffic flow was dismissed.       |
|   Start_Lat| Shows latitude in GPS coordinate of the start point.         |
|   Start_Lng       | Shows longitude in GPS coordinate of the start point.       |
|   End_Lat       | Shows latitude in GPS coordinate of the end point.       |
|   End_Lng       | Shows longitude in GPS coordinate of the end point.       |
|   Distance(mi)       | The length of the road extent affected by the accident.       |
|   Description       | Shows natural language description of the accident.       |
|   Number       | Shows the street number in address record.       |
|   Street     | Shows the street name in address record.         |
|   Side     | Shows the relative side of the street (Right/Left) in address record.         |
|   City      | Shows the city in address record.        |
|   County     | Shows the county in address record.         |
|   State       | Shows the state in address record.       |
|   Zipcode       | Shows the zipcode in address record.       |
|   Timezone       | Shows timezone based on the location of the accident (eastern, central, etc.).       |
|   Airport_Code       | Denotes an airport-based weather station which is the closest one to location of the accident.       |
|   Weather_Timestamp       | Shows the time-stamp of weather observation record (in local time).       |
|   Temperature(F)       | Shows the temperature (in Fahrenheit).       |
|   Wind_Chill(F)       | Shows the wind chill (in Fahrenheit).       |
|   Humidity(%)       | Shows the humidity (in percentage).       |
|   Pressure(in)       | Shows the air pressure (in inches).       |
|   Visibility(mi)       | Shows visibility (in miles).       |
|   Wind_Direction       | Shows wind direction.       |
|   Wind_Speed(mph)       | Shows wind speed (in miles per hour).       |
|   Precipitation(in)       | Shows precipitation amount in inches, if there is any.       |
|   Weather_Condition       | Shows the weather condition (rain, snow, thunderstorm, fog, etc.)       |
|   Amenity       | A POI annotation which indicates presence of amenity in a nearby location.       |
|   Bump       | A POI annotation which indicates presence of speed bump or hump in a nearby location.       |
|   Crossing       | A POI annotation which indicates presence of crossing in a nearby location.       |
|   Give_Way       | A POI annotation which indicates presence of give_way in a nearby location.       |
|   Junction       | A POI annotation which indicates presence of junction in a nearby location.       |
|   No_Exit       | A POI annotation which indicates presence of no_exit in a nearby location.       |
|   Railway       | A POI annotation which indicates presence of railway in a nearby location.       |
|   Roundabout       | A POI annotation which indicates presence of roundabout in a nearby location.       |
|   Station       | A POI annotation which indicates presence of station in a nearby location.       |
|   Stop       | A POI annotation which indicates presence of stop in a nearby location.       |
|   Traffic_Calming       | A POI annotation which indicates presence of traffic_calming in a nearby location.    |
|   Traffic_Signal       | A POI annotation which indicates presence of traffic_signal in a nearby location.      |
|   Turning_Loop       | A POI annotation which indicates presence of turning_loop in a nearby location.       |
|   Sunrise_Sunset       | Shows the period of day (i.e. day or night) based on sunrise/sunset.       |
|   Civil_Twilight       | Shows the period of day (i.e. day or night) based on civil twilight.       |
|   Nautical_Twilight       | Shows the period of day (i.e. day or night) based on nautical twilight.       |
|   Astronomical_Twilight       | Shows the period of day (i.e. day or night) based on astronomical twilight. |
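
The Severity description advises separating data from different sources before any severity-based analysis. A minimal sketch of that separation follows, assuming made-up rows and a placeholder second source name (`OtherAPI` is hypothetical; only `MapQuest` appears in this notebook's preview).

```python
import pandas as pd

# Made-up rows for illustration; "Source" and "Severity" are the documented columns.
toy = pd.DataFrame({
    "Source": ["MapQuest", "MapQuest", "MapQuest", "OtherAPI", "OtherAPI"],
    "Severity": [2, 3, 2, 4, 2],
})

# Row-normalised crosstab: share of each severity level within each source.
severity_by_source = pd.crosstab(toy["Source"], toy["Severity"], normalize="index")
print(severity_by_source)

# One sub-frame per source, so each can be analysed on its own.
frames_by_source = {name: group for name, group in toy.groupby("Source")}
```

If the per-source distributions differ noticeably on the real data, modelling each source separately (or adding the source as a feature) is worth considering.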

In [4]:
# Load the accident records into a DataFrame
filename = 'US_Accidents_June20.csv'
pdf = pd.read_csv(filename)

# Preview the first five rows
pdf.head(5)

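|  | ID | Source | TMC | Severity | Start_Time | End_Time | Start_Lat | Start_Lng | End_Lat | End_Lng | ... | Roundabout | Station | Stop | Traffic_Calming | Traffic_Signal | Turning_Loop | Sunrise_Sunset | Civil_Twilight | Nautical_Twilight | Astronomical_Twilight |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| 0 | A-1 | MapQuest | 201.0 | 3 | 2016-02-08 05:46:00 | 2016-02-08 11:00:00 | 39.865147 | -84.058723 | NaN | NaN | ... | False | False | False | False | False | False | Night | Night | Night | Night |
| 1 | A-2 | MapQuest | 201.0 | 2 | 2016-02-08 06:07:59 | 2016-02-08 06:37:59 | 39.928059 | -82.831184 | NaN | NaN | ... | False | False | False | False | False | False | Night | Night | Night | Day |
| 2 | A-3 | MapQuest | 201.0 | 2 | 2016-02-08 06:49:27 | 2016-02-08 07:19:27 | 39.063148 | -84.032608 | NaN | NaN | ... | False | False | False | False | True | False | Night | Night | Day | Day |
| 3 | A-4 | MapQuest | 201.0 | 3 | 2016-02-08 07:23:34 | 2016-02-08 07:53:34 | 39.747753 | -84.205582 | NaN | NaN | ... | False | False | False | False | False | False | Night | Day | Day | Day |
| 4 | A-5 | MapQuest | 201.0 | 2 | 2016-02-08 07:39:07 | 2016-02-08 08:09:07 | 39.627781 | -84.188354 | NaN | NaN | ... | False | False | False | False | True | False | Day | Day | Day | Day |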
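
The preview shows empty `End_Lat` and `End_Lng` values, so a quick missing-value check is a natural next step before modelling. The sketch below is one way to do it; `missing_value_report` is a hypothetical helper and the toy frame only mimics the shape of the preview, it is not real output.

```python
import numpy as np
import pandas as pd

def missing_value_report(df):
    """Return per-column missing counts and percentages, worst columns first."""
    report = pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": df.isna().mean() * 100,
    })
    return report.sort_values("percent", ascending=False)

# Toy frame mimicking the preview, where End_Lat and End_Lng are empty.
toy = pd.DataFrame({
    "ID": ["A-1", "A-2", "A-3"],
    "Severity": [3, 2, 2],
    "End_Lat": [np.nan, np.nan, np.nan],
    "End_Lng": [np.nan, np.nan, np.nan],
})
report = missing_value_report(toy)
print(report)
```

On the full dataset the same call would be `missing_value_report(pdf)`; columns with a very high missing percentage are candidates for dropping rather than imputing.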
