<h4 align="center"> G8_Project Proposal_IST5520</h4>

<h1 align="center"> Missouri Traffic Accident Data</h1>
<h2 align="center"> Descriptive & Predictive Analysis on different factors influencing road accidents </h2>
<h3 align="center"> Group Members: Dennis Baleta, Sai Rachana Bandi, Austin Kovis, Debasis Roy, Apurv Saxena </h3>

## 1. Introduction:
The US-Accidents dataset supports numerous applications, such as real-time accident prediction, the study of accident hotspot locations, casualty analysis, the extraction of cause-and-effect rules to predict accidents, and the study of how precipitation or other environmental stimuli affect accident occurrence. The data was collected in real time using multiple traffic APIs. It currently covers February 2016 to December 2019 for the contiguous United States.

This is a countrywide traffic accident dataset that covers 49 states of the United States, of which we take three into consideration. The data was collected from February 2016 to December 2019 using several data providers, including two APIs that provide streaming traffic incident data. These APIs broadcast traffic data captured by a variety of entities, such as the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road networks. The dataset currently contains about 3.0 million accident records.

Many accidents are attributed to bad weather conditions (such as heavy fog, rain, sleet, and wind), traffic, changes in drivers' moods, and similar factors.

Our study will focus on the weather-related factors, including Weather_Condition, Temperature, Visibility, Wind_Speed, and Wind_Direction, as well as Severity and the factors describing the place where the accident occurred. Using this data, we will create a model that predicts the accident-prone areas for the next day based on the weather conditions. This would help the department of transportation and the police take the necessary precautions to prevent accidents or to act immediately if required.

In this project, we want to conduct an exploratory study of road accidents and the factors influencing them. Specifically, we want to answer the following research questions:
1. Which predictor variables actually affect road accidents?
2. How do weather conditions impact road accidents?
3. What are the factors of an accident, and how can they be mapped to severity?
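
As a rough illustration of how the second and third questions might be approached, the sketch below cross-tabulates `Weather_Condition` against `Severity` and reports, for each weather condition, the share of accidents at each severity level. The rows in `sample` are invented for illustration and are not taken from the dataset; only the two column names come from it.

```python
import pandas as pd

# Invented rows for illustration only; the real data would come from the
# US-Accidents file described in the Data section.
sample = pd.DataFrame({
    "Weather_Condition": ["Rain", "Rain", "Clear", "Fog", "Clear", "Rain"],
    "Severity": [2, 3, 2, 4, 2, 3],
})

# Share of each severity level within every weather condition.
# normalize="index" makes each row sum to 1.
severity_by_weather = pd.crosstab(
    sample["Weather_Condition"],
    sample["Severity"],
    normalize="index",
)

print(severity_by_weather)
```

A table like this only describes association; it does not by itself show that weather causes more severe accidents.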


## 2. Data
The dataset is available at the link below:

https://www.kaggle.com/sobhanmoosavi/us-accidents

The dataset will be filtered to contain the data of three states, because the original dataset is too large to manipulate feasibly. The three states are California (CA), Missouri (MO), and Maryland (MD), which draws data from the West Coast, the Midwest, and the East Coast of the United States, respectively.
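
The filtering step could look roughly like the sketch below. It uses a tiny made-up frame in place of the full file, and the names `STATES_OF_INTEREST` and `filter_states` are hypothetical helpers introduced here, not part of the project code.

```python
import pandas as pd

# Tiny stand-in for the full US-Accidents file (values are made up).
sample = pd.DataFrame({
    "ID": ["A-1", "A-2", "A-3", "A-4", "A-5"],
    "State": ["OH", "MO", "CA", "MD", "TX"],
    "Severity": [3, 2, 2, 4, 2],
})

# The three states named in this proposal.
STATES_OF_INTEREST = ["MO", "CA", "MD"]


def filter_states(df, states=STATES_OF_INTEREST):
    """Return only the rows whose State value is in `states`."""
    return df[df["State"].isin(states)].reset_index(drop=True)


subset = filter_states(sample)
print(subset)
```

If the full file proves too large to load at once, the same filter could be applied piece by piece by passing the `chunksize` argument to `pd.read_csv` and concatenating the filtered chunks.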

The final dataset will consist of 49 columns. Their descriptions and details are given below.

In [1]:
# Import Modules
import pandas as pd

In [3]:
#Read in the data
dat = pd.read_csv("C:\\Users\\ajkov\\us-accidents 5520 datafile\\US_Accidents_Dec19.csv")

In [11]:
# info() shows the data type of each variable included in the study.
dat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2974335 entries, 0 to 2974334
Data columns (total 49 columns):
ID                       object
Source                   object
TMC                      float64
Severity                 int64
Start_Time               object
End_Time                 object
Start_Lat                float64
Start_Lng                float64
End_Lat                  float64
End_Lng                  float64
Distance(mi)             float64
Description              object
Number                   float64
Street                   object
Side                     object
City                     object
County                   object
State                    object
Zipcode                  object
Country                  object
Timezone                 object
Airport_Code             object
Weather_Timestamp        object
Temperature(F)           float64
Wind_Chill(F)            float64
Humidity(%)              float64
Pressure(in)             float64
Visibility(mi

In [7]:
#Visualizes the data in a readable format
dat.head(1).transpose()

| | 0 |
|---|---|
| ID | A-1 |
| Source | MapQuest |
| TMC | 201 |
| Severity | 3 |
| Start_Time | 2016-02-08 05:46:00 |
| End_Time | 2016-02-08 11:00:00 |
| Start_Lat | 39.8651 |
| Start_Lng | -84.0587 |
| End_Lat | |
| End_Lng | |


In [8]:
# isnull().sum() shows which variables contain NULL values and how many there are.
dat.isnull().sum()

ID                             0
Source                         0
TMC                       728071
Severity                       0
Start_Time                     0
End_Time                       0
Start_Lat                      0
Start_Lng                      0
End_Lat                  2246264
End_Lng                  2246264
Distance(mi)                   0
Description                    1
Number                   1917605
Street                         0
Side                           0
City                          83
County                         0
State                          0
Zipcode                      880
Country                        0
Timezone                    3163
Airport_Code                5691
Weather_Timestamp          36705
Temperature(F)             56063
Wind_Chill(F)            1852623
Humidity(%)                59173
Pressure(in)               48142
Visibility(mi)             65691
Wind_Direction             45101
Wind_Speed(mph)           440840
Precipitat
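
The raw counts above are easier to compare as percentages of the 2,974,335 records: `End_Lat` and `End_Lng` are missing in about three quarters of the rows, `Number` in roughly 64%, and `Wind_Chill(F)` in roughly 62%. The sketch below shows one way to compute such shares; the frame `sample` holds invented values and `missing_share` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Invented values for illustration; column names match the dataset.
sample = pd.DataFrame({
    "Severity": [3, 2, 2, 4],
    "End_Lat": [np.nan, np.nan, np.nan, 38.5],
    "Wind_Chill(F)": [np.nan, 30.0, np.nan, 41.0],
})


def missing_share(df):
    """Percentage of missing values per column, largest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)


share = missing_share(sample)
print(share)
```

Columns with a very high missing share are candidates for being dropped or imputed before modeling; which threshold to use is a judgment call for the analysis phase.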

In [10]:
#Summary of the statistics
dat.describe().transpose()

| Variable | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| TMC | 2246264.0 | 207.831632 | 20.329586 | 200.0 | 201.0 | 201.0 | 201.0 | 406.0 |
| Severity | 2974335.0 | 2.36019 | 0.541473 | 1.0 | 2.0 | 2.0 | 3.0 | 4.0 |
| Start_Lat | 2974335.0 | 36.493605 | 4.918849 | 24.555269 | 33.550402 | 35.849689 | 40.37026 | 49.0022 |
| Start_Lng | 2974335.0 | -95.426254 | 17.218806 | -124.623833 | -117.291985 | -90.250832 | -80.918915 | -67.11317 |
| End_Lat | 728071.0 | 37.580871 | 5.004757 | 24.57011 | 33.957554 | 37.90367 | 41.37263 | 49.075 |
| End_Lng | 728071.0 | -99.976032 | 18.416647 | -124.497829 | -118.28661 | -96.63169 | -82.32385 | -67.10924 |
| Distance(mi) | 2974335.0 | 0.285565 | 1.548392 | 0.0 | 0.0 | 0.0 | 0.01 | 333.63 |
| Number | 1056730.0 | 5837.003544 | 15159.278074 | 0.0 | 837.0 | 2717.0 | 7000.0 | 9999997.0 |
| Temperature(F) | 2918272.0 | 62.351203 | 18.788549 | -77.8 | 50.0 | 64.4 | 76.0 | 170.6 |
| Wind_Chill(F) | 1121712.0 | 51.326849 | 25.191271 | -65.9 | 32.0 | 54.0 | 73.0 | 115.0 |
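
The summary statistics include some extreme values, for example a `Temperature(F)` range of -77.8 to 170.6, which may be recording errors. One possible way to flag such records for review is sketched below. The bounds `TEMP_MIN_F` and `TEMP_MAX_F` are arbitrary assumptions chosen for illustration, and `flag_out_of_range` is a hypothetical helper; the five sample values are the min, quartiles, and max reported above.

```python
import pandas as pd

# The five-number summary of Temperature(F) reported by describe().
sample = pd.DataFrame({"Temperature(F)": [-77.8, 50.0, 64.4, 76.0, 170.6]})

# Assumed plausibility bounds in degrees Fahrenheit (illustrative only).
TEMP_MIN_F, TEMP_MAX_F = -60.0, 135.0


def flag_out_of_range(series, low, high):
    """Boolean mask that is True where a value falls outside [low, high]."""
    return (series < low) | (series > high)


suspect = flag_out_of_range(sample["Temperature(F)"], TEMP_MIN_F, TEMP_MAX_F)
print(sample[suspect])
```

Missing values compare as False on both sides, so they are not flagged by this mask and would need to be handled separately.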


### Key Information in The Dataset


SlNo. Attribute: Description
1. ID: This is a unique identifier of the accident record.
2. Source: Indicates the source of the accident report (i.e., the API that reported the accident).
3. TMC: A traffic accident may have a Traffic Message Channel (TMC) code which provides a more detailed description of the event.
4. Severity: Shows the severity of the accident, a number between 1 and 4, where 1 indicates the least impact on traffic (i.e., short delay as a result of the accident) and 4 indicates a significant impact on traffic (i.e., long delay).
5. Start_Time: Shows start time of the accident in the local time zone.
6. End_Time: Shows end time of the accident in the local time zone.
7. Start_Lat: Shows latitude in GPS coordinate of the start point.
8. Start_Lng: Shows longitude in GPS coordinate of the start point.
9. End_Lat: Shows latitude in GPS coordinate of the end point.
10. End_Lng: Shows longitude in GPS coordinate of the end point.
11. Distance(mi): The length of the road extent affected by the accident.
12. Description: Shows natural language description of the accident.
13. Number: Shows the street number in the address field.
14. Street: Shows the street name in the address field.
15. Side: Shows the relative side of the street (Right/Left) in the address field.
16. City: Shows the city in the address field.
17. County: Shows the county in the address field.
18. State: Shows the state in the address field.
19. Zipcode: Shows the zip code in the address field.
20. Country: Shows the country in the address field.
21. Timezone: Shows timezone based on the location of the accident (eastern, central, etc.).
22. Airport_Code: Denotes an airport-based weather station which is the closest one to the location of the accident.
23. Weather_Timestamp: Shows the time-stamp of weather observation record (in local time).
24. Temperature(F): Shows the temperature (in Fahrenheit).
25. Wind_Chill(F): Shows the wind chill (in Fahrenheit).
26. Humidity(%): Shows the humidity (in percentage).
27. Pressure(in): Shows the air pressure (in inches).
28. Visibility(mi): Shows visibility (in miles).
29. Wind_Direction: Shows wind direction.
30. Wind_Speed(mph): Shows wind speed (in miles per hour).
31. Precipitation(in): Shows precipitation amount in inches, if there is any.
32. Weather_Condition: Shows the weather condition (rain, snow, thunderstorm, fog, etc.)
33. Amenity: A POI annotation which indicates the presence of amenity in a nearby location.
34. Bump: A POI annotation which indicates presence of speed bump or hump in a nearby location.
35. Crossing: A POI annotation which indicates the presence of crossing in a nearby location.
36. Give_Way: A POI annotation which indicates the presence of a give_way in a nearby location.
37. Junction: A POI annotation which indicates the presence of a junction in a nearby location.
38. No_Exit: A POI annotation which indicates the presence of no_exit in a nearby location.
39. Railway: A POI annotation which indicates the presence of railway in a nearby location.
40. Roundabout: A POI annotation which indicates the presence of roundabout in a nearby location.
41. Station: A POI annotation which indicates the presence of a station in a nearby location.
42. Stop: A POI annotation which indicates the presence of a stop in a nearby location.
43. Traffic_Calming: A POI annotation which indicates the presence of traffic_calming in a nearby location.
44. Traffic_Signal: A POI annotation which indicates the presence of traffic_signal in a nearby location.
45. Turning_Loop: A POI annotation which indicates the presence of a turning_loop in a nearby location.
46. Sunrise_Sunset: Shows the period of day (i.e. day or night) based on sunrise/sunset.
47. Civil_Twilight: Shows the period of day (i.e. day or night) based on civil twilight.
48. Nautical_Twilight: Shows the period of day (i.e. day or night) based on nautical twilight.
49. Astronomical_Twilight: Shows the period of day (i.e. day or night) based on astronomical twilight.
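
The `info()` output shows that `Start_Time` and `End_Time` are stored as `object` (string) columns, so they would need to be parsed before any time-based analysis. The sketch below converts both to datetimes and derives two extra columns. The sample row reuses the timestamps of record A-1 shown earlier; `add_duration`, `Duration_min`, and `Start_Hour` are hypothetical names that do not exist in the dataset.

```python
import pandas as pd

# Timestamps of record A-1 from the head() output, stored as strings.
sample = pd.DataFrame({
    "Start_Time": ["2016-02-08 05:46:00"],
    "End_Time": ["2016-02-08 11:00:00"],
})


def add_duration(df):
    """Parse the time columns and add duration (minutes) and start hour."""
    out = df.copy()
    out["Start_Time"] = pd.to_datetime(out["Start_Time"])
    out["End_Time"] = pd.to_datetime(out["End_Time"])
    elapsed = out["End_Time"] - out["Start_Time"]
    out["Duration_min"] = elapsed.dt.total_seconds() / 60
    out["Start_Hour"] = out["Start_Time"].dt.hour
    return out


timed = add_duration(sample)
print(timed)
```

Derived columns such as the hour of day could then be examined alongside the weather and severity variables.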



## 3. Timeline

    Sl. No       Tasks                           Activity                                     Start/Complete Date
      1      Data Exploration             Explore dataset                                     02/20-03/01
      
      2      Data Analysis I              Data Remodeling, Data Manipulation                  03/02-03/16
      
      3      Data Analysis II             Create Visualizations, Gather Insights              03/17-03/31
      
      4      Predictive Analysis          Work on the Machine Learning part of the project    04/01-04/14
      
      5      Presentation Preparation     List the important topics that will need to be 
                                          covered during the presentation, assign slides to 
                                          team members, Final check of the slides before 
                                          the presentation                                    04/15-04/21
                                          
      6      Write Report                 Gather requirements for the report, assign 
                                          sections within the report to team members, 
                                          finalize the report                                 04/22-05/05
                                          
      7      Presentation                 Project Presentation                                   05/06
