<h1 align = "Center"> IST 5520 Spring 2020: Group 1 Project Proposal </h1>
<h2 align = "Center"> An Analysis of Vermont Crashes Data</h2>
<h3 align = "Center"> By: Bryce Cordry, Kyle Johnson, Matthew Kovar, Yitian Luo, Brian Middleton </h3>

# Introduction

Traffic crashes happen every day around the world and rank as the 9th leading cause of death. In the United States, there are an average of 16,438 car accidents per day. Worldwide, nearly 1.25 million people die in car accidents each year, which works out to roughly 3,287 deaths per day (https://www.thewanderingrv.com/car-accident-statistics/). It is therefore important to learn more about what drives car crashes and how they might be avoided.

We found a dataset of car crashes in the State of Vermont from 2014 to 2018. We approach the dataset from the perspective of an insurance company that uses telematics devices in vehicles to create individualized risk profiles for policyholders in Vermont and set their premiums accordingly. We are interested in which characteristics are associated with vehicle accidents and how to use this information to adjust a policyholder's premium after an accident. Specifically, we want to identify the factors that matter most in accidents, the factors that lead to each outcome (property damage, injury, or fatality), and the areas or roads that are most prone to accidents.

# Data Source and Collection

The dataset was downloaded from the Vermont Open Geodata Portal (https://geodata.vermont.gov/datasets/VTrans::vt-crashes-2018?orderBy=ACCIDENTDATE&orderByAsc=false). 
We combined the data from 2014 through 2018; the combined dataset contains 61,562 car accidents in Vermont. The data was collected by various reporting agencies within Vermont.

This dataset contains 36 columns listed below. Some of the key information includes the type of injury, the impairment of the driver, the accident location, date, weather, the road conditions, as well as the reporting agency. All of this information can be useful for us in determining how the insured's future premiums will be adjusted.
 

- OBJECTID
- REPORTINGAGENCYid
- ReportingAgency
- REPORTNUMBER
- ACCIDENTDATE
- STREETADDRESS
- INTERSECTIONWITH
- DirOfCollision
- RoadGroup
- AOTACTUALMILEPOINT
- RoadCharacteristics
- NonReportableAddress
- CITYORTOWNid
- CITYORTOWN
- EASTING
- NORTHING
- AOTROUTEid
- AOTROUTE
- LRSNUMBER
- HOWMAPPED
- Animal
- Impairment
- Involving
- Weather
- DayNight
- InjuryType
- LOC_ERROR
- RDFLNAME
- SurfaceCondition
- RoadCondition
- Route
- LATITUDE
- LONGITUDE
- AOTROADWAYGROUPid


# Technical Approach

### Data Manipulation

We use pandas, NumPy, and SciPy to manipulate the datasets.

In [1]:
# Import modules
import pandas as pd
import numpy as np

# Import and clean data, deal with missing values
years = [2015, 2016, 2017, 2018]
df_dict = {year:pd.read_csv("VT_Crashes__{}.csv".format(str(year))) for year in years}

for year, df in df_dict.items():
  df_dict[year] = df.rename(columns={"ACCIDENTDA":"ACCIDENTDATE", 
                                     "AOTACTUALM":"AOTACTUALMILEPOINT",
                                     "INTERSECTI":"INTERSECTIONWITH",
                                     "STREETADDR":"STREETADDRESS",
                                     "REPORTNUMB":"REPORTNUMBER",
                                     "ReportingAgency":"REPORTINGAGENCY",
                                     "DirOfCollision":"DIROFCOLLI",
                                     "VCSG_LRSNUMBER":"LRSNUMBER",
                                     "REPORTINGA":"REPORTINGAGENCYid",
                                     "VCSG_EASTING":"EASTING",
                                     "VCSG_NORTHING":"NORTHING",
                                     "VCSG_AOTROUTE":"AOTROUTE",
                                     "VCSG_CITYORTOWN":"CITYORTOWN"})
  df_dict[year]["Year"] = year

df_dict[2018] = df_dict[2018].rename(columns={"LATITUDE":"LAT_DD","LONGITUDE":"LONG_DD"})
df_dict[2017] = df_dict[2017].rename(columns={"GIS_LATITUDE":"LAT_DD","GIS_LONGITUDE":"LONG_DD"})

df = pd.concat(df_dict.values(), sort=True)

df = df.drop(columns=['ACCIDENTTI','AOTROADWAYGROUPid','AOTROUTEid','CITYORTOWNid','MILEMARKER1','MILEMARKER_1',
                      'MILEMARKER_2','NUMBER','NUMBER1','NUMBER2','NUMBER3','OBJECTID','RDFLNAME','REPORTINGAGENCYid',
                      'VCSG_AOTROUTEid','VCSG_CITYORTOWNid','NORTHING','EASTING','DIRFROMNEA','CrashType',
                     'DISTANCE_1','POSTEDSPEE','VCSG_LATITUDE','VCSG_LONGITUDE','Route', 'AOTROADWAY'])

df['AOTACTUALMILEPOINT'] = df['AOTACTUALMILEPOINT'].fillna("Unmarked")
df['AOTROUTE'] = df['AOTROUTE'].fillna("Unknown")
df['Animal'] = df['Animal'].fillna("None/Other")
df['CITYORTOWN'] = df['CITYORTOWN'].fillna("Unknown")
df['DIROFCOLLI'] = df['DIROFCOLLI'].fillna("Unknown")
df['INTERSECTIONWITH'] = df['INTERSECTIONWITH'].fillna("Unknown")
df['Impairment'] = df['Impairment'].fillna("None")
df['Involving'] = df['Involving'].fillna("None")
df['LOCALID'] = df['LOCALID'].fillna("Unknown")
df['LOC_ERROR'] =df['LOC_ERROR'].fillna("NO ERROR")
df['LRSNUMBER'] = df['LRSNUMBER'].fillna("None")
df['NonReportableAddress'] = df['NonReportableAddress'].fillna("Reportable")
df['RoadCharacteristics'] = df['RoadCharacteristics'].fillna("Other - Explain in Narrative")
df['Weather'] = df['Weather'].fillna("Unknown")
df['STREETADDRESS'] = df['STREETADDRESS'].fillna("Unknown")

# Fill missing DayNight values by sampling from the observed DayNight distribution
values_dist = df.DayNight.value_counts(normalize=True)
missing = df['DayNight'].isnull()
df.loc[missing,'DayNight'] = np.random.choice(values_dist.index, size=len(df[missing]),p=values_dist.values)

# Fill missing InjuryType values the same way
values_dist = df.InjuryType.value_counts(normalize=True)
missing = df['InjuryType'].isnull()
df.loc[missing,'InjuryType'] = np.random.choice(values_dist.index, size=len(df[missing]),p=values_dist.values)

# Infer missing SurfaceCondition and RoadCondition values from Weather where the weather implies them
Freezing_Precipitation_df = df[(df['SurfaceCondition'].isin(['Snow','Ice','Slush','Wet']))]
values_dist = Freezing_Precipitation_df.SurfaceCondition.value_counts(normalize=True)
df['SurfaceCondition'] = np.where(((df.SurfaceCondition.isnull())& (df['Weather'] == 'Rain')),'Wet',df.SurfaceCondition)
df['SurfaceCondition'] = np.where(((df.SurfaceCondition.isnull())& (df['Weather'] == 'Clear')),'Dry',df.SurfaceCondition)
# Draw one sample per row (size=len(df)); without size, a single draw would be copied into every missing row
df['SurfaceCondition'] = np.where(((df.SurfaceCondition.isnull())& (df['Weather'] == 'Freezing Precipitation')),np.random.choice(values_dist.index, size=len(df), p=values_dist.values),df.SurfaceCondition)
df['RoadCondition'] = np.where(((df.RoadCondition.isnull())& (df['Weather'] == 'Freezing Precipitation')),'Road Surface Condition(wet, icy, snow, slush, etc)',df.RoadCondition)
df['RoadCondition'] = np.where(((df.RoadCondition.isnull())& (df['Weather'] == 'Rain')),'Road Surface Condition(wet, icy, snow, slush, etc)',df.RoadCondition)

values_dist = df.SurfaceCondition.value_counts(normalize=True)
missing = df['SurfaceCondition'].isnull()
df.loc[missing,'SurfaceCondition'] = np.random.choice(values_dist.index, size=len(df[missing]),p=values_dist.values)

values_dist = df.RoadCondition.value_counts(normalize=True)
missing = df['RoadCondition'].isnull()
df.loc[missing,'RoadCondition'] = np.random.choice(values_dist.index, size=len(df[missing]),p=values_dist.values)

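# Record the remaining missing values per column before the incomplete rows are dropped;
# counting after dropna() and astype(str) would always report zero
nullCount = df.isna().sum()
nullCount.to_csv("nullCount.csv")
print(type(nullCount))

df = df.dropna()

df = df.astype(str)

df.to_csv("combined.csv")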

<class 'pandas.core.series.Series'>
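
The cleaning cell repeats the same pattern several times: compute the observed category shares, then fill the missing rows by sampling from them. A minimal sketch of that pattern as a reusable helper follows. The function name `impute_from_distribution`, the toy data, and the fixed seed are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def impute_from_distribution(series, rng):
    """Fill missing values by sampling from the observed category frequencies."""
    dist = series.value_counts(normalize=True)  # NaN is excluded by default
    missing = series.isna()
    filled = series.copy()
    # Draw one independent sample per missing row
    filled[missing] = rng.choice(
        dist.index.to_numpy(), size=int(missing.sum()), p=dist.to_numpy()
    )
    return filled


# Toy data, made up for illustration only
toy = pd.Series(["Day", "Day", "Night", None, "Day", None])
rng = np.random.default_rng(seed=0)  # fixed seed so the imputation is reproducible
filled = impute_from_distribution(toy, rng)
print(filled.tolist())
```

Passing a seeded generator makes the imputed values repeatable across runs, which the unseeded `np.random.choice` calls in the cell above do not guarantee.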


### Data Analysis

We analyze accident types under different conditions. Planned visualizations include a heat map of the areas in Vermont where car crashes occurred and histograms of accidents under different conditions.
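
As a rough sketch of the data behind such a heat map, crash coordinates can be binned into a grid of counts. The toy coordinates and the 10 x 10 grid size below are assumptions for illustration; only the column names `LAT_DD` and `LONG_DD` come from the cleaning step.

```python
import numpy as np
import pandas as pd

# Toy coordinates, made up for illustration; the real columns are LAT_DD and LONG_DD
crashes = pd.DataFrame({
    "LAT_DD": ["44.47", "44.48", "44.47", "42.85", "43.61"],
    "LONG_DD": ["-73.21", "-73.20", "-73.21", "-72.56", "-72.97"],
})

# The cleaning step casts every column to str, so convert back to numbers first
lat = pd.to_numeric(crashes["LAT_DD"], errors="coerce")
lon = pd.to_numeric(crashes["LONG_DD"], errors="coerce")
valid = lat.notna() & lon.notna()

# Count crashes per grid cell; the 10 x 10 grid size is an arbitrary choice
counts, lat_edges, lon_edges = np.histogram2d(lat[valid], lon[valid], bins=10)
print(int(counts.sum()), counts.shape)
```

The resulting `counts` array can then be drawn with a plotting library, for example matplotlib's `imshow` or `pcolormesh`.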

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23735 entries, 60 to 10309
Data columns (total 29 columns):
ACCIDENTDATE            23735 non-null object
AOTACTUALMILEPOINT      23735 non-null object
AOTROUTE                23735 non-null object
Animal                  23735 non-null object
CITYORTOWN              23735 non-null object
DIROFCOLLI              23735 non-null object
DayNight                23735 non-null object
HOWMAPPED               23735 non-null object
INTERSECTIONWITH        23735 non-null object
Impairment              23735 non-null object
InjuryType              23735 non-null object
Involving               23735 non-null object
LAT_DD                  23735 non-null object
LOCALID                 23735 non-null object
LOC_ERROR               23735 non-null object
LONG_DD                 23735 non-null object
LRSNUMBER               23735 non-null object
NonReportableAddress    23735 non-null object
REPORTINGAGENCY         23735 non-null object
REPORTNUMBER    

In [4]:
# Get the descriptive summary
df.describe().transpose()

,count,unique,top,freq
ACCIDENTDATE,23735,23195,2017-05-16T00:00:00.000Z,4
AOTACTUALMILEPOINT,23735,1774,999.99,7922
AOTROUTE,23735,1631,Unknown,2640
Animal,23735,6,None/Other,19934
CITYORTOWN,23735,253,Burlington,3000
DIROFCOLLI,23735,23,Unknown,6069
DayNight,23735,2,Day,17909
HOWMAPPED,23735,4,LRS,15777
INTERSECTIONWITH,23735,10553,Unknown,4210
Impairment,23735,4,,22710


In [5]:
# Show relative frequency tables of variables
AOTR=pd.crosstab(index=df['AOTROUTE'],columns="Percent")/pd.crosstab(index=df['AOTROUTE'],columns="Percent").sum()
AOTR.sort_values(by='Percent',ascending=False).head()

AOTROUTE,Percent
Unknown,0.111228
US-7,0.076301
I-89,0.071203
I-91,0.045755
US-2,0.040573
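
The same kind of relative frequency table can also be produced with `value_counts(normalize=True)`, which avoids building the crosstab twice. This is a sketch on made-up toy data, not a change to the notebook's results.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({"AOTROUTE": ["US-7", "I-89", "US-7", "Unknown", "US-7", "I-89"]})

# value_counts(normalize=True) returns shares sorted from most to least frequent
shares = toy["AOTROUTE"].value_counts(normalize=True)
print(shares.head())
```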


In [6]:
Ani=pd.crosstab(index=df['Animal'],columns="Percent")/pd.crosstab(index=df['Animal'],columns="Percent").sum()
Ani.sort_values(by='Percent',ascending=False)

Animal,Percent
None/Other,0.839857
Unknown,0.133979
Deer,0.021951
Wild,0.001812
Domestic,0.001222
Moose,0.00118


In [7]:
CITY=pd.crosstab(index=df['CITYORTOWN'],columns="Percent")/pd.crosstab(index=df['CITYORTOWN'],columns="Percent").sum()
CITY.sort_values(by='Percent',ascending=False).head()

CITYORTOWN,Percent
Burlington,0.126396
Brattleboro,0.05612
South Burlington,0.053255
Essex,0.046303
Bennington,0.042595


In [8]:
DIRE=pd.crosstab(index=df['DIROFCOLLI'],columns="Percent")/pd.crosstab(index=df['DIROFCOLLI'],columns="Percent").sum()
DIRE.sort_values(by='Percent',ascending=False).head()

DIROFCOLLI,Percent
Unknown,0.255698
Single Vehicle Crash,0.22924
Rear End,0.161955
Other - Explain in Narrative,0.083126
Same Direction Sideswipe,0.067622


In [9]:
INTERW=pd.crosstab(index=df['INTERSECTIONWITH'],columns="Percent")/pd.crosstab(index=df['INTERSECTIONWITH'],columns="Percent").sum()
INTERW.sort_values(by='Percent',ascending=False).head()

INTERSECTIONWITH,Percent
Unknown,0.177375
Parking Lot,0.008848
Main St,0.008047
Main Street,0.003834
Maple St,0.003708


In [10]:
Imp=pd.crosstab(index=df['Impairment'],columns="Percent")/pd.crosstab(index=df['Impairment'],columns="Percent").sum()
Imp.sort_values(by='Percent',ascending=False).head()

Impairment,Percent
,0.956815
Alcohol,0.041711
Drugs,0.000927
Alcohol and Drugs,0.000548


In [11]:
Inv=pd.crosstab(index=df['Involving'],columns="Percent")/pd.crosstab(index=df['Involving'],columns="Percent").sum()
Inv.sort_values(by='Percent',ascending=False).head()

Involving,Percent
,0.921171
Heavy Truck,0.048957
Motorcycle,0.012555
Pedestrian,0.01146
Bicycle,0.005856


In [12]:
RoadChar=pd.crosstab(index=df['RoadCharacteristics'],columns="Percent")/pd.crosstab(index=df['RoadCharacteristics'],columns="Percent").sum()
RoadChar.sort_values(by='Percent',ascending=False)

RoadCharacteristics,Percent
Not at a Junction,0.396967
Unknown,0.230756
T - Intersection,0.106552
Parking Lot,0.099178
Four-way Intersection,0.089362
Driveway,0.022035
Other - Explain in Narrative,0.017653
Y - Intersection,0.012682
Off Ramp,0.007668
Traffic circle / roundabout,0.006404


In [13]:
StAdd=pd.crosstab(index=df['STREETADDRESS'],columns="Percent")/pd.crosstab(index=df['STREETADDRESS'],columns="Percent").sum()
StAdd.sort_values(by='Percent',ascending=False).head()

STREETADDRESS,Percent
Unknown,0.029282
Main St,0.018707
Shelburne Rd,0.007162
I-91,0.005519
Putney Rd,0.004929


In [14]:
# Contingency tables
pd.crosstab(df['InjuryType'],df['Weather'], margins=True)

InjuryType / Weather,Clear,Cloudy,Freezing Precipitation,Rain,Unknown,Wind,All
Fatal,75,24,10,8,28,0,145
Injury,2094,811,457,301,876,10,4549
Property Damage Only,7582,2889,1999,996,5535,40,19041
All,9751,3724,2466,1305,6439,50,23735
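
Raw counts are hard to compare across weather categories of very different sizes. The sketch below converts the InjuryType by Weather counts above into within-weather shares and computes a Pearson chi-square statistic by hand. Treat the statistic as a rough indication only: the Wind column is very small, so the usual chi-square approximation is weak there.

```python
import numpy as np
import pandas as pd

# Counts copied from the InjuryType x Weather table above (margins left out)
observed = pd.DataFrame(
    {
        "Clear": [75, 2094, 7582],
        "Cloudy": [24, 811, 2889],
        "Freezing Precipitation": [10, 457, 1999],
        "Rain": [8, 301, 996],
        "Unknown": [28, 876, 5535],
        "Wind": [0, 10, 40],
    },
    index=["Fatal", "Injury", "Property Damage Only"],
)

# Share of each outcome within each weather category (each column sums to 1)
shares = observed / observed.sum(axis=0)

# Pearson chi-square statistic: sum((observed - expected)^2 / expected)
total = observed.to_numpy().sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total
chi2 = ((observed.to_numpy() - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(shares.round(4))
print(round(chi2, 1), dof)
```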


In [19]:
pd.crosstab(df['DIROFCOLLI'],df['Weather'], margins=True)

DIROFCOLLI / Weather,Clear,Cloudy,Freezing Precipitation,Rain,Unknown,Wind,All
Head On,410,146,183,55,32,5,831
"Left Turn and Thru, Angle Broadside -->v--",432,137,41,48,21,0,679
"Left Turn and Thru, Broadside v<--",205,67,27,25,10,1,335
"Left Turn and Thru, Head On ^v--",70,25,6,6,3,0,110
"Left Turn and Thru, Same Direction Sideswipe/Angle Crash vv--",83,35,9,6,3,0,136
"Left Turns, Opposite Directions, Head On/Angle Crash --^v--",18,6,0,4,0,0,28
"Left Turns, Same Direciton, Rear End v--v--",9,2,1,2,0,0,14
"Left Turns, Same Direction, Rear End v--v--",11,4,2,1,0,0,18
"Left and Right Turns, Simultaneous Turn Crash --vv--",24,7,4,3,1,0,39
"No Turns, Thru moves only, Broadside ^<",722,259,146,86,33,3,1249


In [15]:
pd.crosstab(df['RoadCondition'], df['InjuryType'], margins=True)

RoadCondition / InjuryType,Fatal,Injury,Property Damage Only,All
Debris,0,11,16,27
Non-highway work,0,0,9,9
,108,3311,13901,17320
Not reported,0,11,39,50
Obstruction in roadway,0,29,60,89
Other - Explain in Narrative,4,87,168,259
"Road Surface Condition(wet, icy, snow, slush, etc)",27,971,4311,5309
"Rut, holes, bumps",0,21,46,67
"Shoulders (none, low, soft, high)",0,12,39,51
"Traffic control device inoperative, missing, or obscured",0,2,13,15


In [16]:
pd.crosstab(df['RoadGroup'], df['InjuryType'], margins=True)

RoadGroup / InjuryType,Fatal,Injury,Property Damage Only,All
"City, Village or Urban Compact Street not in FA Urban Area (Class 2 and 3 Non-Federal Aid)",24,614,2059,2697
Federal Aid Secondary System (Class 2 TH),9,216,639,864
Federal Aid Urban System (Class 2 TH's and 3 TH's only),2,353,2098,2453
Minor Collector - Non Fed Aid Rural TH,4,161,420,585
"Other Public Roadway (Rest Areas, Shopping Center - anything open to public)",4,133,2753,2890
Private Property (Driveways),0,4,40,44
Ramp or Spur,0,45,187,232
"State Highway numbered route, Class 1 TH",9,495,2364,2868
"State Highway numbered route, State owned",91,2416,7283,9790
"State Highway numbered route, unknown ownership",0,0,1,1


In [17]:
pd.crosstab( df['SurfaceCondition'], df['Weather'], margins=True)

SurfaceCondition / Weather,Clear,Cloudy,Freezing Precipitation,Rain,Unknown,Wind,All
Dry,8579,2265,19,18,4214,9,15104
Ice,246,182,374,27,238,4,1071
Not Reported,9,2,0,0,20,0,31
Other - Explain in Narrative,28,14,10,2,36,1,91
"Sand, mud, dirt, oil, gravel",56,33,2,14,38,0,143
Slush,65,69,164,11,81,0,390
Snow,265,296,1722,8,716,30,3037
Unknown,49,25,8,4,264,0,350
Water (standing / moving),1,0,3,45,9,0,58
Wet,453,838,164,1176,823,6,3460


In [18]:
# Frequency-encode the categorical variables: each category is replaced by its share of all rows.
# This is a first attempt; please change it if another encoding suits the models better.
categorical_columns = ['AOTROUTE', 'Animal', 'CITYORTOWN', 'DIROFCOLLI', 'DayNight',
                       'HOWMAPPED', 'INTERSECTIONWITH', 'Impairment', 'InjuryType', 'Involving',
                       'LOC_ERROR', 'NonReportableAddress', 'REPORTINGAGENCY', 'REPORTNUMBER',
                       'RoadCharacteristics', 'RoadCondition', 'RoadGroup', 'STREETADDRESS',
                       'SurfaceCondition', 'Weather']

for col in categorical_columns:
    # Relative frequency of each category in this column
    fe = df.groupby(col).size()/len(df)
    df.loc[:, col + '_freq_encode'] = df[col].map(fe)

df.head()

,ACCIDENTDATE,AOTACTUALMILEPOINT,AOTROUTE,Animal,CITYORTOWN,DIROFCOLLI,DayNight,HOWMAPPED,INTERSECTIONWITH,Impairment,...,LOC_ERROR_freq_encode,NonReportableAddress_freq_encode,REPORTINGAGENCY_freq_encode,REPORTNUMBER_freq_encode,RoadCharacteristics_freq_encode,RoadCondition_freq_encode,RoadGroup_freq_encode,STREETADDRESS_freq_encode,SurfaceCondition_freq_encode,Weather_freq_encode
60,2016-12-14T08:47:00.000Z,1.41,US-7,None/Other,South Burlington,Rear End,Day,LRS,Laurel Hill Dr,,...,0.664715,0.866189,0.047946,4.2e-05,0.396967,0.729724,0.412471,8.4e-05,0.63636,0.410828
61,2016-12-13T07:30:00.000Z,70.1,I-89,None/Other,Bolton,Same Direction Sideswipe,Day,LRS,Approximately 11 Miles South Of Exit 11,,...,0.664715,0.866189,0.057089,4.2e-05,0.396967,0.729724,0.412471,0.000463,0.045123,0.103897
62,2016-12-13T01:58:00.000Z,1.48,I-189,None/Other,South Burlington,Other - Explain in Narrative,Night,LRS,Interstate 89 North/Dorset Ramp,,...,0.664715,0.866189,0.047946,4.2e-05,0.007668,0.223678,0.412471,4.2e-05,0.127954,0.103897
63,2016-12-13T07:55:00.000Z,3.13,VT-116,None/Other,South Burlington,Rear End,Day,LRS,Unknown,,...,0.664715,0.866189,0.047946,4.2e-05,0.396967,0.223678,0.412471,4.2e-05,0.016431,0.410828
64,2016-12-05T08:00:00.000Z,0.42,VT-128,None/Other,Essex,Single Vehicle Crash,Day,LRS,MARION AVE,,...,0.664715,0.866189,0.046471,4.2e-05,0.396967,0.223678,0.412471,4.2e-05,0.127954,0.103897
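
Frequency encoding gives two different categories the same value whenever they occur equally often, and a near-unique identifier such as REPORTNUMBER (encoded as 4.2e-05 in every row shown above) carries almost no signal. For columns with only a few categories, one-hot encoding is a common alternative. The sketch below uses made-up toy data, and the choice of columns is an assumption.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "DayNight": ["Day", "Night", "Day"],
    "Weather": ["Clear", "Rain", "Clear"],
})

# One indicator column per category; suited to columns with few distinct values
encoded = pd.get_dummies(toy, columns=["DayNight", "Weather"], dtype=int)
print(encoded.columns.tolist())
```

High-cardinality columns such as STREETADDRESS or INTERSECTIONWITH would produce thousands of indicator columns this way, so frequency encoding or grouping rare values may still be the more practical choice for them.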


### Machine Learning

We will conduct regression and time series analyses to predict future premiums.
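
As one possible starting point for the time series side, crashes can be aggregated into monthly counts and fitted with a simple linear trend. This is a baseline sketch on made-up timestamps that follow the ACCIDENTDATE format; it is not the model the project will necessarily use.

```python
import numpy as np
import pandas as pd

# Toy timestamps, made up for illustration; they follow the ACCIDENTDATE format
toy = pd.DataFrame({"ACCIDENTDATE": [
    "2016-01-05T08:47:00.000Z", "2016-01-20T07:30:00.000Z",
    "2016-02-11T01:58:00.000Z", "2016-03-02T07:55:00.000Z",
    "2016-03-15T08:00:00.000Z", "2016-03-28T17:10:00.000Z",
]})

# Parse the ISO 8601 strings and count crashes per calendar month
dates = pd.to_datetime(toy["ACCIDENTDATE"], utc=True)
monthly = dates.dt.strftime("%Y-%m").value_counts().sort_index()

# Least-squares linear trend over the month index (a baseline, not a forecast model)
x = np.arange(len(monthly))
slope, intercept = np.polyfit(x, monthly.to_numpy(), deg=1)
print(monthly.to_dict(), round(slope, 2))
```

Months with no crashes would be missing from `monthly` rather than counted as zero, so a real series should be reindexed over the full date range before fitting.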

# Business Questions

After understanding where the data comes from and what it includes, we narrowed our business questions down to the following:

1. Which factors cause accidents?
2. Which factors best determine the outcome of an accident?
3. Which roads are most prone to accidents, and how can we improve them?
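
For the third question, a first pass could rank routes by crash count and by the share of crashes that caused an injury or fatality. The sketch below uses made-up toy data; the name `severe_share` is hypothetical, and raw counts are not adjusted for traffic volume, so a busy road will rank high even if it is comparatively safe per vehicle.

```python
import pandas as pd

# Toy data, made up for illustration only
toy = pd.DataFrame({
    "AOTROUTE": ["US-7", "US-7", "I-89", "Unknown", "US-7", "I-89"],
    "InjuryType": ["Injury", "Property Damage Only", "Fatal",
                   "Injury", "Property Damage Only", "Property Damage Only"],
})

# Drop rows whose route was filled with the "Unknown" placeholder during cleaning
known = toy[toy["AOTROUTE"] != "Unknown"]

# Crash count per route plus the share of crashes that were not property-damage-only
summary = known.groupby("AOTROUTE").agg(
    crashes=("InjuryType", "size"),
    severe_share=("InjuryType", lambda s: (s != "Property Damage Only").mean()),
).sort_values("crashes", ascending=False)
print(summary)
```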