# Accident Severity Predictability
### *Using a range of variables to predict the severity of an accident*
##### Author: Elliot Eisenberg
##### *IBM Data Science Capstone*


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>
Many people suffer from an irrational fear of flying. They hear sensationalized stories of planes crashing and extrapolate that flying is one of the most dangerous activities one can undertake. In fact, it is well documented that by far the most dangerous form of travel is by motor vehicle. The severity of a road incident depends on many criteria, which we fed into a decision tree machine learning algorithm. The original dataset was cleaned and preprocessed to make it compatible with decision tree requirements. Through several rounds of data analysis, variables with little bearing on severity were identified and omitted. The resulting decision tree is a very good predictor of severity: R<sup>2</sup> = 0.752, the Jaccard score is 0.737, and the F1 score is 0.716.

The model is only useful if it has an audience. Emergency responders, on receiving a call about an accident, can quickly feed the reported details into the model to predict the severity of the accident they may find at the scene. Additionally, identifying which factors contribute most to accident severity can help legislators decide on appropriate measures (changing a posted speed limit, moving a crosswalk, etc.).

## Data <a name="data"></a>
The data used in this project are the 'example' data provided through the course. They met the requirements for 'good data' and were suitable for the prompt, which was predicting accident severity. Uncleaned, the data comprised 194,673 rows and 38 columns. At first glance, this translates to 37 independent variables. On closer examination, two of the variables can and should be dropped immediately: 'SEVERITYCODE' and 'SEVERITYDESCRIPTION'. They are carbon copies of the dependent variable and must be dropped for this exercise to have any meaning. Of the remaining data used in this analysis, some variables were strings and others were integers. The strings were largely categorical variables, so converting them into numeric form was an exercise in encoding the data properly. The data also contained key values that were unique to each entry. These would not contribute to the accuracy of the model, since a value that occurs exactly once (a 1:1 variable) has no predictive power.



<div>
        
                <script type="text/javascript">window.PlotlyConfig = {MathJaxConfig: 'local'};</script>
        <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>    
            <div id="664f51d6-c1bb-4bf1-bf39-c25e64034d0b" class="plotly-graph-div" style="height:100%; width:100%;"></div>
            <script type="text/javascript">
                
                    window.PLOTLYENV=window.PLOTLYENV || {};
                    
                if (document.getElementById("664f51d6-c1bb-4bf1-bf39-c25e64034d0b")) {
                    Plotly.newPlot(
                        '664f51d6-c1bb-4bf1-bf39-c25e64034d0b',
                        [{"alignmentgroup": "True", "hovertemplate": "Classification of Incident=%{x}<br># of Occurrences=%{y}<extra></extra>", "legendgroup": "", "marker": {"color": "#636efa"}, "name": "", "offsetgroup": "", "orientation": "v", "showlegend": false, "textposition": "auto", "type": "bar", "x": ["Property Damage Only Collision", "Injury Collision"], "xaxis": "x", "y": [124634, 56432], "yaxis": "y"}],
                        {"barmode": "relative", "legend": {"tracegroupgap": 0}, "template": {"data": {"bar": [{"error_x": {"color": "#2a3f5f"}, "error_y": {"color": "#2a3f5f"}, "marker": {"line": {"color": "#E5ECF6", "width": 0.5}}, "type": "bar"}], "barpolar": [{"marker": {"line": {"color": "#E5ECF6", "width": 0.5}}, "type": "barpolar"}], "carpet": [{"aaxis": {"endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f"}, "baxis": {"endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f"}, "type": "carpet"}], "choropleth": [{"colorbar": {"outlinewidth": 0, "ticks": ""}, "type": "choropleth"}], "contour": [{"colorbar": {"outlinewidth": 0, "ticks": ""}, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "type": "contour"}], "contourcarpet": [{"colorbar": {"outlinewidth": 0, "ticks": ""}, "type": "contourcarpet"}], "heatmap": [{"colorbar": {"outlinewidth": 0, "ticks": ""}, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "type": "heatmap"}], "heatmapgl": [{"colorbar": {"outlinewidth": 0, "ticks": ""}, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], 
"type": "heatmapgl"}], "histogram": [{"marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "histogram"}], "histogram2d": [{"colorbar": {"outlinewidth": 0, "ticks": ""}, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "type": "histogram2d"}], "histogram2dcontour": [{"colorbar": {"outlinewidth": 0, "ticks": ""}, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "type": "histogram2dcontour"}], "mesh3d": [{"colorbar": {"outlinewidth": 0, "ticks": ""}, "type": "mesh3d"}], "parcoords": [{"line": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "parcoords"}], "pie": [{"automargin": true, "type": "pie"}], "scatter": [{"marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "scatter"}], "scatter3d": [{"line": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "scatter3d"}], "scattercarpet": [{"marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "scattercarpet"}], "scattergeo": [{"marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "scattergeo"}], "scattergl": [{"marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "scattergl"}], "scattermapbox": [{"marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "scattermapbox"}], "scatterpolar": [{"marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "scatterpolar"}], "scatterpolargl": [{"marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": 
"scatterpolargl"}], "scatterternary": [{"marker": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "type": "scatterternary"}], "surface": [{"colorbar": {"outlinewidth": 0, "ticks": ""}, "colorscale": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "type": "surface"}], "table": [{"cells": {"fill": {"color": "#EBF0F8"}, "line": {"color": "white"}}, "header": {"fill": {"color": "#C8D4E3"}, "line": {"color": "white"}}, "type": "table"}]}, "layout": {"annotationdefaults": {"arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1}, "coloraxis": {"colorbar": {"outlinewidth": 0, "ticks": ""}}, "colorscale": {"diverging": [[0, "#8e0152"], [0.1, "#c51b7d"], [0.2, "#de77ae"], [0.3, "#f1b6da"], [0.4, "#fde0ef"], [0.5, "#f7f7f7"], [0.6, "#e6f5d0"], [0.7, "#b8e186"], [0.8, "#7fbc41"], [0.9, "#4d9221"], [1, "#276419"]], "sequential": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]], "sequentialminus": [[0.0, "#0d0887"], [0.1111111111111111, "#46039f"], [0.2222222222222222, "#7201a8"], [0.3333333333333333, "#9c179e"], [0.4444444444444444, "#bd3786"], [0.5555555555555556, "#d8576b"], [0.6666666666666666, "#ed7953"], [0.7777777777777778, "#fb9f3a"], [0.8888888888888888, "#fdca26"], [1.0, "#f0f921"]]}, "colorway": ["#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52"], "font": {"color": "#2a3f5f"}, "geo": {"bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white"}, 
"hoverlabel": {"align": "left"}, "hovermode": "closest", "mapbox": {"style": "light"}, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": {"angularaxis": {"gridcolor": "white", "linecolor": "white", "ticks": ""}, "bgcolor": "#E5ECF6", "radialaxis": {"gridcolor": "white", "linecolor": "white", "ticks": ""}}, "scene": {"xaxis": {"backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white"}, "yaxis": {"backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white"}, "zaxis": {"backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white"}}, "shapedefaults": {"line": {"color": "#2a3f5f"}}, "ternary": {"aaxis": {"gridcolor": "white", "linecolor": "white", "ticks": ""}, "baxis": {"gridcolor": "white", "linecolor": "white", "ticks": ""}, "bgcolor": "#E5ECF6", "caxis": {"gridcolor": "white", "linecolor": "white", "ticks": ""}}, "title": {"x": 0.05}, "xaxis": {"automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": {"standoff": 15}, "zerolinecolor": "white", "zerolinewidth": 2}, "yaxis": {"automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": {"standoff": 15}, "zerolinecolor": "white", "zerolinewidth": 2}}}, "title": {"text": "Total Count of Each Collision Type"}, "xaxis": {"anchor": "y", "domain": [0.0, 1.0], "title": {"text": "Classification of Incident"}}, "yaxis": {"anchor": "x", "domain": [0.0, 1.0], "title": {"text": "# of Occurrences"}}},
                        {"responsive": true}
                    )
                };
                
            </script>
        </div>

## Methodology <a name="methodology"></a>
We eliminated columns that did not seem to have any bearing on the outcome: latitude/longitude coordinates were deemed irrelevant, as were the primary/secondary keys uniquely assigned to each accident. Any field that was unique to an incident and did not repeat in a meaningful way was dropped from the dataset. A good example is the date, since the date of an incident has very little bearing on its severity. The first 10 rows of the resulting dataframe are below.

|   | SEVERITYCODE | ADDRTYPE     | COLLISIONTYPE | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | JUNCTIONTYPE                               | INATTENTIONIND | UNDERINFL | WEATHER  | ROADCOND | LIGHTCOND                | PEDROWNOTGRNT | SPEEDING | ST\_COLDESC                                                            | CROSSWALKKEY | HITPARKEDCAR | HOUROFDAY |
|:---:|:------------:|:------------:|:-------------:|:-----------:|:--------:|:-----------:|:--------:|:------------------------------------------:|:--------------:|:---------:|:--------:|:--------:|:------------------------:|:-------------:|:--------:|:----------------------------------------------------------------------:|:------------:|:------------:|:---------:|
| 0 | 2            | Intersection | Angles        | 2           | 0        | 0           | 2        | At Intersection \(intersection related\)   | 0              | 0         | Overcast | Wet      | Daylight                 | 0             | 0        | Entering at angle                                                      | 0            | 0            | 14        |
| 1 | 1            | Block        | Sideswipe     | 2           | 0        | 0           | 2        | Mid\-Block \(not related to intersection\) | 0              | 0         | Raining  | Wet      | Dark \- Street Lights On | 0             | 0        | From same direction \- both going straight \- both moving \- sideswipe | 0            | 0            | 18        |
| 2 | 1            | Block        | Parked Car    | 4           | 0        | 0           | 3        | Mid\-Block \(not related to intersection\) | 0              | 0         | Overcast | Dry      | Daylight                 | 0             | 0        | One parked\-\-one moving                                               | 0            | 0            | 10        |
| 3 | 1            | Block        | Other         | 3           | 0        | 0           | 3        | Mid\-Block \(not related to intersection\) | 0              | 0         | Clear    | Dry      | Daylight                 | 0             | 0        | From same direction \- all others                                      | 0            | 0            | 9         |
| 4 | 2            | Intersection | Angles        | 2           | 0        | 0           | 2        | At Intersection \(intersection related\)   | 0              | 0         | Raining  | Wet      | Daylight                 | 0             | 0        | Entering at angle                                                      | 0            | 0            | 8         |
| 5 | 1            | Intersection | Angles        | 2           | 0        | 0           | 2        | At Intersection \(intersection related\)   | 0              | 0         | Clear    | Dry      | Daylight                 | 0             | 0        | Entering at angle                                                      | 0            | 0            | 17        |
| 6 | 1            | Intersection | Angles        | 2           | 0        | 0           | 2        | At Intersection \(intersection related\)   | 0              | 0         | Raining  | Wet      | Daylight                 | 0             | 0        | Entering at angle                                                      | 0            | 0            | 0         |
| 7 | 2            | Intersection | Cycles        | 3           | 0        | 1           | 1        | At Intersection \(intersection related\)   | 0              | 0         | Clear    | Dry      | Daylight                 | 0             | 0        | Vehicle Strikes Pedalcyclist                                           | 0            | 0            | 17        |
| 8 | 1            | Block        | Parked Car    | 2           | 0        | 0           | 2        | Mid\-Block \(not related to intersection\) | 0              | 0         | Clear    | Dry      | Daylight                 | 0             | 0        | One parked\-\-one moving                                               | 0            | 0            | 13        |
| 9 | 2            | Intersection | Angles        | 2           | 0        | 0           | 2        | At Intersection \(intersection related\)   | 0              | 0         | Clear    | Dry      | Daylight                 | 0             | 0        | Entering at angle                                                      | 0            | 0            | 15        |

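The column pruning described before the table can be sketched with pandas. This is a minimal illustration on a toy frame, not the author's code: the identifier, coordinate, and date column names (`INCIDENT_KEY`, `LATITUDE`, `LONGITUDE`, `INCIDENT_DATE`) are hypothetical stand-ins, and the uniqueness check is only one assumed way to spot key-like columns.

```python
import pandas as pd

# Toy stand-in for the raw data. The key, coordinate, and date column names
# are hypothetical; only SEVERITYCODE and WEATHER appear in the report.
raw = pd.DataFrame({
    "INCIDENT_KEY": [1001, 1002, 1003, 1004],
    "LATITUDE": [47.61, 47.62, 47.60, 47.63],
    "LONGITUDE": [-122.33, -122.35, -122.31, -122.30],
    "INCIDENT_DATE": ["2020-01-03", "2020-01-03", "2020-02-11", "2020-03-09"],
    "SEVERITYCODE": [2, 1, 1, 2],
    "WEATHER": ["Overcast", "Raining", "Overcast", "Clear"],
})

# Columns judged irrelevant by inspection (coordinates and the date).
manual_drops = ["LATITUDE", "LONGITUDE", "INCIDENT_DATE"]

# Columns whose value is different on every row behave like keys and carry
# no repeatable signal, so flag them as candidates for removal.
key_like = [col for col in raw.columns if raw[col].nunique() == len(raw)]

to_drop = sorted(set(manual_drops) | set(key_like))
cleaned = raw.drop(columns=to_drop)

print(to_drop)
print(list(cleaned.columns))
```

On real data a continuous measurement can also be unique on every row, so a check like this is better treated as a list of candidates to review than as an automatic rule.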

Next, we identified categorical variables that used strings as their values. Where both a string version and a numeric version of a variable existed, we kept the string version so that we could build a readable reference list of categories for the dataset, which made the later analysis much smoother. With the string versions kept and the numeric versions dropped, we ran a short for loop to assign an integer to each category and replaced the strings with those integers. The first 10 rows of the resulting dataframe are below.

|   | SEVERITYCODE | ADDRTYPE | COLLISIONTYPE | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | JUNCTIONTYPE | INATTENTIONIND | UNDERINFL | WEATHER | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SPEEDING | ST\_COLDESC | CROSSWALKKEY | HITPARKEDCAR | HOUROFDAY |
|:---:|:------------:|:--------:|:-------------:|:-----------:|:--------:|:-----------:|:--------:|:------------:|:--------------:|:---------:|:-------:|:--------:|:---------:|:-------------:|:--------:|:-----------:|:------------:|:------------:|:---------:|
| 0 | 2            | 3        | 1             | 2           | 0        | 0           | 2        | 2            | 0              | 0         | 5       | 9        | 6         | 0             | 0        | 5           | 0            | 0            | 14        |
| 1 | 1            | 2        | 10            | 2           | 0        | 0           | 2        | 5            | 0              | 0         | 7       | 9        | 3         | 0             | 0        | 17          | 0            | 0            | 18        |
| 2 | 1            | 2        | 6             | 4           | 0        | 0           | 3        | 5            | 0              | 0         | 5       | 1        | 6         | 0             | 0        | 28          | 0            | 0            | 10        |
| 3 | 1            | 2        | 5             | 3           | 0        | 0           | 3        | 5            | 0              | 0         | 2       | 1        | 6         | 0             | 0        | 15          | 0            | 0            | 9         |
| 4 | 2            | 3        | 1             | 2           | 0        | 0           | 2        | 2            | 0              | 0         | 7       | 9        | 6         | 0             | 0        | 5           | 0            | 0            | 8         |
| 5 | 1            | 3        | 1             | 2           | 0        | 0           | 2        | 2            | 0              | 0         | 2       | 1        | 6         | 0             | 0        | 5           | 0            | 0            | 17        |
| 6 | 1            | 3        | 1             | 2           | 0        | 0           | 2        | 2            | 0              | 0         | 7       | 9        | 6         | 0             | 0        | 5           | 0            | 0            | 0         |
| 7 | 2            | 3        | 2             | 3           | 0        | 1           | 1        | 2            | 0              | 0         | 2       | 1        | 6         | 0             | 0        | 53          | 0            | 0            | 17        |
| 8 | 1            | 2        | 6             | 2           | 0        | 0           | 2        | 5            | 0              | 0         | 2       | 1        | 6         | 0             | 0        | 28          | 0            | 0            | 13        |
| 9 | 2            | 3        | 1             | 2           | 0        | 0           | 2        | 2            | 0              | 0         | 2       | 1        | 6         | 0             | 0        | 5           | 0            | 0            | 15        |

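The encoding step above can be sketched as a short loop. This is an assumed reconstruction rather than the original code: it numbers each column's categories alphabetically starting from 1, which agrees with the ordering of the codes visible in the tables (for example `Block` = 2 and `Intersection` = 3 in `ADDRTYPE`), but the source does not state the rule that was actually used.

```python
import pandas as pd

# Toy frame with two string-valued categorical columns and one numeric column.
df = pd.DataFrame({
    "ADDRTYPE": ["Intersection", "Block", "Block", "Intersection"],
    "WEATHER": ["Overcast", "Raining", "Clear", "Clear"],
    "PERSONCOUNT": [2, 2, 4, 3],
})

categorical_cols = ["ADDRTYPE", "WEATHER"]

# Keep the label -> integer mapping for every column so that the codes can
# be translated back into readable categories later.
code_books = {}
for col in categorical_cols:
    labels = sorted(df[col].dropna().unique())
    code_books[col] = {label: code for code, label in enumerate(labels, start=1)}
    df[col] = df[col].map(code_books[col])

print(code_books)
print(df)
```

Because this toy frame contains only a subset of the categories, its codes differ from those in the table above; the point is the loop and the retained `code_books` lookup, not the specific integers.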

Finally, some columns in the dataset had many blank values. We assumed that any blank was a null value, meaning that the officer or data entry clerk simply neglected to fill in the information, and treated it as a negative response. The assumption, while potentially slightly flawed, is that the importance of a variable would have prompted whoever entered the data to record an affirmative value, as in the case of a driver under the influence. If the driver was indeed under the influence, we can assume that this would have been recorded; if the driver was not, the field may simply have been left empty. We also converted the provided timestamp into hourly buckets, since the approximate time of day might have an impact. While `LIGHTCOND` already exists as an independent variable, including the hour also accounts for driver and pedestrian tiredness.  
With our data now in integer form, we created a correlation matrix. While some correlation values seem small and otherwise insignificant, we retained the corresponding variables, since they might still improve accuracy rather than detract from it.

|                | SEVERITYCODE            | ADDRTYPE                | COLLISIONTYPE            | PERSONCOUNT             | PEDCOUNT                | PEDCYLCOUNT             | VEHCOUNT                | JUNCTIONTYPE           | INATTENTIONIND          | UNDERINFL               | WEATHER                   | ROADCOND                 | LIGHTCOND                | PEDROWNOTGRNT           | SPEEDING                | ST\_COLDESC             | CROSSWALKKEY              | HITPARKEDCAR             | HOUROFDAY               |
|:--------------:|:-----------------------:|:-----------------------:|:------------------------:|:-----------------------:|:-----------------------:|:-----------------------:|:-----------------------:|:----------------------:|:-----------------------:|:-----------------------:|:-------------------------:|:------------------------:|:------------------------:|:-----------------------:|:-----------------------:|:-----------------------:|:-------------------------:|:------------------------:|:-----------------------:|
| SEVERITYCODE   | 1\.0                    | 0\.191200646766418      | \-0\.12629256825735      | 0\.12379246575110793    | 0\.244163900446319      | 0\.2134774608210875     | \-0\.07984822946977412  | \-0\.19874039647213307 | 0\.040399595956430646   | 0\.039705444885795554   | \-0\.08424842187479553    | \-0\.033417077721198146  | \-0\.036926267020523235  | 0\.2060378176787998     | 0\.033914198023406464   | 0\.09901349726726746    | 0\.17277719449069706      | \-0\.08711986771957039   | 0\.031488738137615435   |
| ADDRTYPE       | 0\.191200646766418      | 1\.0                    | \-0\.4822318157759276    | 0\.05958720354820206    | 0\.14321708344275758    | 0\.08247962443318457    | \-0\.09004302811291137  | \-0\.9191403322156457  | \-0\.08347119862248925  | \-0\.04752690913328248  | \-0\.0699453083941407     | \-0\.018597949472804083  | \-0\.0333250343970173    | 0\.15525794660712133    | \-0\.0650890115038366   | \-0\.16827540916206726  | 0\.17612028011801878      | \-0\.11450095080352318   | 0\.041482077422204174   |
| COLLISIONTYPE  | \-0\.12629256825735     | \-0\.4822318157759276   | 1\.0                     | 0\.015784686253438394   | 0\.09346251331874447    | \-0\.21199573418518328  | 0\.1049747575303049     | 0\.4825098141968341    | 0\.1227792574604948     | 0\.005498178416061656   | 0\.01896291802109589      | \-0\.0066162352187536045 | 0\.02514084102796157     | \-0\.02056410730790035  | \-0\.002296102906710027 | 0\.3612381469742593     | 0\.03351675158115731      | 0\.03265821398902281     | \-0\.005693643078088916 |
| PERSONCOUNT    | 0\.12379246575110793    | 0\.05958720354820206    | 0\.015784686253438394    | 1\.0                    | \-0\.026629137491569872 | \-0\.04253360710592638  | 0\.3997145789604048     | \-0\.06982993669783019 | 0\.07111097372187056    | 0\.018098356006780357   | \-0\.050895353905702274   | \-0\.023666678300028936  | \-0\.027322533123391755  | \-0\.0317311819395436   | \-0\.007834937640564714 | \-0\.06791628942149443  | \-0\.0343626933552312     | \-0\.042440921313604225  | 0\.030907130774080496   |
| PEDCOUNT       | 0\.244163900446319      | 0\.14321708344275758    | 0\.09346251331874447     | \-0\.026629137491569872 | 1\.0                    | \-0\.018562422296751712 | \-0\.3159813080781344   | \-0\.13042377875322045 | \-0\.00824048739543068  | 0\.014795290624500298   | \-0\.004350669234352103   | 0\.009656789590245834    | \-0\.03513492722469803   | 0\.49680112404867877    | \-0\.035002748077391926 | 0\.564381728584372      | 0\.5687356938564399       | \-0\.031187152261803314  | 0\.025824545704866128   |
| PEDCYLCOUNT    | 0\.2134774608210875     | 0\.08247962443318457    | \-0\.21199573418518328   | \-0\.04253360710592638  | \-0\.018562422296751712 | 1\.0                    | \-0\.30628233951525935  | \-0\.08759978259049958 | 0\.0010435298795950453  | \-0\.018474512077028594 | \-0\.05005865862758052    | \-0\.04735656190181774   | 0\.01900101312785524     | 0\.325584760406876      | \-0\.022378367494947177 | 0\.3574008890795979     | 0\.10944367683271501      | \-0\.02737924224457384   | 0\.022931483907633766   |
| VEHCOUNT       | \-0\.07984822946977412  | \-0\.09004302811291137  | 0\.1049747575303049      | 0\.3997145789604048     | \-0\.3159813080781344   | \-0\.30628233951525935  | 1\.0                    | 0\.08832817729915725   | 0\.051240249606404546   | \-0\.011347191500401976 | \-0\.012245693292843581   | \-0\.01782467971379741   | 0\.03469720009345008     | \-0\.27755613718948235  | \-0\.04884531537137844  | \-0\.21636766574297278  | \-0\.2368498779804141     | 0\.07398725232402145     | 0\.010128772752304952   |
| JUNCTIONTYPE   | \-0\.19874039647213307  | \-0\.9191403322156457   | 0\.4825098141968341      | \-0\.06982993669783019  | \-0\.13042377875322045  | \-0\.08759978259049958  | 0\.08832817729915725    | 1\.0                   | 0\.07203452187134804    | 0\.057061095380626925   | 0\.0813280352239832       | 0\.02530218480099429     | 0\.026614521543805367    | \-0\.1536238234695014   | 0\.0671512645965795     | 0\.17263998909751907    | \-0\.16008351671440715    | 0\.13785982208173317     | \-0\.03351397066679152  |
| INATTENTIONIND | 0\.040399595956430646   | \-0\.08347119862248925  | 0\.1227792574604948      | 0\.07111097372187056    | \-0\.00824048739543068  | 0\.0010435298795950453  | 0\.051240249606404546   | 0\.07203452187134804   | 1\.0                    | \-0\.030593043257807404 | \-0\.07454778277965134    | \-0\.05076241391734914   | 0\.011342683156567841    | \-0\.03037958512378161  | \-0\.05407099433131972  | 0\.024330089153978784   | \-0\.004677203135779789   | 0\.019401397341631755    | 0\.026331791131673277   |
| UNDERINFL      | 0\.039705444885795554   | \-0\.04752690913328248  | 0\.005498178416061656    | 0\.018098356006780357   | 0\.014795290624500298   | \-0\.018474512077028594 | \-0\.011347191500401976 | 0\.057061095380626925  | \-0\.030593043257807404 | 1\.0                    | \-0\.03386066762294529    | \-0\.007447157369636397  | \-0\.21866003683225044   | \-0\.01946954900072346  | 0\.09049474290450867    | \-0\.006102104604221878 | \-0\.01066060496094127    | 0\.02289330771878172     | \-0\.0307346802888702   |
| WEATHER        | \-0\.08424842187479553  | \-0\.0699453083941407   | 0\.01896291802109589     | \-0\.050895353905702274 | \-0\.004350669234352103 | \-0\.05005865862758052  | \-0\.012245693292843581 | 0\.0813280352239832    | \-0\.07454778277965134  | \-0\.03386066762294529  | 1\.0                      | 0\.7495293872324161      | 0\.14044759894787245     | \-0\.009420441310260593 | 0\.05184109043738106    | 0\.035161019377579254   | \-0\.00025009717282924827 | 0\.017033812168909437    | \-0\.027889277001772128 |
| ROADCOND       | \-0\.033417077721198146 | \-0\.018597949472804083 | \-0\.0066162352187536045 | \-0\.023666678300028936 | 0\.009656789590245834   | \-0\.04735656190181774  | \-0\.01782467971379741  | 0\.02530218480099429   | \-0\.05076241391734914  | \-0\.007447157369636397 | 0\.7495293872324161       | 1\.0                     | \-0\.017991334921690184  | 0\.0021797645169019334  | 0\.09576704704398536    | \-0\.017557061223713078 | 0\.01171432871998508      | \-0\.0031300181454154818 | \-0\.023917384880256548 |
| LIGHTCOND      | \-0\.036926267020523235 | \-0\.0333250343970173   | 0\.02514084102796157     | \-0\.027322533123391755 | \-0\.03513492722469803  | 0\.01900101312785524    | 0\.03469720009345008    | 0\.026614521543805367  | 0\.011342683156567841   | \-0\.21866003683225044  | 0\.14044759894787245      | \-0\.017991334921690184  | 1\.0                     | \-0\.009389154719871761 | \-0\.09700857834713762  | 0\.044615456365283995   | \-0\.01875448934336294    | \-0\.0012158550575295099 | \-0\.04215475056877722  |
| PEDROWNOTGRNT  | 0\.2060378176787998     | 0\.15525794660712133    | \-0\.02056410730790035   | \-0\.0317311819395436   | 0\.49680112404867877    | 0\.325584760406876      | \-0\.27755613718948235  | \-0\.1536238234695014  | \-0\.03037958512378161  | \-0\.01946954900072346  | \-0\.009420441310260593   | 0\.0021797645169019334   | \-0\.009389154719871761  | 1\.0                    | \-0\.030461616071447276 | 0\.44081535087673757    | 0\.45320630016005997      | \-0\.02778488691775295   | 0\.013348265602893396   |
| SPEEDING       | 0\.033914198023406464   | \-0\.0650890115038366   | \-0\.002296102906710027  | \-0\.007834937640564714 | \-0\.035002748077391926 | \-0\.022378367494947177 | \-0\.04884531537137844  | 0\.0671512645965795    | \-0\.05407099433131972  | 0\.09049474290450867    | 0\.05184109043738106      | 0\.09576704704398536     | \-0\.09700857834713762   | \-0\.030461616071447276 | 1\.0                    | \-0\.08314072979586946  | \-0\.02674632688274288    | \-0\.022223506561497033  | \-0\.03391851628723704  |
| ST\_COLDESC    | 0\.09901349726726746    | \-0\.16827540916206726  | 0\.3612381469742593      | \-0\.06791628942149443  | 0\.564381728584372      | 0\.3574008890795979     | \-0\.21636766574297278  | 0\.17263998909751907   | 0\.024330089153978784   | \-0\.006102104604221878 | 0\.035161019377579254     | \-0\.017557061223713078  | 0\.044615456365283995    | 0\.44081535087673757    | \-0\.08314072979586946  | 1\.0                    | 0\.4043238522786571       | 0\.10587445208081782     | 0\.005502867244037324   |
| CROSSWALKKEY   | 0\.17277719449069706    | 0\.17612028011801878    | 0\.03351675158115731     | \-0\.0343626933552312   | 0\.5687356938564399     | 0\.10944367683271501    | \-0\.2368498779804141   | \-0\.16008351671440715 | \-0\.004677203135779789 | \-0\.01066060496094127  | \-0\.00025009717282924827 | 0\.01171432871998508     | \-0\.01875448934336294   | 0\.45320630016005997    | \-0\.02674632688274288  | 0\.4043238522786571     | 1\.0                      | \-0\.0236443678769799    | 0\.029859614613145843   |
| HITPARKEDCAR   | \-0\.08711986771957039  | \-0\.11450095080352318  | 0\.03265821398902281     | \-0\.042440921313604225 | \-0\.031187152261803314 | \-0\.02737924224457384  | 0\.07398725232402145    | 0\.13785982208173317   | 0\.019401397341631755   | 0\.02289330771878172    | 0\.017033812168909437     | \-0\.0031300181454154818 | \-0\.0012158550575295099 | \-0\.02778488691775295  | \-0\.022223506561497033 | 0\.10587445208081782    | \-0\.0236443678769799     | 1\.0                     | 0\.031115132474982665   |
| HOUROFDAY      | 0\.031488738137615435   | 0\.041482077422204174   | \-0\.005693643078088916  | 0\.030907130774080496   | 0\.025824545704866128   | 0\.022931483907633766   | 0\.010128772752304952   | \-0\.03351397066679152 | 0\.026331791131673277   | \-0\.0307346802888702   | \-0\.027889277001772128   | \-0\.023917384880256548  | \-0\.04215475056877722   | 0\.013348265602893396   | \-0\.03391851628723704  | 0\.005502867244037324   | 0\.029859614613145843     | 0\.031115132474982665    | 1\.0                    |

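The remaining preparation steps (treating blanks as negatives, bucketing the timestamp by hour, and computing the correlation matrix) might look like the sketch below. The timestamp column name `INCIDENT_TIMESTAMP` and the `"Y"` marker for an affirmative entry are assumptions made for illustration; the source does not show the raw values.

```python
import pandas as pd

# Toy rows. INCIDENT_TIMESTAMP is a hypothetical column name, and "Y" is an
# assumed marker for an affirmative entry; None stands for a blank cell.
df = pd.DataFrame({
    "INCIDENT_TIMESTAMP": [
        "2013-03-27 14:54:00",
        "2006-12-20 18:55:00",
        "2004-11-18 10:20:00",
        "2019-04-20 08:05:00",
    ],
    "UNDERINFL": [None, "Y", None, None],
    "SPEEDING": ["Y", None, None, None],
    "SEVERITYCODE": [2, 1, 1, 2],
})

# A blank is read as "not recorded, therefore negative": 1 only for "Y".
for col in ["UNDERINFL", "SPEEDING"]:
    df[col] = df[col].isin(["Y"]).astype(int)

# Bucket the timestamp by hour of day (0-23).
df["HOUROFDAY"] = pd.to_datetime(df["INCIDENT_TIMESTAMP"]).dt.hour

# Pearson correlation matrix over the integer-valued columns.
numeric_cols = ["SEVERITYCODE", "UNDERINFL", "SPEEDING", "HOUROFDAY"]
corr = df[numeric_cols].corr()

print(df[numeric_cols])
print(corr.round(3))
```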

It is important to recognize that the correlation matrix is not meaningful for unordered categorical variables: the integers assigned to their categories are arbitrary labels, so the sign and size of the coefficient depend on how the categories happen to be numbered. Such variables are still useful to the decision tree algorithm, but any correlation one might infer for them from the matrix should be dismissed.
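
To see why the coefficient for an unordered category is arbitrary, consider the toy example below (invented data, not taken from the dataset). The same weather labels are numbered in two different ways, and the Pearson correlation with severity changes even though the underlying records do not.

```python
import pandas as pd

# Invented toy records: severity code and an unordered weather category.
severity = pd.Series([1, 1, 2, 2, 1, 2])
weather = pd.Series(
    ["Clear", "Clear", "Raining", "Raining", "Overcast", "Overcast"]
)

# Two equally valid numberings of the same three categories.
alphabetical = {"Clear": 1, "Overcast": 2, "Raining": 3}
shuffled = {"Overcast": 1, "Clear": 2, "Raining": 3}

r_alphabetical = weather.map(alphabetical).corr(severity)
r_shuffled = weather.map(shuffled).corr(severity)

# The two coefficients differ although the data are identical.
print(round(r_alphabetical, 3), round(r_shuffled, 3))
```

A decision tree is less sensitive to this, because it splits on thresholds of the codes and can isolate a category with successive splits rather than assuming a linear trend across them.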

As already discussed, we use a classification algorithm to achieve our goal. We chose a decision tree because many of the variables are categorical and pose a 'fork in the road' question: based on the criteria, do I go 'left' or 'right' (or take any other relevant branch)? In this case, the tree must settle on '1' or '2' to indicate the severity of the incident from a damages perspective.

We settled on the decision tree model, and it performed well. Using an automated check, we determined that a tree with a maximum of 12 branches was the most robust.
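
An automated check of this kind could be sketched as follows. The sketch assumes that the "12 branch" limit corresponds to scikit-learn's `max_depth` parameter and that candidates were compared by accuracy on a held-out split; neither detail is stated in the source, and the data here are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for the 18 encoded predictors.
X, y = make_classification(
    n_samples=2000, n_features=18, n_informative=6, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one tree per candidate depth and record its validation accuracy.
scores = {}
for depth in range(1, 21):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = accuracy_score(y_val, tree.predict(X_val))

# Keep the depth with the highest validation accuracy (shallowest on ties).
best_depth = max(scores, key=scores.get)
print(best_depth, round(scores[best_depth], 3))
```

The depth selected on synthetic data has no bearing on the value of 12 reported above; the loop only illustrates the search procedure.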

## Results and Discussion <a name="results"></a>
The decision tree model performed very well as a predictor of accident severity.  
The scores used to assess its accuracy are shown in the table below.  

|        | R\_Squared          | Jaccard\_Score      | F1\_Score           |
|:------:|:-------------------:|:-------------------:|:-------------------:|
| Score: | 0\.7518722247995229 | 0\.7366514237744936 | 0\.7156395805949464 |

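For readers unfamiliar with these scores, the sketch below works through the Jaccard and F1 formulas on a handful of invented labels. The source does not state which positive class or averaging was used for the reported figures, so class 2 is treated as the positive class purely for illustration. The last line uses the two class counts from the "Total Count of Each Collision Type" chart to compute the share of the majority class, which is the accuracy a model would reach by always predicting that class.

```python
# Invented labels for illustration; class 2 is treated as the positive class.
y_true = [1, 1, 2, 2, 1, 2, 1, 1, 2, 1]
y_pred = [1, 1, 2, 1, 1, 2, 2, 1, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 2 and p == 2 for t, p in pairs)  # positives correctly flagged
fp = sum(t == 1 and p == 2 for t, p in pairs)  # false alarms
fn = sum(t == 2 and p == 1 for t, p in pairs)  # missed positives

accuracy = sum(t == p for t, p in pairs) / len(pairs)
jaccard = tp / (tp + fp + fn)          # |intersection| / |union|
f1 = 2 * tp / (2 * tp + fp + fn)       # harmonic mean of precision and recall

# Class counts read from the collision-type chart in the Data section.
majority_share = 124634 / (124634 + 56432)

print(accuracy, jaccard, round(f1, 3), round(majority_share, 3))
```

Comparing a reported accuracy with the majority-class share gives a quick sense of how much the model adds beyond a constant guess.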

Due to its relatively high level of accuracy, this model can be exported and used in the field. Emergency responders can gather data over the phone, quickly feed as many relevant parameters as possible into the model, and dispatch an appropriate number of personnel to handle the incident.

## Conclusion <a name="conclusion"></a>

We analyzed data provided to us by IBM/Coursera to determine whether we could predict the severity of an accident. The R<sup>2</sup> is 0.752, the Jaccard score is 0.737, and the F1 score is 0.716. These are relatively high scores, suggesting that it is possible to meaningfully predict the severity of an incident from the given criteria.  

As for further analysis, additional work can be done with similar datasets. Giving responders the ability to predict the likelihood of an injury in a given accident allows them to staff emergency vehicles adequately.  

The dataset was incomplete: further investigation revealed that 'no damage incidents' and 'fatalities' were excluded. These categories are key to drawing meaningful conclusions, since part of the value of this process is to let emergency personnel prepare for the severity of accidents. As a quick example, if a time of day has high predictability for 'no damage incidents', emergency responders can either reduce the number of staff on duty or place rookie personnel on that shift. If, on the other hand, a time of day has high predictability for 'fatalities', the number of personnel can be made commensurate with the risk associated with that time of day.  

This analysis is certainly valuable for emergency responders and can be deployed fairly easily in real-life scenarios.