## Severity in the Community | Business Problem | Introduction

### Introduction:

The purpose of this project is to predict the severity of a car accident from the conditions recorded in a collision dataset. It is intended for people who want to improve awareness of car accidents in their area using data specific to their region. The dataset will be used to train a model to better predict accident severity given weather conditions, road conditions, and other factors associated with accidents.

This project aims to create a comparative analysis of accidents. The features relate accident severity to particular descriptors of each wreck, such as area, road connectivity, weather conditions, and type of collision.

The project aims to help people gain awareness of a destination area before traveling or to gain awareness of their current areas. It also aims to help existing transportation commissions improve awareness of their operating areas and help implementation of future accident precautions.

### Business Problem to Solve:

The major purpose of this project is to analyze existing car accident reports to perform the following:
1. Analyze current weather and road conditions and compare them against the existing data for the area.
2. Warn the user of the likely car accident severity based on those conditions and display areas to potentially avoid.
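
As a rough illustration of the two steps above, the lookup could be sketched as a filter over historical records followed by taking the most frequent severity code. This is only a sketch under stated assumptions: the `likely_severity` helper and the rows in `history` are hypothetical, and `ROADCOND` and `LIGHTCOND` are used simply because they are visible in the data preview.

```python
import pandas as pd

# Hypothetical miniature of the collision data. Column names follow the
# dataset preview; the rows themselves are invented for illustration.
history = pd.DataFrame({
    "ROADCOND": ["Wet", "Wet", "Wet", "Dry", "Dry", "Dry"],
    "LIGHTCOND": ["Daylight"] * 6,
    "SEVERITYCODE": [2, 2, 1, 1, 1, 2],
})


def likely_severity(history, road_cond, light_cond):
    """Return the most frequent severity code seen under the given
    conditions, or None when no historical record matches."""
    match = history[
        (history["ROADCOND"] == road_cond)
        & (history["LIGHTCOND"] == light_cond)
    ]
    if match.empty:
        return None
    return int(match["SEVERITYCODE"].mode().iloc[0])


wet_severity = likely_severity(history, "Wet", "Daylight")
dry_severity = likely_severity(history, "Dry", "Daylight")
unseen = likely_severity(history, "Ice", "Dark")
```

A frequency lookup like this is only a baseline; the clustering approach described later in this document is the intended method.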

## Data:

### Data Description:

This project will use Jupyter Notebooks to analyze a dataset containing a rating of accident severity, street location, collision address type, weather condition, road condition, vehicle count, injuries, fatalities, and whether the driver at fault was under the influence.

The dataset we will use in this project is the shared data originally provided by the Seattle Department of Transportation (SDOT) Traffic Management Division, Traffic Records Group, and modified to meet the project criteria. This is the head of the data:

In [22]:
import os
import pandas as pd

In [21]:
# low_memory=False parses the file in one pass so mixed-type columns get a consistent dtype
df = pd.read_csv('Data.csv', low_memory=False)
df.head()

| Unnamed: 0 | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.0 | -122.323148 | 47.70314 | 1.0 | 1307.0 | 1307.0 | 3502005 | Matched | Intersection | 37475.0 | ... | Wet | Daylight | | | | 10 | Entering at angle | 0.0 | 0.0 | N |
| 1 | 1.0 | -122.347294 | 47.647172 | 2.0 | 52200.0 | 52200.0 | 2607959 | Matched | Block | | ... | Wet | Dark - Street Lights On | | 6354039.0 | | 11 | From same direction - both going straight - bo... | 0.0 | 0.0 | N |
| 2 | 1.0 | -122.33454 | 47.607871 | 3.0 | 26700.0 | 26700.0 | 1482393 | Matched | Block | | ... | Dry | Daylight | | 4323031.0 | | 32 | One parked--one moving | 0.0 | 0.0 | N |
| 3 | 1.0 | -122.334803 | 47.604803 | 4.0 | 1144.0 | 1144.0 | 3503937 | Matched | Block | | ... | Dry | Daylight | | | | 23 | From same direction - all others | 0.0 | 0.0 | N |
| 4 | 2.0 | -122.306426 | 47.545739 | 5.0 | 17700.0 | 17700.0 | 1807429 | Matched | Intersection | 34387.0 | ... | Wet | Daylight | | 4028032.0 | | 10 | Entering at angle | 0.0 | 0.0 | N |


The original dataset has 194,673 observations (rows) and 38 attributes (columns). The metadata for the dataset can be found [here](https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Metadata.pdf). To better suit the project, certain attributes and observations will be omitted to balance the severity classes. Balancing the data helps keep the model from being biased toward the more frequent class.
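
One common way to balance classes is to down-sample every class to the size of the smallest one. The sketch below shows that idea on invented toy rows; it is an assumption about how the balancing might be done, not the project's confirmed procedure, and `df_toy` and `balanced` are illustrative names.

```python
import pandas as pd

# Invented, imbalanced toy data: six rows of one class, two of the other.
df_toy = pd.DataFrame({
    "SEVERITYCODE": [1, 1, 1, 1, 1, 1, 2, 2],
    "ROADCOND": ["Dry", "Wet", "Dry", "Dry", "Wet", "Dry", "Wet", "Dry"],
})

# Size of the smallest class; every class is down-sampled to this size.
minority_size = df_toy["SEVERITYCODE"].value_counts().min()

# Sample the same number of rows from each severity class.
balanced = pd.concat(
    [
        group.sample(n=minority_size, random_state=42)
        for _, group in df_toy.groupby("SEVERITYCODE")
    ]
).reset_index(drop=True)

class_counts = balanced["SEVERITYCODE"].value_counts().to_dict()
```

Down-sampling discards rows from the larger class, so it trades data volume for balance; over-sampling the smaller class is an alternative.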

The columns that remain after this omission are the attributes that will be used to train the predictive model, which will try to predict likely accident areas and likely severity. Feature engineering can then be used to improve the model's predictive performance. We will explain this in the Methodology section.
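
As one example of feature engineering, categorical columns such as `ADDRTYPE` and `ROADCOND` can be one-hot encoded so that a numeric algorithm can consume them. The sketch below uses three rows taken from the preview above; the choice of columns is an assumption made for illustration, and the actual feature set is left to the Methodology section.

```python
import pandas as pd

# Three rows adapted from the data preview (only the columns needed here).
sample = pd.DataFrame({
    "SEVERITYCODE": [2, 1, 1],
    "ADDRTYPE": ["Intersection", "Block", "Block"],
    "ROADCOND": ["Wet", "Wet", "Dry"],
})

# Keep the target apart from the predictors.
target = sample["SEVERITYCODE"]

# One-hot encode the categorical predictors: one indicator column per category.
features = pd.get_dummies(sample[["ADDRTYPE", "ROADCOND"]])

feature_names = sorted(features.columns)
```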

For a better understanding of severity and how we will begin to build the model, let's look at the table below. Severity is coded 0-3 and is the first (and most important) category that we will use to cluster the data. We will also use location, weather, and time of accident to build a more precise model.

In [33]:
from prettytable import PrettyTable
import numpy as np
# Build a summary table of collision counts per severity code
t = PrettyTable()
t.field_names = ["Severity Code","Description","Count"]
t.add_row([3,"Fatality",0])
t.add_row([2,"Injury",136485])
t.add_row([1,"Property Damage",58188])
t.add_row([0,"Unknown",0])

print(t)

+---------------+-----------------+--------+
| Severity Code |   Description   | Count  |
+---------------+-----------------+--------+
|       3       |     Fatality    |   0    |
|       2       |      Injury     | 136485 |
|       1       | Property Damage | 58188  |
|       0       |     Unknown     |   0    |
+---------------+-----------------+--------+
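
The counts in a table like this can also be derived from the `SEVERITYCODE` column instead of being typed by hand. The sketch below does so for the five severity codes visible in the data preview, so its counts describe only those five rows, not the full dataset; the `labels` mapping repeats the descriptions from the table above.

```python
import pandas as pd

# Severity descriptions as listed in the table above.
labels = {3: "Fatality", 2: "Injury", 1: "Property Damage", 0: "Unknown"}

# The five SEVERITYCODE values shown in the data preview.
codes = pd.Series([2.0, 1.0, 1.0, 1.0, 2.0], name="SEVERITYCODE")

# Count each code and include codes that never occur with a count of 0.
counts = codes.astype(int).value_counts().reindex(list(labels), fill_value=0)

summary = counts.rename("Count").to_frame()
summary.insert(0, "Description", [labels[code] for code in summary.index])

print(summary)
```

With the full dataframe, `codes` would simply be replaced by `df["SEVERITYCODE"]`.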


### Foursquare API:

This project will incorporate Foursquare's API, which provides an in-depth database of locations, to analyze locations, pinpoint accident hot spots at certain times and/or under certain weather conditions, and cluster them.

### Data Approach:

To help analyze the data, a clustering approach will be used to group related records, both to better identify which conditions raise the likelihood of a car accident and to highlight areas to avoid. To do that, the data will be clustered with k-means, an unsupervised machine learning algorithm.
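
A minimal sketch of k-means with scikit-learn is shown below, clustering accident coordinates into spatial groups. The coordinates are invented points near values in the data preview, and the choices of `n_clusters=2` and of clustering on longitude/latitude alone are assumptions for illustration; the real feature set and number of clusters would have to be chosen from the data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented coordinates (X = longitude, Y = latitude) forming two loose groups.
coords = np.array([
    [-122.3231, 47.7031],
    [-122.3240, 47.7040],
    [-122.3225, 47.7025],
    [-122.3064, 47.5457],
    [-122.3070, 47.5460],
    [-122.3058, 47.5450],
])

# n_clusters=2 is an arbitrary choice for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# Assign each point to its nearest cluster centre.
cluster_labels = kmeans.fit_predict(coords)
centers = kmeans.cluster_centers_
```

Each row of `centers` is the mean position of one cluster, which could serve as a candidate hot spot to display on a map.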

### Libraries to be Used:

_Pandas_: For creating and manipulating dataframes

_Folium_: Python visualization library 

_scikit-learn_: For k-means clustering

_JSON_: Library to handle JSON files

_Matplotlib_: Python Plotting Module