# Capstone Project - Car accident severity (Week 1)

### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)
* [References](#notes)

## Introduction: Business Problem <a name="introduction"></a>

According to the WHO ([1](#notes)), around 1.35 million people die every year in road traffic crashes, and between 20 and 50 million more suffer non-fatal injuries, many of which result in a disability.

At the national level, road traffic crashes also cost most countries around 3% of their gross domestic product ([2](#notes)).

It would therefore be valuable to warn drivers about the likelihood of getting into a car accident, and how severe that accident might be, given the current weather and road conditions. Drivers could then drive more carefully or, where possible, change their travel plans.

Most of all, this could actually save lives.

This is what we attempt in this project. Using the collision data provided by the City of Seattle, we take the specific road and weather conditions into account and try to predict the likelihood and severity of an accident.

## Data <a name="data"></a>

### The dataset

The dataset from the City of Seattle contains all collisions (194,673 in total) provided by the SPD (Seattle Police Department) and recorded by Traffic Records. It includes all types of collisions from 2004 to the present, involving cars, bikes and pedestrians.

Using the description of the data provided by the City of Seattle ([3](#notes)), we selected what are, in our opinion, the most suitable independent variables and the dependent variable.

### Dependent variable

The goal of the project is to predict the likelihood and severity of an accident. We therefore use SEVERITYCODE (the severity of the accident) as the dependent variable. SEVERITYCODE is a categorical variable whose codes correspond to the severity of the collision:
* 3 — fatality
* 2b — serious injury
* 2 — injury
* 1 — property damage
* 0 — unknown

However, only two severity codes occur in our dataset: 2 (injury) and 1 (property damage).
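Which codes actually occur can be checked by counting the values of SEVERITYCODE. The following is a minimal sketch, assuming a small hypothetical `sample` frame in place of the full dataset; the `SEVERITY_LABELS` mapping is an illustrative helper built from the code list above.

```python
import pandas as pd

# Labels taken from the severity code list above.
SEVERITY_LABELS = {
    "3": "fatality",
    "2b": "serious injury",
    "2": "injury",
    "1": "property damage",
    "0": "unknown",
}

# Hypothetical sample; the notebook itself would use the full collisions frame.
sample = pd.DataFrame({"SEVERITYCODE": [1, 2, 1, 1, 2, 1]})

# Count each code, then attach its human-readable label.
counts = sample["SEVERITYCODE"].value_counts().sort_index()
summary = pd.DataFrame(
    {
        "label": [SEVERITY_LABELS[str(code)] for code in counts.index],
        "count": counts.to_numpy(),
    },
    index=counts.index,
)
print(summary)
```

Applied to the real data, the same call would list only codes 1 and 2 if the statement above holds.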

### Independent variables

Out of the 37 attributes available in the Seattle collision dataset, we consider 8 as independent variables.

| VARIABLE | DESCRIPTION |
| :-: | :-: |
| LOCATION | Description of the general location of the collision |
| PERSONCOUNT | The total number of people involved in the collision |
| VEHCOUNT | The number of vehicles involved in the collision. This is entered by the state |
| JUNCTIONTYPE | Category of junction at which the collision took place |
| WEATHER | A description of the weather conditions at the time of the collision |
| ROADCOND | The condition of the road during the collision |
| LIGHTCOND | The light conditions during the collision |
| SPEEDING | Whether or not speeding was a factor in the collision (Y/N) |
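Restricting the data to the target and the eight variables in the table could look like the sketch below. The helper `select_model_columns` and the one-row example frame are hypothetical and serve only as an illustration; the column names are the ones listed above.

```python
import pandas as pd

# Column names as listed in the table above, plus the target.
FEATURES = [
    "LOCATION", "PERSONCOUNT", "VEHCOUNT", "JUNCTIONTYPE",
    "WEATHER", "ROADCOND", "LIGHTCOND", "SPEEDING",
]
TARGET = "SEVERITYCODE"


def select_model_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy containing only the target and the chosen features."""
    wanted = [TARGET] + FEATURES
    missing = [col for col in wanted if col not in frame.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return frame[wanted].copy()


# Hypothetical one-row frame with an extra column that gets dropped.
example = pd.DataFrame({
    "OBJECTID": [1],
    "SEVERITYCODE": [2],
    "LOCATION": ["EXAMPLE ST AND SAMPLE AVE"],
    "PERSONCOUNT": [2],
    "VEHCOUNT": [2],
    "JUNCTIONTYPE": ["At Intersection"],
    "WEATHER": ["Raining"],
    "ROADCOND": ["Wet"],
    "LIGHTCOND": ["Daylight"],
    "SPEEDING": [None],
})
subset = select_model_columns(example)
print(subset.columns.tolist())
```

Raising on missing columns is a design choice here: it surfaces a renamed or misspelled attribute early instead of silently training on fewer variables.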


Points to keep in mind for the modelling stage:
* Type of machine learning used to predict accident severity: supervised.
* The Seattle collision data has 37 attributes.
* The total number of empty inputs in ROADCOND has to be checked.
* The labelled dataset is unbalanced.
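Two of these points, the empty ROADCOND inputs and the unbalanced labels, can be examined with a few lines of pandas. The sketch below uses a small hypothetical `sample` frame, and downsampling the majority class is only one possible way to balance the labels, shown here as an assumption rather than as the method chosen for this project.

```python
import pandas as pd

# Hypothetical sample; None marks an empty ROADCOND input.
sample = pd.DataFrame({
    "SEVERITYCODE": [1, 1, 1, 1, 2, 2],
    "ROADCOND": ["Wet", None, "Dry", "Dry", "Wet", None],
})

# Number of empty inputs in ROADCOND.
n_missing = sample["ROADCOND"].isna().sum()

# Size of each class and of the smallest class.
class_sizes = sample["SEVERITYCODE"].value_counts()
smallest = class_sizes.min()

# Downsample every class to the size of the smallest one.
balanced = pd.concat(
    [
        group.sample(n=smallest, random_state=0)
        for _, group in sample.groupby("SEVERITYCODE")
    ]
)

print("missing ROADCOND:", n_missing)
print(balanced["SEVERITYCODE"].value_counts())
```

Fixing `random_state` keeps the downsampled subset reproducible between notebook runs.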

## Methodology <a name="methodology"></a>

In [3]:
import itertools

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.ticker import NullFormatter
from sklearn import preprocessing

%matplotlib inline

In [6]:
!wget -O /Users/carlopeano/Desktop/projects/Coursera_Capstone/Data_Collisions.csv https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv 

In [10]:
df = pd.read_csv('/Users/carlopeano/Desktop/projects/Coursera_Capstone/Data_Collisions.csv')
df.head()


| | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | REPORTNO | STATUS | ADDRTYPE | INTKEY | ... | ROADCOND | LIGHTCOND | PEDROWNOTGRNT | SDOTCOLNUM | SPEEDING | ST_COLCODE | ST_COLDESC | SEGLANEKEY | CROSSWALKKEY | HITPARKEDCAR |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | 2 | -122.323148 | 47.70314 | 1 | 1307 | 1307 | 3502005 | Matched | Intersection | 37475.0 | ... | Wet | Daylight | | | | 10 | Entering at angle | 0 | 0 | N |
| 1 | 1 | -122.347294 | 47.647172 | 2 | 52200 | 52200 | 2607959 | Matched | Block | | ... | Wet | Dark - Street Lights On | | 6354039.0 | | 11 | From same direction - both going straight - bo... | 0 | 0 | N |
| 2 | 1 | -122.33454 | 47.607871 | 3 | 26700 | 26700 | 1482393 | Matched | Block | | ... | Dry | Daylight | | 4323031.0 | | 32 | One parked--one moving | 0 | 0 | N |
| 3 | 1 | -122.334803 | 47.604803 | 4 | 1144 | 1144 | 3503937 | Matched | Block | | ... | Dry | Daylight | | | | 23 | From same direction - all others | 0 | 0 | N |
| 4 | 2 | -122.306426 | 47.545739 | 5 | 17700 | 17700 | 1807429 | Matched | Intersection | 34387.0 | ... | Wet | Daylight | | 4028032.0 | | 10 | Entering at angle | 0 | 0 | N |
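On a large CSV whose columns mix types, `pd.read_csv` can emit a `DtypeWarning`. One way to reduce that risk is to load only the columns that are needed and to state the dtype of sparse text columns explicitly. The sketch below assumes a hypothetical miniature CSV held in memory instead of the downloaded file, and the choice of columns is purely illustrative.

```python
import io

import pandas as pd

# Hypothetical miniature CSV standing in for the downloaded file.
csv_text = (
    "SEVERITYCODE,ROADCOND,LIGHTCOND,SPEEDING,ST_COLCODE\n"
    "2,Wet,Daylight,,10\n"
    "1,Wet,Dark - Street Lights On,,11\n"
    "1,Dry,Daylight,Y,32\n"
)

df_small = pd.read_csv(
    io.StringIO(csv_text),
    # Load only the columns of interest.
    usecols=["SEVERITYCODE", "ROADCOND", "LIGHTCOND", "SPEEDING"],
    # Read the sparse Y/N flag as text; empty cells become NaN.
    dtype={"SPEEDING": str},
    # Infer dtypes from the whole file rather than chunk by chunk.
    low_memory=False,
)

print(df_small.dtypes)
print(df_small["SPEEDING"].isna().sum(), "rows without a SPEEDING flag")
```

With the real file, the `io.StringIO` object would simply be replaced by the path passed to `pd.read_csv` in the cell above.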


In [13]:
df.describe()

| | SEVERITYCODE | X | Y | OBJECTID | INCKEY | COLDETKEY | INTKEY | SEVERITYCODE.1 | PERSONCOUNT | PEDCOUNT | PEDCYLCOUNT | VEHCOUNT | SDOT_COLCODE | SDOTCOLNUM | SEGLANEKEY | CROSSWALKKEY |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| count | 194673.0 | 189339.0 | 189339.0 | 194673.0 | 194673.0 | 194673.0 | 65070.0 | 194673.0 | 194673.0 | 194673.0 | 194673.0 | 194673.0 | 194673.0 | 114936.0 | 194673.0 | 194673.0 |
| mean | 1.298901 | -122.330518 | 47.619543 | 108479.36493 | 141091.45635 | 141298.811381 | 37558.450576 | 1.298901 | 2.444427 | 0.037139 | 0.028391 | 1.92078 | 13.867768 | 7972521.0 | 269.401114 | 9782.452 |
| std | 0.457778 | 0.029976 | 0.056157 | 62649.722558 | 86634.402737 | 86986.54211 | 51745.990273 | 0.457778 | 1.345929 | 0.19815 | 0.167413 | 0.631047 | 6.868755 | 2553533.0 | 3315.776055 | 72269.26 |
| min | 1.0 | -122.419091 | 47.495573 | 1.0 | 1001.0 | 1001.0 | 23807.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1007024.0 | 0.0 | 0.0 |
| 25% | 1.0 | -122.348673 | 47.575956 | 54267.0 | 70383.0 | 70383.0 | 28667.0 | 1.0 | 2.0 | 0.0 | 0.0 | 2.0 | 11.0 | 6040015.0 | 0.0 | 0.0 |
| 50% | 1.0 | -122.330224 | 47.615369 | 106912.0 | 123363.0 | 123363.0 | 29973.0 | 1.0 | 2.0 | 0.0 | 0.0 | 2.0 | 13.0 | 8023022.0 | 0.0 | 0.0 |
| 75% | 2.0 | -122.311937 | 47.663664 | 162272.0 | 203319.0 | 203459.0 | 33973.0 | 2.0 | 3.0 | 0.0 | 0.0 | 2.0 | 14.0 | 10155010.0 | 0.0 | 0.0 |
| max | 2.0 | -122.238949 | 47.734142 | 219547.0 | 331454.0 | 332954.0 | 757580.0 | 2.0 | 81.0 | 6.0 | 2.0 | 12.0 | 69.0 | 13072020.0 | 525241.0 | 5239700.0 |
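Because SEVERITYCODE only takes the values 1 and 2 here, its mean already reveals how unbalanced the labels are. The short worked calculation below copies the row count and the mean from the summary above; since the mean is rounded to six decimals, the resulting counts are approximations.

```python
# Values copied from the describe() output above.
n_rows = 194673
mean_severity = 1.298901

# With only codes 1 and 2: mean = 1 * (1 - p) + 2 * p, so p = mean - 1,
# where p is the share of code 2 (injury).
share_injury = mean_severity - 1

# Approximate class sizes implied by that share.
n_injury = round(share_injury * n_rows)
n_property = n_rows - n_injury

print(f"injury share: {share_injury:.1%}")
print(f"approx. counts: injury={n_injury}, property damage={n_property}")
```

The split of roughly 30% injury to 70% property damage is what makes the labelled dataset unbalanced.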


## Analysis <a name="analysis"></a>

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>

## References <a name="notes"></a>

1. "Road traffic injuries", World Health Organization (WHO), 07/02/2020, https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries
2. Ibidem
3. "ArcGIS Metadata Form", https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Metadata.pdf