# Seattle - Car Collisions 

## Business Problem and Background

Seattle is a seaport city on the West Coast of the United States and the largest city in both the state of Washington and the Pacific Northwest region of North America.
The Seattle metropolitan area's population stands at 3.98 million, making it the 15th-largest in the United States.
Seattle is located about 100 miles (160 km) south of the Canadian border. 

A major gateway for trade with Asia, Seattle is the fourth-largest port in North America in terms of container handling as of 2015.
The purpose of this case study is to understand how different factors affect vehicle accidents in Seattle over the period covered by the dataset.
Weather, light conditions, and road conditions are factors that can increase the incidence of car accidents.
Finding patterns in vehicle collisions can help prevent accidents by informing warnings to drivers, transport authorities, and police.

## Data Description and Methodology

The vehicle collision dataset (Data-Collisions.csv) was provided by the Seattle Police Department (SPD). It includes all types of collisions. Each collision is located at an intersection or at the mid-block of a street segment.

Data-Collisions consists of 194,673 rows and 38 columns: 1 dependent variable (SEVERITYCODE) and 37 independent variables.
The severity codes are:

- 1: Property Damage Only Collision
- 2: Injury Collision

Seattle's longitude is -122.335167 and its latitude is 47.608013. In the dataset, column X holds the longitude of each collision and column Y holds the latitude.
Python will be used to analyze and visualize the data.

#### Columns to be used in the analysis:
- SEVERITYCODE (194673 non-null): Code that corresponds to the severity of the collision
- X (189339 non-null): Longitude
- Y (189339 non-null): Latitude
- STATUS (194673 non-null)
- ADDRTYPE (192747 non-null): Block, Intersection
- LOCATION (191996 non-null): Address
- SEVERITYDESC (194673 non-null): Property Damage Only Collision, Injury Collision
- COLLISIONTYPE (189769 non-null): Sideswipe, Parked Car, Other, Head On, Rear Ended, etc.
- PERSONCOUNT (194673 non-null): Total number of people involved in the collision
- PEDCOUNT (194673 non-null): Number of pedestrians involved in the collision
- PEDCYLCOUNT (194673 non-null): Number of bicycles involved in the collision
- VEHCOUNT (194673 non-null): Number of vehicles involved in the collision
- INCDATE (194673 non-null): Date of collision
- INCDTTM (194673 non-null): Date and time of collision
- JUNCTIONTYPE (188344 non-null): Category of junction at which the collision took place
- SDOT_COLCODE (194673 non-null): Code given to the collision
- SDOT_COLDESC (194673 non-null): Description of the collision
- INATTENTIONIND (29805 non-null): Whether or not the collision was due to inattention (Y/N)
- UNDERINFL (189789 non-null): Whether or not a driver involved was under the influence of drugs or alcohol
- WEATHER (189592 non-null): Description of the weather conditions at the time of the collision
- ROADCOND (189661 non-null): Condition of the road during the collision
- LIGHTCOND (189503 non-null): Light conditions during the collision
- SDOTCOLNUM (114936 non-null): Number given to the collision
- SPEEDING (9333 non-null): Whether or not speeding was a factor in the collision (Y/N)
- ST_COLCODE (194655 non-null): Code provided by the state that describes the collision
- ST_COLDESC (189769 non-null): Description that corresponds to the state's coding designation
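
As a rough illustration of how the column list above could be applied, the sketch below keeps only those columns and reports the share of missing values in each. The helper names `select_analysis_columns` and `missing_share` are hypothetical, and the toy frame holds made-up values purely to show the calls; it is not taken from Data-Collisions.csv.

```python
import pandas as pd

# Column names copied from the list above.
ANALYSIS_COLUMNS = [
    "SEVERITYCODE", "X", "Y", "STATUS", "ADDRTYPE", "LOCATION",
    "SEVERITYDESC", "COLLISIONTYPE", "PERSONCOUNT", "PEDCOUNT",
    "PEDCYLCOUNT", "VEHCOUNT", "INCDATE", "INCDTTM", "JUNCTIONTYPE",
    "SDOT_COLCODE", "SDOT_COLDESC", "INATTENTIONIND", "UNDERINFL",
    "WEATHER", "ROADCOND", "LIGHTCOND", "SDOTCOLNUM", "SPEEDING",
    "ST_COLCODE", "ST_COLDESC",
]


def select_analysis_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the analysis columns that are present in df (hypothetical helper)."""
    present = [col for col in ANALYSIS_COLUMNS if col in df.columns]
    return df[present].copy()


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first (hypothetical helper)."""
    return df.isna().mean().sort_values(ascending=False)


# Toy frame with made-up values, only to demonstrate the helpers.
toy = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 1],
    "SPEEDING": ["Y", None, None, None],
    "WEATHER": ["Clear", "Raining", None, "Overcast"],
    "OBJECTID": [1, 2, 3, 4],  # not in the analysis list, so it is dropped
})
subset = select_analysis_columns(toy)
print(missing_share(subset))
```

On the real data, a report like this would make the sparse columns stand out, for example SPEEDING (9333 non-null) and INATTENTIONIND (29805 non-null). Whether a missing value in those columns means "no" is an assumption that should be checked against the dataset's documentation before filling it in.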


Steps:

1. Clean the data: look for duplicates, then split the file into two sets, one with geo-coordinates and one without.
2. Explore the data using visualizations to see trends in accidents by location and year.
3. Determine the best model to predict future accidents, taking into account variables such as weather, road conditions, and light conditions.
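
Step 1 could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual cleaning code: `split_by_coordinates` is a hypothetical helper, duplicates are assumed to mean fully identical rows, and the toy frame contains made-up values.

```python
import pandas as pd


def split_by_coordinates(df: pd.DataFrame):
    """Drop exact duplicate rows, then split on whether both X and Y are present.

    Hypothetical helper: returns (rows_with_coordinates, rows_without_coordinates).
    """
    deduped = df.drop_duplicates()
    has_coords = deduped["X"].notna() & deduped["Y"].notna()
    return deduped[has_coords].copy(), deduped[~has_coords].copy()


# Toy frame with made-up values: the last row repeats the second one,
# and the third row has no coordinates.
toy = pd.DataFrame({
    "SEVERITYCODE": [2, 1, 1, 1],
    "X": [-122.32, -122.35, None, -122.35],
    "Y": [47.70, 47.65, None, 47.65],
})

print("duplicate rows:", int(toy.duplicated().sum()))
with_geo, without_geo = split_by_coordinates(toy)
print("with coordinates:", len(with_geo), "without:", len(without_geo))
```

If duplicates should instead be judged on an identifier column, `drop_duplicates` accepts a `subset` argument; which column serves as a reliable key in this dataset is left as an open question here.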

In [5]:
# Import data
import pandas as pd
df_collision = pd.read_csv("Data-Collisions.csv", low_memory=False)
df_collision.head(3)

Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,2,-122.323148,47.70314,1,1307,1307,3502005,Matched,Intersection,37475.0,...,Wet,Daylight,,,,10,Entering at angle,0,0,N
1,1,-122.347294,47.647172,2,52200,52200,2607959,Matched,Block,,...,Wet,Dark - Street Lights On,,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,1,-122.33454,47.607871,3,26700,26700,1482393,Matched,Block,,...,Dry,Daylight,,4323031.0,,32,One parked--one moving,0,0,N


In [6]:
df_collision.info()
# The frame has 38 columns.
# SEVERITYCODE has 194673 non-null values, but X and Y have only 189339,
# so the file will be split in two (rows with and without coordinates).

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 194673 entries, 0 to 194672
Data columns (total 38 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   SEVERITYCODE    194673 non-null  int64  
 1   X               189339 non-null  float64
 2   Y               189339 non-null  float64
 3   OBJECTID        194673 non-null  int64  
 4   INCKEY          194673 non-null  int64  
 5   COLDETKEY       194673 non-null  int64  
 6   REPORTNO        194673 non-null  object 
 7   STATUS          194673 non-null  object 
 8   ADDRTYPE        192747 non-null  object 
 9   INTKEY          65070 non-null   float64
 10  LOCATION        191996 non-null  object 
 11  EXCEPTRSNCODE   84811 non-null   object 
 12  EXCEPTRSNDESC   5638 non-null    object 
 13  SEVERITYCODE.1  194673 non-null  int64  
 14  SEVERITYDESC    194673 non-null  object 
 15  COLLISIONTYPE   189769 non-null  object 
 16  PERSONCOUNT     194673 non-null  int64  
 17  PEDCOUNT  

In [7]:
df_collision.describe()

Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,INTKEY,SEVERITYCODE.1,PERSONCOUNT,PEDCOUNT,PEDCYLCOUNT,VEHCOUNT,SDOT_COLCODE,SDOTCOLNUM,SEGLANEKEY,CROSSWALKKEY
count,194673.0,189339.0,189339.0,194673.0,194673.0,194673.0,65070.0,194673.0,194673.0,194673.0,194673.0,194673.0,194673.0,114936.0,194673.0,194673.0
mean,1.298901,-122.330518,47.619543,108479.36493,141091.45635,141298.811381,37558.450576,1.298901,2.444427,0.037139,0.028391,1.92078,13.867768,7972521.0,269.401114,9782.452
std,0.457778,0.029976,0.056157,62649.722558,86634.402737,86986.54211,51745.990273,0.457778,1.345929,0.19815,0.167413,0.631047,6.868755,2553533.0,3315.776055,72269.26
min,1.0,-122.419091,47.495573,1.0,1001.0,1001.0,23807.0,1.0,0.0,0.0,0.0,0.0,0.0,1007024.0,0.0,0.0
25%,1.0,-122.348673,47.575956,54267.0,70383.0,70383.0,28667.0,1.0,2.0,0.0,0.0,2.0,11.0,6040015.0,0.0,0.0
50%,1.0,-122.330224,47.615369,106912.0,123363.0,123363.0,29973.0,1.0,2.0,0.0,0.0,2.0,13.0,8023022.0,0.0,0.0
75%,2.0,-122.311937,47.663664,162272.0,203319.0,203459.0,33973.0,2.0,3.0,0.0,0.0,2.0,14.0,10155010.0,0.0,0.0
max,2.0,-122.238949,47.734142,219547.0,331454.0,332954.0,757580.0,2.0,81.0,6.0,2.0,12.0,69.0,13072020.0,525241.0,5239700.0
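
One reading of the summary above, offered as a side calculation rather than part of the original analysis: because SEVERITYCODE only takes the values 1 and 2, its mean equals 1 plus the share of injury collisions. The sketch below works that out from the printed figures; the variable names are illustrative, and the counts are approximate because the mean is rounded in the output.

```python
# Values copied from the describe() output above.
n_rows = 194673
mean_severity = 1.298901

# SEVERITYCODE is either 1 or 2, so mean = 1 + (share of rows with code 2).
injury_share = mean_severity - 1
injury_rows = round(injury_share * n_rows)
property_rows = n_rows - injury_rows

print(f"injury share: {injury_share:.1%}")
print(f"approx. injury rows: {injury_rows}, property-damage rows: {property_rows}")
```

Roughly 30% of the rows are injury collisions, so the two classes are imbalanced, which may matter when choosing and evaluating a model in step 3.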
