## NYPD Motor Vehicle Collision Data

### Overview

The Motor Vehicle Collisions Crash table contains details on crash events, with each row representing one crash event. The Motor Vehicle Collisions data tables contain information from all police-reported motor vehicle collisions in NYC. The dataset is available at: https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions-Crashes/h9gi-nx95

### High-Level Description

The data runs from 2012 to the present day and is updated daily. At the time of this writing, there are 1.59 million rows, each representing a crash event, and 29 columns: date, time, borough, zip code, latitude, longitude, location, on and off street name, cross street name, number of persons injured, number of persons killed, number of pedestrians injured, number of pedestrians killed, number of cyclists injured, number of cyclists killed, number of motorists injured, number of motorists killed, contributing factors, vehicle type codes, and collision ID.

### Bring in the data

I will begin by bringing in the data, capped at 2,000,000 rows by the `$limit` query parameter.

In [None]:
import pandas as pd
import numpy as np
import datetime as d
datanyc = pd.read_csv("https://data.cityofnewyork.us/resource/h9gi-nx95.csv?$limit=2000000")
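
As an aside, the row cap and the columns requested can be set through query parameters on the URL. Below is a minimal sketch, assuming the endpoint accepts the standard `$limit` and `$select` parameters; `build_query_url` is a hypothetical helper of my own, and the column names follow this notebook and may differ from what the live API currently exposes.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.cityofnewyork.us/resource/h9gi-nx95.csv"


def build_query_url(limit, columns=None):
    """Hypothetical helper: build a CSV request URL with query parameters."""
    params = {"$limit": limit}
    if columns:
        # $select restricts the response to the listed columns.
        params["$select"] = ",".join(columns)
    # Keep "$" and "," unescaped so the URL stays readable.
    return BASE_URL + "?" + urlencode(params, safe="$,")


url = build_query_url(1000, ["accident_date", "accident_time"])
print(url)
# datanyc_small = pd.read_csv(url)  # uncomment to actually download the sample
```

Pulling a small sample first is a cheap way to check column names before downloading the full table.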

Let's take a peek at what the data looks like.

In [None]:
pd.set_option('display.max_columns', 50)
datanyc.head()

In [None]:
datanyc.shape

There are 347,061 rows and 29 columns. Let's do some data cleaning. I will begin by converting the string timestamps in `accident_date` to a true datetime data type.

In [None]:
datanyc['accident_date'] = pd.to_datetime(datanyc['accident_date'])
pd.set_option('display.max_columns', 50)
datanyc.head()
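
To show what the conversion does, here is a small sketch on toy data (not rows from the dataset). It assumes the timestamps arrive as ISO-style strings and uses `errors="coerce"`, which is my own addition, so that a malformed value becomes `NaT` instead of raising an error.

```python
import pandas as pd

# Toy values standing in for the real column; the last one is deliberately malformed.
sample = pd.DataFrame(
    {"accident_date": ["2019-08-01T00:00:00.000", "2019-08-02T00:00:00.000", "not a date"]}
)

# errors="coerce" turns unparseable strings into NaT instead of raising.
sample["accident_date"] = pd.to_datetime(sample["accident_date"], errors="coerce")

print(sample.dtypes)
print("unparseable values:", sample["accident_date"].isna().sum())
```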

Next, let's remove the "Unspecified" values in the `contributing_factor_vehicle_1` column, since I will use that column in my visualizations. I will also change the `number_of_persons_injured` and `number_of_persons_killed` values from float to integer.

In [None]:
indexNames = datanyc[datanyc['contributing_factor_vehicle_1'] == 'Unspecified' ].index
datanyc.drop(indexNames , inplace=True)
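
The same filter can also be written with a boolean mask instead of collecting index labels and dropping them. This is only a sketch on toy data; the frame and the `filtered` name are mine.

```python
import pandas as pd

toy = pd.DataFrame(
    {"contributing_factor_vehicle_1": ["Unspecified", "Unsafe Speed", "Glare", "Unspecified"]}
)

# Keep every row whose factor is anything other than 'Unspecified'.
# Missing values compare as "not equal", so they are kept, as with the drop() approach.
mask = toy["contributing_factor_vehicle_1"] != "Unspecified"
filtered = toy[mask].copy()

print(filtered)
```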

In [None]:
# First, I will get rid of any missing values
datanyc.dropna(subset = ['number_of_persons_injured'], how='all', inplace=True)
datanyc.dropna(subset = ['number_of_persons_killed'], how='all', inplace=True)

datanyc['number_of_persons_injured'] = datanyc.number_of_persons_injured.astype(int)
datanyc['number_of_persons_killed'] = datanyc.number_of_persons_killed.astype(int)
datanyc.head()
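
For reference, the two `dropna` calls and the two casts can be combined. A minimal sketch on toy data, assuming the goal is to drop a row when either count is missing; the `count_cols` and `clean` names are mine.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "number_of_persons_injured": [1.0, np.nan, 0.0, 2.0],
        "number_of_persons_killed": [0.0, 0.0, np.nan, 1.0],
    }
)

count_cols = ["number_of_persons_injured", "number_of_persons_killed"]

# Drop rows where either count is missing, then cast both columns to integers.
# The cast would fail if NaN were still present, which is why dropna comes first.
clean = toy.dropna(subset=count_cols, how="any").astype({c: int for c in count_cols})

print(clean)
print(clean.dtypes)
```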

Let's see the unique values in the `contributing_factor_vehicle_1` column.

In [None]:
unique_contributing_factors = datanyc['contributing_factor_vehicle_1'].unique()
unique_contributing_factors

That is a lot of unique values. Let's combine them into a few broader categories to make the job easier.

In [None]:
# Map each detailed factor label onto a broader category.
# Note: 'Traffic Control Disregarded' and 'Tinted Windows' are grouped under 'Road Defects'.
factor_groups = {
    # Driver Inexperience
    'Backing Unsafely': 'Driver Inexperience',
    'Unsafe Speed': 'Driver Inexperience',
    'Passing or Lane Usage Improper': 'Driver Inexperience',
    'Turning Improperly': 'Driver Inexperience',
    'Following Too Closely': 'Driver Inexperience',
    'Passing Too Closely': 'Driver Inexperience',
    'Outside Car Distraction': 'Driver Inexperience',
    'Steering Failure': 'Driver Inexperience',
    'Reaction to Uninvolved Vehicle': 'Driver Inexperience',
    'Failure to Yield Right-of-Way': 'Driver Inexperience',
    'Aggressive Driving/Road Rage': 'Driver Inexperience',
    'Unsafe Lane Changing': 'Driver Inexperience',

    # Driver Inattention/Distraction
    'Passenger Distraction': 'Driver Inattention/Distraction',
    'Failure to Keep Right': 'Driver Inattention/Distraction',
    'Eating or Drinking': 'Driver Inattention/Distraction',
    'Animals Action': 'Driver Inattention/Distraction',
    'Using On Board Navigation Device': 'Driver Inattention/Distraction',
    'Reaction to Other Uninvolved Vehicle': 'Driver Inattention/Distraction',
    'Cell Phone (hands-free)': 'Driver Inattention/Distraction',
    'Cell Phone (hand-Held)': 'Driver Inattention/Distraction',
    'Other Electronic Device': 'Driver Inattention/Distraction',
    'Cell Phone (hand-held)': 'Driver Inattention/Distraction',
    'Texting': 'Driver Inattention/Distraction',
    'Listening/Using Headphones': 'Driver Inattention/Distraction',
    'Fatigued/Drowsy': 'Driver Inattention/Distraction',
    'Fell Asleep': 'Driver Inattention/Distraction',

    # Car Defects
    'Brakes Defective': 'Car Defects',
    'Tire Failure/Inadequate': 'Car Defects',
    'Tow Hitch Defective': 'Car Defects',
    'Headlights Defective': 'Car Defects',
    'Accelerator Defective': 'Car Defects',
    'Windshield Inadequate': 'Car Defects',
    'Driverless/Runaway Vehicle': 'Car Defects',
    'Oversized Vehicle': 'Car Defects',

    # Road Defects
    'Traffic Control Disregarded': 'Road Defects',
    'Glare': 'Road Defects',
    'Tinted Windows': 'Road Defects',
    'Lane Marking Improper/Inadequate': 'Road Defects',
    'View Obstructed/Limited': 'Road Defects',
    'Pavement Defective': 'Road Defects',
    'Other Lighting Defects': 'Road Defects',
    'Obstruction/Debris': 'Road Defects',
    'Traffic Control Device Improper/Non-Working': 'Road Defects',
    'Shoulders Defective/Improper': 'Road Defects',
    'Pavement Slippery': 'Road Defects',

    # Illness
    'Illnes': 'Illness',
    'Lost Consciousness': 'Illness',
    'Physical Disability': 'Illness',
    'Prescription Medication': 'Illness',

    # Drugs (Illegal)
    'Drugs (illegal)': 'Drugs (Illegal)',
    'Alcohol Involvement': 'Drugs (Illegal)',

    # Outside Error
    'Pedestrian/Bicyclist/Other Pedestrian Error/Confusion': 'Outside Error',
    'Vehicle Vandalism': 'Outside Error',
    'Other Vehicular': 'Outside Error',
}

# Assign the result back instead of calling replace(..., inplace=True) on a column
# selection, which newer pandas versions may not propagate to the DataFrame.
datanyc['contributing_factor_vehicle_1'] = datanyc['contributing_factor_vehicle_1'].replace(factor_groups)
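
A Python dict literal silently keeps only the last value when the same key appears twice, which is easy to miss in a long mapping like the one above. One way to guard against that is to write the groups as category-to-labels lists and invert them with a check. This is a sketch, not the approach used in this notebook; `build_factor_mapping` and `example_groups` are hypothetical names, and the groups shown are only a small excerpt.

```python
def build_factor_mapping(groups):
    """Invert {category: [raw labels]} into {raw label: category}.

    Raises ValueError if a raw label is assigned to more than one category.
    """
    mapping = {}
    for category, labels in groups.items():
        for label in labels:
            if label in mapping:
                raise ValueError(
                    f"{label!r} is assigned to both {mapping[label]!r} and {category!r}"
                )
            mapping[label] = category
    return mapping


# Small excerpt for illustration only.
example_groups = {
    "Car Defects": ["Brakes Defective", "Headlights Defective"],
    "Road Defects": ["Glare", "Pavement Slippery"],
}

mapping = build_factor_mapping(example_groups)
print(mapping)
```

The resulting `mapping` has the same shape as the dict passed to `replace`, so it could be used in its place.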

In [None]:
datanyc['contributing_factor_vehicle_1'].unique()

I will also drop the values `80` and `1`, since I do not know what they represent, along with any NaN values.

In [None]:
nyc80 = datanyc[datanyc['contributing_factor_vehicle_1'] == '80' ].index
datanyc.drop(nyc80, inplace=True)

nyc1 = datanyc[datanyc['contributing_factor_vehicle_1'] == '1' ].index
datanyc.drop(nyc1, inplace=True)

datanyc.dropna(subset = ['contributing_factor_vehicle_1'], how='all', inplace=True)
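
The three drops above can also be expressed as a single mask. A minimal sketch on toy data, assuming the unknown codes are stored as the strings `'80'` and `'1'` as in the cell above; `unknown_codes` and `cleaned` are my own names.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"contributing_factor_vehicle_1": ["80", "Illness", "1", np.nan, "Road Defects"]}
)

unknown_codes = ["80", "1"]
col = toy["contributing_factor_vehicle_1"]

# Keep rows that are not missing and are not one of the unexplained codes.
keep = col.notna() & ~col.isin(unknown_codes)
cleaned = toy[keep]

print(cleaned)
```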

In [None]:
datanyc['contributing_factor_vehicle_1'].unique()

Next, I will use Plotly, specifically `plotly.express`, to visualize some contributing-factor trends.

In [None]:
# plotly.express ships as part of the plotly package
!pip install plotly
import plotly.express as px

In [None]:
import plotly.graph_objects as go
fig = px.box(datanyc, x="number_of_persons_killed", y="contributing_factor_vehicle_1")
fig.show()

Most collisions that result in one death are attributed to driver inexperience, while most collisions that result in two deaths are attributed to an outside error.
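
A box plot shows how fatalities are distributed within each category rather than how many collisions fall into each combination, so a count table can serve as a cross-check on a reading like this. The sketch below uses toy rows that I made up, not figures from the dataset.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "contributing_factor_vehicle_1": [
            "Driver Inexperience",
            "Driver Inexperience",
            "Outside Error",
            "Illness",
            "Outside Error",
        ],
        "number_of_persons_killed": [1, 0, 2, 0, 1],
    }
)

# Look only at collisions with at least one death.
fatal = toy[toy["number_of_persons_killed"] > 0]

# Rows: contributing factor; columns: number of deaths; cells: number of collisions.
counts = pd.crosstab(
    fatal["contributing_factor_vehicle_1"], fatal["number_of_persons_killed"]
)
print(counts)
```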

Let's see when the accidents in which more than one person dies tend to occur.

In [None]:
datanyc['time'] = pd.to_datetime(datanyc.accident_time)
datanyc['hour'] = datanyc['time'].dt.hour
datanyc.head()

In [None]:
import plotly.express as px
fig = px.scatter(datanyc, x='number_of_persons_killed', y='hour')
fig.show()

Collisions that result in more than one death tend to occur between 22:00 and 04:00.
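
To put numbers behind a pattern like this, the multi-fatality collisions can be counted per hour. This is a sketch on made-up rows, assuming the time strings are in `HH:MM` form; the explicit `format` argument is my addition.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "accident_time": ["23:15", "02:40", "14:05", "03:10", "22:30"],
        "number_of_persons_killed": [2, 3, 0, 1, 2],
    }
)

# Parse the time strings and keep only the hour of day.
toy["hour"] = pd.to_datetime(toy["accident_time"], format="%H:%M").dt.hour

# Restrict to collisions with more than one death, then count them per hour.
multi = toy[toy["number_of_persons_killed"] > 1]
by_hour = multi.groupby("hour").size()

print(by_hour)
```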

## Thank you for reading!