## NEW YORK CITY TRAFFIC ACCIDENTS

Motor vehicle collisions reported by the New York City Police Department from January 2021 to 
April 2023. Each record represents an individual collision, including information on the date, time, 
and location of the accident (borough, zip code, street name, latitude/longitude), vehicles and 
victims involved, and contributing factors. The dataset and the data dictionary are provided in 
the ‘accident_data.zip’ file. 

In [None]:
# import necessary libraries
import pandas as pd
import datetime as dt
# load accident data
acd = pd.read_csv(r"C:\Users\HP\Documents\DATA ANALYSIS CLASS\VSCODE\DATASET\PROJECT\accident_data (2)\NYC_Collisions.csv")
# drop every row that has a missing value in any column
acd = acd.dropna()
acd.head()

Unnamed: 0,Collision ID,Timestamp,Borough,Street Name,Cross Street,Latitude,Longitude,Contributing Factor,Vehicle Type,Persons Injured,Persons Killed,Pedestrians Injured,Pedestrians Killed,Cyclists Injured,Cyclists Killed,Motorists Injured,Motorists Killed
4,4380940,1609483000.0,Brooklyn,Cortelyou Road,Mc Donald Avenue,40.63791,-73.97864,Unspecified,Passenger Vehicle,0.0,0,0,0,0,0,0,0
8,4381082,1609536000.0,Brooklyn,Utica Avenue,East New York Avenue,40.663227,-73.93159,Unspecified,Passenger Vehicle,0.0,0,0,0,0,0,0,0
10,4380780,1609466000.0,Queens,230 Street,148 Avenue,40.656384,-73.75306,Unspecified,Passenger Vehicle,0.0,0,0,0,0,0,0,0
15,4381013,1609457000.0,Queens,21 Street,21 Avenue,40.78228,-73.914604,Fell Asleep,Passenger Vehicle,0.0,0,0,0,0,0,0,0
19,4380859,1609515000.0,Queens,Utopia Parkway,28 Avenue,40.77204,-73.79237,Unsafe Speed,Passenger Vehicle,0.0,0,0,0,0,0,0,0
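
The index labels in the output above (4, 8, 10, 15, 19) suggest that `dropna()` removed a number of rows. A minimal sketch of how the effect of that step could be checked before committing to it is shown below. The `sample` frame and its values are invented stand-ins for `NYC_Collisions.csv`, not real records.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for NYC_Collisions.csv (values invented for illustration)
sample = pd.DataFrame({
    "Borough": ["Brooklyn", None, "Queens", "Queens"],
    "Cross Street": ["Mc Donald Avenue", None, None, "28 Avenue"],
    "Persons Injured": [0.0, 1.0, np.nan, 0.0],
})

# Count missing values per column before dropping anything
missing_per_column = sample.isna().sum()

# Compare the row count before and after dropna()
rows_before = len(sample)
rows_after = len(sample.dropna())
share_dropped = 100 * (rows_before - rows_after) / rows_before

print(missing_per_column)
print(f"dropna() removed {share_dropped:.0f}% of rows")
```

If a single column accounts for most of the missing values, `dropna(subset=[...])` restricted to the columns an analysis actually needs may keep more rows.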


Your Tasks: 
1. Compare the % of total accidents by month. Do you notice any seasonal patterns? 

In [None]:
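# convert the Unix timestamp (in seconds) to a datetime; fromtimestamp uses the local time zone of the machine
acd["Date"] = acd["Timestamp"].map(lambda x: dt.datetime.fromtimestamp(x))
# extract the month name from the datetime
acd["Month"] = acd["Date"].dt.month_name()
acd.head()
# percentage of all accidents that fall in each month
by_month = acd["Month"].value_counts(normalize=True)*100
by_month = by_month.round(2).to_frame().reset_index()
by_month

# Observation: the percentage of accidents appears to decrease toward the end of the year.
# Caveat: the data runs from January 2021 to April 2023, so January through April are counted
# in three calendar years and the remaining months in only two.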

Unnamed: 0,Month,proportion
0,March,10.5
1,January,9.82
2,February,8.78
3,June,8.3
4,October,8.18
5,September,8.08
6,August,7.91
7,July,7.88
8,May,7.81
9,April,7.71
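
Because the data runs from January 2021 to April 2023, January through April are covered in three calendar years and the other months in two, which can inflate their share of the total. One possible way to compare months on an equal footing is to average the count per month over the years in which that month appears. The sketch below uses invented dates; the `demo` frame and its `year` and `month` columns are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Hypothetical collision dates (invented): January is covered in two years, June in one
demo = pd.DataFrame({"Date": pd.to_datetime([
    "2021-01-05", "2021-01-20", "2022-01-11", "2022-01-25",
    "2021-06-03", "2021-06-17", "2021-06-28",
])})
demo["year"] = demo["Date"].dt.year
demo["month"] = demo["Date"].dt.month

# Raw share of all collisions per month (the approach used in the cell above)
raw_share = demo["month"].value_counts(normalize=True) * 100

# Count collisions in each (year, month), then average over the years a month appears in
monthly_counts = demo.groupby(["year", "month"]).size()
avg_per_month = monthly_counts.groupby(level="month").mean()

print(raw_share.round(2))
print(avg_per_month)
```

In this toy data January has the larger raw share only because it is counted twice; the per-year average ranks June higher.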


2. Break down accident frequency by day of week and hour of day. Based on this data, when 
do accidents occur most frequently? 

In [None]:
acd["Day_of_week"] = acd["Date"].dt.day_name() # by day of the week
acd["hour_of_day"] = acd["Date"].dt.hour # by hour of the day
acd.head()
freq_by_day = acd["Day_of_week"].value_counts().sort_values(ascending=False).to_frame().reset_index()
freq_by_day.head(1)

# Based on this analysis, accidents occur most often on Friday (15,958 accidents)

Unnamed: 0,Day_of_week,count
0,Friday,15958
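
Both the day name and the hour are derived from `dt.datetime.fromtimestamp`, which interprets the timestamp in the local time zone of the machine running the notebook, so the results can shift on a machine in a different zone. The sketch below illustrates this with the first `Timestamp` value from the sample output. The fixed UTC-5 offset is an assumption chosen only for illustration; the source does not state which time zone the timestamps were recorded in.

```python
import datetime as dt

# First Timestamp value shown in the sample output of the notebook
ts = 1609483000.0

# Interpret the instant in UTC
utc_time = dt.datetime.fromtimestamp(ts, tz=dt.timezone.utc)

# Interpret the same instant at a fixed UTC-5 offset
# (assumption: an illustrative offset only; it ignores daylight saving time)
fixed_offset = dt.timezone(dt.timedelta(hours=-5))
offset_time = dt.datetime.fromtimestamp(ts, tz=fixed_offset)

# The same instant yields a different hour of day depending on the time zone
print(utc_time.hour, offset_time.hour)
print(utc_time.strftime("%A"), offset_time.strftime("%A"))
```

Passing an explicit `tz` makes the conversion reproducible across machines, whichever zone turns out to be the right one for this dataset.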


In [115]:
freq_by_hour = acd["hour_of_day"].value_counts().sort_values(ascending=False).to_frame().reset_index()
freq_by_hour.head(1)

# Accidents occur most often during the 17:00 hour (5 p.m.), with 6,769 accidents

Unnamed: 0,hour_of_day,count
0,17,6769
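
The two cells above rank days and hours separately. A combined breakdown could be sketched with a cross-tabulation, as below. The `demo` frame is invented for illustration and only reuses the notebook's column names `Day_of_week` and `hour_of_day`.

```python
import pandas as pd

# Hypothetical records (invented) using the notebook's derived column names
demo = pd.DataFrame({
    "Day_of_week": ["Friday", "Friday", "Friday", "Monday", "Monday", "Sunday"],
    "hour_of_day": [17, 17, 8, 17, 8, 2],
})

# Rows are days, columns are hours, cells are accident counts
day_hour = pd.crosstab(demo["Day_of_week"], demo["hour_of_day"])

# Find the single busiest (day, hour) combination
pair_counts = demo.groupby(["Day_of_week", "hour_of_day"]).size()
busiest_day, busiest_hour = pair_counts.idxmax()

print(day_hour)
print(busiest_day, busiest_hour)
```

A table like `day_hour` also shows whether the peak hour is the same on weekdays and weekends, which the two separate rankings cannot.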


3. On which particular street were the most accidents reported? What does that represent as a 
% of all reported accidents?

In [None]:
freq_by_street = acd["Street Name"].value_counts(normalize=True)*100 #by_street
freq_by_street= freq_by_street.round(2).to_frame().reset_index()
freq_by_street["Percentage"] = freq_by_street["proportion"].map(lambda x: f"{x}%")
freq_by_street.head(1)

# Broadway had the most reported accidents, accounting for 1.16% of all reported accidents


Unnamed: 0,Street Name,proportion,Percentage
0,Broadway,1.16,1.16%
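
Counting by `Street Name` alone pools every street that shares a name, and a name such as Broadway may occur in more than one borough. A hedged sketch of counting by borough and street together follows. The rows in `demo` are invented, and whether this changes the ranking in the real data is not known from the source.

```python
import pandas as pd

# Hypothetical records (invented): the same street name appears in two boroughs
demo = pd.DataFrame({
    "Borough": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn",
                "Queens", "Queens", "Queens"],
    "Street Name": ["Broadway", "Broadway", "Broadway", "Broadway",
                    "Utopia Parkway", "Utopia Parkway", "Utopia Parkway"],
})

# Share by street name only (the approach used in the cell above)
by_name = demo["Street Name"].value_counts(normalize=True) * 100

# Share by (borough, street name) pair
by_borough_street = demo.value_counts(["Borough", "Street Name"], normalize=True) * 100

print(by_name.round(2))
print(by_borough_street.round(2))
```

In this toy data Broadway leads when names are pooled, while the Queens street leads once boroughs are kept apart.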


4. What was the most common contributing factor for the accidents reported? What about for 
fatal accidents specifically?

In [None]:
frq_fact = acd["Contributing Factor"].value_counts(normalize=True)*100 
frq_fact= frq_fact.round(2).to_frame().reset_index()
frq_fact.head(1)

# The most common contributing factor across all reported accidents is Driver Inattention/Distraction (24.95%)

Unnamed: 0,Contributing Factor,proportion
0,Driver Inattention/Distraction,24.95


In [None]:
# keep accidents with exactly one person killed (accidents with two or more deaths are not matched by == 1)
fatal = acd[acd["Persons Killed"] == 1]
frq_fact_fat = fatal["Contributing Factor"].value_counts(normalize=True)*100
frq_fact_fat = frq_fact_fat.round(2).to_frame().reset_index()
frq_fact_fat.head(1)
# For fatal accidents, the most common contributing factor is recorded as "Unspecified" (28.25%),
# so the dataset does not identify the leading cause

Unnamed: 0,Contributing Factor,proportion
0,Unspecified,28.25
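
The filter `== 1` selects accidents with exactly one death, and the top result is the placeholder value "Unspecified". A minimal sketch of two possible refinements is shown below: including every accident with at least one death, and ranking only the factors that were actually specified. The `demo` frame is invented and reuses only the notebook's column names.

```python
import pandas as pd

# Hypothetical records (invented) using the notebook's column names
demo = pd.DataFrame({
    "Persons Killed": [0, 1, 1, 2, 0, 1],
    "Contributing Factor": ["Unspecified", "Unspecified", "Unsafe Speed",
                            "Unsafe Speed", "Fell Asleep", "Unspecified"],
})

# Exactly one death versus at least one death
exactly_one = demo[demo["Persons Killed"] == 1]
at_least_one = demo[demo["Persons Killed"] >= 1]

# Rank contributing factors among fatal accidents, ignoring the placeholder value
specified = at_least_one[at_least_one["Contributing Factor"] != "Unspecified"]
top_specified = (
    specified["Contributing Factor"].value_counts(normalize=True).mul(100).round(2)
)

print(len(exactly_one), len(at_least_one))
print(top_specified)
```

Percentages computed this way are shares of fatal accidents with a specified factor, so they are not directly comparable with the 28.25% reported above.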
