TEAM MEMBERS: BILL KISUYA, JOAN NJOROGE, BRENDA MUTAI, BRIAN NGENY, JEFF KIARIE & IVAN KIBET.

# 1. Introduction

Vehicle accidents are a major concern for public safety and transportation agencies. They cause death and property damage, interrupt traffic flow, and incur economic losses. Understanding the factors that contribute to these collisions, and their effects, is critical for putting effective measures in place to reduce their occurrence and impact.

The City of Chicago has collected extensive crash data through its electronic crash reporting system (E-Crash), providing a valuable resource to analyse and gain insights into the factors contributing to accidents. The dataset comprises a wide range of crash parameters, including crash circumstances, vehicles involved, and people affected.

The goal of this study is to analyse the dataset and build a thorough understanding of car crashes and their characteristics in the city of Chicago. By studying the characteristics recorded for each crash event, we aim to identify the key factors that contribute to collisions, measure their impact, and analyse the circumstances surrounding the accidents.

This project's target audience includes road safety stakeholders such as transportation authorities, law enforcement agencies, policymakers, and insurance companies. By knowing the main factors that contribute to collisions, decision-makers can design targeted plans and activities to reduce the frequency and severity of accidents.

# 2. Data Understanding

In [16]:
# Import necessary libraries
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
%matplotlib inline 
import seaborn as sns 
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier 
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

In [17]:
# Set display options for pandas
pd.set_option('display.max_columns', None)
# Load data from CSV files into dataframes
crash = pd.read_csv('https://data.cityofchicago.org/resource/85ca-t3if.csv')
vehicle = pd.read_csv('https://data.cityofchicago.org/resource/68nd-jvt3.csv')
person = pd.read_csv('https://data.cityofchicago.org/resource/u6pd-qa9d.csv')
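
The three URLs above are Socrata (SODA) endpoints. To our understanding, such endpoints return only 1,000 rows per request unless a `$limit` parameter is supplied, which would be consistent with the small merged frame shown further down. The following is a minimal sketch of building a URL with explicit paging parameters; the helper name `soda_csv_url` and the limit of 50,000 are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

BASE = "https://data.cityofchicago.org/resource"

def soda_csv_url(dataset_id: str, limit: int = 50_000, offset: int = 0) -> str:
    """Build a CSV URL for a Socrata dataset with explicit paging parameters.

    Hypothetical helper: the name and default values are assumptions.
    """
    # Keep the leading '$' of the SODA parameter names unescaped.
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"{BASE}/{dataset_id}.csv?{query}"

# Dataset ID taken from the crash URL used in the cell above.
crash_url = soda_csv_url("85ca-t3if", limit=50_000)
print(crash_url)
```

The resulting string can be passed to `pd.read_csv` in place of the bare URL. Larger limits mean longer downloads, so the value should be chosen with the intended sample size in mind.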

In [18]:
# Merge the three datasets into one: crashes to vehicles on crash_record_id, then to people on vehicle_id
merged = pd.merge(left=crash, right=vehicle, on='crash_record_id')
df = pd.merge(left=merged, right=person, on='vehicle_id')
# Display the shape and a preview of the merged dataframe
print(df.shape)
df.head()

(1221, 149)


Unnamed: 0,crash_record_id_x,rd_no_x,crash_date_est_i,crash_date_x,posted_speed_limit,traffic_control_device,device_condition,weather_condition,lighting_condition,first_crash_type,trafficway_type,lane_cnt,alignment,roadway_surface_cond,road_defect,report_type,crash_type,intersection_related_i,private_property_i,hit_and_run_i,damage,date_police_notified,prim_contributory_cause,sec_contributory_cause,street_no,street_direction,street_name,beat_of_occurrence,photos_taken_i,statements_taken_i,dooring_i,work_zone_i,work_zone_type,workers_present_i,num_units,most_severe_injury,injuries_total,injuries_fatal,injuries_incapacitating,injuries_non_incapacitating,injuries_reported_not_evident,injuries_no_indication,injuries_unknown,crash_hour,crash_day_of_week,crash_month,latitude,longitude,location,crash_unit_id,rd_no_y,crash_date_y,unit_no,unit_type,num_passengers,vehicle_id,cmrc_veh_i,make,model,lic_plate_state,vehicle_year,vehicle_defect,vehicle_type,vehicle_use,travel_direction,maneuver,towed_i,fire_i,occupant_cnt,exceed_speed_limit_i,towed_by,towed_to,area_00_i,area_01_i,area_02_i,area_03_i,area_04_i,area_05_i,area_06_i,area_07_i,area_08_i,area_09_i,area_10_i,area_11_i,area_12_i,area_99_i,first_contact_point,cmv_id,usdot_no,ccmc_no,ilcc_no,commercial_src,gvwr,carrier_name,carrier_state,carrier_city,hazmat_placards_i,hazmat_name,un_no,hazmat_present_i,hazmat_report_i,hazmat_report_no,mcs_report_i,mcs_report_no,hazmat_vio_cause_crash_i,mcs_vio_cause_crash_i,idot_permit_no,wide_load_i,trailer1_width,trailer2_width,trailer1_length,trailer2_length,total_vehicle_length,axle_cnt,vehicle_config,cargo_body_type,load_type,hazmat_out_of_service_i,mcs_out_of_service_i,hazmat_class,person_id,person_type,crash_record_id_y,rd_no,crash_date,seat_no,city,state,zipcode,sex,age,drivers_license_state,drivers_license_class,safety_equipment,airbag_deployed,ejection,injury_classification,hospital,ems_agency,ems_run_no,driver_action,driver_vision,physical_condition,pedpedal_action,pedpedal_visibility,pedpedal_location,bac_result,bac_result_value,cell_phone_use
0,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,,,2023-08-05T21:30:00.000,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,,STRAIGHT AND LEVEL,UNKNOWN,UNKNOWN,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,,"OVER $1,500",2023-08-06T00:05:00.000,DISREGARDING TRAFFIC SIGNALS,NOT APPLICABLE,7900,S,YATES BLVD,414,,,,,,,2,NONINCAPACITATING INJURY,1,0,0,1,0,1,0,21,7,8,41.751636,-87.566378,POINT (-87.56637812136 41.751636269168),1633296,,2023-08-05T21:30:00.000,1,DRIVER,,1554223.0,N,NOVA BUS,OTHER (EXPLAIN IN NARRATIVE),IL,2014.0,UNKNOWN,BUS OVER 15 PASS.,CTA,E,STRAIGHT AHEAD,,,1.0,,,,,Y,,,,,,,,,,Y,Y,,FRONT,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,O1633296,DRIVER,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,,2023-08-05T21:30:00.000,,,,,X,,,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,UNKNOWN,NO INDICATION OF INJURY,,,,DISREGARDED CONTROL DEVICES,UNKNOWN,UNKNOWN,,,,TEST NOT OFFERED,,
1,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,,,2023-08-05T21:30:00.000,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,,STRAIGHT AND LEVEL,UNKNOWN,UNKNOWN,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,,"OVER $1,500",2023-08-06T00:05:00.000,DISREGARDING TRAFFIC SIGNALS,NOT APPLICABLE,7900,S,YATES BLVD,414,,,,,,,2,NONINCAPACITATING INJURY,1,0,0,1,0,1,0,21,7,8,41.751636,-87.566378,POINT (-87.56637812136 41.751636269168),1633297,,2023-08-05T21:30:00.000,2,DRIVER,,1554233.0,,FORD,FUSION,IL,2011.0,UNKNOWN,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,,,1.0,,,,,,,,,,,,,,Y,Y,Y,,FRONT-LEFT-CORNER,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,O1633297,DRIVER,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,,2023-08-05T21:30:00.000,,CHICAGO,IL,60617.0,F,37.0,IL,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,NONE,NONINCAPACITATING INJURY,,,,NONE,UNKNOWN,NORMAL,,,,TEST NOT OFFERED,,
2,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,,,2023-08-05T21:25:00.000,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,,STRAIGHT AND LEVEL,WET,NO DEFECTS,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,,"OVER $1,500",2023-08-05T21:35:00.000,DISREGARDING STOP SIGN,DISREGARDING STOP SIGN,9198,S,YATES BLVD,413,,,,,,,2,NONINCAPACITATING INJURY,2,0,0,2,0,3,0,21,7,8,41.728152,-87.565916,POINT (-87.565915750169 41.728151636186),1633236,,2023-08-05T21:25:00.000,1,DRIVER,2.0,1554169.0,,VOLKSWAGEN,JETTA,IL,2016.0,NONE,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,Y,,3.0,,PRIVATE TOW,MECHANICS,,Y,,,,,,,,,,,Y,,FRONT-RIGHT-CORNER,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,O1633236,DRIVER,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,,2023-08-05T21:25:00.000,,CHICAGO,IL,,F,20.0,IL,D,SAFETY BELT USED,"DEPLOYED, FRONT",NONE,NO INDICATION OF INJURY,,,,FAILED TO YIELD,NOT OBSCURED,NORMAL,,,,TEST NOT OFFERED,,
3,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,,,2023-08-05T21:25:00.000,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,,STRAIGHT AND LEVEL,WET,NO DEFECTS,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,,"OVER $1,500",2023-08-05T21:35:00.000,DISREGARDING STOP SIGN,DISREGARDING STOP SIGN,9198,S,YATES BLVD,413,,,,,,,2,NONINCAPACITATING INJURY,2,0,0,2,0,3,0,21,7,8,41.728152,-87.565916,POINT (-87.565915750169 41.728151636186),1633236,,2023-08-05T21:25:00.000,1,DRIVER,2.0,1554169.0,,VOLKSWAGEN,JETTA,IL,2016.0,NONE,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,Y,,3.0,,PRIVATE TOW,MECHANICS,,Y,,,,,,,,,,,Y,,FRONT-RIGHT-CORNER,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,P361406,PASSENGER,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,,2023-08-05T21:25:00.000,3.0,CHICAGO,IL,,F,75.0,,,SAFETY BELT USED,"DEPLOYED, SIDE",NONE,NO INDICATION OF INJURY,,,,,,,,,,,,
4,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,,,2023-08-05T21:25:00.000,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,,STRAIGHT AND LEVEL,WET,NO DEFECTS,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,,"OVER $1,500",2023-08-05T21:35:00.000,DISREGARDING STOP SIGN,DISREGARDING STOP SIGN,9198,S,YATES BLVD,413,,,,,,,2,NONINCAPACITATING INJURY,2,0,0,2,0,3,0,21,7,8,41.728152,-87.565916,POINT (-87.565915750169 41.728151636186),1633236,,2023-08-05T21:25:00.000,1,DRIVER,2.0,1554169.0,,VOLKSWAGEN,JETTA,IL,2016.0,NONE,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,Y,,3.0,,PRIVATE TOW,MECHANICS,,Y,,,,,,,,,,,Y,,FRONT-RIGHT-CORNER,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,P361408,PASSENGER,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,,2023-08-05T21:25:00.000,4.0,CHICAGO,IL,,F,17.0,,,SAFETY BELT USED,DID NOT DEPLOY,NONE,NO INDICATION OF INJURY,,,,,,,,,,,,
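
The preview above shows two side effects of the chained merge: crash-level values repeat once per person, and columns shared by both sides pick up `_x`/`_y` suffixes. The sketch below reproduces this on toy frames; the `*_demo` frames and their values are made up for illustration, and the `validate` argument is an optional safeguard that the original cell does not use.

```python
import pandas as pd

# Toy stand-ins for the crash, vehicle and person tables (values are invented).
crash_demo = pd.DataFrame({
    "crash_record_id": ["a", "b"],
    "weather_condition": ["CLEAR", "RAIN"],
})
vehicle_demo = pd.DataFrame({
    "crash_record_id": ["a", "a", "b"],
    "vehicle_id": [1, 2, 3],
})
person_demo = pd.DataFrame({
    "vehicle_id": [1, 2, 3, 3],
    "crash_record_id": ["a", "a", "b", "b"],
    "person_type": ["DRIVER", "DRIVER", "DRIVER", "PASSENGER"],
})

# validate="one_to_many" raises MergeError if the left-hand keys are not unique.
merged_demo = crash_demo.merge(
    vehicle_demo, on="crash_record_id", how="inner", validate="one_to_many"
)
df_demo = merged_demo.merge(
    person_demo, on="vehicle_id", how="inner", validate="one_to_many"
)

print(df_demo.shape)             # one row per person, not per crash
print(df_demo.columns.tolist())  # crash_record_id appears with _x and _y suffixes
```

Because each row of the merged frame describes one person, any per-crash statistic computed on it weights crashes by the number of people involved.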


In [19]:
# Calculate the number of null values for each column 
nulls = df.isna().sum()
nulls

crash_record_id_x         0
rd_no_x                1221
crash_date_est_i       1142
crash_date_x              0
posted_speed_limit        0
                       ... 
pedpedal_visibility     696
pedpedal_location       696
bac_result              218
bac_result_value       1221
cell_phone_use         1221
Length: 149, dtype: int64

In [20]:
# Count null values per column
nulls = df.isna().sum()
# Fraction (0 to 1) of null values, for columns with at least one missing value
null_percent = nulls[nulls>0] / len(df)
# Display the fractions as a table shaded with a red colour gradient
null_percent.to_frame('% Null').style.background_gradient(cmap='Reds')

Unnamed: 0,% Null
rd_no_x,1.0
crash_date_est_i,0.935299
lane_cnt,1.0
report_type,0.009009
intersection_related_i,0.675676
private_property_i,0.976249
hit_and_run_i,0.573301
photos_taken_i,0.990172
statements_taken_i,0.955774
dooring_i,0.981982
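
The values in the table above are fractions between 0 and 1 rather than percentages. An equivalent and slightly shorter way to obtain them is `isna().mean()`, sketched here on a toy frame; the frame `demo` and its columns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame: 'a' is half missing, 'b' is fully missing, 'c' is complete.
demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan] * 4,
    "c": [1, 2, 3, 4],
})

# The mean of a boolean mask is the fraction of True values, i.e. the null fraction.
null_frac = demo.isna().mean()

# Keep only columns with missing data, most affected first.
null_frac = null_frac[null_frac > 0].sort_values(ascending=False)
print(null_frac)
```

Multiplying `null_frac` by 100 gives true percentages if the `% Null` label is meant literally.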


In [21]:
# Identify columns with excessive nulls; the threshold used here is 80%
# Store the names of columns whose null fraction exceeds 0.80 in a list
Index_label = null_percent[null_percent>.80].index.tolist()
Index_label

['rd_no_x',
 'crash_date_est_i',
 'lane_cnt',
 'private_property_i',
 'photos_taken_i',
 'statements_taken_i',
 'dooring_i',
 'work_zone_i',
 'work_zone_type',
 'workers_present_i',
 'rd_no_y',
 'cmrc_veh_i',
 'towed_i',
 'fire_i',
 'exceed_speed_limit_i',
 'towed_by',
 'towed_to',
 'area_00_i',
 'area_01_i',
 'area_02_i',
 'area_03_i',
 'area_04_i',
 'area_05_i',
 'area_06_i',
 'area_07_i',
 'area_08_i',
 'area_09_i',
 'area_10_i',
 'area_11_i',
 'area_99_i',
 'cmv_id',
 'usdot_no',
 'ccmc_no',
 'ilcc_no',
 'commercial_src',
 'gvwr',
 'carrier_name',
 'carrier_state',
 'carrier_city',
 'hazmat_placards_i',
 'hazmat_name',
 'un_no',
 'hazmat_present_i',
 'hazmat_report_i',
 'hazmat_report_no',
 'mcs_report_i',
 'mcs_report_no',
 'hazmat_vio_cause_crash_i',
 'mcs_vio_cause_crash_i',
 'idot_permit_no',
 'wide_load_i',
 'trailer1_width',
 'trailer2_width',
 'trailer1_length',
 'trailer2_length',
 'total_vehicle_length',
 'axle_cnt',
 'vehicle_config',
 'cargo_body_type',
 'load_type',
 'h
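
The threshold step can also be wrapped in a small function so that the cut-off is stated in one place. This is a minimal sketch on toy data; the function name `drop_sparse_columns` and the frame `demo` are hypothetical and do not come from the original notebook.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(frame: pd.DataFrame, threshold: float = 0.80):
    """Return a copy without columns whose null fraction exceeds `threshold`,
    together with the list of dropped column names (hypothetical helper)."""
    null_frac = frame.isna().mean()
    sparse = null_frac[null_frac > threshold].index.tolist()
    return frame.drop(columns=sparse), sparse

# Toy frame: 90%, 50% and 0% missing respectively.
demo = pd.DataFrame({
    "mostly_missing": [np.nan] * 9 + [1.0],
    "half_missing": [np.nan] * 5 + [1.0] * 5,
    "complete": range(10),
})

cleaned, dropped = drop_sparse_columns(demo, threshold=0.80)
print(dropped)
print(cleaned.shape)
```

Returning the dropped names alongside the cleaned frame keeps a record of what was removed, which helps when the threshold is revisited later.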

In [22]:
# Drop the sparse columns identified above, then preview the shape, data and info
df = df.drop(columns = Index_label)
print(df.shape)
# Display the first few rows of the cleaned DataFrame
display(df.head())
# Display information about the DataFrame's columns and non-null counts
df.info()

(1221, 81)


Unnamed: 0,crash_record_id_x,crash_date_x,posted_speed_limit,traffic_control_device,device_condition,weather_condition,lighting_condition,first_crash_type,trafficway_type,alignment,roadway_surface_cond,road_defect,report_type,crash_type,intersection_related_i,hit_and_run_i,damage,date_police_notified,prim_contributory_cause,sec_contributory_cause,street_no,street_direction,street_name,beat_of_occurrence,num_units,most_severe_injury,injuries_total,injuries_fatal,injuries_incapacitating,injuries_non_incapacitating,injuries_reported_not_evident,injuries_no_indication,injuries_unknown,crash_hour,crash_day_of_week,crash_month,latitude,longitude,location,crash_unit_id,crash_date_y,unit_no,unit_type,num_passengers,vehicle_id,make,model,lic_plate_state,vehicle_year,vehicle_defect,vehicle_type,vehicle_use,travel_direction,maneuver,occupant_cnt,area_12_i,first_contact_point,person_id,person_type,crash_record_id_y,crash_date,city,state,zipcode,sex,age,drivers_license_state,drivers_license_class,safety_equipment,airbag_deployed,ejection,injury_classification,hospital,ems_agency,driver_action,driver_vision,physical_condition,pedpedal_action,pedpedal_visibility,pedpedal_location,bac_result
0,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,2023-08-05T21:30:00.000,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,STRAIGHT AND LEVEL,UNKNOWN,UNKNOWN,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,"OVER $1,500",2023-08-06T00:05:00.000,DISREGARDING TRAFFIC SIGNALS,NOT APPLICABLE,7900,S,YATES BLVD,414,2,NONINCAPACITATING INJURY,1,0,0,1,0,1,0,21,7,8,41.751636,-87.566378,POINT (-87.56637812136 41.751636269168),1633296,2023-08-05T21:30:00.000,1,DRIVER,,1554223.0,NOVA BUS,OTHER (EXPLAIN IN NARRATIVE),IL,2014.0,UNKNOWN,BUS OVER 15 PASS.,CTA,E,STRAIGHT AHEAD,1.0,Y,FRONT,O1633296,DRIVER,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,2023-08-05T21:30:00.000,,,,X,,,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,UNKNOWN,NO INDICATION OF INJURY,,,DISREGARDED CONTROL DEVICES,UNKNOWN,UNKNOWN,,,,TEST NOT OFFERED
1,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,2023-08-05T21:30:00.000,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,STRAIGHT AND LEVEL,UNKNOWN,UNKNOWN,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,"OVER $1,500",2023-08-06T00:05:00.000,DISREGARDING TRAFFIC SIGNALS,NOT APPLICABLE,7900,S,YATES BLVD,414,2,NONINCAPACITATING INJURY,1,0,0,1,0,1,0,21,7,8,41.751636,-87.566378,POINT (-87.56637812136 41.751636269168),1633297,2023-08-05T21:30:00.000,2,DRIVER,,1554233.0,FORD,FUSION,IL,2011.0,UNKNOWN,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,1.0,Y,FRONT-LEFT-CORNER,O1633297,DRIVER,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,2023-08-05T21:30:00.000,CHICAGO,IL,60617.0,F,37.0,IL,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,NONE,NONINCAPACITATING INJURY,,,NONE,UNKNOWN,NORMAL,,,,TEST NOT OFFERED
2,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,2023-08-05T21:25:00.000,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,STRAIGHT AND LEVEL,WET,NO DEFECTS,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,"OVER $1,500",2023-08-05T21:35:00.000,DISREGARDING STOP SIGN,DISREGARDING STOP SIGN,9198,S,YATES BLVD,413,2,NONINCAPACITATING INJURY,2,0,0,2,0,3,0,21,7,8,41.728152,-87.565916,POINT (-87.565915750169 41.728151636186),1633236,2023-08-05T21:25:00.000,1,DRIVER,2.0,1554169.0,VOLKSWAGEN,JETTA,IL,2016.0,NONE,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,3.0,Y,FRONT-RIGHT-CORNER,O1633236,DRIVER,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,2023-08-05T21:25:00.000,CHICAGO,IL,,F,20.0,IL,D,SAFETY BELT USED,"DEPLOYED, FRONT",NONE,NO INDICATION OF INJURY,,,FAILED TO YIELD,NOT OBSCURED,NORMAL,,,,TEST NOT OFFERED
3,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,2023-08-05T21:25:00.000,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,STRAIGHT AND LEVEL,WET,NO DEFECTS,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,"OVER $1,500",2023-08-05T21:35:00.000,DISREGARDING STOP SIGN,DISREGARDING STOP SIGN,9198,S,YATES BLVD,413,2,NONINCAPACITATING INJURY,2,0,0,2,0,3,0,21,7,8,41.728152,-87.565916,POINT (-87.565915750169 41.728151636186),1633236,2023-08-05T21:25:00.000,1,DRIVER,2.0,1554169.0,VOLKSWAGEN,JETTA,IL,2016.0,NONE,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,3.0,Y,FRONT-RIGHT-CORNER,P361406,PASSENGER,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,2023-08-05T21:25:00.000,CHICAGO,IL,,F,75.0,,,SAFETY BELT USED,"DEPLOYED, SIDE",NONE,NO INDICATION OF INJURY,,,,,,,,,
4,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,2023-08-05T21:25:00.000,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,STRAIGHT AND LEVEL,WET,NO DEFECTS,NOT ON SCENE (DESK REPORT),INJURY AND / OR TOW DUE TO CRASH,Y,,"OVER $1,500",2023-08-05T21:35:00.000,DISREGARDING STOP SIGN,DISREGARDING STOP SIGN,9198,S,YATES BLVD,413,2,NONINCAPACITATING INJURY,2,0,0,2,0,3,0,21,7,8,41.728152,-87.565916,POINT (-87.565915750169 41.728151636186),1633236,2023-08-05T21:25:00.000,1,DRIVER,2.0,1554169.0,VOLKSWAGEN,JETTA,IL,2016.0,NONE,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,3.0,Y,FRONT-RIGHT-CORNER,P361408,PASSENGER,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,2023-08-05T21:25:00.000,CHICAGO,IL,,F,17.0,,,SAFETY BELT USED,DID NOT DEPLOY,NONE,NO INDICATION OF INJURY,,,,,,,,,


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1221 entries, 0 to 1220
Data columns (total 81 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   crash_record_id_x              1221 non-null   object 
 1   crash_date_x                   1221 non-null   object 
 2   posted_speed_limit             1221 non-null   int64  
 3   traffic_control_device         1221 non-null   object 
 4   device_condition               1221 non-null   object 
 5   weather_condition              1221 non-null   object 
 6   lighting_condition             1221 non-null   object 
 7   first_crash_type               1221 non-null   object 
 8   trafficway_type                1221 non-null   object 
 9   alignment                      1221 non-null   object 
 10  roadway_surface_cond           1221 non-null   object 
 11  road_defect                    1221 non-null   object 
 12  report_type                    1210 non-null   o

### Dropping Irrelevant Columns

In [23]:
# Drop the following columns because they are irrelevant to predicting the cause of car accidents
# (relevance was judged from the column descriptions)
drop = ['report_type', 'crash_type', 'damage', 'date_police_notified', 'injuries_fatal', 
        'injuries_incapacitating', 'most_severe_injury','injuries_non_incapacitating',
        'injuries_reported_not_evident', 'injuries_no_indication', 'injuries_unknown', 'crash_date',
        'ejection','injury_classification']

df = df.drop(columns = drop)
print(df.shape)
df.head()

(1221, 67)


Unnamed: 0,crash_record_id_x,crash_date_x,posted_speed_limit,traffic_control_device,device_condition,weather_condition,lighting_condition,first_crash_type,trafficway_type,alignment,roadway_surface_cond,road_defect,intersection_related_i,hit_and_run_i,prim_contributory_cause,sec_contributory_cause,street_no,street_direction,street_name,beat_of_occurrence,num_units,injuries_total,crash_hour,crash_day_of_week,crash_month,latitude,longitude,location,crash_unit_id,crash_date_y,unit_no,unit_type,num_passengers,vehicle_id,make,model,lic_plate_state,vehicle_year,vehicle_defect,vehicle_type,vehicle_use,travel_direction,maneuver,occupant_cnt,area_12_i,first_contact_point,person_id,person_type,crash_record_id_y,city,state,zipcode,sex,age,drivers_license_state,drivers_license_class,safety_equipment,airbag_deployed,hospital,ems_agency,driver_action,driver_vision,physical_condition,pedpedal_action,pedpedal_visibility,pedpedal_location,bac_result
0,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,2023-08-05T21:30:00.000,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,STRAIGHT AND LEVEL,UNKNOWN,UNKNOWN,Y,,DISREGARDING TRAFFIC SIGNALS,NOT APPLICABLE,7900,S,YATES BLVD,414,2,1,21,7,8,41.751636,-87.566378,POINT (-87.56637812136 41.751636269168),1633296,2023-08-05T21:30:00.000,1,DRIVER,,1554223.0,NOVA BUS,OTHER (EXPLAIN IN NARRATIVE),IL,2014.0,UNKNOWN,BUS OVER 15 PASS.,CTA,E,STRAIGHT AHEAD,1.0,Y,FRONT,O1633296,DRIVER,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,,,,X,,,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,,,DISREGARDED CONTROL DEVICES,UNKNOWN,UNKNOWN,,,,TEST NOT OFFERED
1,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,2023-08-05T21:30:00.000,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,STRAIGHT AND LEVEL,UNKNOWN,UNKNOWN,Y,,DISREGARDING TRAFFIC SIGNALS,NOT APPLICABLE,7900,S,YATES BLVD,414,2,1,21,7,8,41.751636,-87.566378,POINT (-87.56637812136 41.751636269168),1633297,2023-08-05T21:30:00.000,2,DRIVER,,1554233.0,FORD,FUSION,IL,2011.0,UNKNOWN,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,1.0,Y,FRONT-LEFT-CORNER,O1633297,DRIVER,83f47d2f7607dd5cfc088448c0c6b2628e8df16741e6bf...,CHICAGO,IL,60617.0,F,37.0,IL,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,,,NONE,UNKNOWN,NORMAL,,,,TEST NOT OFFERED
2,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,2023-08-05T21:25:00.000,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,STRAIGHT AND LEVEL,WET,NO DEFECTS,Y,,DISREGARDING STOP SIGN,DISREGARDING STOP SIGN,9198,S,YATES BLVD,413,2,2,21,7,8,41.728152,-87.565916,POINT (-87.565915750169 41.728151636186),1633236,2023-08-05T21:25:00.000,1,DRIVER,2.0,1554169.0,VOLKSWAGEN,JETTA,IL,2016.0,NONE,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,3.0,Y,FRONT-RIGHT-CORNER,O1633236,DRIVER,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,CHICAGO,IL,,F,20.0,IL,D,SAFETY BELT USED,"DEPLOYED, FRONT",,,FAILED TO YIELD,NOT OBSCURED,NORMAL,,,,TEST NOT OFFERED
3,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,2023-08-05T21:25:00.000,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,STRAIGHT AND LEVEL,WET,NO DEFECTS,Y,,DISREGARDING STOP SIGN,DISREGARDING STOP SIGN,9198,S,YATES BLVD,413,2,2,21,7,8,41.728152,-87.565916,POINT (-87.565915750169 41.728151636186),1633236,2023-08-05T21:25:00.000,1,DRIVER,2.0,1554169.0,VOLKSWAGEN,JETTA,IL,2016.0,NONE,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,3.0,Y,FRONT-RIGHT-CORNER,P361406,PASSENGER,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,CHICAGO,IL,,F,75.0,,,SAFETY BELT USED,"DEPLOYED, SIDE",,,,,,,,,
4,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,2023-08-05T21:25:00.000,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,STRAIGHT AND LEVEL,WET,NO DEFECTS,Y,,DISREGARDING STOP SIGN,DISREGARDING STOP SIGN,9198,S,YATES BLVD,413,2,2,21,7,8,41.728152,-87.565916,POINT (-87.565915750169 41.728151636186),1633236,2023-08-05T21:25:00.000,1,DRIVER,2.0,1554169.0,VOLKSWAGEN,JETTA,IL,2016.0,NONE,PASSENGER,PERSONAL,N,STRAIGHT AHEAD,3.0,Y,FRONT-RIGHT-CORNER,P361408,PASSENGER,46950c5f81c77133005e15d19b7f97d1aaa99d967af509...,CHICAGO,IL,,F,17.0,,,SAFETY BELT USED,DID NOT DEPLOY,,,,,,,,,
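
Hand-written drop lists tend to accumulate repeated or misspelled names. One possible safeguard, sketched below, removes repeats and reports unknown names before dropping; the helper `drop_listed_columns` and the toy frame are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

def drop_listed_columns(frame: pd.DataFrame, columns):
    """Drop `columns` from `frame` after removing repeated names and
    checking that every name exists (hypothetical helper)."""
    unique = list(dict.fromkeys(columns))  # de-duplicate, keep original order
    missing = [name for name in unique if name not in frame.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return frame.drop(columns=unique)

# Toy frame with a few of the column names used in this section.
demo = pd.DataFrame({
    "report_type": ["ON SCENE"],
    "damage": ["OVER $1,500"],
    "age": [37.0],
})

# 'damage' is listed twice on purpose; it is only dropped once.
trimmed = drop_listed_columns(demo, ["report_type", "damage", "damage"])
print(trimmed.columns.tolist())
```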


In [24]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1221 entries, 0 to 1220
Data columns (total 67 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   crash_record_id_x        1221 non-null   object 
 1   crash_date_x             1221 non-null   object 
 2   posted_speed_limit       1221 non-null   int64  
 3   traffic_control_device   1221 non-null   object 
 4   device_condition         1221 non-null   object 
 5   weather_condition        1221 non-null   object 
 6   lighting_condition       1221 non-null   object 
 7   first_crash_type         1221 non-null   object 
 8   trafficway_type          1221 non-null   object 
 9   alignment                1221 non-null   object 
 10  roadway_surface_cond     1221 non-null   object 
 11  road_defect              1221 non-null   object 
 12  intersection_related_i   396 non-null    object 
 13  hit_and_run_i            521 non-null    object 
 14  prim_contributory_cause 

### Dropping Redundant Columns

In [25]:
# dropping redundant columns, previewing shape, data and info 
drop = ['crash_record_id_x', 'crash_date_x', 'alignment', 'intersection_related_i', 'sec_contributory_cause',
        'num_units','crash_unit_id', 'vehicle_id', 'person_id', "crash_record_id_y", 'street_no', 'street_direction',
       'street_name', 'location', 'zipcode', 'crash_month', 'latitude', 'longitude', 'crash_date_y', 'unit_no',
       'model', 'vehicle_year', 'vehicle_use', 'travel_direction', 'maneuver', 'occupant_cnt', 'first_contact_point',
       'lic_plate_state', 'city']
df = df.drop(columns=drop)
print(df.shape)
display(df.head())
df.info()

(1221, 38)


Unnamed: 0,posted_speed_limit,traffic_control_device,device_condition,weather_condition,lighting_condition,first_crash_type,trafficway_type,roadway_surface_cond,road_defect,hit_and_run_i,prim_contributory_cause,beat_of_occurrence,injuries_total,crash_hour,crash_day_of_week,unit_type,num_passengers,make,vehicle_defect,vehicle_type,area_12_i,person_type,state,sex,age,drivers_license_state,drivers_license_class,safety_equipment,airbag_deployed,hospital,ems_agency,driver_action,driver_vision,physical_condition,pedpedal_action,pedpedal_visibility,pedpedal_location,bac_result
0,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,UNKNOWN,UNKNOWN,,DISREGARDING TRAFFIC SIGNALS,414,1,21,7,DRIVER,,NOVA BUS,UNKNOWN,BUS OVER 15 PASS.,Y,DRIVER,,X,,,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,,,DISREGARDED CONTROL DEVICES,UNKNOWN,UNKNOWN,,,,TEST NOT OFFERED
1,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,UNKNOWN,UNKNOWN,,DISREGARDING TRAFFIC SIGNALS,414,1,21,7,DRIVER,,FORD,UNKNOWN,PASSENGER,Y,DRIVER,IL,F,37.0,IL,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,,,NONE,UNKNOWN,NORMAL,,,,TEST NOT OFFERED
2,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,WET,NO DEFECTS,,DISREGARDING STOP SIGN,413,2,21,7,DRIVER,2.0,VOLKSWAGEN,NONE,PASSENGER,Y,DRIVER,IL,F,20.0,IL,D,SAFETY BELT USED,"DEPLOYED, FRONT",,,FAILED TO YIELD,NOT OBSCURED,NORMAL,,,,TEST NOT OFFERED
3,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,WET,NO DEFECTS,,DISREGARDING STOP SIGN,413,2,21,7,DRIVER,2.0,VOLKSWAGEN,NONE,PASSENGER,Y,PASSENGER,IL,F,75.0,,,SAFETY BELT USED,"DEPLOYED, SIDE",,,,,,,,,
4,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,WET,NO DEFECTS,,DISREGARDING STOP SIGN,413,2,21,7,DRIVER,2.0,VOLKSWAGEN,NONE,PASSENGER,Y,PASSENGER,IL,F,17.0,,,SAFETY BELT USED,DID NOT DEPLOY,,,,,,,,,


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1221 entries, 0 to 1220
Data columns (total 38 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   posted_speed_limit       1221 non-null   int64  
 1   traffic_control_device   1221 non-null   object 
 2   device_condition         1221 non-null   object 
 3   weather_condition        1221 non-null   object 
 4   lighting_condition       1221 non-null   object 
 5   first_crash_type         1221 non-null   object 
 6   trafficway_type          1221 non-null   object 
 7   roadway_surface_cond     1221 non-null   object 
 8   road_defect              1221 non-null   object 
 9   hit_and_run_i            521 non-null    object 
 10  prim_contributory_cause  1221 non-null   object 
 11  beat_of_occurrence       1221 non-null   int64  
 12  injuries_total           1221 non-null   int64  
 13  crash_hour               1221 non-null   int64  
 14  crash_day_of_week       
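
Some of the columns dropped as redundant, such as the suffixed date and ID columns, are copies created by the merges. The source does not say how redundancy was judged; one way such copies could be detected programmatically is sketched below on toy data, with the helper name `identical_column_pairs` being an assumption.

```python
import pandas as pd

def identical_column_pairs(frame: pd.DataFrame):
    """Return pairs of column names whose values are identical row by row
    (hypothetical helper; compares every pair, so best for modest widths)."""
    pairs = []
    cols = list(frame.columns)
    for i, left in enumerate(cols):
        for right in cols[i + 1:]:
            # Series.equals treats NaNs in matching positions as equal.
            if frame[left].equals(frame[right]):
                pairs.append((left, right))
    return pairs

# Toy frame imitating suffixed columns left behind by a merge (values invented).
demo = pd.DataFrame({
    "crash_date_x": ["2023-08-05T21:30:00.000", "2023-08-05T21:25:00.000"],
    "crash_date_y": ["2023-08-05T21:30:00.000", "2023-08-05T21:25:00.000"],
    "age": [37.0, 20.0],
})

print(identical_column_pairs(demo))
```

Columns that carry the same information in a different form, for example `location` versus `latitude` and `longitude`, would not be caught by this check and still need manual review.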

In [27]:
# Previewing the dataset
df.head()

Unnamed: 0,posted_speed_limit,traffic_control_device,device_condition,weather_condition,lighting_condition,first_crash_type,trafficway_type,roadway_surface_cond,road_defect,hit_and_run_i,prim_contributory_cause,beat_of_occurrence,injuries_total,crash_hour,crash_day_of_week,unit_type,num_passengers,make,vehicle_defect,vehicle_type,area_12_i,person_type,state,sex,age,drivers_license_state,drivers_license_class,safety_equipment,airbag_deployed,hospital,ems_agency,driver_action,driver_vision,physical_condition,pedpedal_action,pedpedal_visibility,pedpedal_location,bac_result
0,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,UNKNOWN,UNKNOWN,,DISREGARDING TRAFFIC SIGNALS,414,1,21,7,DRIVER,,NOVA BUS,UNKNOWN,BUS OVER 15 PASS.,Y,DRIVER,,X,,,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,,,DISREGARDED CONTROL DEVICES,UNKNOWN,UNKNOWN,,,,TEST NOT OFFERED
1,30,TRAFFIC SIGNAL,UNKNOWN,CLEAR,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,UNKNOWN,UNKNOWN,,DISREGARDING TRAFFIC SIGNALS,414,1,21,7,DRIVER,,FORD,UNKNOWN,PASSENGER,Y,DRIVER,IL,F,37.0,IL,,USAGE UNKNOWN,DEPLOYMENT UNKNOWN,,,NONE,UNKNOWN,NORMAL,,,,TEST NOT OFFERED
2,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,WET,NO DEFECTS,,DISREGARDING STOP SIGN,413,2,21,7,DRIVER,2.0,VOLKSWAGEN,NONE,PASSENGER,Y,DRIVER,IL,F,20.0,IL,D,SAFETY BELT USED,"DEPLOYED, FRONT",,,FAILED TO YIELD,NOT OBSCURED,NORMAL,,,,TEST NOT OFFERED
3,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,WET,NO DEFECTS,,DISREGARDING STOP SIGN,413,2,21,7,DRIVER,2.0,VOLKSWAGEN,NONE,PASSENGER,Y,PASSENGER,IL,F,75.0,,,SAFETY BELT USED,"DEPLOYED, SIDE",,,,,,,,,
4,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,FOUR WAY,WET,NO DEFECTS,,DISREGARDING STOP SIGN,413,2,21,7,DRIVER,2.0,VOLKSWAGEN,NONE,PASSENGER,Y,PASSENGER,IL,F,17.0,,,SAFETY BELT USED,DID NOT DEPLOY,,,,,,,,,
