# Discovery notebook for cycling crash data

Crash data from 2010 to 2017

This data came from: https://zenodo.org/records/5603036

Source: https://zenodo.org/records/5603036/files/louisville.zip

Shape: 1273 rows x 54 columns

In [1]:
import pandas as pd
import matplotlib

DATA = "../data/cycling_safety_louisville.csv"
df = pd.read_csv(DATA)

# Building the data dictionaries


## cycling_safety data
| column name | type | description | value notes | cleaning notes |
|-------------|------|-------------|-------------|----------------|
|  Unnamed: 0 | number | index number for row | | drop | 
| MASTER FILE NUMBER | number | case number for authorities? | | possibly ignore | 
| INVESTIGATING AGENCY | string | agency responding to crash | usually LMPD, but others too | | 
| LOCAL CODE | number | case number for local authority | a couple repeat values | possibly ignore |
| COLLISION STATUS CODE | string | code for case status | usually AC |  | 
| COUNTY NAME | number | county crash occurred in | always "56" (Jefferson County, KY) | drop | 
| ROADWAY NUMBER | alphanumeric | code for state/county/etc roads like KY303 | a lot of null values |  | 
| BLOCK/HOUSE # | number | street number for a location | NAN's and blank values present | | 
| ROADWAY NAME | string | name of primary road where the crash occurred | some blank and null values; there might be some weird long strings | | 
| ROADWAY SUFFIX | string | RD, AVE, LN, WAY, stuff like that | some blank and null values | | 
| ROADWAY DIRECTION CODE | string | S, N, E, W, stuff like that | | | 
| GPS LATITUDE DECIMAL | float/decimal | latitude coordinate | same as "Latitude"? | redundant; drop | 
| GPS LONGITUDE DECIMAL | float/decimal | longitude coordinate | same as "Longitude"? | redundant; drop | 
| MILEPOINT DERIVED | float | mile marker along road | some null values | maybe useful for highways? | 
| COLLISION DATE | string | date of collision | 2010-2017 | |
| COLLISION TIME |string | time of collision | | | 
| INTERSECTION ROADWAY # | number | route number for intersecting road|  blanks and NaN's| | 
| INTERSECTION ROADWAY NAME | string | name of intersecting road | blanks and NaN's; at least one problematic string | | 
| INTERSECTION ROADWAY SFX | string | suffix for intersecting road | blanks and NaN's | | 
| BETWEEN STREET ROADWAY # 1 | number | route number for between road #1 | blanks and NaN's | | 
| BETWEEN STREET ROADWAY NAME 1 | string | name for between road #1 | blanks and NaN's | | 
| BETWEEN STREET ROADWAY SFX 1 | string | suffix for between street #1 | blanks and NaN's | | 
| BETWEEN STREET ROADWAY # 2 | number | route number for between road # 2 | blanks and NaN's | | 
| BETWEEN STREET ROADWAY NAME 2 | string | name for between road #2 | blanks and NaN's | | 
| BETWEEN STREET ROADWAY SFX 2 | string | suffix for between road #2 | blanks and NaN's | | 
| UNITS INVOLVED | number | number of units involved in crash (includes bikes, pedestrians, cars, etc.) | | | 
| MOTOR VEHICLES INVOLVED | number | number of motor vehicles involved in crash  | | | 
| KILLED | number | number of people killed in the crash | | | 
| INJURED | number | number of people injured in the crash | | |
| WEATHER CODE | number | numeric code for WEATHER condition | maps 1:1 with WEATHER | redundant? |
| WEATHER | string | human readable for WEATHER CODE | maps 1:1 with WEATHER CODE | | 
| ROADWAY CONDITION CODE | number | numeric code for ROADWAY CONDITION | | | 
| ROADWAY CONDITION | string | human readable for ROADWAY CONDITION CODE | | |
| HIT & RUN INDICATOR | string | was the crash a hit and run?  | Y/N string indicator | change name to avoid "&"; convert to boolean | 
| ROADWAY TYPE CODE | number | numeric code for ROADWAY TYPE | | | 
| ROADWAY TYPE | string | human readable for ROADWAY TYPE CODE | | | 
| DIRECTIONAL ANALYSIS CODE | number | numeric code for DIRECTIONAL ANALYSIS | | | 
| DIRECTIONAL ANALYSIS | string | human readable for DIRECTIONAL ANALYSIS CODE  | | | 
| MANNER OF COLLISION CODE | number | numeric code for MANNER OF COLLISION  | | | 
| MANNER OF COLLISION | string | human readable for MANNER OF COLLISION CODE | | |
| ROADWAY CHARACTER CODE | number | numeric code for ROADWAY CHARACTER | | | 
| ROADWAY CHARACTER | string | human readable for ROADWAY CHARACTER CODE | | |
| LIGHT CONDITION CODE | number | numeric code for LIGHT CONDITION | | | 
| LIGHT CONDITION | string | human readable string for LIGHT CONDITION CODE | | |
| RAMP FROM ROADWAY ID | number? | road number for off-ramp | all nulls | ignore | 
| RAMP TO ROADWAY ID | number? | road number for on-ramp | all nulls | ignore | 
| SECONDARY COLLISION INDICATOR | string/NAN | was this collision a result of another collision | Y/N string indicator; some null values | is this always 0/False? convert to boolean; investigate null values | 
| hour | int | hour of collision | 24 hour clock | isn't this a repeat of info in COLLISION TIME? | 
| minute | int | minute of collision | normal minutes | isn't this a repeat of info in COLLISION TIME? |
| Date | datestring | another date field? | do these values match with 'COLLISION DATE'? | figure out what the difference is between this field and COLLISION DATE, if any. If they're redundant, ignore one of them |
| Latitude | float | latitude coordinate of crash site | within boundary range for Jefferson County | keep | 
| Longitude | float | longitude coordinate of crash site | within boundary range for Jefferson County | keep |
| geometry | POINT | point location in longitude, latitude form | repeated data elsewhere | drop; can reconstruct from other fields |
| index_right | number | another index value | always 0 | ignore | 
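
The cleaning notes above mark several columns as "drop". A minimal sketch of applying those notes, assuming the list below (copied from the table) is the final set. `drop_marked_columns` is a hypothetical helper, not part of the notebook, and the sample frame holds made-up values:

```python
import pandas as pd

# Columns whose cleaning note in the data dictionary says "drop".
COLUMNS_TO_DROP = [
    "Unnamed: 0",
    "COUNTY NAME",
    "GPS LATITUDE DECIMAL",
    "GPS LONGITUDE DECIMAL",
    "geometry",
]

def drop_marked_columns(frame, columns=COLUMNS_TO_DROP):
    """Return a copy of frame without the listed columns (hypothetical helper)."""
    # errors="ignore" skips names that are not present in the frame.
    return frame.drop(columns=columns, errors="ignore")

# Tiny made-up frame, only to show the call.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "COUNTY NAME": [56, 56],
    "KILLED": [0, 1],
    "Latitude": [38.23185, 38.273995],
})
cleaned = drop_marked_columns(sample)
print(list(cleaned.columns))
```

With the notebook's data, the call would be `drop_marked_columns(df)`. The original frame is left unchanged, so the dropped columns stay available for later checks.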

## Exploring cycling_safety data

In [3]:
# List all the columns
df.columns

Index(['Unnamed: 0', 'MASTER FILE NUMBER', 'INVESTIGATING AGENCY',
       'LOCAL CODE', 'COLLISION STATUS CODE', 'COUNTY NAME', 'ROADWAY NUMBER',
       'BLOCK/HOUSE #', 'ROADWAY NAME', 'ROADWAY SUFFIX',
       'ROADWAY DIRECTION CODE', 'GPS LATITUDE DECIMAL',
       'GPS LONGITUDE DECIMAL', 'MILEPOINT DERIVED', 'COLLISION DATE',
       'COLLISION TIME', 'INTERSECTION ROADWAY #', 'INTERSECTION ROADWAY NAME',
       'INTERSECTION ROADWAY SFX', 'BETWEEN STREET ROADWAY # 1',
       'BETWEEN STREET ROADWAY NAME 1', 'BETWEEN STREET ROADWAY SFX 1',
       'BETWEEN STREET ROADWAY # 2', 'BETWEEN STREET ROADWAY NAME 2',
       'BETWEEN STREET ROADWAY SFX 2', 'UNITS INVOLVED',
       'MOTOR VEHICLES INVOLVED', 'KILLED', 'INJURED', 'WEATHER CODE',
       'WEATHER', 'ROADWAY CONDITION CODE', 'ROADWAY CONDITION',
       'HIT & RUN INDICATOR', 'ROADWAY TYPE CODE', 'ROADWAY TYPE',
       'DIRECTIONAL ANALYSIS CODE', 'DIRECTIONAL ANALYSIS',
       'MANNER OF COLLISION CODE', 'MANNER OF COLLISION',
   

Column names can be organized into groups of related data
### Date / Time columns
['COLLISION DATE', 'COLLISION TIME', 'hour', 'minute', 'Date']
### Geolocation columns
['GPS LATITUDE DECIMAL', 'GPS LONGITUDE DECIMAL', 'Latitude', 'Longitude', 'geometry']
### Address columns
['COUNTY NAME', 'ROADWAY NUMBER', 'BLOCK/HOUSE #', 'ROADWAY NAME', 'ROADWAY SUFFIX',
    'ROADWAY DIRECTION CODE', 'MILEPOINT DERIVED','INTERSECTION ROADWAY #', 'INTERSECTION ROADWAY NAME',
       'INTERSECTION ROADWAY SFX', 'BETWEEN STREET ROADWAY # 1',
       'BETWEEN STREET ROADWAY NAME 1', 'BETWEEN STREET ROADWAY SFX 1',
       'BETWEEN STREET ROADWAY # 2', 'BETWEEN STREET ROADWAY NAME 2',
       'BETWEEN STREET ROADWAY SFX 2', 'RAMP FROM ROADWAY ID', 'RAMP TO ROADWAY ID']
### Code columns
['LOCAL CODE', 'COLLISION STATUS CODE', 'WEATHER CODE', 
 'ROADWAY CONDITION CODE', 'ROADWAY TYPE CODE', 'DIRECTIONAL ANALYSIS CODE',
 'MANNER OF COLLISION CODE', 'ROADWAY CHARACTER CODE', 'LIGHT CONDITION CODE']
#### Some codes are paired with a human-readable string:
['WEATHER', 'ROADWAY CONDITION', 'ROADWAY TYPE', 'DIRECTIONAL ANALYSIS', 'MANNER OF COLLISION',
     'ROADWAY CHARACTER', 'LIGHT CONDITION']
#### Other codes are stand-alone:
['LOCAL CODE', 'COLLISION STATUS CODE']

### Other info
['Unnamed: 0', 'MASTER FILE NUMBER', 'INVESTIGATING AGENCY', 'LOCAL CODE', 'COLLISION STATUS CODE',
  'COUNTY NAME','UNITS INVOLVED', 'MOTOR VEHICLES INVOLVED', 'KILLED', 'INJURED', 'HIT & RUN INDICATOR',
  'SECONDARY COLLISION INDICATOR', 'index_right']


# Columns

### Date and time columns
`['COLLISION DATE', 'COLLISION TIME', 'hour', 'minute', 'Date']`

In [5]:
df[['Date', 'COLLISION DATE']].agg(("min", "max"))

Unnamed: 0,Date,COLLISION DATE
min,2010-01-13 10:00:00,1/10/2011
max,2017-12-22 21:51:00,9/9/2016
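
The `COLLISION DATE` min and max above look wrong because the column holds strings, and strings compare character by character rather than chronologically. A small sketch of the difference, assuming the values follow the M/D/YYYY pattern seen in the outputs. `parse_collision_date` is a hypothetical helper:

```python
from datetime import datetime

def parse_collision_date(text):
    """Parse an M/D/YYYY string such as '2/20/2010' into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()

# A few values taken from the outputs shown in this notebook.
raw = ["2/20/2010", "1/10/2011", "9/9/2016", "12/21/2017"]

# Comparing the raw strings is lexicographic, not chronological.
string_range = (min(raw), max(raw))

# Comparing parsed dates gives the chronological range.
parsed = [parse_collision_date(text) for text in raw]
date_range = (min(parsed), max(parsed))

print(string_range)
print(date_range)
```

On the full column, `pd.to_datetime(df['COLLISION DATE'], format='%m/%d/%Y')` should give a datetime series whose min and max are chronological.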


In [6]:
df[['COLLISION TIME', 'hour', 'minute']].agg(("min", "max"))

Unnamed: 0,COLLISION TIME,hour,minute
min,0,0,0
max,100009,23,59


In [7]:
df[['Date', 'COLLISION DATE', 'COLLISION TIME']]
# These seem to match up. It shouldn't be too hard to write code to verify this.

Unnamed: 0,Date,COLLISION DATE,COLLISION TIME
0,2010-02-20 16:20:00,2/20/2010,1620
1,2010-01-13 13:40:00,1/13/2010,1340
2,2010-01-13 10:00:00,1/13/2010,100008
3,2010-01-15 15:50:00,1/15/2010,1550
4,2010-02-02 06:11:00,2/2/2010,611
...,...,...,...
1268,2017-12-05 07:07:00,12/5/2017,707
1269,2017-12-14 17:09:00,12/14/2017,1709
1270,2017-12-19 10:00:00,12/19/2017,100002
1271,2017-12-21 19:56:00,12/21/2017,1956
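
One possible way to verify that `Date` and `COLLISION DATE` agree, sketched under the assumption that `Date` is read as a `YYYY-MM-DD HH:MM:SS` string and `COLLISION DATE` as M/D/YYYY, as the output above suggests. `dates_agree` is a hypothetical helper:

```python
from datetime import datetime

def dates_agree(date_text, collision_date_text):
    """True when both strings fall on the same calendar day."""
    stamp = datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S")
    collision = datetime.strptime(collision_date_text, "%m/%d/%Y")
    return stamp.date() == collision.date()

# Rows copied from the output above: (Date, COLLISION DATE).
rows = [
    ("2010-02-20 16:20:00", "2/20/2010"),
    ("2010-02-02 06:11:00", "2/2/2010"),
    ("2017-12-05 07:07:00", "12/5/2017"),
]
all_agree = all(dates_agree(date, collision) for date, collision in rows)
print(all_agree)
```

Running the same comparison over `zip(df['Date'], df['COLLISION DATE'])` would list any rows where the two fields differ.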


In [8]:
df[['COLLISION TIME', 'hour', 'minute']]
# These seem to match up. Write a script to check this data

Unnamed: 0,COLLISION TIME,hour,minute
0,1620,16,20
1,1340,13,40
2,100008,10,0
3,1550,15,50
4,611,6,11
...,...,...,...
1268,707,7,7
1269,1709,17,9
1270,100002,10,0
1271,1956,19,56


In [9]:
times = df[['COLLISION TIME', 'hour', 'minute']]
times[(times['hour'] == 10)].sort_values(by='COLLISION TIME')


Unnamed: 0,COLLISION TIME,hour,minute
861,1000,10,0
26,1000,10,0
1052,1000,10,0
104,1000,10,0
1063,1001,10,1
...,...,...,...
1201,100009,10,0
6,100009,10,0
712,100009,10,0
542,100009,10,0
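
The `COLLISION TIME` values mostly read as HHMM integers, but values such as 100008 do not fit that pattern. A sketch of a check that decodes HHMM and flags rows that disagree with `hour` and `minute`. Both function names are hypothetical, and the meaning of the 1000xx values is not known from this data:

```python
def split_hhmm(value):
    """Split an HHMM integer such as 1620 into (hour, minute).

    Returns None when the value is not a valid 24-hour clock time,
    for example the 100002-100009 values seen in COLLISION TIME.
    """
    hour, minute = divmod(int(value), 100)
    if 0 <= hour <= 23 and 0 <= minute <= 59:
        return hour, minute
    return None

def time_fields_agree(collision_time, hour, minute):
    """True when COLLISION TIME decodes to the given hour and minute."""
    return split_hhmm(collision_time) == (hour, minute)

# Rows copied from the outputs above: (COLLISION TIME, hour, minute).
rows = [(1620, 16, 20), (1340, 13, 40), (100008, 10, 0), (611, 6, 11), (707, 7, 7)]
disagreements = [row for row in rows if not time_fields_agree(*row)]
print(disagreements)  # [(100008, 10, 0)]
```

Applied to `zip(df['COLLISION TIME'], df['hour'], df['minute'])`, this would separate the ordinary HHMM rows from the ones that need a closer look.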


### Geolocation columns

`['GPS LATITUDE DECIMAL', 'GPS LONGITUDE DECIMAL', 'Latitude', 'Longitude', 'geometry']`


In [10]:

df[['GPS LONGITUDE DECIMAL', 'Longitude', 'GPS LATITUDE DECIMAL', 'Latitude', 'geometry']]
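# The POINT strings show more digits than the float columns, but pandas rounds floats
# to 6 decimals for display, so the stored values may well be identical.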

Unnamed: 0,GPS LONGITUDE DECIMAL,Longitude,GPS LATITUDE DECIMAL,Latitude,geometry
0,-85.707933,-85.707933,38.231850,38.231850,POINT (-85.707933333 38.23185)
1,-85.696572,-85.696572,38.273995,38.273995,POINT (-85.6965716 38.2739947)
2,-85.703576,-85.703576,38.258551,38.258551,POINT (-85.70357610000001 38.2585512)
3,-85.697265,-85.697265,38.250012,38.250012,POINT (-85.6972652 38.2500121)
4,-85.793380,-85.793380,38.195890,38.195890,POINT (-85.7933803 38.1958905)
...,...,...,...,...,...
1268,-85.733644,-85.733644,38.153815,38.153815,POINT (-85.73364410000001 38.1538153)
1269,-85.688008,-85.688008,38.163618,38.163618,POINT (-85.6880079 38.1636178)
1270,-85.671480,-85.671480,38.160030,38.160030,POINT (-85.6714798 38.1600301)
1271,-85.626309,-85.626309,38.198257,38.198257,POINT (-85.62630919999999 38.1982569)
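
To check whether `geometry` really repeats `Longitude` and `Latitude`, the POINT text can be parsed and compared within a tolerance. This is a sketch that assumes every value has the `POINT (x y)` form shown above. `parse_point` and `point_matches` are hypothetical helpers, and the tolerance is an arbitrary choice:

```python
import math
import re

# Matches text such as "POINT (-85.707933333 38.23185)".
POINT_PATTERN = re.compile(r"POINT \((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)")

def parse_point(wkt):
    """Return (longitude, latitude) from a POINT string."""
    match = POINT_PATTERN.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a POINT string: {wkt!r}")
    return float(match.group(1)), float(match.group(2))

def point_matches(wkt, longitude, latitude, tolerance=1e-6):
    """True when the POINT agrees with the given coordinates within tolerance."""
    x, y = parse_point(wkt)
    return (math.isclose(x, longitude, abs_tol=tolerance)
            and math.isclose(y, latitude, abs_tol=tolerance))

# First two rows from the output above.
print(point_matches("POINT (-85.707933333 38.23185)", -85.707933, 38.231850))
print(point_matches("POINT (-85.6965716 38.2739947)", -85.696572, 38.273995))
```

If every row passes, dropping `geometry` loses nothing that `Longitude` and `Latitude` do not already carry.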


### "... CODE" columns

`['LOCAL CODE', 'COLLISION STATUS CODE', 'WEATHER CODE', 
 'ROADWAY CONDITION CODE', 'ROADWAY TYPE CODE', 'DIRECTIONAL ANALYSIS CODE',
 'MANNER OF COLLISION CODE', 'ROADWAY CHARACTER CODE', 'LIGHT CONDITION CODE']`

`['WEATHER', 'ROADWAY CONDITION', 'ROADWAY TYPE', 'DIRECTIONAL ANALYSIS', 'MANNER OF COLLISION',
'ROADWAY CHARACTER', 'LIGHT CONDITION']`

In [11]:
human_readable = ['WEATHER', 'ROADWAY CONDITION', 'ROADWAY TYPE', 'DIRECTIONAL ANALYSIS', 'MANNER OF COLLISION',
     'ROADWAY CHARACTER', 'LIGHT CONDITION']
codes = {H:H+" CODE" for H in human_readable}
codes

{'WEATHER': 'WEATHER CODE',
 'ROADWAY CONDITION': 'ROADWAY CONDITION CODE',
 'ROADWAY TYPE': 'ROADWAY TYPE CODE',
 'DIRECTIONAL ANALYSIS': 'DIRECTIONAL ANALYSIS CODE',
 'MANNER OF COLLISION': 'MANNER OF COLLISION CODE',
 'ROADWAY CHARACTER': 'ROADWAY CHARACTER CODE',
 'LIGHT CONDITION': 'LIGHT CONDITION CODE'}

In [12]:
# Test each X CODE : X pair. If they map 1:1, the count of unique values in each human-readable
# column equals the count of unique values in its numeric code column.
# Equal counts are necessary for a 1:1 mapping but do not prove it.
all((len(df[key].unique()) == len(df[value].unique())) for key, value in codes.items())

True
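
Matching unique counts can hide a crossed mapping, so a stricter test looks at the distinct (code, label) pairs. A sketch with a hypothetical helper `is_one_to_one`. The two small frames use made-up values:

```python
import pandas as pd

def is_one_to_one(frame, left, right):
    """True when each left value pairs with exactly one right value and vice versa."""
    pairs = frame[[left, right]].drop_duplicates()
    left_repeats = pairs[left].duplicated().any()
    right_repeats = pairs[right].duplicated().any()
    return not (left_repeats or right_repeats)

# Made-up data: a clean mapping, and a crossed one with equal unique counts.
consistent = pd.DataFrame({"CODE": [1, 2, 1], "LABEL": ["A", "B", "A"]})
crossed = pd.DataFrame({"CODE": [1, 1, 2], "LABEL": ["A", "B", "B"]})

print(is_one_to_one(consistent, "CODE", "LABEL"))
print(is_one_to_one(crossed, "CODE", "LABEL"))
```

With the `codes` dictionary from the cell above, `all(is_one_to_one(df, value, key) for key, value in codes.items())` would run the stricter test on every pair. Missing values count as a value of their own here, which matters for columns such as `ROADWAY TYPE CODE`.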

In [13]:
df['ROADWAY TYPE CODE'].value_counts(dropna=False)
# Has some null values
# NAN roadway type code -> NONE OF THE ABOVE in ROADWAY TYPE

ROADWAY TYPE CODE
5.0     665
7.0     270
2.0     255
1.0      48
99.0     19
4.0      11
NaN       4
3.0       1
Name: count, dtype: int64

### Address information

`['COUNTY NAME', 'ROADWAY NUMBER', 'BLOCK/HOUSE #', 'ROADWAY NAME', 'ROADWAY SUFFIX',
    'ROADWAY DIRECTION CODE', 'MILEPOINT DERIVED','INTERSECTION ROADWAY #', 'INTERSECTION ROADWAY NAME',
       'INTERSECTION ROADWAY SFX', 'BETWEEN STREET ROADWAY # 1',
       'BETWEEN STREET ROADWAY NAME 1', 'BETWEEN STREET ROADWAY SFX 1',
       'BETWEEN STREET ROADWAY # 2', 'BETWEEN STREET ROADWAY NAME 2',
       'BETWEEN STREET ROADWAY SFX 2', 'RAMP FROM ROADWAY ID', 'RAMP TO ROADWAY ID']`

In [14]:
# COUNTY NAME column
CN = df['COUNTY NAME']
CN.unique()
# Always "56"
# Ignore this? All reports are in Jefferson County, KY


array([56])

In [15]:
df['ROADWAY NUMBER'].value_counts(dropna=False)
# Many of these are NAN or blank
df['ROADWAY NUMBER'].unique()

array(['US0031  ', nan, 'US0042  ', 'KY1020  ', 'US0031E ', 'US0150  ',
       'KY2051  ', 'KY1631  ', 'KY2053  ', 'KY1747  ', 'US0031W ',
       'KY0061  ', 'KY1865  ', 'US0060  ', 'KY1065  ', 'I 0264  ',
       'KY0155  ', 'KY2048  ', 'KY0864  ', 'US0060A ', 'KY1931  ',
       'KY2049  ', 'KY1447  ', 'KY2860  ', 'KY2050  ', 'KY1934  ',
       'KY2052  ', 'KY1703  ', 'KY1450  ', 'KY3064  ', 'KY1819  ',
       'KY2054  ', 'KY1932  ', 'KY1727  ', 'KY0146  ', 'KY0907  ',
       'KY0841  ', 'KY2251  ', 'KY3082  ', 'I 0064  ', 'KY0913  ',
       'KY2845  ', 'I 0065  ', 'KY2055  ', 'KY3077  ', 'KY1230  ',
       'KY1142  ', 'KY0148  ', '        ', 'I 0265  '], dtype=object)
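
The values above are padded with trailing spaces, and one entry is entirely blank. A minimal sketch of one way to tidy such a column, assuming blanks should be treated the same as nulls. `normalize_text` is a hypothetical helper:

```python
import pandas as pd

def normalize_text(series):
    """Strip surrounding whitespace and turn blank strings into missing values."""
    stripped = series.str.strip()
    # mask() replaces entries where the condition is True with a missing value.
    return stripped.mask(stripped == "")

# Values copied from the output above, plus one null.
sample = pd.Series(["US0031  ", None, "        ", "I 0264  "])
cleaned = normalize_text(sample)
print(cleaned.tolist())
```

The same helper could be applied to the other address columns that mix blanks and NaN values, for example `ROADWAY SUFFIX`.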

In [16]:
df['BLOCK/HOUSE #'].value_counts(dropna=False)
# Building address numbers
# Some crash reports don't have any value here, or a null value
# My assumption is that sometimes crashes don't occur near any particular address

BLOCK/HOUSE #
NaN       1142
            35
600.0        2
4200         2
900.0        2
          ... 
1500.0       1
8020.0       1
9120.0       1
2216.0       1
131.0        1
Name: count, Length: 90, dtype: int64
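
The counts above list both `600.0` and `4200`, which suggests the column mixes floats, strings, and blanks. A sketch of coercing everything to numbers, assuming blank entries should become missing values. `to_number_or_missing` is a hypothetical helper and the sample values are illustrative:

```python
import pandas as pd

def to_number_or_missing(series):
    """Convert values to numbers; anything unparseable becomes NaN."""
    return pd.to_numeric(series, errors="coerce")

# A float, a numeric string, a blank string, and a null.
sample = pd.Series([600.0, "4200", "   ", None], dtype=object)
numbers = to_number_or_missing(sample)
print(numbers.tolist())
```

After this step, blank and null entries are indistinguishable, which fits the assumption that both mean "no address recorded".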

In [17]:
# ROADWAY NAME column 
RN = df['ROADWAY NAME']
RN.info()


<class 'pandas.core.series.Series'>
RangeIndex: 1273 entries, 0 to 1272
Series name: ROADWAY NAME
Non-Null Count  Dtype 
--------------  ----- 
1264 non-null   object
dtypes: object(1)
memory usage: 10.1+ KB


In [18]:
# ROADWAY SUFFIX column 
RS = df['ROADWAY SUFFIX']
RS.unique()
RS.value_counts(dropna=False)
# Some blank values and NaN's

ROADWAY SUFFIX
ST       378
RD       227
AVE      202
LN        90
NaN       81
HWY       79
DR        67
BLVD      49
PKWY      33
TRL       11
CT        10
WAY        9
PL         8
LOOP       7
TRCE       4
TPKE       4
           3
CIR        3
BYP        3
ALY        2
PARK       1
TER        1
PLZ        1
Name: count, dtype: int64

In [19]:
# MILEPOINT DERIVED column
import statistics
MP = df['MILEPOINT DERIVED']
# This has some null values
MP.dropna().agg(("min", "max", statistics.mean, statistics.median, statistics.mode, statistics.stdev))

# Probably not useful for me. Likely will drop column.

min         0.000000
max       124.526000
mean        4.037019
median      1.170000
mode        0.000000
stdev       6.485810
Name: MILEPOINT DERIVED, dtype: float64

In [20]:
# INTERSECTION ROADWAY columns
IRW = df[['INTERSECTION ROADWAY #', 'INTERSECTION ROADWAY NAME', 'INTERSECTION ROADWAY SFX']]
IRW
# Intersecting road where crash occurred. Same types of info as ROADWAY NUMBER, ROADWAY NAME, ROADWAY SUFFIX
# One long string at index [1]; might there be others? 
# NaN's and blanks seem significant, because there may be no intersection for some crashes.

Unnamed: 0,INTERSECTION ROADWAY #,INTERSECTION ROADWAY NAME,INTERSECTION ROADWAY SFX
0,,DEERPARK,
1,I 0071,I71 N EXIT2 OFF RAMP TO ZORN AVE,
2,,JANE,ST
3,,,
4,,CONN,ST
...,...,...,...
1268,,,
1269,,WOODED,WAY
1270,,JEFFERSON,BLVD
1271,,ARJAY,LN


In [21]:
# BETWEEN STREET columns
BWS = df[['BETWEEN STREET ROADWAY # 1', 'BETWEEN STREET ROADWAY NAME 1', 'BETWEEN STREET ROADWAY SFX 1',
       'BETWEEN STREET ROADWAY # 2', 'BETWEEN STREET ROADWAY NAME 2', 'BETWEEN STREET ROADWAY SFX 2']]
BWS[200:210] # Just looking at random slices.
# Similar information as ROADWAY NUMBER, ROADWAY NAME, INTERSECTION ROADWAY #, etc.
# These seem to report crashes that occur between two roads.
# Check to see how this affects the primary/intersecting road information in the other columns

Unnamed: 0,BETWEEN STREET ROADWAY # 1,BETWEEN STREET ROADWAY NAME 1,BETWEEN STREET ROADWAY SFX 1,BETWEEN STREET ROADWAY # 2,BETWEEN STREET ROADWAY NAME 2,BETWEEN STREET ROADWAY SFX 2
200,,,,,,
201,KY2049,CRUMS,LN,,KRISTIN,WAY
202,,,,,,
203,KY3217,37TH,ST,,36TH,ST
204,,,,,,
205,,,,,,
206,,BILLTOWN,RD,,VANTAGE PLAZA,DR
207,,,,,,
208,,,,,,
209,,,,,,


In [22]:
# RAMP FROM/TO ROADWAY column
df[['RAMP FROM ROADWAY ID', 'RAMP TO ROADWAY ID']].info()
# All the values in these two columns are null. Probably can ignore, but check whether similar columns exist in other data.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1273 entries, 0 to 1272
Data columns (total 2 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   RAMP FROM ROADWAY ID  0 non-null      float64
 1   RAMP TO ROADWAY ID    0 non-null      float64
dtypes: float64(2)
memory usage: 20.0 KB


### Other columns
`['Unnamed: 0', 'MASTER FILE NUMBER', 'INVESTIGATING AGENCY', 'LOCAL CODE', 'COLLISION STATUS CODE',
  'COUNTY NAME','UNITS INVOLVED', 'MOTOR VEHICLES INVOLVED', 'KILLED', 'INJURED', 'HIT & RUN INDICATOR',
  'SECONDARY COLLISION INDICATOR', 'index_right']`

In [23]:
# Unnamed: 0 column
unnamed_column = df['Unnamed: 0']
len(unnamed_column) == len(unnamed_column.unique())
# This looks to be an index column.
# I will probably just ignore it. 

True

In [24]:
# index_right column
df.index_right
df.index_right.unique()
# This column is all zeros. I will ignore it entirely. 

array([0])

In [25]:
# MASTER FILE NUMBER column
MFN = df['MASTER FILE NUMBER']
len(MFN.unique()), len(MFN)
# This looks like a unique identifier for each crash report. 
# I will keep this column


(1273, 1273)

In [26]:
# INVESTIGATING AGENCY
IA = df['INVESTIGATING AGENCY']
IA.unique()
IA.value_counts(dropna=False)

INVESTIGATING AGENCY
LOUISVILLE METRO POLICE DEPT      1160
SHIVELY POLICE DEPARTMENT           40
ST. MATTHEWS POLICE DEPARTMENT      29
JEFFERSONTOWN POLICE DEPT           24
UNIV. OF LOUISVILLE POLICE           7
INDIAN HILLS POLICE DEPARTMENT       5
PROSPECT POLICE DEPARTMENT           2
GRAYMOOR-DEVONDALE POLICE DEPT       2
ANCHORAGE POLICE DEPARTMENT          1
WEST BUECHEL POLICE DEPT.            1
NORTHFIELD POLICE DEPARTMENT         1
AUDUBON PARK POLICE DEPARTMENT       1
Name: count, dtype: int64

In [27]:
# LOCAL CODE column
LC = df['LOCAL CODE']
LC.value_counts(dropna=False)
# Not all of these are unique!




LOCAL CODE
8013052741    2
8017098682    2
8010012297    1
8014072189    1
8014076662    1
             ..
8012053577    1
8012052891    1
8012070216    1
8012067992    1
8017103882    1
Name: count, Length: 1271, dtype: int64
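
To see which reports share a `LOCAL CODE`, the repeated rows can be pulled out side by side. A sketch with a hypothetical helper `rows_with_repeated`. The sample frame uses two codes from the output above and made-up `MASTER FILE NUMBER` values:

```python
import pandas as pd

def rows_with_repeated(frame, column):
    """Return every row whose value in column occurs more than once."""
    # keep=False marks all members of a duplicate group, not just the later ones.
    repeated = frame[column].duplicated(keep=False)
    return frame[repeated].sort_values(column)

sample = pd.DataFrame({
    "MASTER FILE NUMBER": [101, 102, 103],  # made-up identifiers
    "LOCAL CODE": [8013052741, 8010012297, 8013052741],
})
repeats = rows_with_repeated(sample, "LOCAL CODE")
print(repeats)
```

Calling `rows_with_repeated(df, 'LOCAL CODE')` on the notebook's data should show whether the repeated codes belong to genuinely different crashes.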

In [28]:
# COLLISION STATUS CODE column
CSC = df['COLLISION STATUS CODE']
CSC.value_counts(dropna=False)
# Figure out what these codes mean


COLLISION STATUS CODE
AC    1256
RE      17
Name: count, dtype: int64

In [29]:
# UNITS INVOLVED column
UI = df["UNITS INVOLVED"]
UI.value_counts(dropna=False)


UNITS INVOLVED
2    1253
3      15
4       3
1       2
Name: count, dtype: int64

In [30]:
# MOTOR VEHICLES INVOLVED column
MVI = df['MOTOR VEHICLES INVOLVED']
MVI.value_counts()


MOTOR VEHICLES INVOLVED
1    1247
2      22
3       4
Name: count, dtype: int64

In [31]:
# KILLED/INJURED/HIT & RUN INDICATOR/SECONDARY COLLISION INDICATOR columns
df["KILLED"].value_counts(dropna=False)

KILLED
0    1262
1      11
Name: count, dtype: int64

In [32]:
df['INJURED'].value_counts(dropna=False)

INJURED
1    863
0    392
2     15
3      2
5      1
Name: count, dtype: int64

In [33]:
df['HIT & RUN INDICATOR'].value_counts(dropna=False)

HIT & RUN INDICATOR
N    1078
Y     195
Name: count, dtype: int64

In [34]:
df['SECONDARY COLLISION INDICATOR'].value_counts(dropna=False)

SECONDARY COLLISION INDICATOR
N      1188
NaN      72
Y        13
Name: count, dtype: int64
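
The data dictionary suggests converting the Y/N indicator columns to booleans. A minimal sketch, assuming nulls should stay missing rather than default to False. `yes_no_to_bool` is a hypothetical helper, and the nullable `"boolean"` dtype is one choice among several:

```python
import pandas as pd

YES_NO = {"Y": True, "N": False}

def yes_no_to_bool(series):
    """Map 'Y'/'N' strings to a nullable boolean series; other values become missing."""
    return series.map(YES_NO).astype("boolean")

# Illustrative values, including a null like those in SECONDARY COLLISION INDICATOR.
sample = pd.Series(["N", "Y", None])
flags = yes_no_to_bool(sample)
print(flags.tolist())
```

For `HIT & RUN INDICATOR`, the result could be stored under a name without the ampersand, for example `df['HIT_AND_RUN'] = yes_no_to_bool(df['HIT & RUN INDICATOR'])`, where `HIT_AND_RUN` is only a suggested name.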