### Predicting the Severity of Automobile Accidents in Seattle, Washington ###

In this first week, you will define your project objectives, find the dataset that you will use for this capstone project, and publish the dataset on GitHub.

In the second week, you will build your machine learning solution.

In the third week, you will finalize your model and be ready to submit your work.

To complete the capstone, you will work on a case study: predicting the severity of an accident.
Wouldn't it be great if something could warn you, given the weather and the road conditions,
about the possibility of getting into a car accident and how severe it would be,
so that you would drive more carefully or even change your travel plans?
Let's use our shared data for Seattle, Washington as an example of how to deal with accident data.

In [1]:
# Import packages.
import pandas as pd
import numpy as np
import itertools
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
import matplotlib.ticker as ticker
from sklearn import preprocessing
%matplotlib inline

In [91]:
# NOTE: >>> help(pd.options.display. <TAB>
# pd.options.display.chop_threshold      pd.options.display.float_format        pd.options.display.max_info_columns    pd.options.display.notebook_repr_html
# pd.options.display.colheader_justify   pd.options.display.html                pd.options.display.max_info_rows       pd.options.display.pprint_nest_depth
# pd.options.display.column_space        pd.options.display.large_repr          pd.options.display.max_rows            pd.options.display.precision
# pd.options.display.date_dayfirst       pd.options.display.latex               pd.options.display.max_seq_items       pd.options.display.show_dimensions
# pd.options.display.date_yearfirst      pd.options.display.max_categories      pd.options.display.memory_usage        pd.options.display.unicode
# pd.options.display.encoding            pd.options.display.max_columns         pd.options.display.min_rows            pd.options.display.width
# pd.options.display.expand_frame_repr   pd.options.display.max_colwidth        pd.options.display.multi_sparse        

# Create a list of display options.
list_of_display_options_fully_qualified_names = str(\
"pd.options.display.chop_threshold, pd.options.display.float_format, pd.options.display.max_info_columns, pd.options.display.notebook_repr_html, \
pd.options.display.colheader_justify, pd.options.display.html, pd.options.display.max_info_rows, pd.options.display.pprint_nest_depth, \
pd.options.display.column_space, pd.options.display.large_repr, pd.options.display.max_rows, pd.options.display.precision, \
pd.options.display.date_dayfirst, pd.options.display.latex, pd.options.display.max_seq_items, pd.options.display.show_dimensions, \
pd.options.display.date_yearfirst, pd.options.display.max_categories, pd.options.display.memory_usage, pd.options.display.unicode, \
pd.options.display.encoding, pd.options.display.max_columns, pd.options.display.min_rows, pd.options.display.width, \
pd.options.display.expand_frame_repr, pd.options.display.max_colwidth, pd.options.display.multi_sparse").split(sep=', ')

# Build the list of short names by taking the last dot-separated part
# of each fully qualified option name.
list_of_display_options_short_names = [
    fully_qualified_option_name.split(sep='.')[-1]
    for fully_qualified_option_name in list_of_display_options_fully_qualified_names
]

# Define dictionary of display option settings.
dict_of_display_option_settings_short_names = {
    "max_info_columns": 100,
    "max_info_rows": 200,
    "max_columns": 100,
    "max_rows": 200,
    "precision": 9,
    "max_seq_items": None,
    "show_dimensions": True,
    "max_categories": 1000000,
    "max_colwidth": 300,
    "float_format": lambda x: '%.9f' % x,
}

# Set pandas display options using the dictionary of short names,
# and display the option/value pairs.
print("Setting display options...")
for key, value in dict_of_display_option_settings_short_names.items():
    # Set display option. The explicit "display." prefix avoids ambiguous
    # matches with other option groups in newer pandas versions.
    pd.set_option("display." + key, value)
    # Print display option name and value.
    print(key, ": ", pd.get_option("display." + key), sep='')

Setting display options...
max_info_columns: 100
max_info_rows: 200
max_columns: 100
max_rows: 200
precision: 9
max_seq_items: None
show_dimensions: True
max_categories: 1000000
max_colwidth: 300
float_format: <function <lambda> at 0x7f17e88c7040>


In [73]:
# Attribute Information URL: https://www.seattle.gov/Documents/Departments/SDOT/GIS/Collisions_OD.pdf
# Read the Collisions Data CSV file and store it as a DataFrame.
url="http://data-seattlecitygis.opendata.arcgis.com/datasets/5b5c745e0f1f48e7a53acec63a0022ab_0.csv?outSR={%22latestWkid%22:2926,%22wkid%22:2926}"
df=pd.read_csv(url, low_memory=False)

In [92]:
# View the first few rows of the collisions DataFrame.
df.head()

Unnamed: 0,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,LOCATION,EXCEPTRSNCODE,EXCEPTRSNDESC,SEVERITYCODE,SEVERITYDESC,COLLISIONTYPE,PERSONCOUNT,PEDCOUNT,PEDCYLCOUNT,VEHCOUNT,INJURIES,SERIOUSINJURIES,FATALITIES,INCDATE,INCDTTM,JUNCTIONTYPE,SDOT_COLCODE,SDOT_COLDESC,INATTENTIONIND,UNDERINFL,WEATHER,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,1273535.0548369,225839.133531317,1,328476,329976,EA08706,Matched,Block,,BROADWAY BETWEEN E COLUMBIA ST AND BOYLSTON AVE,,,1,Property Damage Only Collision,Sideswipe,2,0,0,2,0,0,0,2020/01/22 00:00:00+00,1/22/2020 3:21:00 PM,Mid-Block (not related to intersection),11.0,"MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END AT ANGLE",,N,Raining,Wet,Dark - Street Lights On,,,,11.0,From same direction - both going straight - both moving - sideswipe,0,0,N
1,1274202.09285358,245094.094895035,2,328142,329642,EA06882,Matched,Block,,8TH AVE NE BETWEEN NE 45TH E ST AND NE 47TH ST,,,1,Property Damage Only Collision,Parked Car,2,0,0,2,0,0,0,2020/01/07 00:00:00+00,1/7/2020 8:00:00 AM,Mid-Block (not related to intersection),15.0,"MOTOR VEHICLE STRUCK MOTOR VEHICLE, RIGHT SIDE SIDESWIPE",,N,Clear,Dry,Daylight,,,,32.0,One parked--one moving,0,0,Y
2,1271830.51979515,224042.63650547,3,20700,20700,1181833,Unmatched,Block,,JAMES ST BETWEEN 6TH AVE AND 7TH AVE,,,0,Unknown,,0,0,0,0,0,0,0,2004/01/30 00:00:00+00,1/30/2004,Mid-Block (but intersection related),11.0,"MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END AT ANGLE",,,,,,,4030032.0,,,,0,0,N
3,1272568.5441159,262054.386176392,4,332126,333626,M16001640,Unmatched,Block,,NE NORTHGATE WAY BETWEEN 1ST AVE NE AND NE NORTHGATE DR,,,0,Unknown,,0,0,0,0,0,0,0,2016/01/23 00:00:00+00,1/23/2016,Mid-Block (not related to intersection),11.0,"MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END AT ANGLE",,,,,,,,,,,0,0,N
4,1280249.22181977,207323.482760355,5,328238,329738,3857118,Unmatched,Block,,M L KING JR ER WAY S BETWEEN S ANGELINE ST AND S EDMUNDS ST,,,0,Unknown,,0,0,0,0,0,0,0,2020/01/26 00:00:00+00,1/26/2020,Mid-Block (not related to intersection),28.0,MOTOR VEHICLE RAN OFF ROAD - HIT FIXED OBJECT,,,,,,,,,,,0,0,N


In [93]:
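# Print a concise, technical summary of the collisions DataFrame.
# NOTE: `null_counts` was deprecated in pandas 1.2 in favor of `show_counts`
# and removed in pandas 2.0; use `show_counts=True` on newer versions.
df.info(verbose=True, null_counts=True)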

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 221389 entries, 0 to 221388
Data columns (total 40 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   X                213918 non-null  float64
 1   Y                213918 non-null  float64
 2   OBJECTID         221389 non-null  int64  
 3   INCKEY           221389 non-null  int64  
 4   COLDETKEY        221389 non-null  int64  
 5   REPORTNO         221389 non-null  object 
 6   STATUS           221389 non-null  object 
 7   ADDRTYPE         217677 non-null  object 
 8   INTKEY           71884 non-null   float64
 9   LOCATION         216801 non-null  object 
 10  EXCEPTRSNCODE    100986 non-null  object 
 11  EXCEPTRSNDESC    11779 non-null   object 
 12  SEVERITYCODE     221388 non-null  object 
 13  SEVERITYDESC     221389 non-null  object 
 14  COLLISIONTYPE    195159 non-null  object 
 15  PERSONCOUNT      221389 non-null  int64  
 16  PEDCOUNT         221389 non-null  int6

<h2 id="data_wrangling">Data Wrangling</h2>

Steps for working with missing data:
<ol>
    <li>Identify missing data.</li>
    <li>Deal with missing data.</li>
    <li>Correct data format.</li>
</ol>

<h3 id="identifying_missing_data">Identifying Missing Data</h3>

When the CSV file is read, pandas converts missing values to NaN. We use the pandas Series attribute <code>hasnans</code> to identify the columns that contain missing values.

In [94]:
# Initialize a list to hold the names of all the columns that are missing data.
list_of_columns_with_missing_data = list()

# For each column in the collisions DataFrame,
# if the Series contains at least one NaN, 
# then add the column name to the list of column names that are missing data.
for column in df.columns.values.tolist():
    if df[column].hasnans:
        list_of_columns_with_missing_data.append(column)

print("Number of columns: %d" % df.columns.size)
print()
print("List of column labels:")
print(list(df.columns))
print()
print("Number of columns missing data: %d" % len(list_of_columns_with_missing_data))
print()
print("List of columns missing data:")
print(list_of_columns_with_missing_data)

Number of columns: 40

List of column labels:
['X', 'Y', 'OBJECTID', 'INCKEY', 'COLDETKEY', 'REPORTNO', 'STATUS', 'ADDRTYPE', 'INTKEY', 'LOCATION', 'EXCEPTRSNCODE', 'EXCEPTRSNDESC', 'SEVERITYCODE', 'SEVERITYDESC', 'COLLISIONTYPE', 'PERSONCOUNT', 'PEDCOUNT', 'PEDCYLCOUNT', 'VEHCOUNT', 'INJURIES', 'SERIOUSINJURIES', 'FATALITIES', 'INCDATE', 'INCDTTM', 'JUNCTIONTYPE', 'SDOT_COLCODE', 'SDOT_COLDESC', 'INATTENTIONIND', 'UNDERINFL', 'WEATHER', 'ROADCOND', 'LIGHTCOND', 'PEDROWNOTGRNT', 'SDOTCOLNUM', 'SPEEDING', 'ST_COLCODE', 'ST_COLDESC', 'SEGLANEKEY', 'CROSSWALKKEY', 'HITPARKEDCAR']

Number of columns missing data: 22

List of columns missing data:
['X', 'Y', 'ADDRTYPE', 'INTKEY', 'LOCATION', 'EXCEPTRSNCODE', 'EXCEPTRSNDESC', 'SEVERITYCODE', 'COLLISIONTYPE', 'JUNCTIONTYPE', 'SDOT_COLCODE', 'SDOT_COLDESC', 'INATTENTIONIND', 'UNDERINFL', 'WEATHER', 'ROADCOND', 'LIGHTCOND', 'PEDROWNOTGRNT', 'SDOTCOLNUM', 'SPEEDING', 'ST_COLCODE', 'ST_COLDESC']
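
As a cross-check on the loop above, the same list can be obtained with a single vectorized expression. The sketch below is only an illustration: `toy_df` is a small, made-up DataFrame (not the collisions data), so it runs without downloading the CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the collisions DataFrame.
toy_df = pd.DataFrame({
    "WEATHER": ["Clear", np.nan, "Raining"],
    "VEHCOUNT": [2, 2, 1],
    "SPEEDING": [np.nan, "Y", np.nan],
})

# isna() marks each missing cell; any() reduces column-wise to one flag per column.
columns_with_missing_data = toy_df.columns[toy_df.isna().any()].tolist()
print(columns_with_missing_data)  # ['WEATHER', 'SPEEDING']
```

Applied to the real data, replacing `toy_df` with `df` should give the same list as the loop.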


<h3 id="deal_with_missing_data">Deal with Missing Data</h3>

<ol>
    <li>Drop the Data
        <ol>
            <li>Drop entire row.</li>
            <li>Drop entire column.</li>
        </ol>
    </li>
    <li>Replace the Data
        <ol>
            <li>Replace data by mean.</li>
            <li>Replace data by frequency.</li>
            <li>Replace data based on other functions.</li>
        </ol>
    </li>
        
</ol>
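
The two most common replacement strategies listed above can be sketched as follows. This is a minimal illustration on a made-up DataFrame (`toy_df` and its values are assumptions, not the collisions data): a numeric column is filled with its mean, and a categorical column is filled with its most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one numeric and one categorical column.
toy_df = pd.DataFrame({
    "PERSONCOUNT": [2.0, np.nan, 4.0],
    "WEATHER": ["Clear", "Clear", np.nan],
})

# Replace by mean: mean() skips NaN, so the mean of 2.0 and 4.0 is used.
mean_person_count = toy_df["PERSONCOUNT"].mean()
toy_df["PERSONCOUNT"] = toy_df["PERSONCOUNT"].fillna(mean_person_count)

# Replace by frequency: mode() returns the most frequent value(s), ignoring NaN.
most_frequent_weather = toy_df["WEATHER"].mode()[0]
toy_df["WEATHER"] = toy_df["WEATHER"].fillna(most_frequent_weather)

print(toy_df)
```

Replacing by mean only makes sense for numeric columns; for categorical columns such as the weather or road condition, replacing by frequency is the usual choice.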

Whole columns should be dropped only if most entries in the column are empty.
If the target variable to be predicted, "SEVERITYCODE", is missing from a row,
then that entire row must be dropped from the DataFrame, because a row without a label cannot be used to train or evaluate the model.
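
Both rules can be sketched on a small example. The code below is a hedged illustration, not the notebook's actual procedure: `toy_df` is made up, and the 0.5 cutoff is an assumption that stands in for "most entries".

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names are borrowed from the collisions data.
toy_df = pd.DataFrame({
    "SEVERITYCODE": ["1", np.nan, "2", "1"],
    "SPEEDING": [np.nan, np.nan, "Y", np.nan],
    "WEATHER": ["Clear", "Raining", "Clear", "Overcast"],
})

# Fraction of missing entries per column (the mean of a boolean mask).
missing_fraction = toy_df.isna().mean()

# Columns in which most entries are missing (assumed cutoff: more than half).
mostly_empty_columns = missing_fraction[missing_fraction > 0.5].index.tolist()

# Drop only the rows in which the target, SEVERITYCODE, is missing.
toy_df_with_target = toy_df.dropna(subset=["SEVERITYCODE"])

print(mostly_empty_columns)
print(len(toy_df_with_target))
```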

In [162]:
### DELETE THIS CELL BEFORE PRODUCTION ###

# NOTE: astype(self: ~FrameOrSeries, dtype, copy: bool = True, errors: str = 'raise') -> ~FrameOrSeries 
#    dtype : data type, or dict of column name -> data type
#        Use a numpy.dtype or Python type to cast entire pandas object to
#        the same type. Alternatively, use {col: dtype, ...}, where col is a
#        column label and dtype is a numpy.dtype or Python type to cast one
#        or more of the DataFrame's columns to column-specific types.

# For each column in collision DataFrame:
# (1) print statistical description and relative frequencies of column data;
# (2) cast column to categorical type and print a statistical description and 
#     the relative frequencies of the categorical data in the column.
for column in list(df.columns.values):
    print(df[[column]].describe(include="all"))
    print(column, " (", df[column].dtype, ") " "Relative Frequencies:", sep='')
    print(df[column].value_counts(normalize=True, dropna=False))
    print("As Categorical Type:")
    print(df[[column]].astype(dtype="category").describe(include="all"))
    print(column, " (", df[column].astype(dtype="category").dtype, ") " "Relative Frequencies:", sep='')
    print(df[column].astype(dtype="category").value_counts(normalize=True, dropna=False))
    print()
    print()

                      X
count  213918.000000000
mean  1271146.961145486
std      7361.742580748
min   1249026.115867790
25%   1266676.253613870
50%   1271141.683762450
75%   1276028.910135450
max   1293052.154248880

[8 rows x 1 columns]
X (float64) Relative Frequencies:
nan                 0.033746031
1271306.396978620   0.001337013
1268353.833967600   0.001273776
1271692.215347010   0.001246674
1268385.368431630   0.001219573
                        ...    
1278324.401721810   0.000004517
1269153.197680410   0.000004517
1264316.899115460   0.000004517
1282793.204798370   0.000004517
1274789.992118180   0.000004517
Name: X, Length: 24973, dtype: float64
As Categorical Type:
                       X
count   213918.000000000
unique   24972.000000000
top    1271306.396978620
freq       296.000000000

[4 rows x 1 columns]
X (category) Relative Frequencies:
nan                 0.033746031
1271306.396978620   0.001337013
1268353.833967600   0.001273776
1271692.215347010   0.001246674
126838

In [163]:
# Drop any column from the collisions DataFrame if it satisfies at least one of the following conditions:
# 1) the column contains only identification keys or codes with no predictive value;
# 2) the column's data does not fit into a small number of categories, such as an address or location description;
# 3) a significant proportion (>15%) of the data is NaN; or
# 4) it is not clear how to interpret the data.
list_of_columns_to_drop = ["X",
                           "Y",
                           "OBJECTID",
                           "INCKEY",
                           "COLDETKEY",
                           "REPORTNO",
                           "INTKEY",
                           "LOCATION",
                           "EXCEPTRSNCODE",
                           "EXCEPTRSNDESC",
                           "INATTENTIONIND",
                           "PEDROWNOTGRNT",
                           "SDOTCOLNUM",
                           "SPEEDING",
                          ]

In [120]:
#NOTE: drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
# Drop the selected columns from the collisions DataFrame
# and store the result in a new DataFrame.
df_after_drop_columns = df.drop(columns=list_of_columns_to_drop, inplace=False)

In [121]:
# Print the first few rows of the DataFrame after dropping columns.
df_after_drop_columns.head()

Unnamed: 0,STATUS,ADDRTYPE,SEVERITYCODE,SEVERITYDESC,COLLISIONTYPE,PERSONCOUNT,PEDCOUNT,PEDCYLCOUNT,VEHCOUNT,INJURIES,SERIOUSINJURIES,FATALITIES,INCDATE,INCDTTM,JUNCTIONTYPE,SDOT_COLCODE,SDOT_COLDESC,UNDERINFL,WEATHER,ROADCOND,LIGHTCOND,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,Matched,Block,1,Property Damage Only Collision,Sideswipe,2,0,0,2,0,0,0,2020/01/22 00:00:00+00,1/22/2020 3:21:00 PM,Mid-Block (not related to intersection),11.0,"MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END AT ANGLE",N,Raining,Wet,Dark - Street Lights On,11.0,From same direction - both going straight - both moving - sideswipe,0,0,N
1,Matched,Block,1,Property Damage Only Collision,Parked Car,2,0,0,2,0,0,0,2020/01/07 00:00:00+00,1/7/2020 8:00:00 AM,Mid-Block (not related to intersection),15.0,"MOTOR VEHICLE STRUCK MOTOR VEHICLE, RIGHT SIDE SIDESWIPE",N,Clear,Dry,Daylight,32.0,One parked--one moving,0,0,Y
2,Unmatched,Block,0,Unknown,,0,0,0,0,0,0,0,2004/01/30 00:00:00+00,1/30/2004,Mid-Block (but intersection related),11.0,"MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END AT ANGLE",,,,,,,0,0,N
3,Unmatched,Block,0,Unknown,,0,0,0,0,0,0,0,2016/01/23 00:00:00+00,1/23/2016,Mid-Block (not related to intersection),11.0,"MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END AT ANGLE",,,,,,,0,0,N
4,Unmatched,Block,0,Unknown,,0,0,0,0,0,0,0,2020/01/26 00:00:00+00,1/26/2020,Mid-Block (not related to intersection),28.0,MOTOR VEHICLE RAN OFF ROAD - HIT FIXED OBJECT,,,,,,,0,0,N


In [122]:
# Print a concise, technical summary of the DataFrame after dropping columns.
df_after_drop_columns.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 221389 entries, 0 to 221388
Data columns (total 26 columns):
 #   Column           Dtype  
---  ------           -----  
 0   STATUS           object 
 1   ADDRTYPE         object 
 2   SEVERITYCODE     object 
 3   SEVERITYDESC     object 
 4   COLLISIONTYPE    object 
 5   PERSONCOUNT      int64  
 6   PEDCOUNT         int64  
 7   PEDCYLCOUNT      int64  
 8   VEHCOUNT         int64  
 9   INJURIES         int64  
 10  SERIOUSINJURIES  int64  
 11  FATALITIES       int64  
 12  INCDATE          object 
 13  INCDTTM          object 
 14  JUNCTIONTYPE     object 
 15  SDOT_COLCODE     float64
 16  SDOT_COLDESC     object 
 17  UNDERINFL        object 
 18  WEATHER          object 
 19  ROADCOND         object 
 20  LIGHTCOND        object 
 21  ST_COLCODE       object 
 22  ST_COLDESC       object 
 23  SEGLANEKEY       int64  
 24  CROSSWALKKEY     int64  
 25  HITPARKEDCAR     object 
dtypes: float64(1), int64(9), object(16)
memory u

In [123]:
# For each column in the DataFrame after dropping columns,
# print the relative frequencies of the column's values, counting NaN as a value.
for column in df_after_drop_columns.columns:
    print("Relative frequency:")
    print(df_after_drop_columns[column].value_counts(normalize=True, dropna=False))
    print()

Relative frequency:
Matched     0.881850498
Unmatched   0.118149502
Name: STATUS, Length: 2, dtype: float64

Relative frequency:
Block          0.654580851
Intersection   0.324695446
NaN            0.016766867
Alley          0.003956836
Name: ADDRTYPE, Length: 4, dtype: float64

Relative frequency:
1     0.621512361
2     0.265356454
0     0.097538721
2b    0.014011536
3     0.001576411
NaN   0.000004517
Name: SEVERITYCODE, Length: 6, dtype: float64

Relative frequency:
Property Damage Only Collision   0.621512361
Injury Collision                 0.265356454
Unknown                          0.097543238
Serious Injury Collision         0.014011536
Fatality Collision               0.001576411
Name: SEVERITYDESC, Length: 5, dtype: float64

Relative frequency:
Parked Car   0.219256603
Angles       0.160613219
Rear Ended   0.156660900
NaN          0.118479238
Other        0.111053395
Sideswipe    0.085311375
Left Turn    0.063743004
Pedestrian   0.034622316
Cycles       0.026767364
Right Tu

In [124]:
# NOTE: dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
# Print the number of columns, then drop any row that contains at least one NaN
# and store the result in a new DataFrame.
print("Number of columns: %d" % len(df_after_drop_columns.columns))
df_after_drop_columns_and_rows = df_after_drop_columns.dropna(axis="index", how="any")

Number of columns: 26
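
Note that `dropna` keeps the original row labels, so the index of the result can have gaps. The sketch below shows this on a made-up DataFrame (`toy_df` is hypothetical) and shows how `reset_index` could be used if a contiguous index is wanted; whether to reset the index is a design choice, not something this notebook requires.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with NaN in rows 1 and 3.
toy_df = pd.DataFrame({
    "WEATHER": ["Clear", np.nan, "Raining", "Clear"],
    "ROADCOND": ["Dry", "Wet", "Wet", np.nan],
})

# Drop every row that contains at least one NaN; surviving rows keep their labels.
toy_df_after_drop_rows = toy_df.dropna(axis="index", how="any")
print(toy_df_after_drop_rows.index.tolist())  # [0, 2]

# Optionally renumber the rows from 0, discarding the old labels.
toy_df_reindexed = toy_df_after_drop_rows.reset_index(drop=True)
print(toy_df_reindexed.index.tolist())  # [0, 1]
```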


In [125]:
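# NOTE: info(self, verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None) -> None
# `null_counts` was deprecated in pandas 1.2 in favor of `show_counts` and removed in pandas 2.0.
# Print a summary of the DataFrame after dropping columns and rows, including non-null counts.
df_after_drop_columns_and_rows.info(verbose=True, null_counts=True)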

<class 'pandas.core.frame.DataFrame'>
Int64Index: 188202 entries, 0 to 221388
Data columns (total 26 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   STATUS           188202 non-null  object 
 1   ADDRTYPE         188202 non-null  object 
 2   SEVERITYCODE     188202 non-null  object 
 3   SEVERITYDESC     188202 non-null  object 
 4   COLLISIONTYPE    188202 non-null  object 
 5   PERSONCOUNT      188202 non-null  int64  
 6   PEDCOUNT         188202 non-null  int64  
 7   PEDCYLCOUNT      188202 non-null  int64  
 8   VEHCOUNT         188202 non-null  int64  
 9   INJURIES         188202 non-null  int64  
 10  SERIOUSINJURIES  188202 non-null  int64  
 11  FATALITIES       188202 non-null  int64  
 12  INCDATE          188202 non-null  object 
 13  INCDTTM          188202 non-null  object 
 14  JUNCTIONTYPE     188202 non-null  object 
 15  SDOT_COLCODE     188202 non-null  float64
 16  SDOT_COLDESC     188202 non-null  obje

In [126]:
# Print summary statistics for every column of the DataFrame after dropping columns and rows.
df_after_drop_columns_and_rows.describe(include="all")

Unnamed: 0,STATUS,ADDRTYPE,SEVERITYCODE,SEVERITYDESC,COLLISIONTYPE,PERSONCOUNT,PEDCOUNT,PEDCYLCOUNT,VEHCOUNT,INJURIES,SERIOUSINJURIES,FATALITIES,INCDATE,INCDTTM,JUNCTIONTYPE,SDOT_COLCODE,SDOT_COLDESC,UNDERINFL,WEATHER,ROADCOND,LIGHTCOND,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
count,188202,188202,188202.0,188202,188202,188202.0,188202.0,188202.0,188202.0,188202.0,188202.0,188202.0,188202,188202,188202,188202.0,188202,188202,188202,188202,188202,188202.0,188202,188202.0,188202.0,188202
unique,2,3,5.0,5,10,,,,,,,,6073,159287,7,,39,4,11,9,9,62.0,62,,,2
top,Matched,Block,1.0,Property Damage Only Collision,Parked Car,,,,,,,,2006/11/02 00:00:00+00,11/2/2006,Mid-Block (not related to intersection),,"MOTOR VEHICLE STRUCK MOTOR VEHICLE, FRONT END AT ANGLE",N,Clear,Dry,Daylight,32.0,One parked--one moving,,,N
freq,188201,122252,127532.0,127532,43649,,,,,,,,100,100,88987,,85016,98484,112573,126115,117110,40096.0,40096,,,182231
mean,,,,,,2.478799375,0.044271581,0.031753116,1.965159775,0.429947609,0.017672501,0.001918152,,,,14.476987492,,,,,,,,297.699461217,11000.292122294,
std,,,,,,1.388957476,0.216913453,0.176821207,0.57435554,0.771539217,0.170533827,0.04770533,,,,6.635152101,,,,,,,,3483.854669686,76498.535709304,
min,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,0.0,,,,,,,,0.0,0.0,
25%,,,,,,2.0,0.0,0.0,2.0,0.0,0.0,0.0,,,,11.0,,,,,,,,0.0,0.0,
50%,,,,,,2.0,0.0,0.0,2.0,0.0,0.0,0.0,,,,13.0,,,,,,,,0.0,0.0,
75%,,,,,,3.0,0.0,0.0,2.0,1.0,0.0,0.0,,,,14.0,,,,,,,,0.0,0.0,


<h4>Count the Missing Values in each Column</h4>
<p>
We use a for loop to count the number of missing ("True") values in each column of the collisions DataFrame.
</p>
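
The counting loop described above might look like the following minimal sketch. It assumes a boolean mask built with `isnull()`, in which `True` marks a missing value; `toy_df` is a made-up stand-in for the collisions DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the collisions DataFrame.
toy_df = pd.DataFrame({
    "WEATHER": ["Clear", np.nan, "Raining"],
    "SPEEDING": [np.nan, "Y", np.nan],
    "VEHCOUNT": [2, 2, 1],
})

# Boolean DataFrame of the same shape: True where a value is missing.
missing_data = toy_df.isnull()

# Count the True values in each column; summing booleans counts the Trues.
missing_counts = {}
for column in missing_data.columns:
    missing_counts[column] = int(missing_data[column].sum())
    print(column, ": ", missing_counts[column], sep='')
```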