In [1]:
%%latex
\tableofcontents

<IPython.core.display.Latex object>

# readme
- Most of our other Jupyter Notebooks have a main() function at the bottom that runs everything.  
- This notebook is structured differently, with several functions that run in sequence.  
- The reason for the difference is that part of the work has to be done outside this notebook.  
- The IVEware imputation software is available in several languages, but not Python.  We ran it in R using srclib.  
- This notebook prepares the data for the Mode, Random Forest, and IVEware imputations, and does the first two.  Then the user must separately run the IVEware software.  Finally, this notebook pulls in those results and compares the three methods.  

# Methods

- We have the discretized CRSS dataset in '../../Big_Files/CRSS_Binned_Data.csv'
- MissForest is a round-robin imputation method implemented in R, generally considered one of the best imputation methods.  It has several Python implementations.
- We tried to use MissForest, https://pypi.org/project/MissForest/, to impute missing values, but it raised errors, and tracking down their source led us to write our own round-robin implementation.
- We compare here three methods:
    - Round-Robin Random Forest 
        - Our own implementation of Round Robin, using scikit-learn's random forest
        - Using imputation by mode as the starting point
    - Imputation by mode
    - IVEware, using the hyperparameters in the CRSS Imputation report
- To compare, we followed the example for MissForest.
    - We dropped all samples with a missing value, so we would have ground truth.
    - We erased ~15% of the values in each sample.
    - We used each imputation method to impute the missing values, and, for each feature, counted how many did not match the ground truth.
- Our round-robin method
    - In data_NaN, change all of the 'Unknown' to np.NaN.
    - In each feature, count the number of unknown samples.
    - In another copy, data_Mode, impute by mode in all of the features.
    - Starting with the feature with the fewest (nonzero) missing samples:
        - Copy that feature from data_NaN into data_Mode, so that only that feature has missing values.
        - Separate the dataframe into two, one with known values in the target variable (X) and one with unknown values (Z).
        - From the dataframe with known values (X), separate out the target variable (call it 'y')
        - Using Random Forest, build a model that maps X to y.  
        - Use the model to impute the missing values
    - At each iteration we replace the mode-imputed values with RF-imputed values.
- IVEware is available on several platforms, but Python is not one of them.  We run it in R outside this notebook.  Be aware that the random selection of values to erase is different for each run, so the IVEware imputation must be rerun each time the values are erased. 
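
The comparison procedure described in the list above (start from complete rows, erase about 15% of the values, impute, and count mismatches per feature) can be sketched as follows. This is a minimal illustration and not the code used for the reported results: the helper names `mask_values`, `impute_by_mode`, and `errors_per_feature`, the toy data, and the seeds are all assumptions, and only the mode imputer is shown.

```python
import numpy as np
import pandas as pd

def mask_values(complete, frac=0.15, seed=0):
    """Return a copy of `complete` with roughly `frac` of its cells set to NaN,
    plus the boolean array marking which cells were erased."""
    rng = np.random.default_rng(seed)
    erase = rng.random(complete.shape) < frac      # True where a value is erased
    return complete.astype(float).mask(erase), erase

def impute_by_mode(masked):
    """Fill each column's NaN with that column's most frequent observed value."""
    return masked.fillna(masked.mode().iloc[0])

def errors_per_feature(truth, imputed, erase):
    """Count, per feature, the erased cells whose imputed value differs from the truth."""
    wrong = (truth.to_numpy() != imputed.to_numpy()) & erase
    return pd.Series(wrong.sum(axis=0), index=truth.columns)

# Toy stand-in (hypothetical values) for the rows that have no missing values.
rng = np.random.default_rng(1)
truth = pd.DataFrame(rng.integers(0, 3, size=(1000, 4)), columns=list('ABCD'))

masked, erase = mask_values(truth)
imputed = impute_by_mode(masked)
errors = errors_per_feature(truth, imputed, erase)
print(errors)
print('Percent incorrectly imputed: {:.2f}%'.format(100 * errors.sum() / erase.sum()))
```

Any other imputer can be swapped in for `impute_by_mode`, as long as it receives the same `masked` frame, so that all methods are scored against the same erased cells.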

- Once we had analyzed the results and decided that the Random Forest method is best for our work, we implemented it and saved the results to CRSS_Imputed.csv.
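
The round-robin procedure outlined in this section can also be tried on a small frame. The sketch below is a compact illustration under stated assumptions, not the function used later in this notebook: the name `round_robin_impute` and the toy data are hypothetical, missing values are assumed to already be NaN, and the forest uses scikit-learn's default depth.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def round_robin_impute(data, **rf_kwargs):
    """One round-robin pass: mode-impute everything, then re-impute each
    feature with a random forest, from fewest to most missing values."""
    n_missing = data.isna().sum()
    order = n_missing[n_missing > 0].sort_values().index
    imputed = data.fillna(data.mode().iloc[0])       # mode-imputed starting point
    for feature in order:
        unknown = data[feature].isna()               # rows to impute for this feature
        predictors = imputed.drop(columns=feature)
        clf = RandomForestClassifier(random_state=0, **rf_kwargs)
        clf.fit(predictors[~unknown], data.loc[~unknown, feature])
        # Replace the mode-imputed values with the forest's predictions, in place.
        imputed.loc[unknown, feature] = clf.predict(predictors[unknown])
    return imputed

# Toy data (hypothetical): B copies A, so a forest can recover B from A.
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=300).astype(float)
toy = pd.DataFrame({'A': a, 'B': a.copy(),
                    'C': rng.integers(0, 2, size=300).astype(float)})
toy.loc[rng.choice(300, size=30, replace=False), 'B'] = np.nan
toy.loc[rng.choice(300, size=10, replace=False), 'C'] = np.nan

result = round_robin_impute(toy, n_estimators=20)
print(result.isna().sum())
```

Writing the predictions back with `.loc` keeps the rows and columns of the result in the same order as the input, which makes the imputed frame easy to compare with the original.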

# Results of Comparison of Three Imputation Methods

- We ran the imputation on 78 features with 250,389 samples.  
    - The features are those in the CRSS dataset that have data for all of 2016 - 2021, are not the results of imputation by CRSS, may have a pattern (unlike effectively random values such as VINs), and do not have more than 20% of the samples missing.  
    - The features were discretized (binned) down to 2-10 categories before imputation.
    - The samples are those of the 713,566 that have no missing values in any of the 78 features.

- First Run
    - Percentage of Samples Incorrectly Imputed

| | Percentage of Samples Incorrectly Imputed |
| --- | --- |
| Random Forest | 22.95% |
| Mode Imputation | 28.85% |
| IVEware | 26.14% |

    - Comparison of number of errors in the 78 features:

|  | Fewer | Equal | More | Total |
| --- | --- | --- | --- | --- |
| Compare RF to Mode | 50 | 28 | 0 | 78 |
| Compare RF to IVEware | 47 | 0 | 31 | 78 |
| Compare Mode to IVEware | 39 | 0 | 39 | 78 |



    - Number of NaN Imputed Differently by Different Methods

| Quantity | Count |
| --- | --- |
|Total Number of NaN|  2,720,350|
|RF Different from Mode|  301,605|
|RF Different from IVEware|  1,255,018|
|Mode Different from IVEware| 1,412,826|


- Second Run

    - Percentage of Samples Incorrectly Imputed

| | Percentage of Samples Incorrectly Imputed |
| --- | --- |
| Random Forest | 22.76% |
| Mode Imputation | 28.82% |
| IVEware | 28.41% |


    - Comparison of number of errors in the 78 features:

|  | Fewer | Equal | More | Total |
| --- | --- | --- | --- | --- |
| Compare RF to Mode | 48 | 30 | 0 | 78 |
| Compare RF to IVEware | 52 | 0 | 26 | 78 |
| Compare Mode to IVEware | 43 | 0 | 35 | 78 |



    - Number of NaN Imputed Differently by Different Methods

| Quantity | Count |
| --- | --- |
|Total Number of NaN|  2,719,688|
|RF Different from Mode|  307,544|
|RF Different from IVEware|  811,484|
|Mode Different from IVEware| 960,765|
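
To put the counts in the two "Number of NaN Imputed Differently" tables on a common scale, they can be converted to percentages of the total number of erased values. The snippet below is only a worked calculation on the figures reported above; the dictionary layout and the function name `disagreement_rates` are illustrative choices.

```python
# Counts copied from the two "Number of NaN Imputed Differently" tables.
runs = {
    'First Run':  {'total': 2_720_350, 'RF vs Mode': 301_605,
                   'RF vs IVEware': 1_255_018, 'Mode vs IVEware': 1_412_826},
    'Second Run': {'total': 2_719_688, 'RF vs Mode': 307_544,
                   'RF vs IVEware': 811_484, 'Mode vs IVEware': 960_765},
}

def disagreement_rates(counts):
    """Percentage of erased values on which each pair of methods disagrees."""
    total = counts['total']
    return {pair: round(100 * n / total, 1)
            for pair, n in counts.items() if pair != 'total'}

for name, counts in runs.items():
    print(name, disagreement_rates(counts))
```

Random Forest and Mode disagree on about 11% of the erased values in both runs, while IVEware disagrees with either of them on roughly 30% to 52%.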





## Discussion

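- Random Forest is as good as or better than Mode on every feature in both runs: it makes fewer errors on 50 and 48 of the 78 features and the same number on the rest.
- Random Forest beats IVEware on more than half of the features (47 and 52 of 78), though not overwhelmingly, and has a lower overall error rate (22.95% vs. 26.14% in the first run, 22.76% vs. 28.41% in the second).
- IVEware and Mode are comparable in the number of features on which each does better (39 vs. 39, then 35 vs. 43), but IVEware has the lower overall error rate, clearly in the first run (26.14% vs. 28.85%) and only marginally in the second (28.41% vs. 28.82%).
- Random Forest and Mode make largely the same mistakes: they impute different values for only about 11% of the erased values.
- IVEware makes different mistakes from Random Forest and Mode: it differs from each of them on roughly 30% to 52% of the erased values.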

## Conclusion

- Use Random Forest

# Setup
## Import Libraries

In [1]:
import sys, copy, math, time, os

print ('Python version: {}'.format(sys.version))

import numpy as np
print ('NumPy version: {}'.format(np.__version__))
np.set_printoptions(suppress=True)


import pandas as pd
print ('Pandas version:  {}'.format(pd.__version__))
pd.set_option('display.max_rows', 500)

import sklearn
print ('SciKit-Learn version: {}'.format(sklearn.__version__))
from sklearn.model_selection import train_test_split

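# Some older packages still import sklearn.neighbors.base, which scikit-learn
# renamed to sklearn.neighbors._base.  Alias the old module path to the new one.
import sklearn.neighbors._base
sys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base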

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor

from sklearn.experimental import enable_iterative_imputer 
from sklearn.impute import IterativeImputer, SimpleImputer

from missforest.missforest import MissForest

# Set Randomness.  Copied from https://www.kaggle.com/code/abazdyrev/keras-nn-focal-loss-experiments
import random
#np.random.seed(42) # NumPy
#random.seed(42) # Python
#tf.set_random_seed(42) # Tensorflow

from IPython.display import Audio
sound_file = './beep.wav'

import warnings
warnings.filterwarnings('ignore')

print ('Finished Importing Libraries')


Python version: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:13) [Clang 14.0.6 ]
NumPy version: 1.24.0
Pandas version:  1.5.2
SciKit-Learn version: 1.2.0


ModuleNotFoundError: No module named 'missforest.missforest'

## Get Data
- Drop features imputed by CRSS
- Drop the correlation features CASENUM, VEH_NO, and PER_NO.  They exist only to let us correlate the data in the ACCIDENT, VEHICLE, and PERSON datasets, which we have already done.

In [3]:
def Get_Data():
    print ('Get_Data')
    data = pd.read_csv('../../Big_Files/CRSS_Binned_Data.csv', low_memory=False)
    print ('data.shape = ', data.shape)
    print ()

    print ('data.shape = ', data.shape)
    print ()
    
    print ("Features:")
    Features = sorted(list(data.columns))
    for feature in Features:
        print ("    ",feature)
    
    return data

In [4]:
data = Get_Data()


Get_Data
data.shape =  (817623, 67)

data.shape =  (817623, 67)

Features:
     ACC_TYPE
     AGE
     AIR_BAG
     ALC_STATUS
     BODY_TYP
     CARGO_BT
     DAY_WEEK
     DEFORMED
     DR_ZIP
     EJECTION
     HARM_EV
     HIT_RUN
     HOSPITAL
     HOUR
     IMPACT1
     INJ_SEV
     INT_HWY
     J_KNIFE
     LGT_COND
     MAKE
     MAK_MOD
     MAN_COLL
     MAX_SEV
     MAX_VSEV
     MODEL
     MONTH
     M_HARM
     NUMOCCS
     NUM_INJ
     NUM_INJV
     PCRASH4
     PCRASH5
     PERMVIT
     PER_TYP
     PJ
     PSU
     PVH_INVL
     P_CRASH1
     P_CRASH2
     REGION
     RELJCT1
     RELJCT2
     REL_ROAD
     REST_MIS
     REST_USE
     ROLINLOC
     ROLLOVER
     SEAT_POS
     SEX
     SPEC_USE
     SPEEDREL
     TOWED
     TOW_VEH
     TYP_INT
     URBANICITY
     VALIGN
     VEH_AGE
     VE_FORMS
     VE_TOTAL
     VPROFILE
     VSPD_LIM
     VSURCOND
     VTCONT_F
     VTRAFCON
     VTRAFWAY
     WEATHER
     WRK_ZONE


## Tools

In [13]:
def Impute_Round_Robin(data):
    print ('Impute()')
    pd.set_option('display.max_columns', None)
    
    # Replace 99 with np.NaN
    data.replace({99: np.nan}, inplace=True)
    display(data.head(20))
    print ()
    
    # Make a list of features with missing samples, 
    #     ordered by the number of missing samples, 
    #     from least to most.  
    Missing = []
    Complete = []
    for feature in data:
        s = data[feature].isna().sum()
        if s==0:
            Complete.append([feature, s])
        if s>0:
            Missing.append([feature, s])
    Missing = sorted (Missing, key=lambda x:x[1], reverse=False)
    print ()
    print ('Complete[]')
    display(Complete)
    print ()
    print ('Missing[]')
    display(Missing)
    print ()
    
    print ('Make data_Mode')
    print ()
    data_Mode = pd.DataFrame()
    for X in Complete:
        feature = X[0]
        data_Mode[feature] = data[feature]
    for M in Missing:
        feature = M[0]
        m = data[feature].mode()[0]
        print (feature, M[1], m)
        data_Mode[feature] = data[feature].fillna(m)
    print ('data_Mode')
    display(data_Mode.head(20))

    print ()
    print ('Make starting point for data_Imputed')
    data_Imputed = pd.DataFrame()
    for X in Complete:
        feature = X[0]
        data_Imputed[feature] = data[feature]
    for X in Missing:
        feature = X[0]
        data_Imputed[feature] = data_Mode[feature]
    print ('data_Imputed')
    display(data_Imputed.head(20))
    print ()

    print ('Start Loop')
    print ()
    n = 0
    for M in Missing:
        n += 1
        print (M)
        feature = M[0]
        print (feature)
        data_Imputed[feature] = data[feature]
#        print ()
#        print ('data[feature].isna().sum()')
#        print (data[feature].isna().sum())
#        print ('data_Imputed[feature].isna().sum()')
#        print (data_Imputed[feature].isna().sum())
#        print ()
        W = data_Imputed.dropna(subset=[feature])
        X = data_Imputed.dropna(subset=[feature])
        y = X[feature]
        X.drop(columns=feature, inplace=True)
        Z = data_Imputed[data_Imputed[feature].isna()]
        Z.drop(columns=feature, inplace=True)
#        Z.reset_index(drop=True, inplace=True)
#        print (data.shape)
#        print (X.shape)
#        display(X.head(40))
#        display(y.head(40))
#        print (Z.shape)
#        display(Z)
        clf = RandomForestClassifier(max_depth=2, random_state=0)
        clf.fit(X,y)
#        print ('clf.predict(Z)')
        z = clf.predict(Z)
        print (len(z))
        display(z)
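        # Write the predictions back in place, matched on the row index, so that
        # the row and column order of data_Imputed stay the same as in data.
        data_Imputed.loc[Z.index, feature] = z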
#        display(data_Imputed.head(60))
        print (data_Imputed.shape)
        print ()
#        data_Imputed.sort_values(
#            by = ['CASENUM', 'VEH_NO', 'PER_NO'], 
#            ascending = [True, True, True], 
#            inplace=True
#        )
#        print ()
#        print ('data.PER_NO.equals(data_Imputed.PER_NO)')
#        print (data.PER_NO.equals(data_Imputed.PER_NO))
#        print ()
               
        Check_Feature(data, data_Imputed, feature)
#        if n==10:
#            return data_Imputed
    
    
    
    
    print ()
    return data_Imputed

In [14]:
def Impute_Full(data):
    print ('Impute()')
    data.replace({'Unknown': np.nan}, inplace=True)
    for feature in data:
        print (feature, len(pd.unique(data[feature])))
    print ()
    mf = MissForest()
    data = mf.fit_transform(data)
    return data

In [15]:
def Check(data, data_Imputed):
    Features = data.columns
    print (Features)
    for feature in Features:
        U = pd.unique(data[feature]).tolist()
        print (U)
        A = []
        for u in U:
            a = len(data[data[feature]==u])
            b = len(data_Imputed[data_Imputed[feature]==u])
            A.append([u, a, b])
        display(A)
        print ()


In [16]:
def Check_Feature(data, data_Imputed, feature):
    U = pd.unique(data[feature]).tolist()
    U = [x for x in U if x == x]  # drop NaN, the only value that is not equal to itself
    print (U)
    A = []
    for u in U:
        a = len(data[data[feature]==u])
        b = len(data_Imputed[data_Imputed[feature]==u])
        A.append([u, a, b, b-a])
    a = data[feature].isna().sum()
    b = data_Imputed[feature].isna().sum()
    A.append(['NaN', a, b, 0])
    A = pd.DataFrame(A, columns=['Value', 'Original', 'Imputed', 'Difference'])
    display(A)
    print ()


# Impute using Random Forest and Save for Next Step

In [17]:
def Impute_Using_Random_Forest():
    data = Get_Data()
    
#    data_Imputed = Impute_Full(data)
    data_Imputed = Impute_Round_Robin(data)
    data_Imputed.to_csv('../../Big_Files/CRSS_Imputed.csv', index=False)
#    display(data_Imputed.head(50))
    
    Check(data, data_Imputed)
#    display(Audio(sound_file, autoplay=True))
    return 0



In [None]:
def Impute_Using_Miss_Forest():
    data = Get_Data()
    
    # In the scikit-learn signature, the bare `*` only marks the arguments that
    # follow it as keyword-only.  It is not part of a call, so it is omitted here.
    imp = IterativeImputer(
        estimator=RandomForestClassifier(), # an estimator instance, not the class
        # estimator=None, 
        missing_values=99, 
        sample_posterior=False, 
        max_iter=10, 
        tol=0.001, 
        n_nearest_features=None, 
        initial_strategy='most_frequent', # default is 'mean'
        # fill_value is left at its default (None)
        imputation_order='ascending', 
        skip_complete=False, 
        min_value=0, # default value is -inf
        max_value=9, # default value is inf
        verbose=0, 
        random_state=None, 
        add_indicator=False, 
        keep_empty_features=False
    )
    
    

In [18]:
%%time
Impute_Using_Random_Forest()
Impute_Using_Miss_Forest()

Get_Data
data.shape =  (817623, 67)

data.shape =  (817623, 67)

Features:
     ACC_TYPE
     AGE
     AIR_BAG
     ALC_STATUS
     BODY_TYP
     CARGO_BT
     DAY_WEEK
     DEFORMED
     DR_ZIP
     EJECTION
     HARM_EV
     HIT_RUN
     HOSPITAL
     HOUR
     IMPACT1
     INJ_SEV
     INT_HWY
     J_KNIFE
     LGT_COND
     MAKE
     MAK_MOD
     MAN_COLL
     MAX_SEV
     MAX_VSEV
     MODEL
     MONTH
     M_HARM
     NUMOCCS
     NUM_INJ
     NUM_INJV
     PCRASH4
     PCRASH5
     PERMVIT
     PER_TYP
     PJ
     PSU
     PVH_INVL
     P_CRASH1
     P_CRASH2
     REGION
     RELJCT1
     RELJCT2
     REL_ROAD
     REST_MIS
     REST_USE
     ROLINLOC
     ROLLOVER
     SEAT_POS
     SEX
     SPEC_USE
     SPEEDREL
     TOWED
     TOW_VEH
     TYP_INT
     URBANICITY
     VALIGN
     VEH_AGE
     VE_FORMS
     VE_TOTAL
     VPROFILE
     VSPD_LIM
     VSURCOND
     VTCONT_F
     VTRAFCON
     VTRAFWAY
     WEATHER
     WRK_ZONE
Impute()


Unnamed: 0,HOSPITAL,ACC_TYPE,AGE,AIR_BAG,ALC_STATUS,BODY_TYP,CARGO_BT,DAY_WEEK,DEFORMED,DR_ZIP,EJECTION,HARM_EV,HIT_RUN,HOUR,IMPACT1,INJ_SEV,INT_HWY,J_KNIFE,LGT_COND,MAKE,MAK_MOD,MAN_COLL,MAX_SEV,MAX_VSEV,MODEL,MONTH,M_HARM,NUMOCCS,NUM_INJ,NUM_INJV,PCRASH4,PCRASH5,PERMVIT,PER_TYP,PJ,PSU,PVH_INVL,P_CRASH1,P_CRASH2,REGION,RELJCT1,RELJCT2,REL_ROAD,REST_MIS,REST_USE,ROLINLOC,ROLLOVER,SEAT_POS,SEX,SPEC_USE,SPEEDREL,TOWED,TOW_VEH,TYP_INT,URBANICITY,VALIGN,VEH_AGE,VE_FORMS,VE_TOTAL,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,WEATHER,WRK_ZONE
0,0,9.0,9.0,4.0,,2.0,0.0,1,2.0,8.0,0.0,3.0,0.0,4.0,4.0,3.0,0.0,0,3.0,2.0,4.0,4.0,3.0,3.0,3.0,0,2.0,0.0,0.0,0.0,2.0,2.0,5,1.0,9,9,0,5.0,8.0,3,0.0,1.0,2.0,2,1.0,2.0,2,2.0,1.0,1.0,2.0,4.0,0.0,0.0,0,2.0,6.0,1,1,3.0,2.0,0.0,,,,0.0,0
1,0,8.0,6.0,4.0,,3.0,0.0,1,2.0,8.0,0.0,3.0,0.0,4.0,0.0,3.0,0.0,0,3.0,6.0,4.0,4.0,3.0,3.0,6.0,0,2.0,0.0,0.0,0.0,2.0,1.0,5,1.0,9,9,0,1.0,8.0,3,0.0,1.0,2.0,2,1.0,2.0,2,2.0,1.0,1.0,2.0,4.0,0.0,0.0,0,2.0,1.0,1,1,3.0,2.0,0.0,,,,0.0,0
2,0,5.0,3.0,1.0,,7.0,0.0,1,0.0,8.0,0.0,3.0,0.0,6.0,1.0,2.0,0.0,0,1.0,7.0,6.0,0.0,2.0,2.0,7.0,0,2.0,0.0,4.0,1.0,2.0,1.0,3,1.0,9,9,0,2.0,6.0,3,0.0,0.0,2.0,2,1.0,2.0,2,2.0,1.0,1.0,2.0,0.0,0.0,,0,2.0,9.0,1,1,2.0,2.0,0.0,,,,0.0,0
3,0,3.0,4.0,1.0,,2.0,0.0,1,0.0,8.0,0.0,3.0,0.0,6.0,1.0,2.0,0.0,0,1.0,6.0,3.0,0.0,2.0,2.0,5.0,0,2.0,2.0,4.0,3.0,2.0,1.0,3,1.0,9,9,0,1.0,3.0,3,0.0,0.0,2.0,2,1.0,2.0,2,2.0,0.0,1.0,2.0,0.0,0.0,,0,2.0,5.0,1,1,2.0,2.0,0.0,,,,0.0,0
4,0,3.0,4.0,3.0,2.0,2.0,0.0,1,0.0,8.0,0.0,3.0,0.0,6.0,1.0,2.0,0.0,0,1.0,6.0,3.0,0.0,2.0,2.0,5.0,0,2.0,2.0,4.0,3.0,2.0,1.0,3,0.0,9,9,0,1.0,3.0,3,0.0,0.0,2.0,2,1.0,2.0,2,0.0,1.0,1.0,2.0,0.0,0.0,,0,2.0,5.0,1,1,2.0,2.0,0.0,,,,0.0,0
5,0,3.0,3.0,3.0,2.0,2.0,0.0,1,0.0,8.0,0.0,3.0,0.0,6.0,1.0,2.0,0.0,0,1.0,6.0,3.0,0.0,2.0,2.0,5.0,0,2.0,2.0,4.0,3.0,2.0,1.0,3,0.0,9,9,0,1.0,3.0,3,0.0,0.0,2.0,2,1.0,2.0,2,3.0,1.0,1.0,2.0,0.0,0.0,,0,2.0,5.0,1,1,2.0,2.0,0.0,,,,0.0,0
6,1,2.0,6.0,4.0,,2.0,0.0,2,0.0,8.0,1.0,2.0,0.0,2.0,2.0,0.0,0.0,0,1.0,1.0,1.0,1.0,0.0,0.0,3.0,0,0.0,0.0,1.0,1.0,0.0,0.0,0,1.0,9,9,0,1.0,0.0,3,0.0,1.0,0.0,2,0.0,1.0,0,2.0,0.0,1.0,2.0,0.0,0.0,0.0,0,2.0,6.0,0,0,3.0,5.0,0.0,,,1.0,0.0,0
7,0,0.0,,,2.0,,0.0,4,,,0.0,3.0,1.0,6.0,1.0,,0.0,0,1.0,,,0.0,3.0,,,0,2.0,,0.0,,,,5,1.0,9,9,0,1.0,6.0,3,0.0,0.0,2.0,2,,2.0,2,2.0,,1.0,2.0,4.0,0.0,,0,2.0,,1,1,2.0,3.0,1.0,,,,0.0,0
8,0,0.0,5.0,4.0,2.0,2.0,0.0,4,4.0,8.0,0.0,3.0,0.0,6.0,1.0,3.0,0.0,0,1.0,6.0,3.0,0.0,3.0,3.0,2.0,0,2.0,0.0,0.0,0.0,2.0,1.0,5,1.0,9,9,0,1.0,3.0,3,0.0,0.0,2.0,2,1.0,2.0,2,2.0,0.0,1.0,2.0,4.0,0.0,,0,2.0,5.0,1,1,2.0,3.0,1.0,,,,0.0,0
9,0,8.0,7.0,,2.0,7.0,0.0,3,3.0,5.0,0.0,3.0,0.0,5.0,1.0,3.0,0.0,2,3.0,6.0,6.0,3.0,3.0,3.0,7.0,0,2.0,0.0,0.0,0.0,2.0,1.0,4,1.0,5,2,0,1.0,9.0,3,0.0,1.0,2.0,2,1.0,2.0,2,2.0,1.0,1.0,1.0,4.0,2.0,0.0,0,2.0,9.0,1,1,0.0,3.0,0.0,1.0,0.0,1.0,0.0,0




Complete[]


[['HOSPITAL', 0],
 ['DAY_WEEK', 0],
 ['J_KNIFE', 0],
 ['MONTH', 0],
 ['PERMVIT', 0],
 ['PJ', 0],
 ['PSU', 0],
 ['PVH_INVL', 0],
 ['REGION', 0],
 ['REST_MIS', 0],
 ['ROLLOVER', 0],
 ['URBANICITY', 0],
 ['VE_FORMS', 0],
 ['VE_TOTAL', 0],
 ['WRK_ZONE', 0]]


Missing[]


[['HIT_RUN', 22],
 ['INT_HWY', 74],
 ['PER_TYP', 155],
 ['REL_ROAD', 198],
 ['HARM_EV', 300],
 ['M_HARM', 329],
 ['ROLINLOC', 415],
 ['TOW_VEH', 950],
 ['HOUR', 2553],
 ['PCRASH5', 2886],
 ['MAN_COLL', 3055],
 ['LGT_COND', 3886],
 ['MAX_SEV', 5917],
 ['NUM_INJ', 5917],
 ['MAK_MOD', 9650],
 ['P_CRASH1', 11721],
 ['SPEC_USE', 11944],
 ['SEAT_POS', 12478],
 ['IMPACT1', 12983],
 ['SPEEDREL', 14440],
 ['P_CRASH2', 15057],
 ['MAX_VSEV', 20225],
 ['NUM_INJV', 20225],
 ['VEH_AGE', 21546],
 ['INJ_SEV', 22070],
 ['MAKE', 22473],
 ['CARGO_BT', 24182],
 ['MODEL', 26805],
 ['NUMOCCS', 28035],
 ['WEATHER', 29295],
 ['BODY_TYP', 29722],
 ['SEX', 30276],
 ['PCRASH4', 33775],
 ['VSURCOND', 34537],
 ['EJECTION', 40329],
 ['TOWED', 40553],
 ['RELJCT2', 41803],
 ['VALIGN', 44487],
 ['AGE', 46539],
 ['DR_ZIP', 60921],
 ['AIR_BAG', 65897],
 ['REST_USE', 68246],
 ['VTCONT_F', 70044],
 ['TYP_INT', 72438],
 ['VTRAFCON', 72727],
 ['ACC_TYPE', 80816],
 ['VSPD_LIM', 106127],
 ['VPROFILE', 111013],
 ['VTRAFWAY', 1


Make data_Mode

HIT_RUN 22 0.0
INT_HWY 74 0.0
PER_TYP 155 1.0
REL_ROAD 198 2.0
HARM_EV 300 3.0
M_HARM 329 2.0
ROLINLOC 415 2.0
TOW_VEH 950 0.0
HOUR 2553 5.0
PCRASH5 2886 1.0
MAN_COLL 3055 3.0
LGT_COND 3886 3.0
MAX_SEV 5917 3.0
NUM_INJ 5917 0.0
MAK_MOD 9650 3.0
P_CRASH1 11721 1.0
SPEC_USE 11944 1.0
SEAT_POS 12478 2.0
IMPACT1 12983 1.0
SPEEDREL 14440 2.0
P_CRASH2 15057 8.0
MAX_VSEV 20225 3.0
NUM_INJV 20225 0.0
VEH_AGE 21546 1.0
INJ_SEV 22070 3.0
MAKE 22473 6.0
CARGO_BT 24182 0.0
MODEL 26805 3.0
NUMOCCS 28035 0.0
WEATHER 29295 0.0
BODY_TYP 29722 2.0
SEX 30276 1.0
PCRASH4 33775 2.0
VSURCOND 34537 0.0
EJECTION 40329 0.0
TOWED 40553 4.0
RELJCT2 41803 1.0
VALIGN 44487 2.0
AGE 46539 6.0
DR_ZIP 60921 5.0
AIR_BAG 65897 4.0
REST_USE 68246 1.0
VTCONT_F 70044 1.0
TYP_INT 72438 0.0
VTRAFCON 72727 0.0
ACC_TYPE 80816 8.0
VSPD_LIM 106127 3.0
VPROFILE 111013 3.0
VTRAFWAY 129908 0.0
ALC_STATUS 135050 2.0
DEFORMED 142194 0.0
RELJCT1 151563 0.0
data_Mode


Unnamed: 0,HOSPITAL,DAY_WEEK,J_KNIFE,MONTH,PERMVIT,PJ,PSU,PVH_INVL,REGION,REST_MIS,ROLLOVER,URBANICITY,VE_FORMS,VE_TOTAL,WRK_ZONE,HIT_RUN,INT_HWY,PER_TYP,REL_ROAD,HARM_EV,M_HARM,ROLINLOC,TOW_VEH,HOUR,PCRASH5,MAN_COLL,LGT_COND,MAX_SEV,NUM_INJ,MAK_MOD,P_CRASH1,SPEC_USE,SEAT_POS,IMPACT1,SPEEDREL,P_CRASH2,MAX_VSEV,NUM_INJV,VEH_AGE,INJ_SEV,MAKE,CARGO_BT,MODEL,NUMOCCS,WEATHER,BODY_TYP,SEX,PCRASH4,VSURCOND,EJECTION,TOWED,RELJCT2,VALIGN,AGE,DR_ZIP,AIR_BAG,REST_USE,VTCONT_F,TYP_INT,VTRAFCON,ACC_TYPE,VSPD_LIM,VPROFILE,VTRAFWAY,ALC_STATUS,DEFORMED,RELJCT1
0,0,1,0,0,5,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,4.0,2.0,4.0,3.0,3.0,0.0,4.0,5.0,1.0,2.0,4.0,2.0,8.0,3.0,0.0,6.0,3.0,2.0,0.0,3.0,0.0,0.0,2.0,1.0,2.0,0.0,0.0,4.0,1.0,2.0,9.0,8.0,4.0,1.0,1.0,0.0,0.0,9.0,2.0,3.0,0.0,2.0,2.0,0.0
1,0,1,0,0,5,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,4.0,1.0,4.0,3.0,3.0,0.0,4.0,1.0,1.0,2.0,0.0,2.0,8.0,3.0,0.0,1.0,3.0,6.0,0.0,6.0,0.0,0.0,3.0,1.0,2.0,0.0,0.0,4.0,1.0,2.0,6.0,8.0,4.0,1.0,1.0,0.0,0.0,8.0,2.0,3.0,0.0,2.0,2.0,0.0
2,0,1,0,0,3,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,2.0,4.0,6.0,2.0,1.0,2.0,1.0,2.0,6.0,2.0,1.0,9.0,2.0,7.0,0.0,7.0,0.0,0.0,7.0,1.0,2.0,0.0,0.0,0.0,0.0,2.0,3.0,8.0,1.0,1.0,1.0,0.0,0.0,5.0,2.0,2.0,0.0,2.0,0.0,0.0
3,0,1,0,0,3,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,2.0,4.0,3.0,1.0,1.0,2.0,1.0,2.0,3.0,2.0,3.0,5.0,2.0,6.0,0.0,5.0,2.0,0.0,2.0,0.0,2.0,0.0,0.0,0.0,0.0,2.0,4.0,8.0,1.0,1.0,1.0,0.0,0.0,3.0,2.0,2.0,0.0,2.0,0.0,0.0
4,0,1,0,0,3,9,9,0,3,2,2,0,1,1,0,0.0,0.0,0.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,2.0,4.0,3.0,1.0,1.0,0.0,1.0,2.0,3.0,2.0,3.0,5.0,2.0,6.0,0.0,5.0,2.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,2.0,4.0,8.0,3.0,1.0,1.0,0.0,0.0,3.0,2.0,2.0,0.0,2.0,0.0,0.0
5,0,1,0,0,3,9,9,0,3,2,2,0,1,1,0,0.0,0.0,0.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,2.0,4.0,3.0,1.0,1.0,3.0,1.0,2.0,3.0,2.0,3.0,5.0,2.0,6.0,0.0,5.0,2.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,2.0,3.0,8.0,3.0,1.0,1.0,0.0,0.0,3.0,2.0,2.0,0.0,2.0,0.0,0.0
6,1,2,0,0,0,9,9,0,3,2,0,0,0,0,0,0.0,0.0,1.0,0.0,2.0,0.0,1.0,0.0,2.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,0.0,0.0,1.0,6.0,0.0,1.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,1.0,2.0,6.0,8.0,4.0,0.0,1.0,0.0,0.0,2.0,5.0,3.0,1.0,2.0,0.0,0.0
7,0,4,0,0,5,9,9,0,3,2,2,0,1,1,0,1.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,3.0,0.0,3.0,1.0,1.0,2.0,1.0,2.0,6.0,3.0,0.0,1.0,3.0,6.0,0.0,3.0,0.0,0.0,2.0,1.0,2.0,1.0,0.0,4.0,0.0,2.0,6.0,5.0,4.0,1.0,1.0,0.0,0.0,0.0,3.0,2.0,0.0,2.0,0.0,0.0
8,0,4,0,0,5,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,3.0,0.0,3.0,1.0,1.0,2.0,1.0,2.0,3.0,3.0,0.0,5.0,3.0,6.0,0.0,2.0,0.0,0.0,2.0,0.0,2.0,1.0,0.0,4.0,0.0,2.0,5.0,8.0,4.0,1.0,1.0,0.0,0.0,0.0,3.0,2.0,0.0,2.0,4.0,0.0
9,0,3,2,0,4,5,2,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,2.0,5.0,1.0,3.0,3.0,3.0,0.0,6.0,1.0,1.0,2.0,1.0,1.0,9.0,3.0,0.0,9.0,3.0,6.0,0.0,7.0,0.0,0.0,7.0,1.0,2.0,0.0,0.0,4.0,1.0,2.0,7.0,5.0,4.0,1.0,1.0,0.0,0.0,8.0,3.0,0.0,1.0,2.0,3.0,0.0



Make starting point for data_Imputed
data_Imputed


Unnamed: 0,HOSPITAL,DAY_WEEK,J_KNIFE,MONTH,PERMVIT,PJ,PSU,PVH_INVL,REGION,REST_MIS,ROLLOVER,URBANICITY,VE_FORMS,VE_TOTAL,WRK_ZONE,HIT_RUN,INT_HWY,PER_TYP,REL_ROAD,HARM_EV,M_HARM,ROLINLOC,TOW_VEH,HOUR,PCRASH5,MAN_COLL,LGT_COND,MAX_SEV,NUM_INJ,MAK_MOD,P_CRASH1,SPEC_USE,SEAT_POS,IMPACT1,SPEEDREL,P_CRASH2,MAX_VSEV,NUM_INJV,VEH_AGE,INJ_SEV,MAKE,CARGO_BT,MODEL,NUMOCCS,WEATHER,BODY_TYP,SEX,PCRASH4,VSURCOND,EJECTION,TOWED,RELJCT2,VALIGN,AGE,DR_ZIP,AIR_BAG,REST_USE,VTCONT_F,TYP_INT,VTRAFCON,ACC_TYPE,VSPD_LIM,VPROFILE,VTRAFWAY,ALC_STATUS,DEFORMED,RELJCT1
0,0,1,0,0,5,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,4.0,2.0,4.0,3.0,3.0,0.0,4.0,5.0,1.0,2.0,4.0,2.0,8.0,3.0,0.0,6.0,3.0,2.0,0.0,3.0,0.0,0.0,2.0,1.0,2.0,0.0,0.0,4.0,1.0,2.0,9.0,8.0,4.0,1.0,1.0,0.0,0.0,9.0,2.0,3.0,0.0,2.0,2.0,0.0
1,0,1,0,0,5,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,4.0,1.0,4.0,3.0,3.0,0.0,4.0,1.0,1.0,2.0,0.0,2.0,8.0,3.0,0.0,1.0,3.0,6.0,0.0,6.0,0.0,0.0,3.0,1.0,2.0,0.0,0.0,4.0,1.0,2.0,6.0,8.0,4.0,1.0,1.0,0.0,0.0,8.0,2.0,3.0,0.0,2.0,2.0,0.0
2,0,1,0,0,3,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,2.0,4.0,6.0,2.0,1.0,2.0,1.0,2.0,6.0,2.0,1.0,9.0,2.0,7.0,0.0,7.0,0.0,0.0,7.0,1.0,2.0,0.0,0.0,0.0,0.0,2.0,3.0,8.0,1.0,1.0,1.0,0.0,0.0,5.0,2.0,2.0,0.0,2.0,0.0,0.0
3,0,1,0,0,3,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,2.0,4.0,3.0,1.0,1.0,2.0,1.0,2.0,3.0,2.0,3.0,5.0,2.0,6.0,0.0,5.0,2.0,0.0,2.0,0.0,2.0,0.0,0.0,0.0,0.0,2.0,4.0,8.0,1.0,1.0,1.0,0.0,0.0,3.0,2.0,2.0,0.0,2.0,0.0,0.0
4,0,1,0,0,3,9,9,0,3,2,2,0,1,1,0,0.0,0.0,0.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,2.0,4.0,3.0,1.0,1.0,0.0,1.0,2.0,3.0,2.0,3.0,5.0,2.0,6.0,0.0,5.0,2.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,2.0,4.0,8.0,3.0,1.0,1.0,0.0,0.0,3.0,2.0,2.0,0.0,2.0,0.0,0.0
5,0,1,0,0,3,9,9,0,3,2,2,0,1,1,0,0.0,0.0,0.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,2.0,4.0,3.0,1.0,1.0,3.0,1.0,2.0,3.0,2.0,3.0,5.0,2.0,6.0,0.0,5.0,2.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,0.0,2.0,3.0,8.0,3.0,1.0,1.0,0.0,0.0,3.0,2.0,2.0,0.0,2.0,0.0,0.0
6,1,2,0,0,0,9,9,0,3,2,0,0,0,0,0,0.0,0.0,1.0,0.0,2.0,0.0,1.0,0.0,2.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,2.0,2.0,2.0,0.0,0.0,1.0,6.0,0.0,1.0,0.0,3.0,0.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,1.0,2.0,6.0,8.0,4.0,0.0,1.0,0.0,0.0,2.0,5.0,3.0,1.0,2.0,0.0,0.0
7,0,4,0,0,5,9,9,0,3,2,2,0,1,1,0,1.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,3.0,0.0,3.0,1.0,1.0,2.0,1.0,2.0,6.0,3.0,0.0,1.0,3.0,6.0,0.0,3.0,0.0,0.0,2.0,1.0,2.0,1.0,0.0,4.0,0.0,2.0,6.0,5.0,4.0,1.0,1.0,0.0,0.0,0.0,3.0,2.0,0.0,2.0,0.0,0.0
8,0,4,0,0,5,9,9,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,0.0,6.0,1.0,0.0,1.0,3.0,0.0,3.0,1.0,1.0,2.0,1.0,2.0,3.0,3.0,0.0,5.0,3.0,6.0,0.0,2.0,0.0,0.0,2.0,0.0,2.0,1.0,0.0,4.0,0.0,2.0,5.0,8.0,4.0,1.0,1.0,0.0,0.0,0.0,3.0,2.0,0.0,2.0,4.0,0.0
9,0,3,2,0,4,5,2,0,3,2,2,0,1,1,0,0.0,0.0,1.0,2.0,3.0,2.0,2.0,2.0,5.0,1.0,3.0,3.0,3.0,0.0,6.0,1.0,1.0,2.0,1.0,1.0,9.0,3.0,0.0,9.0,3.0,6.0,0.0,7.0,0.0,0.0,7.0,1.0,2.0,0.0,0.0,4.0,1.0,2.0,7.0,5.0,4.0,1.0,1.0,0.0,0.0,8.0,3.0,0.0,1.0,2.0,3.0,0.0



Start Loop

['HIT_RUN', 22]
HIT_RUN
22


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0.])

(817623, 67)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,779680,779702,22
1,1.0,37921,37921,0
2,,22,0,0



['INT_HWY', 74]
INT_HWY
74


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0.])

(817623, 67)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,729207,729281,74
1,1.0,88342,88342,0
2,,74,0,0



['PER_TYP', 155]
PER_TYP
155


array([1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1.])

(817623, 67)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,601849,602003,154
1,0.0,215619,215620,1
2,,155,0,0



['REL_ROAD', 198]
REL_ROAD
198


array([2., 2., 2., 2., 2., 2., 2., 0., 0., 0., 0., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 0., 0., 0., 2., 2., 2., 2., 0., 2.,
       2., 0., 0., 2., 2., 0., 2., 2., 2., 0., 2., 2., 2., 2., 2., 2., 2.,
       0., 2., 2., 2., 0., 0., 0., 0., 0., 2., 0., 0., 0., 0., 2., 2., 2.,
       0., 2., 2., 2., 2., 2., 2., 2., 0., 0., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 0., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 0., 2., 2., 2., 2.,
       2., 0., 2., 2., 2., 0., 0., 0., 0., 0., 2., 2., 2., 0., 0., 2., 2.,
       2., 0., 2., 2., 2., 2., 0., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 0., 0., 2., 0., 0., 0., 0.,
       0., 2., 2., 2., 2., 2., 0., 2., 2., 2., 2.])

(817623, 67)

[2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,736312,736461,149
1,0.0,61797,61846,49
2,1.0,19316,19316,0
3,,198,0,0



['HARM_EV', 300]
HARM_EV
300


array([3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
       3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
       3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 0., 3., 3., 3., 3., 3.,
       3., 0., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 0., 3., 3., 3., 1.,
       3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
       3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
       3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
       3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 0., 3.,
       3., 3., 0., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 1., 3.,
       3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
       3., 3., 0., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 0.,
       3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
       3., 3., 0., 0., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.,
       3., 3., 3., 3., 3.

(817623, 67)

[3.0, 2.0, 0.0, 4.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,715178,715459,281
1,2.0,25564,25564,0
2,0.0,27681,27697,16
3,4.0,20613,20613,0
4,1.0,28287,28290,3
5,,300,0,0



['M_HARM', 329]
M_HARM
329


array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 1., 1., 2., 2., 2., 1., 1., 1., 2., 2., 1., 2., 1., 1., 2.,
       1., 1., 2., 2., 2., 2., 1., 2., 2., 1., 1., 2., 1., 2., 1., 2., 2.,
       2., 1., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 1., 1., 1., 1., 1., 2.,
       2., 1., 2., 2., 2., 1., 1., 1., 1., 1., 1., 2., 1., 2., 1., 1., 2.,
       2., 2., 2., 2., 1., 1., 2., 2., 2., 2., 1., 1., 1., 2., 1., 2., 2.,
       1., 1., 1., 2., 2., 2., 1., 1., 2., 2., 2., 2., 1., 2., 1., 1., 2.,
       2., 2., 1., 2., 1., 1., 1., 2., 1., 1., 2., 1., 2., 2., 2., 1., 1.,
       1., 1., 1., 2., 2., 2., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 1.,
       1., 2., 2., 2., 2.

(817623, 67)

[2.0, 0.0, 3.0, 4.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,699529,699733,204
1,0.0,35583,35591,8
2,3.0,17630,17630,0
3,4.0,21543,21543,0
4,1.0,43009,43126,117
5,,329,0,0



['ROLINLOC', 415]
ROLINLOC
415


array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 2.

(817623, 67)

[2.0, 1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,789431,789846,415
1,1.0,6613,6613,0
2,0.0,21164,21164,0
3,,415,0,0



['TOW_VEH', 950]
TOW_VEH
950


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0.

(817623, 67)

[0.0, 2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,794704,795654,950
1,2.0,21243,21243,0
2,1.0,726,726,0
3,,950,0,0



['HOUR', 2553]
HOUR
2553


array([4., 4., 4., ..., 5., 5., 5.])

(817623, 67)

[4.0, 6.0, 2.0, 5.0, 0.0, 3.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,265309,266373,1064
1,6.0,116974,117232,258
2,2.0,29545,29545,0
3,5.0,273052,274283,1231
4,0.0,26025,26025,0
5,3.0,75558,75558,0
6,1.0,28607,28607,0
7,,2553,0,0



['PCRASH5', 2886]
PCRASH5
2886


array([1., 1., 1., ..., 1., 0., 1.])

(817623, 67)

[2.0, 1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,82086,82086,0
1,1.0,641405,644153,2748
2,0.0,91246,91384,138
3,,2886,0,0



['MAN_COLL', 3055]
MAN_COLL
3055


array([1., 3., 1., ..., 3., 3., 2.])

(817623, 67)

[4.0, 0.0, 1.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,105608,105608,0
1,0.0,34235,34235,0
2,1.0,120116,120210,94
3,3.0,319712,321657,1945
4,2.0,234897,235913,1016
5,,3055,0,0



['LGT_COND', 3886]
LGT_COND
3886


array([3., 3., 3., ..., 3., 3., 3.])

(817623, 67)

[3.0, 1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,584528,588414,3886
1,1.0,138264,138264,0
2,2.0,24348,24348,0
3,0.0,66597,66597,0
4,,3886,0,0



['MAX_SEV', 5917]
MAX_SEV
5917


array([3., 3., 3., ..., 3., 3., 3.])

(817623, 67)

[3.0, 2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,403316,409233,5917
1,2.0,195194,195194,0
2,0.0,96519,96519,0
3,1.0,116677,116677,0
4,,5917,0,0



['NUM_INJ', 5917]
NUM_INJ
5917


array([0., 0., 0., ..., 0., 0., 0.])

(817623, 67)

[0.0, 4.0, 1.0, 2.0, 3.0, 5.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,403332,409249,5917
1,4.0,19355,19355,0
2,1.0,227353,227353,0
3,2.0,100424,100424,0
4,3.0,42856,42856,0
5,5.0,18386,18386,0
6,,5917,0,0



['MAK_MOD', 9650]
MAK_MOD
9650


array([3., 3., 3., ..., 3., 3., 3.])

(817623, 67)

[4.0, 6.0, 3.0, 1.0, 8.0, 0.0, 2.0, 9.0, 5.0, 7.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,166857,166857,0
1,6.0,137118,137178,60
2,3.0,205452,215029,9577
3,1.0,42189,42189,0
4,8.0,22596,22596,0
5,0.0,22026,22031,5
6,2.0,68162,68162,0
7,9.0,38042,38050,8
8,5.0,78768,78768,0
9,7.0,26763,26763,0



['P_CRASH1', 11721]
P_CRASH1
11721


array([1., 1., 1., ..., 1., 1., 1.])

(817623, 67)

[5.0, 1.0, 2.0, 4.0, 6.0, 0.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,5.0,41736,41736,0
1,1.0,420243,431949,11706
2,2.0,81178,81178,0
3,4.0,129584,129599,15
4,6.0,37379,37379,0
5,0.0,45966,45966,0
6,3.0,49816,49816,0
7,,11721,0,0



['SPEC_USE', 11944]
SPEC_USE
11944


array([1., 1., 1., ..., 1., 1., 1.])

(817623, 67)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,798374,810318,11944
1,2.0,3417,3417,0
2,0.0,3888,3888,0
3,,11944,0,0



['SEAT_POS', 12478]
SEAT_POS
12478


array([2., 2., 2., ..., 2., 2., 2.])

(817623, 67)

[2.0, 0.0, 3.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,601713,613596,11883
1,0.0,117342,117342,0
2,3.0,53486,54081,595
3,1.0,32604,32604,0
4,,12478,0,0



['IMPACT1', 12983]
IMPACT1
12983


array([1., 1., 1., ..., 1., 1., 1.])

(817623, 67)

[4.0, 0.0, 1.0, 2.0, 7.0, 5.0, 6.0, 3.0, 8.0, 9.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,39490,39490,0
1,0.0,47043,47043,0
2,1.0,363687,376211,12524
3,2.0,44726,44726,0
4,7.0,190704,191163,459
5,5.0,27785,27785,0
6,6.0,26900,26900,0
7,3.0,30230,30230,0
8,8.0,16608,16608,0
9,9.0,17467,17467,0



['SPEEDREL', 14440]
SPEEDREL
14440


array([2., 2., 2., ..., 2., 2., 2.])

(817623, 67)

[2.0, 1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,753089,767529,14440
1,1.0,37204,37204,0
2,0.0,12890,12890,0
3,,14440,0,0



['P_CRASH2', 15057]
P_CRASH2
15057


array([1., 8., 8., ..., 8., 8., 8.])

(817623, 67)

[8.0, 6.0, 3.0, 0.0, 9.0, 7.0, 5.0, 2.0, 4.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,8.0,298422,312902,14480
1,6.0,109812,109923,111
2,3.0,48904,48904,0
3,0.0,21911,21911,0
4,9.0,106196,106196,0
5,7.0,48848,48848,0
6,5.0,72061,72162,101
7,2.0,23152,23153,1
8,4.0,41355,41355,0
9,1.0,31905,32269,364



['MAX_VSEV', 20225]
MAX_VSEV
20225


array([3., 3., 3., ..., 3., 3., 3.])

(817623, 67)

[3.0, 2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,519628,539853,20225
1,2.0,137030,137030,0
2,0.0,51703,51703,0
3,1.0,89037,89037,0
4,,20225,0,0



['NUM_INJV', 20225]
NUM_INJV
20225


array([0., 0., 0., ..., 0., 0., 0.])

(817623, 67)

[0.0, 1.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,519649,539874,20225
1,1.0,195056,195056,0
2,3.0,28094,28094,0
3,2.0,54599,54599,0
4,,20225,0,0



['VEH_AGE', 21546]
VEH_AGE
21546


array([1., 1., 1., ..., 1., 1., 1.])

(817623, 67)

[6.0, 1.0, 9.0, 5.0, 2.0, 0.0, 4.0, 3.0, 7.0, 8.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,6.0,58303,58303,0
1,1.0,252848,274394,21546
2,9.0,59243,59243,0
3,5.0,64955,64955,0
4,2.0,82528,82528,0
5,0.0,46284,46284,0
6,4.0,66361,66361,0
7,3.0,104168,104168,0
8,7.0,23953,23953,0
9,8.0,37434,37434,0



['INJ_SEV', 22070]
INJ_SEV
22070


array([3., 3., 3., ..., 3., 3., 3.])

(817623, 67)

[3.0, 2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,568204,590162,21958
1,2.0,120420,120531,111
2,0.0,39440,39440,0
3,1.0,67489,67490,1
4,,22070,0,0



['MAKE', 22473]
MAKE
22473


array([6., 6., 6., ..., 0., 6., 6.])

(817623, 67)

[2.0, 6.0, 7.0, 1.0, 5.0, 4.0, 9.0, 0.0, 8.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,113378,113378,0
1,6.0,278513,296252,17739
2,7.0,76816,76816,0
3,1.0,22754,22754,0
4,5.0,118229,118229,0
5,4.0,48086,48086,0
6,9.0,18453,19950,1497
7,0.0,16090,19327,3237
8,8.0,37255,37255,0
9,3.0,65576,65576,0



['CARGO_BT', 24182]
CARGO_BT
24182


array([0., 0., 0., ..., 0., 0., 0.])

(817623, 67)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,770550,787238,16688
1,1.0,4426,4426,0
2,2.0,18465,25959,7494
3,,24182,0,0



['MODEL', 26805]
MODEL
26805


array([3., 3., 3., ..., 3., 3., 3.])

(817623, 67)

[3.0, 6.0, 7.0, 5.0, 2.0, 1.0, 4.0, 8.0, 9.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,195240,221848,26608
1,6.0,58143,58143,0
2,7.0,95387,95459,72
3,5.0,108172,108172,0
4,2.0,141518,141518,0
5,1.0,20753,20753,0
6,4.0,83518,83518,0
7,8.0,34217,34217,0
8,9.0,32168,32291,123
9,0.0,21702,21704,2



['NUMOCCS', 28035]
NUMOCCS
28035


array([0., 0., 0., ..., 1., 0., 0.])

(817623, 67)

[0.0, 2.0, 3.0, 1.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,430913,458113,27200
1,2.0,82033,82033,0
2,3.0,46984,46984,0
3,1.0,196400,197235,835
4,4.0,33258,33258,0
5,,28035,0,0



['WEATHER', 29295]
WEATHER
29295


array([0., 0., 0., ..., 0., 0., 0.])

(817623, 67)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,586678,615973,29295
1,1.0,71184,71184,0
2,2.0,130466,130466,0
3,,29295,0,0



['BODY_TYP', 29722]
BODY_TYP
29722


array([2., 2., 2., ..., 2., 2., 2.])

(817623, 67)

[2.0, 3.0, 7.0, 4.0, 1.0, 6.0, 9.0, 8.0, 0.0, 5.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,283835,313456,29621
1,3.0,54278,54278,0
2,7.0,26901,26901,0
3,4.0,171415,171415,0
4,1.0,38910,38910,0
5,6.0,125025,125120,95
6,9.0,16938,16942,4
7,8.0,16865,16865,0
8,0.0,21697,21699,2
9,5.0,32037,32037,0



['SEX', 30276]
SEX
30276


array([1., 1., 1., ..., 1., 0., 1.])

(817623, 67)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,430048,459553,29505
1,0.0,357297,358068,771
2,2.0,2,2,0
3,,30276,0,0



['PCRASH4', 33775]
PCRASH4
33775


array([2., 2., 2., ..., 2., 2., 2.])

(817623, 67)

[2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,754828,788603,33775
1,0.0,12734,12734,0
2,1.0,16286,16286,0
3,,33775,0,0



['VSURCOND', 34537]
VSURCOND
34537


array([0., 0., 0., ..., 0., 0., 0.])

(817623, 67)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,643575,678112,34537
1,1.0,112062,112062,0
2,2.0,27449,27449,0
3,,34537,0,0



['EJECTION', 40329]
EJECTION
40329


array([0., 0., 0., ..., 0., 0., 0.])

(817623, 67)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,752065,792362,40297
1,1.0,2761,2761,0
2,2.0,22468,22500,32
3,,40329,0,0



['TOWED', 40553]
TOWED
40553


array([4., 4., 4., ..., 0., 4., 4.])

(817623, 67)

[4.0, 0.0, 3.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,443888,481475,37587
1,0.0,223872,226838,2966
2,3.0,34247,34247,0
3,1.0,25547,25547,0
4,2.0,49516,49516,0
5,,40553,0,0



['RELJCT2', 41803]
RELJCT2
41803


array([1., 1., 1., ..., 1., 1., 1.])

(817623, 67)

[1.0, 0.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,287126,327451,40325
1,0.0,229231,229619,388
2,3.0,188382,189472,1090
3,2.0,71081,71081,0
4,,41803,0,0



['VALIGN', 44487]
VALIGN
44487


array([2., 2., 2., ..., 2., 2., 2.])

(817623, 67)

[2.0, 3.0, 1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,688703,733190,44487
1,3.0,19286,19286,0
2,1.0,38873,38873,0
3,0.0,26274,26274,0
4,,44487,0,0



['AGE', 46539]
AGE
46539


array([6., 6., 6., ..., 6., 6., 6.])

(817623, 67)

[9.0, 6.0, 3.0, 4.0, 5.0, 7.0, 1.0, 2.0, 8.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,9.0,36917,36917,0
1,6.0,403431,449970,46539
2,3.0,29952,29952,0
3,4.0,20684,20684,0
4,5.0,40778,40778,0
5,7.0,125673,125673,0
6,1.0,29436,29436,0
7,2.0,22905,22905,0
8,8.0,43719,43719,0
9,0.0,17589,17589,0



['DR_ZIP', 60921]
DR_ZIP
60921


array([5., 5., 5., ..., 5., 5., 5.])

(817623, 67)

[8.0, 5.0, 4.0, 9.0, 7.0, 2.0, 6.0, 1.0, 0.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,8.0,84369,88440,4071
1,5.0,212081,268931,56850
2,4.0,90944,90944,0
3,9.0,32308,32308,0
4,7.0,103003,103003,0
5,2.0,44425,44425,0
6,6.0,109956,109956,0
7,1.0,19421,19421,0
8,0.0,30600,30600,0
9,3.0,29595,29595,0



['AIR_BAG', 65897]
AIR_BAG
65897


array([4., 4., 4., ..., 4., 4., 4.])

(817623, 67)

[4.0, 1.0, 3.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,596205,662102,65897
1,1.0,45714,45714,0
2,3.0,16783,16783,0
3,0.0,41621,41621,0
4,2.0,51403,51403,0
5,,65897,0,0



['REST_USE', 68246]
REST_USE
68246


array([1., 1., 1., ..., 1., 1., 1.])

(817623, 67)

[1.0, 0.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,660019,728012,67993
1,0.0,45350,45603,253
2,3.0,23782,23782,0
3,2.0,20226,20226,0
4,,68246,0,0



['VTCONT_F', 70044]
VTCONT_F
70044


array([1., 1., 1., ..., 1., 1., 1.])

(817623, 67)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,460966,530850,69884
1,2.0,285837,285997,160
2,0.0,776,776,0
3,,70044,0,0



['TYP_INT', 72438]
TYP_INT
72438


array([0., 0., 0., ..., 2., 2., 2.])

(817623, 67)

[0.0, 2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,418326,445951,27625
1,2.0,232852,277665,44813
2,1.0,94007,94007,0
3,,72438,0,0



['VTRAFCON', 72727]
VTRAFCON
72727


array([0., 0., 0., ..., 0., 0., 0.])

(817623, 67)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,471707,543952,72245
1,1.0,194390,194872,482
2,2.0,78799,78799,0
3,,72727,0,0



['ACC_TYPE', 80816]
ACC_TYPE
80816


array([8., 8., 8., ..., 5., 8., 8.])

(817623, 67)

[9.0, 8.0, 5.0, 3.0, 2.0, 0.0, 7.0, 6.0, 4.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,9.0,31921,31921,0
1,8.0,211413,253767,42354
2,5.0,95132,103258,8126
3,3.0,65527,65598,71
4,2.0,21294,22026,732
5,0.0,28061,28061,0
6,7.0,127823,154296,26473
7,6.0,85968,85968,0
8,4.0,43416,43416,0
9,1.0,26252,29312,3060



['VSPD_LIM', 106127]
VSPD_LIM
106127


array([3., 3., 3., ..., 3., 3., 3.])

(817623, 67)

[2.0, 5.0, 3.0, 4.0, 6.0, 1.0, 0.0, 7.0, 8.0, 9.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,46288,46288,0
1,5.0,25759,25759,0
2,3.0,213451,319377,105926
3,4.0,157199,157199,0
4,6.0,80514,80642,128
5,1.0,70732,70732,0
6,0.0,19536,19536,0
7,7.0,14369,14369,0
8,8.0,41924,41924,0
9,9.0,41724,41797,73



['VPROFILE', 111013]
VPROFILE
111013


array([3., 3., 3., ..., 3., 3., 3.])

(817623, 67)

[3.0, 2.0, 0.0, 4.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,587104,698117,111013
1,2.0,57747,57747,0
2,0.0,19737,19737,0
3,4.0,19286,19286,0
4,1.0,22736,22736,0
5,,111013,0,0



['VTRAFWAY', 129908]
VTRAFWAY
129908


array([0., 0., 0., ..., 0., 0., 0.])

(817623, 67)

[1.0, 0.0, 5.0, 2.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,158675,158675,0
1,0.0,307588,432125,124537
2,5.0,19286,19286,0
3,2.0,154870,160241,5371
4,3.0,18678,18678,0
5,4.0,28618,28618,0
6,,129908,0,0



['ALC_STATUS', 135050]
ALC_STATUS
135050


array([2., 2., 2., ..., 2., 2., 2.])

(817623, 67)

[2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,667674,802724,135050
1,0.0,14649,14649,0
2,1.0,250,250,0
3,,135050,0,0



['DEFORMED', 142194]
DEFORMED
142194


array([4., 4., 4., ..., 0., 0., 4.])

(817623, 67)

[2.0, 0.0, 4.0, 3.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,163996,163996,0
1,0.0,267556,313230,45674
2,4.0,205355,301875,96520
3,3.0,14721,14721,0
4,1.0,23801,23801,0
5,,142194,0,0



['RELJCT1', 151563]
RELJCT1
151563


array([0., 0., 0., ..., 0., 0., 0.])

(817623, 67)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,626700,778263,151563
1,1.0,39360,39360,0
2,,151563,0,0
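
Each table above lists, for one feature, how many samples held each value before imputation (Original) and after it (Imputed). The gains across the non-missing values add up to the feature's missing count; for M_HARM, 204 + 8 + 117 = 329. Below is a minimal sketch of how such a table could be built. It is not the notebook's own code: the names `is_missing` and `count_comparison` are hypothetical, and it uses plain lists rather than the notebook's DataFrames. One deliberate difference is that the missing-value row here reports the actual change in the missing count, whereas the logged tables show 0 in that row.

```python
from collections import Counter
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_comparison(original, imputed):
    """Return one row per value with its count before and after imputation."""
    before = Counter(v for v in original if not is_missing(v))
    after = Counter(v for v in imputed if not is_missing(v))

    # Keep values in order of first appearance, then any value seen only after imputation.
    values = list(before) + [v for v in after if v not in before]

    rows = []
    for value in values:
        rows.append({
            "Value": value,
            "Original": before[value],
            "Imputed": after[value],
            "Difference": after[value] - before[value],
        })

    # Final row: how many entries were missing before and after.
    missing_before = sum(1 for v in original if is_missing(v))
    missing_after = sum(1 for v in imputed if is_missing(v))
    if missing_before or missing_after:
        rows.append({
            "Value": None,
            "Original": missing_before,
            "Imputed": missing_after,
            "Difference": missing_after - missing_before,
        })
    return rows


# Toy data (made up for illustration): two missing entries get imputed.
original = [2.0, 2.0, 1.0, float("nan"), None]
imputed = [2.0, 2.0, 1.0, 2.0, 1.0]
rows = count_comparison(original, imputed)
for row in rows:
    print(row)
```

A quick consistency check on any such table is that the positive differences sum to the number of values that were imputed, which is a cheap way to confirm that imputation filled every gap without altering known values.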




Index(['HOSPITAL', 'ACC_TYPE', 'AGE', 'AIR_BAG', 'ALC_STATUS', 'BODY_TYP',
       'CARGO_BT', 'DAY_WEEK', 'DEFORMED', 'DR_ZIP', 'EJECTION', 'HARM_EV',
       'HIT_RUN', 'HOUR', 'IMPACT1', 'INJ_SEV', 'INT_HWY', 'J_KNIFE',
       'LGT_COND', 'MAKE', 'MAK_MOD', 'MAN_COLL', 'MAX_SEV', 'MAX_VSEV',
       'MODEL', 'MONTH', 'M_HARM', 'NUMOCCS', 'NUM_INJ', 'NUM_INJV', 'PCRASH4',
       'PCRASH5', 'PERMVIT', 'PER_TYP', 'PJ', 'PSU', 'PVH_INVL', 'P_CRASH1',
       'P_CRASH2', 'REGION', 'RELJCT1', 'RELJCT2', 'REL_ROAD', 'REST_MIS',
       'REST_USE', 'ROLINLOC', 'ROLLOVER', 'SEAT_POS', 'SEX', 'SPEC_USE',
       'SPEEDREL', 'TOWED', 'TOW_VEH', 'TYP_INT', 'URBANICITY', 'VALIGN',
       'VEH_AGE', 'VE_FORMS', 'VE_TOTAL', 'VPROFILE', 'VSPD_LIM', 'VSURCOND',
       'VTCONT_F', 'VTRAFCON', 'VTRAFWAY', 'WEATHER', 'WRK_ZONE'],
      dtype='object')
[0, 1]


[[0, 691281, 691281], [1, 126342, 126342]]


[9.0, 8.0, 5.0, 3.0, 2.0, 0.0, 7.0, nan, 6.0, 4.0, 1.0]


[[9.0, 31921, 31921],
 [8.0, 211413, 253767],
 [5.0, 95132, 103258],
 [3.0, 65527, 65598],
 [2.0, 21294, 22026],
 [0.0, 28061, 28061],
 [7.0, 127823, 154296],
 [nan, 0, 0],
 [6.0, 85968, 85968],
 [4.0, 43416, 43416],
 [1.0, 26252, 29312]]


[9.0, 6.0, 3.0, 4.0, nan, 5.0, 7.0, 1.0, 2.0, 8.0, 0.0]


[[9.0, 36917, 36917],
 [6.0, 403431, 449970],
 [3.0, 29952, 29952],
 [4.0, 20684, 20684],
 [nan, 0, 0],
 [5.0, 40778, 40778],
 [7.0, 125673, 125673],
 [1.0, 29436, 29436],
 [2.0, 22905, 22905],
 [8.0, 43719, 43719],
 [0.0, 17589, 17589]]


[4.0, 1.0, 3.0, nan, 0.0, 2.0]


[[4.0, 596205, 662102],
 [1.0, 45714, 45714],
 [3.0, 16783, 16783],
 [nan, 0, 0],
 [0.0, 41621, 41621],
 [2.0, 51403, 51403]]


[nan, 2.0, 0.0, 1.0]


[[nan, 0, 0], [2.0, 667674, 802724], [0.0, 14649, 14649], [1.0, 250, 250]]


[2.0, 3.0, 7.0, nan, 4.0, 1.0, 6.0, 9.0, 8.0, 0.0, 5.0]


[[2.0, 283835, 313456],
 [3.0, 54278, 54278],
 [7.0, 26901, 26901],
 [nan, 0, 0],
 [4.0, 171415, 171415],
 [1.0, 38910, 38910],
 [6.0, 125025, 125120],
 [9.0, 16938, 16942],
 [8.0, 16865, 16865],
 [0.0, 21697, 21699],
 [5.0, 32037, 32037]]


[0.0, 1.0, 2.0, nan]


[[0.0, 770550, 787238], [1.0, 4426, 4426], [2.0, 18465, 25959], [nan, 0, 0]]


[1, 2, 4, 3, 0]


[[1, 112889, 112889],
 [2, 117194, 117194],
 [4, 139378, 139378],
 [3, 242595, 242595],
 [0, 205567, 205567]]


[2.0, 0.0, nan, 4.0, 3.0, 1.0]


[[2.0, 163996, 163996],
 [0.0, 267556, 313230],
 [nan, 0, 0],
 [4.0, 205355, 301875],
 [3.0, 14721, 14721],
 [1.0, 23801, 23801]]


[8.0, nan, 5.0, 4.0, 9.0, 7.0, 2.0, 6.0, 1.0, 0.0, 3.0]


[[8.0, 84369, 88440],
 [nan, 0, 0],
 [5.0, 212081, 268931],
 [4.0, 90944, 90944],
 [9.0, 32308, 32308],
 [7.0, 103003, 103003],
 [2.0, 44425, 44425],
 [6.0, 109956, 109956],
 [1.0, 19421, 19421],
 [0.0, 30600, 30600],
 [3.0, 29595, 29595]]


[0.0, 1.0, nan, 2.0]


[[0.0, 752065, 792362], [1.0, 2761, 2761], [nan, 0, 0], [2.0, 22468, 22500]]


[3.0, 2.0, 0.0, 4.0, 1.0, nan]


[[3.0, 715178, 715459],
 [2.0, 25564, 25564],
 [0.0, 27681, 27697],
 [4.0, 20613, 20613],
 [1.0, 28287, 28290],
 [nan, 0, 0]]


[0.0, 1.0, nan]


[[0.0, 779680, 779702], [1.0, 37921, 37921], [nan, 0, 0]]


[4.0, 6.0, 2.0, 5.0, 0.0, 3.0, 1.0, nan]


[[4.0, 265309, 266373],
 [6.0, 116974, 117232],
 [2.0, 29545, 29545],
 [5.0, 273052, 274283],
 [0.0, 26025, 26025],
 [3.0, 75558, 75558],
 [1.0, 28607, 28607],
 [nan, 0, 0]]


[4.0, 0.0, 1.0, 2.0, 7.0, 5.0, nan, 6.0, 3.0, 8.0, 9.0]


[[4.0, 39490, 39490],
 [0.0, 47043, 47043],
 [1.0, 363687, 376211],
 [2.0, 44726, 44726],
 [7.0, 190704, 191163],
 [5.0, 27785, 27785],
 [nan, 0, 0],
 [6.0, 26900, 26900],
 [3.0, 30230, 30230],
 [8.0, 16608, 16608],
 [9.0, 17467, 17467]]


[3.0, 2.0, 0.0, nan, 1.0]


[[3.0, 568204, 590162],
 [2.0, 120420, 120531],
 [0.0, 39440, 39440],
 [nan, 0, 0],
 [1.0, 67489, 67490]]


[0.0, 1.0, nan]


[[0.0, 729207, 729281], [1.0, 88342, 88342], [nan, 0, 0]]


[0, 2, 1]


[[0, 796084, 796084], [2, 21109, 21109], [1, 430, 430]]


[3.0, 1.0, 2.0, nan, 0.0]


[[3.0, 584528, 588414],
 [1.0, 138264, 138264],
 [2.0, 24348, 24348],
 [nan, 0, 0],
 [0.0, 66597, 66597]]


[2.0, 6.0, 7.0, 1.0, nan, 5.0, 4.0, 9.0, 0.0, 8.0, 3.0]


[[2.0, 113378, 113378],
 [6.0, 278513, 296252],
 [7.0, 76816, 76816],
 [1.0, 22754, 22754],
 [nan, 0, 0],
 [5.0, 118229, 118229],
 [4.0, 48086, 48086],
 [9.0, 18453, 19950],
 [0.0, 16090, 19327],
 [8.0, 37255, 37255],
 [3.0, 65576, 65576]]


[4.0, 6.0, 3.0, 1.0, nan, 8.0, 0.0, 2.0, 9.0, 5.0, 7.0]


[[4.0, 166857, 166857],
 [6.0, 137118, 137178],
 [3.0, 205452, 215029],
 [1.0, 42189, 42189],
 [nan, 0, 0],
 [8.0, 22596, 22596],
 [0.0, 22026, 22031],
 [2.0, 68162, 68162],
 [9.0, 38042, 38050],
 [5.0, 78768, 78768],
 [7.0, 26763, 26763]]


[4.0, 0.0, 1.0, 3.0, 2.0, nan]


[[4.0, 105608, 105608],
 [0.0, 34235, 34235],
 [1.0, 120116, 120210],
 [3.0, 319712, 321657],
 [2.0, 234897, 235913],
 [nan, 0, 0]]


[3.0, 2.0, 0.0, nan, 1.0]


[[3.0, 403316, 409233],
 [2.0, 195194, 195194],
 [0.0, 96519, 96519],
 [nan, 0, 0],
 [1.0, 116677, 116677]]


[3.0, 2.0, 0.0, nan, 1.0]


[[3.0, 519628, 539853],
 [2.0, 137030, 137030],
 [0.0, 51703, 51703],
 [nan, 0, 0],
 [1.0, 89037, 89037]]


[3.0, 6.0, 7.0, 5.0, nan, 2.0, 1.0, 4.0, 8.0, 9.0, 0.0]


[[3.0, 195240, 221848],
 [6.0, 58143, 58143],
 [7.0, 95387, 95459],
 [5.0, 108172, 108172],
 [nan, 0, 0],
 [2.0, 141518, 141518],
 [1.0, 20753, 20753],
 [4.0, 83518, 83518],
 [8.0, 34217, 34217],
 [9.0, 32168, 32291],
 [0.0, 21702, 21704]]


[0, 1, 2, 3, 4, 5]


[[0, 181326, 181326],
 [1, 125805, 125805],
 [2, 137098, 137098],
 [3, 148013, 148013],
 [4, 153296, 153296],
 [5, 72085, 72085]]


[2.0, 0.0, 3.0, 4.0, 1.0, nan]


[[2.0, 699529, 699733],
 [0.0, 35583, 35591],
 [3.0, 17630, 17630],
 [4.0, 21543, 21543],
 [1.0, 43009, 43126],
 [nan, 0, 0]]


[0.0, 2.0, nan, 3.0, 1.0, 4.0]


[[0.0, 430913, 458113],
 [2.0, 82033, 82033],
 [nan, 0, 0],
 [3.0, 46984, 46984],
 [1.0, 196400, 197235],
 [4.0, 33258, 33258]]


[0.0, 4.0, 1.0, nan, 2.0, 3.0, 5.0]


[[0.0, 403332, 409249],
 [4.0, 19355, 19355],
 [1.0, 227353, 227353],
 [nan, 0, 0],
 [2.0, 100424, 100424],
 [3.0, 42856, 42856],
 [5.0, 18386, 18386]]


[0.0, 1.0, 3.0, nan, 2.0]


[[0.0, 519649, 539874],
 [1.0, 195056, 195056],
 [3.0, 28094, 28094],
 [nan, 0, 0],
 [2.0, 54599, 54599]]


[2.0, 0.0, nan, 1.0]


[[2.0, 754828, 788603], [0.0, 12734, 12734], [nan, 0, 0], [1.0, 16286, 16286]]


[2.0, 1.0, 0.0, nan]


[[2.0, 82086, 82086], [1.0, 641405, 644153], [0.0, 91246, 91384], [nan, 0, 0]]


[5, 3, 0, 4, 1, 2]


[[5, 274774, 274774],
 [3, 317208, 317208],
 [0, 70594, 70594],
 [4, 116321, 116321],
 [1, 16567, 16567],
 [2, 22159, 22159]]


[1.0, 0.0, nan]


[[1.0, 601849, 602003], [0.0, 215619, 215620], [nan, 0, 0]]


[9, 5, 0, 1, 4, 2, 6, 3, 7, 8]


[[9, 33401, 33401],
 [5, 143193, 143193],
 [0, 32834, 32834],
 [1, 30574, 30574],
 [4, 119980, 119980],
 [2, 34835, 34835],
 [6, 112708, 112708],
 [3, 114267, 114267],
 [7, 108233, 108233],
 [8, 87598, 87598]]


[9, 2, 3, 4, 6, 5, 8, 0, 1, 7]


[[9, 21208, 21208],
 [2, 70201, 70201],
 [3, 200727, 200727],
 [4, 141952, 141952],
 [6, 72315, 72315],
 [5, 99480, 99480],
 [8, 70871, 70871],
 [0, 29199, 29199],
 [1, 57946, 57946],
 [7, 53724, 53724]]


[0, 1, 2]


[[0, 797349, 797349], [1, 16893, 16893], [2, 3381, 3381]]


[5.0, 1.0, 2.0, 4.0, nan, 6.0, 0.0, 3.0]


[[5.0, 41736, 41736],
 [1.0, 420243, 431949],
 [2.0, 81178, 81178],
 [4.0, 129584, 129599],
 [nan, 0, 0],
 [6.0, 37379, 37379],
 [0.0, 45966, 45966],
 [3.0, 49816, 49816]]


[8.0, 6.0, 3.0, 0.0, 9.0, 7.0, nan, 5.0, 2.0, 4.0, 1.0]


[[8.0, 298422, 312902],
 [6.0, 109812, 109923],
 [3.0, 48904, 48904],
 [0.0, 21911, 21911],
 [9.0, 106196, 106196],
 [7.0, 48848, 48848],
 [nan, 0, 0],
 [5.0, 72061, 72162],
 [2.0, 23152, 23153],
 [4.0, 41355, 41355],
 [1.0, 31905, 32269]]


[3, 2, 1, 0]


[[3, 124004, 124004],
 [2, 145546, 145546],
 [1, 458170, 458170],
 [0, 89903, 89903]]


[0.0, 1.0, nan]


[[0.0, 626700, 778263], [1.0, 39360, 39360], [nan, 0, 0]]


[1.0, 0.0, 3.0, 2.0, nan]


[[1.0, 287126, 327451],
 [0.0, 229231, 229619],
 [3.0, 188382, 189472],
 [2.0, 71081, 71081],
 [nan, 0, 0]]


[2.0, 0.0, 1.0, nan]


[[2.0, 736312, 736461], [0.0, 61797, 61846], [1.0, 19316, 19316], [nan, 0, 0]]


[2, 1, 0]


[[2, 740043, 740043], [1, 6872, 6872], [0, 70708, 70708]]


[1.0, 0.0, nan, 3.0, 2.0]


[[1.0, 660019, 728012],
 [0.0, 45350, 45603],
 [nan, 0, 0],
 [3.0, 23782, 23782],
 [2.0, 20226, 20226]]


[2.0, 1.0, 0.0, nan]


[[2.0, 789431, 789846], [1.0, 6613, 6613], [0.0, 21164, 21164], [nan, 0, 0]]


[2, 0, 1]


[[2, 789431, 789431], [0, 13831, 13831], [1, 14361, 14361]]


[2.0, 0.0, 3.0, 1.0, nan]


[[2.0, 601713, 613596],
 [0.0, 117342, 117342],
 [3.0, 53486, 54081],
 [1.0, 32604, 32604],
 [nan, 0, 0]]


[1.0, 0.0, nan, 2.0]


[[1.0, 430048, 459553], [0.0, 357297, 358068], [nan, 0, 0], [2.0, 2, 2]]


[1.0, 2.0, 0.0, nan]


[[1.0, 798374, 810318], [2.0, 3417, 3417], [0.0, 3888, 3888], [nan, 0, 0]]


[2.0, 1.0, nan, 0.0]


[[2.0, 753089, 767529], [1.0, 37204, 37204], [nan, 0, 0], [0.0, 12890, 12890]]


[4.0, 0.0, nan, 3.0, 1.0, 2.0]


[[4.0, 443888, 481475],
 [0.0, 223872, 226838],
 [nan, 0, 0],
 [3.0, 34247, 34247],
 [1.0, 25547, 25547],
 [2.0, 49516, 49516]]


[0.0, 2.0, 1.0, nan]


[[0.0, 794704, 795654], [2.0, 21243, 21243], [1.0, 726, 726], [nan, 0, 0]]


[0.0, nan, 2.0, 1.0]


[[0.0, 418326, 445951],
 [nan, 0, 0],
 [2.0, 232852, 277665],
 [1.0, 94007, 94007]]


[0, 1]


[[0, 184785, 184785], [1, 632838, 632838]]


[2.0, 3.0, nan, 1.0, 0.0]


[[2.0, 688703, 733190],
 [3.0, 19286, 19286],
 [nan, 0, 0],
 [1.0, 38873, 38873],
 [0.0, 26274, 26274]]


[6.0, 1.0, 9.0, 5.0, nan, 2.0, 0.0, 4.0, 3.0, 7.0, 8.0]


[[6.0, 58303, 58303],
 [1.0, 252848, 274394],
 [9.0, 59243, 59243],
 [5.0, 64955, 64955],
 [nan, 0, 0],
 [2.0, 82528, 82528],
 [0.0, 46284, 46284],
 [4.0, 66361, 66361],
 [3.0, 104168, 104168],
 [7.0, 23953, 23953],
 [8.0, 37434, 37434]]


[1, 0, 3, 2]


[[1, 585905, 585905],
 [0, 111573, 111573],
 [3, 30017, 30017],
 [2, 90128, 90128]]


[1, 0, 3, 2]


[[1, 595835, 595835], [0, 95641, 95641], [3, 32004, 32004], [2, 94143, 94143]]


[3.0, 2.0, 0.0, 4.0, nan, 1.0]


[[3.0, 587104, 698117],
 [2.0, 57747, 57747],
 [0.0, 19737, 19737],
 [4.0, 19286, 19286],
 [nan, 0, 0],
 [1.0, 22736, 22736]]


[2.0, 5.0, 3.0, 4.0, 6.0, 1.0, 0.0, nan, 7.0, 8.0, 9.0]


[[2.0, 46288, 46288],
 [5.0, 25759, 25759],
 [3.0, 213451, 319377],
 [4.0, 157199, 157199],
 [6.0, 80514, 80642],
 [1.0, 70732, 70732],
 [0.0, 19536, 19536],
 [nan, 0, 0],
 [7.0, 14369, 14369],
 [8.0, 41924, 41924],
 [9.0, 41724, 41797]]


[0.0, 1.0, 2.0, nan]


[[0.0, 643575, 678112],
 [1.0, 112062, 112062],
 [2.0, 27449, 27449],
 [nan, 0, 0]]


[nan, 1.0, 2.0, 0.0]


[[nan, 0, 0], [1.0, 460966, 530850], [2.0, 285837, 285997], [0.0, 776, 776]]


[nan, 0.0, 1.0, 2.0]


[[nan, 0, 0],
 [0.0, 471707, 543952],
 [1.0, 194390, 194872],
 [2.0, 78799, 78799]]


[nan, 1.0, 0.0, 5.0, 2.0, 3.0, 4.0]


[[nan, 0, 0],
 [1.0, 158675, 158675],
 [0.0, 307588, 432125],
 [5.0, 19286, 19286],
 [2.0, 154870, 160241],
 [3.0, 18678, 18678],
 [4.0, 28618, 28618]]


[0.0, nan, 1.0, 2.0]


[[0.0, 586678, 615973],
 [nan, 0, 0],
 [1.0, 71184, 71184],
 [2.0, 130466, 130466]]


[0, 1, 2]


[[0, 802710, 802710], [1, 8089, 8089], [2, 6824, 6824]]


CPU times: user 19min 12s, sys: 1min 37s, total: 20min 50s
Wall time: 21min 16s


0