In [None]:
%%latex
\tableofcontents

# readme
- Most of our other Jupyter Notebooks have a main() function at the bottom that runs everything.  
- This notebook is structured differently, with several functions that run in sequence.  
- The reason for the difference is that part of the work has to be done outside this notebook.  
- The IVEware imputation software is available in several languages, but not Python.  We ran it in R using srclib.  
- This notebook prepares the data for the Mode, Random Forest, and IVEware imputations, and does the first two.  Then the user must separately run the IVEware software.  Finally, this notebook pulls in those results and compares the three methods.  
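
The hand-off to the external imputation step can be sketched as a round trip through a tab-separated file. This is only an illustration on a toy frame: the file name `toy_IVEware.txt` is hypothetical, and the layout of the file that IVEware writes back is an assumption, not something this notebook documents.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy frame with erased values standing in for the real data.
toy = pd.DataFrame({"HOUR": [3.0, np.nan, 1.0], "SEX": [1.0, 0.0, np.nan]})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this sketch.
    path = os.path.join(tmp, "toy_IVEware.txt")

    # Missing values are written as empty fields in a tab-separated file.
    toy.to_csv(path, sep="\t", index=False, na_rep="")

    # ... the external imputation would run here and write its own output ...

    # Reading a tab-separated file back: empty fields become NaN again.
    round_trip = pd.read_csv(path, sep="\t")

print(round_trip.isna().sum().to_dict())
```

Because the erased cells are chosen at random on every run, the exported file and the externally imputed file have to come from the same run to be comparable.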

# Methods

- We have the discretized CRSS dataset in '../../Big_Files/CRSS_Binned.csv'
- MissForest is a round-robin imputation method implemented in R, generally considered one of the best imputation methods.  It has several Python implementations.
- We tried to use MissForest, https://pypi.org/project/MissForest/, to impute missing values, but it gave us errors.  Tracking down the source of the errors led us to write our own round-robin implementation.
- We compare here three methods:
    - Round-Robin Random Forest 
        - Our own implementation of Round Robin, using scikit-learn's random forest
        - Using imputation by mode as the starting point
    - Imputation by mode
    - IVEware, using the hyperparameters in the CRSS Imputation report
- To compare, we followed the example for MissForest.
    - We dropped all samples with a missing value, so we would have ground truth.
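    - We erased roughly 15% of the values in each feature.  (The code draws row indices with replacement, so about 14% of each feature is actually erased.)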
    - We used each imputation method to impute the missing values, and, for each feature, counted how many did not match the ground truth.
- Our round-robin method
    - In data_NaN, change all of the 'Unknown' to np.nan.
    - In each feature, count the number of unknown samples.
    - In another copy, data_Mode, impute by mode in all of the features.
    - Starting with the feature with the fewest (nonzero) missing samples, and continuing in increasing order:
        - Copy that feature from data_NaN into data_Mode, so that only that feature has missing values.
        - Separate the dataframe into two, one with known values in the target variable (X) and one with unknown values (Z).
        - From the dataframe with known values (X), separate out the target variable (call it 'y')
        - Using Random Forest, build a model that maps X to y.  
        - Use the model to impute the missing values
    - At each iteration we replace the mode-imputed values with RF-imputed values.
- IVEware is available on several platforms, but Python is not one of them.  We run it in R outside this notebook.  Be aware that the random selection of values to erase is different for each run of this notebook, so the IVEware imputation must be rerun each time. 

- Once we had analyzed the results and decided that the Random Forest method is best for our work, we implemented it and saved the results to CRSS_Imputed.csv.
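
The round-robin method described above can be sketched compactly as follows. This is a minimal illustration on toy data, not the notebook's `Impute_Round_Robin` function: the name `round_robin_impute` and the `rf_kwargs` pass-through are assumptions made for the example, and it assumes integer-coded categorical features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def round_robin_impute(data_nan, **rf_kwargs):
    """Impute integer-coded categorical features one at a time with a random forest."""
    # Start from a mode-imputed copy so every predictor column is complete.
    imputed = data_nan.fillna(data_nan.mode().iloc[0])

    # Visit features from fewest to most missing values, skipping complete ones.
    n_missing = data_nan.isna().sum()
    order = n_missing[n_missing > 0].sort_values().index

    for feature in order:
        missing = data_nan[feature].isna()
        predictors = imputed.drop(columns=feature)

        # Fit on rows where the target is known, predict the rows where it is not.
        model = RandomForestClassifier(random_state=0, **rf_kwargs)
        model.fit(predictors[~missing], data_nan.loc[~missing, feature])
        imputed.loc[missing, feature] = model.predict(predictors[missing])

    return imputed


# Toy data: column "b" copies "a", so it should be easy to recover.
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=200)
toy = pd.DataFrame({"a": a, "b": a, "c": rng.integers(0, 2, size=200)}).astype(float)
toy.loc[rng.choice(200, size=20, replace=False), "b"] = np.nan
toy.loc[rng.choice(200, size=30, replace=False), "c"] = np.nan

result = round_robin_impute(toy, n_estimators=20)
print(result.isna().sum().sum())
```

Writing predictions back with a boolean mask keeps the original row order, so no re-sorting is needed afterwards.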

# Results of Comparison of Three Imputation Methods

- We ran the imputation on 78 features with 250,389 samples.  
    - The features are those CRSS features that have data for all of 2016 - 2021, are not the results of imputation by CRSS, may have a pattern (unlike effectively random values such as VINs), and are missing in no more than 20% of the samples.  
    - The features were discretized (binned) down to 2-10 categories before imputation.
    - The samples are those of the 713,566 that have no missing values in any of the 78 features.

- First Run
    - Percentage of Samples Incorrectly Imputed

| | Percentage of Samples Incorrectly Imputed |
| --- | --- |
| Random Forest | 22.95% |
| Mode Imputation | 28.85% |
| IVEware | 26.14% |

    - Comparison of number of errors in the 78 features:

|  | Fewer | Equal | More | Total |
| --- | --- | --- | --- | --- |
| Compare RF to Mode | 50 | 28 | 0 | 78 |
| Compare RF to IVEware | 47 | 0 | 31 | 78 |
| Compare Mode to IVEware | 39 | 0 | 39 | 78 |



    - Number of NaN Imputed Differently by Different Methods

| Comparison | Number of NaN |
| --- | --- |
|Total Number of NaN|  2,720,350|
|RF Different from Mode|  301,605|
|RF Different from IVEware|  1,255,018|
|Mode Different from IVEware| 1,412,826|


- Second Run

     - Percentage of Samples Incorrectly Imputed

| | Percentage of Samples Incorrectly Imputed |
| --- | --- |
| Random Forest | 22.76% |
| Mode Imputation | 28.82% |
| IVEware | 28.41% |


    - Comparison of number of errors in the 78 features:

|  | Fewer | Equal | More | Total |
| --- | --- | --- | --- | --- |
| Compare RF to Mode | 48 | 30 | 0 | 78 |
| Compare RF to IVEware | 52 | 0 | 26 | 78 |
| Compare Mode to IVEware | 43 | 0 | 35 | 78 |



    - Number of NaN Imputed Differently by Different Methods

| Comparison | Number of NaN |
| --- | --- |
|Total Number of NaN|  2,719,688|
|RF Different from Mode|  307,544|
|RF Different from IVEware|  811,484|
|Mode Different from IVEware| 960,765|
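
The three kinds of summary reported for each run can be sketched as follows. This is an assumption about how such numbers could be computed from a ground-truth frame, a boolean mask of erased cells, and the imputed frames; the helper names are hypothetical, and the toy data are made up for the example.

```python
import numpy as np
import pandas as pd


def error_rate(truth, imputed, erased):
    """Share of erased cells whose imputed value differs from the ground truth."""
    wrong = (truth != imputed) & erased
    return wrong.to_numpy().sum() / erased.to_numpy().sum()


def errors_per_feature(truth, imputed, erased):
    """Number of wrongly imputed cells in each feature."""
    return ((truth != imputed) & erased).sum()


def compare_features(errors_a, errors_b):
    """Count features where method A has fewer, equal, or more errors than B."""
    return {
        "Fewer": int((errors_a < errors_b).sum()),
        "Equal": int((errors_a == errors_b).sum()),
        "More": int((errors_a > errors_b).sum()),
    }


def n_imputed_differently(imputed_a, imputed_b, erased):
    """Number of erased cells that two methods filled with different values."""
    return int(((imputed_a != imputed_b) & erased).to_numpy().sum())


truth = pd.DataFrame({"x": [0, 1, 1, 2], "y": [1, 1, 0, 0]})
erased = pd.DataFrame({"x": [True, False, True, False],
                       "y": [False, True, True, False]})
method_a = pd.DataFrame({"x": [0, 1, 1, 2], "y": [1, 1, 1, 0]})  # one wrong cell
method_b = pd.DataFrame({"x": [1, 1, 1, 2], "y": [1, 0, 1, 0]})  # three wrong cells

rate_a = error_rate(truth, method_a, erased)
comparison = compare_features(errors_per_feature(truth, method_a, erased),
                              errors_per_feature(truth, method_b, erased))
n_diff = n_imputed_differently(method_a, method_b, erased)
print(rate_a, comparison, n_diff)
```

Restricting every count to the erased cells keeps untouched values, which trivially match the ground truth, from diluting the error rates.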





## Discussion

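- Random Forest is as good as or better than Mode on every feature in both runs: it had fewer errors on 50 and 48 of the 78 features and tied on the rest.
- Random Forest has fewer errors than IVEware on more than half of the features (47 and 52 of 78), though not overwhelmingly, and it has the lower overall error rate (22.95% vs. 26.14%, and 22.76% vs. 28.41%).
- IVEware and Mode are comparable in the number of features on which each has fewer errors.  IVEware has the lower overall error rate, clearly in the first run (26.14% vs. 28.85%) but only marginally in the second (28.41% vs. 28.82%).
- Random Forest and Mode make largely the same mistakes: they disagree on only about 11% of the erased values.  
- IVEware makes different mistakes from Random Forest and Mode: it disagrees with each of them on roughly 30% to 52% of the erased values.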

## Conclusion

- Use Random Forest

# Setup
## Import Libraries

In [1]:
import sys, copy, math, time, os

print ('Python version: {}'.format(sys.version))

import numpy as np
print ('NumPy version: {}'.format(np.__version__))
np.set_printoptions(suppress=True)


import pandas as pd
print ('Pandas version:  {}'.format(pd.__version__))
pd.set_option('display.max_rows', 500)

import sklearn
print ('SciKit-Learn version: {}'.format(sklearn.__version__))
from sklearn.model_selection import train_test_split

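# Compatibility shim: some older packages still import sklearn.neighbors.base,
# which newer scikit-learn versions renamed to sklearn.neighbors._base.
import sklearn.neighbors._base
sys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base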

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor

# Set Randomness.  Copied from https://www.kaggle.com/code/abazdyrev/keras-nn-focal-loss-experiments
import random
#np.random.seed(42) # NumPy
#random.seed(42) # Python
#tf.set_random_seed(42) # Tensorflow

from IPython.display import Audio
sound_file = './beep.wav'

import warnings
warnings.filterwarnings('ignore')

print ('Finished Importing Libraries')


Python version: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:13) [Clang 14.0.6 ]
NumPy version: 1.24.0
Pandas version:  1.5.2
SciKit-Learn version: 1.2.0
Finished Importing Libraries


## Get Data
- Drop features imputed by CRSS
- Drop the correlation features, CASENUM, VEH_NO, and PER_NO, which exist to let us correlate the data in the ACCIDENT, VEHICLE, and PERSON datasets, which we have already done

In [2]:
def Get_Data():
    print ('Get_Data')
    data = pd.read_csv('../../Big_Files/CRSS_Binned.csv', low_memory=False)
    print ('data.shape = ', data.shape)
    print ()

    print ('Drop CASENUM, VEH_NO, and PER_NO')
    data.drop(columns=['CASENUM', 'VEH_NO', 'PER_NO'], inplace=True)
    print ('data.shape = ', data.shape)
    print ()

    print ('Drop Imputed Columns')
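    # Collect the CRSS-imputed columns first, then drop them in one call,
    # rather than dropping columns while iterating over the dataframe.
    imputed_columns = [feature for feature in data if '_IM' in feature]
    for feature in imputed_columns:
        print (feature)
    data.drop(columns=imputed_columns, inplace=True)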
    
    print ('data.shape = ', data.shape)
    print ()
    
    print ("Remaining Features:")
    Features = sorted(list(data.columns))
    for feature in Features:
        print ("    ",feature)
    
    return data

In [3]:
data = Get_Data()


Get_Data
data.shape =  (713566, 106)

Drop CASENUM, VEH_NO, and PER_NO
data.shape =  (713566, 103)

Drop Imputed Columns
HOUR_IM
LGTCON_IM
RELJCT2_IM
WEATHR_IM
WKDY_IM
NO_INJ_IM
MANCOL_IM
EVENT1_IM
ALCHL_IM
MAXSEV_IM
RELJCT1_IM
BDYTYP_IM
IMPACT1_IM
MXVSEV_IM
NUMINJ_IM
PCRASH1_IM
V_ALCH_IM
VEVENT_IM
AGE_IM
EJECT_IM
INJSEV_IM
PERALCH_IM
SEAT_IM
SEX_IM
VEH_AGE_IM
data.shape =  (713566, 78)

Remaining Features:
     ACC_TYPE
     AGE
     AIR_BAG
     ALCOHOL
     ALC_RES
     ALC_STATUS
     BODY_TYP
     BUS_USE
     CARGO_BT
     DAY_WEEK
     DR_PRES
     EJECTION
     EMER_USE
     FIRE_EXP
     HARM_EV
     HAZ_CNO
     HAZ_INV
     HAZ_PLAC
     HAZ_REL
     HIT_RUN
     HOSPITAL
     HOUR
     IMPACT1
     INJ_SEV
     INT_HWY
     J_KNIFE
     LGT_COND
     MAKE
     MAK_MOD
     MAN_COLL
     MAX_SEV
     MAX_VSEV
     MODEL
     MONTH
     M_HARM
     NUMOCCS
     NUM_INJ
     NUM_INJV
     PCRASH4
     PCRASH5
     PERMVIT
     PERNOTMVIT
     PER_TYP
     PJ
     PSU
     PVH_

## Tools

In [4]:
def Impute_Round_Robin(data):
    print ('Impute()')
    pd.set_option('display.max_columns', None)
    
    # Replace 'Unknown' with np.NaN
    data.replace({'Unknown': np.nan}, inplace=True)
    display(data.head(20))
    print ()
    
    # Make a list of features with missing samples, 
    #     ordered by the number of missing samples, 
    #     from least to most.  
    Missing = []
    Complete = []
    for feature in data:
        s = data[feature].isna().sum()
        if s==0:
            Complete.append([feature, s])
        if s>0:
            Missing.append([feature, s])
    Missing = sorted (Missing, key=lambda x:x[1], reverse=False)
    print ()
    print ('Complete[]')
    display(Complete)
    print ()
    print ('Missing[]')
    display(Missing)
    print ()
    
    print ('Make data_Mode')
    print ()
    data_Mode = pd.DataFrame()
    for X in Complete:
        feature = X[0]
        data_Mode[feature] = data[feature]
    for M in Missing:
        feature = M[0]
        m = data[feature].mode()[0]
        print (feature, M[1], m)
        data_Mode[feature] = data[feature].fillna(m)
    print ('data_Mode')
    display(data_Mode.head(20))

    print ()
    print ('Make starting point for data_Imputed')
    data_Imputed = pd.DataFrame()
    for X in Complete:
        feature = X[0]
        data_Imputed[feature] = data[feature]
    for X in Missing:
        feature = X[0]
        data_Imputed[feature] = data_Mode[feature]
    print ('data_Imputed')
    display(data_Imputed.head(20))
    print ()

    print ('Start Loop')
    print ()
    n = 0
    for M in Missing:
        n += 1
        print (M)
        feature = M[0]
        data_Imputed[feature] = data[feature]
#        print ()
#        print ('data[feature].isna().sum()')
#        print (data[feature].isna().sum())
#        print ('data_Imputed[feature].isna().sum()')
#        print (data_Imputed[feature].isna().sum())
#        print ()
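        # Rows where the target is known: W keeps the target column for the
        # final concat, X holds the predictors, and y is the target.
        W = data_Imputed.dropna(subset=[feature])
        X = W.drop(columns=feature)
        y = W[feature]
        # Rows where the target is missing, with the all-NaN target column removed.
        Z = data_Imputed[data_Imputed[feature].isna()].drop(columns=feature)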
#        Z.reset_index(drop=True, inplace=True)
#        print (data.shape)
#        print (X.shape)
#        display(X.head(40))
#        display(y.head(40))
#        print (Z.shape)
#        display(Z)
        clf = RandomForestClassifier(max_depth=2, random_state=0)
        clf.fit(X,y)
#        print ('clf.predict(Z)')
        z = clf.predict(Z)
        print (len(z))
        display(z)
        Z[feature] = z
#        display(Z)
        data_Imputed = pd.concat([Z, W])
#        display(data_Imputed.head(60))
        print (data_Imputed.shape)
        print ()
#        data_Imputed.sort_values(
#            by = ['CASENUM', 'VEH_NO', 'PER_NO'], 
#            ascending = [True, True, True], 
#            inplace=True
#        )
#        print ()
#        print ('data.PER_NO.equals(data_Imputed.PER_NO)')
#        print (data.PER_NO.equals(data_Imputed.PER_NO))
#        print ()
               
        Check_Feature(data, data_Imputed, feature)
#        if n==10:
#            return data_Imputed
    
    
    
    
    print ()
    return data_Imputed

In [5]:
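def Impute_Full(data):
    # Kept for reference from our MissForest attempt.  MissForest is not
    # imported in this notebook, so calling this function raises a NameError
    # unless MissForest is imported first.
    print ('Impute_Full()')
    data.replace({'Unknown': np.nan}, inplace=True)
    for feature in data:
        print (feature, len(pd.unique(data[feature])))
    print ()
    mf = MissForest()
    data = mf.fit_transform(data)
    return data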

In [6]:
def Check(data, data_Imputed):
    Features = data.columns
    print (Features)
    for feature in Features:
        U = pd.unique(data[feature]).tolist()
        print (U)
        A = []
        for u in U:
            a = len(data[data[feature]==u])
            b = len(data_Imputed[data_Imputed[feature]==u])
            A.append([u, a, b])
        display(A)
        print ()


In [7]:
def Check_Feature(data, data_Imputed, feature):
    U = pd.unique(data[feature]).tolist()
    U = [x for x in U if x == x]
    print (U)
    A = []
    for u in U:
        a = len(data[data[feature]==u])
        b = len(data_Imputed[data_Imputed[feature]==u])
        A.append([u, a, b, b-a])
    a = data[feature].isna().sum()
    b = data_Imputed[feature].isna().sum()
    A.append(['NaN', a, b, 0])
    A = pd.DataFrame(A, columns=['Value', 'Original', 'Imputed', 'Difference'])
    display(A)
    print ()
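
A more compact way to build the same kind of before-and-after table is `value_counts`. The sketch below is an alternative illustration, not the notebook's `Check_Feature`; the helper name `value_count_table` is hypothetical, and unlike `Check_Feature` it reports the actual change in the NaN row instead of a fixed 0.

```python
import numpy as np
import pandas as pd


def value_count_table(original, imputed):
    """Count each value before and after imputation, plus a row for NaN."""
    table = pd.DataFrame({
        "Original": original.value_counts(),   # value_counts skips NaN by default
        "Imputed": imputed.value_counts(),
    }).fillna(0).astype(int)
    table.loc["NaN"] = [int(original.isna().sum()), int(imputed.isna().sum())]
    table["Difference"] = table["Imputed"] - table["Original"]
    return table


original = pd.Series([1.0, 1.0, 2.0, np.nan, np.nan])
imputed = pd.Series([1.0, 1.0, 2.0, 1.0, 2.0])
table = value_count_table(original, imputed)
print(table)
```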


# Compare Imputation Methods

## Mode Imputation
## Random Forest Imputation
## Prepare Data for IVEware
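
A note on the erasing step: `np.random.choice` samples with replacement by default, so drawing 15% of the row count yields fewer than 15% distinct rows. The arithmetic below is a sketch of that effect using the row count from this notebook; the generator seed is arbitrary. The expected share works out to about 13.9%, in line with the per-feature NaN counts of roughly 34,800 out of 250,389 shown in the output.

```python
import numpy as np

n_rows = 250_389          # rows in the complete-case data
frac = 0.15
n_draws = int(n_rows * frac)

# Expected share of distinct rows hit by n_draws draws with replacement.
expected_share = 1 - (1 - 1 / n_rows) ** n_draws
print(round(expected_share, 4))

# Simulated check: distinct rows hit with and without replacement.
rng = np.random.default_rng(0)
with_replacement = np.unique(rng.choice(n_rows, size=n_draws, replace=True)).size
without_replacement = np.unique(rng.choice(n_rows, size=n_draws, replace=False)).size
print(with_replacement, without_replacement)
```

Passing `replace=False` would erase exactly `int(n_rows * frac)` cells per feature, at the cost of no longer matching the counts shown in this notebook's output.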

In [8]:
def Compare_Imputation_Methods_Part_1():
    print ()
    print ('Compare_Imputation_Methods_Part_1()')
    data = Get_Data()
    print (data.shape)

    # Drop all samples with missing data, so we have ground truth
    data.replace({'Unknown':np.nan}, inplace=True)
    data.dropna(inplace=True)
    data.reset_index(inplace=True, drop=True)
    for feature in data:
        data[feature] = pd.to_numeric(data[feature])
    # astype returns a new dataframe, so the result must be assigned back.
    data = data.astype('int64')

    data_Ground_Truth = data.copy(deep=True)
    for feature in data_Ground_Truth:
        data_Ground_Truth[feature] = pd.to_numeric(data_Ground_Truth[feature])
    data_Ground_Truth = data_Ground_Truth.astype('int64')
    print ('data_Ground_Truth.shape')
    print (data_Ground_Truth.shape)
    display(data_Ground_Truth.head())

    # For each feature, draw 15% of the row indices at random and set
    # those cells to missing.  np.random.choice samples with replacement
    # by default, so slightly fewer than 15% of distinct rows are erased.
    print ('Remove 15% of values from each row')
    frac = .15
    for c in data.columns:
        idx = np.random.choice(a=data.index, size=int(len(data) * frac))
        data.loc[idx, c] = np.nan
    data_NaN = data.copy(deep=True)
    print ('data_NaN.shape')
    print (data_NaN.shape)
    display(data_NaN.head())
    
    # Create .txt file to feed into IVEware imputation
    data_IVEware = data.fillna('')
    data_IVEware.to_csv('../../Big_Files/data_IVEware.txt', sep='\t', index=False)
    
    data_Mode = pd.DataFrame()
    for feature in data:
        data_Mode[feature] = data[feature].fillna(data[feature].mode()[0])
    data_Mode = data_Mode.astype('int64')
    print ('data_Mode.shape')
    print (data_Mode.shape)
    display(data_Mode.head())
    
    # Perform Round Robin imputation using Random Forest Classifier
    data_RF = Impute_Round_Robin(data)
    data_RF.sort_index(inplace=True)
    data_RF = data_RF[data.columns]  
    data_RF = data_RF.astype('int64')
    
    print ('data_RF.shape')
    print (data_RF.shape)
    display(data_RF.head())
#    print ()

    return data_Ground_Truth, data_NaN, data_RF, data_Mode

In [9]:
data_Ground_Truth, data_NaN, data_RF, data_Mode = Compare_Imputation_Methods_Part_1()


Compare_Imputation_Methods_Part_1()
Get_Data
data.shape =  (713566, 106)

Drop CASENUM, VEH_NO, and PER_NO
data.shape =  (713566, 103)

Drop Imputed Columns
HOUR_IM
LGTCON_IM
RELJCT2_IM
WEATHR_IM
WKDY_IM
NO_INJ_IM
MANCOL_IM
EVENT1_IM
ALCHL_IM
MAXSEV_IM
RELJCT1_IM
BDYTYP_IM
IMPACT1_IM
MXVSEV_IM
NUMINJ_IM
PCRASH1_IM
V_ALCH_IM
VEVENT_IM
AGE_IM
EJECT_IM
INJSEV_IM
PERALCH_IM
SEAT_IM
SEX_IM
VEH_AGE_IM
data.shape =  (713566, 78)

Remaining Features:
     ACC_TYPE
     AGE
     AIR_BAG
     ALCOHOL
     ALC_RES
     ALC_STATUS
     BODY_TYP
     BUS_USE
     CARGO_BT
     DAY_WEEK
     DR_PRES
     EJECTION
     EMER_USE
     FIRE_EXP
     HARM_EV
     HAZ_CNO
     HAZ_INV
     HAZ_PLAC
     HAZ_REL
     HIT_RUN
     HOSPITAL
     HOUR
     IMPACT1
     INJ_SEV
     INT_HWY
     J_KNIFE
     LGT_COND
     MAKE
     MAK_MOD
     MAN_COLL
     MAX_SEV
     MAX_VSEV
     MODEL
     MONTH
     M_HARM
     NUMOCCS
     NUM_INJ
     NUM_INJV
     PCRASH4
     PCRASH5
     PERMVIT
     PERNOTMVIT
  

Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,...,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,1,1,1,0,2,2,...,1,0,0,0,2,1,0,3,1,0
1,1,0,3,0,2,1,0,0,2,2,...,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,2,0,1,0,2,1,...,1,1,0,3,2,1,1,3,0,3
3,3,0,2,0,2,0,1,0,2,1,...,1,1,1,0,1,1,1,1,0,3
4,3,0,2,0,2,0,1,0,2,1,...,1,1,0,3,1,1,1,4,1,3


Remove 15% of values from each row
data_NaN.shape
(250389, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,...,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3.0,,3.0,0.0,1.0,,,0.0,2.0,2.0,...,1.0,0.0,,0.0,,,0.0,3.0,1.0,0.0
1,1.0,0.0,3.0,0.0,2.0,1.0,0.0,,2.0,2.0,...,1.0,1.0,0.0,3.0,2.0,1.0,,3.0,,4.0
2,3.0,0.0,2.0,0.0,2.0,0.0,1.0,0.0,2.0,1.0,...,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,0.0,
3,,0.0,2.0,,2.0,0.0,1.0,0.0,2.0,1.0,...,1.0,1.0,1.0,,1.0,1.0,1.0,1.0,0.0,3.0
4,3.0,0.0,2.0,0.0,2.0,0.0,1.0,0.0,2.0,1.0,...,1.0,1.0,0.0,3.0,1.0,1.0,1.0,4.0,1.0,3.0


data_Mode.shape
(250389, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,...,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,1,1,1,0,2,2,...,1,0,0,0,2,1,0,3,1,0
1,1,0,3,0,2,1,0,0,2,2,...,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,2,0,1,0,2,1,...,1,1,0,3,2,1,1,3,0,0
3,3,0,2,2,2,0,1,0,2,1,...,1,1,1,3,1,1,1,1,0,3
4,3,0,2,0,2,0,1,0,2,1,...,1,1,0,3,1,1,1,4,1,3


Impute()


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3.0,,3.0,0.0,1.0,,,0.0,2.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,0.0,,4.0,1.0,1.0,2016.0,4.0,2.0,3.0,0.0,4.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,5.0,,1.0,0.0,0.0,0.0,0.0,1.0,3.0,5.0,2.0,,2.0,1.0,1.0,1.0,0.0,0.0,,1.0,1.0,0.0,,1.0,1.0,2.0,3.0,0.0,0.0,1.0,0.0,,0.0,,,0.0,3.0,1.0,0.0
1,1.0,0.0,3.0,0.0,2.0,1.0,0.0,,2.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,2.0,1.0,3.0,2.0,1.0,0.0,2016.0,,2.0,1.0,0.0,1.0,5.0,,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,1.0,,1.0,4.0,6.0,2.0,4.0,3.0,3.0,,1.0,1.0,3.0,1.0,1.0,1.0,0.0,0.0,,,1.0,1.0,1.0,1.0,1.0,0.0,3.0,,0.0,1.0,1.0,0.0,3.0,2.0,1.0,,3.0,,4.0
2,3.0,0.0,2.0,0.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,,1.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,1.0,,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,8.0,0.0,2.0,1.0,,1.0,0.0,0.0,,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,2.0,1.0,1.0,3.0,2.0,,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,0.0,
3,,0.0,2.0,,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,,1.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,,,0.0,1.0,1.0,,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,,0.0,,1.0,5.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,,1.0,1.0,3.0,2.0,0.0,0.0,1.0,1.0,1.0,,1.0,1.0,1.0,1.0,0.0,3.0
4,3.0,0.0,2.0,0.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,,1.0,1.0,,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,,0.0,0.0,8.0,,2.0,1.0,5.0,,0.0,0.0,1.0,,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,,1.0,1.0,3.0,2.0,,0.0,1.0,1.0,0.0,3.0,1.0,1.0,1.0,4.0,1.0,3.0
5,3.0,,3.0,,2.0,1.0,1.0,,1.0,2.0,1.0,,0.0,2.0,0.0,0.0,1.0,4.0,2.0,3.0,1.0,1.0,,2.0,2.0,2.0,0.0,4.0,5.0,1.0,,,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,4.0,6.0,,4.0,3.0,3.0,,5.0,1.0,3.0,1.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,1.0,1.0,0.0,3.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,2.0,3.0,1.0,1.0
6,,,3.0,0.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,,4.0,2.0,3.0,1.0,1.0,,2.0,2.0,2.0,0.0,3.0,5.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,4.0,,1.0,,8.0,0.0,4.0,1.0,,4.0,3.0,,3.0,1.0,2.0,,0.0,,1.0,1.0,1.0,4.0,2.0,1.0,1.0,0.0,3.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,2.0,3.0,1.0,2.0
7,3.0,,,0.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,4.0,2.0,,,1.0,2016.0,2.0,2.0,2.0,0.0,3.0,5.0,2.0,1.0,,1.0,,1.0,0.0,,1.0,0.0,4.0,1.0,1.0,4.0,8.0,,4.0,1.0,1.0,4.0,,1.0,,1.0,2.0,1.0,,2.0,1.0,,1.0,4.0,2.0,1.0,1.0,,0.0,,0.0,1.0,,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0
8,2.0,0.0,3.0,,1.0,1.0,1.0,0.0,2.0,2.0,,1.0,0.0,,,,,1.0,,3.0,,1.0,2016.0,3.0,,,0.0,3.0,,1.0,0.0,,1.0,1.0,1.0,0.0,0.0,1.0,0.0,,1.0,1.0,2.0,0.0,,1.0,3.0,3.0,4.0,3.0,1.0,3.0,1.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,2.0,2.0,1.0,,1.0,1.0,,3.0,2.0,1.0,1.0,,0.0,0.0
9,,0.0,,0.0,,0.0,1.0,0.0,,1.0,,1.0,0.0,1.0,0.0,0.0,0.0,1.0,3.0,,3.0,,2016.0,,,0.0,0.0,0.0,5.0,1.0,,1.0,1.0,,1.0,0.0,0.0,1.0,,,1.0,,3.0,8.0,2.0,4.0,3.0,3.0,1.0,0.0,0.0,,1.0,1.0,0.0,0.0,0.0,,1.0,,7.0,1.0,1.0,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,3.0,,1.0,,3.0,1.0,2.0




Complete[]


[]


Missing[]


[['URBANICITY', 34778],
 ['VEH_ALCH', 34785],
 ['NUM_INJV', 34787],
 ['PERNOTMVIT', 34788],
 ['M_HARM', 34797],
 ['PER_TYP', 34802],
 ['P_CRASH1', 34804],
 ['ALC_STATUS', 34805],
 ['FIRE_EXP', 34807],
 ['PSU', 34808],
 ['WEATHER', 34811],
 ['SPEC_USE', 34811],
 ['VE_FORMS', 34816],
 ['SEX', 34817],
 ['AIR_BAG', 34819],
 ['SEAT_POS', 34820],
 ['VEH_AGE', 34820],
 ['CARGO_BT', 34824],
 ['HOSPITAL', 34825],
 ['HARM_EV', 34828],
 ['PJ', 34829],
 ['HAZ_PLAC', 34833],
 ['PVH_INVL', 34835],
 ['MAX_VSEV', 34836],
 ['SPEEDREL', 34837],
 ['MAN_COLL', 34839],
 ['BUS_USE', 34840],
 ['BODY_TYP', 34841],
 ['TOWED', 34844],
 ['MAKE', 34846],
 ['ROLLOVER', 34849],
 ['TYP_INT', 34850],
 ['PCRASH5', 34851],
 ['TOW_VEH', 34855],
 ['HIT_RUN', 34857],
 ['AGE', 34858],
 ['EMER_USE', 34862],
 ['PCRASH4', 34866],
 ['REST_MIS', 34868],
 ['WRK_ZONE', 34869],
 ['HAZ_CNO', 34869],
 ['IMPACT1', 34870],
 ['DAY_WEEK', 34872],
 ['P_CRASH2', 34873],
 ['RELJCT1', 34875],
 ['VSURCOND', 34875],
 ['VTRAFWAY', 34875],
 ['H


Make data_Mode

URBANICITY 34778 1.0
VEH_ALCH 34785 1.0
NUM_INJV 34787 3.0
PERNOTMVIT 34788 0.0
M_HARM 34797 1.0
PER_TYP 34802 2.0
P_CRASH1 34804 1.0
ALC_STATUS 34805 1.0
FIRE_EXP 34807 1.0
PSU 34808 3.0
WEATHER 34811 1.0
SPEC_USE 34811 1.0
VE_FORMS 34816 2.0
SEX 34817 1.0
AIR_BAG 34819 1.0
SEAT_POS 34820 3.0
VEH_AGE 34820 0.0
CARGO_BT 34824 0.0
HOSPITAL 34825 0.0
HARM_EV 34828 1.0
PJ 34829 3.0
HAZ_PLAC 34833 0.0
PVH_INVL 34835 0.0
MAX_VSEV 34836 2.0
SPEEDREL 34837 1.0
MAN_COLL 34839 3.0
BUS_USE 34840 1.0
BODY_TYP 34841 1.0
TOWED 34844 2.0
MAKE 34846 0.0
ROLLOVER 34849 1.0
TYP_INT 34850 1.0
PCRASH5 34851 3.0
TOW_VEH 34855 0.0
HIT_RUN 34857 0.0
AGE 34858 2.0
EMER_USE 34862 1.0
PCRASH4 34866 1.0
REST_MIS 34868 1.0
WRK_ZONE 34869 0.0
HAZ_CNO 34869 1.0
IMPACT1 34870 1.0
DAY_WEEK 34872 1.0
P_CRASH2 34873 3.0
RELJCT1 34875 0.0
VSURCOND 34875 1.0
VTRAFWAY 34875 0.0
HOUR 34878 3.0
REST_USE 34886 1.0
YEAR 34893 2017.0
VTRAFCON 34893 1.0
HAZ_INV 34898 0.0
DR_PRES 34899 1.0
VALIGN 34901 1.0
VTCO

Unnamed: 0,URBANICITY,VEH_ALCH,NUM_INJV,PERNOTMVIT,M_HARM,PER_TYP,P_CRASH1,ALC_STATUS,FIRE_EXP,PSU,WEATHER,SPEC_USE,VE_FORMS,SEX,AIR_BAG,SEAT_POS,VEH_AGE,CARGO_BT,HOSPITAL,HARM_EV,PJ,HAZ_PLAC,PVH_INVL,MAX_VSEV,SPEEDREL,MAN_COLL,BUS_USE,BODY_TYP,TOWED,MAKE,ROLLOVER,TYP_INT,PCRASH5,TOW_VEH,HIT_RUN,AGE,EMER_USE,PCRASH4,REST_MIS,WRK_ZONE,HAZ_CNO,IMPACT1,DAY_WEEK,P_CRASH2,RELJCT1,VSURCOND,VTRAFWAY,HOUR,REST_USE,YEAR,VTRAFCON,HAZ_INV,DR_PRES,VALIGN,VTCONT_F,INJ_SEV,MONTH,LGT_COND,RELJCT2,NUM_INJ,MAX_SEV,VSPD_LIM,INT_HWY,ALC_RES,PERMVIT,MAK_MOD,EJECTION,J_KNIFE,MODEL,VPROFILE,REL_ROAD,ACC_TYPE,NUMOCCS,REGION,HAZ_REL,VE_TOTAL,ALCOHOL,SCH_BUS
0,2.0,1.0,1.0,0.0,1.0,2.0,5.0,1.0,1.0,0.0,1.0,1.0,2.0,1.0,0.0,3.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,4.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,0.0,0.0,3.0,1.0,1.0,1.0,0.0,1.0,5.0,1.0,2.0,0.0,1.0,2.0,3.0,0.0,2016.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,3.0,1.0,1.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,4.0,3.0,4.0,1.0,2.0,2.0,0.0
1,2.0,1.0,3.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,3.0,4.0,0.0,0.0,1.0,3.0,0.0,0.0,2.0,1.0,2.0,1.0,5.0,0.0,6.0,1.0,0.0,3.0,0.0,0.0,3.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,3.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0,4.0,1.0,1.0,4.0,1.0,1.0,1.0,3.0,3.0,1.0,2.0,2.0,0.0
2,2.0,1.0,1.0,0.0,0.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,1.0,3.0,0.0,0.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,3.0,0.0,8.0,1.0,1.0,3.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,2.0,3.0,3.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,2.0,1.0,1.0,2.0,5.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,1.0,0.0,0.0,3.0,3.0,1.0,1.0,2.0,0.0
3,2.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,0.0,1.0,3.0,0.0,1.0,3.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,3.0,3.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,2.0,1.0,1.0,2.0,5.0,0.0,0.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,5.0,3.0,1.0,1.0,2.0,0.0
4,2.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,4.0,3.0,0.0,0.0,3.0,1.0,0.0,0.0,2.0,0.0,3.0,1.0,3.0,0.0,8.0,1.0,1.0,1.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,3.0,3.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,2.0,1.0,0.0,2.0,5.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,1.0,0.0,0.0,5.0,3.0,1.0,1.0,2.0,0.0
5,1.0,1.0,3.0,0.0,1.0,2.0,1.0,1.0,1.0,4.0,1.0,1.0,2.0,1.0,1.0,3.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,2.0,1.0,3.0,1.0,5.0,2.0,6.0,1.0,1.0,3.0,0.0,0.0,3.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,5.0,0.0,2.0,0.0,3.0,2.0,2017.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,3.0,1.0,1.0,2.0,4.0,0.0,0.0,2.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0,3.0,2.0,1.0,2.0,2.0,0.0
6,1.0,1.0,1.0,0.0,1.0,2.0,4.0,1.0,1.0,4.0,1.0,2.0,2.0,1.0,1.0,3.0,2.0,1.0,0.0,1.0,2.0,0.0,0.0,0.0,1.0,3.0,2.0,5.0,2.0,8.0,1.0,1.0,3.0,0.0,0.0,3.0,1.0,1.0,1.0,0.0,1.0,4.0,1.0,3.0,0.0,2.0,0.0,3.0,2.0,2017.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,3.0,1.0,0.0,2.0,4.0,0.0,0.0,2.0,3.0,1.0,1.0,4.0,1.0,1.0,3.0,3.0,2.0,1.0,2.0,2.0,1.0
7,1.0,1.0,1.0,0.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,2.0,2.0,1.0,1.0,0.0,2.0,1.0,0.0,1.0,2.0,0.0,0.0,2.0,1.0,3.0,2.0,5.0,2.0,8.0,1.0,1.0,3.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,4.0,1.0,3.0,0.0,2.0,0.0,3.0,0.0,2016.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,3.0,1.0,1.0,2.0,4.0,0.0,0.0,2.0,4.0,1.0,1.0,4.0,1.0,1.0,3.0,1.0,2.0,1.0,2.0,2.0,1.0
8,2.0,1.0,3.0,0.0,1.0,2.0,4.0,1.0,1.0,1.0,1.0,1.0,2.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,2.0,1.0,3.0,1.0,1.0,2.0,0.0,1.0,1.0,3.0,0.0,0.0,2.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,1.0,2.0,2.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,3.0,1.0,0.0,0.0,5.0,0.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,3.0,3.0,3.0,1.0,2.0,2.0,0.0
9,1.0,1.0,3.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,2.0,0.0,0.0,3.0,3.0,0.0,0.0,2.0,0.0,3.0,1.0,5.0,0.0,8.0,1.0,1.0,3.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,3.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,3.0,1.0,0.0,0.0,7.0,0.0,0.0,2.0,3.0,1.0,1.0,4.0,1.0,0.0,0.0,3.0,3.0,1.0,1.0,2.0,0.0



Make starting point for data_Imputed
data_Imputed


Unnamed: 0,URBANICITY,VEH_ALCH,NUM_INJV,PERNOTMVIT,M_HARM,PER_TYP,P_CRASH1,ALC_STATUS,FIRE_EXP,PSU,WEATHER,SPEC_USE,VE_FORMS,SEX,AIR_BAG,SEAT_POS,VEH_AGE,CARGO_BT,HOSPITAL,HARM_EV,PJ,HAZ_PLAC,PVH_INVL,MAX_VSEV,SPEEDREL,MAN_COLL,BUS_USE,BODY_TYP,TOWED,MAKE,ROLLOVER,TYP_INT,PCRASH5,TOW_VEH,HIT_RUN,AGE,EMER_USE,PCRASH4,REST_MIS,WRK_ZONE,HAZ_CNO,IMPACT1,DAY_WEEK,P_CRASH2,RELJCT1,VSURCOND,VTRAFWAY,HOUR,REST_USE,YEAR,VTRAFCON,HAZ_INV,DR_PRES,VALIGN,VTCONT_F,INJ_SEV,MONTH,LGT_COND,RELJCT2,NUM_INJ,MAX_SEV,VSPD_LIM,INT_HWY,ALC_RES,PERMVIT,MAK_MOD,EJECTION,J_KNIFE,MODEL,VPROFILE,REL_ROAD,ACC_TYPE,NUMOCCS,REGION,HAZ_REL,VE_TOTAL,ALCOHOL,SCH_BUS
0,2.0,1.0,1.0,0.0,1.0,2.0,5.0,1.0,1.0,0.0,1.0,1.0,2.0,1.0,0.0,3.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,4.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,0.0,0.0,3.0,1.0,1.0,1.0,0.0,1.0,5.0,1.0,2.0,0.0,1.0,2.0,3.0,0.0,2016.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,3.0,1.0,1.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,4.0,3.0,4.0,1.0,2.0,2.0,0.0
1,2.0,1.0,3.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,3.0,4.0,0.0,0.0,1.0,3.0,0.0,0.0,2.0,1.0,2.0,1.0,5.0,0.0,6.0,1.0,0.0,3.0,0.0,0.0,3.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,3.0,0.0,2.0,1.0,1.0,0.0,0.0,2.0,4.0,1.0,1.0,4.0,1.0,1.0,1.0,3.0,3.0,1.0,2.0,2.0,0.0
2,2.0,1.0,1.0,0.0,0.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,1.0,3.0,0.0,0.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,3.0,0.0,8.0,1.0,1.0,3.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,2.0,3.0,3.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,2.0,1.0,1.0,2.0,5.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,1.0,0.0,0.0,3.0,3.0,1.0,1.0,2.0,0.0
3,2.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,0.0,1.0,3.0,0.0,1.0,3.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,3.0,3.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,2.0,1.0,1.0,2.0,5.0,0.0,0.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,5.0,3.0,1.0,1.0,2.0,0.0
4,2.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,4.0,3.0,0.0,0.0,3.0,1.0,0.0,0.0,2.0,0.0,3.0,1.0,3.0,0.0,8.0,1.0,1.0,1.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,3.0,3.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,2.0,1.0,0.0,2.0,5.0,0.0,0.0,2.0,0.0,1.0,1.0,2.0,1.0,0.0,0.0,5.0,3.0,1.0,1.0,2.0,0.0
5,1.0,1.0,3.0,0.0,1.0,2.0,1.0,1.0,1.0,4.0,1.0,1.0,2.0,1.0,1.0,3.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,2.0,1.0,3.0,1.0,5.0,2.0,6.0,1.0,1.0,3.0,0.0,0.0,3.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,5.0,0.0,2.0,0.0,3.0,2.0,2017.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,3.0,1.0,1.0,2.0,4.0,0.0,0.0,2.0,4.0,1.0,1.0,4.0,1.0,1.0,4.0,3.0,2.0,1.0,2.0,2.0,0.0
6,1.0,1.0,1.0,0.0,1.0,2.0,4.0,1.0,1.0,4.0,1.0,2.0,2.0,1.0,1.0,3.0,2.0,1.0,0.0,1.0,2.0,0.0,0.0,0.0,1.0,3.0,2.0,5.0,2.0,8.0,1.0,1.0,3.0,0.0,0.0,3.0,1.0,1.0,1.0,0.0,1.0,4.0,1.0,3.0,0.0,2.0,0.0,3.0,2.0,2017.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,3.0,1.0,0.0,2.0,4.0,0.0,0.0,2.0,3.0,1.0,1.0,4.0,1.0,1.0,3.0,3.0,2.0,1.0,2.0,2.0,1.0
7,1.0,1.0,1.0,0.0,1.0,1.0,4.0,1.0,1.0,4.0,1.0,2.0,2.0,1.0,1.0,0.0,2.0,1.0,0.0,1.0,2.0,0.0,0.0,2.0,1.0,3.0,2.0,5.0,2.0,8.0,1.0,1.0,3.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,4.0,1.0,3.0,0.0,2.0,0.0,3.0,0.0,2016.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,3.0,1.0,1.0,2.0,4.0,0.0,0.0,2.0,4.0,1.0,1.0,4.0,1.0,1.0,3.0,1.0,2.0,1.0,2.0,2.0,1.0
8,2.0,1.0,3.0,0.0,1.0,2.0,4.0,1.0,1.0,1.0,1.0,1.0,2.0,0.0,1.0,3.0,0.0,0.0,0.0,1.0,3.0,0.0,0.0,2.0,1.0,3.0,1.0,1.0,2.0,0.0,1.0,1.0,3.0,0.0,0.0,2.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,1.0,2.0,2.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,3.0,1.0,0.0,0.0,5.0,0.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,3.0,3.0,3.0,1.0,2.0,2.0,0.0
9,1.0,1.0,3.0,0.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,2.0,0.0,0.0,3.0,3.0,0.0,0.0,2.0,0.0,3.0,1.0,5.0,0.0,8.0,1.0,1.0,3.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,3.0,1.0,2016.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,3.0,1.0,0.0,0.0,7.0,0.0,0.0,2.0,3.0,1.0,1.0,4.0,1.0,0.0,0.0,3.0,3.0,1.0,1.0,2.0,0.0



Start Loop

['URBANICITY', 34778]
34778


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,54258,54261,3
1,1.0,161353,196128,34775
2,,34778,0,0



['VEH_ALCH', 34785]
34785


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,210579,245364,34785
1,0.0,5025,5025,0
2,,34785,0,0



['NUM_INJV', 34787]
34787


array([3., 3., 3., ..., 3., 3., 3.])

(250389, 78)

[1.0, 3.0, 0.0, 2.0, 13.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,53681,60929,7248
1,3.0,137249,164788,27539
2,0.0,24641,24641,0
3,2.0,19,19,0
4,13.0,12,12,0
5,,34787,0,0



['PERNOTMVIT', 34788]
34788


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,215021,249809,34788
1,1.0,580,580,0
2,,34788,0,0



['M_HARM', 34797]
34797


array([1., 1., 0., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,188949,221269,32320
1,0.0,18876,21353,2477
2,2.0,7767,7767,0
3,,34797,0,0



['PER_TYP', 34802]
34802


array([2., 2., 2., ..., 2., 1., 2.])

(250389, 78)

[2.0, 1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,150977,181910,30933
1,1.0,64609,68478,3869
2,0.0,1,1,0
3,,34802,0,0



['P_CRASH1', 34804]
34804


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[5.0, 1.0, 4.0, 3.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,5.0,18988,18988,0
1,1.0,105915,140533,34618
2,4.0,38394,38580,186
3,3.0,15716,15716,0
4,0.0,14983,14983,0
5,2.0,21589,21589,0
6,,34804,0,0



['ALC_STATUS', 34805]
34805


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,211284,246089,34805
1,0.0,4300,4300,0
2,,34805,0,0



['FIRE_EXP', 34807]
34807


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,215264,250071,34807
1,0.0,318,318,0
2,,34807,0,0



['PSU', 34808]
34808


array([3., 4., 3., ..., 4., 3., 4.])

(250389, 78)

[0.0, 1.0, 4.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,17318,17321,3
1,1.0,38977,38977,0
2,4.0,48295,54413,6118
3,3.0,60390,87556,27166
4,2.0,50601,52122,1521
5,,34808,0,0



['WEATHER', 34811]
34811


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 3.0, 4.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,154481,189292,34811
1,2.0,21083,21083,0
2,3.0,36171,36171,0
3,4.0,2775,2775,0
4,0.0,1068,1068,0
5,,34811,0,0



['SPEC_USE', 34811]
34811


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,213896,248707,34811
1,2.0,1114,1114,0
2,0.0,568,568,0
3,,34811,0,0



['VE_FORMS', 34816]
34816


array([2., 2., 2., ..., 2., 2., 2.])

(250389, 78)

[2.0, 1.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,158642,190880,32238
1,1.0,22905,25483,2578
2,3.0,25273,25273,0
3,4.0,8753,8753,0
4,,34816,0,0



['SEX', 34817]
34817


array([1., 0., 0., ..., 0., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,112032,127297,15265
1,0.0,103540,123092,19552
2,,34817,0,0



['AIR_BAG', 34819]
34819


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,46668,46701,33
1,1.0,168902,203688,34786
2,,34819,0,0



['SEAT_POS', 34820]
34820


array([3., 3., 3., ..., 3., 3., 3.])

(250389, 78)

[3.0, 1.0, 4.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,150917,185737,34820
1,1.0,35722,35722,0
2,4.0,16751,16751,0
3,0.0,1669,1669,0
4,2.0,10510,10510,0
5,,34820,0,0



['VEH_AGE', 34820]
34820


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 4.0, 3.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,64838,99287,34449
1,4.0,14140,14140,0
2,3.0,31261,31261,0
3,1.0,58110,58114,4
4,2.0,47220,47587,367
5,,34820,0,0



['CARGO_BT', 34824]
34824


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,207766,242590,34824
1,1.0,7799,7799,0
2,,34824,0,0



['HOSPITAL', 34825]
34825


array([0., 0., 0., ..., 0., 1., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,182315,216973,34658
1,1.0,33249,33416,167
2,,34825,0,0



['HARM_EV', 34828]
34828


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 3.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,190233,222598,32365
1,3.0,15466,17929,2463
2,0.0,3249,3249,0
3,2.0,6613,6613,0
4,,34828,0,0



['PJ', 34829]
34829


array([3., 3., 4., ..., 3., 4., 4.])

(250389, 78)

[3.0, 1.0, 2.0, 0.0, 4.0, 4154.0, 1051.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,54134,68176,14042
1,1.0,30498,30498,0
2,2.0,52293,64675,12382
3,0.0,31470,33251,1781
4,4.0,46491,53115,6624
5,4154.0,661,661,0
6,1051.0,13,13,0
7,,34829,0,0



['HAZ_PLAC', 34833]
34833


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,215500,250333,34833
1,1.0,56,56,0
2,,34833,0,0



['PVH_INVL', 34835]
34835


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,212341,247171,34830
1,1.0,3213,3218,5
2,,34835,0,0



['MAX_VSEV', 34836]
34836


array([0., 1., 2., ..., 2., 2., 0.])

(250389, 78)

[0.0, 2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,38684,42864,4180
1,2.0,137260,167212,29952
2,1.0,39609,40313,704
3,,34836,0,0



['SPEEDREL', 34837]
34837


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,206296,241133,34837
1,0.0,9256,9256,0
2,,34837,0,0



['MAN_COLL', 34839]
34839


array([3., 3., 3., ..., 3., 3., 3.])

(250389, 78)

[4.0, 2.0, 1.0, 3.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,27008,27008,0
1,2.0,65344,74131,8787
2,1.0,25264,28962,3698
3,3.0,89218,111572,22354
4,0.0,8716,8716,0
5,,34839,0,0



['BUS_USE', 34840]
34840


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,214750,249590,34840
1,2.0,780,780,0
2,0.0,19,19,0
3,,34840,0,0



['BODY_TYP', 34841]
34841


array([5., 5., 1., ..., 1., 5., 1.])

(250389, 78)

[0.0, 5.0, 3.0, 1.0, 2.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,15957,15957,0
1,5.0,48869,55103,6234
2,3.0,37430,37430,0
3,1.0,82475,111082,28607
4,2.0,19235,19235,0
5,4.0,11582,11582,0
6,,34841,0,0



['TOWED', 34844]
34844


array([0., 2., 2., ..., 2., 0., 2.])

(250389, 78)

[0.0, 2.0, 3.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,82058,90913,8855
1,2.0,122888,148877,25989
2,3.0,9527,9527,0
3,1.0,1072,1072,0
4,,34844,0,0



['MAKE', 34846]
34846


array([0., 0., 0., ..., 0., 8., 0.])

(250389, 78)

[0.0, 6.0, 8.0, 2.0, 4.0, 7.0, 3.0, 1.0, 5.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,42192,67864,25672
1,6.0,29726,29726,0
2,8.0,41926,51100,9174
3,2.0,28056,28056,0
4,4.0,25475,25475,0
5,7.0,8730,8730,0
6,3.0,4105,4105,0
7,1.0,32110,32110,0
8,5.0,3223,3223,0
9,,34846,0,0



['ROLLOVER', 34849]
34849


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,209772,244621,34849
1,0.0,5768,5768,0
2,,34849,0,0



['TYP_INT', 34850]
34850


array([2., 1., 1., ..., 1., 1., 2.])

(250389, 78)

[1.0, 0.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,119081,145564,26483
1,0.0,28374,28374,0
2,2.0,66509,74876,8367
3,3.0,1575,1575,0
4,,34850,0,0



['PCRASH5', 34851]
34851


array([3., 3., 3., ..., 1., 1., 3.])

(250389, 78)

[2.0, 3.0, 1.0, 4.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,20600,20600,0
1,3.0,172451,204801,32350
2,1.0,18603,21104,2501
3,4.0,3478,3478,0
4,0.0,406,406,0
5,,34851,0,0



['TOW_VEH', 34855]
34855


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,210595,245447,34852
1,1.0,4939,4942,3
2,,34855,0,0



['HIT_RUN', 34857]
34857


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,214528,249385,34857
1,1.0,1004,1004,0
2,,34857,0,0



['AGE', 34858]
34858


array([2., 2., 2., ..., 2., 2., 2.])

(250389, 78)

[3.0, 2.0, 0.0, 1.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,39729,39729,0
1,2.0,130039,164891,34852
2,0.0,21231,21237,6
3,1.0,16595,16595,0
4,4.0,7937,7937,0
5,,34858,0,0



['EMER_USE', 34862]
34862


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,214943,249805,34862
1,2.0,234,234,0
2,0.0,350,350,0
3,,34862,0,0



['PCRASH4', 34866]
34866


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,206440,241306,34866
1,0.0,9083,9083,0
2,,34866,0,0



['REST_MIS', 34868]
34868


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,209236,244104,34868
1,0.0,6285,6285,0
2,,34868,0,0



['WRK_ZONE', 34869]
34869


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,210279,245148,34869
1,1.0,4821,4821,0
2,2.0,369,369,0
3,3.0,51,51,0
4,,34869,0,0



['HAZ_CNO', 34869]
34869


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,215466,250334,34868
1,2.0,51,52,1
2,0.0,3,3,0
3,,34869,0,0



['IMPACT1', 34870]
34870


array([4., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[5.0, 1.0, 4.0, 0.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,5.0,9086,9086,0
1,1.0,86493,113434,26941
2,4.0,54376,62305,7929
3,0.0,18586,18586,0
4,3.0,31198,31198,0
5,2.0,15780,15780,0
6,,34870,0,0



['DAY_WEEK', 34872]
34872


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,162293,197165,34872
1,0.0,53224,53224,0
2,,34872,0,0



['P_CRASH2', 34873]
34873


array([3., 1., 5., ..., 0., 0., 1.])

(250389, 78)

[2.0, 1.0, 0.0, 5.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,23884,23884,0
1,1.0,36359,44315,7956
2,0.0,41714,48186,6472
3,5.0,34966,37971,3005
4,3.0,49639,67079,17440
5,4.0,28954,28954,0
6,,34873,0,0



['RELJCT1', 34875]
34875


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0, 8.0, 9.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,174988,209863,34875
1,1.0,8974,8974,0
2,8.0,31514,31514,0
3,9.0,38,38,0
4,,34875,0,0



['VSURCOND', 34875]
34875


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 3.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,174157,208970,34813
1,2.0,30985,30985,0
2,3.0,9853,9915,62
3,0.0,519,519,0
4,,34875,0,0



['VTRAFWAY', 34875]
34875


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[2.0, 0.0, 3.0, 4.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,12293,12293,0
1,0.0,93893,127502,33609
2,3.0,49136,49581,445
3,4.0,17038,17859,821
4,1.0,43154,43154,0
5,,34875,0,0



['HOUR', 34878]
34878


array([3., 3., 3., ..., 3., 3., 3.])

(250389, 78)

[3.0, 1.0, 2.0, 4.0, 6.0, 0.0, 5.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,59676,92902,33226
1,1.0,37368,37368,0
2,2.0,53641,54309,668
3,4.0,26068,26068,0
4,6.0,11793,12456,663
5,0.0,7326,7326,0
6,5.0,19639,19960,321
7,,34878,0,0



['REST_USE', 34886]
34886


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,12851,12920,69
1,1.0,190052,224869,34817
2,2.0,12600,12600,0
3,,34886,0,0



['YEAR', 34893]
34893


array([2016., 2016., 2016., ..., 2016., 2016., 2017.])

(250389, 78)

[2016.0, 2017.0, 2018.0, 2019.0, 2020.0, 2021.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2016.0,44676,65957,21281
1,2017.0,44776,57464,12688
2,2018.0,37189,37189,0
3,2019.0,28253,28350,97
4,2020.0,29259,29917,658
5,2021.0,31343,31512,169
6,,34893,0,0



['VTRAFCON', 34893]
34893


array([2., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 3.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,141902,170922,29020
1,2.0,46956,52829,5873
2,3.0,21073,21073,0
3,0.0,5565,5565,0
4,,34893,0,0



['HAZ_INV', 34898]
34898


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,215436,250334,34898
1,1.0,55,55,0
2,,34898,0,0



['DR_PRES', 34899]
34899


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,215462,250361,34899
1,0.0,28,28,0
2,,34899,0,0



['VALIGN', 34901]
34901


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,190490,225391,34901
1,0.0,18453,18453,0
2,2.0,6545,6545,0
3,,34901,0,0



['VTCONT_F', 34901]
34901


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 3.0, 0.0, 4.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,141977,169340,27363
1,3.0,73171,80709,7538
2,0.0,211,211,0
3,4.0,120,120,0
4,2.0,9,9,0
5,,34901,0,0



['INJ_SEV', 34901]
34901


array([3., 3., 3., ..., 3., 3., 3.])

(250389, 78)

[0.0, 3.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,31046,33587,2541
1,3.0,151155,183515,32360
2,1.0,33287,33287,0
3,,34901,0,0



['MONTH', 34902]
34902


array([2., 1., 2., ..., 2., 2., 2.])

(250389, 78)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,66710,68687,1977
1,1.0,73192,79007,5815
2,2.0,75585,102695,27110
3,,34902,0,0



['LGT_COND', 34903]
34903


array([3., 3., 3., ..., 3., 3., 3.])

(250389, 78)

[3.0, 2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,158908,193811,34903
1,2.0,5064,5064,0
2,0.0,17988,17988,0
3,1.0,33526,33526,0
4,,34903,0,0



['RELJCT2', 34903]
34903


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[0.0, 1.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,57320,64554,7234
1,1.0,89728,113003,23275
2,2.0,23643,23643,0
3,3.0,44795,49189,4394
4,,34903,0,0



['NUM_INJ', 34903]
34903


array([1., 0., 0., ..., 1., 0., 0.])

(250389, 78)

[1.0, 2.0, 0.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,60273,71553,11280
1,2.0,28445,28445,0
2,0.0,102800,126423,23623
3,3.0,12569,12569,0
4,4.0,11399,11399,0
5,,34903,0,0



['MAX_SEV', 34908]
34908


array([0., 0., 1., ..., 1., 1., 1.])

(250389, 78)

[3.0, 1.0, 2.0, 0.0, 4.0, 5.0, 6.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,21901,21902,1
1,1.0,54191,64779,10588
2,2.0,32375,32639,264
3,0.0,102729,126784,24055
4,4.0,3384,3384,0
5,5.0,895,895,0
6,6.0,6,6,0
7,,34908,0,0



['VSPD_LIM', 34911]
34911


array([2., 2., 2., ..., 2., 2., 2.])

(250389, 78)

[0.0, 1.0, 5.0, 4.0, 7.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,24840,24840,0
1,1.0,33213,36806,3593
2,5.0,38341,38341,0
3,4.0,23011,23011,0
4,7.0,35704,36893,1189
5,2.0,54480,84609,30129
6,3.0,5889,5889,0
7,,34911,0,0



['INT_HWY', 34914]
34914


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,189241,224155,34914
1,1.0,26234,26234,0
2,,34914,0,0



['ALC_RES', 34914]
34914


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,214319,249233,34914
1,1.0,1156,1156,0
2,,34914,0,0



['PERMVIT', 34916]
34916


array([2., 2., 2., ..., 2., 2., 2.])

(250389, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,67093,67093,0
1,2.0,134632,167678,33046
2,0.0,13748,15618,1870
3,,34916,0,0



['MAK_MOD', 34922]
34922


array([1., 1., 3., ..., 1., 3., 2.])

(250389, 78)

[0.0, 4.0, 2.0, 3.0, 1.0, 39403.0, 89883.0, 42498.0, 58498.0, 32498.0, 32057.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,36999,38051,1052
1,4.0,40893,46370,5477
2,2.0,44819,50603,5784
3,3.0,47227,55001,7774
4,1.0,45518,60353,14835
5,39403.0,1,1,0
6,89883.0,2,2,0
7,42498.0,2,2,0
8,58498.0,4,4,0
9,32498.0,1,1,0



['EJECTION', 34922]
34922


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,5093,5093,0
1,1.0,210374,245296,34922
2,,34922,0,0



['J_KNIFE', 34927]
34927


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,210613,245540,34927
1,2.0,4757,4757,0
2,0.0,92,92,0
3,,34927,0,0



['MODEL', 34928]
34928


array([1., 1., 4., ..., 4., 4., 3.])

(250389, 78)

[0.0, 4.0, 2.0, 1.0, 3.0, 63.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,37545,38681,1136
1,4.0,44236,51355,7119
2,2.0,44072,45444,1372
3,1.0,46613,65328,18715
4,3.0,42994,49580,6586
5,63.0,1,1,0
6,,34928,0,0



['VPROFILE', 34934]
34934


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,175906,210098,34192
1,2.0,23178,23920,742
2,0.0,16371,16371,0
3,,34934,0,0



['REL_ROAD', 34936]
34936


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,195989,228470,32481
1,0.0,17881,20336,2455
2,2.0,1583,1583,0
3,,34936,0,0



['ACC_TYPE', 34939]
34939


array([3., 0., 2., ..., 1., 1., 3.])

(250389, 78)

[4.0, 1.0, 0.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,39629,41160,1531
1,1.0,32688,41334,8646
2,0.0,40379,44229,3850
3,3.0,51996,69525,17529
4,2.0,50758,54141,3383
5,,34939,0,0



['NUMOCCS', 34939]
34939


array([3., 3., 1., ..., 3., 3., 3.])

(250389, 78)

[3.0, 5.0, 1.0, 6.0, 2.0, 0.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,111049,139513,28464
1,5.0,24424,24424,0
2,1.0,56820,63295,6475
3,6.0,19919,19919,0
4,2.0,2567,2567,0
5,0.0,544,544,0
6,4.0,127,127,0
7,,34939,0,0



['REGION', 34940]
34940


array([3., 3., 3., ..., 3., 3., 3.])

(250389, 78)

[4.0, 3.0, 2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,8349,8349,0
1,3.0,164044,198984,34940
2,2.0,34984,34984,0
3,1.0,8072,8072,0
4,,34940,0,0



['HAZ_REL', 34952]
34952


array([1., 1., 1., ..., 1., 1., 1.])

(250389, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,215387,250338,34951
1,2.0,40,41,1
2,0.0,10,10,0
3,,34952,0,0



['VE_TOTAL', 34956]
34956


array([2., 1., 2., ..., 2., 2., 2.])

(250389, 78)

[2.0, 1.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,159861,192402,32541
1,1.0,20576,22991,2415
2,3.0,26010,26010,0
3,4.0,8986,8986,0
4,,34956,0,0



['ALCOHOL', 34963]
34963


array([2., 2., 2., ..., 2., 2., 2.])

(250389, 78)

[2.0, 9.0, 1.0, 8.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,197317,232259,34942
1,9.0,9791,9791,0
2,1.0,8313,8334,21
3,8.0,5,5,0
4,,34963,0,0



['SCH_BUS', 34980]
34980


array([0., 0., 0., ..., 0., 0., 0.])

(250389, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,214401,249381,34980
1,1.0,1008,1008,0
2,,34980,0,0
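
Each block of loop output above reports, for one feature, the number of erased cells, the predicted values, and a table of value counts before and after imputation. In that output the differences for the observed values add up to the number of erased cells (for URBANICITY, 3 + 34775 = 34778). The code that builds these tables is not shown in this section, so the following is only a minimal sketch of how such a table could be produced. The helper name `count_table` and the toy series are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def count_table(before, after):
    """Per-value counts before and after imputation (hypothetical helper)."""
    rows = []
    for value in before.dropna().unique():
        n_before = int((before == value).sum())
        n_after = int((after == value).sum())
        rows.append({
            "Value": value,
            "Original": n_before,
            "Imputed": n_after,
            "Difference": n_after - n_before,
        })
    # Final row: cells that were NaN before imputation.
    # The output above reports a Difference of 0 on this row, so the sketch does too.
    rows.append({
        "Value": np.nan,
        "Original": int(before.isna().sum()),
        "Imputed": int(after.isna().sum()),
        "Difference": 0,
    })
    return pd.DataFrame(rows)


# Toy column: two cells erased, then filled in by some imputation method
before = pd.Series([1.0, np.nan, 2.0, 1.0, np.nan])
after = pd.Series([1.0, 1.0, 2.0, 1.0, 2.0])

table = count_table(before, after)
print(table)
```

A quick consistency check on such a table is that the differences over the observed values sum to the NaN count in the `Original` column.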




data_RF.shape
(250389, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,1,1,1,0,2,2,1,1,0,2,0,0,1,0,3,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,5,2,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,3,0,0,1,0,0,0,2,1,0,3,1,0
1,1,0,3,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,3,1,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,3,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,1,0,1,1,0,3,2,1,1,3,0,2
3,3,0,2,2,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,1,1,0,1,1,1,1,0,0,1,0,1,1,0,0,0,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,1,1,1,3,2,0,0,1,1,1,0,1,1,1,1,0,3
4,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,2,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,1,1,1,3,2,1,0,1,1,0,3,1,1,1,4,1,3


## Do IVEware Imputation (Outside this Jupyter Notebook)
- In the IVEware folder, run IVE_12_22_22.bat at the command line.
- Requires scrlib and R.  You may need to edit the batch file so that it points to your scrlib installation.
- Run time was about 18 minutes: `./IVEware_CRSS_Imputation.bat  1069.08s user 12.92s system 98% cpu 18:23.92 total`
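
Because the IVEware step runs outside the notebook, it can help to confirm that it produced a usable file before reading the results back in. The sketch below is one way to do that. The helper name `check_imputed_file`, its checks, and the assumption that the first CSV column is a saved index are illustrative and not part of the notebook.

```python
import os
import tempfile

import pandas as pd


def check_imputed_file(path, expected_shape):
    """Load an imputed CSV and verify that it is complete (hypothetical helper)."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} not found; run the IVEware batch file first")
    # Assumption: the first column of the file is a saved row index
    df = pd.read_csv(path, index_col=0)
    if df.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {df.shape}")
    n_missing = int(df.isna().sum().sum())
    if n_missing:
        raise ValueError(f"{n_missing} values are still missing after imputation")
    return df


# Toy demonstration: a temporary file stands in for data_IVEware.csv
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_imputed.csv")
    pd.DataFrame({"HOUR": [3, 1, 3], "SEX": [1, 0, 1]}).to_csv(toy_path)
    toy = check_imputed_file(toy_path, expected_shape=(3, 2))
    print(toy.shape)
```

For the real data, the expected shape would be that of the other frames, which the notebook reports as (250389, 78).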

In [10]:
data_IVEware = pd.read_csv('../../Big_Files/data_IVEware.csv')
data_IVEware.drop(columns='Unnamed: 0', inplace=True)

# Show the shape and first ten rows of each dataset for a side-by-side check
for name, df in [
    ('data_Ground_Truth', data_Ground_Truth),
    ('data_NaN', data_NaN),
    ('data_RF', data_RF),
    ('data_Mode', data_Mode),
    ('data_IVEware', data_IVEware),
]:
    print(name, df.shape)
    display(df.head(10))


data_Ground_Truth (250389, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,1,1,1,0,2,2,1,1,0,2,0,0,1,0,2,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,5,2,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,3,0,0,1,0,0,0,2,1,0,3,1,0
1,1,0,3,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,3,0,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,0,3,2,1,1,3,0,3
3,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,1,0,1,1,1,1,0,3
4,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,0,3,1,1,1,4,1,3
5,3,0,3,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,1
6,3,0,3,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,2
7,3,0,3,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,0,0,0,1,1,0,0,1,1,0,0,1,2
8,2,0,3,0,1,1,1,0,2,2,3,1,0,2,0,0,0,1,3,3,1,1,2016,3,9,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,5,1,1,1,2,2,1,0,1,1,0,3,2,1,1,3,0,0
9,2,0,3,0,0,0,1,0,2,1,3,1,0,1,0,0,0,1,3,1,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,0,1,0,3,8,2,4,3,3,1,0,0,1,1,1,0,0,0,1,1,2,7,1,1,1,0,2,1,0,1,1,0,3,2,1,1,3,1,2


data_NaN (250389, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3.0,,3.0,0.0,1.0,,,0.0,2.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,0.0,,4.0,1.0,1.0,2016.0,4.0,2.0,3.0,0.0,4.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,5.0,,1.0,0.0,0.0,0.0,0.0,1.0,3.0,5.0,2.0,,2.0,1.0,1.0,1.0,0.0,0.0,,1.0,1.0,0.0,,1.0,1.0,2.0,3.0,0.0,0.0,1.0,0.0,,0.0,,,0.0,3.0,1.0,0.0
1,1.0,0.0,3.0,0.0,2.0,1.0,0.0,,2.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,2.0,1.0,3.0,2.0,1.0,0.0,2016.0,,2.0,1.0,0.0,1.0,5.0,,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,1.0,,1.0,4.0,6.0,2.0,4.0,3.0,3.0,,1.0,1.0,3.0,1.0,1.0,1.0,0.0,0.0,,,1.0,1.0,1.0,1.0,1.0,0.0,3.0,,0.0,1.0,1.0,0.0,3.0,2.0,1.0,,3.0,,4.0
2,3.0,0.0,2.0,0.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,,1.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,1.0,,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,8.0,0.0,2.0,1.0,,1.0,0.0,0.0,,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,2.0,1.0,1.0,3.0,2.0,,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,0.0,
3,,0.0,2.0,,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,,1.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,,,0.0,1.0,1.0,,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,,0.0,,1.0,5.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,,1.0,1.0,3.0,2.0,0.0,0.0,1.0,1.0,1.0,,1.0,1.0,1.0,1.0,0.0,3.0
4,3.0,0.0,2.0,0.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,,1.0,1.0,,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,,0.0,0.0,8.0,,2.0,1.0,5.0,,0.0,0.0,1.0,,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,,1.0,1.0,3.0,2.0,,0.0,1.0,1.0,0.0,3.0,1.0,1.0,1.0,4.0,1.0,3.0
5,3.0,,3.0,,2.0,1.0,1.0,,1.0,2.0,1.0,,0.0,2.0,0.0,0.0,1.0,4.0,2.0,3.0,1.0,1.0,,2.0,2.0,2.0,0.0,4.0,5.0,1.0,,,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,4.0,6.0,,4.0,3.0,3.0,,5.0,1.0,3.0,1.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,1.0,1.0,0.0,3.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,2.0,3.0,1.0,1.0
6,,,3.0,0.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,,4.0,2.0,3.0,1.0,1.0,,2.0,2.0,2.0,0.0,3.0,5.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,4.0,,1.0,,8.0,0.0,4.0,1.0,,4.0,3.0,,3.0,1.0,2.0,,0.0,,1.0,1.0,1.0,4.0,2.0,1.0,1.0,0.0,3.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,2.0,3.0,1.0,2.0
7,3.0,,,0.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,4.0,2.0,,,1.0,2016.0,2.0,2.0,2.0,0.0,3.0,5.0,2.0,1.0,,1.0,,1.0,0.0,,1.0,0.0,4.0,1.0,1.0,4.0,8.0,,4.0,1.0,1.0,4.0,,1.0,,1.0,2.0,1.0,,2.0,1.0,,1.0,4.0,2.0,1.0,1.0,,0.0,,0.0,1.0,,0.0,0.0,1.0,1.0,0.0,0.0,1.0,2.0
8,2.0,0.0,3.0,,1.0,1.0,1.0,0.0,2.0,2.0,,1.0,0.0,,,,,1.0,,3.0,,1.0,2016.0,3.0,,,0.0,3.0,,1.0,0.0,,1.0,1.0,1.0,0.0,0.0,1.0,0.0,,1.0,1.0,2.0,0.0,,1.0,3.0,3.0,4.0,3.0,1.0,3.0,1.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,2.0,2.0,1.0,,1.0,1.0,,3.0,2.0,1.0,1.0,,0.0,0.0
9,,0.0,,0.0,,0.0,1.0,0.0,,1.0,,1.0,0.0,1.0,0.0,0.0,0.0,1.0,3.0,,3.0,,2016.0,,,0.0,0.0,0.0,5.0,1.0,,1.0,1.0,,1.0,0.0,0.0,1.0,,,1.0,,3.0,8.0,2.0,4.0,3.0,3.0,1.0,0.0,0.0,,1.0,1.0,0.0,0.0,0.0,,1.0,,7.0,1.0,1.0,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,3.0,,1.0,,3.0,1.0,2.0


data_RF (250389, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,1,1,1,0,2,2,1,1,0,2,0,0,1,0,3,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,5,2,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,3,0,0,1,0,0,0,2,1,0,3,1,0
1,1,0,3,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,3,1,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,3,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,1,0,1,1,0,3,2,1,1,3,0,2
3,3,0,2,2,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,1,1,0,1,1,1,1,0,0,1,0,1,1,0,0,0,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,1,1,1,3,2,0,0,1,1,1,0,1,1,1,1,0,3
4,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,2,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,1,1,1,3,2,1,0,1,1,0,3,1,1,1,4,1,3
5,3,0,3,2,2,1,1,0,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2017,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,1
6,3,0,3,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2017,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,3,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,2
7,3,0,3,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,2,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,0,1,0,1,1,0,0,1,1,0,0,1,2
8,2,0,3,2,1,1,1,0,2,2,1,1,0,2,0,0,0,1,3,3,1,1,2016,3,2,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,5,1,1,1,2,2,1,0,1,1,0,3,2,1,1,3,0,0
9,3,0,3,0,0,0,1,0,1,1,1,1,0,1,0,0,0,1,3,1,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,1,1,0,3,8,2,4,3,3,1,0,0,1,1,1,0,0,0,1,1,1,7,1,1,1,0,2,1,0,1,1,0,3,2,1,1,3,1,2


data_Mode (250389, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,1,1,1,0,2,2,1,1,0,2,0,0,1,0,3,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,5,2,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,3,0,0,1,0,0,0,2,1,0,3,1,0
1,1,0,3,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,3,1,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,3,1,0,0,3,1,1,0,0,0,1,1,1,5,2,1,1,3,2,1,0,1,1,0,3,2,1,1,3,0,0
3,3,0,2,2,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,1,1,0,1,1,1,1,0,0,1,0,1,1,0,0,0,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,1,1,1,3,2,0,0,1,1,1,3,1,1,1,1,0,3
4,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,0,1,1,3,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,2,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,1,1,1,3,2,1,0,1,1,0,3,1,1,1,4,1,3
5,3,0,3,2,2,1,1,0,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2017,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,1
6,3,0,3,0,2,1,1,1,1,2,1,1,0,2,0,0,0,4,2,3,1,1,2017,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,3,8,0,4,1,3,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,2
7,3,0,3,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,2,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,0,1,0,1,1,0,0,1,1,0,0,1,2
8,2,0,3,2,1,1,1,0,2,2,1,1,0,2,0,0,0,1,3,3,1,1,2016,3,2,0,0,3,1,1,0,1,1,1,1,0,0,1,0,1,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,5,1,1,1,2,2,1,0,1,1,0,3,2,1,1,3,0,0
9,3,0,3,0,2,0,1,0,1,1,1,1,0,1,0,0,0,1,3,3,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,1,1,1,3,8,2,4,3,3,1,0,0,3,1,1,0,0,0,1,1,1,7,1,1,1,0,2,1,0,1,1,0,3,2,1,1,3,1,2


data_IVEware (250389, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,1,1,1,0,2,2,1,1,0,2,0,0,1,0,0,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,5,2,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,3,0,0,1,0,1,0,2,0,0,3,1,0
1,1,0,3,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,3,1,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,1,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,1,0,1,1,0,3,2,1,1,3,0,1
3,5,0,2,1,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,1,1,0,1,1,1,1,0,0,1,0,1,1,0,0,6,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,1,0,1,1,1,1,0,3
4,3,0,2,0,2,0,1,0,2,1,2,1,0,1,0,0,2,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,0,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,0,3,1,1,1,4,1,3
5,3,0,3,0,2,1,1,0,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2020,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,1,4,3,3,3,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,1
6,1,0,3,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2017,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,3,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,2
7,3,0,3,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,2,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,1,0,1,0,1,1,0,0,1,1,0,0,1,2
8,2,0,3,0,1,1,1,0,2,2,1,1,0,2,0,0,0,1,2,3,2,1,2016,3,2,3,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,1,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,5,1,1,1,2,2,1,0,1,1,0,3,2,1,1,3,0,0
9,4,0,3,0,0,0,1,0,2,1,1,1,0,1,0,0,0,1,3,1,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,0,1,0,3,8,2,4,3,3,1,0,0,1,1,1,0,0,0,1,1,1,7,1,1,1,0,2,1,0,1,1,0,3,2,1,1,3,1,2


## Compare Three Imputation Methods

In [11]:
def Compare_Imputation_Methods_Part_2(
    data_Ground_Truth, data_NaN, data_RF, data_Mode, data_IVEware
):
    print ('Compare_Imputation_Methods_Part_2')
    print ('data_Ground_Truth.shape: ', data_Ground_Truth.shape)
    print ('data_NaN.shape: ', data_NaN.shape)
    print ('data_RF.shape: ', data_RF.shape)
    print ('data_Mode.shape: ', data_Mode.shape)
    print ('data_IVEware.shape: ', data_IVEware.shape)
    print ()
    
    A = []
    for feature in data_NaN:
        # N: number of erased (NaN) cells in this feature
        N = data_NaN[feature].isna().sum()
#        print (feature, N)
#        print ()
        # d, e, f: rows where RF, Mode, and IVEware differ from the ground truth
        D = data_Ground_Truth[feature] != data_RF[feature]
        d = D.sum()
        E = data_Ground_Truth[feature] != data_Mode[feature]
        e = E.sum()
        F = data_Ground_Truth[feature] != data_IVEware[feature]
        f = F.sum()
        # g, h, i: rows where two imputation methods disagree with each other
        G = data_RF[feature] != data_Mode[feature]
        g = G.sum()
        H = data_RF[feature] != data_IVEware[feature]
        h = H.sum()
        I = data_Mode[feature] != data_IVEware[feature]
        i = I.sum()
        print (feature, N, d, e, f, g, h, i)
        print (
            feature, 
            data_Ground_Truth.dtypes[feature],
            data_NaN.dtypes[feature],
            data_RF.dtypes[feature],
            data_Mode.dtypes[feature],
            data_IVEware.dtypes[feature],
        )
        A.append([
            feature, N, 
            d, int(d/N*100), 
            e, int(e/N*100), 
            f, int(f/N*100),
            g, int(g/N*100),
            h, int(h/N*100),
            i, int(i/N*100),
        ])
#        print (D[:10])
        print ()
    print ()
    
    A = sorted(A, key=lambda x:x[3])
    B = pd.DataFrame(
        A, 
        columns=[
            'Feature', 'nNaN', 
            'nRF Incorrect', 'pRF Incorrect', 
            'nMode Incorrect', 'pMode Incorrect', 
            'nIVEware Incorrect', 'pIVEware Incorrect',
            'RF and Mode Different', 'RF v/s Mode %',
            'RF and IVEware Different', 'RF v/s IVEware %',
            'Mode and IVEware Different', 'Mode v/s IVEware %'
        ]
    )
    display(B)
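    # Pooled totals over all features: a = erased cells; b, c, d = incorrect
    # imputations by RF, Mode, and IVEware; e, f, g = the same as percentages of a.
    a = sum([x[1] for x in A])
    b = sum([x[2] for x in A])
    c = sum([x[4] for x in A])
    d = sum([x[6] for x in A])
    e = round(b/a*100,2)
    f = round(c/a*100,2)
    g = round(d/a*100,2)
    # NOTE: s, t, u are the feature count minus a summed column of A (x[8] and
    # x[10] are disagreement counts, x[9] is a percentage), so they are not the
    # per-feature counts their printed labels suggest. The RF_/Mode_ comparisons
    # below give the per-feature counts.
    s = len(A) - sum([x[8] for x in A])
    t = len(A) - sum([x[9] for x in A])
    u = len(A) - sum([x[10] for x in A])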

    RF_less_Mode = sum([x[2] < x[4] for x in A])
    RF_equal_Mode = sum([x[2] == x[4] for x in A])
    RF_greater_Mode = sum([x[2] > x[4] for x in A])

    RF_less_IVEware = sum([x[2] < x[6] for x in A])
    RF_equal_IVEware = sum([x[2] == x[6] for x in A])
    RF_greater_IVEware = sum([x[2] > x[6] for x in A])

    Mode_less_IVEware = sum([x[4] < x[6] for x in A])
    Mode_equal_IVEware = sum([x[4] == x[6] for x in A])
    Mode_greater_IVEware = sum([x[4] > x[6] for x in A])

    print ()
    print ('Error RF = ', e)
    print ('Error Mode = ', f)
    print ('Error IVEware = ', g)
    print ('nRF > nMode: ', s)
    print ('nRF > nIVEware: ', t)
    print ('nModel > nIVEware: ', u)
    print ('Compare RF to Mode: ', RF_less_Mode, RF_equal_Mode, RF_greater_Mode)
    print ('Compare RF to IVEware: ', RF_less_IVEware, RF_equal_IVEware, RF_greater_IVEware)
    print ('Compare Mode to IVEware: ', Mode_less_IVEware, Mode_equal_IVEware, Mode_greater_IVEware)
    print ()
    print ('Number of NaN in data_NaN: ', data_NaN.isna().sum().sum())
    print ('RF Different from Mode: ', sum([x[8] for x in A]))
    print ('RF Different from IVEware: ', sum([x[10] for x in A]))
    print ('Mode Different from IVEware: ', sum([x[12] for x in A]))
        
#    display(Audio(sound_file, autoplay=True))
    
    
        

In [12]:
Compare_Imputation_Methods_Part_2(
    data_Ground_Truth, data_NaN, data_RF, data_Mode, data_IVEware
)

Compare_Imputation_Methods_Part_2
data_Ground_Truth.shape:  (250389, 78)
data_NaN.shape:  (250389, 78)
data_RF.shape:  (250389, 78)
data_Mode.shape:  (250389, 78)
data_IVEware.shape:  (250389, 78)

HOUR 34878 24749 25189 25437 1652 25098 25259
HOUR int64 float64 int64 int64 int64

INT_HWY 34914 4178 4178 4208 0 4160 4160
INT_HWY int64 float64 int64 int64 int64

LGT_COND 34903 9277 9277 9565 0 9310 9310
LGT_COND int64 float64 int64 int64 int64

MONTH 34902 21242 22637 23251 7792 22733 22598
MONTH int64 float64 int64 int64 int64

PERMVIT 34916 11760 13108 8412 1870 11717 12108
PERMVIT int64 float64 int64 int64 int64

REL_ROAD 34936 786 3192 493 2455 808 3178
REL_ROAD int64 float64 int64 int64 int64

RELJCT2 34903 10326 20207 8831 11628 11920 20961
RELJCT2 int64 float64 int64 int64 int64

SCH_BUS 34980 178 178 339 0 161 161
SCH_BUS int64 float64 int64 int64 int64

URBANICITY 34778 8593 8596 9204 3 8747 8750
URBANICITY int64 float64 int64 int64 int64

VE_TOTAL 34956 6601 8884 2113 2415 685

Unnamed: 0,Feature,nNaN,nRF Incorrect,pRF Incorrect,nMode Incorrect,pMode Incorrect,nIVEware Incorrect,pIVEware Incorrect,RF and Mode Different,RF v/s Mode %,RF and IVEware Different,RF v/s IVEware %,Mode and IVEware Different,Mode v/s IVEware %
0,SCH_BUS,34980,178,0,178,0,339,0,0,0,161,0,161,0
1,PERNOTMVIT,34788,94,0,94,0,208,0,0,0,114,0,114,0
2,BUS_USE,34840,132,0,132,0,53,0,0,0,127,0,127,0
3,DR_PRES,34899,4,0,4,0,9,0,0,0,5,0,5,0
4,EMER_USE,34862,100,0,100,0,109,0,0,0,100,0,100,0
5,FIRE_EXP,34807,63,0,63,0,114,0,0,0,51,0,51,0
6,HAZ_CNO,34869,8,0,9,0,34862,99,1,0,34868,99,34869,100
7,HAZ_INV,34898,8,0,8,0,18,0,0,0,10,0,10,0
8,HAZ_PLAC,34833,7,0,7,0,17,0,0,0,10,0,10,0
9,HAZ_REL,34952,13,0,13,0,34945,99,1,0,34951,99,34952,100



Error RF =  22.76
Error Mode =  28.82
Error IVEware =  28.41
nRF > nMode:  -307466
nRF > nIVEware:  -785
nModel > nIVEware:  -811406
Compare RF to Mode:  48 30 0
Compare RF to IVEware:  52 0 26
Compare Mode to IVEware:  43 0 35

Number of NaN in data_NaN:  2719688
RF Different from Mode:  307544
RF Different from IVEware:  811484
Mode Different from IVEware:  960765
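
The per-feature counts above compare whole columns, which gives the right answer only if each imputation method leaves the observed (non-erased) cells unchanged. Below is a minimal vectorized sketch that makes the restriction to erased cells explicit and guards against a feature with no erased cells. It assumes all frames share the same index and columns. The function name `mismatch_rates` and the toy frames are hypothetical and not part of this notebook.

```python
import numpy as np
import pandas as pd

def mismatch_rates(ground_truth, with_nan, imputed):
    """Per-feature count and percent of erased cells that were imputed incorrectly."""
    erased = with_nan.isna()                     # cells that were blanked out
    wrong = (ground_truth != imputed) & erased   # count errors only where erased
    n_erased = erased.sum()
    n_wrong = wrong.sum()
    # A feature with no erased cells gets NaN instead of a division by zero
    pct = (n_wrong / n_erased.where(n_erased > 0) * 100).round(2)
    return pd.DataFrame(
        {'nNaN': n_erased, 'nIncorrect': n_wrong, 'pIncorrect': pct}
    )

# Toy data, for illustration only (not CRSS values)
truth = pd.DataFrame({'A': [1, 2, 1, 1], 'B': [0, 0, 1, 1]})
nan = truth.astype(float)
nan.loc[[0, 1], 'A'] = np.nan
nan.loc[[2], 'B'] = np.nan
# Imputation by mode: fill each column with its most frequent observed value
mode = nan.fillna(nan.mode().iloc[0]).astype(int)

rates = mismatch_rates(truth, nan, mode)
print(rates)
```

Dividing `rates['nIncorrect'].sum()` by `rates['nNaN'].sum()` gives a pooled error rate, computed the same way as the `Error RF`, `Error Mode`, and `Error IVEware` figures printed above.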


# Impute using Random Forest and Save for Next Step

In [None]:
def Impute_Using_Random_Forest():
    data = Get_Data()
    
#    data_Imputed = Impute_Full(data)
    data_Imputed = Impute_Round_Robin(data)
    data_Imputed.to_csv('../../Big_Files/CRSS_Imputed.csv', index=False)
#    display(data_Imputed.head(50))
    
    Check(data, data_Imputed)
#    display(Audio(sound_file, autoplay=True))
    return 0

Impute_Using_Random_Forest()