# Methods

- We have the discretized CRSS dataset in '../../Big_Files/CRSS_Discretized_All_12_22_22.csv'
- MissForest is a round-robin imputation method implemented in R, generally considered one of the best imputation methods.  It has several Python implementations.
- I tried to use MissForest, https://pypi.org/project/MissForest/, to impute missing values, but it gave me errors, and finding the source of the errors led me down the path to write my own round-robin implementation.
- I compare here three methods:
    - Round-Robin Random Forest (my own implementation of Round Robin, using scikit-learn's random forest)
    - Imputation by mode
    - IVEware, using the hyperparameters in the CRSS Imputation report
- To compare, I followed the example for MissForest.
    - I dropped all samples with a missing value, so I would have ground truth.
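    - I erased roughly 15% of the values in each feature (the code samples row indices with replacement, so the realized fraction is slightly lower).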
    - I used each imputation method to impute the missing values, and, for each feature, counted how many did not match the ground truth.
- My round-robin method
    - In data_NaN, change all of the 'Unknown' to np.nan.
    - In each feature, count the number of unknown samples.
    - In another copy, data_Mode, impute by mode in all of the features.
    - Starting with the feature with the fewest (nonzero) missing samples:
        - Copy that feature from data_NaN into data_Mode, so that only that feature has missing values.
        - Separate the dataframe into two, one with known values in the target variable (X) and one with unknown values (Z).
        - From the dataframe with known values (X), separate out the target variable (call it 'y')
        - Using Random Forest, build a model that maps X to y.  
        - Use the model to impute the missing values
    - At each iteration we replace the mode-imputed values with RF-imputed values.
- IVEware is available on several platforms, but Python is not one of them, so I run it in R outside this notebook.  Be aware that the random selection of values to erase is different for each run, so the IVEware imputation must be run anew each time.
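
The round-robin steps listed above can be sketched on a small synthetic table. This is a minimal illustration under stated assumptions, not the notebook's implementation: the toy columns `a`, `b`, `c`, the helper name `round_robin_impute`, and the forest settings are all hypothetical, and the sketch writes predictions back in place with `.loc` rather than concatenating the known and unknown halves.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def round_robin_impute(data_nan, random_state=0):
    """Mode-impute everything, then re-impute one feature at a time with a forest."""
    # Features with missing values, fewest missing first.
    n_missing = data_nan.isna().sum()
    order = n_missing[n_missing > 0].sort_values().index

    # Starting point: every missing value replaced by its column mode.
    imputed = data_nan.fillna(data_nan.mode().iloc[0])

    for feature in order:
        missing = data_nan[feature].isna()
        predictors = imputed.drop(columns=feature)
        # Forest settings are illustrative, not the notebook's.
        clf = RandomForestClassifier(n_estimators=20, random_state=random_state)
        # Train on the rows where the target is known ...
        clf.fit(predictors[~missing], data_nan.loc[~missing, feature])
        # ... and overwrite the mode-imputed values where it was missing.
        imputed.loc[missing, feature] = clf.predict(predictors[missing])
    return imputed


# Toy data (hypothetical): b copies a, c is independent noise.
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=200)
truth = pd.DataFrame({"a": a, "b": a, "c": rng.integers(0, 2, size=200)}).astype(float)

data_nan = truth.copy()
data_nan.loc[rng.choice(200, size=20, replace=False), "a"] = np.nan
data_nan.loc[rng.choice(200, size=30, replace=False), "b"] = np.nan

result = round_robin_impute(data_nan)
print(result.isna().sum().sum())  # no missing values remain
```

Writing back with a boolean mask keeps the original row and column order, so no re-sorting is needed afterwards.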

# Results of Comparison of Three Imputation Methods

- We ran the imputation on 78 features with 224,850 samples.  
    - The features are those in the CRSS dataset that have data for all of 2016 - 2020, are not the results of imputation by CRSS, may have a pattern (unlike effectively random identifiers such as VINs), and do not have more than 20% of the samples missing.
    - The features were discretized (binned) down to 2-10 categories before imputation.
    - The samples are those of the 619,027 that have no missing values in any of the 78 features.
- First Run
    - Percentage of Samples Incorrectly Imputed

| | Percentage of Samples Incorrectly Imputed |
| --- | --- |
| Random Forest | 22.25% |
| Mode Imputation | 28.51% |
| IVEware | 24.23% |

    - Comparison of number of errors in the 78 features:

|  | Fewer | Equal | More | Total |
| --- | --- | --- | --- | --- |
| Compare RF to Mode | 45 | 33 | 0 | 78 |
| Compare RF to IVEware | 50 | 0 | 28 | 78 |
| Compare Mode to IVEware | 39 | 0 | 39 | 78 |


- Second Run
    - Percentage of Samples Incorrectly Imputed

| | Percentage of Samples Incorrectly Imputed |
| --- | --- |
| Random Forest | 22.17% |
| Mode Imputation | 28.42% |
| IVEware | 23.84% |


    - Comparison of number of errors in the 78 features:

|  | Fewer | Equal | More |
| --- | --- | --- | --- |
| Compare RF to Mode | 46 | 31 | 1 |
| Compare RF to IVEware | 49 | 0 | 29 |
| Compare Mode to IVEware |  36 | 1 | 41 |

    - Number of NaN Imputed Differently by Different Methods

|  |  |
| --- | --- |
|Total Number of NaN|  2,443,202|
|RF Different from Mode|  273,351|
|RF Different from IVEware|  606,751|
|Mode Different from IVEware|  738,833|

- Third run with 79 features (I had neglected to include AGE)


    - Percentage of Samples Incorrectly Imputed

| | Percentage of Samples Incorrectly Imputed |
| --- | --- |
| Random Forest | 22.52% |
| Mode Imputation | 28.63% |
| IVEware | 22.73% |



    - Comparison of number of errors in the 79 features:

|  | Fewer | Equal | More |
| --- | --- | --- | --- |
| Compare RF to Mode | 47 | 31 | 1 |
| Compare RF to IVEware | 47 | 0 | 32 |
| Compare Mode to IVEware |  38 | 0 | 41 |

    - Number of NaN Imputed Differently by Different Methods

|  |  |
| --- | --- |
|Total Number of NaN|  2,417,148|
|RF Different from Mode|  279,104|
|RF Different from IVEware|  580,863|
|Mode Different from IVEware|  713,171|



## Discussion

- Random Forest is as good as or better than Mode for (nearly) every feature.
- Random Forest is as good as or better than IVEware on more than half of the features, though not overwhelmingly, and is slightly better in the overall share of erased values correctly imputed.
- IVEware and Mode each win on a similar number of features, but IVEware imputes a clearly larger share of the erased values correctly (about 23 to 24% incorrect versus about 28.5% for Mode).
- Random Forest and Mode mostly make the same mistakes: their imputed values differ from each other far less often than either differs from IVEware.
- IVEware makes different mistakes from Random Forest and Mode.
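
The last two points can be checked from the "Number of NaN Imputed Differently" tables. The short calculation below uses only the counts reported for the second and third runs and gives, for each pair of methods, the percentage of erased values on which they disagree. The dictionary layout and variable names are just for illustration.

```python
# Counts copied from the "Number of NaN Imputed Differently" tables above.
runs = {
    "second": {"total": 2_443_202, "RF vs Mode": 273_351,
               "RF vs IVEware": 606_751, "Mode vs IVEware": 738_833},
    "third": {"total": 2_417_148, "RF vs Mode": 279_104,
              "RF vs IVEware": 580_863, "Mode vs IVEware": 713_171},
}

shares = {}
for run, counts in runs.items():
    total = counts["total"]
    # Percentage of erased values where the two methods imputed different values.
    shares[run] = {
        pair: round(100 * n / total, 1)
        for pair, n in counts.items() if pair != "total"
    }
    print(run, shares[run])
```

Random Forest and Mode disagree on roughly 11 to 12% of the erased values, while Random Forest and IVEware disagree on about 24 to 25% and Mode and IVEware on about 30%.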

## Conclusion

- Use Random Forest

In [1]:
%%latex
\tableofcontents


# Setup
## Import Libraries

In [2]:
import sys, copy, math, time, os

print ('Python version: {}'.format(sys.version))

import numpy as np
print ('NumPy version: {}'.format(np.__version__))
np.set_printoptions(suppress=True)


import pandas as pd
print ('Pandas version:  {}'.format(pd.__version__))
pd.set_option('display.max_rows', 500)

import sklearn
print ('SciKit-Learn version: {}'.format(sklearn.__version__))
from sklearn.model_selection import train_test_split

import sklearn.neighbors._base
sys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor

# Set Randomness.  Copied from https://www.kaggle.com/code/abazdyrev/keras-nn-focal-loss-experiments
import random
#np.random.seed(42) # NumPy
#random.seed(42) # Python
#tf.set_random_seed(42) # Tensorflow

from IPython.display import Audio
sound_file = './beep.wav'

import warnings
warnings.filterwarnings('ignore')

print ('Finished Importing Libraries')


Python version: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:13) [Clang 14.0.6 ]
NumPy version: 1.24.0
Pandas version:  1.5.2
SciKit-Learn version: 1.2.0
Finished Importing Libraries
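
The seeding lines in the import cell are commented out, so each run erases a different set of values and the IVEware step has to be repeated. One possible way to avoid that, sketched here as an assumption rather than something this notebook does, is to draw the erasure mask from a seeded NumPy generator. The helper name `make_mask`, the seed value, and the toy shape are hypothetical.

```python
import numpy as np


def make_mask(n_rows, n_cols, frac, seed):
    """Boolean mask marking a fraction of the rows in each column, reproducibly."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    n_erase = int(n_rows * frac)
    for col in range(n_cols):
        # replace=False guarantees exactly n_erase distinct rows per column.
        rows = rng.choice(n_rows, size=n_erase, replace=False)
        mask[rows, col] = True
    return mask


mask_1 = make_mask(n_rows=1000, n_cols=5, frac=0.15, seed=42)
mask_2 = make_mask(n_rows=1000, n_cols=5, frac=0.15, seed=42)
print(np.array_equal(mask_1, mask_2))  # same seed, same mask
print(mask_1.sum(axis=0))              # 150 erased rows in every column
```

A mask like this could be applied with `DataFrame.mask(mask)` so that the file exported for IVEware stays valid across reruns.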


# Import Data

## Get Data
- Get_Data() reads the discretized CRSS file from a folder outside this GitHub repo (the files are too large for my subscription), drops the columns that CRSS has already imputed (those with '_IM' in the name), and returns the dataframe.

In [3]:
def Get_Data():
    print ('Get_Data')
    data = pd.read_csv('../../Big_Files/CRSS_Discretized_All_12_22_22.csv', low_memory=False)
    print ('data.shape = ', data.shape)
    print ('Drop Imputed Columns')
    # Columns with '_IM' in the name hold values already imputed by CRSS.
    imputed_columns = [feature for feature in data.columns if '_IM' in feature]
    data.drop(columns=imputed_columns, inplace=True)
    
    print ('data.shape = ', data.shape)
    print ()
    
    return data

In [4]:
#data = Get_Data()


In [5]:
def Impute_Round_Robin(data):
    print ('Impute()')
    pd.set_option('display.max_columns', None)
    
    # Replace 'Unknown' with np.NaN
    data.replace({'Unknown': np.nan}, inplace=True)
    display(data.head(20))
    print ()
    
#    data.sort_values(by = ['CASENUM', 'VEH_NO', 'PER_NO'], ascending = [True, True, True])
    
    # Make a list of features with missing samples, 
    #     ordered by the number of missing samples, 
    #     from least to most.  
    Missing = []
    Complete = []
    for feature in data:
        s = data[feature].isna().sum()
        if s==0:
            Complete.append([feature, s])
        if s>0:
            Missing.append([feature, s])
    Missing = sorted (Missing, key=lambda x:x[1], reverse=False)
    print ()
    print ('Complete[]')
    display(Complete)
    print ()
    print ('Missing[]')
    display(Missing)
    print ()
    
    print ('Make data_Mode')
    print ()
    data_Mode = pd.DataFrame()
    for X in Complete:
        feature = X[0]
        data_Mode[feature] = data[feature]
    for M in Missing:
        feature = M[0]
        m = data[feature].mode()[0]
        print (feature, M[1], m)
        data_Mode[feature] = data[feature].fillna(m)
    print ('data_Mode')
    display(data_Mode.head(20))
#    data.sort_values(
#        by = ['CASENUM', 'VEH_NO', 'PER_NO'], 
#        ascending = [True, True, True], 
#        inplace=True
#    )
#    print ()
#    print ('data.PER_NO.equals(data__Mode.PER_NO)')
#    print (data.PER_NO.equals(data_Mode.PER_NO))
#    print ()
#    
    print ()
    print ('Make starting point for data_Imputed')
    data_Imputed = pd.DataFrame()
    for X in Complete:
        feature = X[0]
        data_Imputed[feature] = data[feature]
    for X in Missing:
        feature = X[0]
        data_Imputed[feature] = data_Mode[feature]
    print ('data_Imputed')
    display(data_Imputed.head(20))
    print ()
#    data_Imputed.sort_values(
#        by = ['CASENUM', 'VEH_NO', 'PER_NO'], 
#        ascending = [True, True, True], 
#        inplace=True
#    )
#    print ()
#    print ('data.PER_NO.equals(data_Imputed.PER_NO)')
#    print (data.PER_NO.equals(data_Imputed.PER_NO))
#    print ()
    
    print ('Start Loop')
    print ()
    n = 0
    for M in Missing:
        n += 1
        print (M)
        feature = M[0]
        data_Imputed[feature] = data[feature]
#        print ()
#        print ('data[feature].isna().sum()')
#        print (data[feature].isna().sum())
#        print ('data_Imputed[feature].isna().sum()')
#        print (data_Imputed[feature].isna().sum())
#        print ()
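        # W: rows where the target feature is known (kept whole for re-assembly).
        W = data_Imputed.dropna(subset=[feature])
        # X, y: predictors and target used to train the model.
        y = W[feature]
        X = W.drop(columns=feature)
        # Z: rows where the target feature is missing; these get predicted.
        Z = data_Imputed[data_Imputed[feature].isna()].drop(columns=feature)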
#        Z.reset_index(drop=True, inplace=True)
#        print (data.shape)
#        print (X.shape)
#        display(X.head(40))
#        display(y.head(40))
#        print (Z.shape)
#        display(Z)
        clf = RandomForestClassifier(max_depth=2, random_state=0)
        clf.fit(X,y)
#        print ('clf.predict(Z)')
        z = clf.predict(Z)
        print (len(z))
        display(z)
        Z[feature] = z
#        display(Z)
        data_Imputed = pd.concat([Z, W])
#        display(data_Imputed.head(60))
        print (data_Imputed.shape)
        print ()
#        data_Imputed.sort_values(
#            by = ['CASENUM', 'VEH_NO', 'PER_NO'], 
#            ascending = [True, True, True], 
#            inplace=True
#        )
#        print ()
#        print ('data.PER_NO.equals(data_Imputed.PER_NO)')
#        print (data.PER_NO.equals(data_Imputed.PER_NO))
#        print ()
               
        Check_Feature(data, data_Imputed, feature)
#        if n==10:
#            return data_Imputed
    
    
    
    
    print ()
    return data_Imputed

In [6]:
def Impute_Full(data):
    print ('Impute()')
    data.replace({'Unknown': np.nan}, inplace=True)
    for feature in data:
        print (feature, len(pd.unique(data[feature])))
    print ()
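    # NOTE: MissForest is never imported in this notebook, so this call raises
    # NameError as written.  The comparison functions below do not call Impute_Full().
    mf = MissForest()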
    data = mf.fit_transform(data)
    return data

In [7]:
def Check(data, data_Imputed):
    Features = data.columns
    print (Features)
    for feature in Features:
        U = pd.unique(data[feature]).tolist()
        print (U)
        A = []
        for u in U:
            a = len(data[data[feature]==u])
            b = len(data_Imputed[data_Imputed[feature]==u])
            A.append([u, a, b])
        display(A)
        print ()


In [8]:
def Check_Feature(data, data_Imputed, feature):
    U = pd.unique(data[feature]).tolist()
    U = [x for x in U if x == x]  # drop NaN (NaN is the only value not equal to itself)
    print (U)
    A = []
    for u in U:
        a = len(data[data[feature]==u])
        b = len(data_Imputed[data_Imputed[feature]==u])
        A.append([u, a, b, b-a])
    a = data[feature].isna().sum()
    b = data_Imputed[feature].isna().sum()
    A.append(['NaN', a, b, 0])
    A = pd.DataFrame(A, columns=['Value', 'Original', 'Imputed', 'Difference'])
    display(A)
    print ()
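
The before/after table that Check_Feature() builds with a loop can also be produced with `value_counts`. This is an alternative sketch, not the notebook's code; the helper name `value_count_table` and the toy series are hypothetical.

```python
import numpy as np
import pandas as pd


def value_count_table(original, imputed):
    """Side-by-side counts of each value (NaN included) before and after imputation."""
    table = pd.DataFrame({
        "Original": original.value_counts(dropna=False),
        "Imputed": imputed.value_counts(dropna=False),
    }).fillna(0).astype("int64")
    table["Difference"] = table["Imputed"] - table["Original"]
    return table


# Hypothetical feature with two missing values, both imputed as 0.0.
original = pd.Series([0.0, 0.0, 1.0, np.nan, np.nan])
imputed = pd.Series([0.0, 0.0, 1.0, 0.0, 0.0])
print(value_count_table(original, imputed))
```

Unlike Check_Feature(), which hard-codes 0 in the Difference column of the NaN row, this version reports the actual change in the NaN count.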


# Test_Accuracy

In [9]:
def Compare_Imputation_Methods_Part_1():
    print ()
    print ('Compare_Imputation_Methods_Part_1()')
    data = Get_Data()
    data.drop(columns=['CASENUM', 'VEH_NO', 'PER_NO'], inplace=True)
    print (data.shape)

    # Drop all samples with missing data, so we have ground truth
    data.replace({'Unknown':np.nan}, inplace=True)
    data.dropna(inplace=True)
    data.reset_index(inplace=True, drop=True)
    for feature in data:
        data[feature] = pd.to_numeric(data[feature])
    data = data.astype('int64')

    data_Ground_Truth = data.copy(deep=True)
    for feature in data_Ground_Truth:
        data_Ground_Truth[feature] = pd.to_numeric(data_Ground_Truth[feature])
    data_Ground_Truth = data_Ground_Truth.astype('int64')
    print ('data_Ground_Truth.shape')
    print (data_Ground_Truth.shape)
    display(data_Ground_Truth.head())

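    # Randomly pick about 15% of the values in each feature (column)
    # and set them to be missing.  np.random.choice samples with replacement
    # by default, so duplicate indices make the realized fraction slightly
    # lower than 15% (about 13.9% in the output below).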
    print ('Remove 15% of values from each row')
    frac = .15
    N = data.shape[0] * frac # Number of NaN in each feature
    for c in data.columns:
        idx = np.random.choice(a=data.index, size=int(len(data) * frac))
        data.loc[idx, c] = np.nan
    data_NaN = data.copy(deep=True)
    print ('data_NaN.shape')
    print (data_NaN.shape)
    display(data_NaN.head())

    data_IVEware = data.fillna('')
    data_IVEware.to_csv('../../Big_Files/data_IVEware.txt', sep='\t', index=False)
    
    data_Mode = pd.DataFrame()
    for feature in data:
        data_Mode[feature] = data[feature].fillna(data[feature].mode()[0])
    data_Mode = data_Mode.astype('int64')
    print ('data_Mode.shape')
    print (data_Mode.shape)
    display(data_Mode.head())
    
    data_RF = Impute_Round_Robin(data)
    data_RF.sort_index(inplace=True)
    data_RF = data_RF[data.columns]  
    data_RF = data_RF.astype('int64')
    
    print ('data_RF.shape')
    print (data_RF.shape)
    display(data_RF.head())
#    print ()

    return data_Ground_Truth, data_NaN, data_RF, data_Mode

def Compare_Imputation_Methods_Part_2(
    data_Ground_Truth, data_NaN, data_RF, data_Mode, data_IVEware
):
    print ('Compare_Imputation_Methods_Part_2')
    A = []
    for feature in data_NaN:
        N = data_NaN[feature].isna().sum()
#        print (feature, N)
#        print ()
        D = data_Ground_Truth[feature] != data_RF[feature]
        d = D.sum()
        E = data_Ground_Truth[feature] != data_Mode[feature]
        e = E.sum()
        F = data_Ground_Truth[feature] != data_IVEware[feature]
        f = F.sum()
        G = data_RF[feature] != data_Mode[feature]
        g = G.sum()
        H = data_RF[feature] != data_IVEware[feature]
        h = H.sum()
        I = data_Mode[feature] != data_IVEware[feature]
        i = I.sum()
        print (feature, N, d, e, f, g, h, i)
        print (
            feature, 
            data_Ground_Truth.dtypes[feature],
            data_NaN.dtypes[feature],
            data_RF.dtypes[feature],
            data_Mode.dtypes[feature],
            data_IVEware.dtypes[feature],
        )
        A.append([
            feature, N, 
            d, int(d/N*100), 
            e, int(e/N*100), 
            f, int(f/N*100),
            g, int(g/N*100),
            h, int(h/N*100),
            i, int(i/N*100),
        ])
#        print (D[:10])
        print ()
    print ()
    
    A = sorted(A, key=lambda x:x[3])
    B = pd.DataFrame(
        A, 
        columns=[
            'Feature', 'nNaN', 
            'nRF Incorrect', 'pRF Incorrect', 
            'nMode Incorrect', 'pMode Incorrect', 
            'nIVEware Incorrect', 'pIVEware Incorrect',
            'RF and Mode Different', 'RF v/s Mode %',
            'RF and IVEware Different', 'RF v/s IVEware %',
            'Mode and IVEware Different', 'Mode v/s IVEware %'
        ]
    )
    display(B)
    a = sum([x[1] for x in A])
    b = sum([x[2] for x in A])
    c = sum([x[4] for x in A])
    d = sum([x[6] for x in A])
    e = round(b/a*100,2)
    f = round(c/a*100,2)
    g = round(d/a*100,2)
    # Number of features where the first method makes more errors than the second
    s = sum([x[2] > x[4] for x in A])
    t = sum([x[2] > x[6] for x in A])
    u = sum([x[4] > x[6] for x in A])

    RF_less_Mode = sum([x[2] < x[4] for x in A])
    RF_equal_Mode = sum([x[2] == x[4] for x in A])
    RF_greater_Mode = sum([x[2] > x[4] for x in A])

    RF_less_IVEware = sum([x[2] < x[6] for x in A])
    RF_equal_IVEware = sum([x[2] == x[6] for x in A])
    RF_greater_IVEware = sum([x[2] > x[6] for x in A])

    Mode_less_IVEware = sum([x[4] < x[6] for x in A])
    Mode_equal_IVEware = sum([x[4] == x[6] for x in A])
    Mode_greater_IVEware = sum([x[4] > x[6] for x in A])

    print ()
    print ('Error RF = ', e)
    print ('Error Mode = ', f)
    print ('Error IVEware = ', g)
    print ('nRF > nMode: ', s)
    print ('nRF > nIVEware: ', t)
    print ('nMode > nIVEware: ', u)
    print ('Compare RF to Mode: ', RF_less_Mode, RF_equal_Mode, RF_greater_Mode)
    print ('Compare RF to IVEware: ', RF_less_IVEware, RF_equal_IVEware, RF_greater_IVEware)
    print ('Compare Mode to IVEware: ', Mode_less_IVEware, Mode_equal_IVEware, Mode_greater_IVEware)
    print ()
    print ('Number of NaN in data_NaN: ', data_NaN.isna().sum().sum())
    print ('RF Different from Mode: ', sum([x[8] for x in A]))
    print ('RF Different from IVEware: ', sum([x[10] for x in A]))
    print ('Mode Different from IVEware: ', sum([x[12] for x in A]))
        
    display(Audio(sound_file, autoplay=True))
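
The evaluation protocol in the two functions above can be tried end to end on a small synthetic table. The sketch below is illustrative only: the three features and their category probabilities are made up, and only mode imputation is scored. It erases values column by column with the same default sampling with replacement, then counts mismatches against the ground truth.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical ground truth: 3 categorical features, 10,000 complete rows.
truth = pd.DataFrame({
    "f1": rng.choice([0, 1], size=10_000, p=[0.9, 0.1]),
    "f2": rng.choice([0, 1, 2], size=10_000, p=[0.5, 0.3, 0.2]),
    "f3": rng.choice([0, 1, 2, 3], size=10_000),
})

# Erase values column by column, sampling row labels with replacement
# (the default behaviour of choice), so some rows are drawn more than once.
frac = 0.15
data_nan = truth.astype(float)
for col in data_nan.columns:
    idx = rng.choice(data_nan.index, size=int(len(data_nan) * frac))
    data_nan.loc[idx, col] = np.nan

# Mode imputation, then per-feature error counts against the ground truth.
data_mode = data_nan.fillna(data_nan.mode().iloc[0]).astype("int64")
n_erased = data_nan.isna().sum()
n_wrong = (truth != data_mode).sum()
summary = pd.DataFrame({
    "nNaN": n_erased,
    "nMode Incorrect": n_wrong,
    "pMode Incorrect": (100 * n_wrong / n_erased).round(1),
})
print(summary)
print("fraction erased per feature:", (n_erased / len(truth)).round(3).to_dict())
```

Because the untouched cells are identical to the ground truth, comparing whole columns counts only the errors made at erased cells, which is why Compare_Imputation_Methods_Part_2() can divide the mismatch count by the number of NaN.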
    
    
        

In [10]:
data_Ground_Truth, data_NaN, data_RF, data_Mode = Compare_Imputation_Methods_Part_1()


Compare_Imputation_Methods_Part_1()
Get_Data
data.shape =  (619027, 107)
Drop Imputed Columns
data.shape =  (619027, 82)

(619027, 79)
data_Ground_Truth.shape
(218121, 79)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,...,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,1,0,3,0,0,2,1,0,0,2,...,1,1,0,3,2,1,1,3,1,4
1,3,0,2,0,0,2,0,1,0,2,...,1,1,0,3,2,1,1,3,0,3
2,3,0,2,0,0,2,0,1,0,2,...,1,1,1,0,1,1,1,1,0,3
3,3,0,2,0,0,2,0,1,0,2,...,1,1,0,3,1,1,1,4,1,3
4,3,0,3,0,0,2,1,1,1,1,...,1,1,0,3,2,1,2,3,1,1


Remove 15% of values from each row
data_NaN.shape
(218121, 79)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,...,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,1.0,,3.0,0.0,0.0,2.0,1.0,,,2.0,...,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,1.0,4.0
1,3.0,0.0,,0.0,0.0,,0.0,1.0,0.0,2.0,...,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,,3.0
2,3.0,0.0,2.0,0.0,0.0,2.0,0.0,,0.0,,...,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,3.0
3,3.0,0.0,2.0,0.0,0.0,2.0,0.0,1.0,0.0,2.0,...,1.0,1.0,0.0,3.0,1.0,1.0,1.0,4.0,1.0,3.0
4,,0.0,3.0,0.0,0.0,2.0,,1.0,1.0,1.0,...,1.0,1.0,0.0,3.0,2.0,,2.0,3.0,1.0,1.0


data_Mode.shape
(218121, 79)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,...,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,1,0,3,0,0,2,1,1,0,2,...,1,1,0,3,2,1,1,3,1,4
1,3,0,3,0,0,2,0,1,0,2,...,1,1,0,3,2,1,1,3,1,3
2,3,0,2,0,0,2,0,1,0,1,...,1,1,1,0,1,1,1,1,0,3
3,3,0,2,0,0,2,0,1,0,2,...,1,1,0,3,1,1,1,4,1,3
4,3,0,3,0,0,2,1,1,1,1,...,1,1,0,3,2,1,2,3,1,1


Impute()


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,1.0,,3.0,0.0,0.0,2.0,1.0,,,2.0,,,1.0,0.0,2.0,0.0,0.0,,1.0,3.0,2.0,,0.0,2016.0,3.0,2.0,,0.0,1.0,5.0,1.0,0.0,1.0,1.0,1.0,,0.0,0.0,1.0,0.0,1.0,1.0,1.0,4.0,6.0,2.0,4.0,,3.0,1.0,1.0,,3.0,,1.0,1.0,0.0,0.0,1.0,1.0,,1.0,1.0,1.0,1.0,0.0,3.0,0.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,1.0,4.0
1,3.0,0.0,,0.0,0.0,,0.0,1.0,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,1.0,,1.0,1.0,3.0,1.0,2016.0,3.0,,2.0,0.0,0.0,3.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,1.0,1.0,0.0,0.0,8.0,0.0,2.0,1.0,5.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,,2.0,1.0,1.0,3.0,2.0,0.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,,3.0
2,3.0,0.0,2.0,0.0,0.0,2.0,0.0,,0.0,,1.0,2.0,1.0,0.0,1.0,0.0,,,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,,0.0,3.0,1.0,,1.0,1.0,1.0,1.0,0.0,,1.0,0.0,1.0,1.0,0.0,0.0,8.0,0.0,2.0,1.0,5.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,,1.0,1.0,3.0,2.0,0.0,,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,3.0
3,3.0,0.0,2.0,0.0,0.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,,1.0,3.0,1.0,2016.0,,2.0,,0.0,0.0,,1.0,0.0,1.0,1.0,1.0,1.0,0.0,,1.0,0.0,1.0,,0.0,0.0,8.0,0.0,2.0,1.0,,,0.0,0.0,1.0,,1.0,,0.0,,1.0,1.0,,5.0,2.0,1.0,1.0,3.0,2.0,0.0,0.0,1.0,1.0,0.0,3.0,1.0,1.0,1.0,4.0,1.0,3.0
4,,0.0,3.0,0.0,0.0,2.0,,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,,0.0,1.0,4.0,2.0,3.0,1.0,1.0,2016.0,2.0,2.0,2.0,0.0,4.0,5.0,1.0,0.0,1.0,1.0,1.0,,0.0,0.0,,0.0,1.0,1.0,,,6.0,2.0,4.0,3.0,3.0,,5.0,1.0,3.0,1.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,1.0,1.0,0.0,3.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,,2.0,3.0,1.0,1.0
5,3.0,0.0,3.0,0.0,0.0,2.0,1.0,1.0,1.0,1.0,,1.0,1.0,0.0,2.0,0.0,,1.0,4.0,,3.0,1.0,1.0,2016.0,2.0,2.0,2.0,0.0,3.0,5.0,,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,4.0,1.0,1.0,,8.0,0.0,4.0,1.0,1.0,4.0,3.0,1.0,,1.0,2.0,1.0,0.0,2.0,1.0,,1.0,4.0,2.0,1.0,1.0,0.0,3.0,1.0,0.0,1.0,1.0,,3.0,2.0,1.0,2.0,3.0,1.0,
6,3.0,0.0,3.0,0.0,,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,,4.0,,3.0,,1.0,2016.0,2.0,2.0,2.0,0.0,3.0,,,1.0,1.0,1.0,,1.0,0.0,0.0,1.0,0.0,4.0,,1.0,4.0,8.0,0.0,4.0,1.0,1.0,4.0,3.0,1.0,3.0,1.0,2.0,1.0,0.0,2.0,1.0,,1.0,4.0,2.0,,1.0,0.0,,0.0,0.0,1.0,1.0,,,1.0,1.0,0.0,0.0,1.0,2.0
7,2.0,0.0,3.0,0.0,0.0,1.0,1.0,1.0,0.0,2.0,,3.0,1.0,0.0,2.0,0.0,0.0,0.0,1.0,3.0,3.0,1.0,1.0,2016.0,3.0,9.0,,0.0,3.0,1.0,1.0,,1.0,,1.0,1.0,,0.0,1.0,0.0,4.0,1.0,1.0,2.0,0.0,2.0,1.0,3.0,3.0,4.0,3.0,,,1.0,1.0,1.0,,2.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,2.0,2.0,1.0,0.0,1.0,,,3.0,2.0,,1.0,3.0,0.0,0.0
8,2.0,0.0,3.0,0.0,0.0,,0.0,,0.0,2.0,,,1.0,0.0,1.0,0.0,0.0,,1.0,3.0,1.0,,1.0,2016.0,3.0,2.0,0.0,0.0,0.0,5.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,,3.0,8.0,,4.0,3.0,3.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,7.0,1.0,1.0,1.0,0.0,2.0,1.0,0.0,,1.0,,3.0,2.0,1.0,1.0,3.0,,
9,3.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,,2.0,2.0,1.0,0.0,2.0,0.0,0.0,2.0,1.0,3.0,3.0,1.0,,2016.0,3.0,2.0,,0.0,,5.0,1.0,1.0,1.0,1.0,,1.0,0.0,0.0,,0.0,1.0,1.0,1.0,4.0,0.0,1.0,4.0,1.0,3.0,1.0,4.0,1.0,,,1.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,1.0,2.0,1.0,,3.0,1.0,2.0




Complete[]


[]


Missing[]


[['HAZ_PLAC', 30282],
 ['NUM_INJV', 30288],
 ['REL_ROAD', 30300],
 ['SPEC_USE', 30303],
 ['HAZ_INV', 30309],
 ['PERMVIT', 30310],
 ['REST_MIS', 30326],
 ['ACC_TYPE', 30328],
 ['IMPACT1', 30333],
 ['EJECTION', 30334],
 ['PERNOTMVIT', 30335],
 ['TOWED', 30335],
 ['RELJCT2', 30338],
 ['VE_FORMS', 30340],
 ['MODEL', 30340],
 ['ROLLOVER', 30342],
 ['HOUR', 30347],
 ['HIT_RUN', 30351],
 ['NUMOCCS', 30353],
 ['REGION', 30354],
 ['VEH_ALCH', 30356],
 ['HOSPITAL', 30356],
 ['PCRASH5', 30363],
 ['PSU', 30364],
 ['MAKE', 30364],
 ['PJ', 30365],
 ['VTRAFWAY', 30367],
 ['REST_USE', 30367],
 ['SEX', 30367],
 ['SEAT_POS', 30375],
 ['MAN_COLL', 30377],
 ['AGE', 30377],
 ['PVH_INVL', 30379],
 ['YEAR', 30380],
 ['VSURCOND', 30380],
 ['VE_TOTAL', 30383],
 ['P_CRASH1', 30384],
 ['WEATHER', 30385],
 ['BUS_USE', 30388],
 ['ALCOHOL', 30390],
 ['ALC_RES', 30390],
 ['J_KNIFE', 30391],
 ['ALC_STATUS', 30393],
 ['TYP_INT', 30395],
 ['VTCONT_F', 30395],
 ['INT_HWY', 30398],
 ['VALIGN', 30399],
 ['EMER_USE', 30402


Make data_Mode

HAZ_PLAC 30282 0.0
NUM_INJV 30288 3.0
REL_ROAD 30300 1.0
SPEC_USE 30303 1.0
HAZ_INV 30309 0.0
PERMVIT 30310 2.0
REST_MIS 30326 1.0
ACC_TYPE 30328 3.0
IMPACT1 30333 1.0
EJECTION 30334 1.0
PERNOTMVIT 30335 0.0
TOWED 30335 2.0
RELJCT2 30338 1.0
VE_FORMS 30340 2.0
MODEL 30340 1.0
ROLLOVER 30342 1.0
HOUR 30347 3.0
HIT_RUN 30351 0.0
NUMOCCS 30353 3.0
REGION 30354 3.0
VEH_ALCH 30356 1.0
HOSPITAL 30356 0.0
PCRASH5 30363 3.0
PSU 30364 3.0
MAKE 30364 0.0
PJ 30365 3.0
VTRAFWAY 30367 0.0
REST_USE 30367 1.0
SEX 30367 1.0
SEAT_POS 30375 3.0
MAN_COLL 30377 3.0
AGE 30377 2.0
PVH_INVL 30379 0.0
YEAR 30380 2017.0
VSURCOND 30380 1.0
VE_TOTAL 30383 2.0
P_CRASH1 30384 1.0
WEATHER 30385 1.0
BUS_USE 30388 1.0
ALCOHOL 30390 2.0
ALC_RES 30390 0.0
J_KNIFE 30391 1.0
ALC_STATUS 30393 1.0
TYP_INT 30395 1.0
VTCONT_F 30395 1.0
INT_HWY 30398 0.0
VALIGN 30399 1.0
EMER_USE 30402 1.0
FIRE_EXP 30402 1.0
MAK_MOD 30402 3.0
PER_TYP 30402 2.0
DAY_WEEK 30403 1.0
VSPD_LIM 30403 2.0
PEDS 30404 0.0
MAX_SEV 30404

Unnamed: 0,HAZ_PLAC,NUM_INJV,REL_ROAD,SPEC_USE,HAZ_INV,PERMVIT,REST_MIS,ACC_TYPE,IMPACT1,EJECTION,PERNOTMVIT,TOWED,RELJCT2,VE_FORMS,MODEL,ROLLOVER,HOUR,HIT_RUN,NUMOCCS,REGION,VEH_ALCH,HOSPITAL,PCRASH5,PSU,MAKE,PJ,VTRAFWAY,REST_USE,SEX,SEAT_POS,MAN_COLL,AGE,PVH_INVL,YEAR,VSURCOND,VE_TOTAL,P_CRASH1,WEATHER,BUS_USE,ALCOHOL,ALC_RES,J_KNIFE,ALC_STATUS,TYP_INT,VTCONT_F,INT_HWY,VALIGN,EMER_USE,FIRE_EXP,MAK_MOD,PER_TYP,DAY_WEEK,VSPD_LIM,PEDS,MAX_SEV,VEH_AGE,P_CRASH2,AIR_BAG,MAX_VSEV,PCRASH4,HARM_EV,INJ_SEV,WRK_ZONE,SPEEDREL,URBANICITY,RELJCT1,TOW_VEH,BODY_TYP,VPROFILE,DR_PRES,HAZ_CNO,VTRAFCON,M_HARM,LGT_COND,MONTH,SCH_BUS,HAZ_REL,NUM_INJ,CARGO_BT
0,0.0,3.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,2.0,4.0,1.0,1.0,0.0,3.0,3.0,1.0,0.0,3.0,1.0,6.0,3.0,0.0,1.0,1.0,3.0,2.0,3.0,0.0,2016.0,1.0,2.0,1.0,1.0,1.0,2.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,4.0,2.0,1.0,1.0,0.0,0.0,4.0,1.0,0.0,2.0,1.0,1.0,3.0,0.0,1.0,2.0,0.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,0.0,1.0,0.0,0.0
1,0.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,2.0,1.0,3.0,0.0,5.0,3.0,1.0,0.0,1.0,3.0,8.0,1.0,3.0,1.0,1.0,3.0,1.0,2.0,0.0,2016.0,2.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,2.0,1.0,2.0,0.0,2.0,3.0,0.0,0.0,0.0,0.0,3.0,3.0,0.0,0.0,2.0,0.0,0.0,3.0,1.0,1.0,1.0,1.0,0.0,3.0,0.0,0.0,1.0,1.0,0.0
2,0.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,2.0,1.0,3.0,0.0,5.0,3.0,1.0,1.0,1.0,1.0,8.0,1.0,3.0,1.0,0.0,1.0,1.0,2.0,0.0,2016.0,1.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,5.0,0.0,2.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0,3.0,1.0,1.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,0.0,0.0
3,0.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,2.0,1.0,1.0,2.0,1.0,3.0,0.0,3.0,3.0,1.0,0.0,1.0,1.0,8.0,3.0,3.0,1.0,1.0,4.0,1.0,2.0,0.0,2016.0,2.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,5.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,3.0,0.0,1.0,2.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,1.0,0.0
4,0.0,3.0,1.0,1.0,0.0,2.0,1.0,4.0,1.0,1.0,0.0,2.0,1.0,2.0,4.0,1.0,3.0,0.0,3.0,2.0,1.0,0.0,3.0,4.0,6.0,2.0,0.0,2.0,1.0,3.0,3.0,3.0,0.0,2016.0,2.0,2.0,1.0,1.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,1.0,4.0,0.0,2.0,1.0,5.0,1.0,2.0,1.0,1.0,3.0,0.0,1.0,1.0,0.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,1.0,1.0,1.0,0.0
5,0.0,1.0,1.0,2.0,0.0,2.0,1.0,3.0,4.0,1.0,0.0,2.0,1.0,2.0,4.0,1.0,3.0,0.0,1.0,2.0,1.0,0.0,3.0,4.0,8.0,3.0,0.0,2.0,1.0,3.0,3.0,3.0,0.0,2016.0,2.0,2.0,4.0,1.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,1.0,4.0,0.0,2.0,0.0,3.0,1.0,0.0,1.0,1.0,3.0,0.0,1.0,1.0,0.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,1.0,1.0,1.0,1.0
6,0.0,1.0,1.0,2.0,0.0,2.0,1.0,3.0,4.0,1.0,0.0,2.0,1.0,2.0,4.0,1.0,3.0,0.0,1.0,2.0,1.0,0.0,3.0,4.0,8.0,3.0,0.0,0.0,1.0,0.0,3.0,2.0,0.0,2016.0,2.0,2.0,4.0,1.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,1.0,1.0,4.0,0.0,2.0,2.0,3.0,0.0,0.0,1.0,1.0,3.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,1.0,1.0,0.0,1.0
7,0.0,3.0,1.0,1.0,0.0,1.0,1.0,3.0,4.0,1.0,0.0,2.0,1.0,2.0,1.0,1.0,2.0,0.0,3.0,3.0,1.0,0.0,3.0,1.0,0.0,3.0,2.0,1.0,0.0,3.0,3.0,2.0,0.0,2016.0,1.0,2.0,4.0,3.0,1.0,9.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,2.0,2.0,1.0,5.0,0.0,0.0,0.0,3.0,1.0,2.0,1.0,1.0,3.0,0.0,1.0,2.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,0.0,1.0,0.0,0.0
8,0.0,3.0,0.0,1.0,0.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,4.0,1.0,2.0,0.0,3.0,3.0,1.0,0.0,1.0,1.0,8.0,3.0,0.0,1.0,1.0,3.0,1.0,2.0,0.0,2016.0,1.0,2.0,1.0,1.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,1.0,7.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,1.0,3.0,0.0,0.0,2.0,0.0,0.0,5.0,2.0,1.0,1.0,1.0,1.0,3.0,0.0,0.0,1.0,0.0,0.0
9,0.0,1.0,1.0,1.0,0.0,1.0,1.0,3.0,1.0,1.0,0.0,2.0,1.0,2.0,4.0,1.0,3.0,0.0,3.0,3.0,1.0,0.0,3.0,1.0,0.0,3.0,0.0,1.0,1.0,3.0,3.0,2.0,0.0,2016.0,2.0,2.0,1.0,2.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,2.0,1.0,4.0,0.0,0.0,2.0,4.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,2.0,1.0



Make starting point for data_Imputed
data_Imputed


Unnamed: 0,HAZ_PLAC,NUM_INJV,REL_ROAD,SPEC_USE,HAZ_INV,PERMVIT,REST_MIS,ACC_TYPE,IMPACT1,EJECTION,PERNOTMVIT,TOWED,RELJCT2,VE_FORMS,MODEL,ROLLOVER,HOUR,HIT_RUN,NUMOCCS,REGION,VEH_ALCH,HOSPITAL,PCRASH5,PSU,MAKE,PJ,VTRAFWAY,REST_USE,SEX,SEAT_POS,MAN_COLL,AGE,PVH_INVL,YEAR,VSURCOND,VE_TOTAL,P_CRASH1,WEATHER,BUS_USE,ALCOHOL,ALC_RES,J_KNIFE,ALC_STATUS,TYP_INT,VTCONT_F,INT_HWY,VALIGN,EMER_USE,FIRE_EXP,MAK_MOD,PER_TYP,DAY_WEEK,VSPD_LIM,PEDS,MAX_SEV,VEH_AGE,P_CRASH2,AIR_BAG,MAX_VSEV,PCRASH4,HARM_EV,INJ_SEV,WRK_ZONE,SPEEDREL,URBANICITY,RELJCT1,TOW_VEH,BODY_TYP,VPROFILE,DR_PRES,HAZ_CNO,VTRAFCON,M_HARM,LGT_COND,MONTH,SCH_BUS,HAZ_REL,NUM_INJ,CARGO_BT
0,0.0,3.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,2.0,4.0,1.0,1.0,0.0,3.0,3.0,1.0,0.0,3.0,1.0,6.0,3.0,0.0,1.0,1.0,3.0,2.0,3.0,0.0,2016.0,1.0,2.0,1.0,1.0,1.0,2.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,4.0,2.0,1.0,1.0,0.0,0.0,4.0,1.0,0.0,2.0,1.0,1.0,3.0,0.0,1.0,2.0,0.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,0.0,1.0,0.0,0.0
1,0.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,2.0,1.0,3.0,0.0,5.0,3.0,1.0,0.0,1.0,3.0,8.0,1.0,3.0,1.0,1.0,3.0,1.0,2.0,0.0,2016.0,2.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,2.0,1.0,2.0,0.0,2.0,3.0,0.0,0.0,0.0,0.0,3.0,3.0,0.0,0.0,2.0,0.0,0.0,3.0,1.0,1.0,1.0,1.0,0.0,3.0,0.0,0.0,1.0,1.0,0.0
2,0.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,2.0,1.0,3.0,0.0,5.0,3.0,1.0,1.0,1.0,1.0,8.0,1.0,3.0,1.0,0.0,1.0,1.0,2.0,0.0,2016.0,1.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,5.0,0.0,2.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,1.0,0.0,0.0,3.0,1.0,1.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,0.0,0.0
3,0.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,2.0,1.0,1.0,2.0,1.0,3.0,0.0,3.0,3.0,1.0,0.0,1.0,1.0,8.0,3.0,3.0,1.0,1.0,4.0,1.0,2.0,0.0,2016.0,2.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,5.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,3.0,0.0,1.0,2.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,1.0,0.0
4,0.0,3.0,1.0,1.0,0.0,2.0,1.0,4.0,1.0,1.0,0.0,2.0,1.0,2.0,4.0,1.0,3.0,0.0,3.0,2.0,1.0,0.0,3.0,4.0,6.0,2.0,0.0,2.0,1.0,3.0,3.0,3.0,0.0,2016.0,2.0,2.0,1.0,1.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,1.0,4.0,0.0,2.0,1.0,5.0,1.0,2.0,1.0,1.0,3.0,0.0,1.0,1.0,0.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,1.0,1.0,1.0,0.0
5,0.0,1.0,1.0,2.0,0.0,2.0,1.0,3.0,4.0,1.0,0.0,2.0,1.0,2.0,4.0,1.0,3.0,0.0,1.0,2.0,1.0,0.0,3.0,4.0,8.0,3.0,0.0,2.0,1.0,3.0,3.0,3.0,0.0,2016.0,2.0,2.0,4.0,1.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,1.0,4.0,0.0,2.0,0.0,3.0,1.0,0.0,1.0,1.0,3.0,0.0,1.0,1.0,0.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,1.0,1.0,1.0,1.0
6,0.0,1.0,1.0,2.0,0.0,2.0,1.0,3.0,4.0,1.0,0.0,2.0,1.0,2.0,4.0,1.0,3.0,0.0,1.0,2.0,1.0,0.0,3.0,4.0,8.0,3.0,0.0,0.0,1.0,0.0,3.0,2.0,0.0,2016.0,2.0,2.0,4.0,1.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,1.0,1.0,4.0,0.0,2.0,2.0,3.0,0.0,0.0,1.0,1.0,3.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,1.0,1.0,0.0,1.0
7,0.0,3.0,1.0,1.0,0.0,1.0,1.0,3.0,4.0,1.0,0.0,2.0,1.0,2.0,1.0,1.0,2.0,0.0,3.0,3.0,1.0,0.0,3.0,1.0,0.0,3.0,2.0,1.0,0.0,3.0,3.0,2.0,0.0,2016.0,1.0,2.0,4.0,3.0,1.0,9.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,2.0,2.0,1.0,5.0,0.0,0.0,0.0,3.0,1.0,2.0,1.0,1.0,3.0,0.0,1.0,2.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,0.0,1.0,0.0,0.0
8,0.0,3.0,0.0,1.0,0.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,4.0,1.0,2.0,0.0,3.0,3.0,1.0,0.0,1.0,1.0,8.0,3.0,0.0,1.0,1.0,3.0,1.0,2.0,0.0,2016.0,1.0,2.0,1.0,1.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,3.0,2.0,1.0,7.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,1.0,3.0,0.0,0.0,2.0,0.0,0.0,5.0,2.0,1.0,1.0,1.0,1.0,3.0,0.0,0.0,1.0,0.0,0.0
9,0.0,1.0,1.0,1.0,0.0,1.0,1.0,3.0,1.0,1.0,0.0,2.0,1.0,2.0,4.0,1.0,3.0,0.0,3.0,3.0,1.0,0.0,3.0,1.0,0.0,3.0,0.0,1.0,1.0,3.0,3.0,2.0,0.0,2016.0,2.0,2.0,1.0,2.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,2.0,1.0,4.0,0.0,0.0,2.0,4.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,2.0,1.0



Start Loop

['HAZ_PLAC', 30282]
30282


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,187794,218073,30279
1,1.0,45,48,3
2,,30282,0,0



['NUM_INJV', 30288]
30288


array([3., 3., 3., ..., 3., 1., 3.])

(218121, 79)

[1.0, 3.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,45313,51768,6455
1,3.0,121902,145735,23833
2,0.0,20602,20602,0
3,2.0,16,16,0
4,,30288,0,0



['REL_ROAD', 30300]
30300


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,170989,199638,28649
1,0.0,15444,17095,1651
2,2.0,1388,1388,0
3,,30300,0,0



['SPEC_USE', 30303]
30303


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,186359,216662,30303
1,2.0,953,953,0
2,0.0,506,506,0
3,,30303,0,0



['HAZ_INV', 30309]
30309


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,187769,218078,30309
1,1.0,43,43,0
2,,30309,0,0



['PERMVIT', 30310]
30310


array([2., 2., 2., ..., 2., 2., 2.])

(218121, 79)

[2.0, 1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,115410,144440,29030
1,1.0,58096,58096,0
2,0.0,14305,15585,1280
3,,30310,0,0



['REST_MIS', 30326]
30326


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,183521,213847,30326
1,0.0,4274,4274,0
2,,30326,0,0



['ACC_TYPE', 30328]
30328


array([3., 4., 3., ..., 3., 3., 3.])

(218121, 79)

[1.0, 0.0, 4.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,27859,32564,4705
1,0.0,34090,39050,4960
2,4.0,38334,43217,4883
3,3.0,44254,57704,13450
4,2.0,43256,45586,2330
5,,30328,0,0



['IMPACT1', 30333]
30333


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 4.0, 0.0, 3.0, 2.0, 5.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,76956,100130,23174
1,4.0,46326,53485,7159
2,0.0,16179,16179,0
3,3.0,27089,27089,0
4,2.0,13590,13590,0
5,5.0,7648,7648,0
6,,30333,0,0



['EJECTION', 30334]
30334


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,183483,213817,30334
1,0.0,4304,4304,0
2,,30334,0,0



['PERNOTMVIT', 30335]
30335


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,182343,212678,30335
1,1.0,5443,5443,0
2,,30335,0,0



['TOWED', 30335]
30335


array([2., 2., 2., ..., 2., 2., 2.])

(218121, 79)

[0.0, 2.0, 3.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,69554,76795,7241
1,2.0,109158,132252,23094
2,3.0,8144,8144,0
3,1.0,930,930,0
4,,30335,0,0



['RELJCT2', 30338]
30338


array([1., 1., 1., ..., 1., 3., 1.])

(218121, 79)

[1.0, 0.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,78187,98793,20606
1,0.0,49591,55582,5991
2,2.0,20000,20000,0
3,3.0,40005,43746,3741
4,,30338,0,0



['VE_FORMS', 30340]
30340


array([2., 2., 2., ..., 2., 1., 2.])

(218121, 79)

[2.0, 1.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,135000,162096,27096
1,1.0,24014,27258,3244
2,3.0,21510,21510,0
3,4.0,7257,7257,0
4,,30340,0,0



['MODEL', 30340]
30340


array([4., 3., 4., ..., 1., 2., 1.])

(218121, 79)

[4.0, 2.0, 1.0, 0.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,38301,46359,8058
1,2.0,38389,41425,3036
2,1.0,40907,56213,15306
3,0.0,33518,34442,924
4,3.0,36666,39682,3016
5,,30340,0,0



['ROLLOVER', 30342]
30342


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,182838,213180,30342
1,0.0,4941,4941,0
2,,30342,0,0



['HOUR', 30347]
30347


array([3., 3., 3., ..., 3., 3., 3.])

(218121, 79)

[1.0, 3.0, 2.0, 4.0, 6.0, 0.0, 5.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,32842,32842,0
1,3.0,52184,81822,29638
2,2.0,46168,46466,298
3,4.0,22844,22844,0
4,6.0,10140,10531,391
5,0.0,6372,6372,0
6,5.0,17224,17244,20
7,,30347,0,0



['HIT_RUN', 30351]
30351


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,186825,217176,30351
1,1.0,945,945,0
2,,30351,0,0



['NUMOCCS', 30353]
30353


array([3., 1., 3., ..., 3., 3., 3.])

(218121, 79)

[3.0, 5.0, 1.0, 6.0, 2.0, 0.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,97083,121723,24640
1,5.0,21432,21432,0
2,1.0,48896,54609,5713
3,6.0,17515,17515,0
4,2.0,2233,2233,0
5,0.0,471,471,0
6,4.0,138,138,0
7,,30353,0,0



['REGION', 30354]
30354


array([3., 3., 3., ..., 3., 3., 3.])

(218121, 79)

[3.0, 2.0, 4.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,140907,171261,30354
1,2.0,32082,32082,0
2,4.0,8042,8042,0
3,1.0,6736,6736,0
4,,30354,0,0



['VEH_ALCH', 30356]
30356


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,183561,213896,30335
1,0.0,4204,4225,21
2,,30356,0,0



['HOSPITAL', 30356]
30356


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,157665,187478,29813
1,1.0,30100,30643,543
2,,30356,0,0



['PCRASH5', 30363]
30363


array([3., 3., 3., ..., 3., 1., 3.])

(218121, 79)

[3.0, 1.0, 2.0, 4.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,150097,178628,28531
1,1.0,16018,17850,1832
2,2.0,17725,17725,0
3,4.0,3560,3560,0
4,0.0,358,358,0
5,,30363,0,0



['PSU', 30364]
30364


array([3., 4., 3., ..., 3., 3., 3.])

(218121, 79)

[1.0, 4.0, 3.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,33470,33470,0
1,4.0,40952,46285,5333
2,3.0,53089,76429,23340
3,2.0,44691,46382,1691
4,0.0,15555,15555,0
5,,30364,0,0



['MAKE', 30364]
30364


array([8., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[6.0, 8.0, 0.0, 2.0, 4.0, 1.0, 7.0, 3.0, 5.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,6.0,26168,26168,0
1,8.0,36083,43659,7576
2,0.0,37074,59862,22788
3,2.0,24661,24661,0
4,4.0,22143,22143,0
5,1.0,27833,27833,0
6,7.0,7519,7519,0
7,3.0,3493,3493,0
8,5.0,2783,2783,0
9,,30364,0,0



['PJ', 30365]
30365


array([3., 3., 3., ..., 2., 3., 3.])

(218121, 79)

[3.0, 1.0, 2.0, 0.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,48273,60410,12137
1,1.0,25411,25411,0
2,2.0,45105,56386,11281
3,0.0,28383,29642,1259
4,4.0,40584,46272,5688
5,,30365,0,0



['VTRAFWAY', 30367]
30367


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 3.0, 2.0, 4.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,83608,111508,27900
1,3.0,41296,43436,2140
2,2.0,10906,10906,0
3,4.0,15484,15811,327
4,1.0,36460,36460,0
5,,30367,0,0



['REST_USE', 30367]
30367


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,165435,195796,30361
1,2.0,11480,11480,0
2,0.0,10839,10845,6
3,,30367,0,0



['SEX', 30367]
30367


array([1., 1., 1., ..., 0., 0., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,97614,110856,13242
1,0.0,90140,107265,17125
2,,30367,0,0



['SEAT_POS', 30375]
30375


array([3., 3., 3., ..., 3., 3., 3.])

(218121, 79)

[3.0, 1.0, 4.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,131500,161867,30367
1,1.0,30942,30949,7
2,4.0,14662,14663,1
3,0.0,1491,1491,0
4,2.0,9151,9151,0
5,,30375,0,0



['MAN_COLL', 30377]
30377


array([3., 1., 3., ..., 3., 3., 3.])

(218121, 79)

[2.0, 1.0, 3.0, 4.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,55713,63292,7579
1,1.0,26148,30029,3881
2,3.0,75937,94854,18917
3,4.0,22675,22675,0
4,0.0,7271,7271,0
5,,30377,0,0



['AGE', 30377]
30377


array([2., 2., 2., ..., 2., 2., 2.])

(218121, 79)

[3.0, 2.0, 0.0, 1.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,34770,34770,0
1,2.0,112795,143125,30330
2,0.0,18743,18790,47
3,1.0,14426,14426,0
4,4.0,7010,7010,0
5,,30377,0,0



['PVH_INVL', 30379]
30379


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,184873,215252,30379
1,1.0,2869,2869,0
2,,30379,0,0



['YEAR', 30380]
30380


array([2016., 2017., 2016., ..., 2016., 2016., 2016.])

(218121, 79)

[2016.0, 2017.0, 2018.0, 2019.0, 2020.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2016.0,45571,70885,25314
1,2017.0,45709,50149,4440
2,2018.0,37800,37831,31
3,2019.0,28721,28806,85
4,2020.0,29940,30450,510
5,,30380,0,0



['VSURCOND', 30380]
30380


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 2.0, 3.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,151176,181391,30215
1,2.0,27002,27002,0
2,3.0,9142,9307,165
3,0.0,421,421,0
4,,30380,0,0



['VE_TOTAL', 30383]
30383


array([1., 2., 2., ..., 2., 2., 2.])

(218121, 79)

[1.0, 2.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,21960,25001,3041
1,2.0,136259,163601,27342
2,3.0,22038,22038,0
3,4.0,7481,7481,0
4,,30383,0,0



['P_CRASH1', 30384]
30384


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 4.0, 3.0, 5.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,92297,122334,30037
1,4.0,32715,33062,347
2,3.0,13428,13428,0
3,5.0,17147,17147,0
4,0.0,13107,13107,0
5,2.0,19043,19043,0
6,,30384,0,0



['WEATHER', 30385]
30385


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[2.0, 1.0, 3.0, 4.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,18304,18304,0
1,1.0,133668,164053,30385
2,3.0,32282,32282,0
3,4.0,2555,2555,0
4,0.0,927,927,0
5,,30385,0,0



['BUS_USE', 30388]
30388


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,186991,217379,30388
1,2.0,722,722,0
2,0.0,20,20,0
3,,30388,0,0



['ALCOHOL', 30390]
30390


array([2., 2., 2., ..., 2., 2., 2.])

(218121, 79)

[2.0, 9.0, 1.0, 8.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,172246,202581,30335
1,9.0,8343,8343,0
2,1.0,7138,7193,55
3,8.0,4,4,0
4,,30390,0,0



['ALC_RES', 30390]
30390


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,186759,217149,30390
1,1.0,972,972,0
2,,30390,0,0



['J_KNIFE', 30391]
30391


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,183456,213843,30387
1,2.0,4192,4196,4
2,0.0,82,82,0
3,,30391,0,0



['ALC_STATUS', 30393]
30393


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,184295,214688,30393
1,0.0,3433,3433,0
2,,30393,0,0



['TYP_INT', 30395]
30395


array([2., 1., 1., ..., 2., 1., 2.])

(218121, 79)

[0.0, 1.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,25031,25031,0
1,1.0,103139,125359,22220
2,2.0,58095,66270,8175
3,3.0,1461,1461,0
4,,30395,0,0



['VTCONT_F', 30395]
30395


array([3., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 3.0, 0.0, 4.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,123307,147881,24574
1,3.0,64124,69945,5821
2,0.0,186,186,0
3,4.0,102,102,0
4,2.0,7,7,0
5,,30395,0,0



['INT_HWY', 30398]
30398


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,165906,196304,30398
1,1.0,21817,21817,0
2,,30398,0,0



['VALIGN', 30399]
30399


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,165705,196104,30399
1,0.0,15976,15976,0
2,2.0,6041,6041,0
3,,30399,0,0



['EMER_USE', 30402]
30402


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,187240,217642,30402
1,2.0,199,199,0
2,0.0,280,280,0
3,,30402,0,0



['FIRE_EXP', 30402]
30402


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,187454,217856,30402
1,0.0,265,265,0
2,,30402,0,0



['MAK_MOD', 30402]
30402


array([4., 3., 1., ..., 4., 1., 1.])

(218121, 79)

[4.0, 0.0, 2.0, 3.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,34796,39270,4474
1,0.0,32705,33494,789
2,2.0,39431,45153,5722
3,3.0,40616,49322,8706
4,1.0,40171,50882,10711
5,,30402,0,0



['PER_TYP', 30402]
30402


array([2., 2., 2., ..., 2., 2., 1.])

(218121, 79)

[2.0, 1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,131302,159717,28415
1,1.0,56416,58403,1987
2,0.0,1,1,0
3,,30402,0,0



['DAY_WEEK', 30403]
30403


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,141952,172355,30403
1,0.0,45766,45766,0
2,,30403,0,0



['VSPD_LIM', 30403]
30403


array([1., 2., 2., ..., 2., 2., 2.])

(218121, 79)

[1.0, 5.0, 4.0, 7.0, 2.0, 3.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,28052,31076,3024
1,5.0,33665,33665,0
2,4.0,19986,19986,0
3,7.0,33142,34548,1406
4,2.0,46537,72510,25973
5,3.0,4866,4866,0
6,0.0,21470,21470,0
7,,30403,0,0



['PEDS', 30404]
30404


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,182718,213100,30382
1,1.0,4830,4852,22
2,2.0,169,169,0
3,,30404,0,0



['MAX_SEV', 30404]
30404


array([0., 1., 0., ..., 1., 0., 0.])

(218121, 79)

[2.0, 0.0, 3.0, 4.0, 1.0, 5.0, 6.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,29112,29634,522
1,0.0,87644,108888,21244
2,3.0,19812,19812,0
3,4.0,3232,3232,0
4,1.0,47058,55696,8638
5,5.0,854,854,0
6,6.0,5,5,0
7,,30404,0,0



['VEH_AGE', 30404]
30404


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[4.0, 3.0, 1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,12146,12146,0
1,3.0,27254,27254,0
2,1.0,49391,49391,0
3,2.0,42923,43883,960
4,0.0,56003,85447,29444
5,,30404,0,0



['P_CRASH2', 30405]
30405


array([0., 1., 3., ..., 0., 1., 3.])

(218121, 79)

[1.0, 0.0, 5.0, 3.0, 4.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,31314,37649,6335
1,0.0,35348,40211,4863
2,5.0,34463,39541,5078
3,3.0,42142,56271,14129
4,4.0,24194,24194,0
5,2.0,20255,20255,0
6,,30405,0,0



['AIR_BAG', 30405]
30405


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,39203,39203,0
1,1.0,148513,178918,30405
2,,30405,0,0



['MAX_VSEV', 30406]
30406


array([2., 2., 2., ..., 2., 2., 0.])

(218121, 79)

[2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,121735,147152,25417
1,0.0,32713,36931,4218
2,1.0,33267,34038,771
3,,30406,0,0



['PCRASH4', 30408]
30408


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,7963,7963,0
1,1.0,179750,210158,30408
2,,30408,0,0



['HARM_EV', 30410]
30410


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[3.0, 1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,13072,15215,2143
1,1.0,161532,189179,27647
2,2.0,10302,10922,620
3,0.0,2805,2805,0
4,,30410,0,0



['INJ_SEV', 30412]
30412


array([3., 0., 3., ..., 3., 0., 3.])

(218121, 79)

[3.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,134008,161997,27989
1,0.0,26135,28554,2419
2,1.0,27566,27570,4
3,,30412,0,0



['WRK_ZONE', 30413]
30413


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,183180,213593,30413
1,1.0,4155,4155,0
2,2.0,330,330,0
3,3.0,43,43,0
4,,30413,0,0



['SPEEDREL', 30414]
30414


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,179755,210169,30414
1,0.0,7952,7952,0
2,,30414,0,0



['URBANICITY', 30417]
30417


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,47555,47557,2
1,1.0,140149,170564,30415
2,,30417,0,0



['RELJCT1', 30418]
30418


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0, 8.0, 9.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,149212,179630,30418
1,1.0,6245,6245,0
2,8.0,32205,32205,0
3,9.0,41,41,0
4,,30418,0,0



['TOW_VEH', 30419]
30419


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,183403,213822,30419
1,1.0,4299,4299,0
2,,30419,0,0



['BODY_TYP', 30420]
30420


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[5.0, 3.0, 1.0, 2.0, 0.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,5.0,41941,47954,6013
1,3.0,31582,31582,0
2,1.0,72722,97129,24407
3,2.0,16776,16776,0
4,0.0,14253,14253,0
5,4.0,10427,10427,0
6,,30420,0,0



['VPROFILE', 30428]
30428


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,153270,183071,29801
1,2.0,20328,20955,627
2,0.0,14095,14095,0
3,,30428,0,0



['DR_PRES', 30429]
30429


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,187655,218084,30429
1,0.0,37,37,0
2,,30429,0,0



['HAZ_CNO', 30429]
30429


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,187646,218075,30429
1,2.0,44,44,0
2,0.0,2,2,0
3,,30429,0,0



['VTRAFCON', 30433]
30433


array([1., 1., 1., ..., 1., 2., 1.])

(218121, 79)

[1.0, 2.0, 3.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,123335,151847,28512
1,2.0,40360,42281,1921
2,3.0,18811,18811,0
3,0.0,5182,5182,0
4,,30433,0,0



['M_HARM', 30435]
30435


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,160428,187717,27289
1,0.0,15962,18205,2243
2,2.0,11296,12199,903
3,,30435,0,0



['LGT_COND', 30442]
30442


array([3., 3., 3., ..., 3., 3., 3.])

(218121, 79)

[3.0, 2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,138257,168699,30442
1,2.0,4293,4293,0
2,0.0,15898,15898,0
3,1.0,29231,29231,0
4,,30442,0,0



['MONTH', 30456]
30456


array([2., 2., 2., ..., 2., 2., 2.])

(218121, 79)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,59162,63768,4606
1,1.0,62803,64808,2005
2,2.0,65700,89545,23845
3,,30456,0,0



['SCH_BUS', 30466]
30466


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,186778,217244,30466
1,1.0,877,877,0
2,,30466,0,0



['HAZ_REL', 30472]
30472


array([1., 1., 1., ..., 1., 1., 1.])

(218121, 79)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,187599,218071,30472
1,2.0,39,39,0
2,0.0,11,11,0
3,,30472,0,0



['NUM_INJ', 30483]
30483


array([0., 0., 0., ..., 1., 0., 0.])

(218121, 79)

[1.0, 0.0, 2.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,55547,65739,10192
1,0.0,87601,107892,20291
2,2.0,24176,24176,0
3,3.0,10643,10643,0
4,4.0,9671,9671,0
5,,30483,0,0



['CARGO_BT', 30525]
30525


array([0., 0., 0., ..., 0., 0., 0.])

(218121, 79)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,180810,211335,30525
1,1.0,6786,6786,0
2,,30525,0,0
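
Each block in the loop output above ends with a table of per-value counts before and after imputation. The following is a minimal sketch of how such a table could be built; it is not the notebook's own code. The function name `count_table` and the toy series are hypothetical, and setting `Difference` to 0 on the missing-value row is an assumption made only to mirror the layout of the tables above.

```python
import numpy as np
import pandas as pd


def count_table(before: pd.Series, after: pd.Series) -> pd.DataFrame:
    """Count each observed value, plus NaN, before and after imputation."""
    orig_counts = before.value_counts()  # NaN is excluded by default
    imp_counts = after.value_counts()
    rows = []
    # One row per value that was observed before imputation.
    for value in pd.unique(before.dropna()):
        n_orig = int(orig_counts.get(value, 0))
        n_imp = int(imp_counts.get(value, 0))
        rows.append({"Value": value, "Original": n_orig,
                     "Imputed": n_imp, "Difference": n_imp - n_orig})
    # Final row: how many cells were missing before and after.
    # Difference is set to 0 here by assumption, to match the tables above.
    rows.append({"Value": np.nan,
                 "Original": int(before.isna().sum()),
                 "Imputed": int(after.isna().sum()),
                 "Difference": 0})
    return pd.DataFrame(rows, columns=["Value", "Original", "Imputed", "Difference"])


# Toy data (hypothetical): two missing cells, imputed as 0.0 and 1.0.
before = pd.Series([0.0, 0.0, 1.0, np.nan, np.nan])
after = pd.Series([0.0, 0.0, 1.0, 0.0, 1.0])
table = count_table(before, after)
print(table)
```

Read this way, the `Difference` column shows which values the imputer chose for the missing cells; a feature where almost all of the difference lands on one value is being imputed much like the mode.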




data_RF.shape
(218121, 79)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,1,0,3,0,0,2,1,1,0,2,2,1,1,0,2,0,0,0,1,3,2,1,0,2016,3,2,0,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,3,0,0,1,1,0,3,2,1,1,3,1,4
1,3,0,3,0,0,0,0,1,0,2,1,2,1,0,1,0,0,1,3,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,2,2,1,1,3,2,0,0,1,1,0,3,2,1,1,3,1,3
2,3,0,2,0,0,2,0,1,0,1,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,1,1,1,3,2,0,0,1,1,1,0,1,1,1,1,0,3
3,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,0,1,3,1,2016,3,2,1,0,0,1,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,3,1,0,0,1,1,1,1,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,0,3,1,1,1,4,1,3
4,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,1
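
As a rough illustration of one step of the round-robin pass that produced `data_RF`, the sketch below imputes a single feature with scikit-learn's `RandomForestClassifier`. It follows the steps listed under Methods, but the function name `impute_feature_rf`, the toy frames, and the hyperparameters (`n_estimators=50`, `random_state=0`) are assumptions, not the settings used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def impute_feature_rf(data_nan, data_filled, feature, random_state=0):
    """Impute one feature, using the other (already filled) features as predictors."""
    work = data_filled.copy()
    # Restore the missing values in the target feature only.
    work[feature] = data_nan[feature]
    known = work[feature].notna()
    if known.all() or not known.any():
        return data_filled.copy()  # nothing to impute, or nothing to learn from

    predictors = [c for c in work.columns if c != feature]
    X = work.loc[known, predictors]             # rows with a known target
    y = work.loc[known, feature].astype(int)    # discretized codes as class labels
    Z = work.loc[~known, predictors]            # rows whose target is missing

    model = RandomForestClassifier(n_estimators=50, random_state=random_state)
    model.fit(X, y)
    work.loc[~known, feature] = model.predict(Z)
    return work


# Toy data (hypothetical): B is fully determined by A.
truth = pd.DataFrame({"A": [0.0, 1.0] * 20})
truth["B"] = truth["A"]
data_nan = truth.copy()
data_nan.loc[[3, 10, 17, 24], "B"] = np.nan
# Mode-imputed starting point, as in the Methods description.
data_filled = data_nan.fillna(data_nan.mode().iloc[0])

rf_result = impute_feature_rf(data_nan, data_filled, "B")
print(rf_result.loc[[3, 10, 17, 24]])
```

In a full pass this function would be called once per feature, in order of increasing missing count, with each result fed back in as `data_filled` for the next feature.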


## Now do IVEware Imputation: IVE_12_22_22.xml
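
Because the erased cells are chosen at random on each run, the IVEware file read in the next cell only lines up with `data_Ground_Truth` if it was produced from the same run. In the output of that cell, `data_IVEware` has 219711 rows while the other frames have 218121, so a check before any cell-by-cell comparison seems worthwhile. The sketch below is one possible check; the function name `check_alignment` and the toy frames are hypothetical.

```python
import pandas as pd


def check_alignment(reference, candidate, name="candidate"):
    """Return a list of human-readable problems; an empty list means the frames line up."""
    problems = []
    if reference.shape != candidate.shape:
        problems.append(
            f"{name}: shape {candidate.shape} differs from reference shape {reference.shape}"
        )
    # Columns present in the reference but absent from the candidate.
    missing_cols = [c for c in reference.columns if c not in candidate.columns]
    if missing_cols:
        problems.append(f"{name}: missing columns {missing_cols}")
    return problems


# Toy frames (hypothetical): the second has one extra row, like a stale export.
truth = pd.DataFrame({"HOUR": [1, 3, 3], "SEX": [1, 0, 1]})
stale = pd.DataFrame({"HOUR": [3, 1, 3, 3], "SEX": [1, 1, 0, 0]})

problems = check_alignment(truth, stale, name="data_IVEware")
for p in problems:
    print(p)
```

An empty result does not prove the two files come from the same run, since two runs can share a shape; it only catches the obvious mismatches.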

In [11]:
# Load the IVEware result, which was produced in R outside this notebook.
data_IVEware = pd.read_csv('../../Big_Files/data_IVEware.csv')
# Drop the unnamed first column that came along with the CSV.
data_IVEware.drop(columns='Unnamed: 0', inplace=True)

# Show the shape and the first ten rows of each version of the data.
print('data_Ground_Truth', data_Ground_Truth.shape)
display(data_Ground_Truth.head(10))
print('data_NaN', data_NaN.shape)
display(data_NaN.head(10))
print('data_RF', data_RF.shape)
display(data_RF.head(10))
print('data_Mode', data_Mode.shape)
display(data_Mode.head(10))
print('data_IVEware', data_IVEware.shape)
display(data_IVEware.head(10))


data_Ground_Truth (218121, 79)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,1,0,3,0,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,3,0,0,1,1,0,3,2,1,1,3,1,4
1,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,0,3,2,1,1,3,0,3
2,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,1,0,1,1,1,1,0,3
3,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,0,3,1,1,1,4,1,3
4,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,1
5,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,2
6,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,0,0,0,1,1,0,0,1,1,0,0,1,2
7,2,0,3,0,0,1,1,1,0,2,2,3,1,0,2,0,0,0,1,3,3,1,1,2016,3,9,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,5,1,1,1,2,2,1,0,1,1,0,3,2,1,1,3,0,0
8,2,0,3,0,0,0,0,1,0,2,1,3,1,0,1,0,0,0,1,3,1,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,0,1,0,3,8,2,4,3,3,1,0,0,1,1,1,0,0,0,1,1,2,7,1,1,1,0,2,1,0,1,1,0,3,2,1,1,3,1,2
9,3,0,0,0,0,1,1,1,0,2,2,2,1,0,2,0,0,2,1,3,3,1,1,2016,3,2,1,0,3,5,1,1,1,1,1,1,0,0,1,0,1,1,1,4,0,1,4,1,3,1,4,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,2,1,0,1,1,0,1,2,1,1,3,1,2


data_NaN (218121, 79)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,1.0,,3.0,0.0,0.0,2.0,1.0,,,2.0,,,1.0,0.0,2.0,0.0,0.0,,1.0,3.0,2.0,,0.0,2016.0,3.0,2.0,,0.0,1.0,5.0,1.0,0.0,1.0,1.0,1.0,,0.0,0.0,1.0,0.0,1.0,1.0,1.0,4.0,6.0,2.0,4.0,,3.0,1.0,1.0,,3.0,,1.0,1.0,0.0,0.0,1.0,1.0,,1.0,1.0,1.0,1.0,0.0,3.0,0.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,1.0,4.0
1,3.0,0.0,,0.0,0.0,,0.0,1.0,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,1.0,,1.0,1.0,3.0,1.0,2016.0,3.0,,2.0,0.0,0.0,3.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,1.0,1.0,0.0,0.0,8.0,0.0,2.0,1.0,5.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,,2.0,1.0,1.0,3.0,2.0,0.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,,3.0
2,3.0,0.0,2.0,0.0,0.0,2.0,0.0,,0.0,,1.0,2.0,1.0,0.0,1.0,0.0,,,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,,0.0,3.0,1.0,,1.0,1.0,1.0,1.0,0.0,,1.0,0.0,1.0,1.0,0.0,0.0,8.0,0.0,2.0,1.0,5.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,,1.0,1.0,3.0,2.0,0.0,,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,3.0
3,3.0,0.0,2.0,0.0,0.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,,1.0,3.0,1.0,2016.0,,2.0,,0.0,0.0,,1.0,0.0,1.0,1.0,1.0,1.0,0.0,,1.0,0.0,1.0,,0.0,0.0,8.0,0.0,2.0,1.0,,,0.0,0.0,1.0,,1.0,,0.0,,1.0,1.0,,5.0,2.0,1.0,1.0,3.0,2.0,0.0,0.0,1.0,1.0,0.0,3.0,1.0,1.0,1.0,4.0,1.0,3.0
4,,0.0,3.0,0.0,0.0,2.0,,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,,0.0,1.0,4.0,2.0,3.0,1.0,1.0,2016.0,2.0,2.0,2.0,0.0,4.0,5.0,1.0,0.0,1.0,1.0,1.0,,0.0,0.0,,0.0,1.0,1.0,,,6.0,2.0,4.0,3.0,3.0,,5.0,1.0,3.0,1.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,1.0,1.0,0.0,3.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,,2.0,3.0,1.0,1.0
5,3.0,0.0,3.0,0.0,0.0,2.0,1.0,1.0,1.0,1.0,,1.0,1.0,0.0,2.0,0.0,,1.0,4.0,,3.0,1.0,1.0,2016.0,2.0,2.0,2.0,0.0,3.0,5.0,,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,4.0,1.0,1.0,,8.0,0.0,4.0,1.0,1.0,4.0,3.0,1.0,,1.0,2.0,1.0,0.0,2.0,1.0,,1.0,4.0,2.0,1.0,1.0,0.0,3.0,1.0,0.0,1.0,1.0,,3.0,2.0,1.0,2.0,3.0,1.0,
6,3.0,0.0,3.0,0.0,,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,,4.0,,3.0,,1.0,2016.0,2.0,2.0,2.0,0.0,3.0,,,1.0,1.0,1.0,,1.0,0.0,0.0,1.0,0.0,4.0,,1.0,4.0,8.0,0.0,4.0,1.0,1.0,4.0,3.0,1.0,3.0,1.0,2.0,1.0,0.0,2.0,1.0,,1.0,4.0,2.0,,1.0,0.0,,0.0,0.0,1.0,1.0,,,1.0,1.0,0.0,0.0,1.0,2.0
7,2.0,0.0,3.0,0.0,0.0,1.0,1.0,1.0,0.0,2.0,,3.0,1.0,0.0,2.0,0.0,0.0,0.0,1.0,3.0,3.0,1.0,1.0,2016.0,3.0,9.0,,0.0,3.0,1.0,1.0,,1.0,,1.0,1.0,,0.0,1.0,0.0,4.0,1.0,1.0,2.0,0.0,2.0,1.0,3.0,3.0,4.0,3.0,,,1.0,1.0,1.0,,2.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,2.0,2.0,1.0,0.0,1.0,,,3.0,2.0,,1.0,3.0,0.0,0.0
8,2.0,0.0,3.0,0.0,0.0,,0.0,,0.0,2.0,,,1.0,0.0,1.0,0.0,0.0,,1.0,3.0,1.0,,1.0,2016.0,3.0,2.0,0.0,0.0,0.0,5.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,,3.0,8.0,,4.0,3.0,3.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,7.0,1.0,1.0,1.0,0.0,2.0,1.0,0.0,,1.0,,3.0,2.0,1.0,1.0,3.0,,
9,3.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,,2.0,2.0,1.0,0.0,2.0,0.0,0.0,2.0,1.0,3.0,3.0,1.0,,2016.0,3.0,2.0,,0.0,,5.0,1.0,1.0,1.0,1.0,,1.0,0.0,0.0,,0.0,1.0,1.0,1.0,4.0,0.0,1.0,4.0,1.0,3.0,1.0,4.0,1.0,,,1.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,,1.0,0.0,2.0,1.0,0.0,1.0,1.0,0.0,1.0,2.0,1.0,,3.0,1.0,2.0


data_RF (218121, 79)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,1,0,3,0,0,2,1,1,0,2,2,1,1,0,2,0,0,0,1,3,2,1,0,2016,3,2,0,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,3,0,0,1,1,0,3,2,1,1,3,1,4
1,3,0,3,0,0,0,0,1,0,2,1,2,1,0,1,0,0,1,3,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,2,2,1,1,3,2,0,0,1,1,0,3,2,1,1,3,1,3
2,3,0,2,0,0,2,0,1,0,1,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,1,1,1,3,2,0,0,1,1,1,0,1,1,1,1,0,3
3,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,0,1,3,1,2016,3,2,1,0,0,1,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,3,1,0,0,1,1,1,1,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,0,3,1,1,1,4,1,3
4,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,1
5,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,4,3,1,1,2016,2,2,2,0,3,5,1,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,0
6,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,4,3,1,1,2016,2,2,2,0,3,5,1,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,2,0,0,1,1,0,3,1,1,0,0,1,2
7,2,0,3,0,0,1,1,1,0,2,2,3,1,0,2,0,0,0,1,3,3,1,1,2016,3,9,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,5,1,1,1,2,2,1,0,1,1,0,3,2,1,1,3,0,0
8,2,0,3,0,0,2,0,1,0,2,1,1,1,0,1,0,0,0,1,3,1,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,0,1,0,3,8,2,4,3,3,1,0,0,1,1,1,0,0,0,1,1,2,7,1,1,1,0,2,1,0,1,1,0,3,2,1,1,3,1,2
9,3,0,0,0,0,1,1,1,0,1,2,2,1,0,2,0,0,2,1,3,3,1,1,2016,3,2,1,0,3,5,1,1,1,1,1,1,0,0,1,0,1,1,1,4,0,1,4,1,3,1,4,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,2,1,0,1,1,0,1,2,1,1,3,1,2


data_Mode (218121, 79)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,1,0,3,0,0,2,1,1,0,2,2,1,1,0,2,0,0,0,1,3,2,1,0,2016,3,2,0,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,3,0,0,1,1,0,3,2,1,1,3,1,4
1,3,0,3,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,3,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,2,2,1,1,3,2,0,0,1,1,0,3,2,1,1,3,1,3
2,3,0,2,0,0,2,0,1,0,1,1,2,1,0,1,0,0,0,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,1,1,1,3,2,0,0,1,1,1,0,1,1,1,1,0,3
3,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,3,1,3,1,2016,3,2,0,0,0,1,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,3,1,0,0,1,1,1,1,0,2,1,1,1,5,2,1,1,3,2,0,0,1,1,0,3,1,1,1,4,1,3
4,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,3,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,1
5,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,3,3,1,1,2016,2,2,2,0,3,5,1,1,1,1,1,1,0,0,1,0,4,1,1,3,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,0
6,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,0,4,3,3,1,1,2016,2,2,2,0,3,1,1,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,2,0,0,1,1,0,3,1,1,0,0,1,2
7,2,0,3,0,0,1,1,1,0,2,2,3,1,0,2,0,0,0,1,3,3,1,1,2016,3,9,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,5,1,1,1,2,2,1,0,1,1,0,3,2,1,1,3,0,0
8,2,0,3,0,0,2,0,1,0,2,2,1,1,0,1,0,0,0,1,3,1,1,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,0,1,1,3,8,2,4,3,3,1,0,0,1,1,1,0,0,0,1,1,2,7,1,1,1,0,2,1,0,1,1,0,3,2,1,1,3,1,0
9,3,0,0,0,0,1,1,1,0,1,2,2,1,0,2,0,0,2,1,3,3,1,1,2016,3,2,0,0,3,5,1,1,1,1,1,1,0,0,1,0,1,1,1,4,0,1,4,1,3,1,4,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,2,1,0,1,1,0,1,2,1,1,3,1,2


data_IVEware (219711, 79)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AGE,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,0,1,1,1,0,2,2,1,1,0,2,0,0,2,0,2,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,5,2,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,3,0,0,1,0,8,0,2,1,0,3,1,3
1,1,0,3,0,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,3,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,1,1,4,1,3,1,1,1,0,0,1,1,1,1,1,2,1,0,3,0,0,1,1,0,3,2,1,1,3,1,4
2,3,0,3,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,3,1,5,1,0,0,1,1,1,1,0,0,1,1,1,5,2,1,1,3,3,0,0,1,1,0,3,2,1,1,3,0,3
3,3,0,0,0,0,2,0,1,0,1,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,1,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,5,0,1,1,1,1,0,3
4,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,0,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,2,0,0,1,1,0,3,1,1,1,4,0,3
5,3,0,3,0,0,2,1,1,0,1,2,2,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,8,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,3,1,0,1,1,0,3,2,1,2,3,1,3
6,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,2,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,2,1,0,1,1,0,3,2,1,2,3,1,0
7,3,0,3,0,0,2,1,1,0,1,2,1,1,0,2,0,0,1,2,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,0,0,0,1,0,0,0,1,1,0,0,1,2
8,2,0,3,0,0,1,1,1,0,2,2,3,1,0,2,0,0,0,1,3,3,1,1,2016,3,9,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,5,1,1,1,2,2,1,0,1,1,0,3,2,1,1,3,0,1
9,2,0,3,0,0,0,0,1,0,2,1,3,1,0,1,0,0,0,2,3,1,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,0,1,0,4,8,2,4,3,3,1,0,0,1,1,1,0,0,0,1,1,2,7,1,1,1,0,2,1,0,1,1,0,3,2,1,1,3,1,2


In [12]:
Compare_Imputation_Methods_Part_2(
    data_Ground_Truth, data_NaN, data_RF, data_Mode, data_IVEware
)

Compare_Imputation_Methods_Part_2


ValueError: Can only compare identically-labeled Series objects
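
A likely cause, judging from the shapes printed above: data_Mode has 218121 rows while data_IVEware has 219711, so the IVEware file probably comes from a different run than the other frames. pandas refuses to compare two Series with `!=` or `==` when their index labels differ. The sketch below shows one way to guard the comparison; it is an assumption about what would help, not the code inside Compare_Imputation_Methods_Part_2. The helper name `count_mismatches` and the toy frames are hypothetical.

```python
import pandas as pd


def count_mismatches(truth: pd.DataFrame, imputed: pd.DataFrame) -> pd.Series:
    """Count, for each feature, the cells where `imputed` differs from `truth`.

    Hypothetical helper: rows are compared by position, so both frames must
    hold the same samples in the same order.
    """
    if truth.shape != imputed.shape:
        # Different shapes usually mean the frames come from different runs.
        raise ValueError(
            f"Shape mismatch: {truth.shape} vs {imputed.shape}; "
            "re-run the imputation on the same erased dataset."
        )
    # Drop the old index labels so both frames are labeled 0..n-1.
    truth = truth.reset_index(drop=True)
    imputed = imputed.reset_index(drop=True)
    # Put the columns in the same order (raises KeyError if one is missing).
    imputed = imputed[truth.columns]
    # Element-wise comparison, then count the True values in each column.
    return (truth != imputed).sum()


# Toy data: same samples, but the ground truth kept its original row labels,
# as happens after dropping rows with missing values.
truth = pd.DataFrame({"HOUR": [1, 3, 3], "SEX": [1, 0, 1]}, index=[10, 11, 12])
imputed = pd.DataFrame({"HOUR": [1, 2, 3], "SEX": [1, 0, 0]})

mismatches = count_mismatches(truth, imputed)
print(mismatches)
```

Comparing `truth["HOUR"] != imputed["HOUR"]` directly on the toy frames raises the same ValueError, because the index labels are 10..12 in one and 0..2 in the other. Resetting the index only fixes the labels; if the row counts differ, as they do here, the IVEware imputation has to be re-run on the same erased dataset before any comparison is meaningful.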

In [None]:
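def Main():
    # Load the discretized CRSS dataset.
    data = Get_Data()

    # Impute the missing values with the round-robin random forest method.
    # Impute_Full is the alternative, currently disabled.
    # data_Imputed = Impute_Full(data)
    data_Imputed = Impute_Round_Robin(data)

    # index=False keeps the row index out of the file, so reading it back
    # does not create an 'Unnamed: 0' column.
    data_Imputed.to_csv('../../Big_Files/CRSS_Imputed_All_12_22_22.csv', index=False)
    # display(data_Imputed.head(50))

    # Check the imputed data against the original.
    Check(data, data_Imputed)

    # Play a sound when the long-running imputation finishes.
    display(Audio(sound_file, autoplay=True))
    return 0

Main()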