# Methods

- We have the discretized CRSS dataset in '../../Big_Files/Discretized_All_12_22_22.csv'
- MissForest is a round-robin imputation method implemented in R, generally considered one of the best imputation methods.  It has several Python implementations.
- I tried to use MissForest, https://pypi.org/project/MissForest/, to impute missing values, but it gave me errors, and finding the source of the errors led me down the path to write my own round-robin implementation.
- I compare here three methods:
    - Round-Robin Random Forest (my own implementation of Round Robin, using scikit-learn's random forest)
    - Imputation by mode
    - IVEware, using the hyperparameters in the CRSS Imputation report
- To compare, I followed the example for MissForest.
    - I dropped all samples with a missing value, so I would have ground truth.
    - I erased 15% of the values in each sample.
    - I used each imputation method to impute the missing values, and, for each feature, counted how many did not match the ground truth.
- My round-robin method
    - In data_NaN, change all of the 'Unknown' to np.NaN.
    - In each feature, count the number of unknown samples.
    - In another copy, data_Mode, impute by mode in all of the features.
    - Starting with the feature with the least (nonzero) number of missing samples:
        - Copy that feature from data_NaN into data_Mode, so that only that feature has missing values.
        - Separate the dataframe into two, one with known values in the target variable (X) and one with unknown values (Z).
        - From the dataframe with known values (X), separate out the target variable (call it 'y')
        - Using Random Forest, build a model that maps X to y.  
        - Use the model to impute the missing values
    - At each iteration we replace the mode-imputed values with RF-imputed values.
- The IVEware implementation is available in several platforms, but Python is not one of them.  I run it in R outside this notebook.  Be aware that the random selection of values to erase is different for each run, so the IVEware imputation must be run anew.  

# Results of Comparison of Three Imputation Methods

- We ran the imputation on 78 features with 224,850 samples.  
    - The features are the features of the CRSS dataset that are have data for all of 2016 - 2020, are not the results of imputation by CRSS, may have a pattern (not random numbers like VIN numbers), and that do not have more than 20% of the samples missing.  
    - The features were discretized (binned) down to 2-10 categories before imputation.
    - The samples are those of the 619,027 that have no missing values in any of the 78 features.
- First Run
    - Percentage of Samples Incorrectly Imputed

| | Percentage of Samples Incorrectly Imputed |
| --- | --- |
| Random Forest | 22.25% |
| Mode Imputation | 28.51% |
| IVEware | 24.23% |

    - Comparison of number of errors in the 78 features:

|  | Fewer | Equal | More | Total |
| --- | --- | --- | --- | --- |
Compare RF to Mode |  45 | 33 | 0 | 78 |
Compare RF to IVEware | 50 | 0 | 28 | 78 |
Compare Mode to IVEware | 39 | 0 | 39 |  78 |


- Second Run
    - Percentage of Samples Incorrectly Imputed

| | Percentage of Samples Incorrectly Imputed |
| --- | --- |
| Random Forest | 22.17 % |
| Mode Imputation | 28.42% |
| IVEware |  23.84% |


    - Comparison of number of errors in the 78 features:

|  | Fewer | Equal | More |
| --- | --- | --- | --- |
| Compare RF to Mode | 46 | 31 | 1 |
| Compare RF to IVEware | 49 | 0 | 29 |
| Compare Mode to IVEware |  36 | 1 | 41 |

    - Number of NaN Imputed Differently by Different Methods

|  |  |
| --- | --- |
|Total Number of NaN|  2,443,202|
|RF Different from Mode|  273,351|
|RF Different from IVEware|  606,751|
|Mode Different from IVEware|  738,833|



## Discussion

- Random Forest is as good or better than Mode for (nearly) every feature.
- Random Forest is as good or better than IVEware on more than half of the features, but not overwhelmingly, and slightly better in the count of missing samples correctly imputed.
- IVEware and Mode are comparable in the number of features, but IVEware is much better in the count of missing samples correctly imputed.
- Random Forest and Mode make the same mistakes.  
- IVEware makes different mistakes from Random Forest and Mode.

## Conclusion

- Use Random Forest

In [15]:
%%latex
\tableofcontents

<IPython.core.display.Latex object>

# Setup
## Import Libraries

In [2]:
import sys, copy, math, time, os

print ('Python version: {}'.format(sys.version))

import numpy as np
print ('NumPy version: {}'.format(np.__version__))
np.set_printoptions(suppress=True)


import pandas as pd
print ('Pandas version:  {}'.format(pd.__version__))
pd.set_option('display.max_rows', 500)

!pip install scikit-learn

import sklearn
print ('SciKit-Learn version: {}'.format(sklearn.__version__))
from sklearn.model_selection import train_test_split

import sklearn.neighbors._base
sys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor

# Set Randomness.  Copied from https://www.kaggle.com/code/abazdyrev/keras-nn-focal-loss-experiments
import random
#np.random.seed(42) # NumPy
#random.seed(42) # Python
#tf.set_random_seed(42) # Tensorflow

from IPython.display import Audio
sound_file = './beep.wav'

import warnings
warnings.filterwarnings('ignore')


Python version: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:13) [Clang 14.0.6 ]
NumPy version: 1.24.0
Pandas version:  1.5.2
[33mDEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621[0m[33m
[33mDEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621[0m[33m
[0mSciKit-Learn version: 1.2.0


# Import Data

## Get Data
- The Get_Data_from_Original() reads the (original) CRSS files from the CRSS directory, preprocesses it, and writes it to files in a folder outside this GitHub repo (because the files are too large for my subscription), and returns the dataframes.
- The Get_Data_from_Temp_Files() reads the temp files and returns the dataframes.  I created this option for running repeatedly during writing and debugging, because it's much faster.

In [3]:
def Get_Data():
    print ('Get_Data')
    data = pd.read_csv('../../Big_Files/CRSS_Discretized_All_12_22_22.csv', low_memory=False)
    print ('data.shape = ', data.shape)
    print ('Drop Imputed Columns')
    for feature in data:
        if '_IM' in feature:
#            print (feature)
            data.drop(columns=feature, inplace=True)
    
    print ('data.shape = ', data.shape)
    print ()
    
    return data

In [4]:
#data = Get_Data()


In [5]:
def Impute_Round_Robin(data):
    print ('Impute()')
    pd.set_option('display.max_columns', None)
    
    # Replace 'Unknown' with np.NaN
    data.replace({'Unknown': np.nan}, inplace=True)
    display(data.head(20))
    print ()
    
#    data.sort_values(by = ['CASENUM', 'VEH_NO', 'PER_NO'], ascending = [True, True, True])
    
    # Make a list of features with missing samples, 
    #     ordered by the number of missing samples, 
    #     from least to most.  
    Missing = []
    Complete = []
    for feature in data:
        s = data[feature].isna().sum()
        if s==0:
            Complete.append([feature, s])
        if s>0:
            Missing.append([feature, s])
    Missing = sorted (Missing, key=lambda x:x[1], reverse=False)
    print ()
    print ('Complete[]')
    display(Complete)
    print ()
    print ('Missing[]')
    display(Missing)
    print ()
    
    print ('Make data_Mode')
    print ()
    data_Mode = pd.DataFrame()
    for X in Complete:
        feature = X[0]
        data_Mode[feature] = data[feature]
    for M in Missing:
        feature = M[0]
        m = data[feature].mode()[0]
        print (feature, M[1], m)
        data_Mode[feature] = data[feature].fillna(m)
    print ('data_Mode')
    display(data_Mode.head(20))
#    data.sort_values(
#        by = ['CASENUM', 'VEH_NO', 'PER_NO'], 
#        ascending = [True, True, True], 
#        inplace=True
#    )
#    print ()
#    print ('data.PER_NO.equals(data__Mode.PER_NO)')
#    print (data.PER_NO.equals(data_Mode.PER_NO))
#    print ()
#    
    print ()
    print ('Make starting point for data_Imputed')
    data_Imputed = pd.DataFrame()
    for X in Complete:
        feature = X[0]
        data_Imputed[feature] = data[feature]
    for X in Missing:
        feature = X[0]
        data_Imputed[feature] = data_Mode[feature]
    print ('data_Imputed')
    display(data_Imputed.head(20))
    print ()
#    data_Imputed.sort_values(
#        by = ['CASENUM', 'VEH_NO', 'PER_NO'], 
#        ascending = [True, True, True], 
#        inplace=True
#    )
#    print ()
#    print ('data.PER_NO.equals(data_Imputed.PER_NO)')
#    print (data.PER_NO.equals(data_Imputed.PER_NO))
#    print ()
    
    print ('Start Loop')
    print ()
    n = 0
    for M in Missing:
        n += 1
        print (M)
        feature = M[0]
        data_Imputed[feature] = data[feature]
#        print ()
#        print ('data[feature].isna().sum()')
#        print (data[feature].isna().sum())
#        print ('data_Imputed[feature].isna().sum()')
#        print (data_Imputed[feature].isna().sum())
#        print ()
        W = data_Imputed.dropna(subset=[feature])
        X = data_Imputed.dropna(subset=[feature])
        y = X[feature]
        X.drop(columns=feature, inplace=True)
        Z = data_Imputed[data_Imputed[feature].isna()]
        Z.drop(columns=feature, inplace=True)
#        Z.reset_index(drop=True, inplace=True)
#        print (data.shape)
#        print (X.shape)
#        display(X.head(40))
#        display(y.head(40))
#        print (Z.shape)
#        display(Z)
        clf = RandomForestClassifier(max_depth=2, random_state=0)
        clf.fit(X,y)
#        print ('clf.predict(Z)')
        z = clf.predict(Z)
        print (len(z))
        display(z)
        Z[feature] = z
#        display(Z)
        data_Imputed = pd.concat([Z, W])
#        display(data_Imputed.head(60))
        print (data_Imputed.shape)
        print ()
#        data_Imputed.sort_values(
#            by = ['CASENUM', 'VEH_NO', 'PER_NO'], 
#            ascending = [True, True, True], 
#            inplace=True
#        )
#        print ()
#        print ('data.PER_NO.equals(data_Imputed.PER_NO)')
#        print (data.PER_NO.equals(data_Imputed.PER_NO))
#        print ()
               
        Check_Feature(data, data_Imputed, feature)
#        if n==10:
#            return data_Imputed
    
    
    
    
    print ()
    return data_Imputed

In [6]:
def Impute_Full(data):
    print ('Impute()')
    data.replace({'Unknown': np.nan}, inplace=True)
    for feature in data:
        print (feature, len(pd.unique(data[feature])))
    print ()
    mf = MissForest()
    data = mf.fit_transform(data)
    return data

In [7]:
def Check(data, data_Imputed):
    Features = data.columns
    print (Features)
    for feature in Features:
        U = pd.unique(data[feature]).tolist()
        print (U)
        A = []
        for u in U:
            a = len(data[data[feature]==u])
            b = len(data_Imputed[data_Imputed[feature]==u])
            A.append([u, a, b])
        display(A)
        print ()


In [8]:
def Check_Feature(data, data_Imputed, feature):
    U = pd.unique(data[feature]).tolist()
    U = [x for x in U if x == x]
    print (U)
    A = []
    for u in U:
        a = len(data[data[feature]==u])
        b = len(data_Imputed[data_Imputed[feature]==u])
        A.append([u, a, b, b-a])
    a = data[feature].isna().sum()
    b = data_Imputed[feature].isna().sum()
    A.append(['NaN', a, b, 0])
    A = pd.DataFrame(A, columns=['Value', 'Original', 'Imputed', 'Difference'])
    display(A)
    print ()


# Test_Accuracy

In [13]:
def Compare_Imputation_Methods_Part_1():
    print ()
    print ('Compare_Imputation_Methods_Part_1()')
    data = Get_Data()
    data.drop(columns=['CASENUM', 'VEH_NO', 'PER_NO'], inplace=True)
    print (data.shape)

    # Drop all samples with missing data, so we have ground truth
    data.replace({'Unknown':np.nan}, inplace=True)
    data.dropna(inplace=True)
    data.reset_index(inplace=True, drop=True)
    for feature in data:
        data[feature] = pd.to_numeric(data[feature])
    data.astype('int64')

    data_Ground_Truth = data.copy(deep=True)
    for feature in data_Ground_Truth:
        data_Ground_Truth[feature] = pd.to_numeric(data_Ground_Truth[feature])
    data_Ground_Truth = data_Ground_Truth.astype('int64')
    print ('data_Ground_Truth.shape')
    print (data_Ground_Truth.shape)
    display(data_Ground_Truth.head())

    # Randomly pick 15% of the values from each row
    # and set them to be missing
    print ('Remove 15% of values from each row')
    frac = .15
    N = data.shape[0] * frac # Number of NaN in each feature
    for c in data.columns:
        idx = np.random.choice(a=data.index, size=int(len(data) * frac))
        data.loc[idx, c] = np.nan
    data_NaN = data.copy(deep=True)
    print ('data_NaN.shape')
    print (data_NaN.shape)
    display(data_NaN.head())

    data_IVEware = data.fillna('')
    data_IVEware.to_csv('../../Big_Files/data_IVEware.txt', sep='\t', index=False)
    
    data_Mode = pd.DataFrame()
    for feature in data:
        data_Mode[feature] = data[feature].fillna(data[feature].mode()[0])
    data_Mode = data_Mode.astype('int64')
    print ('data_Mode.shape')
    print (data_Mode.shape)
    display(data_Mode.head())
    
    data_RF = Impute_Round_Robin(data)
    data_RF.sort_index(inplace=True)
    data_RF = data_RF[data.columns]  
    data_RF = data_RF.astype('int64')
    
    print ('data_RF.shape')
    print (data_RF.shape)
    display(data_RF.head())
#    print ()

    return data_Ground_Truth, data_NaN, data_RF, data_Mode

def Compare_Imputation_Methods_Part_2(
    data_Ground_Truth, data_NaN, data_RF, data_Mode, data_IVEware
):
    print ('Compare_Imputation_Methods_Part_2')
    A = []
    for feature in data_NaN:
        N = data_NaN[feature].isna().sum()
#        print (feature, N)
#        print ()
        D = data_Ground_Truth[feature] != data_RF[feature]
        d = D.sum()
        E = data_Ground_Truth[feature] != data_Mode[feature]
        e = E.sum()
        F = data_Ground_Truth[feature] != data_IVEware[feature]
        f = F.sum()
        G = data_RF[feature] != data_Mode[feature]
        g = G.sum()
        H = data_RF[feature] != data_IVEware[feature]
        h = H.sum()
        I = data_Mode[feature] != data_IVEware[feature]
        i = I.sum()
        print (feature, N, d, e, f, g, h, i)
        print (
            feature, 
            data_Ground_Truth.dtypes[feature],
            data_NaN.dtypes[feature],
            data_RF.dtypes[feature],
            data_Mode.dtypes[feature],
            data_IVEware.dtypes[feature],
        )
        A.append([
            feature, N, 
            d, int(d/N*100), 
            e, int(e/N*100), 
            f, int(f/N*100),
            g, int(g/N*100),
            h, int(h/N*100),
            i, int(i/N*100),
        ])
#        print (D[:10])
        print ()
    print ()
    
    A = sorted(A, key=lambda x:x[3])
    B = pd.DataFrame(
        A, 
        columns=[
            'Feature', 'nNaN', 
            'nRF Incorrect', 'pRF Incorrect', 
            'nMode Incorrect', 'pMode Incorrect', 
            'nIVEware Incorrect', 'pIVEware Incorrect',
            'RF and Mode Different', 'RF v/s Mode %',
            'RF and IVEware Different', 'RF v/s IVEware %',
            'Mode and IVEware Different', 'Mode v/s IVEware %'
        ]
    )
    display(B)
    a = sum([x[1] for x in A])
    b = sum([x[2] for x in A])
    c = sum([x[4] for x in A])
    d = sum([x[6] for x in A])
    e = round(b/a*100,2)
    f = round(c/a*100,2)
    g = round(d/a*100,2)
    s = len(A) - sum([x[8] for x in A])
    t = len(A) - sum([x[9] for x in A])
    u = len(A) - sum([x[10] for x in A])

    RF_less_Mode = sum([x[2] < x[4] for x in A])
    RF_equal_Mode = sum([x[2] == x[4] for x in A])
    RF_greater_Mode = sum([x[2] > x[4] for x in A])

    RF_less_IVEware = sum([x[2] < x[6] for x in A])
    RF_equal_IVEware = sum([x[2] == x[6] for x in A])
    RF_greater_IVEware = sum([x[2] > x[6] for x in A])

    Mode_less_IVEware = sum([x[4] < x[6] for x in A])
    Mode_equal_IVEware = sum([x[4] == x[6] for x in A])
    Mode_greater_IVEware = sum([x[4] > x[6] for x in A])

    print ()
    print ('Error RF = ', e)
    print ('Error Mode = ', f)
    print ('Error IVEware = ', g)
    print ('nRF > nMode: ', s)
    print ('nRF > nIVEware: ', t)
    print ('nModel > nIVEware: ', u)
    print ('Compare RF to Mode: ', RF_less_Mode, RF_equal_Mode, RF_greater_Mode)
    print ('Compare RF to IVEware: ', RF_less_IVEware, RF_equal_IVEware, RF_greater_IVEware)
    print ('Compare Mode to IVEware: ', Mode_less_IVEware, Mode_equal_IVEware, Mode_greater_IVEware)
    print ()
    print ('Number of NaN in data_NaN: ', data_NaN.isna().sum().sum())
    print ('RF Different from Mode: ', sum([x[8] for x in A]))
    print ('RF Different from IVEware: ', sum([x[10] for x in A]))
    print ('Mode Different from IVEware: ', sum([x[12] for x in A]))
        
    display(Audio(sound_file, autoplay=True))
    
    
        

In [10]:
data_Ground_Truth, data_NaN, data_RF, data_Mode = Compare_Imputation_Methods_Part_1()


Compare_Imputation_Methods_Part_1()
Get_Data
data.shape =  (619027, 105)
Drop Imputed Columns
data.shape =  (619027, 81)

(619027, 78)
data_Ground_Truth.shape
(224850, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,...,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,0,1,1,1,0,2,...,1,0,8,0,2,1,0,3,1,0
1,1,0,3,0,0,2,1,0,0,2,...,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,0,2,0,1,0,2,...,1,1,0,3,2,1,1,3,0,3
3,3,0,2,0,0,2,0,1,0,2,...,1,1,5,0,1,1,1,1,0,3
4,3,0,2,0,0,2,0,1,0,2,...,1,1,0,3,1,1,1,4,1,3


Remove 15% of values from each row
data_NaN.shape
(224850, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,...,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3.0,0.0,3.0,0.0,0.0,,1.0,1.0,0.0,2.0,...,,0.0,,,2.0,1.0,0.0,3.0,1.0,0.0
1,1.0,0.0,3.0,0.0,0.0,2.0,1.0,0.0,0.0,2.0,...,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,1.0,4.0
2,3.0,0.0,2.0,,0.0,2.0,0.0,1.0,0.0,2.0,...,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,0.0,3.0
3,3.0,0.0,2.0,0.0,0.0,2.0,0.0,,0.0,2.0,...,1.0,1.0,5.0,0.0,1.0,1.0,1.0,,0.0,
4,3.0,0.0,2.0,0.0,0.0,2.0,0.0,,0.0,2.0,...,1.0,,,3.0,1.0,1.0,1.0,4.0,1.0,3.0


data_Mode.shape
(224850, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,...,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,0,2,1,1,0,2,...,1,0,0,3,2,1,0,3,1,0
1,1,0,3,0,0,2,1,0,0,2,...,1,1,0,3,2,1,1,3,1,4
2,3,0,2,2,0,2,0,1,0,2,...,1,1,0,3,2,1,1,3,0,3
3,3,0,2,0,0,2,0,1,0,2,...,1,1,5,0,1,1,1,3,0,0
4,3,0,2,0,0,2,0,1,0,2,...,1,1,0,3,1,1,1,4,1,3


Impute()


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3.0,0.0,3.0,0.0,0.0,,1.0,1.0,0.0,2.0,2.0,1.0,1.0,0.0,2.0,0.0,,1.0,0.0,2.0,4.0,1.0,1.0,2016.0,4.0,2.0,3.0,0.0,4.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,,0.0,,0.0,5.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,3.0,,,1.0,2.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,2.0,0.0,0.0,,0.0,,,2.0,1.0,0.0,3.0,1.0,0.0
1,1.0,0.0,3.0,0.0,0.0,2.0,1.0,0.0,0.0,2.0,2.0,1.0,1.0,,,0.0,0.0,2.0,1.0,3.0,2.0,1.0,0.0,2016.0,3.0,,1.0,0.0,,5.0,1.0,0.0,1.0,1.0,1.0,1.0,,0.0,1.0,0.0,,1.0,1.0,4.0,6.0,2.0,4.0,3.0,3.0,1.0,,1.0,3.0,1.0,1.0,,,,1.0,,1.0,1.0,,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,1.0,4.0
2,3.0,0.0,2.0,,0.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,,0.0,0.0,8.0,0.0,2.0,1.0,5.0,1.0,0.0,,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,2.0,1.0,,3.0,0.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,0.0,3.0
3,3.0,0.0,2.0,0.0,0.0,2.0,0.0,,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,8.0,0.0,,1.0,5.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,,1.0,1.0,5.0,2.0,1.0,,3.0,0.0,0.0,1.0,1.0,5.0,0.0,1.0,1.0,1.0,,0.0,
4,3.0,0.0,2.0,0.0,0.0,2.0,0.0,,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,1.0,1.0,1.0,1.0,,0.0,1.0,0.0,,1.0,,0.0,,0.0,,1.0,5.0,1.0,0.0,0.0,1.0,1.0,,0.0,0.0,0.0,1.0,1.0,1.0,5.0,2.0,1.0,1.0,3.0,0.0,,1.0,,,3.0,1.0,1.0,1.0,4.0,1.0,3.0
5,3.0,0.0,3.0,0.0,0.0,,1.0,1.0,,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,,4.0,2.0,3.0,1.0,1.0,2016.0,2.0,2.0,2.0,0.0,4.0,5.0,1.0,,1.0,1.0,,,0.0,0.0,1.0,0.0,1.0,1.0,1.0,4.0,6.0,2.0,4.0,3.0,3.0,1.0,5.0,1.0,3.0,1.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,,1.0,0.0,,0.0,1.0,1.0,0.0,3.0,2.0,1.0,2.0,3.0,1.0,1.0
6,3.0,0.0,3.0,0.0,0.0,,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,,2.0,3.0,1.0,1.0,2016.0,2.0,,2.0,,3.0,5.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,4.0,1.0,1.0,4.0,8.0,0.0,4.0,1.0,1.0,4.0,3.0,1.0,3.0,1.0,2.0,1.0,,2.0,1.0,1.0,1.0,,2.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,2.0,3.0,1.0,2.0
7,3.0,0.0,3.0,0.0,,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,4.0,2.0,3.0,,1.0,2016.0,2.0,2.0,2.0,0.0,3.0,5.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,4.0,1.0,1.0,,8.0,0.0,4.0,,1.0,4.0,3.0,1.0,3.0,1.0,2.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,,0.0,1.0,2.0
8,2.0,0.0,,,0.0,1.0,1.0,1.0,0.0,2.0,2.0,3.0,1.0,0.0,2.0,,0.0,0.0,1.0,3.0,,1.0,,2016.0,3.0,9.0,0.0,0.0,3.0,1.0,1.0,,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,4.0,1.0,1.0,2.0,0.0,,1.0,3.0,3.0,4.0,3.0,,3.0,1.0,1.0,1.0,0.0,2.0,,1.0,1.0,,1.0,1.0,1.0,2.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,,1.0,3.0,0.0,0.0
9,2.0,,3.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,,3.0,1.0,,1.0,0.0,0.0,0.0,1.0,3.0,,3.0,1.0,2016.0,3.0,2.0,0.0,0.0,0.0,5.0,1.0,0.0,1.0,1.0,,1.0,,0.0,1.0,0.0,,1.0,0.0,,8.0,2.0,4.0,3.0,3.0,,0.0,0.0,1.0,1.0,1.0,0.0,0.0,,,,2.0,7.0,1.0,1.0,1.0,0.0,,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,,2.0




Complete[]


[]


Missing[]


[['VTCONT_F', 31220],
 ['NUM_INJ', 31227],
 ['URBANICITY', 31232],
 ['VTRAFCON', 31242],
 ['ACC_TYPE', 31253],
 ['MONTH', 31255],
 ['YEAR', 31258],
 ['VE_TOTAL', 31260],
 ['RELJCT1', 31265],
 ['MAX_SEV', 31266],
 ['HARM_EV', 31270],
 ['MODEL', 31273],
 ['INJ_SEV', 31279],
 ['HAZ_CNO', 31282],
 ['PERNOTMVIT', 31284],
 ['MAKE', 31287],
 ['TOW_VEH', 31288],
 ['BODY_TYP', 31289],
 ['VPROFILE', 31291],
 ['LGT_COND', 31292],
 ['VSURCOND', 31297],
 ['P_CRASH1', 31300],
 ['CARGO_BT', 31301],
 ['PSU', 31302],
 ['REL_ROAD', 31306],
 ['VE_FORMS', 31306],
 ['FIRE_EXP', 31309],
 ['DAY_WEEK', 31313],
 ['VSPD_LIM', 31313],
 ['RELJCT2', 31314],
 ['J_KNIFE', 31315],
 ['WRK_ZONE', 31318],
 ['DR_PRES', 31319],
 ['REST_MIS', 31319],
 ['PCRASH4', 31320],
 ['IMPACT1', 31321],
 ['HAZ_INV', 31322],
 ['VEH_ALCH', 31323],
 ['SEAT_POS', 31323],
 ['HOUR', 31324],
 ['ALC_STATUS', 31327],
 ['HAZ_REL', 31329],
 ['SCH_BUS', 31330],
 ['EJECTION', 31330],
 ['PVH_INVL', 31331],
 ['ROLLOVER', 31334],
 ['INT_HWY', 31335],


Make data_Mode

VTCONT_F 31220 1.0
NUM_INJ 31227 0.0
URBANICITY 31232 1.0
VTRAFCON 31242 1.0
ACC_TYPE 31253 3.0
MONTH 31255 2.0
YEAR 31258 2017.0
VE_TOTAL 31260 2.0
RELJCT1 31265 0.0
MAX_SEV 31266 0.0
HARM_EV 31270 1.0
MODEL 31273 1.0
INJ_SEV 31279 3.0
HAZ_CNO 31282 1.0
PERNOTMVIT 31284 0.0
MAKE 31287 0.0
TOW_VEH 31288 0.0
BODY_TYP 31289 1.0
VPROFILE 31291 1.0
LGT_COND 31292 3.0
VSURCOND 31297 1.0
P_CRASH1 31300 1.0
CARGO_BT 31301 0.0
PSU 31302 3.0
REL_ROAD 31306 1.0
VE_FORMS 31306 2.0
FIRE_EXP 31309 1.0
DAY_WEEK 31313 1.0
VSPD_LIM 31313 2.0
RELJCT2 31314 1.0
J_KNIFE 31315 1.0
WRK_ZONE 31318 0.0
DR_PRES 31319 1.0
REST_MIS 31319 1.0
PCRASH4 31320 1.0
IMPACT1 31321 1.0
HAZ_INV 31322 0.0
VEH_ALCH 31323 1.0
SEAT_POS 31323 3.0
HOUR 31324 3.0
ALC_STATUS 31327 1.0
HAZ_REL 31329 1.0
SCH_BUS 31330 0.0
EJECTION 31330 1.0
PVH_INVL 31331 0.0
ROLLOVER 31334 1.0
INT_HWY 31335 0.0
TYP_INT 31335 1.0
ALC_RES 31339 0.0
SEX 31340 1.0
PERMVIT 31342 2.0
ALCOHOL 31344 2.0
VALIGN 31347 1.0
HAZ_PLAC 31349 0.

Unnamed: 0,VTCONT_F,NUM_INJ,URBANICITY,VTRAFCON,ACC_TYPE,MONTH,YEAR,VE_TOTAL,RELJCT1,MAX_SEV,HARM_EV,MODEL,INJ_SEV,HAZ_CNO,PERNOTMVIT,MAKE,TOW_VEH,BODY_TYP,VPROFILE,LGT_COND,VSURCOND,P_CRASH1,CARGO_BT,PSU,REL_ROAD,VE_FORMS,FIRE_EXP,DAY_WEEK,VSPD_LIM,RELJCT2,J_KNIFE,WRK_ZONE,DR_PRES,REST_MIS,PCRASH4,IMPACT1,HAZ_INV,VEH_ALCH,SEAT_POS,HOUR,ALC_STATUS,HAZ_REL,SCH_BUS,EJECTION,PVH_INVL,ROLLOVER,INT_HWY,TYP_INT,ALC_RES,SEX,PERMVIT,ALCOHOL,VALIGN,HAZ_PLAC,VEH_AGE,M_HARM,MAK_MOD,REGION,BUS_USE,HOSPITAL,REST_USE,MAX_VSEV,SPEC_USE,PER_TYP,PJ,HIT_RUN,NUMOCCS,P_CRASH2,WEATHER,MAN_COLL,VTRAFWAY,EMER_USE,NUM_INJV,SPEEDREL,TOWED,PCRASH5,AIR_BAG,PEDS
0,1.0,1.0,2.0,1.0,4.0,0.0,2016.0,2.0,0.0,3.0,1.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,1.0,3.0,1.0,1.0,0.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,5.0,0.0,1.0,3.0,3.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,0.0,1.0,0.0,4.0,1.0,0.0,0.0,0.0,1.0,2.0,2.0,0.0,3.0,3.0,1.0,4.0,2.0,1.0,1.0,1.0,0.0,2.0,0.0,0.0
1,1.0,2.0,2.0,1.0,3.0,0.0,2016.0,2.0,0.0,1.0,1.0,4.0,3.0,1.0,0.0,6.0,0.0,5.0,1.0,3.0,1.0,1.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,3.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,2.0,2.0,1.0,0.0,4.0,1.0,4.0,3.0,1.0,0.0,1.0,2.0,1.0,2.0,3.0,0.0,3.0,3.0,1.0,2.0,0.0,1.0,3.0,1.0,2.0,3.0,0.0,0.0
2,1.0,1.0,2.0,1.0,0.0,2.0,2016.0,1.0,0.0,2.0,3.0,2.0,3.0,1.0,0.0,8.0,0.0,3.0,1.0,2.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,3.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,2.0,2.0,1.0,0.0,3.0,0.0,0.0,3.0,1.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,5.0,0.0,2.0,1.0,3.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0
3,1.0,1.0,2.0,1.0,0.0,0.0,2016.0,1.0,0.0,2.0,3.0,1.0,0.0,1.0,0.0,8.0,0.0,3.0,1.0,2.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,3.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,3.0,1.0,5.0,1.0,0.0,1.0,1.0,1.0,0.0,5.0,0.0,2.0,1.0,3.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0
4,1.0,1.0,2.0,1.0,0.0,0.0,2016.0,1.0,0.0,2.0,3.0,1.0,3.0,1.0,0.0,0.0,0.0,3.0,1.0,2.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,4.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,3.0,1.0,0.0,3.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,5.0,0.0,2.0,1.0,3.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0
5,1.0,0.0,1.0,1.0,4.0,0.0,2016.0,2.0,0.0,2.0,1.0,4.0,3.0,1.0,0.0,6.0,0.0,5.0,1.0,3.0,2.0,1.0,0.0,4.0,1.0,2.0,1.0,1.0,4.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,3.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,1.0,1.0,4.0,2.0,1.0,0.0,2.0,2.0,1.0,2.0,2.0,0.0,3.0,5.0,1.0,3.0,0.0,1.0,3.0,1.0,2.0,3.0,1.0,0.0
6,1.0,1.0,1.0,1.0,3.0,0.0,2016.0,2.0,0.0,2.0,1.0,4.0,3.0,1.0,0.0,8.0,0.0,5.0,1.0,3.0,2.0,4.0,1.0,3.0,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,0.0,1.0,3.0,3.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,2.0,1.0,4.0,2.0,2.0,0.0,2.0,0.0,2.0,2.0,2.0,0.0,1.0,3.0,1.0,3.0,0.0,1.0,1.0,1.0,2.0,3.0,1.0,0.0
7,1.0,1.0,1.0,1.0,3.0,0.0,2016.0,2.0,0.0,2.0,1.0,4.0,0.0,1.0,0.0,8.0,0.0,5.0,1.0,3.0,2.0,4.0,1.0,4.0,1.0,2.0,1.0,1.0,4.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,0.0,1.0,0.0,3.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,2.0,1.0,3.0,2.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,0.0,1.0,3.0,1.0,3.0,0.0,1.0,3.0,1.0,2.0,3.0,0.0,0.0
8,1.0,0.0,2.0,1.0,3.0,2.0,2016.0,2.0,0.0,0.0,1.0,1.0,3.0,1.0,0.0,0.0,0.0,1.0,1.0,3.0,1.0,4.0,0.0,1.0,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,0.0,1.0,3.0,2.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,9.0,1.0,0.0,0.0,1.0,2.0,3.0,1.0,0.0,1.0,2.0,1.0,2.0,3.0,0.0,3.0,3.0,3.0,3.0,2.0,1.0,3.0,1.0,2.0,3.0,1.0,0.0
9,1.0,0.0,2.0,1.0,0.0,0.0,2016.0,2.0,0.0,0.0,3.0,4.0,3.0,1.0,0.0,8.0,0.0,5.0,2.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,7.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,3.0,2.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,2.0,0.0,3.0,3.0,1.0,0.0,1.0,2.0,1.0,2.0,3.0,0.0,3.0,0.0,3.0,3.0,0.0,1.0,3.0,0.0,2.0,1.0,1.0,0.0



Make starting point for data_Imputed
data_Imputed


Unnamed: 0,VTCONT_F,NUM_INJ,URBANICITY,VTRAFCON,ACC_TYPE,MONTH,YEAR,VE_TOTAL,RELJCT1,MAX_SEV,HARM_EV,MODEL,INJ_SEV,HAZ_CNO,PERNOTMVIT,MAKE,TOW_VEH,BODY_TYP,VPROFILE,LGT_COND,VSURCOND,P_CRASH1,CARGO_BT,PSU,REL_ROAD,VE_FORMS,FIRE_EXP,DAY_WEEK,VSPD_LIM,RELJCT2,J_KNIFE,WRK_ZONE,DR_PRES,REST_MIS,PCRASH4,IMPACT1,HAZ_INV,VEH_ALCH,SEAT_POS,HOUR,ALC_STATUS,HAZ_REL,SCH_BUS,EJECTION,PVH_INVL,ROLLOVER,INT_HWY,TYP_INT,ALC_RES,SEX,PERMVIT,ALCOHOL,VALIGN,HAZ_PLAC,VEH_AGE,M_HARM,MAK_MOD,REGION,BUS_USE,HOSPITAL,REST_USE,MAX_VSEV,SPEC_USE,PER_TYP,PJ,HIT_RUN,NUMOCCS,P_CRASH2,WEATHER,MAN_COLL,VTRAFWAY,EMER_USE,NUM_INJV,SPEEDREL,TOWED,PCRASH5,AIR_BAG,PEDS
0,1.0,1.0,2.0,1.0,4.0,0.0,2016.0,2.0,0.0,3.0,1.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,1.0,3.0,1.0,1.0,0.0,0.0,1.0,2.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,5.0,0.0,1.0,3.0,3.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,0.0,1.0,0.0,4.0,1.0,0.0,0.0,0.0,1.0,2.0,2.0,0.0,3.0,3.0,1.0,4.0,2.0,1.0,1.0,1.0,0.0,2.0,0.0,0.0
1,1.0,2.0,2.0,1.0,3.0,0.0,2016.0,2.0,0.0,1.0,1.0,4.0,3.0,1.0,0.0,6.0,0.0,5.0,1.0,3.0,1.0,1.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,3.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,2.0,2.0,1.0,0.0,4.0,1.0,4.0,3.0,1.0,0.0,1.0,2.0,1.0,2.0,3.0,0.0,3.0,3.0,1.0,2.0,0.0,1.0,3.0,1.0,2.0,3.0,0.0,0.0
2,1.0,1.0,2.0,1.0,0.0,2.0,2016.0,1.0,0.0,2.0,3.0,2.0,3.0,1.0,0.0,8.0,0.0,3.0,1.0,2.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,3.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,2.0,2.0,1.0,0.0,3.0,0.0,0.0,3.0,1.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,5.0,0.0,2.0,1.0,3.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0
3,1.0,1.0,2.0,1.0,0.0,0.0,2016.0,1.0,0.0,2.0,3.0,1.0,0.0,1.0,0.0,8.0,0.0,3.0,1.0,2.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,3.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,2.0,2.0,1.0,0.0,0.0,0.0,0.0,3.0,1.0,5.0,1.0,0.0,1.0,1.0,1.0,0.0,5.0,0.0,2.0,1.0,3.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0
4,1.0,1.0,2.0,1.0,0.0,0.0,2016.0,1.0,0.0,2.0,3.0,1.0,3.0,1.0,0.0,0.0,0.0,3.0,1.0,2.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,5.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,4.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,3.0,1.0,0.0,3.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,5.0,0.0,2.0,1.0,3.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0
5,1.0,0.0,1.0,1.0,4.0,0.0,2016.0,2.0,0.0,2.0,1.0,4.0,3.0,1.0,0.0,6.0,0.0,5.0,1.0,3.0,2.0,1.0,0.0,4.0,1.0,2.0,1.0,1.0,4.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,3.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,1.0,1.0,4.0,2.0,1.0,0.0,2.0,2.0,1.0,2.0,2.0,0.0,3.0,5.0,1.0,3.0,0.0,1.0,3.0,1.0,2.0,3.0,1.0,0.0
6,1.0,1.0,1.0,1.0,3.0,0.0,2016.0,2.0,0.0,2.0,1.0,4.0,3.0,1.0,0.0,8.0,0.0,5.0,1.0,3.0,2.0,4.0,1.0,3.0,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,0.0,1.0,3.0,3.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,2.0,1.0,4.0,2.0,2.0,0.0,2.0,0.0,2.0,2.0,2.0,0.0,1.0,3.0,1.0,3.0,0.0,1.0,1.0,1.0,2.0,3.0,1.0,0.0
7,1.0,1.0,1.0,1.0,3.0,0.0,2016.0,2.0,0.0,2.0,1.0,4.0,0.0,1.0,0.0,8.0,0.0,5.0,1.0,3.0,2.0,4.0,1.0,4.0,1.0,2.0,1.0,1.0,4.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,0.0,1.0,0.0,3.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,2.0,2.0,1.0,0.0,2.0,1.0,3.0,2.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,0.0,1.0,3.0,1.0,3.0,0.0,1.0,3.0,1.0,2.0,3.0,0.0,0.0
8,1.0,0.0,2.0,1.0,3.0,2.0,2016.0,2.0,0.0,0.0,1.0,1.0,3.0,1.0,0.0,0.0,0.0,1.0,1.0,3.0,1.0,4.0,0.0,1.0,1.0,2.0,1.0,1.0,2.0,1.0,1.0,0.0,1.0,1.0,1.0,4.0,0.0,1.0,3.0,2.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,9.0,1.0,0.0,0.0,1.0,2.0,3.0,1.0,0.0,1.0,2.0,1.0,2.0,3.0,0.0,3.0,3.0,3.0,3.0,2.0,1.0,3.0,1.0,2.0,3.0,1.0,0.0
9,1.0,0.0,2.0,1.0,0.0,0.0,2016.0,2.0,0.0,0.0,3.0,4.0,3.0,1.0,0.0,8.0,0.0,5.0,2.0,3.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,7.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,3.0,2.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,2.0,0.0,3.0,3.0,1.0,0.0,1.0,2.0,1.0,2.0,3.0,0.0,3.0,0.0,3.0,3.0,0.0,1.0,3.0,0.0,2.0,1.0,1.0,0.0



Start Loop

['VTCONT_F', 31220]
31220


array([1., 1., 1., ..., 1., 1., 3.])

(224850, 78)

[1.0, 3.0, 0.0, 4.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,126953,151723,24770
1,3.0,66387,72837,6450
2,0.0,183,183,0
3,4.0,100,100,0
4,2.0,7,7,0
5,,31220,0,0



['NUM_INJ', 31227]
31227


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[1.0, 2.0, 0.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,57339,67620,10281
1,2.0,25145,25145,0
2,0.0,89870,110816,20946
3,3.0,11180,11180,0
4,4.0,10089,10089,0
5,,31227,0,0



['URBANICITY', 31232]
31232


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,49283,49290,7
1,1.0,144335,175560,31225
2,,31232,0,0



['VTRAFCON', 31242]
31242


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 3.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,126960,156189,29229
1,2.0,41929,43942,2013
2,3.0,19344,19344,0
3,0.0,5375,5375,0
4,,31242,0,0



['ACC_TYPE', 31253]
31253


array([4., 0., 2., ..., 3., 2., 3.])

(224850, 78)

[4.0, 0.0, 3.0, 2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,39580,44341,4761
1,0.0,35316,40711,5395
2,3.0,45371,59813,14442
3,2.0,44490,47069,2579
4,1.0,28840,32916,4076
5,,31253,0,0



['MONTH', 31255]
31255


array([2., 1., 2., ..., 2., 2., 2.])

(224850, 78)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,61540,67057,5517
1,1.0,64914,66028,1114
2,2.0,67141,91765,24624
3,,31255,0,0



['YEAR', 31258]
31258


array([2016., 2017., 2016., ..., 2016., 2016., 2016.])

(224850, 78)

[2016.0, 2017.0, 2018.0, 2019.0, 2020.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2016.0,46524,59377,12853
1,2017.0,47249,65043,17794
2,2018.0,38546,38648,102
3,2019.0,29520,29605,85
4,2020.0,31753,32177,424
5,,31258,0,0



['VE_TOTAL', 31260]
31260


array([2., 2., 2., ..., 2., 2., 2.])

(224850, 78)

[2.0, 1.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,140386,169544,29158
1,1.0,22665,24767,2102
2,3.0,22814,22814,0
3,4.0,7725,7725,0
4,,31260,0,0



['RELJCT1', 31265]
31265


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0, 8.0, 9.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,154191,185456,31265
1,1.0,6482,6482,0
2,8.0,32876,32876,0
3,9.0,36,36,0
4,,31265,0,0



['MAX_SEV', 31266]
31266


array([0., 0., 1., ..., 0., 0., 1.])

(224850, 78)

[3.0, 1.0, 2.0, 0.0, 4.0, 5.0, 6.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,20339,20340,1
1,1.0,49002,58156,9154
2,2.0,30106,30742,636
3,0.0,89896,111371,21475
4,4.0,3361,3361,0
5,5.0,876,876,0
6,6.0,4,4,0
7,,31266,0,0



['HARM_EV', 31270]
31270


array([1., 1., 1., ..., 1., 1., 3.])

(224850, 78)

[1.0, 3.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,166676,195468,28792
1,3.0,13484,15354,1870
2,2.0,10510,11118,608
3,0.0,2910,2910,0
4,,31270,0,0



['MODEL', 31273]
31273


array([1., 0., 2., ..., 4., 4., 4.])

(224850, 78)

[0.0, 4.0, 2.0, 1.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,34641,35539,898
1,4.0,39298,47702,8404
2,2.0,39610,42598,2988
3,1.0,41980,55939,13959
4,3.0,38048,43072,5024
5,,31273,0,0



['INJ_SEV', 31279]
31279


array([3., 3., 3., ..., 3., 3., 3.])

(224850, 78)

[3.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,137430,166598,29168
1,0.0,26999,29110,2111
2,1.0,29142,29142,0
3,,31279,0,0



['HAZ_CNO', 31282]
31282


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,193520,224802,31282
1,2.0,46,46,0
2,0.0,2,2,0
3,,31282,0,0



['PERNOTMVIT', 31284]
31284


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,188008,219292,31284
1,1.0,5558,5558,0
2,,31284,0,0



['MAKE', 31287]
31287


array([0., 0., 8., ..., 0., 0., 0.])

(224850, 78)

[0.0, 6.0, 8.0, 2.0, 4.0, 1.0, 7.0, 3.0, 5.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,38032,62309,24277
1,6.0,26988,26988,0
2,8.0,37152,44162,7010
3,2.0,25499,25499,0
4,4.0,22969,22969,0
5,1.0,28802,28802,0
6,7.0,7661,7661,0
7,3.0,3625,3625,0
8,5.0,2835,2835,0
9,,31287,0,0



['TOW_VEH', 31288]
31288


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,189198,220485,31287
1,1.0,4364,4365,1
2,,31288,0,0



['BODY_TYP', 31289]
31289


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[0.0, 5.0, 3.0, 1.0, 2.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,14530,14530,0
1,5.0,43247,49487,6240
2,3.0,32560,32560,0
3,1.0,74913,99962,25049
4,2.0,17361,17361,0
5,4.0,10950,10950,0
6,,31289,0,0



['VPROFILE', 31291]
31291


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,157923,188624,30701
1,2.0,21187,21777,590
2,0.0,14449,14449,0
3,,31291,0,0



['LGT_COND', 31292]
31292


array([3., 3., 3., ..., 3., 3., 3.])

(224850, 78)

[3.0, 2.0, 0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,142544,173836,31292
1,2.0,4412,4412,0
2,0.0,16449,16449,0
3,1.0,30153,30153,0
4,,31292,0,0



['VSURCOND', 31297]
31297


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 3.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,155697,186862,31165
1,2.0,27909,27909,0
2,3.0,9495,9627,132
3,0.0,452,452,0
4,,31297,0,0



['P_CRASH1', 31300]
31300


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 4.0, 3.0, 5.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,94844,125029,30185
1,4.0,33922,35037,1115
2,3.0,13794,13794,0
3,5.0,17664,17664,0
4,0.0,13530,13530,0
5,2.0,19796,19796,0
6,,31300,0,0



['CARGO_BT', 31301]
31301


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,186605,217906,31301
1,1.0,6944,6944,0
2,,31301,0,0



['PSU', 31302]
31302


array([4., 3., 3., ..., 4., 3., 3.])

(224850, 78)

[0.0, 1.0, 4.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,15820,15858,38
1,1.0,34290,34290,0
2,4.0,43830,49541,5711
3,3.0,54639,79233,24594
4,2.0,44969,45928,959
5,,31302,0,0



['REL_ROAD', 31306]
31306


array([1., 1., 1., ..., 1., 0., 1.])

(224850, 78)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,176191,205638,29447
1,0.0,15930,17789,1859
2,2.0,1423,1423,0
3,,31306,0,0



['VE_FORMS', 31306]
31306


array([2., 2., 1., ..., 2., 2., 2.])

(224850, 78)

[2.0, 1.0, 3.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,139051,167193,28142
1,1.0,24909,28073,3164
2,3.0,22134,22134,0
3,4.0,7450,7450,0
4,,31306,0,0



['FIRE_EXP', 31309]
31309


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,193267,224576,31309
1,0.0,274,274,0
2,,31309,0,0



['DAY_WEEK', 31313]
31313


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,146324,177637,31313
1,0.0,47213,47213,0
2,,31313,0,0



['VSPD_LIM', 31313]
31313


array([2., 2., 2., ..., 7., 2., 7.])

(224850, 78)

[0.0, 1.0, 5.0, 4.0, 7.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,22227,22227,0
1,1.0,28746,31989,3243
2,5.0,34598,34598,0
3,4.0,20424,20424,0
4,7.0,33924,35409,1485
5,2.0,48479,75064,26585
6,3.0,5139,5139,0
7,,31313,0,0



['RELJCT2', 31314]
31314


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,80552,101971,21419
1,0.0,51006,57136,6130
2,2.0,20529,20530,1
3,3.0,41449,45213,3764
4,,31314,0,0



['J_KNIFE', 31315]
31315


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,189128,220438,31310
1,2.0,4320,4325,5
2,0.0,87,87,0
3,,31315,0,0



['WRK_ZONE', 31318]
31318


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,188852,220170,31318
1,1.0,4306,4306,0
2,2.0,339,339,0
3,3.0,35,35,0
4,,31318,0,0



['DR_PRES', 31319]
31319


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,193495,224814,31319
1,0.0,36,36,0
2,,31319,0,0



['REST_MIS', 31319]
31319


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,189200,220519,31319
1,0.0,4331,4331,0
2,,31319,0,0



['PCRASH4', 31320]
31320


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,185250,216570,31320
1,0.0,8280,8280,0
2,,31320,0,0



['IMPACT1', 31321]
31321


array([1., 1., 4., ..., 1., 1., 1.])

(224850, 78)

[5.0, 1.0, 4.0, 0.0, 3.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,5.0,7834,7834,0
1,1.0,79143,102793,23650
2,4.0,47972,55643,7671
3,0.0,16614,16614,0
4,3.0,27986,27986,0
5,2.0,13980,13980,0
6,,31321,0,0



['HAZ_INV', 31322]
31322


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,193479,224799,31320
1,1.0,49,51,2
2,,31322,0,0



['VEH_ALCH', 31323]
31323


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,189179,220473,31294
1,0.0,4348,4377,29
2,,31323,0,0



['SEAT_POS', 31323]
31323


array([3., 3., 3., ..., 3., 3., 3.])

(224850, 78)

[3.0, 4.0, 0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,134258,165581,31323
1,4.0,15431,15431,0
2,0.0,1579,1579,0
3,1.0,32603,32603,0
4,2.0,9656,9656,0
5,,31323,0,0



['HOUR', 31324]
31324


array([3., 3., 3., ..., 3., 3., 3.])

(224850, 78)

[3.0, 1.0, 2.0, 4.0, 6.0, 0.0, 5.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,53613,84163,30550
1,1.0,33929,33929,0
2,2.0,47646,48043,397
3,4.0,23713,23713,0
4,6.0,10422,10777,355
5,0.0,6561,6561,0
6,5.0,17642,17664,22
7,,31324,0,0



['ALC_STATUS', 31327]
31327


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,189994,221321,31327
1,0.0,3529,3529,0
2,,31327,0,0



['HAZ_REL', 31329]
31329


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,193477,224806,31329
1,2.0,36,36,0
2,0.0,8,8,0
3,,31329,0,0



['SCH_BUS', 31330]
31330


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,192598,223928,31330
1,1.0,922,922,0
2,,31330,0,0



['EJECTION', 31330]
31330


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,4473,4473,0
1,1.0,189047,220377,31330
2,,31330,0,0



['PVH_INVL', 31331]
31331


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,190593,221924,31331
1,1.0,2926,2926,0
2,,31331,0,0



['ROLLOVER', 31334]
31334


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,188394,219728,31334
1,0.0,5122,5122,0
2,,31334,0,0



['INT_HWY', 31335]
31335


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,170887,202222,31335
1,1.0,22628,22628,0
2,,31335,0,0



['TYP_INT', 31335]
31335


array([2., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0, 2.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,106179,129451,23272
1,0.0,25870,25870,0
2,2.0,59981,68044,8063
3,3.0,1485,1485,0
4,,31335,0,0



['ALC_RES', 31339]
31339


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,192522,223861,31339
1,1.0,989,989,0
2,,31339,0,0



['SEX', 31340]
31340


array([0., 0., 0., ..., 1., 0., 0.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,100436,114930,14494
1,0.0,93074,109920,16846
2,,31340,0,0



['PERMVIT', 31342]
31342


array([2., 2., 2., ..., 2., 2., 2.])

(224850, 78)

[2.0, 1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,119617,149401,29784
1,1.0,59341,59344,3
2,0.0,14550,16105,1555
3,,31342,0,0



['ALCOHOL', 31344]
31344


array([2., 2., 2., ..., 2., 2., 2.])

(224850, 78)

[2.0, 9.0, 1.0, 8.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,177422,208701,31279
1,9.0,8641,8641,0
2,1.0,7438,7503,65
3,8.0,5,5,0
4,,31344,0,0



['VALIGN', 31347]
31347


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,170825,202172,31347
1,0.0,16491,16491,0
2,2.0,6187,6187,0
3,,31347,0,0



['HAZ_PLAC', 31349]
31349


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,193458,224805,31347
1,1.0,43,45,2
2,,31349,0,0



['VEH_AGE', 31349]
31349


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 4.0, 3.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,57626,87985,30359
1,4.0,12515,12515,0
2,3.0,28176,28176,0
3,1.0,50877,50877,0
4,2.0,44307,45297,990
5,,31349,0,0



['M_HARM', 31350]
31350


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,165382,193923,28541
1,0.0,16592,18678,2086
2,2.0,11526,12249,723
3,,31350,0,0



['MAK_MOD', 31351]
31351


array([4., 3., 3., ..., 1., 4., 1.])

(224850, 78)

[0.0, 4.0, 2.0, 1.0, 3.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,33649,34529,880
1,4.0,35766,40734,4968
2,2.0,40666,46439,5773
3,1.0,41447,52288,10841
4,3.0,41971,50860,8889
5,,31351,0,0



['REGION', 31353]
31353


array([3., 3., 3., ..., 3., 3., 3.])

(224850, 78)

[4.0, 3.0, 2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,8509,8509,0
1,3.0,145001,176354,31353
2,2.0,33133,33133,0
3,1.0,6854,6854,0
4,,31353,0,0



['BUS_USE', 31356]
31356


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,192749,224105,31356
1,2.0,721,721,0
2,0.0,24,24,0
3,,31356,0,0



['HOSPITAL', 31356]
31356


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 5.0, 6.0, 2.0, 1.0, 8.0, 4.0, 3.0, 9.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,161796,193152,31356
1,5.0,16632,16632,0
2,6.0,1487,1487,0
3,2.0,138,138,0
4,1.0,685,685,0
5,8.0,1283,1283,0
6,4.0,2564,2564,0
7,3.0,8825,8825,0
8,9.0,84,84,0
9,,31356,0,0



['REST_USE', 31356]
31356


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,11211,11244,33
1,1.0,170351,201674,31323
2,2.0,11932,11932,0
3,,31356,0,0



['MAX_VSEV', 31357]
31357


array([2., 2., 2., ..., 2., 0., 2.])

(224850, 78)

[0.0, 2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,33820,37841,4021
1,2.0,124790,150407,25617
2,1.0,34883,36602,1719
3,,31357,0,0



['SPEC_USE', 31357]
31357


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,191979,223336,31357
1,2.0,978,978,0
2,0.0,536,536,0
3,,31357,0,0



['PER_TYP', 31357]
31357


array([2., 2., 2., ..., 2., 2., 2.])

(224850, 78)

[2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,134238,163687,29449
1,1.0,59255,61163,1908
2,,31357,0,0



['PJ', 31358]
31358


array([3., 2., 3., ..., 3., 3., 2.])

(224850, 78)

[2.0, 3.0, 1.0, 0.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,45679,52864,7185
1,3.0,49576,65584,16008
2,1.0,25822,25822,0
3,0.0,29140,31087,1947
4,4.0,43275,49493,6218
5,,31358,0,0



['HIT_RUN', 31361]
31361


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,192513,223874,31361
1,1.0,976,976,0
2,,31361,0,0



['NUMOCCS', 31367]
31367


array([3., 3., 3., ..., 3., 3., 3.])

(224850, 78)

[3.0, 5.0, 1.0, 6.0, 2.0, 0.0, 4.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,3.0,98965,124637,25672
1,5.0,22324,22324,0
2,1.0,50886,56581,5695
3,6.0,18308,18308,0
4,2.0,2367,2367,0
5,0.0,492,492,0
6,4.0,141,141,0
7,,31367,0,0



['P_CRASH2', 31371]
31371


array([5., 3., 5., ..., 1., 1., 1.])

(224850, 78)

[0.0, 5.0, 3.0, 4.0, 2.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,36494,40971,4477
1,5.0,35454,43942,8488
2,3.0,43402,54477,11075
3,4.0,24912,24912,0
4,2.0,20955,20955,0
5,1.0,32262,39593,7331
6,,31371,0,0



['WEATHER', 31372]
31372


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 3.0, 4.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,137851,169223,31372
1,2.0,18927,18927,0
2,3.0,33102,33102,0
3,4.0,2675,2675,0
4,0.0,923,923,0
5,,31372,0,0



['MAN_COLL', 31375]
31375


array([3., 3., 3., ..., 3., 3., 2.])

(224850, 78)

[4.0, 2.0, 1.0, 3.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,4.0,23183,23183,0
1,2.0,57477,65335,7858
2,1.0,27009,30979,3970
3,3.0,78217,97764,19547
4,0.0,7589,7589,0
5,,31375,0,0



['VTRAFWAY', 31376]
31376


array([0., 0., 3., ..., 0., 0., 0.])

(224850, 78)

[2.0, 0.0, 3.0, 4.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,11338,11338,0
1,0.0,86130,114734,28604
2,3.0,42732,44811,2079
3,4.0,15983,16676,693
4,1.0,37291,37291,0
5,,31376,0,0



['EMER_USE', 31377]
31377


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 2.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,192980,224357,31377
1,2.0,198,198,0
2,0.0,295,295,0
3,,31377,0,0



['NUM_INJV', 31378]
31378


array([1., 1., 3., ..., 3., 3., 3.])

(224850, 78)

[1.0, 3.0, 0.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,47159,54775,7616
1,3.0,124626,148388,23762
2,0.0,21669,21669,0
3,2.0,18,18,0
4,,31378,0,0



['SPEEDREL', 31388]
31388


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[1.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,1.0,185231,216619,31388
1,0.0,8231,8231,0
2,,31388,0,0



['TOWED', 31399]
31399


array([2., 2., 2., ..., 0., 2., 2.])

(224850, 78)

[0.0, 2.0, 3.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,71733,79561,7828
1,2.0,112332,135903,23571
2,3.0,8401,8401,0
3,1.0,985,985,0
4,,31399,0,0



['PCRASH5', 31401]
31401


array([3., 3., 3., ..., 3., 3., 3.])

(224850, 78)

[2.0, 3.0, 1.0, 4.0, 0.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,2.0,18170,18170,0
1,3.0,154838,184125,29287
2,1.0,16474,18588,2114
3,4.0,3595,3595,0
4,0.0,372,372,0
5,,31401,0,0



['AIR_BAG', 31418]
31418


array([1., 1., 1., ..., 1., 1., 1.])

(224850, 78)

[0.0, 1.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,40446,40446,0
1,1.0,152986,184404,31418
2,,31418,0,0



['PEDS', 31429]
31429


array([0., 0., 0., ..., 0., 0., 0.])

(224850, 78)

[0.0, 1.0, 2.0]


Unnamed: 0,Value,Original,Imputed,Difference
0,0.0,188349,219778,31429
1,1.0,4899,4899,0
2,2.0,173,173,0
3,,31429,0,0




data_RF.shape
(224850, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,0,2,1,1,0,2,2,1,1,0,2,0,0,1,0,2,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,1,3,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,0,0,1,0,0,3,2,1,0,3,1,0
1,1,0,3,0,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,2,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,0,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,1,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,2,1,1,3,0,3
3,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,5,0,1,1,1,3,0,2
4,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,0,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,1,1,1,4,1,3


## Now do IVEware Imputation

In [11]:
data_IVEware = pd.read_csv('../../Big_Files/data_IVEware.csv')
data_IVEware.drop(columns='Unnamed: 0', inplace=True)

print ('data_Ground_Truth', data_Ground_Truth.shape)
display(data_Ground_Truth.head(10))
print ('data_NaN', data_NaN.shape)
display(data_NaN.head(10))
print ('data_RF', data_RF.shape)
display(data_RF.head(10))
print ('data_Mode', data_Mode.shape)
display(data_Mode.head(10))
print ('data_IVEware', data_IVEware.shape)
display(data_IVEware.head(10))


data_Ground_Truth (224850, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,0,1,1,1,0,2,2,1,1,0,2,0,0,1,0,2,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,5,2,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,0,0,1,0,8,0,2,1,0,3,1,0
1,1,0,3,0,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,1,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,0,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,2,1,1,3,0,3
3,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,5,0,1,1,1,1,0,3
4,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,1,1,1,4,1,3
5,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,1,0,1,1,0,3,2,1,2,3,1,1
6,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,1,0,1,1,0,3,2,1,2,3,1,2
7,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,0,0,1,1,0,0,1,1,0,0,1,2
8,2,0,3,0,0,1,1,1,0,2,2,3,1,0,2,0,0,0,1,3,3,1,1,2016,3,9,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,5,1,1,1,2,1,0,1,1,0,3,2,1,1,3,0,0
9,2,0,3,0,0,0,0,1,0,2,1,3,1,0,1,0,0,0,1,3,1,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,0,1,0,3,8,2,4,3,3,1,0,0,1,1,1,0,0,0,1,1,2,7,1,1,1,0,1,0,1,1,0,3,2,1,1,3,1,2


data_NaN (224850, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3.0,0.0,3.0,0.0,0.0,,1.0,1.0,0.0,2.0,2.0,1.0,1.0,0.0,2.0,0.0,,1.0,0.0,2.0,4.0,1.0,1.0,2016.0,4.0,2.0,3.0,0.0,4.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,,0.0,,0.0,5.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,3.0,,,1.0,2.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,2.0,0.0,0.0,,0.0,,,2.0,1.0,0.0,3.0,1.0,0.0
1,1.0,0.0,3.0,0.0,0.0,2.0,1.0,0.0,0.0,2.0,2.0,1.0,1.0,,,0.0,0.0,2.0,1.0,3.0,2.0,1.0,0.0,2016.0,3.0,,1.0,0.0,,5.0,1.0,0.0,1.0,1.0,1.0,1.0,,0.0,1.0,0.0,,1.0,1.0,4.0,6.0,2.0,4.0,3.0,3.0,1.0,,1.0,3.0,1.0,1.0,,,,1.0,,1.0,1.0,,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,1.0,4.0
2,3.0,0.0,2.0,,0.0,2.0,0.0,1.0,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,,0.0,0.0,8.0,0.0,2.0,1.0,5.0,1.0,0.0,,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,5.0,2.0,1.0,,3.0,0.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,0.0,3.0
3,3.0,0.0,2.0,0.0,0.0,2.0,0.0,,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,8.0,0.0,,1.0,5.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,,1.0,1.0,5.0,2.0,1.0,,3.0,0.0,0.0,1.0,1.0,5.0,0.0,1.0,1.0,1.0,,0.0,
4,3.0,0.0,2.0,0.0,0.0,2.0,0.0,,0.0,2.0,1.0,2.0,1.0,0.0,1.0,0.0,,1.0,1.0,1.0,1.0,3.0,1.0,2016.0,3.0,2.0,2.0,0.0,0.0,3.0,1.0,0.0,1.0,1.0,1.0,1.0,,0.0,1.0,0.0,,1.0,,0.0,,0.0,,1.0,5.0,1.0,0.0,0.0,1.0,1.0,,0.0,0.0,0.0,1.0,1.0,1.0,5.0,2.0,1.0,1.0,3.0,0.0,,1.0,,,3.0,1.0,1.0,1.0,4.0,1.0,3.0
5,3.0,0.0,3.0,0.0,0.0,,1.0,1.0,,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,,4.0,2.0,3.0,1.0,1.0,2016.0,2.0,2.0,2.0,0.0,4.0,5.0,1.0,,1.0,1.0,,,0.0,0.0,1.0,0.0,1.0,1.0,1.0,4.0,6.0,2.0,4.0,3.0,3.0,1.0,5.0,1.0,3.0,1.0,1.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,,1.0,0.0,,0.0,1.0,1.0,0.0,3.0,2.0,1.0,2.0,3.0,1.0,1.0
6,3.0,0.0,3.0,0.0,0.0,,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,,2.0,3.0,1.0,1.0,2016.0,2.0,,2.0,,3.0,5.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,4.0,1.0,1.0,4.0,8.0,0.0,4.0,1.0,1.0,4.0,3.0,1.0,3.0,1.0,2.0,1.0,,2.0,1.0,1.0,1.0,,2.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,1.0,2.0,3.0,1.0,2.0
7,3.0,0.0,3.0,0.0,,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,0.0,2.0,0.0,0.0,1.0,4.0,2.0,3.0,,1.0,2016.0,2.0,2.0,2.0,0.0,3.0,5.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,4.0,1.0,1.0,,8.0,0.0,4.0,,1.0,4.0,3.0,1.0,3.0,1.0,2.0,1.0,0.0,2.0,1.0,1.0,1.0,4.0,2.0,,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,,0.0,1.0,2.0
8,2.0,0.0,,,0.0,1.0,1.0,1.0,0.0,2.0,2.0,3.0,1.0,0.0,2.0,,0.0,0.0,1.0,3.0,,1.0,,2016.0,3.0,9.0,0.0,0.0,3.0,1.0,1.0,,1.0,1.0,1.0,1.0,0.0,0.0,1.0,,4.0,1.0,1.0,2.0,0.0,,1.0,3.0,3.0,4.0,3.0,,3.0,1.0,1.0,1.0,0.0,2.0,,1.0,1.0,,1.0,1.0,1.0,2.0,1.0,0.0,1.0,1.0,0.0,3.0,2.0,,1.0,3.0,0.0,0.0
9,2.0,,3.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,,3.0,1.0,,1.0,0.0,0.0,0.0,1.0,3.0,,3.0,1.0,2016.0,3.0,2.0,0.0,0.0,0.0,5.0,1.0,0.0,1.0,1.0,,1.0,,0.0,1.0,0.0,,1.0,0.0,,8.0,2.0,4.0,3.0,3.0,,0.0,0.0,1.0,1.0,1.0,0.0,0.0,,,,2.0,7.0,1.0,1.0,1.0,0.0,,0.0,1.0,1.0,0.0,3.0,2.0,1.0,1.0,3.0,,2.0


data_RF (224850, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,0,2,1,1,0,2,2,1,1,0,2,0,0,1,0,2,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,1,3,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,0,0,1,0,0,3,2,1,0,3,1,0
1,1,0,3,0,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,2,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,1,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,0,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,1,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,2,1,1,3,0,3
3,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,5,0,1,1,1,3,0,2
4,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,0,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,1,1,1,4,1,3
5,3,0,3,0,0,2,1,1,0,1,2,1,1,0,2,0,0,0,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,1,0,1,1,0,3,2,1,2,3,1,1
6,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,3,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,2,2,1,1,0,1,0,1,1,0,3,2,1,2,3,1,2
7,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,0,0,1,1,0,0,1,1,1,0,1,2
8,2,0,3,2,0,1,1,1,0,2,2,3,1,0,2,0,0,0,1,3,3,1,1,2016,3,9,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,2,1,1,1,2,1,0,1,1,0,3,2,1,1,3,0,0
9,2,0,3,0,0,0,0,1,0,2,1,3,1,0,1,0,0,0,1,3,1,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,1,1,0,4,8,2,4,3,3,1,0,0,1,1,1,0,0,0,1,1,2,7,1,1,1,0,1,0,1,1,0,3,2,1,1,3,1,2


data_Mode (224850, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,0,2,1,1,0,2,2,1,1,0,2,0,0,1,0,2,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,1,0,5,1,1,0,0,0,0,1,3,1,3,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,0,0,1,0,0,3,2,1,0,3,1,0
1,1,0,3,0,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,3,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,3,1,3,1,1,1,0,2,1,1,1,1,1,1,1,0,0,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,2,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,1,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,2,1,1,3,0,3
3,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,5,0,1,1,1,3,0,0
4,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,1,0,0,0,1,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,1,1,1,4,1,3
5,3,0,3,0,0,2,1,1,0,1,2,1,1,0,2,0,0,0,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,1,1,0,1,0,1,1,0,3,2,1,2,3,1,1
6,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,3,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,2,2,1,1,0,1,0,1,1,0,3,2,1,2,3,1,2
7,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,3,8,0,4,3,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,0,0,1,1,0,0,1,1,1,0,1,2
8,2,0,3,2,0,1,1,1,0,2,2,3,1,0,2,0,0,0,1,3,3,1,1,2016,3,9,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,2,1,1,1,2,1,0,1,1,0,3,2,1,1,3,0,0
9,2,0,3,0,0,0,0,1,0,2,2,3,1,0,1,0,0,0,1,3,3,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,1,1,0,3,8,2,4,3,3,1,0,0,1,1,1,0,0,2,1,1,2,7,1,1,1,0,1,0,1,1,0,3,2,1,1,3,1,2


data_IVEware (224850, 78)


Unnamed: 0,HOUR,INT_HWY,LGT_COND,MONTH,PEDS,PERMVIT,REL_ROAD,RELJCT2,SCH_BUS,URBANICITY,VE_TOTAL,WEATHER,DAY_WEEK,WRK_ZONE,VE_FORMS,PVH_INVL,PERNOTMVIT,NUM_INJ,PSU,PJ,MAN_COLL,HARM_EV,TYP_INT,YEAR,REGION,ALCOHOL,MAX_SEV,RELJCT1,ACC_TYPE,BODY_TYP,BUS_USE,CARGO_BT,DR_PRES,EMER_USE,FIRE_EXP,HAZ_CNO,HAZ_INV,HAZ_PLAC,HAZ_REL,HIT_RUN,IMPACT1,J_KNIFE,M_HARM,MAK_MOD,MAKE,MAX_VSEV,MODEL,NUM_INJV,NUMOCCS,P_CRASH1,P_CRASH2,PCRASH4,PCRASH5,ROLLOVER,SPEC_USE,SPEEDREL,TOW_VEH,TOWED,VALIGN,VEH_ALCH,VPROFILE,VSPD_LIM,VSURCOND,VTCONT_F,VTRAFCON,VTRAFWAY,AIR_BAG,ALC_RES,ALC_STATUS,EJECTION,HOSPITAL,INJ_SEV,PER_TYP,REST_MIS,REST_USE,SEAT_POS,SEX,VEH_AGE
0,3,0,3,0,0,2,1,1,0,2,2,1,1,0,2,0,0,1,0,2,4,1,1,2016,4,2,3,0,4,0,1,0,1,1,1,1,0,0,2,0,5,1,1,0,0,0,0,1,3,5,5,1,2,1,1,1,0,0,1,1,1,0,1,1,1,2,0,0,1,0,5,0,2,1,0,3,1,0
1,1,0,3,0,0,2,1,0,0,2,2,1,1,0,2,0,0,2,1,3,2,1,0,2016,3,2,1,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,0,1,1,1,1,1,1,1,0,0,0,1,1,0,3,2,1,1,3,1,4
2,3,0,2,1,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,1,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,2,1,1,3,0,3
3,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,8,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,5,0,1,1,1,1,0,4
4,3,0,2,0,0,2,0,1,0,2,1,2,1,0,1,0,0,1,1,1,1,3,1,2016,3,2,2,0,0,3,1,0,1,1,1,1,0,0,1,0,1,1,0,0,1,0,2,1,5,1,0,0,1,1,1,0,0,0,1,1,1,5,2,1,1,3,0,0,1,1,0,3,1,1,1,4,1,3
5,3,0,3,0,0,1,1,1,0,1,2,1,1,0,2,0,0,2,4,2,3,1,1,2016,2,2,2,0,4,5,1,0,1,1,1,1,0,0,1,0,1,1,1,4,6,2,4,3,3,1,5,1,3,1,1,1,0,2,1,1,1,4,2,2,1,0,1,0,1,1,0,3,2,1,2,3,1,1
6,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,3,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,1,1,0,1,0,1,1,0,3,2,1,2,3,1,2
7,3,0,3,0,0,2,1,1,1,1,2,1,1,0,2,0,0,1,4,2,3,1,1,2016,2,2,2,0,3,5,2,1,1,1,1,1,0,0,1,0,4,1,1,4,8,0,4,1,1,4,3,1,3,1,2,1,0,2,1,1,1,4,2,2,1,0,0,0,1,1,0,0,1,1,2,0,1,2
8,2,0,3,2,0,1,1,1,0,2,2,3,1,0,2,0,0,0,1,3,3,1,1,2016,3,9,0,0,3,1,1,0,1,1,1,1,0,0,1,0,4,1,1,2,0,2,1,3,3,4,3,1,3,1,1,1,0,2,1,1,1,2,1,1,1,2,1,0,1,1,0,3,2,1,1,3,0,0
9,2,0,3,0,0,0,0,1,0,2,1,3,1,0,1,0,0,0,1,3,1,3,1,2016,3,2,0,0,0,5,1,0,1,1,1,1,0,0,1,0,4,1,0,4,8,2,4,3,3,1,0,0,1,1,1,0,0,2,1,1,2,7,1,1,1,0,1,0,1,1,0,3,2,1,1,3,1,2


In [14]:
Compare_Imputation_Methods_Part_2(
    data_Ground_Truth, data_NaN, data_RF, data_Mode, data_IVEware
)

Compare_Imputation_Methods_Part_2
HOUR 31324 22288 22492 23069 774 22556 22589
HOUR int64 float64 int64 int64 int64

INT_HWY 31335 3527 3527 3765 0 3602 3602
INT_HWY int64 float64 int64 int64 int64

LGT_COND 31292 8146 8146 8397 0 8132 8132
LGT_COND int64 float64 int64 int64 int64

MONTH 31255 18853 20482 20866 6631 20577 20442
MONTH int64 float64 int64 int64 int64

PEDS 31429 795 795 364 0 804 804
PEDS int64 float64 int64 int64 int64

PERMVIT 31342 10696 11936 8141 1558 10493 10914
PERMVIT int64 float64 int64 int64 int64

REL_ROAD 31306 919 2761 417 1859 895 2720
REL_ROAD int64 float64 int64 int64 int64

RELJCT2 31314 9910 18306 7545 9895 10584 18261
RELJCT2 int64 float64 int64 int64 int64

SCH_BUS 31330 129 129 279 0 150 150
SCH_BUS int64 float64 int64 int64 int64

URBANICITY 31232 7776 7773 7938 7 7854 7861
URBANICITY int64 float64 int64 int64 int64

VE_TOTAL 31260 6491 8449 1812 2102 6690 8512
VE_TOTAL int64 float64 int64 int64 int64

WEATHER 31372 8981 8981 10678 0 8823 8823
WEATH

Unnamed: 0,Feature,nNaN,nRF Incorrect,pRF Incorrect,nMode Incorrect,pMode Incorrect,nIVEware Incorrect,pIVEware Incorrect,RF and Mode Different,RF v/s Mode %,RF and IVEware Different,RF v/s IVEware %,Mode and IVEware Different,Mode v/s IVEware %
0,SCH_BUS,31330,129,0,129,0,279,0,0,0,150,0,150,0
1,BUS_USE,31356,113,0,113,0,54,0,0,0,106,0,106,0
2,DR_PRES,31319,8,0,8,0,17,0,0,0,9,0,9,0
3,EMER_USE,31377,95,0,95,0,115,0,0,0,92,0,92,0
4,FIRE_EXP,31309,41,0,41,0,80,0,0,0,39,0,39,0
5,HAZ_CNO,31282,5,0,5,0,2,0,0,0,5,0,5,0
6,HAZ_INV,31322,2,0,4,0,10,0,2,0,8,0,6,0
7,HAZ_PLAC,31349,8,0,10,0,10,0,2,0,2,0,0,0
8,HAZ_REL,31329,9,0,9,0,31325,99,0,0,31327,99,31327,99
9,HIT_RUN,31361,161,0,161,0,322,1,0,0,165,0,165,0



Error RF =  22.17
Error Mode =  28.42
Error IVEware =  23.84
nRF > nMode:  -273273
nRF > nIVEware:  -772
nModel > nIVEware:  -606673
Compare RF to Mode:  46 31 1
Compare RF to IVEware:  49 0 29
Compare Mode to IVEware:  36 1 41

Number of NaN in data_NaN:  2443202
RF Different from Mode:  273351
RF Different from IVEware:  606751
Mode Different from IVEware:  738833


In [None]:
def Main():
    data = Get_Data()
    
#    data_Imputed = Impute_Full(data)
    data_Imputed = Impute_Round_Robin(data)
    data_Imputed.to_csv('../../Big_Files/CRSS_Imputed_All_12_22_22.csv')
#    display(data_Imputed.head(50))
    
    Check(data, data_Imputed)
    display(Audio(sound_file, autoplay=True))
    return 0
Main()