In [1]:
%%latex
\tableofcontents


# Explanation

This second part of the code guides how we bin each categorical feature into fewer categories.  Binary classification algorithms generally work best with fewer than ten categories per feature, but some of our features have hundreds of unique values.

## Binning by Human Understanding of Meaning of Codes:  HOSPITAL

Some features, like HOSPITAL, can be binned by looking at what each code signifies and grouping the codes that mean the same thing for our purposes.  We are interested only in whether the crash person went to the hospital, not in how the person got there, but the CRSS codes differentiate six ways a person might have gone to the hospital and two ways the information can be unknown.  We will keep code 0 as 0 ("No"), bin CRSS codes 1-6 together as 1 ("Yes"), and bin codes 8-9 as 'Unknown', to be imputed later.

| CRSS Attribute Code | Meaning | Our Bin |
|---|---|---|
| 0  | Not Transported for Treatment  | 0 |
| 1  | EMS Air  | 1 |
| 2  | Law Enforcement  | 1 |
| 3  | EMS Unknown Mode  | 1 |
| 4  | Transported Unknown Source  | 1 |
| 5  | EMS Ground  | 1 |
| 6  | Other  | 1 |
| 8  | Not Reported  | 'Unknown' |
| 9  | Reported as Unknown  | 'Unknown' |
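
The mapping in the table can be written as a small lookup.  This is a minimal sketch rather than the notebook's own code: the names `HOSPITAL_BINS` and `bin_hospital` are hypothetical, and treating unlisted codes as 'Unknown' is an assumption.

```python
# Hypothetical lookup built from the table above (not part of the notebook's code).
HOSPITAL_BINS = {
    0: 0,                                  # Not transported for treatment
    1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1,    # Transported, by any means
    8: 'Unknown', 9: 'Unknown',            # Not reported / reported as unknown
}

def bin_hospital(code):
    """Return the bin for a CRSS HOSPITAL code.

    Codes missing from the table are treated as 'Unknown' (an assumption).
    """
    return HOSPITAL_BINS.get(code, 'Unknown')

codes = [0, 5, 1, 9, 0, 6]
print([bin_hospital(c) for c in codes])
```

With pandas, the same dictionary could be applied to a whole column with `data['HOSPITAL'].map(HOSPITAL_BINS)`.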

## Binning of Ordered Codes:  HOUR

The crash hours are discrete but ordered.  When binning "similar" times together, "similar" means similar in the likelihood that a crash person at that time of day will go to the hospital.  In the table below, crashes at midnight (HOUR = 0) account for 1.3% of crash persons, and 23% of those crash persons went to the hospital.  The likelihood of going to the hospital drops sharply between 4 and 7 am, by about 4 percentage points each hour.  It then stays about the same until 6 pm, when it starts to rise again.  Where exactly to cut the bins is a somewhat arbitrary decision.

| HOUR | % of Crash Persons | % to Hospital | Our Bin | Meaning |
|---|---|---|---|---|
| 0 | 1.344 | 23.0705 | 6 | Late Night |
| 1 | 1.0665 | 26.5419 | 6 | Late Night |
| 2 | 0.9483 | 26.6044 | 6 | Late Night |
| 3 | 0.7306 | 26.8372 | 6 | Late Night |
| 4 | 0.7089 | 25.6741 | 6 | Late Night |
| 5 | 1.1962 | 21.2548 | 0 | Early Morning |
| 6 | 2.4013 | 17.1788 | 0 | Early Morning |
| 7 | 4.6948 | 13.1665 | 1 | Morning |
| 8 | 4.5183 | 13.3792 | 1 | Morning |
| 9 | 3.8258 | 14.3828 | 1 | Morning |
| 10 | 4.079 | 14.8444 | 1 | Morning |
| 11 | 5.0904 | 14.1232 | 2 | Mid-Day |
| 12 | 6.2797 | 13.4761 | 2 | Mid-Day |
| 13 | 6.2852 | 14.0212 | 2 | Mid-Day |
| 14 | 7.0741 | 14.1841 | 2 | Mid-Day |
| 15 | 8.6077 | 12.9617 | 3 | Rush Hour |
| 16 | 8.6935 | 13.3526 | 3 | Rush Hour |
| 17 | 9.3121 | 12.6166 | 3 | Rush Hour |
| 18 | 7.0147 | 14.0458 | 4 | Early Evening |
| 19 | 4.8031 | 16.2731 | 4 | Early Evening |
| 20 | 3.7818 | 17.9284 | 5 | Evening |
| 21 | 3.2342 | 18.7128 | 5 | Evening |
| 22 | 2.4795 | 20.4524 | 5 | Evening |
| 23 | 1.8303 | 22.846 | 6 | Late Night |

![image.png](attachment:image.png)
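
Because the hours are ordered, the "Our Bin" column can be expressed as a short list of cut points.  The sketch below is illustrative only: `HOUR_CUTS` and `bin_hour` are hypothetical names, and the cut points are simply read off the table above.

```python
# Hypothetical cut points read off the "Our Bin" column of the HOUR table.
# Each entry is (first hour of the range, bin, meaning); a range runs
# until the first hour of the next entry.
HOUR_CUTS = [
    (0, 6, 'Late Night'),
    (5, 0, 'Early Morning'),
    (7, 1, 'Morning'),
    (11, 2, 'Mid-Day'),
    (15, 3, 'Rush Hour'),
    (18, 4, 'Early Evening'),
    (20, 5, 'Evening'),
    (23, 6, 'Late Night'),
]

def bin_hour(hour):
    """Return (bin, meaning) for an hour 0-23, or ('Unknown', 'Unknown') otherwise."""
    if not 0 <= hour <= 23:
        return ('Unknown', 'Unknown')    # for example, CRSS code 99
    result = None
    for start, bin_id, meaning in HOUR_CUTS:
        if hour >= start:
            result = (bin_id, meaning)   # keep the last range that has started
    return result

for h in [0, 6, 12, 17, 23, 99]:
    print(h, bin_hour(h))
```

Note that bin 6 ("Late Night") wraps around midnight, which is why it appears at both ends of the list.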

## Automated Binning:  BODY_TYP

The CRSS dataset in these six years differentiates 68 vehicle body types.  Some of them, like "4: 4-Door Sedan, Hardtop" and "14: Compact Utility", are common, with 36% and 16% of crash persons, respectively.  Some, like "21: Large Van", are less common (1%), and some are rare, like "32: Pickup With Slide-in Camper (2016-2017 Only)" (0.0007%).

We want to put the 68 codes into about five bins by likelihood of going to the hospital.

To bin a feature like BODY_TYP, the code in this notebook orders the CRSS codes by the proportion of crash persons going to the hospital, then assigns the codes to about five bins so that each bin holds approximately the same number of crash persons.  Any code with more than 10% of crash persons, like "4: 4-Door Sedan, Hardtop", becomes its own bin.

Unsurprisingly, many of the codes in the most dangerous bin are motorcycles, and many of the codes in the least dangerous bin are large trucks.  

The table below shows some of the data that the code considers when cutting the bins.  CRSS codes 4 and 14 are large enough to get their own bins.  Codes 20 and 34 are not large enough to get their own bins, but together they are too large to share one.  This notebook outputs such a table for each feature in $\LaTeX$ format.

| BODY_TYP | % of Crash Persons | % to Hospital | Our Bin |
|---|---|---|---|
| 86 | 0.0003 | 100.0000 | 0 |
| ... | | | |
| 1 | 0.6528 | 16.2667 | 0 |
| 2 | 3.0509 | 16.085 | 0 |
| 19 | 0.9264 | 15.9881 | 0 |
| 52 | 0.1562 | 15.9703 | 0 |
| 59 | 0.0309 | 15.9624 | 0 |
|||||
| 4 | 36.1961 | 15.9386 | 1 |
|||||
| 30 | 0.3609 | 15.7154 | 2 |
| 5 | 2.5745 | 14.8298 | 2 |
| 9 | 2.8609 | 14.6233 | 2 |
| 10 | 0.0142 | 14.2857 | 2 |
| 91 | 0.001 | 14.2857 | 2 |
| 6 | 5.2201 | 14.1666 | 2 |
| 16 | 0.2965 | 14.09 | 2 |
|||||
| 14 | 16.1724 | 13.823 | 3 |
|||||
| 22 | 0.0132 | 13.1868 | 4 |
| 20 | 4.1037 | 12.9021 | 4 |
| 40 | 0.0717 | 12.5506 | 4 |
|||||
| 34 | 9.824 | 11.7167 | 5 |
| 29 | 0.2 | 11.4576 | 5 |
| 15 | 5.4602 | 11.2749 | 5 |
| 31 | 1.4085 | 11.2358 | 5 |
| 17 | 0.0149 | 10.6796 | 5 |
| ... | | | |
| 41 | 0.0001 | 0.0000 | 5 |
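
The idea of ordering codes by hospital rate and then cutting bins of roughly equal size can be sketched in a few lines.  This is a simplified greedy variant, not the notebook's exact rule: the notebook breaks bins wherever the running total crosses a multiple of 20%, while this sketch restarts the count at each new bin, so the two can cut in different places.  The function name `bin_by_rate` and its parameters are hypothetical, and the rows are a small excerpt of the table above, so the result does not match the bins for the full dataset.

```python
def bin_by_rate(rows, n_bins=5, own_bin_above=10.0):
    """Group codes into bins of roughly equal size, ordered by hospital rate.

    rows: list of (code, pct_of_persons, pct_to_hospital).
    Returns a list of bins (lists of codes), highest hospital rate first.
    """
    target = 100.0 / n_bins
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    bins, current, current_pct = [], [], 0.0
    for code, pct, _rate in ordered:
        if pct > own_bin_above:
            # A large category closes the bin in progress and stands alone.
            if current:
                bins.append(current)
            bins.append([code])
            current, current_pct = [], 0.0
            continue
        if current and current_pct + pct > target:
            # Adding this code would overfill the bin, so start a new one.
            bins.append(current)
            current, current_pct = [], 0.0
        current.append(code)
        current_pct += pct
    if current:
        bins.append(current)
    return bins

# Excerpt of the BODY_TYP table: (code, % of crash persons, % to hospital)
rows = [
    (1, 0.6528, 16.2667), (2, 3.0509, 16.085), (19, 0.9264, 15.9881),
    (4, 36.1961, 15.9386), (30, 0.3609, 15.7154), (5, 2.5745, 14.8298),
    (14, 16.1724, 13.823), (20, 4.1037, 12.9021), (34, 9.824, 11.7167),
    (15, 5.4602, 11.2749),
]
print(bin_by_rate(rows))
```

Codes 4 and 14 end up alone because each exceeds the 10% threshold, mirroring the behavior described above.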

The cell below is actual output from this notebook, in a format we could cut and paste into the next notebook.  The comments show the percentage of crash persons in each bin.

In [2]:
A = [
        ['0', [86,87,82,83,89,81,84,80,88,85,90,11,96,95,97,45,58,12,32,8,42,3,1,2,19,52,59,]], #  9.0438 %
        ['1', [4,]], #  36.1961 %
        ['2', [30,5,9,10,91,6,16,]], #  11.3281 %
        ['3', [14,]], #  16.1724 %
        ['4', [22,20,40,]], #  14.0126 %
        ['5', [34,29,15,31,17,39,55,28,21,93,92,48,50,7,51,61,67,63,62,66,65,78,64,72,60,71,73,94,41,]], #  13.2471 %
        ['Unknowns', [98, 99, 49, 79, ]]
    ]
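
A list in this format is easy to turn into a code-to-bin lookup.  The sketch below is one possible way to consume it, assuming a hypothetical helper named `invert_bins`; it is not the `Build_Individual_Feature_with_Dict` function from the next notebook, which is not shown here.  Bins '0' and '5' are left out of the excerpt only to keep the example short.

```python
# Excerpt of the BODY_TYP output above (bins '0' and '5' omitted for brevity).
A_excerpt = [
    ['1', [4,]],
    ['2', [30,5,9,10,91,6,16,]],
    ['3', [14,]],
    ['4', [22,20,40,]],
    ['Unknowns', [98, 99, 49, 79, ]],
]

def invert_bins(A):
    """Turn [[bin_label, [codes...]], ...] into {code: bin_label}."""
    lookup = {}
    for label, codes in A:
        for code in codes:
            lookup[code] = label
    return lookup

lookup = invert_bins(A_excerpt)
print(lookup[4], lookup[20], lookup[99])
```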

# Setup
## Import Libraries

In [3]:
import sys, copy, math, time

print ('Python version: {}'.format(sys.version))

from IPython.display import display, HTML

from collections import Counter

import numpy as np
print ('NumPy version: {}'.format(np.__version__))
np.set_printoptions(suppress=True)

import pandas as pd
print ('Pandas version:  {}'.format(pd.__version__))
pd.set_option('display.max_rows', 500)

print ('Finished Importing Libraries')



Python version: 3.9.16 (main, Dec  7 2022, 10:02:13) 
[Clang 14.0.0 (clang-1400.0.29.202)]
NumPy version: 1.24.0
Pandas version:  1.5.2
Finished Importing Libraries


## Import Data

In [4]:
def Import_Data():
    print ('Import_Data()')
    filename = '../../Big_Files/CRSS_Merged_Raw_Data.csv'
    data = pd.read_csv(filename, index_col=None)
    
    print ('data.shape: ', data.shape)
    
    return data

#Import_Data()


# Tools

## Narrow_Dataset()

In [5]:
def Narrow_Dataset(data, Features):
    print ('Narrow_Dataset()')
    data_narrow = pd.DataFrame()
  

    for f in Features:
        data_narrow[f] = data[f]
        
    data_narrow = data_narrow.reindex(sorted(data_narrow.columns), axis=1)    
    
    print ()
    return data_narrow

## Feature_Names()

In [6]:
def Feature_Names(data, Named_Features):
    print ('Feature_Names')
    D = {}
    for f in Named_Features:
        g = f + 'NAME'
        A = pd.concat([data[f],data[g]], axis=1)
        A.drop_duplicates(inplace=True)
        A.dropna(inplace=True)
#        print (f)
#        print (len(A))
#        print (A.head())
#        print ()
        B = dict(zip(A[f],A[g]))
        D[f] = B
#        print (B)
#        print ()
#    print (D)
    print ()
    return D
        

## Remove_Unknowns_in_Feature()

In [7]:
def Remove_Unknowns_in_Feature(data, feature):
    
    Unknowns = {     
    # Accident
        'DAY_WEEK': [9],
        'HOUR': [99],
        'INT_HWY': [9],
        'LGT_COND': [8,9],
#        'MAN_COLL': [98,99],
        'MONTH': [],
        'REL_ROAD': [98,99],
        'RELJCT2': [98,99],
        'TYP_INT': [98,99],
        'WEATHER': [98,99],
    # Vehicle
        'BDYTYP_IM': [],
        'BODY_TYP': [98, 99, 49, 79],
        'BUS_USE': [98, 99],
        'DR_ZIP': [9998, 9999],
        'EMER_USE': [8, 9],
        'MAKE': [99],
        'MOD_YEAR': [9998, 9999],
        'MODEL': [],
        'NUMOCCS': [99],
        'VALIGN': [8, 9],
        'VNUM_LAN': [8, 9],
        'VPROFILE': [8, 9],
        'VSPD_LIM': [98, 99],
        'VSURCOND': [98, 99],
        'VTRAFCON': [97, 99],
        'VTRAFWAY': [8, 9],
    # Person
        'SEX_IM': [],
        'AGE': [998,999,],
        'HOSPITAL': [],
        'LOCATION': [98,99,],
        'PER_TYP': [],        
    }
    

    if feature in Unknowns:
        unknown_codes = Unknowns[feature]
        is_unknown = data[feature].isin(unknown_codes)
        print ('Remove_Unknowns_in_Feature ', feature, unknown_codes, int(is_unknown.sum()), ' unknown')
        # Keep only the rows whose value is a known code.
        return data[~is_unknown], unknown_codes
    else:
        # Features not listed above have no unknown codes to remove.
        return data, []
    


## Correlation()

In [8]:
def Correlation(data, target, feature, value, name):
    # Despite the name, this returns two percentages, not a correlation coefficient:
    #   per  = percentage of all crash persons that have this value of the feature
    #   corr = percentage of those crash persons whose target is 1
    # data[feature] must already be a 0/1 indicator for the value of interest.
    contingency_matrix = pd.crosstab(data[target], data[feature])
    cm = contingency_matrix.values.tolist()
    if len(cm)==2 and len(cm[0])==2:
        corr = cm[1][1] / (cm[0][1] + cm[1][1])
        per = (cm[0][1] + cm[1][1])/(cm[0][0] + cm[0][1] + cm[1][0] + cm[1][1])
    else:
        # Degenerate table: only one class present in the target or the indicator.
        corr = 0
        per = 0
    per = round(per*100,4)
    corr = round(corr*100,4)
    return (per, corr)

def Correlation_by_Value(data, target, feature, Feature_Names_Dict, Unknowns):
# I decided against the np.unique because it treats each nan as a separate entry.
#    V = np.unique(data[feature].values) 
    V = data[feature].unique()
#    print (V)
    B = []

    for value in V:
        A = pd.DataFrame()
        A[feature] = data[feature].apply(lambda x: 1 if x==value else 0)
        A[target] = data[target]
        if feature in Feature_Names_Dict:
            if value in Feature_Names_Dict[feature]:
                name = Feature_Names_Dict[feature][value]
            else:
                name=str(value)
        else:
            name = str(value)
#        if len(name)>30:
#            name = name[:30]
        per, corr = Correlation(A, target, feature, value, name)
        B.append([feature, value, name, per, corr])
#    print (feature)
    # Order the values from highest to lowest percentage going to the hospital.
    B = sorted(B, key=lambda x:x[4], reverse=True)

    # Print grouped into 100/p blocks of same size
    print ("    feature = '%s'" % feature)
    print ('    A = [')
    p = 20
    s = 0.0
    s2 = 0.0
    n=0
    print ("        ['%d', [" % n , end='')
    for b in B:
        t = s + b[3]
        if b[3]<10:
            s2 = s2 + b[3]
        q = int(s/p)
        r = int((t-0.001)/p)
        if r>q or b[3]>10:
            print ("]], # ", round(s2,4), '%')
            s2 = 0.0
            n += 1
            print ("        ['%d', [" % n , end='')
        s = t
        
        c = b[1]
        try:
            c = int(c)    # print 4 rather than 4.0
        except (TypeError, ValueError, OverflowError):
            pass          # leave non-numeric codes (or NaN) as they are
        print (c, end=',')
        if b[3]>10:
            print ("]], # ", round(b[3],4), '%')
            s2=0.0
            n += 1
            print ("        ['%d', [" % n , end='')
    print ("]], # ", round(s2,4), '%')
    print ("        ['Unknowns', [", end='')
    for u in Unknowns:
        print (u, end=', ')
    print ("]]" )
    print ('    ]')
    print ('    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)')
    print ()
    
    C = pd.DataFrame(B)
    C.columns = ['Feature', 'Code', 'Name', 'Per', 'Corr']
#    C.drop(C[C['Per'] < 0.1].index, inplace=True)
#    print (C)
    display(C)

    # Write the table ordered by percentage going to the hospital.
    # The with-blocks make sure each file is closed after writing.
    with open('../Correlation/Correlation_' + feature + '.tex', 'w') as TeX:
        E = [c for c in B if c[3]>=0.0]
        for c in E:
            a = c[0]
            b = c[1]
            d = c[2]
            e = "{:.4f}".format(c[3])
            f = "{:.4f}".format(c[4])
            TeX.write('\t & \\verb|%s| & %s & %s & %s & %s \\cr\n' % (a,b,d,e,f))

    # Write the same table ordered by code.
    with open('../Correlation/Correlation_Ordered_' + feature + '.tex', 'w') as TeX:
        E = sorted(B, key=lambda x:x[1], reverse=False)
        for c in E:
            a = c[0]
            b = c[1]
            d = c[2]
            e = "{:.4f}".format(c[3])
            f = "{:.4f}".format(c[4])
            TeX.write('\t & \\verb|%s| & %s & %s & %s & %s \\cr\n' % (a,b,d,e,f))
    

    print ()
    return B

def Correlation_All(data, target, Feature_Names_Dict):
    print ('Correlation_All')
    
    C = []
    for feature in data:
        data_temp, Unknowns = Remove_Unknowns_in_Feature(data, feature)
        U = data_temp[feature].unique()
#        print (feature, len(U))
        if len(U)<10000:
            B = Correlation_by_Value(
                data_temp, target, feature, Feature_Names_Dict, Unknowns
            )
            for b in B:
                C.append(b)
#            print ()
#        print ()
#    for c in C:
#        print (c)
#    print ()
    C = sorted(C, key=lambda x:x[4], reverse=True)
    D = pd.DataFrame(C)
    D.columns = ['Feature', 'Code', 'Name', 'Per', 'Corr']
    print (D)
    print ()
    
    D.drop(D[D['Per'] < 0.5].index, inplace=True)
    print (D)
    print ()
    
    # Write only the values that cover at least 0.5% of crash persons.
    E = [c for c in C if c[3]>=0.5]
    with open('../Correlation/Correlation.tex', 'w') as TeX:
        for c in E:
            a = c[0]
            b = c[1]
            d = c[2]
            e = "{:.4f}".format(c[3])
            f = "{:.4f}".format(c[4])
            TeX.write('\\verb|%s| & %s & %s & %s & %s \\cr\n' % (a,b,d,e,f))

    return 0
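
The `Per` and `Corr` columns in the output come from a 2x2 contingency table built for each value of a feature.  `Corr` is a conditional percentage (the share of crash persons with that value who went to the hospital), not a correlation coefficient.  The sketch below repeats the arithmetic from `Correlation()` on a plain list so it can be checked by hand; the function name `per_and_rate` is hypothetical and the counts are made up for illustration, not taken from CRSS.

```python
def per_and_rate(cm):
    """Compute (Per, Corr) from a 2x2 table of counts.

    cm[i][j]: i is the target (0 = no hospital, 1 = hospital),
              j is the indicator (0 = another value, 1 = this value).
    """
    with_value = cm[0][1] + cm[1][1]
    total = cm[0][0] + cm[0][1] + cm[1][0] + cm[1][1]
    per = round(100 * with_value / total, 4)        # % of crash persons with this value
    rate = round(100 * cm[1][1] / with_value, 4)    # % of those who went to the hospital
    return per, rate

# Made-up counts for illustration only (not CRSS data):
cm = [[700, 150],
      [100,  50]]
print(per_and_rate(cm))
```

Here 200 of 1,000 persons have the value (Per = 20.0), and 50 of those 200 went to the hospital (Corr = 25.0).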

    

# Main()

In [9]:
def Main():
    target = 'HOSPITAL'
    data = Import_Data()
    
    Features = [
    # Accident Dataset
        'DAY_WEEK',
        'HOUR',
        'INT_HWY',
        'LGT_COND',
        'MONTH',
#        'PEDS',
        'PERMVIT',
        'PERNOTMVIT',
        'PJ',
        'PSU',
        'PVH_INVL',
        'REGION',
        'REL_ROAD',
        'RELJCT1',
        'RELJCT2',
        'SCH_BUS',
        'TYP_INT',
        'URBANICITY',
        'VE_FORMS',
        'VE_TOTAL',
        'WEATHER',
        'WRK_ZONE',
        'YEAR',
    # Vehicle Dataset
        'BODY_TYP',
        'BUS_USE',
        'DR_ZIP',
        'EMER_USE',
        'MAKE',
        'MOD_YEAR',
        'MODEL',
        'NUMOCCS',
        'VALIGN',
        'VNUM_LAN',
        'VPROFILE',
        'VSPD_LIM',
        'VSURCOND',
        'VTRAFCON',
        'VTRAFWAY',
    # Person Dataset
        'AGE',
        'HOSPITAL',
        'LOCATION',
        'PER_TYP',
        'SEX',
    ]

    data = Narrow_Dataset(data, Features)
    
    print ('Features in data, with Number of Unique Values and Number of Blank Values')
    for feature in data:
        U = data[feature].unique()
        s = data[feature].isna().sum()
        print (feature, len(U), s)
    print ()
        
#    Feature_Names_Dict = Feature_Names(data, Features)
    Feature_Names_Dict = {}

#    print (Feature_Names_Dict)
    
    Correlation_All(data, target, Feature_Names_Dict)


Main()

Import_Data()
data.shape:  (713566, 118)
Narrow_Dataset()

Features in data, with Number of Unique Values and Number of Blank Values
AGE 116 0
BODY_TYP 72 0
BUS_USE 9 0
DAY_WEEK 7 0
DR_ZIP 18508 0
EMER_USE 8 0
HOSPITAL 2 0
HOUR 25 0
INT_HWY 3 0
LGT_COND 9 0
LOCATION 1 0
MAKE 70 0
MODEL 141 0
MOD_YEAR 85 0
MONTH 12 0
NUMOCCS 65 0
PERMVIT 25 0
PERNOTMVIT 8 0
PER_TYP 3 0
PJ 424 0
PSU 60 0
PVH_INVL 12 0
REGION 4 0
RELJCT1 4 0
RELJCT2 15 0
REL_ROAD 13 0
SCH_BUS 2 0
SEX 4 0
TYP_INT 11 0
URBANICITY 2 0
VALIGN 7 0
VE_FORMS 14 0
VE_TOTAL 14 0
VNUM_LAN 10 0
VPROFILE 9 0
VSPD_LIM 20 0
VSURCOND 13 0
VTRAFCON 19 0
VTRAFWAY 9 0
WEATHER 13 0
WRK_ZONE 5 0
YEAR 6 0

Correlation_All
Remove_Unknowns_in_Feature  AGE [998, 999] 42380  unknown
    feature = 'AGE'
    A = [
        ['0', [106,119,105,117,100,108,118,94,91,98,90,92,116,101,82,89,87,83,97,88,86,84,95,85,81,77,79,76,78,80,74,96,75,71,93,70,72,73,57,63,62,54,66,68,67,60,58,69,65,61,64,51,]], #  20.5399 %
        ['1', [55,32,52,56,59,53,33,35,30

Unnamed: 0,Feature,Code,Name,Per,Corr
0,AGE,106,106,0.0001,100.0
1,AGE,119,119,0.001,71.4286
2,AGE,105,105,0.0006,50.0
3,AGE,117,117,0.0003,50.0
4,AGE,100,100,0.0013,33.3333
5,AGE,108,108,0.0004,33.3333
6,AGE,118,118,0.0004,33.3333
7,AGE,94,94,0.013,27.5862
8,AGE,91,91,0.0353,27.0042
9,AGE,98,98,0.0028,26.3158



Remove_Unknowns_in_Feature  BODY_TYP [98, 99, 49, 79] 24194  unknown
    feature = 'BODY_TYP'
    A = [
        ['0', [86,87,82,83,89,81,84,80,88,85,90,11,96,95,97,45,58,12,32,8,42,3,1,2,19,52,59,]], #  9.0438 %
        ['1', [4,]], #  36.1961 %
        ['2', [30,5,9,10,91,6,16,]], #  11.3281 %
        ['3', [14,]], #  16.1724 %
        ['4', [22,20,40,]], #  14.0126 %
        ['5', [34,29,15,31,17,39,55,28,21,93,92,48,50,7,51,61,67,63,62,66,65,78,64,72,60,71,73,94,41,]], #  13.2471 %
        ['Unknowns', [98, 99, 49, 79, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,BODY_TYP,86,86,0.0003,100.0
1,BODY_TYP,87,87,0.0009,83.3333
2,BODY_TYP,82,82,0.0119,69.5122
3,BODY_TYP,83,83,0.0421,67.5862
4,BODY_TYP,89,89,0.1387,66.7364
5,BODY_TYP,81,81,0.102,65.2916
6,BODY_TYP,84,84,0.1584,62.9121
7,BODY_TYP,80,80,2.3416,62.2476
8,BODY_TYP,88,88,0.0276,60.0
9,BODY_TYP,85,85,0.0122,52.381



Remove_Unknowns_in_Feature  BUS_USE [98, 99] 7042  unknown
    feature = 'BUS_USE'
    A = [
        ['0', [5,6,]], #  0.1823 %
        ['1', [0,]], #  99.5662 %
        ['2', [7,8,1,4,]], #  0.2515 %
        ['Unknowns', [98, 99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,BUS_USE,5,5,0.0178,30.9524
1,BUS_USE,6,6,0.1645,16.6093
2,BUS_USE,0,0,99.5662,15.2748
3,BUS_USE,7,7,0.0164,13.7931
4,BUS_USE,8,8,0.0076,9.2593
5,BUS_USE,1,1,0.2194,8.1935
6,BUS_USE,4,4,0.0081,5.2632



Remove_Unknowns_in_Feature  DAY_WEEK [9] 0  unknown
    feature = 'DAY_WEEK'
    A = [
        ['0', []], #  0.0 %
        ['1', [1,]], #  11.306 %
        ['2', []], #  0.0 %
        ['3', [7,]], #  13.834 %
        ['4', []], #  0.0 %
        ['5', [2,]], #  13.7551 %
        ['6', []], #  0.0 %
        ['7', [4,]], #  14.6777 %
        ['8', []], #  0.0 %
        ['9', [5,]], #  15.014 %
        ['10', []], #  0.0 %
        ['11', [3,]], #  14.3922 %
        ['12', []], #  0.0 %
        ['13', [6,]], #  17.0209 %
        ['14', []], #  0.0 %
        ['Unknowns', [9, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,DAY_WEEK,1,1,11.306,18.2198
1,DAY_WEEK,7,7,13.834,16.7877
2,DAY_WEEK,2,2,13.7551,14.7241
3,DAY_WEEK,4,4,14.6777,14.4718
4,DAY_WEEK,5,5,15.014,14.4164
5,DAY_WEEK,3,3,14.3922,14.2369
6,DAY_WEEK,6,6,17.0209,14.0052



Remove_Unknowns_in_Feature  DR_ZIP [9998, 9999] 0  unknown
Remove_Unknowns_in_Feature  EMER_USE [8, 9] 5064  unknown
    feature = 'EMER_USE'
    A = [
        ['0', [6,5,]], #  0.197 %
        ['1', [0,]], #  99.6821 %
        ['2', [3,4,2,]], #  0.1208 %
        ['Unknowns', [8, 9, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,EMER_USE,6,6,0.071,21.0736
1,EMER_USE,5,5,0.126,17.6932
2,EMER_USE,0,0,99.6821,15.2171
3,EMER_USE,3,3,0.0113,13.75
4,EMER_USE,4,4,0.0189,12.6866
5,EMER_USE,2,2,0.0906,11.215



Remove_Unknowns_in_Feature  HOSPITAL [] 0  unknown
    feature = 'HOSPITAL'
    A = [
        ['0', []], #  0.0 %
        ['1', [0,]], #  15.1291 %
        ['2', []], #  0.0 %
        ['3', [1,]], #  15.1291 %
        ['4', []], #  0.0 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,HOSPITAL,0,0,15.1291,100.0
1,HOSPITAL,1,1,15.1291,100.0



Remove_Unknowns_in_Feature  HOUR [99] 2084  unknown
    feature = 'HOUR'
    A = [
        ['0', [3,2,1,4,0,23,5,22,21,20,6,]], #  24.5247 %
        ['1', [19,10,9,14,]], #  20.0693 %
        ['2', [11,18,13,]], #  19.5796 %
        ['3', [12,8,16,]], #  17.9066 %
        ['4', [7,15,17,]], #  17.9198 %
        ['Unknowns', [99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,HOUR,3,3,0.7306,26.8372
1,HOUR,2,2,0.9483,26.6044
2,HOUR,1,1,1.0665,26.5419
3,HOUR,4,4,0.7089,25.6741
4,HOUR,0,0,1.344,23.0705
5,HOUR,23,23,1.8303,22.846
6,HOUR,5,5,1.1962,21.2548
7,HOUR,22,22,2.4795,20.4524
8,HOUR,21,21,3.2342,18.7128
9,HOUR,20,20,3.7818,17.9284



Remove_Unknowns_in_Feature  INT_HWY [9] 66  unknown
    feature = 'INT_HWY'
    A = [
        ['0', []], #  0.0 %
        ['1', [0,]], #  89.3153 %
        ['2', []], #  0.0 %
        ['3', [1,]], #  10.6847 %
        ['4', []], #  0.0 %
        ['Unknowns', [9, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,INT_HWY,0,0,89.3153,15.1501
1,INT_HWY,1,1,10.6847,14.9564



Remove_Unknowns_in_Feature  LGT_COND [8, 9] 4049  unknown
    feature = 'LGT_COND'
    A = [
        ['0', [2,4,]], #  9.4447 %
        ['1', [3,]], #  15.6344 %
        ['2', [5,6,]], #  2.9786 %
        ['3', [1,]], #  71.9161 %
        ['4', [7,]], #  0.0261 %
        ['Unknowns', [8, 9, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,LGT_COND,2,2,8.1829,22.7854
1,LGT_COND,4,4,1.2618,17.6812
2,LGT_COND,3,3,15.6344,16.9451
3,LGT_COND,5,5,2.4088,15.5052
4,LGT_COND,6,6,0.5698,15.1125
5,LGT_COND,1,1,71.9161,13.8767
6,LGT_COND,7,7,0.0261,12.973



Remove_Unknowns_in_Feature  LOCATION [98, 99] 0  unknown
    feature = 'LOCATION'
    A = [
        ['0', [0,]], #  0.0 %
        ['Unknowns', [98, 99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,LOCATION,0,0,0,0



Remove_Unknowns_in_Feature  MAKE [99] 13241  unknown
    feature = 'MAKE'
    A = [
        ['0', [71,74,76,73,72,50,77,53,98,64,65,43,21,9,92,22,52,14,24,18,37,63,]], #  28.0121 %
        ['1', [35,55,6,39,36,13,]], #  6.6778 %
        ['2', [20,]], #  12.6246 %
        ['3', [69,67,34,]], #  1.5972 %
        ['4', [49,]], #  11.6921 %
        ['5', []], #  0.0 %
        ['6', [12,]], #  13.267 %
        ['7', [19,41,30,58,]], #  8.0808 %
        ['8', [2,42,54,7,25,47,93,23,59,48,3,38,62,32,51,45,31,29,90,10,94,86,84,82,85,89,87,97,33,46,1,]], #  18.0481 %
        ['Unknowns', [99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,MAKE,71,71,0.0257,66.6667
1,MAKE,74,74,0.0026,66.6667
2,MAKE,76,76,0.3551,64.0933
3,MAKE,73,73,0.2804,62.3727
4,MAKE,72,72,0.9141,62.1368
5,MAKE,50,50,0.0297,60.0962
6,MAKE,77,77,0.0206,52.7778
7,MAKE,53,53,0.4529,46.9735
8,MAKE,98,98,0.6768,32.9325
9,MAKE,64,64,0.0019,30.7692



Remove_Unknowns_in_Feature  MODEL [] 0  unknown
    feature = 'MODEL'
    A = [
        ['0', [63,709,704,703,701,706,705,702,707,799,734,733,907,739,56,16,12,11,4,19,471,29,399,59,22,445,20,50,9,6,52,3,37,408,18,43,7,480,]], #  20.3837 %
        ['1', [2,21,13,17,406,1,36,38,15,988,25,444,39,31,48,32,424,47,27,35,998,]], #  20.5006 %
        ['2', [23,33,989,498,431,34,405,404,40,28,51,26,44,401,407,]], #  21.8534 %
        ['3', [402,499,46,54,24,42,442,49,443,425,473,55,403,41,53,398,441,472,446,423,]], #  21.123 %
        ['4', [481,421,14,999,422,45,470,5,463,482,8,983,461,10,462,870,981,58,982,483,883,809,57,882,426,880,881,806,898,850,884,808,804,997,890,732,908,466,805,598,902,731,599,904,474,60,427,]], #  16.1392 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,MODEL,63,63,0.0004,66.6667
1,MODEL,709,709,0.5265,65.9303
2,MODEL,704,704,0.0345,64.2276
3,MODEL,703,703,0.1806,62.8394
4,MODEL,701,701,0.0988,62.1277
5,MODEL,706,706,1.4322,61.7808
6,MODEL,705,705,0.4365,61.7335
7,MODEL,702,702,0.0203,60.6897
8,MODEL,707,707,0.0085,60.6557
9,MODEL,799,799,0.0013,55.5556



Remove_Unknowns_in_Feature  MOD_YEAR [9998, 9999] 19387  unknown
    feature = 'MOD_YEAR'
    A = [
        ['0', [1929,1947,1962,1951,1961,1968,1955,1953,1956,1975,1982,1974,1965,1978,1970,1960,1959,1950,1986,1966,1964,1973,1985,1983,1976,1981,1987,1980,1984,1977,1979,1971,1991,1993,1992,1988,1994,1990,1931,1948,1954,1998,1989,1996,1997,2002,1995,2001,2000,1999,2004,]], #  20.7522 %
        ['1', [2003,2005,2006,2007,2020,]], #  19.3813 %
        ['2', [2019,2009,2008,2017,2018,]], #  20.1999 %
        ['3', [2021,2016,1969,1940,2015,]], #  20.7722 %
        ['4', [2013,2014,2022,2012,2011,2010,1967,1972,1957,1952,1928,1932,1933,1963,1958,1934,1930,]], #  18.8937 %
        ['Unknowns', [9998, 9999, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,MOD_YEAR,1929,1929,0.0001,100.0
1,MOD_YEAR,1947,1947,0.0003,100.0
2,MOD_YEAR,1962,1962,0.0004,66.6667
3,MOD_YEAR,1951,1951,0.0007,60.0
4,MOD_YEAR,1961,1961,0.0003,50.0
5,MOD_YEAR,1968,1968,0.0048,48.4848
6,MOD_YEAR,1955,1955,0.001,42.8571
7,MOD_YEAR,1953,1953,0.001,42.8571
8,MOD_YEAR,1956,1956,0.0007,40.0
9,MOD_YEAR,1975,1975,0.0052,38.8889



Remove_Unknowns_in_Feature  MONTH [] 0  unknown
    feature = 'MONTH'
    A = [
        ['0', [7,6,]], #  26.0073 %
        ['1', [8,5,]], #  15.191 %
        ['2', [4,9,3,]], #  26.5887 %
        ['3', [10,2,]], #  14.4245 %
        ['4', [1,11,12,]], #  17.7886 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,MONTH,7,7,8.5052,16.6255
1,MONTH,6,6,8.3279,16.3584
2,MONTH,8,8,9.1742,15.8942
3,MONTH,5,5,8.0316,15.7038
4,MONTH,4,4,7.1594,15.6224
5,MONTH,9,9,9.0999,15.5111
6,MONTH,3,3,7.5735,14.7533
7,MONTH,10,10,9.9153,14.6625
8,MONTH,2,2,7.0085,14.3631
9,MONTH,1,1,7.416,14.3448



Remove_Unknowns_in_Feature  NUMOCCS [99] 24156  unknown
    feature = 'NUMOCCS'
    A = [
        ['0', [59,26,35,37,31,13,10,33,8,14,20,27,7,]], #  0.5728 %
        ['1', [2,]], #  24.9571 %
        ['2', [6,]], #  0.8224 %
        ['3', [1,]], #  54.4133 %
        ['4', [11,21,]], #  0.0338 %
        ['5', [3,]], #  10.4153 %
        ['6', [5,38,16,4,12,9,17,19,34,25,28,24,43,49,23,15,29,22,18,40,32,55,53,50,44,51,30,39,41,75,47,95,52,54,62,60,56,58,46,65,57,48,36,45,77,]], #  8.7846 %
        ['Unknowns', [99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,NUMOCCS,59,59,0.001,85.7143
1,NUMOCCS,26,26,0.0051,74.2857
2,NUMOCCS,35,35,0.0023,43.75
3,NUMOCCS,37,37,0.0016,27.2727
4,NUMOCCS,31,31,0.0022,26.6667
5,NUMOCCS,13,13,0.0088,22.9508
6,NUMOCCS,10,10,0.0273,22.8723
7,NUMOCCS,33,33,0.0015,20.0
8,NUMOCCS,8,8,0.1562,18.2916
9,NUMOCCS,14,14,0.0138,17.8947



    feature = 'PERMVIT'
    A = [
        ['0', [29,1,21,22,14,20,13,8,15,9,10,7,]], #  14.2472 %
        ['1', [4,]], #  14.3516 %
        ['2', []], #  0.0 %
        ['3', [3,]], #  23.4672 %
        ['4', []], #  9.0994 %
        ['5', [5,6,11,]], #  5.4105 %
        ['6', [2,]], #  33.1765 %
        ['7', [12,24,16,19,18,17,75,]], #  0.2478 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,PERMVIT,29,29,0.0041,96.5517
1,PERMVIT,1,1,8.6581,28.0313
2,PERMVIT,21,21,0.0029,19.0476
3,PERMVIT,22,22,0.0031,18.1818
4,PERMVIT,14,14,0.0471,17.8571
5,PERMVIT,20,20,0.014,17.0
6,PERMVIT,13,13,0.0893,16.9545
7,PERMVIT,8,8,1.4261,16.2146
8,PERMVIT,15,15,0.0441,15.873
9,PERMVIT,9,9,0.8072,15.1042



    feature = 'PERNOTMVIT'
    A = [
        ['0', [3,2,6,4,7,1,]], #  0.3414 %
        ['1', [0,]], #  99.6555 %
        ['2', [5,]], #  0.0029 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,PERNOTMVIT,3,3,0.013,21.5054
1,PERNOTMVIT,2,2,0.0556,20.1511
2,PERNOTMVIT,6,6,0.0007,20.0
3,PERNOTMVIT,4,4,0.0078,19.6429
4,PERNOTMVIT,7,7,0.0022,18.75
5,PERNOTMVIT,1,1,0.2621,16.3636
6,PERNOTMVIT,0,0,99.6555,15.1218
7,PERNOTMVIT,5,5,0.0029,14.2857



Remove_Unknowns_in_Feature  PER_TYP [] 0  unknown
    feature = 'PER_TYP'
    A = [
        ['0', [9,]], #  0.0226 %
        ['1', [2,]], #  26.5073 %
        ['2', []], #  0.0 %
        ['3', [1,]], #  73.4701 %
        ['4', []], #  0.0 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,PER_TYP,9,9,0.0226,21.7391
1,PER_TYP,2,2,26.5073,15.8586
2,PER_TYP,1,1,73.4701,14.8639



    feature = 'PJ'
    A = [
        ['0', [3087,149,147,2904,3069,3292,47,3090,1225,598,2800,610,2705,1645,2412,587,3089,2211,453,2171,2514,2764,2139,2537,1222,1688,1801,2679,4113,1741,2582,2722,189,307,2298,2775,85,172,2330,4047,1223,4144,4056,1362,2793,1231,299,173,96,1684,1634,1070,1838,3224,1750,256,1392,3262,1227,2160,1747,91,1219,1055,4147,2591,308,1802,2735,3106,171,3076,1460,542,2586,86,2513,4107,46,305,1766,1315,565,1308,606,1804,2001,313,1805,1230,3077,2881,1800,3247,718,2286,1692,2883,311,3073,1829,2035,295,839,1709,4135,2592,268,1757,1803,1921,234,4149,1678,2702,1051,618,209,]], #  20.3832 %
        ['1', [4152,2906,2905,2087,1053,2018,260,306,526,4125,1762,2749,4016,250,2607,2972,2292,205,2854,2973,1290,3122,4015,97,359,245,285,2803,970,458,1708,4150,1693,2670,297,4045,2809,261,1197,2811,206,1036,1811,267,1733,455,4151,1714,148,640,1695,2598,4153,161,174,591,625,322,4114,4138,1259,1764,4141,2365,3004,1723,4148,92,2034,2807,1746,3296,315,432,3139,]], #  20.2258 %
        

Unnamed: 0,Feature,Code,Name,Per,Corr
0,PJ,3087,3087,0.0041,51.7241
1,PJ,149,149,0.0011,50.0
2,PJ,147,147,0.0028,45.0
3,PJ,2904,2904,0.0048,44.1176
4,PJ,3069,3069,0.0352,43.4263
5,PJ,3292,3292,0.0395,42.9078
6,PJ,47,47,0.0762,42.2794
7,PJ,3090,3090,0.0077,40.0
8,PJ,1225,1225,0.0349,38.9558
9,PJ,598,598,0.0741,38.7524



    feature = 'PSU'
    A = [
        ['0', [15,75,57,40,52,64,76,80,34,68,66,50,60,24,63,17,55,]], #  20.6357 %
        ['1', [47,62,53,10,49,30,48,82,35,31,25,72,]], #  20.5044 %
        ['2', [65,56,32,45,83,78,67,12,22,]], #  19.1893 %
        ['3', [58,14,26,70,28,33,81,13,77,20,29,54,61,]], #  21.1101 %
        ['4', [27,39,41,51,59,38,37,46,44,]], #  18.5605 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,PSU,15,15,0.2454,30.2113
1,PSU,75,75,1.2499,28.8261
2,PSU,57,57,0.5729,26.4922
3,PSU,40,40,1.5638,25.764
4,PSU,52,52,0.9094,24.2102
5,PSU,64,64,0.817,23.0875
6,PSU,76,76,0.8271,23.0261
7,PSU,80,80,1.7029,22.8952
8,PSU,34,34,0.6524,22.5134
9,PSU,68,68,1.4294,22.2255



    feature = 'PVH_INVL'
    A = [
        ['0', [10,12,7,8,4,5,9,3,6,2,]], #  0.4125 %
        ['1', [0,]], #  97.5149 %
        ['2', [1,]], #  2.0727 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,PVH_INVL,10,10,0.0001,100.0
1,PVH_INVL,12,12,0.0003,100.0
2,PVH_INVL,7,7,0.0013,77.7778
3,PVH_INVL,8,8,0.0003,50.0
4,PVH_INVL,4,4,0.0255,37.9121
5,PVH_INVL,5,5,0.007,36.0
6,PVH_INVL,9,9,0.0004,33.3333
7,PVH_INVL,3,3,0.0816,32.3024
8,PVH_INVL,6,6,0.0034,29.1667
9,PVH_INVL,2,2,0.2926,27.2989



    feature = 'REGION'
    A = [
        ['0', []], #  0.0 %
        ['1', [1,]], #  10.9083 %
        ['2', []], #  0.0 %
        ['3', [3,]], #  55.2993 %
        ['4', []], #  0.0 %
        ['5', [2,]], #  17.8327 %
        ['6', []], #  0.0 %
        ['7', [4,]], #  15.9597 %
        ['8', []], #  0.0 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,REGION,1,1,10.9083,18.9098
1,REGION,3,3,55.2993,15.1991
2,REGION,2,2,17.8327,14.3436
3,REGION,4,4,15.9597,13.1802



    feature = 'RELJCT1'
    A = [
        ['0', []], #  0.0 %
        ['1', [0,]], #  74.3341 %
        ['2', []], #  0.0 %
        ['3', [8,]], #  21.458 %
        ['4', [1,9,]], #  4.2079 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,RELJCT1,0,0,74.3341,15.2418
1,RELJCT1,8,8,21.458,14.8344
2,RELJCT1,1,1,4.1716,14.7008
3,RELJCT1,9,9,0.0363,7.722
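
The percentage in each bin's trailing comment appears to be the sum of the `Per` values of the codes in that bin. The sketch below checks this for RELJCT1, where all four codes are visible in the table; the names `per`, `bins`, and `bin_share` are illustrative and not from the original code.

```python
# Per-code shares (Per column) copied from the RELJCT1 table above.
per = {0: 74.3341, 8: 21.458, 1: 4.1716, 9: 0.0363}

# Bin labels, code lists, and commented percentages copied from the RELJCT1 block.
bins = [
    ('0', [], 0.0),
    ('1', [0], 74.3341),
    ('2', [], 0.0),
    ('3', [8], 21.458),
    ('4', [1, 9], 4.2079),
]

def bin_share(codes, per):
    """Sum the per-code shares of one bin, rounded like the printed values."""
    return round(sum(per[c] for c in codes), 4)

# Collect every bin whose computed share disagrees with its comment.
mismatches = [
    (label, bin_share(codes, per), stated)
    for label, codes, stated in bins
    if abs(bin_share(codes, per) - stated) > 1e-3
]
print(mismatches)
```

An empty list means the comments agree with the table. The same check can be run on any feature whose full per-code table is available.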



Remove_Unknowns_in_Feature  RELJCT2 [98, 99] 42273  unknown
    feature = 'RELJCT2'
    A = [
        ['0', [19,6,5,]], #  1.2531 %
        ['1', [2,]], #  28.3024 %
        ['2', [16,]], #  0.0027 %
        ['3', [1,]], #  36.8757 %
        ['4', [7,8,18,17,]], #  9.2455 %
        ['5', [3,]], #  22.6171 %
        ['6', [4,20,]], #  1.7036 %
        ['Unknowns', [98, 99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,RELJCT2,19,19,0.2395,34.3284
1,RELJCT2,6,6,0.0705,20.296
2,RELJCT2,5,5,0.9431,19.444
3,RELJCT2,2,2,28.3024,18.7445
4,RELJCT2,16,16,0.0027,16.6667
5,RELJCT2,1,1,36.8757,16.6403
6,RELJCT2,7,7,0.1861,14.7318
7,RELJCT2,8,8,7.4513,12.9268
8,RELJCT2,18,18,1.529,11.7498
9,RELJCT2,17,17,0.0791,10.1695



Remove_Unknowns_in_Feature  REL_ROAD [98, 99] 222  unknown
    feature = 'REL_ROAD'
    A = [
        ['0', [10,8,4,6,3,12,5,2,]], #  9.9969 %
        ['1', [1,]], #  88.4404 %
        ['2', [11,7,]], #  1.5626 %
        ['Unknowns', [98, 99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,REL_ROAD,10,10,0.0182,42.3077
1,REL_ROAD,8,8,0.1036,38.1597
2,REL_ROAD,4,4,7.4926,35.6646
3,REL_ROAD,6,6,0.1333,32.387
4,REL_ROAD,3,3,1.4766,31.1023
5,REL_ROAD,12,12,0.0199,30.2817
6,REL_ROAD,5,5,0.4449,29.3006
7,REL_ROAD,2,2,0.3078,23.2696
8,REL_ROAD,1,1,88.4404,13.0393
9,REL_ROAD,11,11,0.1086,12.9032



    feature = 'SCH_BUS'
    A = [
        ['0', []], #  0.0 %
        ['1', [0,]], #  99.5024 %
        ['2', [1,]], #  0.4976 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,SCH_BUS,0,0,99.5024,15.1521
1,SCH_BUS,1,1,0.4976,10.5322
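
Reading `Per` as the percentage of crash persons with a given code and `Corr` as the percentage of those persons who went to the hospital (consistent with the HOUR table columns described earlier), a table like the one above could be produced roughly as follows. The toy data and the function name `per_and_corr` are assumptions; this is not the notebook's actual implementation.

```python
import pandas as pd

# Made-up toy data: one row per crash person.
df = pd.DataFrame({
    'SCH_BUS':  [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    'HOSPITAL': [1, 0, 0, 0, 0, 0, 0, 1, 0, 1],
})

def per_and_corr(df, feature, target='HOSPITAL'):
    """Share of rows per code (Per) and hospital rate within each code (Corr)."""
    grouped = df.groupby(feature)[target]
    out = pd.DataFrame({
        'Per': 100 * grouped.size() / len(df),
        'Corr': 100 * grouped.mean(),
    })
    # Sort so the codes most associated with hospitalization come first.
    return out.sort_values('Corr', ascending=False)

table = per_and_corr(df, 'SCH_BUS')
print(table)
```

Sorting by `Corr` mirrors the ordering of the tables in this section, where codes with similar hospitalization rates end up adjacent and can be grouped into the same bin.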



    feature = 'SEX'
    A = [
        ['0', []], #  0.0 %
        ['1', [2,]], #  43.8701 %
        ['2', []], #  0.0 %
        ['3', [1,]], #  52.3111 %
        ['4', [8,9,]], #  3.8188 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,SEX,2,2,43.8701,16.7236
1,SEX,1,1,52.3111,14.7688
2,SEX,8,8,2.423,2.3771
3,SEX,9,9,1.3958,0.6526



Remove_Unknowns_in_Feature  TYP_INT [98, 99] 74693  unknown
    feature = 'TYP_INT'
    A = [
        ['0', [11,]], #  0.0045 %
        ['1', [1,]], #  57.3325 %
        ['2', [10,]], #  0.0388 %
        ['3', [3,]], #  11.9265 %
        ['4', [4,]], #  0.3267 %
        ['5', [2,]], #  29.8771 %
        ['6', [7,6,5,]], #  0.4939 %
        ['Unknowns', [98, 99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,TYP_INT,11,11,0.0045,17.2414
1,TYP_INT,1,1,57.3325,15.8711
2,TYP_INT,10,10,0.0388,15.7258
3,TYP_INT,3,3,11.9265,15.7044
4,TYP_INT,4,4,0.3267,15.6684
5,TYP_INT,2,2,29.8771,15.3884
6,TYP_INT,7,7,0.2212,15.2866
7,TYP_INT,6,6,0.2077,8.2894
8,TYP_INT,5,5,0.065,6.988



    feature = 'URBANICITY'
    A = [
        ['0', []], #  0.0 %
        ['1', [2,]], #  22.6908 %
        ['2', []], #  0.0 %
        ['3', [1,]], #  77.3092 %
        ['4', []], #  0.0 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,URBANICITY,2,2,22.6908,16.573
1,URBANICITY,1,1,77.3092,14.7053



Remove_Unknowns_in_Feature  VALIGN [8, 9] 43563  unknown
    feature = 'VALIGN'
    A = [
        ['0', [3,2,4,]], #  8.513 %
        ['1', [1,]], #  88.9692 %
        ['2', [0,]], #  2.5179 %
        ['Unknowns', [8, 9, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,VALIGN,3,3,3.4009,29.1451
1,VALIGN,2,2,3.7663,21.471
2,VALIGN,4,4,1.3458,17.9772
3,VALIGN,1,1,88.9692,14.5658
4,VALIGN,0,0,2.5179,8.5418



    feature = 'VE_FORMS'
    A = [
        ['0', [10,]], #  0.0059 %
        ['1', [1,]], #  13.723 %
        ['2', [9,8,13,15,7,6,5,4,]], #  3.651 %
        ['3', [3,]], #  11.0684 %
        ['4', []], #  0.0 %
        ['5', [2,]], #  71.5442 %
        ['6', [11,12,]], #  0.0076 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,VE_FORMS,10,10,0.0059,38.0952
1,VE_FORMS,1,1,13.723,28.4193
2,VE_FORMS,9,9,0.0171,24.5902
3,VE_FORMS,8,8,0.0303,21.7593
4,VE_FORMS,13,13,0.0021,20.0
5,VE_FORMS,15,15,0.0063,20.0
6,VE_FORMS,7,7,0.067,18.8285
7,VE_FORMS,6,6,0.1857,18.6415
8,VE_FORMS,5,5,0.6577,16.8123
9,VE_FORMS,4,4,2.6848,15.2991



    feature = 'VE_TOTAL'
    A = [
        ['0', [10,]], #  0.0071 %
        ['1', [1,]], #  11.763 %
        ['2', [13,8,9,6,7,15,5,4,]], #  3.8895 %
        ['3', [3,]], #  11.5525 %
        ['4', [11,]], #  0.0062 %
        ['5', [2,]], #  72.7797 %
        ['6', [12,]], #  0.002 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,VE_TOTAL,10,10,0.0071,37.2549
1,VE_TOTAL,1,1,11.763,30.9196
2,VE_TOTAL,13,13,0.0024,29.4118
3,VE_TOTAL,8,8,0.0359,24.6094
4,VE_TOTAL,9,9,0.0164,23.9316
5,VE_TOTAL,6,6,0.2063,21.3315
6,VE_TOTAL,7,7,0.0769,20.9472
7,VE_TOTAL,15,15,0.0063,20.0
8,VE_TOTAL,5,5,0.7083,17.966
9,VE_TOTAL,4,4,2.837,16.1381



Remove_Unknowns_in_Feature  VNUM_LAN [8, 9] 192305  unknown
    feature = 'VNUM_LAN'
    A = [
        ['0', []], #  0.0 %
        ['1', [2,]], #  44.5157 %
        ['2', []], #  0.0 %
        ['3', [4,]], #  14.9317 %
        ['4', []], #  0.0 %
        ['5', [3,]], #  18.4957 %
        ['6', []], #  0.0 %
        ['7', [5,]], #  11.2134 %
        ['8', [7,1,6,0,]], #  10.8435 %
        ['Unknowns', [8, 9, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,VNUM_LAN,2,2,44.5157,19.1318
1,VNUM_LAN,4,4,14.9317,14.2742
2,VNUM_LAN,3,3,18.4957,13.8262
3,VNUM_LAN,5,5,11.2134,13.7448
4,VNUM_LAN,7,7,1.435,12.5936
5,VNUM_LAN,1,1,2.4262,11.6391
6,VNUM_LAN,6,6,3.7459,11.2414
7,VNUM_LAN,0,0,3.2364,8.5418



Remove_Unknowns_in_Feature  VPROFILE [8, 9] 96835  unknown
    feature = 'VPROFILE'
    A = [
        ['0', [6,4,5,3,2,]], #  14.2509 %
        ['1', [1,]], #  83.0137 %
        ['2', [0,]], #  2.7354 %
        ['Unknowns', [8, 9, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,VPROFILE,6,6,2.6245,22.989
1,VPROFILE,4,4,0.1741,21.6946
2,VPROFILE,5,5,1.9676,21.3597
3,VPROFILE,3,3,1.2633,20.7547
4,VPROFILE,2,2,8.2214,15.056
5,VPROFILE,1,1,83.0137,15.0114
6,VPROFILE,0,0,2.7354,8.5418



Remove_Unknowns_in_Feature  VSPD_LIM [98, 99] 92357  unknown
    feature = 'VSPD_LIM'
    A = [
        ['0', [90,]], #  0.0002 %
        ['1', [55,]], #  11.1874 %
        ['2', [75,80,50,70,]], #  9.3987 %
        ['3', [65,]], #  5.8534 %
        ['4', [45,]], #  21.9134 %
        ['5', [60,]], #  2.0159 %
        ['6', [40,]], #  11.3393 %
        ['7', []], #  0.0 %
        ['8', [35,]], #  18.9864 %
        ['9', [25,30,20,0,15,5,10,]], #  19.3055 %
        ['Unknowns', [98, 99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,VSPD_LIM,90,90,0.0002,100.0
1,VSPD_LIM,55,55,11.1874,21.7563
2,VSPD_LIM,75,75,0.3403,19.9149
3,VSPD_LIM,80,80,0.1354,19.7384
4,VSPD_LIM,50,50,3.6373,18.3403
5,VSPD_LIM,70,70,5.2857,17.661
6,VSPD_LIM,65,65,5.8534,16.4347
7,VSPD_LIM,45,45,21.9134,15.1027
8,VSPD_LIM,60,60,2.0159,14.6131
9,VSPD_LIM,40,40,11.3393,13.8911



Remove_Unknowns_in_Feature  VSURCOND [98, 99] 25039  unknown
    feature = 'VSURCOND'
    A = [
        ['0', [5,11,7,8,6,]], #  0.2858 %
        ['1', [1,]], #  81.8341 %
        ['2', []], #  0.0 %
        ['3', [2,]], #  13.44 %
        ['4', [10,4,3,0,]], #  4.4401 %
        ['Unknowns', [98, 99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,VSURCOND,5,5,0.0084,37.931
1,VSURCOND,11,11,0.1057,25.8242
2,VSURCOND,7,7,0.0062,23.2558
3,VSURCOND,8,8,0.0298,20.4878
4,VSURCOND,6,6,0.1357,15.8458
5,VSURCOND,1,1,81.8341,15.4078
6,VSURCOND,2,2,13.44,15.0684
7,VSURCOND,10,10,0.1728,14.0336
8,VSURCOND,4,4,0.792,13.6439
9,VSURCOND,3,3,1.0251,11.193



Remove_Unknowns_in_Feature  VTRAFCON [97, 99] 56040  unknown
    feature = 'VTRAFCON'
    A = [
        ['0', [29,40,28,4,98,9,]], #  1.7918 %
        ['1', [0,]], #  62.0091 %
        ['2', [65,2,20,]], #  8.5536 %
        ['3', [3,]], #  25.6994 %
        ['4', [1,7,50,23,8,21,]], #  1.946 %
        ['Unknowns', [97, 99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,VTRAFCON,29,29,0.0038,28.0
1,VTRAFCON,40,40,0.9175,22.5427
2,VTRAFCON,28,28,0.1253,20.6311
3,VTRAFCON,4,4,0.3157,18.5453
4,VTRAFCON,98,98,0.4205,17.2514
5,VTRAFCON,9,9,0.009,16.9492
6,VTRAFCON,0,0,62.0091,15.9617
7,VTRAFCON,65,65,0.1274,15.8711
8,VTRAFCON,2,2,0.6305,13.9894
9,VTRAFCON,20,20,7.7957,13.6756



Remove_Unknowns_in_Feature  VTRAFWAY [8, 9] 119954  unknown
    feature = 'VTRAFWAY'
    A = [
        ['0', []], #  0.0 %
        ['1', [1,]], #  44.9086 %
        ['2', [5,]], #  5.7073 %
        ['3', [2,]], #  17.2146 %
        ['4', []], #  0.0 %
        ['5', [3,]], #  23.9544 %
        ['6', [6,4,0,]], #  8.2151 %
        ['Unknowns', [8, 9, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,VTRAFWAY,1,1,44.9086,17.7776
1,VTRAFWAY,5,5,5.7073,14.9798
2,VTRAFWAY,2,2,17.2146,14.9323
3,VTRAFWAY,3,3,23.9544,14.1108
4,VTRAFWAY,6,6,2.7142,11.5876
5,VTRAFWAY,4,4,2.659,10.8211
6,VTRAFWAY,0,0,2.8419,8.5418



Remove_Unknowns_in_Feature  WEATHER [98, 99] 30047  unknown
    feature = 'WEATHER'
    A = [
        ['0', [5,3,12,6,]], #  0.5666 %
        ['1', [1,]], #  73.5658 %
        ['2', []], #  0.0 %
        ['3', [2,]], #  9.1569 %
        ['4', [10,]], #  15.0742 %
        ['5', [8,11,4,7,]], #  1.6367 %
        ['Unknowns', [98, 99, ]]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,WEATHER,5,5,0.3586,22.4398
1,WEATHER,3,3,0.1159,17.5505
2,WEATHER,12,12,0.0309,17.5355
3,WEATHER,6,6,0.0612,17.2249
4,WEATHER,1,1,73.5658,15.3569
5,WEATHER,2,2,9.1569,14.9307
6,WEATHER,10,10,15.0742,14.9231
7,WEATHER,8,8,0.0541,14.5946
8,WEATHER,11,11,0.0486,12.3494
9,WEATHER,4,4,1.5173,11.9757



    feature = 'WRK_ZONE'
    A = [
        ['0', [3,]], #  0.0252 %
        ['1', [0,]], #  98.0404 %
        ['2', [2,1,4,]], #  1.9344 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,WRK_ZONE,3,3,0.0252,18.3333
1,WRK_ZONE,0,0,98.0404,15.1916
2,WRK_ZONE,2,2,0.1145,14.0759
3,WRK_ZONE,1,1,1.0009,12.0274
4,WRK_ZONE,4,4,0.819,11.4819



    feature = 'YEAR'
    A = [
        ['0', []], #  0.0 %
        ['1', [2020,]], #  16.8736 %
        ['2', []], #  0.0 %
        ['3', [2021,]], #  17.1339 %
        ['4', []], #  0.0 %
        ['5', [2017,]], #  17.8534 %
        ['6', []], #  0.0 %
        ['7', [2016,]], #  15.209 %
        ['8', []], #  0.0 %
        ['9', [2019,]], #  17.3716 %
        ['10', []], #  0.0 %
        ['11', [2018,]], #  15.5585 %
        ['12', []], #  0.0 %
        ['Unknowns', []]
    ]
    data = Build_Individual_Feature_with_Dict(df_Per, data, feature, A)



Unnamed: 0,Feature,Code,Name,Per,Corr
0,YEAR,2020,2020,16.8736,16.7586
1,YEAR,2021,2021,17.1339,15.7212
2,YEAR,2017,2017,17.8534,15.2705
3,YEAR,2016,2016,15.209,14.8269
4,YEAR,2019,2019,17.3716,14.0757
5,YEAR,2018,2018,15.5585,14.0191



       Feature  Code  Name      Per   Corr
0          AGE   106   106   0.0001  100.0
1     BODY_TYP    86    86   0.0003  100.0
2     HOSPITAL     0     0  15.1291  100.0
3     HOSPITAL     1     1  15.1291  100.0
4     MOD_YEAR  1929  1929   0.0001  100.0
...        ...   ...   ...      ...    ...
1306        PJ  4029  4029   0.0006    0.0
1307        PJ  2411  2411   0.0013    0.0
1308        PJ  3209  3209   0.0034    0.0
1309  VE_FORMS    12    12   0.0020    0.0
1310  VE_TOTAL    12    12   0.0020    0.0

[1311 rows x 5 columns]

         Feature  Code  Name      Per      Corr
2       HOSPITAL     0     0  15.1291  100.0000
3       HOSPITAL     1     1  15.1291  100.0000
22         MODEL   709   709   0.5265   65.9303
29      BODY_TYP    80    80   2.3416   62.2476
30          MAKE    72    72   0.9141   62.1368
32         MODEL   706   706   1.4322   61.7808
83      REL_ROAD     4     4   7.4926   35.6646
108         MAKE    98    98   0.6768   32.9325
117     REL_ROAD     3   