# Categorical Encoders 
This notebook gives an overview of the categorical encoders available in category_encoders, the scikit-learn-contrib library, with a brief explanation of how each encoder works and a short, readable example. The example dataset is intentionally small, because a small dataset brings out the differences between the encoders clearly.

#### Initialising Dummy Datasets

Small dummy datasets make the output of each encoder easy to follow. The training set below lists a few countries, each with a random probability (`prob`) and a binary target. The test set contains one country, ZA, that never appears in the training set.

In [1]:
import numpy as np
import pandas as pd
import pickle
import category_encoders as ce
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, accuracy_score


In [2]:
#Training set
tr=pd.DataFrame([['BR',0.9,1],
                 ['FR',0.6,0],
                 ['BR',0.7,1],
                 ['JP',0.2,0],
                 ['MA',0.4,0]], columns=['Country','prob','Target'])
#Test set
ts=pd.DataFrame([['BR',0.5,1],
                 ['FR',0.2,0],
                 ['BR',0.4,1],
                 ['ZA',0.3,0]], columns=['Country','prob','Target'])


In [3]:
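def train_classifier(xtr,ytr,xts,yts):
    # Fit a decision tree on the encoded training data and evaluate it on the encoded test data
    decisiontree = DecisionTreeClassifier()
    decisiontree.fit(xtr,ytr)
    y_pred=decisiontree.predict(xts)
    print("Target")
    print(yts.values)
    print("Predicted:", y_pred)
    # confusion_matrix expects (y_true, y_pred); with the arguments in this order,
    # rows are the predicted labels and the columns (0, 1) are the actual labels
    print('Confusion Matrix \n  0 1')
    print(confusion_matrix(y_pred,yts))
    print("Accuracy Score")
    print(accuracy_score(y_pred,yts))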
    



### Terminologies

**Category**: a categorical column, e.g. Destination.

**Level**: a value in a categorical column, e.g. US is a level of the Destination column.

**Dummy encoding**: substitutes each level with numeric values (usually 0/1 indicator columns) whose number depends on the number of levels.

**Contrast encoding**: compares levels against a reference level.
 

## Major Encoder groups

### Classic Encoders 
Simple, easy-to-understand conversions

[Label Encoder](#Label-Encoder) - Ordinal

[Count Encoder](#Count-Encoder) - Nominal, Ordinal

[Binary Encoder](#Binary-Encoder) - Nominal, Ordinal

[Base N Encoder](#Base-N-Encoder) - Nominal, Ordinal

### Contrast Encoders
Encode against a reference level

[One Hot Encoder](#One-Hot-Encoder) - Nominal, Ordinal

[Sum Encoder](#Sum-Encoder) - Nominal, Ordinal

[Helmert Encoder](#Helmert-Encoder) - Nominal, Ordinal

[Backward Difference Encoder](#Backward-Difference-Encoder) - Nominal, Ordinal

[Polynomial Encoder](#Polynomial-Encoder) - Nominal, Ordinal

### Bayesian Encoders 
Encode target information

[Target Encoder](#Target-Encoder) - Nominal, Ordinal

[Leave One Out Encoder](#Leave-One-Out-Encoder) - Nominal, Ordinal

[CatBoost Encoder](#CatBoost-Encoder) - Nominal, Ordinal

[James Stein Encoder](#James-Stein-Encoder) - Nominal, Ordinal

[M Estimate](#M-Estimate) - Nominal, Ordinal

[Weight of Evidence](#Weight-of-Evidence) - Nominal, Ordinal

### Label Encoder
Converts the levels to integers from 0 to n-1, where n is the number of levels.

Advantages: Quick to implement, easy to understand

Disadvantages: Introduces an unintended order among the levels
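A minimal sketch of label encoding in plain Python, assuming the levels are numbered in sorted (alphabetical) order, which is also how scikit-learn's `LabelEncoder` orders them. `label_encode` is a hypothetical helper written for illustration; in category_encoders the closest equivalent is `ce.OrdinalEncoder`, whose numbering may start from a different value.

```python
def label_encode(values):
    """Hypothetical helper: map each level to an integer 0..n-1 (levels sorted alphabetically)."""
    levels = sorted(set(values))
    mapping = {level: code for code, level in enumerate(levels)}
    return mapping, [mapping[v] for v in values]


# Country column of the training set defined earlier
train_countries = ["BR", "FR", "BR", "JP", "MA"]
mapping, codes = label_encode(train_countries)
print(mapping)  # {'BR': 0, 'FR': 1, 'JP': 2, 'MA': 3}
print(codes)    # [0, 1, 0, 2, 3]
```

MA (3) now looks "greater" than FR (1), an order that has no meaning for countries; this is the unintended order described above.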

### Count Encoder
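Replaces each level with the number of times it occurs in the training set.

Advantages: Quick to implement, easy to understand

Disadvantages: Might skew the distribution if the level frequencies are extreme, and different levels with the same count become indistinguishable (FR, JP and MA all become 1 below). Levels not seen during training, such as ZA in the test set, need special care; here they come out as NaN.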
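The count encoding itself is easy to sketch without the library. The helpers below are hypothetical; returning `None` for unseen levels is an assumption chosen to mirror the NaN that ZA receives in the output below, and filling with 0 would be an equally valid design choice.

```python
from collections import Counter


def fit_count_encoder(values):
    """Hypothetical helper: count how often each level occurs in the training data."""
    return Counter(values)


def count_transform(counts, values, unknown=None):
    """Replace each level with its training count; unseen levels get `unknown`."""
    return [counts[v] if v in counts else unknown for v in values]


counts = fit_count_encoder(["BR", "FR", "BR", "JP", "MA"])
print(count_transform(counts, ["BR", "FR", "BR", "JP", "MA"]))  # [2, 1, 2, 1, 1]
print(count_transform(counts, ["BR", "FR", "BR", "ZA"]))        # [2, 1, 2, None]
```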

In [4]:
counten= ce.CountEncoder(cols=['Country'])
counten.fit(tr.iloc[:,:-1],tr.iloc[:,-1:].values.ravel())
tr1=counten.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(tr1)

ts1=counten.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(ts1)
#train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   Country  prob
0        2   0.9
1        1   0.6
2        2   0.7
3        1   0.2
4        1   0.4
Transformed Test data
   Country  prob
0      2.0   0.5
1      1.0   0.2
2      2.0   0.4
3      NaN   0.3


### One Hot Encoder

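Converts each level into its own binary indicator column, so a column with n levels produces n features (Country_1 to Country_4 below). An unseen level, such as ZA in the test set, is encoded as all zeros.

Not suitable for high-cardinality categorical features, since it produces too many columns when the number of levels is high. Tree-based models are hit hardest when there are many levels.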
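A plain-Python sketch of one-hot encoding, assuming the columns follow the order in which the levels first appear in the training data (which matches Country_1 to Country_4 in the output below). The helpers are hypothetical, not the library's implementation.

```python
def fit_one_hot(values):
    """Hypothetical helper: remember the training levels in order of first appearance."""
    return list(dict.fromkeys(values))


def one_hot_transform(levels, values):
    """One indicator column per training level; an unseen level becomes an all-zero row."""
    return [[1 if v == level else 0 for level in levels] for v in values]


levels = fit_one_hot(["BR", "FR", "BR", "JP", "MA"])
print(levels)  # ['BR', 'FR', 'JP', 'MA']
test_countries = ["BR", "FR", "BR", "ZA"]
for country, row in zip(test_countries, one_hot_transform(levels, test_countries)):
    print(country, row)
```

The last printed row, `ZA [0, 0, 0, 0]`, is the all-zero encoding of the unseen level.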

In [5]:
OHE = ce.OneHotEncoder(cols=['Country'])
tr1=OHE.fit_transform(tr)
xtr=tr1.iloc[:,:-1]
ytr=tr1.iloc[:,-1:]

print("Transformed Training data")
print(tr1)
ts1=OHE.transform(ts)
xts=ts1.iloc[:,:-1]
yts=ts1.iloc[:,-1:]

print("Transformed Test data")
print(ts1)
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   Country_1  Country_2  Country_3  Country_4  prob  Target
0          1          0          0          0   0.9       1
1          0          1          0          0   0.6       0
2          1          0          0          0   0.7       1
3          0          0          1          0   0.2       0
4          0          0          0          1   0.4       0
Transformed Test data
   Country_1  Country_2  Country_3  Country_4  prob  Target
0          1          0          0          0   0.5       1
1          0          1          0          0   0.2       0
2          1          0          0          0   0.4       1
3          0          0          0          0   0.3       0
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [0 0 0 0]
Confusion Matrix 
  0 1
[[2 2]
 [0 0]]
Accuracy Score
0.5


### Sum Encoder 

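Similar to One Hot Encoding, except that it is a contrast encoder: each column compares one level with the overall mean, and the last level is held as the reference and encoded as -1 in every column.

Advantages: One fewer contrast column than OHE (k-1 columns for k levels), although category_encoders also adds an `intercept` column

Disadvantages: Still produces many columns for high-cardinality features. Unseen levels (ZA in the test set) are encoded as all zeros rather than as a level of their own.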
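A sketch of how the sum (deviation) coding matrix can be built by hand, assuming the levels are taken in order of first appearance so that the last one (MA) is the reference. The helpers are hypothetical; the matrix matches the Country_0 to Country_2 columns in the training output below (without the intercept), and every column sums to zero.

```python
def sum_coding_matrix(levels):
    """Hypothetical helper: k levels -> k-1 columns.

    Level i (i < k-1) gets a 1 in column i; the last level is the reference and gets -1 everywhere.
    """
    k = len(levels)
    return {
        level: [-1] * (k - 1) if i == k - 1 else [1 if j == i else 0 for j in range(k - 1)]
        for i, level in enumerate(levels)
    }


def sum_transform(matrix, values):
    """Look up each value; an unseen level gets an all-zero row."""
    width = len(next(iter(matrix.values())))
    return [matrix.get(v, [0] * width) for v in values]


matrix = sum_coding_matrix(["BR", "FR", "JP", "MA"])
for level, row in matrix.items():
    print(level, row)
print(sum_transform(matrix, ["BR", "ZA"]))  # [[1, 0, 0], [0, 0, 0]]
```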

In [6]:

SumEn= ce.SumEncoder(cols=['Country'])
SumEn.fit(tr.iloc[:,:-1],tr.iloc[:,-1:])
tr1=SumEn.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(pd.concat([tr1,tr.iloc[:,-1:]],axis=1))


ts1=SumEn.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(pd.concat([ts1,ts.iloc[:,-1:]],axis=1))
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   intercept  Country_0  Country_1  Country_2  prob  Target
0          1        1.0        0.0        0.0   0.9       1
1          1        0.0        1.0        0.0   0.6       0
2          1        1.0        0.0        0.0   0.7       1
3          1        0.0        0.0        1.0   0.2       0
4          1       -1.0       -1.0       -1.0   0.4       0
Transformed Test data
   intercept  Country_0  Country_1  Country_2  prob  Target
0          1        1.0        0.0        0.0   0.5       1
1          1        0.0        1.0        0.0   0.2       0
2          1        1.0        0.0        0.0   0.4       1
3          1        0.0        0.0        0.0   0.3       0
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [1 0 1 0]
Confusion Matrix 
  0 1
[[2 0]
 [0 2]]
Accuracy Score
1.0


### Binary Encoder
Similar to the one hot encoder, but it first maps each level to an integer and then writes that integer in binary, one column per binary digit.

Advantages: Far fewer columns than OHE: 100 levels need only 7 columns, whereas OHE produces 100 columns for the same 100 levels. Helpful for high-cardinality categories

Disadvantages: Not useful for low-cardinality categories, since the number of columns produced is almost equal to the number of levels
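A sketch of the binary scheme in plain Python, assuming (as the output below suggests) that levels are numbered from 1 in order of first appearance and that code 0 is left for unseen levels. The helpers are hypothetical, not the library's implementation; they reproduce the Country_0 to Country_2 columns below and confirm the 7-column figure for 100 levels.

```python
def fit_binary(values):
    """Hypothetical helper: 1-based ordinal codes in order of first appearance, plus the bit width."""
    levels = list(dict.fromkeys(values))
    codes = {level: i + 1 for i, level in enumerate(levels)}  # 0 is left for unseen levels
    width = len(levels).bit_length()  # bits needed for the largest code
    return codes, width


def binary_transform(codes, width, values):
    """Write each code in binary, one column per bit; unseen levels become code 0 (all zeros)."""
    return [[int(bit) for bit in format(codes.get(v, 0), f"0{width}b")] for v in values]


codes, width = fit_binary(["BR", "FR", "BR", "JP", "MA"])
print(width)  # 3
print(binary_transform(codes, width, ["BR", "FR", "JP", "MA", "ZA"]))
# [[0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 0]]
print((100).bit_length())  # 7 columns for 100 levels
```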

In [7]:
binary = ce.BinaryEncoder(cols=['Country'])
tr1=binary.fit_transform(tr)
xtr=tr1.iloc[:,:-1]
ytr=tr1.iloc[:,-1:]

print("Transformed Training data")
print(tr1)
ts1=binary.transform(ts)
xts=ts1.iloc[:,:-1]
yts=ts1.iloc[:,-1:]

print("Transformed Test data")
print(ts1)
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   Country_0  Country_1  Country_2  prob  Target
0          0          0          1   0.9       1
1          0          1          0   0.6       0
2          0          0          1   0.7       1
3          0          1          1   0.2       0
4          1          0          0   0.4       0
Transformed Test data
   Country_0  Country_1  Country_2  prob  Target
0          0          0          1   0.5       1
1          0          1          0   0.2       0
2          0          0          1   0.4       1
3          0          0          0   0.3       0
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [0 0 0 0]
Confusion Matrix 
  0 1
[[2 2]
 [0 0]]
Accuracy Score
0.5


### Base N Encoder
Base-N encoder encodes the categories into arrays of their base-N representation.
A base of 1 is equivalent to one-hot encoding, a base of 2 is equivalent to binary encoding.
N=number of actual categories is equivalent to vanilla ordinal encoding.

Advantages:
The number of columns can be controlled through N

Disadvantages:
One more hyperparameter to tune. It might take several runs to find the best N for each categorical column.
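Base-N encoding generalises the binary sketch: the ordinal code is written in base N instead of base 2. The helpers below are hypothetical and make the same assumptions (codes start at 1 in order of first appearance, 0 for unseen levels); with `base=4` they reproduce the two Country columns in the output below. Base 1 (one-hot) is a special case that this digit-based sketch does not handle.

```python
def to_base_n(number, base, width):
    """Digits of `number` in `base`, most significant first, left-padded with zeros to `width`."""
    digits = []
    for _ in range(width):
        number, remainder = divmod(number, base)
        digits.append(remainder)
    return digits[::-1]


def fit_base_n(values, base):
    """Hypothetical helper: 1-based ordinal codes and the number of base-N digits needed."""
    if base < 2:
        raise ValueError("this sketch needs base >= 2; base 1 corresponds to one-hot encoding")
    levels = list(dict.fromkeys(values))
    codes = {level: i + 1 for i, level in enumerate(levels)}  # 0 is left for unseen levels
    width = 1
    while base ** width <= len(levels):  # grow until the largest code fits
        width += 1
    return codes, width


def base_n_transform(codes, base, width, values):
    return [to_base_n(codes.get(v, 0), base, width) for v in values]


codes, width = fit_base_n(["BR", "FR", "BR", "JP", "MA"], base=4)
print(base_n_transform(codes, 4, width, ["BR", "FR", "JP", "MA", "ZA"]))
# [[0, 1], [0, 2], [0, 3], [1, 0], [0, 0]]
```

With `base=2` the same helpers give 3 columns, the binary encoding of the previous section.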


In [8]:
ce_basen = ce.BaseNEncoder(cols=['Country'],base=4)
tr1=ce_basen.fit_transform(tr)
xtr=tr1.iloc[:,:-1]
ytr=tr1.iloc[:,-1:]
print("Transformed Training data")
print(tr1)

ts1=ce_basen.transform(ts)
xts=ts1.iloc[:,:-1]
yts=ts1.iloc[:,-1:]
print("Transformed Test data")
print(ts1)

train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   Country_0  Country_1  prob  Target
0          0          1   0.9       1
1          0          2   0.6       0
2          0          1   0.7       1
3          0          3   0.2       0
4          1          0   0.4       0
Transformed Test data
   Country_0  Country_1  prob  Target
0          0          1   0.5       1
1          0          2   0.2       0
2          0          1   0.4       1
3          0          0   0.3       0
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [0 0 0 0]
Confusion Matrix 
  0 1
[[2 2]
 [0 0]]
Accuracy Score
0.5


### Helmert Encoder 

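Compares each level of a categorical variable with the mean of the preceding levels. In the output below, Country_1 contrasts JP (coded 2) with the mean of BR and FR (both coded -1). Classical Helmert coding compares each level with the mean of the subsequent levels; the category_encoders output corresponds to this reverse form.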
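A sketch of the reverse Helmert matrix for k levels taken in order of first appearance, matching the pattern in the output below. The helper is hypothetical. Column j compares level j+1 (coded j+1) with the j+1 levels before it (each coded -1), so every column sums to zero.

```python
def reverse_helmert_matrix(levels):
    """Hypothetical helper: column j contrasts level j+1 with the mean of levels 0..j."""
    k = len(levels)
    matrix = {}
    for i, level in enumerate(levels):
        row = []
        for j in range(k - 1):
            if i <= j:
                row.append(-1)      # one of the preceding levels being averaged
            elif i == j + 1:
                row.append(j + 1)   # the level being compared
            else:
                row.append(0)       # later levels take no part in this contrast
        matrix[level] = row
    return matrix


for level, row in reverse_helmert_matrix(["BR", "FR", "JP", "MA"]).items():
    print(level, row)
```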

In [9]:
helmert_en= ce.HelmertEncoder(cols=['Country'])
helmert_en.fit(tr.iloc[:,:-1],tr.iloc[:,-1:])
tr1=helmert_en.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(tr1)


ts1=helmert_en.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(ts1)
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   intercept  Country_0  Country_1  Country_2  prob
0          1       -1.0       -1.0       -1.0   0.9
1          1        1.0       -1.0       -1.0   0.6
2          1       -1.0       -1.0       -1.0   0.7
3          1        0.0        2.0       -1.0   0.2
4          1        0.0        0.0        3.0   0.4
Transformed Test data
   intercept  Country_0  Country_1  Country_2  prob
0          1       -1.0       -1.0       -1.0   0.5
1          1        1.0       -1.0       -1.0   0.2
2          1       -1.0       -1.0       -1.0   0.4
3          1        0.0        0.0        0.0   0.3
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [1 0 1 0]
Confusion Matrix 
  0 1
[[2 0]
 [0 2]]
Accuracy Score
1.0


### Target Encoder

Encodes the mean value of the target for each level. More precisely, each level is replaced with a blend of the posterior probability of the target given that level and the prior probability of the target over all the training data.

This is similar to encoding the target probability of each level, except that category_encoders also takes the prior into account. The prior is used for smoothing, which gives reasonable values to rarely occurring levels and keeps frequently occurring levels from dominating.

Advantages: Quick, and does not add extra dimensions

Disadvantages: Prone to overfitting, and dependent on the target distribution of the training set. Unseen levels (ZA in the test set) are encoded with the prior, here the overall training target mean of 0.4.

Two hyperparameters to tune:

min_samples_leaf: minimum number of samples required to take the level average into account

smoothing: balances the level average against the prior. Higher values mean stronger regularization. The value must be strictly greater than 0.
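A sketch of smoothed target encoding in plain Python, using the sigmoid blend commonly described for this technique: a level seen n times puts weight `1 / (1 + exp(-(n - min_samples_leaf) / smoothing))` on its own mean and the rest on the prior. The parameter names follow the hyperparameters above; their values of 1 are an assumption. In the output below, FR, JP and MA (each seen only once) received exactly the prior 0.4, so the sketch adds a hypothetical `singleton_to_prior` flag to mimic that; it is not a library parameter.

```python
import math


def fit_target_encoder(values, targets, min_samples_leaf=1, smoothing=1.0, singleton_to_prior=True):
    """Hypothetical helper: blend each level's target mean with the prior using a sigmoid weight."""
    prior = sum(targets) / len(targets)
    stats = {}
    for v, t in zip(values, targets):
        s, n = stats.get(v, (0, 0))
        stats[v] = (s + t, n + 1)
    encoding = {}
    for level, (s, n) in stats.items():
        weight = 1 / (1 + math.exp(-(n - min_samples_leaf) / smoothing))
        value = prior * (1 - weight) + (s / n) * weight
        if singleton_to_prior and n == 1:
            value = prior  # mimics the notebook output for levels seen once (hypothetical flag)
        encoding[level] = value
    return prior, encoding


def target_transform(prior, encoding, values):
    """Unseen levels fall back to the prior."""
    return [encoding.get(v, prior) for v in values]


prior, enc = fit_target_encoder(["BR", "FR", "BR", "JP", "MA"], [1, 0, 1, 0, 0])
print({level: round(value, 6) for level, value in enc.items()})
# {'BR': 0.838635, 'FR': 0.4, 'JP': 0.4, 'MA': 0.4}
print(target_transform(prior, enc, ["BR", "ZA"]))
```

For BR (n = 2) the weight is 1 / (1 + e^-1) ≈ 0.731, giving 0.4 · 0.269 + 1 · 0.731 ≈ 0.838635, the value in the output below.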

In [10]:

TgtEn= ce.TargetEncoder(cols=['Country'])
TgtEn.fit(tr.iloc[:,:-1],tr.iloc[:,-1:])
tr1=TgtEn.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(pd.concat([tr1,tr.iloc[:,-1:]],axis=1))


ts1=TgtEn.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(pd.concat([ts1,ts.iloc[:,-1:]],axis=1))
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
    Country  prob  Target
0  0.838635   0.9       1
1  0.400000   0.6       0
2  0.838635   0.7       1
3  0.400000   0.2       0
4  0.400000   0.4       0
Transformed Test data
    Country  prob  Target
0  0.838635   0.5       1
1  0.400000   0.2       0
2  0.838635   0.4       1
3  0.400000   0.3       0
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [1 0 1 0]
Confusion Matrix 
  0 1
[[2 0]
 [0 2]]
Accuracy Score
1.0


### Leave One Out Encoder

This is very similar to target encoding, but excludes the current row's own target when calculating the mean target value of its level.

Advantages: Reduces the effect of outliers.

Disadvantages: Prone to overfitting, and introduces higher variance into the encoded values. Levels seen only once fall back to the prior (FR, JP and MA below), as do unseen levels such as ZA.
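A leave-one-out sketch in plain Python with hypothetical helpers. During training each row gets (level target sum - own target) / (level count - 1), and levels seen only once fall back to the prior, which reproduces the training output below. At test time the sketch uses the plain level mean and no test labels; the cell below instead passes the test target to `transform()`, so its test values differ (FR gets 0.4 there and 0.0 here).

```python
def fit_loo(values, targets):
    """Hypothetical helper: per-level target sum and count, plus the prior (overall mean)."""
    prior = sum(targets) / len(targets)
    stats = {}
    for v, t in zip(values, targets):
        s, n = stats.get(v, (0, 0))
        stats[v] = (s + t, n + 1)
    return prior, stats


def loo_transform_train(prior, stats, values, targets):
    """Exclude each row's own target from its level mean; single-occurrence levels get the prior."""
    out = []
    for v, t in zip(values, targets):
        s, n = stats[v]
        out.append((s - t) / (n - 1) if n > 1 else prior)
    return out


def loo_transform_test(prior, stats, values):
    """No labels at test time: use the full level mean, or the prior for unseen levels."""
    return [stats[v][0] / stats[v][1] if v in stats else prior for v in values]


countries, targets = ["BR", "FR", "BR", "JP", "MA"], [1, 0, 1, 0, 0]
prior, stats = fit_loo(countries, targets)
print(loo_transform_train(prior, stats, countries, targets))  # [1.0, 0.4, 1.0, 0.4, 0.4]
print(loo_transform_test(prior, stats, ["BR", "FR", "BR", "ZA"]))  # [1.0, 0.0, 1.0, 0.4]
```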


In [11]:
#Target dependent
leaveout= ce.LeaveOneOutEncoder(cols=['Country'])
leaveout.fit(tr.iloc[:,:-1],tr.iloc[:,-1:].values.ravel())
tr1=leaveout.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(pd.concat([tr1,tr.iloc[:,-1:]],axis=1))


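# Note: passing the test target here lets the encoder use test labels (target leakage);
# category_encoders documents leaving y=None when transforming test data
ts1=leaveout.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])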
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(pd.concat([ts1,ts.iloc[:,-1:]],axis=1))
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   Country  prob  Target
0      1.0   0.9       1
1      0.4   0.6       0
2      1.0   0.7       1
3      0.4   0.2       0
4      0.4   0.4       0
Transformed Test data
   Country  prob  Target
0      1.0   0.5       1
1      0.4   0.2       0
2      1.0   0.4       1
3      0.4   0.3       0
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [1 0 1 0]
Confusion Matrix 
  0 1
[[2 0]
 [0 2]]
Accuracy Score
1.0


### CatBoost Encoder

Similar to the Leave One Out encoder, but calculates values 'on the fly': each row is encoded using only the targets of the rows that precede it, which introduces a notion of "time" (row order). If the dataset is ordered according to the target, shuffle it first. Using only earlier rows is what limits overfitting.

Advantages: Less prone to overfitting than plain target encoding

Disadvantages: The encoding depends on row order, so the same level can receive different values (BR is encoded as 0.4 and then 0.7 below). Unseen levels (ZA in the test set) are encoded with the prior.
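A sketch of the ordered ("on the fly") statistic in plain Python: each training row is encoded from the rows before it as (previous target sum + a · prior) / (previous count + a). The helpers are hypothetical, and `a = 1` is an assumption that reproduces the training output below (BR becomes 0.4, then 0.7). The test-time helper uses all the training statistics and no test labels; the cell below passes the test target to `transform()`, so its test values differ.

```python
def catboost_fit_transform(values, targets, a=1.0):
    """Hypothetical helper: encode each training row using only the rows that precede it."""
    prior = sum(targets) / len(targets)
    running = {}  # level -> (target sum, count) of the rows seen so far
    encoded = []
    for v, t in zip(values, targets):
        s, n = running.get(v, (0.0, 0))
        encoded.append((s + a * prior) / (n + a))  # the current row's own target is not used yet
        running[v] = (s + t, n + 1)
    return prior, running, encoded


def catboost_transform_test(prior, running, values, a=1.0):
    """Test rows use the full training statistics; unseen levels get the prior."""
    return [
        (running[v][0] + a * prior) / (running[v][1] + a) if v in running else prior
        for v in values
    ]


prior, running, train_encoded = catboost_fit_transform(["BR", "FR", "BR", "JP", "MA"], [1, 0, 1, 0, 0])
print([round(x, 6) for x in train_encoded])  # [0.4, 0.4, 0.7, 0.4, 0.4]
test_encoded = catboost_transform_test(prior, running, ["BR", "FR", "BR", "ZA"])
print([round(x, 6) for x in test_encoded])   # [0.8, 0.2, 0.8, 0.4]
```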

In [12]:
#Target dependent
catboost= ce.CatBoostEncoder(cols=['Country'])
catboost.fit(tr.iloc[:,:-1],tr.iloc[:,-1:].values.ravel())
tr1=catboost.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(pd.concat([tr1,tr.iloc[:,-1:]],axis=1))


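# Note: passing the test target here makes the encoder recompute ordered statistics from test
# labels (target leakage); category_encoders documents leaving y=None for test data
ts1=catboost.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])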
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(pd.concat([ts1,ts.iloc[:,-1:]],axis=1))
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   Country  prob  Target
0      0.4   0.9       1
1      0.4   0.6       0
2      0.7   0.7       1
3      0.4   0.2       0
4      0.4   0.4       0
Transformed Test data
   Country  prob  Target
0      0.4   0.5       1
1      0.4   0.2       0
2      0.7   0.4       1
3      0.4   0.3       0
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [0 0 1 0]
Confusion Matrix 
  0 1
[[2 1]
 [0 1]]
Accuracy Score
0.75


### Backward Difference Encoder
When the encoded columns are used in a linear model, the mean of the target for a level is compared with the mean of the target for the prior (adjacent) level.

Produces n-1 contrast columns for n levels (category_encoders also adds an `intercept` column)

Advantages: Each column compares a level with the one before it, which is natural for ordinal variables

Disadvantages: Produces many columns for high-cardinality features

Refer [here](https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/#backward) for a detailed worked-out example
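A sketch of the backward difference coding matrix for k levels taken in order of first appearance (an assumption matching the output below). In column j, levels up to j get -(k-1-j)/k and later levels get (j+1)/k, so the value jumps by exactly 1 between level j and level j+1. The helper is hypothetical; for the four training countries it reproduces Country_0 to Country_2 below.

```python
def backward_difference_matrix(levels):
    """Hypothetical helper: column j contrasts level j+1 with level j."""
    k = len(levels)
    return {
        level: [-(k - 1 - j) / k if i <= j else (j + 1) / k for j in range(k - 1)]
        for i, level in enumerate(levels)
    }


for level, row in backward_difference_matrix(["BR", "FR", "JP", "MA"]).items():
    print(level, row)
# BR [-0.75, -0.5, -0.25]
# FR [0.25, -0.5, -0.25]
# JP [0.25, 0.5, -0.25]
# MA [0.25, 0.5, 0.75]
```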

In [13]:
BD_en=ce.BackwardDifferenceEncoder(cols=['Country'])
tr1=BD_en.fit_transform(tr)
xtr=tr1.iloc[:,:-1]
ytr=tr1.iloc[:,-1:]
print("Transformed Training data")
print(tr1)
ts1=BD_en.transform(ts)
xts=ts1.iloc[:,:-1]
yts=ts1.iloc[:,-1:]
print("Transformed Test data")
print(ts1)

train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   intercept  Country_0  Country_1  Country_2  prob  Target
0          1      -0.75       -0.5      -0.25   0.9       1
1          1       0.25       -0.5      -0.25   0.6       0
2          1      -0.75       -0.5      -0.25   0.7       1
3          1       0.25        0.5      -0.25   0.2       0
4          1       0.25        0.5       0.75   0.4       0
Transformed Test data
   intercept  Country_0  Country_1  Country_2  prob  Target
0          1      -0.75       -0.5      -0.25   0.5       1
1          1       0.25       -0.5      -0.25   0.2       0
2          1      -0.75       -0.5      -0.25   0.4       1
3          1       0.00        0.0       0.00   0.3       0
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [0 0 0 0]
Confusion Matrix 
  0 1
[[2 2]
 [0 0]]
Accuracy Score
0.5


### James-Stein Encoder


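Returns a weighted average of the mean target value for the level and the overall mean target value, with the weights based on the variances of the two.
It might not be effective when the target is not normally distributed, since the estimator assumes normal distributions.
Binary targets need a modified version of the encoder, which is similar to WOE.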

More info [here](http://contrib.scikit-learn.org/category_encoders/jamesstein.html)
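A sketch of one common form of James-Stein shrinkage: the level mean is pulled toward the prior with weight B = v_k / (v_k + v), where v_k is the variance of the level mean and v is the variance of all targets. The helper and these variance choices are assumptions, not the library's exact model; in particular, a level seen only once is given v_k = 0 here. With the training data these choices happen to reproduce the training output below (BR 1.0, the other levels 0.0) and give the prior 0.4 to the unseen ZA.

```python
from statistics import pvariance, variance


def fit_james_stein(values, targets):
    """Hypothetical helper: shrink each level mean toward the prior by B = v_k / (v_k + v)."""
    prior = sum(targets) / len(targets)
    v_all = pvariance(targets)  # variance of all targets (assumption: population variance)
    groups = {}
    for level, t in zip(values, targets):
        groups.setdefault(level, []).append(t)
    encoding = {}
    for level, ys in groups.items():
        mean = sum(ys) / len(ys)
        # variance of the level mean; a single observation is treated as 0 (assumption)
        v_k = variance(ys) / len(ys) if len(ys) > 1 else 0.0
        b = v_k / (v_k + v_all) if v_k + v_all > 0 else 0.0
        encoding[level] = (1 - b) * mean + b * prior
    return prior, encoding


prior, enc = fit_james_stein(["BR", "FR", "BR", "JP", "MA"], [1, 0, 1, 0, 0])
print(enc)                   # {'BR': 1.0, 'FR': 0.0, 'JP': 0.0, 'MA': 0.0}
print(enc.get("ZA", prior))  # 0.4
```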

In [14]:

JSen= ce.JamesSteinEncoder(cols=['Country'])
JSen.fit(tr.iloc[:,:-1],tr.iloc[:,-1:])
tr1=JSen.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(tr1)

ts1=JSen.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(ts1)
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   Country  prob
0      1.0   0.9
1      0.0   0.6
2      1.0   0.7
3      0.0   0.2
4      0.0   0.4
Transformed Test data
   Country  prob
0      1.0   0.5
1      0.0   0.2
2      1.0   0.4
3      0.4   0.3
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [1 0 1 0]
Confusion Matrix 
  0 1
[[2 0]
 [0 2]]
Accuracy Score
1.0


### M-Estimate

A simplified version of the target encoder with only one hyperparameter to tune (m). Higher values of m result in stronger shrinkage, pulling each level's encoding further toward the overall target mean.
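The m-estimate can be written as (level target sum + m · prior) / (level count + m). A plain-Python sketch with a hypothetical helper; m = 1 reproduces the output below (BR 0.8, the other training levels 0.2, unseen ZA 0.4), and a larger m pulls BR further toward the prior.

```python
def fit_m_estimate(values, targets, m=1.0):
    """Hypothetical helper: (sum + m * prior) / (count + m) for each level."""
    prior = sum(targets) / len(targets)
    stats = {}
    for v, t in zip(values, targets):
        s, n = stats.get(v, (0, 0))
        stats[v] = (s + t, n + 1)
    encoding = {level: (s + m * prior) / (n + m) for level, (s, n) in stats.items()}
    return prior, encoding


countries = ["BR", "FR", "BR", "JP", "MA"]
targets = [1, 0, 1, 0, 0]
prior, enc = fit_m_estimate(countries, targets, m=1.0)
print({level: round(value, 6) for level, value in enc.items()})
# {'BR': 0.8, 'FR': 0.2, 'JP': 0.2, 'MA': 0.2}
prior, enc_strong = fit_m_estimate(countries, targets, m=10.0)
print(round(enc_strong["BR"], 6))  # 0.5: stronger shrinkage toward the prior 0.4
print(enc.get("ZA", prior))        # 0.4: unseen levels get the prior
```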


In [15]:

MEst= ce.MEstimateEncoder(cols=['Country'])
MEst.fit(tr.iloc[:,:-1],tr.iloc[:,-1:])
tr1=MEst.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(tr1)

ts1=MEst.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(ts1)
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   Country  prob
0      0.8   0.9
1      0.2   0.6
2      0.8   0.7
3      0.2   0.2
4      0.2   0.4
Transformed Test data
   Country  prob
0      0.8   0.5
1      0.2   0.2
2      0.8   0.4
3      0.4   0.3
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [1 0 1 0]
Confusion Matrix 
  0 1
[[2 0]
 [0 2]]
Accuracy Score
1.0


### Polynomial Encoder

Trend analysis: the encoded columns capture the linear, quadratic and cubic trends across the levels of the categorical variable.
Suitable for ordinal variables whose levels are equally spaced, such as intervals or buckets.
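Orthonormal polynomial contrasts can be built by Gram-Schmidt orthogonalisation of 1, x, x², ... over equally spaced level positions x = 0, 1, ..., k-1. This plain-Python sketch (hypothetical helper) assumes the levels are in order of first appearance; for k = 4 it reproduces the linear, quadratic and cubic columns below (±0.670820, ±0.5, ±0.223607).

```python
import math


def polynomial_contrasts(k):
    """Hypothetical helper: orthonormal linear, quadratic, ... contrasts for k equally spaced levels.

    Gram-Schmidt on the vectors 1, x, x**2, ... with x = 0..k-1; returns one row per level.
    """
    xs = range(k)
    basis = [[1.0] * k]  # orthogonal (unnormalised) vectors found so far, starting with the constant
    columns = []
    for degree in range(1, k):
        v = [float(x ** degree) for x in xs]
        for b in basis:  # remove the components along every lower-degree vector
            coef = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
        norm = math.sqrt(sum(vi * vi for vi in v))
        columns.append([vi / norm for vi in v])
    return [list(row) for row in zip(*columns)]  # transpose: rows = levels, columns = degrees


for level, row in zip(["BR", "FR", "JP", "MA"], polynomial_contrasts(4)):
    print(level, [round(value, 6) for value in row])
```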


In [16]:
PolEs= ce.PolynomialEncoder(cols=['Country'])
PolEs.fit(tr.iloc[:,:-1],tr.iloc[:,-1:])
tr1=PolEs.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(tr1)

ts1=PolEs.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(ts1)
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
   intercept  Country_0  Country_1  Country_2  prob
0          1  -0.670820        0.5  -0.223607   0.9
1          1  -0.223607       -0.5   0.670820   0.6
2          1  -0.670820        0.5  -0.223607   0.7
3          1   0.223607       -0.5  -0.670820   0.2
4          1   0.670820        0.5   0.223607   0.4
Transformed Test data
   intercept  Country_0  Country_1  Country_2  prob
0          1  -0.670820        0.5  -0.223607   0.5
1          1  -0.223607       -0.5   0.670820   0.2
2          1  -0.670820        0.5  -0.223607   0.4
3          1   0.000000        0.0   0.000000   0.3
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [1 0 1 0]
Confusion Matrix 
  0 1
[[2 0]
 [0 2]]
Accuracy Score
1.0


### Weight of Evidence 
Target-based encoder. It measures how well a level separates the "good" (positive) outcomes from the "bad" (negative) ones.
Supports only binary targets.

Weight of evidence is calculated as follows.

WoE = ln(a / b)

a = share of all good (positive) outcomes that fall in the level

b = share of all bad (negative) outcomes that fall in the level
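A sketch of the WoE computation with additive regularization, where a = (positives in the level + r) / (all positives + 2r) and b = (negatives in the level + r) / (all negatives + 2r). The regularization `r = 1` and the helper are assumptions; with them BR gets ln(0.75 / 0.2) = ln(3.75) ≈ 1.321756, matching the output below. That output also shows 0 for FR, JP and MA (each seen once) and for the unseen ZA, so the sketch adds a hypothetical `singleton_to_zero` flag to mimic it.

```python
import math


def fit_woe(values, targets, regularization=1.0, singleton_to_zero=True):
    """Hypothetical helper: WoE = ln(a / b) per level, with additive regularization."""
    total_pos = sum(targets)
    total_neg = len(targets) - total_pos
    stats = {}
    for v, t in zip(values, targets):
        pos, n = stats.get(v, (0, 0))
        stats[v] = (pos + t, n + 1)
    encoding = {}
    for level, (pos, n) in stats.items():
        neg = n - pos
        a = (pos + regularization) / (total_pos + 2 * regularization)  # share of good outcomes
        b = (neg + regularization) / (total_neg + 2 * regularization)  # share of bad outcomes
        woe = math.log(a / b)
        # mimic the notebook output, where levels seen once were encoded as 0 (hypothetical flag)
        encoding[level] = 0.0 if (singleton_to_zero and n == 1) else woe
    return encoding


enc = fit_woe(["BR", "FR", "BR", "JP", "MA"], [1, 0, 1, 0, 0])
print({level: round(value, 6) for level, value in enc.items()})
# {'BR': 1.321756, 'FR': 0.0, 'JP': 0.0, 'MA': 0.0}
print(enc.get("ZA", 0.0))  # unseen level -> 0.0
```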

In [18]:
WOE= ce.WOEEncoder(cols=['Country'])
WOE.fit(tr.iloc[:,:-1],tr.iloc[:,-1:])
tr1=WOE.transform(tr.iloc[:,:-1],tr.iloc[:,-1:])
xtr=tr1.iloc[:,:-1]
ytr=tr.iloc[:,-1:]
print("Transformed Training data")
print(tr1)


ts1=WOE.transform(ts.iloc[:,:-1],ts.iloc[:,-1:])
xts=ts1.iloc[:,:-1]
yts=ts.iloc[:,-1:]
print("Transformed Test data")
print(ts1)
train_classifier(xtr,ytr,xts,yts)

Transformed Training data
    Country  prob
0  1.321756   0.9
1  0.000000   0.6
2  1.321756   0.7
3  0.000000   0.2
4  0.000000   0.4
Transformed Test data
    Country  prob
0  1.321756   0.5
1  0.000000   0.2
2  1.321756   0.4
3  0.000000   0.3
Target
[[1]
 [0]
 [1]
 [0]]
Predicted: [1 0 1 0]
Confusion Matrix 
  0 1
[[2 0]
 [0 2]]
Accuracy Score
1.0


The GLMM and Hashing encoders are not covered here because they had trouble loading; the Hashing encoder required reinstalling an older version of Python.

### References:
1. category_encoders documentation (scikit-learn-contrib): http://contrib.scikit-learn.org/category_encoders/count.html
2. Worked-out examples of several contrast coding techniques: https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/
3. category_encoders source code: https://github.com/scikit-learn-contrib/category_encoders/tree/master/category_encoders
