<a href="https://colab.research.google.com/github/Dushyanttara/colab_research/blob/master/Ensemble_techniques.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
!pip install catboost

Collecting catboost
[?25l  Downloading https://files.pythonhosted.org/packages/b1/61/2b8106c8870601671d99ca94d8b8d180f2b740b7cdb95c930147508abcf9/catboost-0.23-cp36-none-manylinux1_x86_64.whl (64.7MB)
[K     |████████████████████████████████| 64.8MB 64kB/s 
Installing collected packages: catboost
Successfully installed catboost-0.23


In [0]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer, load_boston
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import GradientBoostingRegressor
import xgboost as xgb
import lightgbm as lgb
from catboost import CatBoostClassifier
from catboost import CatBoostRegressor

## Simple Ensemble techniques

### Max Voting
You can consider this as taking the mode of all the predictions.

Alternatively you can use "Voting Classifier" module in sklearn:

In [0]:
#Load dataset for breast cancer detection
X, y = load_breast_cancer(return_X_y=True)
x_train,x_test, y_train, y_test = train_test_split(X,y, random_state = 1, test_size = 0.2)

In [0]:
scaler = StandardScaler().fit(x_train)

In [0]:
print('scaler mean:', scaler.mean_, '\n\n','scaler scale :', scaler.scale_ )

scaler mean: [1.42134923e+01 1.93543736e+01 9.25727912e+01 6.64583077e+02
 9.63717582e-02 1.05058879e-01 8.96509136e-02 4.95899780e-02
 1.81131209e-01 6.27204835e-02 4.12704615e-01 1.22476747e+00
 2.91436725e+00 4.17422967e+01 7.01670989e-03 2.58046242e-02
 3.18617881e-02 1.18816593e-02 2.05152879e-02 3.82098967e-03
 1.64117868e+01 2.57051648e+01 1.08253319e+02 9.00190549e+02
 1.32138198e-01 2.56131407e-01 2.72104193e-01 1.15820371e-01
 2.88476484e-01 8.36364615e-02] 

 scaler scale : [3.61393415e+00 4.39478887e+00 2.49663565e+01 3.62204367e+02
 1.37309337e-02 5.19197492e-02 8.01752926e-02 3.93686243e-02
 2.72271211e-02 6.71035406e-03 2.87068914e-01 5.74298327e-01
 2.09734645e+00 4.85007708e+01 3.05760659e-03 1.80992367e-02
 2.91334637e-02 6.28200536e-03 8.26851979e-03 2.76617334e-03
 5.00827710e+00 6.28235900e+00 3.48114950e+01 5.94523660e+02
 2.21654518e-02 1.54650615e-01 2.04049117e-01 6.69564801e-02
 5.87803649e-02 1.66278736e-02]


In [0]:
#now use the same mean and scale to transform both train and test datasets
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)

In [0]:

model1 = LogisticRegression(random_state =1)
model2 = DecisionTreeClassifier(random_state =1)
model = VotingClassifier(estimators= [('lr', model1), ('dt', model2)], voting = 'hard')
model.fit(x_train, y_train)
print('model score for max voting :\n', model.score(x_test, y_test))

model score for max voting :
 0.9649122807017544


In [0]:
y_pred = model.predict(x_test)
print(confusion_matrix(y_test, y_pred))

[[40  2]
 [ 2 70]]


### Averaging
Averaging can be used for making predictions in regression problems or while calculating probabilities for classification problems.

In [0]:
model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)


final_pred = (pred1 + pred2 + pred3) / 3

In [0]:
final_pred1 = (final_pred > 0.5).astype(int)

In [0]:
confusion_matrix(y_test, final_pred1[:,1])

array([[39,  3],
       [ 1, 71]])

### Weighted Average

In [0]:
model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)

final_pred = (pred1 * 0.3 + pred2 * 0.3 + pred3*0.4)

In [0]:
final_pred1 = (final_pred > 0.5).astype(int)
confusion_matrix(y_test, final_pred1[:,1])

array([[40,  2],
       [ 0, 72]])

## Advanced Ensemble Techniques

### Stacking

In [0]:

#First we'll define stacking function which does k fold cross validation on the dataset
def Stacking(model, train, y, test, n_fold):
  folds  = StratifiedKFold(n_splits=n_fold, random_state=1)
  test_pred = np.empty((0, 1), float)
  train_pred = np.empty((0,1), float)
  for train_indices, val_indices in folds.split(train, y):
    x_train, x_val = train[train_indices], train[val_indices]
    y_train, y_val = y[train_indices], y[val_indices]
    
    model.fit(X=x_train, y=y_train)
    train_pred = np.append(train_pred, model.predict(x_val))
  test_pred = np.append(test_pred, model.predict(test))
  return test_pred, train_pred

In [0]:
#Now we'll create two base models -decision tree and KNN
model1 = DecisionTreeClassifier(random_state=1)

test_pred1, train_pred1 = Stacking(model= model1, n_fold = 10, train = x_train, test = x_test, y= y_train)

train_pred1 = pd.DataFrame(train_pred1)

test_pred1 = pd.DataFrame(test_pred1)



In [0]:
model2 = KNeighborsClassifier()

test_pred2, train_pred2 = Stacking(model = model2, n_fold= 10, train = x_train, test= x_test, y = y_train)

train_pred2 = pd.DataFrame(train_pred2)

test_pred2 = pd.DataFrame(test_pred2)



In [0]:
#create a third model, logistic regression, on the predictions of the decision tree and knn models

df = pd.concat([train_pred1, train_pred2], axis=1)
df_test = pd.concat([test_pred1, test_pred2], axis =1)

model = LogisticRegression(random_state=1)
model.fit(df, y_train)
model.score(df_test, y_test)

0.956140350877193

In [0]:
y_pred = model.predict(df_test)
confusion_matrix(y_test, y_pred)

array([[37,  5],
       [ 0, 72]])

In [0]:
df_test.shape

In [0]:
y_test.shape

In [0]:
df.shape

In [0]:
test_pred2.shape

In [0]:
x_test.shape

In [0]:
x_train.shape

In [0]:
train_pred1.shape

### Blending

In [0]:
x_train,x_val, y_train, y_val = train_test_split(x_train, y_train, test_size = 0.2, random_state = 1)

In [0]:
model1 = DecisionTreeClassifier()
model1.fit(x_train, y_train)
val_pred1 = model1.predict(x_val)
test_pred1 = model1.predict(x_test)
val_pred1 = pd.DataFrame(val_pred1)
test_pred1 = pd.DataFrame(test_pred1)

model2 = KNeighborsClassifier()
model2.fit(x_train,y_train)
val_pred2=model2.predict(x_val)
test_pred2=model2.predict(x_test)
val_pred2=pd.DataFrame(val_pred2)
test_pred2=pd.DataFrame(test_pred2)



In [0]:
x_val = pd.DataFrame(x_val)
x_test = pd.DataFrame(x_test)

In [0]:
df_val=pd.concat([x_val, val_pred1,val_pred2],axis=1)
df_test=pd.concat([x_test, test_pred1,test_pred2],axis=1)

In [0]:
model = LogisticRegression()
model.fit(df_val,y_val)
model.score(df_test,y_test)

0.9649122807017544

In [0]:
y_pred = model.predict(df_test)
confusion_matrix(y_test, y_pred)

array([[38,  4],
       [ 0, 72]])

## Bagging & Boosting

In [0]:
#reading the dataset
df = pd.read_csv('train.csv')

#filling missing values
df['Gender'].fillna('Male', inplace = True)
df.dropna(inplace=True)

In [0]:
df.head()

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
1,LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
2,LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
3,LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
4,LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
5,LP001011,Male,Yes,2,Graduate,Yes,5417,4196.0,267.0,360.0,1.0,Urban,Y


In [0]:
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

In [0]:
X.shape

(492, 12)

In [0]:
y.shape

(492,)

In [0]:
#create dummies
X=pd.get_dummies(X)

In [0]:
#split dataset into train and test

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)

In [0]:
x_train.head()

Unnamed: 0,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Loan_ID_LP001003,Loan_ID_LP001005,Loan_ID_LP001006,Loan_ID_LP001008,Loan_ID_LP001011,Loan_ID_LP001013,Loan_ID_LP001014,Loan_ID_LP001018,Loan_ID_LP001020,Loan_ID_LP001024,Loan_ID_LP001028,Loan_ID_LP001029,Loan_ID_LP001030,Loan_ID_LP001032,Loan_ID_LP001036,Loan_ID_LP001038,Loan_ID_LP001043,Loan_ID_LP001046,Loan_ID_LP001047,Loan_ID_LP001050,Loan_ID_LP001066,Loan_ID_LP001068,Loan_ID_LP001073,Loan_ID_LP001086,Loan_ID_LP001095,Loan_ID_LP001097,Loan_ID_LP001098,Loan_ID_LP001100,Loan_ID_LP001112,Loan_ID_LP001114,Loan_ID_LP001116,Loan_ID_LP001119,Loan_ID_LP001120,Loan_ID_LP001131,Loan_ID_LP001138,...,Loan_ID_LP002912,Loan_ID_LP002916,Loan_ID_LP002917,Loan_ID_LP002925,Loan_ID_LP002926,Loan_ID_LP002928,Loan_ID_LP002931,Loan_ID_LP002933,Loan_ID_LP002936,Loan_ID_LP002938,Loan_ID_LP002940,Loan_ID_LP002941,Loan_ID_LP002945,Loan_ID_LP002948,Loan_ID_LP002953,Loan_ID_LP002958,Loan_ID_LP002959,Loan_ID_LP002961,Loan_ID_LP002964,Loan_ID_LP002974,Loan_ID_LP002978,Loan_ID_LP002979,Loan_ID_LP002983,Loan_ID_LP002984,Loan_ID_LP002990,Gender_Female,Gender_Male,Married_No,Married_Yes,Dependents_0,Dependents_1,Dependents_2,Dependents_3+,Education_Graduate,Education_Not Graduate,Self_Employed_No,Self_Employed_Yes,Property_Area_Rural,Property_Area_Semiurban,Property_Area_Urban
376,8750,4996.0,130.0,360.0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,1,0,1,0,1,0,0
17,3510,0.0,76.0,360.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,1
585,4283,3000.0,172.0,84.0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,0,1,0,1,0,0
416,2600,0.0,160.0,360.0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,0,1
78,3167,4000.0,180.0,300.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,1,0,1,0,0,1,0


In [0]:
np.any(np.isnan(x_test))

False

In [0]:
np.any(np.isnan(x_train))

False

### Bagging
We create n subsets(bags) of observations from the original dataset, **with replacement**. The size of subsets created for bagging may be less than the original set. Independent models are created on these subsets.

#### Bagging Meta Estimator
1. Random subsets are created from the original dataset(Bootstrapping)
2. Subset of the dataset includes all features
3. A user-specified base estimator is fitted on each of these smaller sets
4. Predictions from each model are combined to get the final result

In [0]:
model = BaggingClassifier(DecisionTreeClassifier(random_state=1))

In [0]:
model.fit(x_train, y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                                                        class_weight=None,
                                                        criterion='gini',
                                                        max_depth=None,
                                                        max_features=None,
                                                        max_leaf_nodes=None,
                                                        min_impurity_decrease=0.0,
                                                        min_impurity_split=None,
                                                        min_samples_leaf=1,
                                                        min_samples_split=2,
                                                        min_weight_fraction_leaf=0.0,
                                                        presort='deprecated',
                                                        random_state=1,
   

In [0]:
model.score(x_test, y_test)

0.8040540540540541

In [0]:
y_pred = model.predict(x_test)
confusion_matrix(y_test, y_pred)

array([[ 19,  25],
       [  4, 100]])

#### Random Forest
Random forest is another ensemble machine learning algo that follows bagging technique. The base estimator in random forest are decision trees. Unlike BME, RF randomly selects a set of features which are used to decide the best split at each node of the decision tree. 
Steps:

1. Random subsets are created from original dataset(Bootstrap)
2. At each node in the decision tree, only a random set of features are considered to decide the best split.
3. A decision tree model is fitted on each of the subsets
4. The final prediction is calculated by averaging the predictions from all the decision trees

*Note: The decision trees in random forest can be built on a subset of data and features. Particularly, the sklearn model of random forest uses all features for decision tree and a subset of features are randomly selected for splitting at each node.*

In [0]:
model = RandomForestClassifier(random_state=1)

model.fit(x_train, y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=1, verbose=0,
                       warm_start=False)

In [0]:
model.score(x_test, y_test)

0.8243243243243243

In [0]:
y_pred = model.predict(x_test)
confusion_matrix(y_test, y_pred)

array([[ 19,  25],
       [  1, 103]])

### Boosting
Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model. The succeeding models are dependent on previous models. 
The final model(strong learner) is the weighted mean of all the models(week learners)

#### AdaBoost
Adaptive Boosting or AdaBoost is one of the simplest boosting algorithms. usually decision trees are used for modeling. Multiple sequential models are created, each correcting the errors from the last model.AdaBoost assigns weights to the observations which are incorrectly predicted and the subsequent model works to predict these values correctly.
Steps:
1. Initially, all observations in the dataset are given equal weights
2. A model is built on a subset of data
3. Using this model, predictions are made on the whole dataset
4. Errors are calculated by comparing the predictions and actual values.
5. While creating the next model, higher weights are given to the data points which were predicted incorrectly. 
6. Weights can be determined using the error value. For instance, higher the error more is the weight assigned to the observation. 
7. This process is repeated until the error function does not change, or the maximum limit of the number of estimators is reached.

In [0]:
#model = AdaBoostClassifier(random_state=1)
model = AdaBoostClassifier(random_state=1, n_estimators= 1000, learning_rate= 0.001)
model.fit(x_train, y_train)

AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,
                   learning_rate=0.001, n_estimators=1000, random_state=1)

In [0]:
model.score(x_test, y_test)

0.8243243243243243

In [0]:
y_pred = model.predict(x_test)
confusion_matrix(y_test, y_pred)

array([[ 19,  25],
       [  1, 103]])

#### GBM

In [0]:
model = GradientBoostingClassifier(learning_rate=0.01, random_state=1)
model.fit(x_train, y_train)

GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.01, loss='deviance', max_depth=3,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=100,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=1, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)

In [0]:
model.score(x_test, y_test)

0.8243243243243243

In [0]:
y_pred = model.predict(x_test)
confusion_matrix(y_test, y_pred)

array([[ 19,  25],
       [  1, 103]])

#### XGBM
It is also known as regularized boosting technique

In [0]:
model = xgb.XGBClassifier(random_state=1, learning_rate= 0.01)
model.fit(x_train, y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.01, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=1,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [0]:
model.score(x_test, y_test)

0.7905405405405406

In [0]:
y_pred = model.predict(x_test)
confusion_matrix(y_test, y_pred)

array([[19, 25],
       [ 6, 98]])

#### Light GBM
When you have a large dataset, usually this performs really well and fast. 
The major difference is how the trees grow. In LGBM trees grow leaf wise, while in other algorithms tree grows level wise.

In [0]:
y_train = (y_train == 'Y').astype(int)
y_test = (y_test == 'Y').astype(int)

In [0]:
train_data = lgb.Dataset(x_train, label = y_train)
#define params
params = {'learning_rate': 0.001}
model = lgb.train(params, train_data, 100)

y_pred = model.predict(x_test)
y_pred = (y_pred > 0.5).astype(int)


#### CatBoost

In [0]:
#reading the dataset
df = pd.read_csv('train.csv')
df = df.iloc[:,1:]
#filling missing values
df['Gender'].fillna('Male', inplace = True)
df.dropna(inplace=True)

In [0]:
#df.iloc[:,1:].head()

Unnamed: 0,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
1,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
2,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
3,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
4,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
5,Male,Yes,2,Graduate,Yes,5417,4196.0,267.0,360.0,1.0,Urban,Y


In [0]:
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

In [0]:
#split dataset into train and test

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)

In [0]:
x_train.head()

Unnamed: 0,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area
376,Male,Yes,3+,Graduate,No,8750,4996.0,130.0,360.0,1.0,Rural
17,Female,No,0,Graduate,No,3510,0.0,76.0,360.0,0.0,Urban
585,Male,Yes,1,Graduate,No,4283,3000.0,172.0,84.0,1.0,Rural
416,Female,No,1,Graduate,No,2600,0.0,160.0,360.0,1.0,Urban
78,Male,Yes,3+,Graduate,No,3167,4000.0,180.0,300.0,0.0,Semiurban


In [0]:
model = CatBoostClassifier()
categorical_features_indices = np.where(df.dtypes != np.float)[0]
model.fit(x_train, y_train, cat_features = ([0,1,2,3,4,10]), eval_set = (x_test, y_test))

model.score(x_test, y_test)

Learning rate set to 0.024347
0:	learn: 0.6797082	test: 0.6817102	best: 0.6817102 (0)	total: 51.4ms	remaining: 51.3s
1:	learn: 0.6673778	test: 0.6705005	best: 0.6705005 (1)	total: 53.8ms	remaining: 26.9s
2:	learn: 0.6554170	test: 0.6605707	best: 0.6605707 (2)	total: 56.2ms	remaining: 18.7s
3:	learn: 0.6455497	test: 0.6510864	best: 0.6510864 (3)	total: 58.5ms	remaining: 14.6s
4:	learn: 0.6365487	test: 0.6419451	best: 0.6419451 (4)	total: 59.9ms	remaining: 11.9s
5:	learn: 0.6269127	test: 0.6323059	best: 0.6323059 (5)	total: 62.3ms	remaining: 10.3s
6:	learn: 0.6171128	test: 0.6249543	best: 0.6249543 (6)	total: 64.7ms	remaining: 9.17s
7:	learn: 0.6095292	test: 0.6177577	best: 0.6177577 (7)	total: 66.4ms	remaining: 8.24s
8:	learn: 0.6019682	test: 0.6105984	best: 0.6105984 (8)	total: 68.7ms	remaining: 7.56s
9:	learn: 0.5939462	test: 0.6025812	best: 0.6025812 (9)	total: 70.9ms	remaining: 7.02s
10:	learn: 0.5866590	test: 0.5963500	best: 0.5963500 (10)	total: 73.1ms	remaining: 6.57s
11:	learn: 

0.8243243243243243

In [0]:
y_pred = model.predict(x_test)

In [0]:
y_pred = (y_pred == 'Y')

In [0]:
y_pred = y_pred.astype(int)

In [0]:
y_test = (y_test == 'Y').astype(int)

In [0]:
confusion_matrix(y_test, y_pred)

array([[ 19,  25],
       [  1, 103]])

https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/?

https://www.analyticsvidhya.com/blog/2017/08/catboost-automated-categorical-data/

https://www.analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost/

https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/

https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/

https://arxiv.org/pdf/1603.02754.pdf

https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf

https://stats.stackexchange.com/questions/354484/why-does-xgboost-have-a-learning-rate

https://xgboost.readthedocs.io/en/latest/tutorials/model.html

https://towardsdatascience.com/selecting-optimal-parameters-for-xgboost-model-training-c7cd9ed5e45e

https://towardsdatascience.com/from-zero-to-hero-in-xgboost-tuning-e48b59bfaf58

https://www.youtube.com/watch?v=IRdEjvuh3OI -- **cation** video is dammmmmn slow, watch at your own risk

https://medium.com/analytics-vidhya/math-behind-gbm-and-xgboost-d00e8536b7de

