#### Table of Contents
Introduction to Ensemble Learning
Basic Ensemble Techniques:
1. Max Voting
2. Averaging
3. Weighted Average

#### Advanced Ensemble Techniques
1. Stacking
2. Bagging
3. Boosting

#### Algorithms based on Bagging and Boosting
1. Bagging meta-estimator
2. Random Forest
3. AdaBoost
4. GBM
5. XGBoost
6. Light GBM
7. CatBoost

#### 2. Simple Ensemble Techniques
In this section, we will look at a few simple but powerful techniques, namely:

1. Max Voting
2. Averaging
3. Weighted Averaging

##### 1. Max Voting
The max voting method is generally used for classification problems. In this technique, multiple models are used to make predictions for each data point. The predictions by each model are considered as a ‘vote’. The predictions which we get from the majority of the models are used as the final prediction.
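
To make the voting rule concrete, here is a minimal sketch in plain Python. The three prediction lists are made-up values and `majority_vote` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among one data point's predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical class predictions from three models for four data points
model1_preds = [1, 0, 1, 1]
model2_preds = [1, 1, 0, 1]
model3_preds = [0, 0, 1, 1]

# zip(...) groups the three votes cast for each data point
final_preds = [majority_vote(votes)
               for votes in zip(model1_preds, model2_preds, model3_preds)]
print(final_preds)
```

scikit-learn's `VotingClassifier` with `voting='hard'` applies the same rule to fitted models.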

In [None]:
import pandas as pd
import numpy as np
import os

df = pd.read_csv('data/bank_processed_data.csv')
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
df.head()

In [None]:
from sklearn.model_selection import train_test_split
 
X = df.drop('deposit_cat', axis=1)
y = df.deposit_cat


X_train, X_test, y_train, y_test = train_test_split(X,y)

In [None]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import tree

model1 = LogisticRegression(random_state=1)
model2 = tree.DecisionTreeClassifier(random_state=1)


model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')

model.fit(X_train,y_train)
model.score(X_test,y_test)

##### 2. Averaging

Similar to the max voting technique, multiple predictions are made for each data point in averaging. In this method, we take an average of predictions from all the models and use it to make the final prediction. Averaging can be used for making predictions in regression problems or while calculating probabilities for classification problems.

For example, in the below case, the averaging method would take the average of all the values.

i.e. (5+4+5+4+4)/5 = 4.4
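
The same arithmetic can be checked in a couple of lines of plain Python. The second half of the sketch applies the idea to class probabilities; the probability values there are invented for illustration.

```python
# Five model predictions for one data point (the values from the text)
predictions = [5, 4, 5, 4, 4]
average = sum(predictions) / len(predictions)
print(average)

# Averaging class probabilities works the same way, column by column.
# Hypothetical [P(class 0), P(class 1)] from two models for one data point:
probs_model1 = [0.8, 0.2]
probs_model2 = [0.6, 0.4]
avg_probs = [(a + b) / 2 for a, b in zip(probs_model1, probs_model2)]
print(avg_probs)
```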

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(X_train,y_train)
model2.fit(X_train,y_train)
model3.fit(X_train,y_train)

pred1 = model1.predict_proba(X_test)
pred2 = model2.predict_proba(X_test)
pred3 = model3.predict_proba(X_test)

finalpred = (pred1+pred2+pred3)/3
print (finalpred)

##### 3. Weighted Average
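
Weighted averaging extends the averaging method: each model's prediction is multiplied by a weight that reflects how much that model is trusted, and the weighted predictions are summed. A minimal sketch in plain Python follows. The weights 0.3, 0.3 and 0.4 match the notebook cell for this section, while the three probabilities are made-up values.

```python
# Hypothetical probabilities of the positive class from three models
preds = [0.9, 0.6, 0.7]
# One weight per model; they sum to 1 so the result stays a probability
weights = [0.3, 0.3, 0.4]

weighted_avg = sum(w * p for w, p in zip(weights, preds))
print(round(weighted_avg, 2))
```

If the weights do not sum to 1, divide the result by `sum(weights)` to keep it on the same scale as the inputs.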

In [None]:
model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(X_train,y_train)
model2.fit(X_train,y_train)
model3.fit(X_train,y_train)

pred1 = model1.predict_proba(X_test)
pred2 = model2.predict_proba(X_test)
pred3 = model3.predict_proba(X_test)

finalpred = (pred1*0.3+pred2*0.3+pred3*0.4)

print (finalpred)

In [None]:
model3.predict_proba([X_test.iloc[-3]])

In [None]:
model3.predict([X_test.iloc[0]])

#### 3. Advanced Ensemble techniques
Now that we have covered the basic ensemble techniques, let’s move on to understanding the advanced techniques.

##### 1. Stacking
Stacking is an ensemble learning technique that uses predictions from multiple models (for example decision tree, knn or svm) to build a new model. This model is used for making predictions on the test set. Below is a step-wise explanation for a simple stacked ensemble:

1. The train set is split into 10 parts.

<img src="images/image-11.png">

2. A base model (suppose a decision tree) is fitted on 9 parts and predictions are made for the 10th part. This is done for each part of the train set.

<img src="images/image-10.png">

3. The base model (in this case, decision tree) is then fitted on the whole train dataset.
4. Using this model, predictions are made on the test set.

<img src="images/image-2.png">

5. Steps 2 to 4 are repeated for another base model (say knn) resulting in another set of predictions for the train set and test set.

<img src="images/image-3.png">

6. The predictions from the train set are used as features to build a new model.

<img src="images/image12.png">

7. This model is used to make final predictions on the test prediction set.

We first define a function that makes predictions on n folds of the train set and on the test set. It returns the train and test predictions for the model it is given.

In [None]:
from sklearn.model_selection import StratifiedKFold

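def Stacking(model,train,y,test,n_fold):

    # shuffle=True is required whenever random_state is set
    folds=StratifiedKFold(n_splits=n_fold,shuffle=True,random_state=1)

    # one out-of-fold prediction per training row, kept in the original row order
    train_pred=np.zeros(train.shape[0])

    for train_indices,val_indices in folds.split(train,y.values):

        x_tr,x_val = train.iloc[train_indices],train.iloc[val_indices]
        y_tr = y.iloc[train_indices]

        # fit on the other folds, then predict the held-out fold
        model.fit(X=x_tr,y=y_tr)
        train_pred[val_indices] = model.predict(x_val)

    # refit on the whole train set before predicting the test set
    model.fit(X=train,y=y)
    test_pred = model.predict(test)

    # y is returned so the meta-model's target lines up with train_pred
    return test_pred,train_pred,y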

Now we’ll create two base models – decision tree and knn.

In [None]:
model1 = tree.DecisionTreeClassifier(random_state=1)

test_pred1 ,train_pred1, y_val_1 = Stacking(model=model1,n_fold=10, train=X_train,test=X_test,y=y_train)

train_pred1 = pd.DataFrame(train_pred1)
test_pred1 = pd.DataFrame(test_pred1)

In [None]:
model2 = KNeighborsClassifier()

test_pred2 ,train_pred2, y_val_2 =Stacking(model=model2,n_fold=10,train=X_train,test=X_test,y=y_train)

train_pred2 = pd.DataFrame(train_pred2)
test_pred2 = pd.DataFrame(test_pred2)

Create a third model, logistic regression, on the predictions of the decision tree and knn models.

In [None]:
df_train = pd.concat([train_pred1, train_pred2], axis=1)
df_test = pd.concat([test_pred1, test_pred2], axis=1)
y_test_val = y_val_1

model = LogisticRegression(random_state=1)
model.fit(df_train,y_test_val)
model.score(df_test, y_test)

In [None]:
df_test

In order to simplify the above explanation, the stacking model we have created has only two levels. The decision tree and knn models are built at level zero, while a logistic regression model is built at level one. Feel free to create multiple levels in a stacking model.
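
scikit-learn (version 0.22 and later) also provides `StackingClassifier`, which runs the out-of-fold procedure internally. The sketch below is an illustration on synthetic data from `make_classification` rather than on the bank dataset. The sample size, feature count and variable names (`X_demo`, `stack`, and so on) are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: any tabular classification set would do)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=1)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=1)

stack = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=1)),
                ('knn', KNeighborsClassifier())],   # level-zero models
    final_estimator=LogisticRegression(),           # level-one model
    cv=10,                                          # 10 folds, as in the steps above
)
stack.fit(Xd_train, yd_train)
stack_score = stack.score(Xd_test, yd_test)
print(stack_score)
```

By default `StackingClassifier` feeds the base models' predicted probabilities, rather than hard labels, to the final estimator, so its scores will not match the hand-written version exactly.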

##### 2. Bagging
The idea behind bagging is combining the results of multiple models (for instance, all decision trees) to get a generalized result. Here’s a question: If you create all the models on the same set of data and combine it, will it be useful? There is a high chance that these models will give the same result since they are getting the same input. So how can we solve this problem? One of the techniques is bootstrapping.

Bootstrapping is a sampling technique in which we create subsets of observations from the original dataset, with replacement. The size of the subsets is the same as the size of the original set.

Bagging (or Bootstrap Aggregating) technique uses these subsets (bags) to get a fair idea of the distribution (complete set). The size of subsets created for bagging may be less than the original set.

<img src="images/capture.png">

1. Multiple subsets are created from the original dataset, selecting observations with replacement.
2. A base model (weak model) is created on each of these subsets.
3. The models run in parallel and are independent of each other.
4. The final predictions are determined by combining the predictions from all the models.
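
The four steps above can be sketched in plain Python. Everything here is a toy assumption made for illustration: the one-dimensional dataset, the `fit_stump` weak model (a single threshold) and the choice of 25 bags.

```python
import random
from collections import Counter

def bootstrap(rows, rng):
    """Step 1: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Step 2: weak model that predicts 1 when x > t, with t chosen on the sample."""
    best_t, best_correct = None, -1
    for t, _ in rows:
        correct = sum((x > t) == bool(label) for x, label in rows)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

def bagging_predict(thresholds, x):
    """Step 4: combine the independent stumps by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy dataset of (feature, label): label is 1 for the larger feature values
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]

rng = random.Random(0)
# Step 3: each stump is fitted on its own bag, independently of the others
thresholds = [fit_stump(bootstrap(data, rng)) for _ in range(25)]

print(bagging_predict(thresholds, 1), bagging_predict(thresholds, 9))
```

Because each stump only sees its own bootstrap sample, the thresholds differ from bag to bag, and the vote smooths out those differences.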

##### 3. Boosting
Before we go further, here’s another question for you: If a data point is incorrectly predicted by the first model, and then the next (probably all models), will combining the predictions provide better results? Such situations are taken care of by boosting.

Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model. The succeeding models are dependent on the previous model. Let’s understand the way boosting works in the below steps.

1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset.

<img src="images/capture2.png">

5. Errors are calculated using the actual values and predicted values.
6. The observations which are incorrectly predicted, are given higher weights.
(Here, the three misclassified blue-plus points will be given higher weights)
7. Another model is created and predictions are made on the dataset.
(This model tries to correct the errors from the previous model)

<img src="images/capture3.png">

8. Similarly, multiple models are created, each correcting the errors of the previous model.
9. The final model (strong learner) is the weighted mean of all the models (weak learners).

Thus, the boosting algorithm combines a number of weak learners to form a strong learner. The individual models would not perform well on the entire dataset, but they work well for some part of the dataset. Thus, each model actually boosts the performance of the ensemble.
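
Step 9 can be written compactly. One common formalization, the one AdaBoost uses for binary labels in $\{-1, +1\}$, is shown below. The symbols $h_m$, $\alpha_m$ and $M$ are notation introduced here for illustration, not taken from the steps above.

```latex
F(x) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m \, h_m(x) \right)
```

Here $h_m$ is the $m$-th weak learner, $M$ is the number of weak learners, and $\alpha_m$ is the weight of learner $m$, which is larger for learners with a lower weighted error.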

### 4. Algorithms based on Bagging and Boosting
Bagging and Boosting are two of the most commonly used techniques in machine learning. In this section, we will look at them in detail. Following are the algorithms we will be focusing on:

Bagging algorithms:

1. Bagging meta-estimator
2. Random forest

Boosting algorithms:
1. AdaBoost
2. GBM
3. XGBoost
4. Light GBM
5. CatBoost

For all the algorithms discussed in this section, we will follow this procedure:

1. Introduction to the algorithm
2. Sample code
3. Parameters

For this article, I have used the Loan Prediction Problem.

In [None]:
#importing important packages
import pandas as pd
import numpy as np

#reading the dataset
df=pd.read_csv("data/train_ctrUa4K.csv")

#filling missing values
df['Gender'] = df['Gender'].fillna('Male')
df = df.fillna(0)

In [None]:
df.head()

In [None]:
#split dataset into train and test

from sklearn.model_selection import train_test_split

X = df[[ 'Gender', 'Married', 'Dependents', 'Education',
       'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount',
       'Loan_Amount_Term', 'Credit_History', 'Property_Area']]

y =  df['Loan_Status']

X =  pd.get_dummies(X)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

#### 4.1. Bagging meta-estimator
Bagging meta-estimator is an ensembling algorithm that can be used for both classification (BaggingClassifier) and regression (BaggingRegressor) problems. It follows the typical bagging technique to make predictions. Following are the steps for the bagging meta-estimator algorithm:

1. Random subsets are created from the original dataset (Bootstrapping).
2. The subset of the dataset includes all features.
3. A user-specified base estimator is fitted on each of these smaller sets.
4. Predictions from each model are combined to get the final result.

In [93]:
from sklearn.ensemble import BaggingClassifier
from sklearn import tree

model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1), n_jobs=-1)

model.fit(x_train, y_train)
model.score(x_test,y_test)

0.7027027027027027

#### Parameters

##### 1. base_estimator:
It defines the base estimator to fit on random subsets of the dataset.
When nothing is specified, the base estimator is a decision tree.
In scikit-learn 1.2 and later this parameter is named estimator.

##### 2. n_estimators:
It is the number of base estimators to be created.
The number of estimators should be carefully tuned as a large number would take a very long time to run, while a very small number might not provide the best results.

##### 3. max_samples:
This parameter controls the size of the subsets.
It is the maximum number of samples to train each base estimator.

##### 4. max_features:
Controls the number of features to draw from the whole dataset.
It defines the maximum number of features required to train each base estimator.

##### 5. n_jobs:
The number of jobs to run in parallel.
Set this value equal to the cores in your system.
If -1, the number of jobs is set to the number of cores.

##### 6. random_state:
It seeds the random number generator that draws the random subsets. When the random_state value is the same for two models, the random selection is the same for both models.
This parameter is useful when you want to compare different models.
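
As an illustration of how these parameters fit together, here is a sketch on synthetic data. The data comes from `make_classification` so that it runs without the loan CSV, and every parameter value is an arbitrary assumption rather than a tuned setting. The base estimator is passed positionally, which works regardless of whether the installed scikit-learn calls the keyword `base_estimator` or `estimator`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=1)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(random_state=1),  # base estimator, passed positionally
    n_estimators=50,    # number of base estimators
    max_samples=0.8,    # each estimator is trained on 80% of the rows
    max_features=0.5,   # ...and on 50% of the columns
    n_jobs=1,           # single process here; -1 would use all cores
    random_state=1,     # reproducible sampling
)
bag.fit(Xd_train, yd_train)
bag_score = bag.score(Xd_test, yd_test)
print(len(bag.estimators_), bag_score)
```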

#### AdaBoost
Adaptive boosting or AdaBoost is one of the simplest boosting algorithms. Usually, decision trees are used for modelling. Multiple sequential models are created, each correcting the errors from the last model. AdaBoost assigns weights to the observations which are incorrectly predicted and the subsequent model works to predict these values correctly.

Below are the steps for performing the AdaBoost algorithm:

1. Initially, all observations in the dataset are given equal weights.
2. A model is built on a subset of data.
3. Using this model, predictions are made on the whole dataset.
4. Errors are calculated by comparing the predictions and actual values.
5. While creating the next model, higher weights are given to the data points which were predicted incorrectly.
6. Weights can be determined using the error value. For instance, the higher the error, the greater the weight assigned to the observation.
7. This process is repeated until the error function does not change, or the maximum limit of the number of estimators is reached.
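
The steps above do not give a formula for the weights. The sketch below uses the classic (discrete) AdaBoost update as one concrete choice, and scikit-learn's implementation may differ in its details. `adaboost_round` is a hypothetical helper and the misclassification pattern is invented.

```python
import math

def adaboost_round(weights, is_wrong):
    """One round of the classic (discrete) AdaBoost weight update."""
    # weighted error of the current weak learner
    err = sum(w for w, wrong in zip(weights, is_wrong) if wrong)
    # the learner's weight in the final vote: larger when err is smaller
    alpha = 0.5 * math.log((1 - err) / err)
    # raise the weights of misclassified points, lower the others
    new_w = [w * math.exp(alpha if wrong else -alpha)
             for w, wrong in zip(weights, is_wrong)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]   # renormalize to sum to 1

# Ten observations with equal starting weights (step 1)
weights = [0.1] * 10
# Hypothetical outcome: the first model misclassifies observations 0, 1 and 2
is_wrong = [True, True, True] + [False] * 7

alpha, weights = adaboost_round(weights, is_wrong)
print(round(alpha, 3), [round(w, 3) for w in weights])
```

After the update the three misclassified observations together carry half of the total weight, so the next model cannot ignore them.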

In [94]:
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(random_state=1)
model.fit(x_train, y_train)
model.score(x_test,y_test)

0.7567567567567568

#### Parameters

##### 1. base_estimator:
It helps to specify the type of base estimator, that is, the machine learning algorithm to be used as base learner.
In scikit-learn 1.2 and later this parameter is named estimator.

##### 2. n_estimators:
It defines the number of base estimators.
The default value is 50; a higher value can improve performance at the cost of training time.

##### 3. learning_rate:
This parameter controls the contribution of the estimators in the final combination.
There is a trade-off between learning_rate and n_estimators.

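##### 4. max_depth:
Defines the maximum depth of the individual estimator.
This is a parameter of the base estimator rather than of AdaBoostClassifier itself, so set it on the tree you pass in, for example DecisionTreeClassifier(max_depth=1).
Tune this parameter for best performance.

##### 5. n_jobs
Specifies the number of processors the algorithm is allowed to use.
Because boosting builds its models sequentially, scikit-learn's AdaBoostClassifier does not accept n_jobs. Where it is supported (for example in BaggingClassifier), set it to -1 to use all available processors.

##### 6. random_state:
An integer seed for the random number generator.
A fixed value of random_state will always produce the same results when given the same parameters and training data.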

#### Gradient Boosting (GBM)
Gradient Boosting or GBM is another ensemble machine learning algorithm that works for both regression and classification problems. GBM uses the boosting technique, combining a number of weak learners to form a strong learner. Regression trees are used as the base learner, and each subsequent tree in the series is built on the errors calculated by the previous tree.

<img src="images/gb1.png">

<img src="images/gb2.png">
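
To show what "built on the errors of the previous tree" means, here is a minimal sketch of gradient boosting for squared-error regression in plain Python. With squared error, the errors are simply the residuals. The data, the one-split `fit_stump` learner, the learning rate and the number of rounds are all assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Regression stump: best single split on x, predicting the mean on each side."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:          # splitting at the largest x leaves nothing on the right
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Toy regression data (made up for illustration)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 1.2, 3.8, 4.1, 4.0, 7.9, 8.2]

learning_rate = 0.5
preds = [sum(ys) / len(ys)] * len(ys)      # start from the mean of the targets
history = [mse(ys, preds)]

for _ in range(20):
    residuals = [y - p for y, p in zip(ys, preds)]   # errors of the model so far
    stump = fit_stump(xs, residuals)                 # the next tree fits those errors
    preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    history.append(mse(ys, preds))

print(round(history[0], 3), round(history[-1], 3))
```

The training error shrinks with every round, because each new stump is fitted to whatever the current ensemble still gets wrong.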

In [95]:
from sklearn.ensemble import GradientBoostingClassifier

model= GradientBoostingClassifier(learning_rate=0.01,random_state=1)

model.fit(x_train, y_train)
model.score(x_test,y_test)

0.7891891891891892

#### Parameters

##### 1. min_samples_split
Defines the minimum number of samples (or observations) which are required in a node to be considered for splitting.
Used to control over-fitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.

##### 2. min_samples_leaf
Defines the minimum samples required in a terminal or leaf node.
Generally, lower values should be chosen for imbalanced class problems because the regions in which the minority class will be in the majority will be very small.

##### 3. min_weight_fraction_leaf
Similar to min_samples_leaf but defined as a fraction of the total number of observations instead of an integer.

##### 4. max_depth
The maximum depth of a tree.
Used to control over-fitting as higher depth will allow the model to learn relations very specific to a particular sample.
Should be tuned using CV.

##### 5. max_leaf_nodes
The maximum number of terminal nodes or leaves in a tree.
Can be defined in place of max_depth. Since binary trees are created, a depth of ‘n’ would produce a maximum of 2^n leaves.
If this is defined, GBM will ignore max_depth.

##### 6. max_features
The number of features to consider while searching for the best split. These will be randomly selected.
As a thumb-rule, the square root of the total number of features works great but we should check up to 30-40% of the total number of features.
Higher values can lead to over-fitting but it generally depends on a case to case scenario.

#### XGBoost
XGBoost (extreme Gradient Boosting) is an advanced implementation of the gradient boosting algorithm. XGBoost has proved to be a highly effective ML algorithm, extensively used in machine learning competitions and hackathons. XGBoost has high predictive power and is often reported to be up to about 10 times faster than other gradient boosting implementations. It also includes several forms of regularization, which reduce overfitting and improve overall performance. Hence it is also known as a ‘regularized boosting’ technique.

Let us see how XGBoost is comparatively better than other techniques:

##### 1. Regularization:
Standard GBM implementation has no regularisation like XGBoost.
Thus XGBoost also helps to reduce overfitting.

##### 2. Parallel Processing:
XGBoost implements parallel processing and is faster than GBM.
XGBoost also supports implementation on Hadoop.

##### 3. High Flexibility:
XGBoost allows users to define custom optimization objectives and evaluation criteria adding a whole new dimension to the model.

##### 4. Handling Missing Values:
XGBoost has an in-built routine to handle missing values.

##### 5. Tree Pruning:
XGBoost makes splits up to the max_depth specified and then starts pruning the tree backwards and removes splits beyond which there is no positive gain.

##### 6. Built-in Cross-Validation:
XGBoost allows a user to run a cross-validation at each iteration of the boosting process and thus it is easy to get the exact optimum number of boosting iterations in a single run.

Since XGBoost takes care of the missing values itself, you do not have to impute the missing values. You can skip the step for missing value imputation from the code mentioned above. Follow the remaining steps as always and then apply xgboost as below.

##### Working on bank dataset

In [96]:
import pandas as pd
import numpy as np
import os

df = pd.read_csv('data/bank_processed_data.csv')
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
df.head()

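| | age | balance | duration | campaign | previous | default_cat | housing_cat | loan_cat | deposit_cat | recent_pdays | ... | marital_divorced | marital_married | marital_single | education_primary | education_secondary | education_tertiary | education_unknown | poutcome_failure | poutcome_success | poutcome_unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59 | 2343 | 1042 | 1 | 0 | 0 | 1 | 0 | 1 | 0.0001 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 56 | 45 | 1467 | 1 | 0 | 0 | 0 | 0 | 1 | 0.0001 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 2 | 41 | 1270 | 1389 | 1 | 0 | 0 | 1 | 0 | 1 | 0.0001 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 3 | 55 | 2476 | 579 | 1 | 0 | 0 | 1 | 0 | 1 | 0.0001 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | 54 | 184 | 673 | 2 | 0 | 0 | 0 | 0 | 1 | 0.0001 | ... | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |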


In [97]:
from sklearn.model_selection import train_test_split
 
X = df.drop('deposit_cat', axis=1)
y = df.deposit_cat

X_train, X_test, y_train, y_test = train_test_split(X,y)

In [98]:
import xgboost as xgb

model = xgb.XGBClassifier(random_state=1,learning_rate=0.01)
model.fit(X_train, y_train)
model.score(X_test,y_test)

0.7728412755284844

#### Parameters

##### 1. nthread
This is used for parallel processing; enter the number of cores in the system.
If you wish to run on all cores, do not input this value. The algorithm will detect it automatically.

##### 2. eta
Analogous to learning rate in GBM.
Makes the model more robust by shrinking the weights on each step.

##### 3. min_child_weight
Defines the minimum sum of weights of all observations required in a child.
Used to control over-fitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.

##### 4. max_depth
It is used to define the maximum depth.
Higher depth will allow the model to learn relations very specific to a particular sample.

##### 5. max_leaf_nodes
The maximum number of terminal nodes or leaves in a tree. In XGBoost this parameter is named max_leaves.
Can be defined in place of max_depth. Since binary trees are created, a depth of ‘n’ would produce a maximum of 2^n leaves.
If this is defined, GBM will ignore max_depth.

##### 6. gamma
A node is split only when the resulting split gives a positive reduction in the loss function. Gamma specifies the minimum loss reduction required to make a split.
Makes the algorithm conservative. The values can vary depending on the loss function and should be tuned.
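
For reference, the XGBoost documentation writes the gain of a candidate split as below. $G_L, G_R$ and $H_L, H_R$ are the sums of the first- and second-order gradients of the loss over the observations in the left and right child, and $\lambda$ is the L2 regularization term. This notation comes from the XGBoost documentation, not from the text above.

```latex
\text{Gain} = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma
```

A split is kept only when this gain is positive, so a larger $\gamma$ demands a larger loss reduction before a node is split.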

##### 7. subsample
Same as the subsample of GBM. Denotes the fraction of observations to be randomly sampled for each tree.
Lower values make the algorithm more conservative and prevent overfitting but values that are too small might lead to under-fitting.

##### 8. colsample_bytree
It is similar to max_features in GBM.
Denotes the fraction of columns to be randomly sampled for each tree.

#### Light GBM
Before discussing how Light GBM works, let’s first understand why we need this algorithm when we have so many others (like the ones we have seen above). Light GBM is designed for speed on extremely large datasets: compared to the other algorithms, it generally takes less time to run on a huge dataset.

LightGBM is a gradient boosting framework that uses tree-based algorithms and follows leaf-wise approach while other algorithms work in a level-wise approach pattern. The images below will help you understand the difference in a better way.

<img src="images/lgb.png">
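
The difference between the two growth strategies can be sketched with a toy tree. This is not LightGBM code: the `GAINS` table holds hypothetical split gains, nodes are numbered heap-style (the children of node i are 2i+1 and 2i+2), and both functions are illustrative helpers.

```python
import heapq
from collections import deque

# Hypothetical gain obtained by splitting each node of a small binary tree
GAINS = {0: 10.0, 1: 8.0, 2: 1.0, 3: 6.0, 4: 5.0, 5: 0.5, 6: 0.2}

def children(node):
    return [c for c in (2 * node + 1, 2 * node + 2) if c in GAINS]

def level_wise(n_splits):
    """Split nodes breadth-first: finish one level before starting the next."""
    order, queue = [], deque([0])
    while queue and len(order) < n_splits:
        node = queue.popleft()
        order.append(node)
        queue.extend(children(node))
    return order

def leaf_wise(n_splits):
    """Always split the available leaf with the largest gain (best-first)."""
    order, heap = [], [(-GAINS[0], 0)]
    while heap and len(order) < n_splits:
        _, node = heapq.heappop(heap)
        order.append(node)
        for c in children(node):
            heapq.heappush(heap, (-GAINS[c], c))
    return order

for name, grow in (("level-wise", level_wise), ("leaf-wise", leaf_wise)):
    order = grow(4)
    print(name, order, sum(GAINS[n] for n in order))
```

With the same budget of four splits, the leaf-wise order skips the low-gain node 2 and goes deeper on the promising side of the tree, which is why leaf-wise trees tend to be deeper and less balanced.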

In [99]:
import lightgbm as lgb

train_data = lgb.Dataset(X_train,label=y_train)

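#define parameters
#'objective':'binary' makes predict() return class-1 probabilities
params = {'learning_rate':0.001, 'objective':'binary'}

model= lgb.train(params, train_data, 100)

y_pred=model.predict(X_test)

#threshold the probabilities at 0.5 for every row of X_test
y_pred = (y_pred >= 0.5).astype(int)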

In [None]:
#!pip install lightgbm

<img src="images/lgb2.png">

#### CatBoost
Handling categorical variables is a tedious process, especially when you have a large number of such variables. When your categorical variables have too many labels (i.e. they have high cardinality), performing one-hot-encoding on them greatly increases the dimensionality (one new column per label) and it becomes really difficult to work with the dataset.
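
A small sketch in plain Python shows how one-hot encoding widens the data. The `cities` column is a made-up example.

```python
# Hypothetical categorical column
cities = ["paris", "tokyo", "lima", "oslo", "cairo", "paris", "lima"]

labels = sorted(set(cities))          # one output column per distinct label

# Each row becomes a vector with a single 1 in the column of its label
one_hot = [[int(city == label) for label in labels] for city in cities]

print(len(labels))   # number of columns created from one original column
for row in one_hot:
    print(row)
```

One original column turns into as many columns as there are distinct labels, so a variable with thousands of labels adds thousands of mostly-zero columns.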

CatBoost can automatically deal with categorical variables and does not require extensive data preprocessing like other machine learning algorithms.

In [100]:
from catboost import CatBoostClassifier

model = CatBoostClassifier()
#categorical_features_indices = np.where(df.dtypes != float)[0]

model.fit(X_train,y_train,cat_features=([ 0,  1, 2, 3, 4, 10]),eval_set=(X_test, y_test))
model.score(X_test,y_test)

Learning rate set to 0.097531
0:	learn: 0.6605438	test: 0.6630435	best: 0.6630435 (0)	total: 82.4ms	remaining: 1m 22s
1:	learn: 0.6274847	test: 0.6338798	best: 0.6338798 (1)	total: 185ms	remaining: 1m 32s
2:	learn: 0.5972543	test: 0.6069457	best: 0.6069457 (2)	total: 299ms	remaining: 1m 39s
3:	learn: 0.5747952	test: 0.5853265	best: 0.5853265 (3)	total: 387ms	remaining: 1m 36s
4:	learn: 0.5568137	test: 0.5684063	best: 0.5684063 (4)	total: 517ms	remaining: 1m 42s
5:	learn: 0.5386751	test: 0.5519826	best: 0.5519826 (5)	total: 610ms	remaining: 1m 41s
6:	learn: 0.5272078	test: 0.5417834	best: 0.5417834 (6)	total: 721ms	remaining: 1m 42s
7:	learn: 0.5171783	test: 0.5325956	best: 0.5325956 (7)	total: 824ms	remaining: 1m 42s
8:	learn: 0.5068132	test: 0.5232998	best: 0.5232998 (8)	total: 959ms	remaining: 1m 45s
9:	learn: 0.4999544	test: 0.5180653	best: 0.5180653 (9)	total: 1.08s	remaining: 1m 46s
10:	learn: 0.4938210	test: 0.5130244	best: 0.5130244 (10)	total: 1.19s	remaining: 1m 46s
...
929:	learn: 0.3433627	test: 0.4687391	best: 0.4629062 (211)	total: 1m 33s	remaining: 7.02s
930:	learn: 0.3433620	test: 0.4687350	best: 0.4629062 (211)	total: 1m 33s	remaining: 6.92s
931:	learn: 0.3433553	test: 0.4687298	best: 0.4629062 (211)	total: 1m 33s	remaining: 6.82s

0.793980652096023

In [None]:
#!pip install catboost

<img src="images/cb.png">

#### End Notes
Ensemble modeling can significantly boost the performance of your model and can sometimes be the deciding factor between first place and second! In this article, we covered various ensemble learning techniques and saw how these techniques are applied in machine learning algorithms.