## Contents

* What is Boosting?
* Boosting algorithm
  * Adaptive Boosting (AdaBoost)
  * Gradient Boosting (GBM)
  * Extreme Gradient Boosting (XGBoost)
* Algorithm main Parameters
* Building Model
* Finetuning Models


### What is Boosting?

* If bagging is the wisdom of crowds, then boosting is the wisdom of crowds with each individual weighted by their expertise

* Boosting generally reduces bias error and builds a strong predictive model out of weak learners

* Boosting is an iterative technique: the weight of each observation is adjusted based on how it was classified in the previous round

* If an observation was classified incorrectly, its weight is increased; if it was classified correctly, its weight is decreased

### Boosting algorithm

![](b1.png)
![](b2.png)
![](b3.png)
![](b4.png)
![](b5.png)
![](b6.png)

**AdaBoost**

Adaptive Boosting: The method illustrated above is the AdaBoost technique. It gives higher weight to misclassified records, so that later learners focus on them.
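
The reweighting step can be sketched in a few lines of plain Python. This is a minimal illustration of one AdaBoost round, assuming labels coded as -1/+1 and weights that sum to 1; the function name `adaboost_round` and the toy data are made up for this example.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost reweighting step for labels in {-1, +1} (illustrative helper)."""
    # Weighted error of the current weak learner (weights are assumed to sum to 1).
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log below finite
    # The learner's say in the final vote: larger when its error is smaller.
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is -1 for a misclassified record, so its weight is scaled up.
    new_w = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]  # the second record is misclassified
weights = [1 / len(y_true)] * len(y_true)
alpha, weights = adaboost_round(weights, y_true, y_pred)
print(alpha, weights)
```

After this round the misclassified record carries more weight than any correctly classified one, so the next weak learner is pushed to get it right.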


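**Gradient Boosting**

* Similar to the AdaBoost algorithm: learners are added one at a time, each correcting the errors of the ones before it
* The approach is the same, but the way errors are passed to the next learner is modified
* Instead of re-weighting observations by misclassification, each new learner is fitted to the gradient of the loss function of the current model (for squared error, the residuals)
* It gives better results on some classes of problems, such as regression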
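
As a rough sketch of the idea, the snippet below boosts one-split regression stumps on a single feature with squared-error loss, where the negative gradient is simply the residual. The helper names (`fit_stump`, `gradient_boost`), the toy data, and the learning rate are assumptions chosen for illustration, not how any particular library implements GBM.

```python
def fit_stump(x, residuals):
    """Best single-threshold split on a 1-D feature, judged by squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # drop the max so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def gradient_boost(x, y, n_rounds=20, learning_rate=0.3):
    base = sum(y) / len(y)  # start from the mean prediction
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * stump(xi) for xi, pi in zip(x, preds)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = gradient_boost(x, y)

mean_y = sum(y) / len(y)
mse_base = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse_model = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
print(mse_base, mse_model)
```

Each round fits a small learner to what the current model still gets wrong, which is why the training error of the boosted model falls below that of the constant baseline.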

*Gradient Descent:*
Gradient descent minimizes a loss function by iteratively adjusting the coefficients in the direction that reduces the error.
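
A minimal sketch of gradient descent, assuming a straight-line model `y ≈ w * x + b` and mean squared error as the loss. The step size, step count, and data are illustrative choices.

```python
def gradient_descent(x, y, lr=0.05, n_steps=2000):
    """Fit y ≈ w * x + b by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_steps):
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * xi for e, xi in zip(errors, x))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(x, y)
print(w, b)
```

On this noise-free data the coefficients converge towards the slope 2 and intercept 1 used to generate it.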

**XGBoost**

* Follows the principle of gradient boosting
* Uses a more regularized model formulation to control overfitting, which often gives better performance
* Uses improved convergence techniques and vector/matrix data structures for faster results
* Has better support for multicore processing, which reduces overall training time
* For a given dataset, you are less likely to hit a memory error with XGBoost than with GBM
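
For reference, the regularized formulation mentioned above is usually written as a training loss plus a complexity penalty on each tree. The notation here follows the XGBoost documentation rather than these notes: $l$ is the loss, $f_k$ the $k$-th tree, $T$ its number of leaves, and $w_j$ its leaf weights. $\gamma$ corresponds to the `gamma` parameter described in the next section, while $\lambda$ is the L2 penalty on leaf weights, which these notes do not cover.

$$
\mathrm{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2
$$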

## Algorithm main Parameters

XGBoost parameters can be divided into three categories:

* General Parameters: Control the booster type used in the model, which drives its overall functioning
* Booster Parameters: Control the performance of the selected booster
* Learning Task Parameters: Set the learning objective and how the booster's learning is evaluated on the given data
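
As a rough illustration, the three groups can be laid out as a single settings dictionary. The parameter names are the ones described in this section; the values are arbitrary placeholders, and the variable names `grouped` and `params` are made up for this sketch.

```python
# Placeholder values only; tune them for your own data.
grouped = {
    "general": {"booster": "gbtree", "nthread": 4},
    "booster": {
        "max_depth": 3,
        "gamma": 0,
        "min_child_weight": 1,
        "subsample": 0.8,
        "colsample_bytree": 0.8,
    },
    "learning_task": {"objective": "binary:logistic", "eval_metric": "logloss"},
}

# Flatten the groups into one dictionary of settings.
# n_estimators is left out on purpose: in the native XGBoost API the number
# of boosting rounds is passed separately rather than inside this dictionary.
params = {name: value for group in grouped.values() for name, value in group.items()}
print(params)
```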

**General Parameters**
* booster [default=gbtree]
    * Sets the booster type (gbtree, gblinear or dart). For classification problems, you can use gbtree or dart.
* nthread [default=maximum cores available]
    * Sets the number of threads used for parallel computation. People generally don't change it, as using all available cores usually gives the fastest computation.


**Booster Parameters**

* n_estimators [default=100]
    * It controls the maximum number of boosting iterations. For classification, it is similar to the number of trees to grow.
    * Should be tuned using CV
* gamma [default=0][range: [0,Inf)]
    * It controls regularization (helps prevent overfitting): it is the minimum loss reduction required to make a further split on a leaf node.
    * The optimal value of gamma depends on the data set and the other parameter values.
    * The higher the value, the stronger the regularization: splits that do not improve the loss by at least gamma are not made. The default of 0 means no such penalty.
    * Tuning trick: Start with 0 and check the CV error rate. If you see train error <<< test error (a sign of overfitting), bring gamma into action.
    * The higher the gamma, the smaller the gap between train and test CV error. If you have no clue what value to use, try gamma=5 and check the performance.
    * Remember that gamma brings improvement when you want to use shallow (low max_depth) trees.
* max_depth [default=3 in older versions of the scikit-learn wrapper; 6 in the native API][range: (0,Inf)]
    * It controls the depth of each tree.
    * The larger the depth, the more complex the model and the higher the chance of overfitting. There is no standard value for max_depth.
    * Larger data sets may need deeper trees to learn the rules from the data.
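
To make the role of gamma concrete, here is a small sketch of the split gain used by XGBoost-style trees, where a split is only worth making if the gain stays positive after gamma is subtracted. The inputs are sums of first-order gradients (`g`) and second-order gradients (`h`) on each side of a candidate split. The function name `split_gain` and the numbers are illustrative, and `reg_lambda` (the L2 penalty on leaf weights) is an extra parameter not covered in the notes above.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a leaf, minus the gamma penalty."""
    def score(g, h):
        # Quality of a leaf given its gradient and hessian sums.
        return g * g / (h + reg_lambda)

    gain = 0.5 * (score(g_left, h_left)
                  + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))
    return gain - gamma

raw_gain = split_gain(-4.0, 3.0, 5.0, 4.0)                  # gamma = 0
penalized_gain = split_gain(-4.0, 3.0, 5.0, 4.0, gamma=5.0)
print(raw_gain, penalized_gain)
```

With gamma=0 this split has a positive gain and would be kept. With gamma=5 the same split comes out negative and would be rejected, which is how a larger gamma leads to simpler trees.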

* min_child_weight [default=1][range: [0,Inf)]
    * In classification, if a split would produce a leaf node whose sum of instance weights (calculated from the second-order partial derivatives of the loss) is lower than min_child_weight, the tree stops splitting there.
    * In simple words, it blocks potential feature interactions to prevent overfitting. Should be tuned using CV.
* subsample [default=1][range: (0,1]]
    * It controls the fraction of samples (observations) supplied to each tree.
    * Typically, its values lie in the range 0.5-0.8
* colsample_bytree [default=1][range: (0,1]]
    * It controls the fraction of features (variables) supplied to each tree.
    * Typically, its values lie in the range 0.5-0.9
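
The effect of `subsample` and `colsample_bytree` can be sketched as drawing a random subset of row and column indices before each tree is grown. This is an illustration of the idea only, not XGBoost's internal sampling code; the helper `sample_for_tree` is made up for this example.

```python
import random

def sample_for_tree(n_rows, n_cols, subsample=0.8, colsample_bytree=0.5, seed=0):
    """Pick the row and column indices that one tree gets to see."""
    rng = random.Random(seed)
    row_count = max(1, int(subsample * n_rows))
    col_count = max(1, int(colsample_bytree * n_cols))
    # Sampling without replacement, so no row or column is repeated.
    rows = sorted(rng.sample(range(n_rows), row_count))
    cols = sorted(rng.sample(range(n_cols), col_count))
    return rows, cols

rows, cols = sample_for_tree(n_rows=10, n_cols=6)
print(rows, cols)
```

Because each tree sees a different slice of the data, the trees are less correlated with one another, which tends to reduce overfitting.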

**Learning Task Parameters**

* objective [default=reg:linear]
    * reg:linear - for linear regression (renamed reg:squarederror in newer XGBoost releases)
    * binary:logistic - logistic regression for binary classification. It returns class probabilities.
    * multi:softmax - multiclass classification using the softmax objective. It returns predicted class labels. It requires setting the num_class parameter to the number of unique prediction classes.
    * multi:softprob - multiclass classification using the softmax objective. It returns predicted class probabilities.
* eval_metric [no default, depends on objective selected]
    * These metrics are used to evaluate a model's accuracy on validation data. For classification, the default metric is error (newer XGBoost releases default to logloss instead).
    * Available metrics include:
        * mae - Mean Absolute Error (used in regression)
        * logloss - Negative log-likelihood (used in classification)
        * auc - Area under the ROC curve (used in classification)
        * error - Binary classification error rate [#wrong cases/#all cases]
        * mlogloss - Multiclass logloss (used in classification)
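
To show what some of these metrics compute, here is a plain-Python sketch of `error`, `logloss` and `mae`. It assumes binary labels coded 0/1 and predicted probabilities such as those returned by `binary:logistic`; the function names and toy values are made up for illustration.

```python
import math

def error_rate(y_true, probs, threshold=0.5):
    """Share of wrong predictions: #wrong cases / #all cases."""
    preds = [1 if p > threshold else 0 for p in probs]
    return sum(p != t for p, t in zip(preds, y_true)) / len(y_true)

def logloss(y_true, probs, eps=1e-15):
    """Negative log-likelihood, averaged over all observations."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
err = error_rate(y_true, probs)
ll = logloss(y_true, probs)
abs_err = mae(y_true, probs)
print(err, ll, abs_err)
```

Note that `error` only looks at which side of the threshold a prediction falls on, while `logloss` also penalizes confident wrong probabilities, so the two can rank models differently.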
