In [6]:
from IPython.display import HTML
css_file = './custom.css'
HTML(open(css_file, "r").read())

# Gradient Boosted Trees

© 2018 Daniel Voigt Godoy

In [2]:
from intuitiveml.supervised.regression.GradientBoostedTrees import *
from intuitiveml.utils import *

## 1. Definition

From the Scikit-Learn [website](https://scikit-learn.org/stable/modules/ensemble.html):

    The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.
    
    Two families of ensemble methods are usually distinguished:
    
    In averaging methods, the driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any of the single base estimator because its variance is reduced.
    Examples: Bagging methods, Forests of randomized trees, …

    By contrast, in boosting methods, base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble.
    Examples: AdaBoost, Gradient Tree Boosting, …
    
Let's stick with the ***second*** family this time, to which ***Gradient Boosted Trees*** belong.

Once again, the main idea is to train a whole ***bunch of Decision Trees***. This time, however, their predictions are ***not simply averaged***. This is not ***bagging***, it is...

### 1.1 (Gradient) Boosting

The algorithm starts by training ***ONE tree*** on the ***original dataset***, evaluates its predictions, and computes the corresponding ***residuals*** (errors).

Then, the ***NEXT*** tree is ***not*** trained on the ***original responses*** anymore: it keeps the same features, but its targets are the ***residuals from the previous tree***!

So, instead of averaging the predictions of individual trees, it ***adds them up*** to get the final predictions.

Unlike bagging, which is highly parallelizable, ***boosting*** is a ***sequential*** process.

This is ***Gradient Boosting*** in a nutshell!
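The loop described above can be sketched in a few lines. This is only a minimal illustration, not the implementation used by this notebook or by any library: the toy data, the variable names (`n_trees`, `trees`, `errors`), and the choice of starting from the mean of the targets are all assumptions made for the example. Real implementations usually also shrink each tree's contribution with a learning rate, which is left out here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data (not the notebook's dataset): one feature, one smooth target
x = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel()

n_trees = 5  # assumed number of boosting steps
trees = []

# Assumption: start from a constant prediction, the mean of the targets
prediction = np.full_like(y, y.mean())
residuals = y - prediction
errors = [np.mean(residuals ** 2)]

for _ in range(n_trees):
    # Each new tree sees the same features, but its targets are the current residuals
    tree = DecisionTreeRegressor(max_depth=1)
    tree.fit(x, residuals)
    trees.append(tree)

    # The new tree's output is added to (not averaged with) the running prediction
    prediction = prediction + tree.predict(x)
    residuals = y - prediction
    errors.append(np.mean(residuals ** 2))

# Training mean squared error before any tree and after each added tree
print(errors)
```

Each tree depends on the residuals left by the previous ones, which is why the loop cannot be parallelized the way bagging can.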

P.S.: There is actually more to it... [XGBoost](https://github.com/dmlc/xgboost) is one of the most popular and successful implementations and it adds many more improvements, but the underlying idea is still the same.

## 2. Experiment

Time to try it yourself! 

This time, it is a regression problem!

You have 5 data points with values between 1160 and 2000. This is your ***response***.

Each point is associated with a single numerical value between 750 and 950. This is your ***feature***.

For a regression, the initial step is to compute the ***average*** of all points and use it to compute the ***residuals*** to train the first tree. We'll see the reasoning behind this in the ***Linear Regression*** lesson.
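As a small worked example of this initial step, the snippet below computes the average and the resulting residuals for the five response values used in this section's code cell. The names `baseline` and `residuals` are chosen here for illustration only.

```python
import numpy as np

# Response values of the five data points used in this experiment
yreg = np.array([1160., 1200., 1280., 1450., 2000.])

# Initial step: predict the average of all points for every point
baseline = yreg.mean()

# Residuals of that constant prediction: the targets for the first tree
residuals = yreg - baseline

print(baseline)   # 1418.0
print(residuals)  # [-258. -218. -138.   32.  582.]
```

The residuals sum to zero, as they always do when the baseline is the mean.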

The sliders below allow you to train one (shown as zero in the slider) or multiple Decision Trees and choose the maximum depth they are allowed to have.

For each newly trained tree, the plot on the left shows both the ***residuals from the previous step*** and the corresponding ***fitted tree***. The plot on the right ***adds up*** the predictions of all trees up to that one.

Use the sliders to play with different configurations and answer the questions below.

In [3]:
# Single numerical feature, one value per data point
xreg = np.array([750., 800., 850., 900., 950.])
# Response values to be predicted
yreg = np.array([1160., 1200., 1280., 1450., 2000.])
mydtr = plotDecision(x=xreg, y=yreg)

vb = VBox(build_figure_boost(mydtr), layout={'align_items': 'center'})

In [4]:
vb


#### Questions

1. What happens to the ***level of residuals*** as you increase the number of trees (keeping depth = 1)?
2. What happens if you ***increase the depth*** of the trees? Why?
3. Which one is best to use in GBTs, ***shallow*** or ***deep*** trees? Why?

## 3. Scikit-Learn

[Gradient Tree Boosting](https://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting)

Please check the Ensemble Methods notebook from Aurélien Géron's "Hands-On Machine Learning with Scikit-Learn and TensorFlow" [here](http://nbviewer.jupyter.org/github/ageron/handson-ml/blob/master/07_ensemble_learning_and_random_forests.ipynb).
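For a quick feel of the Scikit-Learn API, here is a minimal sketch that fits a `GradientBoostingRegressor` to the five points from the experiment. The hyperparameter values are assumptions picked for illustration: `learning_rate=1.0` makes each tree's prediction count in full, and `max_depth=1` keeps the trees shallow. The interactive widget's internal settings are not shown in this notebook, so its numbers need not match these.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Same five points as in the experiment; Scikit-Learn expects a 2D feature array
xreg = np.array([750., 800., 850., 900., 950.]).reshape(-1, 1)
yreg = np.array([1160., 1200., 1280., 1450., 2000.])

# Assumed settings for illustration: three shallow trees, no shrinkage
gbr = GradientBoostingRegressor(
    n_estimators=3, max_depth=1, learning_rate=1.0, random_state=42
)
gbr.fit(xreg, yreg)

# staged_predict yields the ensemble's predictions after each additional tree
mses = []
for n, staged in enumerate(gbr.staged_predict(xreg), start=1):
    mse = np.mean((yreg - staged) ** 2)
    mses.append(mse)
    print(f"trees={n}  training MSE={mse:.1f}")
```

On the training data the error can only go down as trees are added, which is a reminder that a low training error alone says nothing about generalization.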

## 4. More Resources

[Difference between Bagging and Boosting](https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/)

[Complete Guide to Parameter Tuning in XGBoost](https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/)

[Interpretable Machine Learning with XGBoost](https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27)

[How to explain gradient boosting](https://explained.ai/gradient-boosting/index.html)

[Mastering The New Generation of Gradient Boosting](https://towardsdatascience.com/https-medium-com-talperetz24-mastering-the-new-generation-of-gradient-boosting-db04062a7ea2)

#### This material is copyright Daniel Voigt Godoy and made available under the Creative Commons Attribution (CC-BY) license ([link](https://creativecommons.org/licenses/by/4.0/)). 

#### Code is also made available under the MIT License ([link](https://opensource.org/licenses/MIT)).

In [5]:
from IPython.display import HTML
HTML('''<script>
  function code_toggle() {
    if (code_shown){
      $('div.input').hide('500');
      $('#toggleButton').val('Show Code')
    } else {
      $('div.input').show('500');
      $('#toggleButton').val('Hide Code')
    }
    code_shown = !code_shown
  }

  $( document ).ready(function(){
    code_shown=false;
    $('div.input').hide()
  });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>''')