# Gradient Boosting (GB)
---

1. Gradient boosting approximates the given distribution by building weak learners in a stage-wise fashion.
    1. GB is a generalized additive model of n weak learners. 
        $$ G(x) = g_{1}(x) + \dots + g_{n}(x) $$
        where $G(x)$ is the final gradient boosting model and each $g_{i}(x)$ is a weak learner of the same type.
    1. The weak learner $g(x)$ can be any regression model (one that outputs a real number). The regression tree is the most commonly used weak learner in gradient boosting.
1. Each new weak learner compensates for the errors made by the sum of the previous weak learners by fitting only the residuals.
    1. $g_{1}(x) \dots g_{n}(x)$ are the same kind of weak learner (regression tree), each trained on the same inputs but with different targets (the residuals at that stage).
1. Residuals (negative gradients of the loss) can be seen as the gaps between the current model's predictions and the targets.
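
As a worked case, assume the squared-error loss for a single example (an assumption for illustration; these notes do not fix a particular loss):

$$ L(G(x), y) = \frac{1}{2}\left(y - G(x)\right)^{2} $$

Its negative gradient with respect to the current prediction is

$$ -\frac{\partial L(G(x), y)}{\partial G(x)} = y - G(x) $$

which is exactly the ordinary residual. Under this loss, fitting the next weak learner to the residuals therefore amounts to a gradient descent step on the loss. For other losses the negative gradient plays the same role but generally no longer equals $y - G(x)$.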

## Gradient boosting regression tree (GBRT) algorithm
---

Given a loss function $L(\cdot)$, a training set $X = \{\mathbf{x_{i}}\}$, $\mathbf{y} = \{y_{i}\}$, a learning rate $\alpha$, and a number of iterations $M$, the algorithm to train a GBRT is as follows:
1. Initialize $G(x)$ by fitting a CART on $(X, \mathbf{y})$
1. For $m = 1 \dots M$,
    1. Evaluate the loss over the current $G(x)$
    1. Calculate the negative gradient of the loss w.r.t. the current predictions $G(X)$ to get the **residuals** $\tilde{\mathbf{y}}$:
        $$ \tilde{\mathbf{y}} = -\frac{\partial L(G(X), \mathbf{y})}{\partial G(X)}$$
        Note $\tilde{\mathbf{y}}$ has the same shape as $\mathbf{y}$.
    1. Use $X$ and **residuals** $\tilde{\mathbf{y}}$ as the new training set to train a CART $g(x)$.
    1. Add the new weak learner into the current model:
        $$ G(x) = G(x) + \alpha g(x) $$
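
The training loop above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the loss is assumed to be squared error (so the residuals are simply $\mathbf{y} - G(X)$), the inputs are one-dimensional, and a depth-1 regression tree (a stump) stands in for a full CART. The names `fit_stump`, `fit_gbrt`, and `mse` are hypothetical helpers introduced only for this example.

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree on 1-D inputs by minimizing squared error."""
    distinct = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct x values.
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for t in thresholds:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:
        # All inputs identical: no split is possible, so predict the mean.
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def fit_gbrt(xs, ys, alpha=0.5, n_rounds=20):
    """Train a boosted ensemble of stumps, assuming squared-error loss."""
    # Step 1: initialize G(x) by fitting a weak learner on (X, y).
    learners = [(1.0, fit_stump(xs, ys))]
    preds = [learners[0][1](x) for x in xs]
    for _ in range(n_rounds):
        # Negative gradient of 1/2 * (y - G(x))^2 w.r.t. G(x) is y - G(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        # Fit the next weak learner on (X, residuals).
        g = fit_stump(xs, residuals)
        # G(x) = G(x) + alpha * g(x)
        learners.append((alpha, g))
        preds = [p + alpha * g(x) for p, x in zip(preds, xs)]
    return lambda x: sum(weight * g(x) for weight, g in learners)


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x * x for x in xs]  # toy target: y = x^2

stump_only = fit_gbrt(xs, ys, n_rounds=0)          # initial model only
boosted = fit_gbrt(xs, ys, alpha=0.5, n_rounds=20)  # initial model + 20 rounds
print(mse(stump_only, xs, ys), mse(boosted, xs, ys))
```

On this toy data the boosted model should reach a lower training error than the single initial stump, since each round fits the part of the target the current model still misses. A smaller `alpha` makes each step more conservative and typically needs more rounds to reach the same training error.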

## References
---

1. https://web.njit.edu/~usman/courses/cs675_spring20/BoostedTree.pdf