**Gradient Boosting** 

The **principle** behind boosting algorithms is that **first we build a model on the training dataset, then a second model is built to rectify the errors present in the first model.** 

This procedure is repeated, with each new model correcting the errors of the models before it, until the **errors are sufficiently small** or a fixed number of models has been added.

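In particular, we start with a weak model, and each subsequent model is fit on a modified version of the original dataset: the features stay the same, but the target is replaced by the errors (residuals) of the predictions so far.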

**Notes :**
* A weak learner is a model that performs at least slightly better than a random model.

* Decision trees are generally used as weak learners in gradient boosting.

* Unlike AdaBoost, where decision trees with only one level (decision stumps) are typically used, the decision trees in gradient boosting generally consist of some 3-7 levels.

#### **Steps :**

The steps involved are as follows:

1. Make a first guess for y_train and y_test, using the average of y_train.

\begin{align}
y_{train_{p0}} = \frac{1}{n}\sum_{i=1}^n y_{train_i}
\end{align}

\begin{align}
y_{test_{p0}} = y_{train_{p0}}
\end{align}

2. Calculate the residuals from the training dataset. 

\begin{align}
r_0 = y_{train} - y_{train_{p0}}
\end{align}

3. Fit a weak learner to the residuals by minimizing the loss function. Let's call it *f0*. The learner only approximates the residuals; it does not reproduce them exactly.

\begin{align}
f_0(X_{train}) \approx r_0
\end{align}

4. Increment the predicted y's. 

\begin{align}
y_{train_{p1}} = y_{train_{p0}} + \alpha f_0(X_{train})
\end{align}

\begin{align}
y_{test_{p1}} = y_{test_{p0}} + \alpha f_0(X_{test})
\end{align}

where α is the learning rate.

5. Repeat steps 2 through 4, each time computing the residuals from the latest predictions, until you reach the chosen number of boosting rounds.
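To see where the word *gradient* comes from, it may help to assume a squared-error loss. This is an assumption made here for illustration, since the steps above do not name a specific loss, and the index *i* for the i-th training example is also introduced here. With that choice, the loss on the training set is

\begin{align}
L = \frac{1}{2}\sum_{i=1}^n \left(y_{train_i} - y_{train_{p0,i}}\right)^2
\end{align}

and the residuals from step 2 are exactly its negative gradient with respect to the current predictions:

\begin{align}
-\frac{\partial L}{\partial y_{train_{p0,i}}} = y_{train_i} - y_{train_{p0,i}} = r_{0,i}
\end{align}

Under this assumption, step 4 can be read as a gradient descent step of size α on the predictions, with the weak learner standing in for the negative gradient.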

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
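def GradBoost(model, X_train, y_train, X_test, y_test, 
              boosting_rounds, learning_rate: float = 0.1):
    """Fit `model` repeatedly to the training residuals and return the
    boosted predictions for the training and test sets.

    `model` is any regressor with `fit` and `predict` methods.
    `y_test` is accepted but not used.
    """
    # make initial guess on y_hat_train and y_hat_test
    # the guess is the mean of y_train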
    y_hat_train = np.repeat(np.mean(y_train), len(y_train))

    y_hat_test = np.repeat(np.mean(y_train), len(y_test))

    # calculate residual from training data
    # for initial guess

    residuals = y_train - y_hat_train

    # iterate over boosting rounds
    # refit the same model each round with the current residuals as target
    for i in range(boosting_rounds):
        model.fit(X_train, residuals)

        # increment the predicted training y by
        # learning_rate * the model's prediction
        y_hat_train += (learning_rate * model.predict(X_train))

        # perform the same update for the test set
        y_hat_test += (learning_rate * model.predict(X_test))

        # calculate residual for next round
        residuals = y_train - y_hat_train
    
    return y_hat_train, y_hat_test
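
The function above refits a single `model` object each round, so the ensemble cannot be reused on data that was not passed in as `X_test`. One possible variation, sketched below, keeps a copy of every fitted learner so that predictions can be made later. The names `Stump` and `SimpleGradBoost` are hypothetical and not part of the original notebook. `Stump` is a hand-rolled one-split regressor, used only to keep the example free of extra dependencies; any regressor with `fit` and `predict` (for instance scikit-learn's `DecisionTreeRegressor`) should be usable in its place.

```python
import copy
import numpy as np


class Stump:
    """Hypothetical weak learner: a regression tree with a single split."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        best_sse = np.inf
        # Fallback if no valid split exists: predict the mean everywhere.
        self.feature, self.threshold = 0, np.inf
        self.left = self.right = y.mean()
        for j in range(X.shape[1]):
            # Every unique value except the largest is a candidate threshold,
            # so both sides of the split are always non-empty.
            for t in np.unique(X[:, j])[:-1]:
                mask = X[:, j] <= t
                left, right = y[mask].mean(), y[~mask].mean()
                sse = ((y[mask] - left) ** 2).sum() + ((y[~mask] - right) ** 2).sum()
                if sse < best_sse:
                    best_sse = sse
                    self.feature, self.threshold = j, t
                    self.left, self.right = left, right
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.where(X[:, self.feature] <= self.threshold, self.left, self.right)


class SimpleGradBoost:
    """Hypothetical wrapper that stores one fitted learner per boosting round."""

    def __init__(self, base_model, boosting_rounds=50, learning_rate=0.1):
        self.base_model = base_model
        self.boosting_rounds = boosting_rounds
        self.learning_rate = learning_rate

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.init_ = y.mean()                      # step 1: initial guess
        self.learners_ = []
        y_hat = np.full(len(y), self.init_)
        for _ in range(self.boosting_rounds):
            residuals = y - y_hat                  # step 2: residuals
            learner = copy.deepcopy(self.base_model).fit(X, residuals)  # step 3
            y_hat += self.learning_rate * learner.predict(X)            # step 4
            self.learners_.append(learner)
        return self

    def predict(self, X):
        y_hat = np.full(len(X), self.init_)
        for learner in self.learners_:
            y_hat += self.learning_rate * learner.predict(X)
        return y_hat


# Synthetic data, made up for this example only: a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

booster = SimpleGradBoost(Stump(), boosting_rounds=100, learning_rate=0.1)
booster.fit(X_train, y_train)

baseline_mse = np.mean((y_test - y_train.mean()) ** 2)   # mean-only prediction
boosted_mse = np.mean((y_test - booster.predict(X_test)) ** 2)
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.4f}")
```

Storing a deep copy per round costs memory proportional to the number of rounds, which is the price of being able to call `predict` on new data after training.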