# Gradient Boosting Regression

## Hypothesis function

$$
h(\mathbf{x}^{(i)}) = h_0(\mathbf{x}^{(i)}) + \alpha_1h_1(\mathbf{x}^{(i)}) + \cdots + \alpha_s h_s(\mathbf{x}^{(i)})
$$

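1. No $\alpha$ is applied to the first predictor $h_0$. Learning is sequential: $h_0$ is the initial guess, and each later predictor adds a scaled correction to what came before.
2. In addition, all $\alpha_j$ share the same value, $\alpha_1 = \cdots = \alpha_s = \alpha$. Here $\alpha$ acts like the learning rate in gradient-descent regression: it scales how much each new predictor contributes.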
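
Because every $\alpha_j$ takes the same value, the hypothesis can be written more compactly with a single learning rate $\alpha$ factored out of the sum. This is only a restatement of the formula above under that assumption:

$$
h(\mathbf{x}^{(i)}) = h_0(\mathbf{x}^{(i)}) + \alpha \sum_{j=1}^{s} h_j(\mathbf{x}^{(i)})
$$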

## Main idea: fitting the residuals

$$h_0(\mathbf{x}^{(i)}) + \text{residual}_0 = y^{(i)} $$
$$ \text{residual}_0 =  y^{(i)} - h_0(\mathbf{x}^{(i)}) $$
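
One way to see why fitting the residual counts as a gradient step, assuming the squared-error loss (the loss is not named explicitly here, but the `grad` function in the scratch code returns exactly this quantity): the residual is the negative gradient of the loss with respect to the current prediction.

$$
L\big(y^{(i)}, h(\mathbf{x}^{(i)})\big) = \frac{1}{2}\big(y^{(i)} - h(\mathbf{x}^{(i)})\big)^2
\quad\Longrightarrow\quad
-\frac{\partial L}{\partial h(\mathbf{x}^{(i)})} = y^{(i)} - h(\mathbf{x}^{(i)})
$$

Under the same assumption, and with a single shared learning rate $\alpha$, the residual that the next predictor $h_{m+1}$ is trained on after $m$ boosting rounds would be:

$$
\text{residual}_m = y^{(i)} - \Big(h_0(\mathbf{x}^{(i)}) + \alpha \sum_{j=1}^{m} h_j(\mathbf{x}^{(i)})\Big)
$$

For $m = 0$ the sum is empty and this reduces to $\text{residual}_0$ above.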

## Summary of steps

1. Initialize the model with a constant prediction, such as the mean of the targets
2. Predict and calculate the residual
3. Let the next model fit the residual
4. Predict using the combined models and calculate the residual
5. Let the next model fit this residual
6. Repeat steps 4-5 until a stopping criterion is reached
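
The steps above can be traced on a toy problem. The sketch below is illustrative only: the data, the learning rate of 0.5, and the 20 rounds are made-up values, and depth-1 trees are assumed as the weak learners. It keeps a running prediction, fits each new stump to the current residual, and records the training MSE after every round.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data, made up for illustration
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 5.0, 5.1, 4.9])

learning_rate = 0.5  # assumed value for this toy example
n_rounds = 20        # fixed number of rounds stands in for a stopping criterion

# Step 1: initialize with a constant prediction (the mean of y)
prediction = np.full_like(y, y.mean())
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(n_rounds):
    # Steps 2 and 4: residual of the combined model so far
    residual = y - prediction
    # Steps 3 and 5: the next weak learner fits that residual
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    # Add the scaled correction to the running prediction
    prediction = prediction + learning_rate * stump.predict(X)
    mse_history.append(np.mean((y - prediction) ** 2))

print("training MSE before boosting:", mse_history[0])
print("training MSE after boosting: ", mse_history[-1])
```

With squared error and a learning rate between 0 and 2, each round can only lower (or leave unchanged) the training MSE, which is why `mse_history` should never increase here.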

## 1. Scratch

In [1]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.dummy import DummyRegressor

In [2]:
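def grad(y, h):
    # Negative gradient of the squared-error loss 0.5 * (y - h)**2 with respect to h,
    # which is simply the residual
    return y - h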

In [3]:
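def predict(X, models, learning_rate=0.1):
    # models[0] is the initial constant model; its prediction is not scaled
    f0 = models[0].predict(X)
    # Every later model predicts a residual correction, scaled by the learning rate
    boosting = sum(learning_rate * model.predict(X) for model in models[1:])
    return f0 + boosting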

In [4]:
def fit(X, y, models):
    models_trained = []

    # Start from a constant model: DummyRegressor(strategy='mean') always
    # predicts the mean of the training targets
    first_model = DummyRegressor(strategy='mean')
    first_model.fit(X, y)
    models_trained.append(first_model)

    # Fit each estimator to the residual left by the models trained so far
    for model in models:
        y_pred = predict(X, models_trained)
        residual = grad(y, y_pred)
        model.fit(X, residual)
        models_trained.append(model)
    return models_trained
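
The `fit` above stops only when the supplied list of models is exhausted. As one possible stopping criterion (step 6 in the summary), the sketch below watches the MSE on a held-out validation set and stops once it has not improved for `patience` rounds. The function name `fit_with_early_stopping`, the `patience` and `max_estimators` parameters, and the validation split are all assumptions made for illustration; they are not part of the scratch code.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def fit_with_early_stopping(X, y, X_val, y_val, max_estimators=200,
                            learning_rate=0.1, patience=10):
    """Hypothetical variant of fit(): stop when validation MSE stalls."""
    # Constant starting model, as in the scratch fit()
    first_model = DummyRegressor(strategy='mean').fit(X, y)
    models_trained = [first_model]

    # Running predictions, so earlier models are not re-evaluated each round
    pred = first_model.predict(X)
    pred_val = first_model.predict(X_val)
    best_mse = np.mean((y_val - pred_val) ** 2)
    best_size = 1  # number of models in the best ensemble seen so far

    for _ in range(max_estimators):
        # Fit the next stump to the current training residual
        tree = DecisionTreeRegressor(max_depth=1).fit(X, y - pred)
        models_trained.append(tree)
        pred = pred + learning_rate * tree.predict(X)
        pred_val = pred_val + learning_rate * tree.predict(X_val)

        mse = np.mean((y_val - pred_val) ** 2)
        if mse < best_mse:
            best_mse, best_size = mse, len(models_trained)
        elif len(models_trained) - best_size >= patience:
            break  # no improvement for `patience` rounds

    # Keep only the models up to the best validation score
    return models_trained[:best_size], best_mse


X, y = load_diabetes(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=42)
models_es, val_mse = fit_with_early_stopping(X_tr, y_tr, X_val, y_val)
print("models kept:", len(models_es), "validation MSE:", val_mse)
```

The returned list has the same layout as the output of `fit` (constant model first, then the trees), so it should work with `predict` as long as the same learning rate is used. In practice the validation set used for stopping should be separate from the final test set.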

## Let's use our scratch code!

In [5]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [6]:
n_estimators = 200
tree_params = {'max_depth': 1}
models = [DecisionTreeRegressor(**tree_params) for _ in range(n_estimators)]

In [7]:
#fit the models
models = fit(X_train, y_train, models)

In [8]:
#predict
y_pred = predict(X_test, models)


In [9]:
#print metrics
print("Our MSE: ", mean_squared_error(y_test, y_pred))

Our MSE:  2714.1889891700657
