
# <font color='blue'> Gradient Boosting</font>

Gradient Boosting or Gradient Boosted Regression Trees (GBRT) is a non-parametric statistical learning technique used for classification and regression problems. 

Gradient Boosting = Gradient Descent + Boosting.
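
One common way to write this down is sketched here. The symbols are notation assumed for illustration: $F_m$ is the ensemble after $m$ estimators, $h_m$ is the $m$-th base estimator, $J$ is the loss, $\alpha$ is the learning rate, and $y$ is the true target.

$$F_m(x) = F_{m-1}(x) + \alpha \, h_m(x), \qquad h_m(x) \approx -\left.\frac{\partial J(y, F(x))}{\partial F(x)}\right|_{F = F_{m-1}}$$

For the squared-error loss $J = \tfrac{1}{2}\,(y - F(x))^2$, the negative gradient is $y - F_{m-1}(x)$, which is exactly the residual of the current ensemble. This is why fitting each new estimator to the residual amounts to a gradient-descent step.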

Three steps are performed in the construction of the model:

1. Fit a regressor
2. Compute the residual error
3. Learn to predict the residual
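
The three steps above can be sketched by hand with two shallow trees. This is only an illustration: the noisy sine data, the tree depth, and the variable names are assumptions chosen for the example, not part of the original notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy data (assumed for illustration): a noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, size = (200, 1)), axis = 0)
y = np.sin(X).ravel() + rng.normal(scale = 0.1, size = 200)

# 1. fit a first regressor
tree_1 = DecisionTreeRegressor(max_depth = 2, random_state = 0).fit(X, y)

# 2. compute the residual error of the first regressor
residual = y - tree_1.predict(X)

# 3. fit a second regressor that learns to predict the residual
tree_2 = DecisionTreeRegressor(max_depth = 2, random_state = 0).fit(X, residual)

# training error with one tree vs. the sum of both trees
mse_one = np.mean((y - tree_1.predict(X)) ** 2)
mse_two = np.mean((y - (tree_1.predict(X) + tree_2.predict(X))) ** 2)
print(mse_one, mse_two)
```

Adding the second tree's prediction to the first cannot increase the training error, because each leaf of the second tree predicts the mean residual of the samples that fall into it.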

In [1]:
from IPython.display import Image
Image(url = 'GB1.png')

In [2]:
from IPython.display import Image
Image(url = 'GB2.png')

## How Gradient Boosting works:

1. Gradient Boosting makes a set of predictions (y)
2. It calculates the error of the predictions (j)
3. It tries to adjust y to reduce the error (via alpha)
4. For each base estimator, the gradient of the loss function is estimated
5. Subsequent estimators estimate the residual error of the previous estimators
6. Gradient descent is applied to reduce j
7. The results of the estimators are summed, with each step weighted according to the value of alpha
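
The loop described above can be sketched in a few lines for the squared-error case. This is a simplified illustration, not scikit-learn's implementation: the helper names `fit_gradient_boosting` and `predict_gradient_boosting` are hypothetical, `pred` plays the role of the predictions, and `alpha` is the step size applied to every estimator.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_estimators = 50, alpha = 0.1, max_depth = 1):
    """Hand-rolled boosting for squared-error loss (illustrative only)."""
    init = float(np.mean(y))              # constant initial prediction
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        # for squared error, the negative gradient of the loss is the residual
        neg_gradient = y - pred
        tree = DecisionTreeRegressor(max_depth = max_depth, random_state = 0)
        tree.fit(X, neg_gradient)
        # gradient-descent step: move the predictions by alpha times the new estimator
        pred = pred + alpha * tree.predict(X)
        trees.append(tree)
    return init, trees

def predict_gradient_boosting(init, trees, X, alpha = 0.1):
    # sum the estimators, each weighted by alpha, on top of the initial prediction
    return init + alpha * sum(tree.predict(X) for tree in trees)

X, y = make_friedman1(n_samples = 300, random_state = 0, noise = 1.0)
init, trees = fit_gradient_boosting(X, y)

baseline_mse = np.mean((y - init) ** 2)
boosted_mse = np.mean((y - predict_gradient_boosting(init, trees, X)) ** 2)
print(baseline_mse, boosted_mse)
```

The same `alpha` must be passed to both helpers. A smaller value takes more cautious steps and usually needs more estimators to reach the same training error.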

http://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting

## Gradient Boosting Classifier

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html

In [3]:
# importing modules
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

# generating the dataset and splitting it into training and test sets
X, y = make_hastie_10_2(random_state = 0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

# creating classifier
clf = GradientBoostingClassifier(n_estimators = 100, learning_rate = 1.0, random_state = 0)

# training classifier
clf.fit(X_train, y_train)

# calculating accuracy
clf.score(X_test, y_test)                 

0.9117
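
To see how the accuracy develops as estimators are added, the `staged_predict` method of the fitted classifier can be used. The block below is a sketch that refits the same model as in the cell above so that it runs on its own; the name `staged_accuracy` and the choice of stages to print are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# same data and settings as in the cell above
X, y = make_hastie_10_2(random_state = 0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

clf = GradientBoostingClassifier(n_estimators = 100, learning_rate = 1.0, random_state = 0)
clf.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n_estimators stages
staged_accuracy = [accuracy_score(y_test, y_pred) for y_pred in clf.staged_predict(X_test)]

# test accuracy after a few selected numbers of stages
for n_stages in (1, 10, 50, 100):
    print(n_stages, staged_accuracy[n_stages - 1])
```

The last entry of `staged_accuracy` corresponds to the full ensemble, so it matches the value returned by `clf.score(X_test, y_test)`.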

## Gradient Boosting Regressor

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

In [4]:
# importing modules
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

# generating the dataset and splitting it into training and test sets
X, y = make_friedman1(n_samples = 1200, random_state = 0, noise = 1.0)
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# creating regressor (squared error is the default loss; the old loss = 'ls' alias was removed in scikit-learn 1.2)
est = GradientBoostingRegressor(n_estimators = 100, learning_rate = 0.1, max_depth = 1, random_state = 0)

# fitting regressor
est.fit(X_train, y_train)

# calculating the mean squared error on the test set
mean_squared_error(y_test, est.predict(X_test))

5.009154859960319