In [1]:
from sklearn.datasets import load_boston

In [2]:
train_inputs, train_outputs = load_boston(return_X_y = True)

Loaded the Boston housing data, which consists of 506 samples. Each input is a vector of thirteen home features (see https://scikit-learn.org/stable/datasets/index.html#boston-dataset for details), and the output is the median home value (in thousands of dollars).

In [3]:
train_inputs.shape

(506, 13)

In [4]:
train_outputs.shape

(506,)

In [5]:
from sklearn.preprocessing import StandardScaler

In [6]:
scaler = StandardScaler()

In [7]:
train_inputs = scaler.fit_transform(train_inputs)

In [8]:
train_inputs.mean(axis = 0)

array([-8.78743718e-17, -6.34319123e-16, -2.68291099e-15,  4.70199198e-16,
        2.49032240e-15, -1.14523016e-14, -1.40785495e-15,  9.21090169e-16,
        5.44140929e-16, -8.86861950e-16, -9.20563581e-15,  8.16310129e-15,
       -3.37016317e-16])

In [9]:
train_inputs.std(axis = 0)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

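Centered and scaled each of the input features using a sklearn.preprocessing.StandardScaler object. As the outputs above confirm, every feature now has mean zero (up to floating-point error) and unit standard deviation.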
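To make the transformation concrete, the following is a minimal sketch of what StandardScaler computes. It runs on a small synthetic matrix rather than the Boston data; the names `X`, `scaled` and `manual` are illustrative assumptions, not part of the notebook above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix (assumption: not the Boston data).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 4))

scaled = StandardScaler().fit_transform(X)

# StandardScaler subtracts the per-column mean and divides by the
# per-column population standard deviation (ddof=0, NumPy's default).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))
```

The two results should agree up to floating-point error, which is also why the means printed by the notebook are tiny nonzero numbers rather than exact zeros.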

In [10]:
from sklearn.model_selection import train_test_split

In [11]:
train_inputs, test_inputs, train_outputs, test_outputs = train_test_split(
    train_inputs, train_outputs, test_size = .33)

In [12]:
train_inputs.shape

(339, 13)

In [13]:
test_inputs.shape

(167, 13)

Split the data into 339 training samples and 167 testing samples using the train_test_split helper function from sklearn.model_selection.
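One possible variation on these steps is to split first and fit the scaler on the training samples only, so that the test samples do not contribute to the estimated means and standard deviations. The sketch below assumes synthetic data from make_regression with the same shape as the Boston data, and the fixed `random_state` values are an added assumption to make the split repeatable.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data with the same shape as the Boston set (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# Split before scaling; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# Estimate the mean and standard deviation from the training samples only,
# then apply the same transformation to both subsets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

With this ordering the training features are exactly standardized, while the test features are only approximately so, because they are transformed with the training statistics.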

In [14]:
from sklearn.ensemble import GradientBoostingRegressor

In [15]:
model = GradientBoostingRegressor()

In [16]:
model.get_params()

{'alpha': 0.9,
 'ccp_alpha': 0.0,
 'criterion': 'friedman_mse',
 'init': None,
 'learning_rate': 0.1,
 'loss': 'ls',
 'max_depth': 3,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_iter_no_change': None,
 'presort': 'deprecated',
 'random_state': None,
 'subsample': 1.0,
 'tol': 0.0001,
 'validation_fraction': 0.1,
 'verbose': 0,
 'warm_start': False}

In [17]:
model.fit(train_inputs,train_outputs)

GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',
                          init=None, learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=100,
                          n_iter_no_change=None, presort='deprecated',
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)

In [18]:
model.score(test_inputs, test_outputs)

0.9035869433464284

GradientBoostingRegressor builds an additive model in a greedy, forward stage-wise fashion.  At each stage, a regression tree is fit to the negative functional gradient of a differentiable loss function, evaluated at the current hypothesis function.
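The stage-wise procedure can be sketched by hand for the squared loss, where the negative gradient is simply the residual. This is a simplified illustration under stated assumptions (synthetic data, hypothetical names `learning_rate`, `n_stages`, `prediction`), not the actual scikit-learn implementation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem (assumption: stands in for real data).
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

learning_rate = 0.1
n_stages = 50

# Stage 0: a constant model; the mean minimizes the squared loss.
prediction = np.full(y.shape, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(n_stages):
    # For L(y, F) = (y - F)^2 / 2, the negative gradient with respect
    # to F is the residual y - F.
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residual)
    # Add a shrunken copy of the new tree to the current model.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print(initial_mse, final_mse)
```

Each pass fits a shallow tree to what the current model still gets wrong, so the training error should fall as stages are added.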

The loss function is set by the 'loss' parameter, with the (non-robust) default loss = 'ls' giving the standard squared (L^2) error.  The 'n_estimators' parameter controls the number of estimators in the sum, while the 'learning_rate' parameter serves as a regularization (shrinkage) hyperparameter.  The 'alpha' parameter is unrelated to shrinkage: it is the quantile used by the 'huber' and 'quantile' losses.
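As a rough illustration of how the number of stages affects the fit, the sketch below tracks the test error after every boosting stage with staged_predict, and also fits a model with the robust 'huber' loss. The data are synthetic and the hyperparameter values are arbitrary assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption: stands in for the housing data).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# learning_rate scales the contribution of each tree; smaller values
# usually need more estimators to reach the same training fit.
model = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, random_state=0)
model.fit(X_train, y_train)

# staged_predict yields the prediction after each boosting stage.
test_mse = [mean_squared_error(y_test, pred)
            for pred in model.staged_predict(X_test)]
best_stage = int(np.argmin(test_mse)) + 1
print(best_stage, test_mse[best_stage - 1])

# A robust alternative: for loss='huber', alpha is the quantile at which
# the loss switches from squared to absolute error.
robust = GradientBoostingRegressor(loss="huber", alpha=0.9, random_state=0)
robust.fit(X_train, y_train)
print(robust.score(X_test, y_test))
```

The stage with the lowest test error gives a sense of how many estimators are useful for a given learning rate.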

Trees are built in a greedy fashion according to the 'criterion' parameter, with tree size controlled by the 'max_depth' parameter.  Other parameters governing the tree structure include 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'min_impurity_split', 'min_samples_leaf', 'min_samples_split', and 'min_weight_fraction_leaf', all of which also appear as parameters of the sklearn.tree.DecisionTreeRegressor class.
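The connection to DecisionTreeRegressor can be checked by inspecting the fitted ensemble. The sketch below uses synthetic data and arbitrary parameter values (both assumptions) and looks at the first tree stored in the estimators_ attribute.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption: stands in for the housing data).
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=20, max_depth=2, min_samples_leaf=5, random_state=0)
model.fit(X, y)

# estimators_ holds one regression tree per boosting stage.
print(model.estimators_.shape)

# Each entry is an ordinary DecisionTreeRegressor that respects the
# tree-structure parameters passed to the ensemble.
first_tree = model.estimators_[0, 0]
print(type(first_tree).__name__, first_tree.get_depth(), first_tree.get_n_leaves())
```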

Early stopping can also be configured via the 'n_iter_no_change', 'validation_fraction', and 'tol' parameters.  It is disabled by default (n_iter_no_change = None).
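A minimal sketch of early stopping follows, assuming synthetic data and an arbitrary patience of ten iterations. When n_iter_no_change is set, a fraction of the training data is held out for validation, and fitting stops once the validation score has not improved by at least tol for that many consecutive iterations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (assumption: stands in for the housing data).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=1000,        # upper bound on the number of stages
    n_iter_no_change=10,      # patience, in boosting iterations
    validation_fraction=0.1,  # share of the training data held out
    tol=1e-4,                 # minimum improvement that counts
    random_state=0,
)
model.fit(X, y)

# n_estimators_ reports how many stages were actually fit.
print(model.n_estimators_)
```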

Note that the 'subsample' parameter allows for stochastic gradient boosting, in which each tree is fit on a random fraction of the training samples.  The default, subsample = 1.0, is ordinary gradient boosting.
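The following sketch shows stochastic gradient boosting on synthetic data, with subsample = 0.5 chosen arbitrarily for illustration. When subsample is below 1.0, the fitted model also exposes oob_improvement_, the per-stage improvement in loss measured on the samples left out of each stage.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (assumption: stands in for the housing data).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Each tree is fit on a random half of the training samples.
model = GradientBoostingRegressor(n_estimators=50, subsample=0.5, random_state=0)
model.fit(X, y)

# One out-of-bag improvement value per boosting stage.
print(model.oob_improvement_.shape)
print(model.oob_improvement_[:5])
```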

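Performance of the model is measured by the .score() method, which returns the R^2 value (coefficient of determination) of the model's predictions on an (input, output) collection of data.  On the test split above, the score is approximately 0.90.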
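For reference, the R^2 value can be reproduced by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below checks this on synthetic data; the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption: stands in for the housing data).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# score(X, y) takes the inputs first and the true outputs second.
print(np.isclose(r2_manual, model.score(X_test, y_test)))
```

A value of 1 corresponds to perfect predictions, and 0 to a model that does no better than always predicting the mean of the test outputs.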