Gradient boosting is an ensemble machine learning technique. It builds many weak predictive models (usually shallow decision trees) sequentially. Each new model is fit to reduce the loss (error) left by the ensemble built so far, so the loss of the whole system shrinks step by step.

#### Three components are involved: 
<br> an additive model that adds weak learners one at a time to minimize the loss function,
<br> a loss function that has to be optimized, and
<br> a weak learner that generates predictions.

Every new tree corrects the errors made by the trees before it.
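The three components can be seen working together in a hand-rolled boosting loop. The following is a minimal sketch, not scikit-learn's actual implementation: it assumes squared-error loss (so the negative gradient is simply the residual), and the names `learning_rate`, `n_rounds`, `trees`, and `boosted_predict` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

learning_rate = 0.1  # illustrative value: shrinks each tree's contribution
n_rounds = 50        # illustrative value: number of weak learners to add

# Start from a constant prediction; the mean minimizes squared error.
init = y_train.mean()
train_pred = np.full(len(y_train), init)
trees = []
train_mse = [np.mean((y_train - train_pred) ** 2)]

for _ in range(n_rounds):
    # For squared-error loss, the negative gradient is the residual.
    residuals = y_train - train_pred
    # Weak learner: a shallow tree fit to the current residuals.
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X_train, residuals)
    # Additive model: add a scaled-down copy of the new tree's output.
    train_pred += learning_rate * tree.predict(X_train)
    trees.append(tree)
    train_mse.append(np.mean((y_train - train_pred) ** 2))


def boosted_predict(X_new):
    """Sum the initial constant and every tree's scaled prediction."""
    pred = np.full(X_new.shape[0], init)
    for fitted_tree in trees:
        pred += learning_rate * fitted_tree.predict(X_new)
    return pred


print("Training MSE before boosting:", train_mse[0])
print("Training MSE after boosting: ", train_mse[-1])
```

The training error recorded in `train_mse` never increases from one round to the next, which is the "each tree fixes earlier mistakes" behavior in practice. Error on unseen data can still rise if too many trees are added.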

#### Evaluation Metrics
<br> For classification: accuracy, precision, recall, F1 score.
<br> For regression: mean squared error (MSE), R-squared.
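To make these metrics concrete, here is a small sketch that computes each of them with scikit-learn's metric functions. The label and value arrays are made up for illustration and do not come from the Diabetes dataset.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
)

# Hypothetical binary labels: 3 true positives, 2 false positives,
# 1 false negative, 2 true negatives.
y_true_cls = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_cls = [1, 0, 0, 1, 0, 1, 1, 1]

accuracy = accuracy_score(y_true_cls, y_pred_cls)    # correct / total
precision = precision_score(y_true_cls, y_pred_cls)  # TP / (TP + FP)
recall = recall_score(y_true_cls, y_pred_cls)        # TP / (TP + FN)
f1 = f1_score(y_true_cls, y_pred_cls)  # harmonic mean of precision and recall

# Hypothetical regression targets and predictions.
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.0, 5.0, 8.0, 10.0]

mse = mean_squared_error(y_true_reg, y_pred_reg)  # mean of squared errors
r2 = r2_score(y_true_reg, y_pred_reg)             # 1 - SS_res / SS_tot

print("Accuracy:", accuracy, "Precision:", precision)
print("Recall:", recall, "F1:", f1)
print("MSE:", mse, "R2:", r2)
```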

The Diabetes dataset bundled with scikit-learn has ten features: age, sex, body mass index, average blood pressure, and six blood serum measurements. The target variable is a quantitative measure of disease progression one year after baseline.
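Before modeling, it can help to look at the data directly. This short sketch prints the shape, the feature names, and the range of the target, using only attributes of the object returned by `load_diabetes`.

```python
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Rows are patients, columns are the ten features.
print("Feature matrix shape:", X.shape)
print("Feature names:", diabetes.feature_names)

# By default the features are already mean-centered and scaled,
# so values such as "age" or "sex" are not in their raw units.
print("First row of features:", X[0])

# The target is the disease-progression score one year after baseline.
print("Target range:", y.min(), "to", y.max())
```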

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
```

```python
# Load the Diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
```

```python
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
# Creating and training the Gradient Boosting model
gb_model = GradientBoostingRegressor(random_state=42)
gb_model.fit(X_train, y_train)
```
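The model here relies on scikit-learn's defaults (`n_estimators=100`, `learning_rate=0.1`, `max_depth=3`). One way to see how each added tree affects the error is `staged_predict`, which yields the ensemble's predictions after every boosting stage. The sketch below refits the same model so that it runs on its own; `staged_mse` and `best_n` are names introduced here for illustration, and picking the tree count on the test set is only for demonstration (a separate validation set would be the usual choice).

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gb_model = GradientBoostingRegressor(random_state=42)
gb_model.fit(X_train, y_train)

# Test MSE after 1 tree, 2 trees, ..., n_estimators trees.
staged_mse = [
    mean_squared_error(y_test, y_stage)
    for y_stage in gb_model.staged_predict(X_test)
]

# Number of trees at which the test error was lowest.
best_n = int(np.argmin(staged_mse)) + 1

print("Test MSE after first tree:", staged_mse[0])
print("Test MSE after last tree: ", staged_mse[-1])
print("Lowest test MSE at", best_n, "trees:", staged_mse[best_n - 1])
```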

```python
# Predicting the test set results
y_pred_gb = gb_model.predict(X_test)
```

```python
# Evaluating the model
mse_gb = mean_squared_error(y_test, y_pred_gb)
r2_gb = r2_score(y_test, y_pred_gb)
```

```python
print("MSE:", mse_gb)
print("R2 score:", r2_gb)
```

```
MSE: 2898.4366729135227
R2 score: 0.4529343796683364
```


The R-squared value of 0.45 means the model explains roughly 45% of the variance in the target variable on the held-out test set. That is a reasonable result for a small, noisy dataset like this one, though more than half of the variance remains unexplained.
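To see where an R-squared figure comes from, the sketch below recomputes it by hand as one minus the ratio of residual to total sum of squares, and compares the model against a baseline that always predicts the training mean. The baseline via `DummyRegressor` is an addition for illustration and is not part of the original workflow; exact numbers may differ slightly across scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gb_model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)
y_pred_gb = gb_model.predict(X_test)

# R-squared by hand: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_test - y_pred_gb) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# Baseline: always predict the mean of the training targets.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
r2_baseline = r2_score(y_test, baseline.predict(X_test))

print("R2 (manual):        ", r2_manual)
print("R2 (r2_score):      ", r2_score(y_test, y_pred_gb))
print("R2 (mean baseline): ", r2_baseline)
```

A mean-only baseline scores at or just below zero on the test set, so any positive R-squared reflects signal the model has picked up from the features.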

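The MSE is the average squared difference between the observed outcomes and the outcomes predicted by the model. Because it is expressed in squared units of the target, its square root (RMSE) is easier to interpret: here it is about 53.8, on the same scale as the disease-progression score.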
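Since MSE is in squared units, it is often reported alongside error measures on the target's own scale. The following sketch refits the same model and derives the root mean squared error (RMSE) with `np.sqrt`, plus the mean absolute error (MAE) as an extra point of comparison that the original analysis does not use.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gb_model = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)
y_pred_gb = gb_model.predict(X_test)

mse_gb = mean_squared_error(y_test, y_pred_gb)
rmse_gb = np.sqrt(mse_gb)  # same units as the target
mae_gb = mean_absolute_error(y_test, y_pred_gb)  # less sensitive to outliers

print("MSE: ", mse_gb)
print("RMSE:", rmse_gb)
print("MAE: ", mae_gb)
```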