There are a number of excellent reasons:

Gradient boosting is the best: its accuracy and performance are unmatched for tabular supervised learning tasks.
Gradient boosting is highly versatile: it can be used in many important tasks such as regression, classification, ranking, and survival analysis.
Gradient boosting is interpretable: unlike black-box algorithms like neural networks, gradient boosting does not sacrifice interpretability for performance. It works like a Swiss watch and yet, with patience, you can teach how it works to a school kid.
Gradient boosting is well-implemented: it is not one of those algorithms that have little practical value. Various gradient boosting libraries like XGBoost and LightGBM in Python are used by hundreds of thousands of people.
Gradient boosting wins: since 2015, professionals have used it to consistently win tabular competitions on platforms like Kaggle.

import Libraries

In [1]:
import pandas as pd
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

In [2]:
# Load the diamonds dataset from Seaborn
diamonds = sns.load_dataset("diamonds")

# Split data into features and target
X = diamonds.drop("cut", axis=1)
y = diamonds["cut"]

In [3]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
   X, y, test_size=0.2, random_state=42
)

In [4]:
# Define categorical and numerical features
categorical_features = X.select_dtypes(
   include=["object"]
).columns.tolist()

numerical_features = X.select_dtypes(
   include=["float64", "int64"]
).columns.tolist()

Define preprocessing steps for categorical and numerical features

In [5]:
preprocessor = ColumnTransformer(
   transformers=[
       ("cat", OneHotEncoder(), categorical_features),
       ("num", StandardScaler(), numerical_features),
   ]
)

Create a Gradient Boosting Classifier pipeline

In [6]:
pipeline = Pipeline(
   [
       ("preprocessor", preprocessor),
       ("classifier", GradientBoostingClassifier(random_state=42)),
   ]
)


CV and training

In [7]:
# Perform 5-fold cross-validation
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)

# Fit the model on the training data
pipeline.fit(X_train, y_train)

# Predict on the test set
y_pred = pipeline.predict(X_test)

# Generate classification report
report = classification_report(y_test, y_pred)


Report the final results

In [8]:
print(f"Mean Cross-Validation Accuracy: {cv_scores.mean():.4f}")
print("\nClassification Report:")
print(report)



Mean Cross-Validation Accuracy: 0.7621

Classification Report:
              precision    recall  f1-score   support

        Fair       0.90      0.91      0.91       335
        Good       0.81      0.63      0.71      1004
       Ideal       0.82      0.91      0.86      4292
     Premium       0.70      0.86      0.77      2775
   Very Good       0.66      0.41      0.51      2382

    accuracy                           0.76     10788
   macro avg       0.78      0.74      0.75     10788
weighted avg       0.75      0.76      0.75     10788

