XGBoost is an open-source software library for gradient boosting on decision trees. It is designed for efficient and scalable handling of large datasets and is particularly useful for machine learning competitions and other high-performance machine learning tasks. It can be used as a standalone classifier, or as an advanced component in a larger machine learning pipeline. XGBoost is widely used in industry and academia due to its high performance and ease of use.

Here's an example of using XGBoost to train a classifier on a built-in dataset from scikit-learn and plotting the training loss:

The code below does the following:

- Import the necessary libraries: xgboost, scikit-learn's load_iris function, train_test_split function and matplotlib for plotting
- Load the Iris dataset and split it into training and test sets.
- Convert the data into an XGBoost-compatible format using xgb.DMatrix
- Define the parameters for the XGBoost model.
- Train the model by specifying the training dataset, evaluation dataset, number of rounds and early stopping rounds
- Plot the training loss using matplotlib

Note that this is a basic example and you may want to tune the parameters and try different variations to get the best performance for your specific use case.
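
For a multiclass objective, the training loss is commonly the multiclass log loss, which XGBoost exposes as the mlogloss evaluation metric. As a rough illustration of what that number measures, the sketch below computes it by hand in plain Python. The function name multiclass_log_loss and the sample probabilities are made up for this illustration and are not part of XGBoost.

```python
import math

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total += -math.log(p)
    return total / len(y_true)

# Hypothetical predicted probabilities for three samples over three classes
y_true = [0, 1, 2]
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
]
loss = multiclass_log_loss(y_true, probs)
print(f"mlogloss: {loss:.4f}")
```

The loss falls toward zero as the model puts more probability on the correct class, which is why a decreasing curve indicates that training is making progress.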

In [None]:
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert the data into an XGBoost-compatible format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

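# Define the parameters for the XGBoost model
params = {
    'objective': 'multi:softmax',  # Multiclass classification; predict() returns class labels
    'num_class': 3,  # Number of classes in the dataset
    'tree_method': 'auto',  # Let XGBoost choose the tree construction algorithm
    'nthread': -1,  # Use all available CPU threads
    'eval_metric': 'mlogloss',  # Multiclass log loss, recorded each round for plotting
    'verbosity': 0,  # Suppress log messages (replaces the deprecated 'silent')
}

# Train the model
# The last entry in the list is the one monitored for early stopping
evallist = [(dtrain, 'train'), (dtest, 'eval')]
num_round = 100  # Maximum number of boosting rounds
evals_result = {}  # Filled by xgb.train with the metric history of each dataset
bst = xgb.train(params, dtrain, num_round, evals=evallist,
                early_stopping_rounds=10, evals_result=evals_result)

# Plot the training loss (and the evaluation loss for comparison)
plt.plot(evals_result['train']['mlogloss'], label='train')
plt.plot(evals_result['eval']['mlogloss'], label='eval')
plt.xlabel('Boosting round')
plt.ylabel('mlogloss')
plt.legend()
plt.show()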



Here's an example of using grid search to find the best hyperparameters for an XGBoost classifier on a built-in dataset from scikit-learn, and plotting the training loss:

The code below does the following:

- Import the necessary libraries: xgboost, scikit-learn's load_iris function, train_test_split function, GridSearchCV and matplotlib for plotting
- Load the Iris dataset and split it into training and test sets.
- Convert the data into an XGBoost-compatible format using xgb.DMatrix
- Define the parameters for the XGBoost model.
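- Run a grid search using the GridSearchCV function to find the best hyperparameters.
- Print the best parameters and retrain the model with them.
- Plot the training loss using matplotlib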
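
Before launching a grid search, it can help to estimate how many model fits it implies, since GridSearchCV trains one model per parameter combination per cross-validation fold. The sketch below counts the fits for the value lists searched in the next cell, using only the standard library. The variable names cv_folds, n_combinations and n_fits are chosen for this illustration.

```python
from math import prod

# The multi-valued entries of the grid searched in this example
param_grid = {
    'learning_rate': [0.1, 0.2, 0.3],
    'max_depth': [3, 4, 5, 6],
    'subsample': [0.5, 0.6, 0.7, 0.8, 0.9],
    'colsample_bytree': [0.5, 0.6, 0.7, 0.8, 0.9],
    'n_estimators': [50, 100, 200],
}
cv_folds = 5  # Matches cv=5 in the GridSearchCV call

# One candidate per element of the Cartesian product of all value lists
n_combinations = prod(len(values) for values in param_grid.values())
n_fits = n_combinations * cv_folds
print(f"{n_combinations} combinations, {n_fits} fits")
```

On a dataset as small as Iris this is still manageable, but on larger data a smaller grid or a randomized search may be a more practical starting point.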

In [None]:
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert the data into an XGBoost-compatible format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

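# Define the hyperparameter grid to search
# (3 * 4 * 5 * 5 * 3 = 900 combinations, so 4500 fits with cv=5)
param_grid = {
    'learning_rate': [0.1, 0.2, 0.3],
    'max_depth': [3, 4, 5, 6],
    'subsample': [0.5, 0.6, 0.7, 0.8, 0.9],
    'colsample_bytree': [0.5, 0.6, 0.7, 0.8, 0.9],
    'n_estimators': [50, 100, 200]
}

# Define the model; fixed settings go here rather than in the grid.
# XGBClassifier infers the multiclass objective and the number of classes from y.
model = xgb.XGBClassifier(
    tree_method='auto',  # Let XGBoost choose the tree construction algorithm
    n_jobs=1,  # GridSearchCV already parallelizes across fits
    verbosity=0,  # Suppress log messages (replaces the deprecated 'silent')
)

# Run GridSearchCV
grid_search = GridSearchCV(model, param_grid, cv=5, n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train)

# Print the best parameters
print(grid_search.best_params_)

# Train the model with the best parameters using the native API
params = dict(grid_search.best_params_)
num_round = params.pop('n_estimators')  # Boosting rounds are passed to xgb.train separately
params.update({
    'objective': 'multi:softmax',  # Multiclass classification; predict() returns class labels
    'num_class': 3,  # Number of classes in the dataset
    'eval_metric': 'mlogloss',  # Multiclass log loss, recorded each round for plotting
    'verbosity': 0,  # Suppress log messages
})
evallist = [(dtrain, 'train'), (dtest, 'eval')]  # The last entry is monitored for early stopping
evals_result = {}  # Filled by xgb.train with the metric history of each dataset
bst = xgb.train(params, dtrain, num_round, evals=evallist,
                early_stopping_rounds=10, evals_result=evals_result)

# Plot the training loss (and the evaluation loss for comparison)
plt.plot(evals_result['train']['mlogloss'], label='train')
plt.plot(evals_result['eval']['mlogloss'], label='eval')
plt.xlabel('Boosting round')
plt.ylabel('mlogloss')
plt.legend()
plt.show()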
