<img src="https://cdn.comet.ml/img/notebook_logo.png">

[Comet](https://www.comet.com/site/products/ml-experiment-tracking/?utm_campaign=ray_train&utm_medium=colab) is an MLOps platform designed to help data scientists and teams build better models faster. Comet provides tooling to track, explain, manage, and monitor your models in a single place. It works with Jupyter notebooks and scripts, and it is free to get started.

[Ray Train](https://docs.ray.io/en/latest/train/train.html) abstracts away the complexity of setting up a distributed training system.

Instrument your runs with Comet to manage experiments, create dataset versions, and track hyperparameters for faster and easier reproducibility and collaboration.

[Find more information about our integration with Ray Train](https://www.comet.ml/docs/v2/integrations/ml-frameworks/ray/)

Get a preview of what's to come: check out a completed experiment created from this notebook [here](https://www.comet.com/examples/comet-example-ray-train-xgboost/43c968fda9e74260996f8cafb5b9f32c).

This example is based on the [following Ray Train XGBoost example](https://docs.ray.io/en/latest/train/distributed-xgboost-lightgbm.html).

# Install Dependencies

In [None]:
%pip install -U comet_ml "ray[air]>=2.1.0" xgboost_ray "pandas!=2.2.0"

# Initialize Comet

In [None]:
import comet_ml
import comet_ml.integration.ray

comet_ml.init(project_name="comet-example-ray-train-xgboost")

# Import Dependencies

In [None]:
import os
import ray
from ray.air.config import RunConfig, ScalingConfig
from ray.train import Result
from ray.train.xgboost import XGBoostTrainer

# Prepare your dataset

In [None]:
# Load data.
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
# Split data into train and validation.
train_dataset, valid_dataset = dataset.train_test_split(
    test_size=0.3, shuffle=True, seed=536
)

# Define the function that schedules the distributed job

In [None]:
def train_xgboost(
    num_workers: int = 2, use_gpu: bool = False, num_boost_round: int = 20
) -> Result:
    config = {}
    callback = comet_ml.integration.ray.CometTrainLoggerCallback(config)

    trainer = XGBoostTrainer(
        scaling_config=ScalingConfig(
            # Number of workers to use for data parallelism.
            num_workers=num_workers,
            # Whether to use GPU acceleration. Set to True to schedule GPU workers.
            use_gpu=use_gpu,
        ),
        label_column="target",
        num_boost_round=num_boost_round,
        params={
            # XGBoost specific params (see the `xgboost.train` API reference)
            "objective": "binary:logistic",
            # uncomment this and set `use_gpu=True` to use GPU for training
            # "tree_method": "gpu_hist",
            "eval_metric": ["logloss", "error"],
            # Make the build reproducible
            "random_state": 536,
        },
        datasets={"train": train_dataset, "valid": valid_dataset},
        run_config=RunConfig(callbacks=[callback]),
    )
    result = trainer.fit()
    return result
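
The `Result` returned by `trainer.fit()` exposes the last reported metrics as a dictionary in `result.metrics`. As a minimal sketch of how that dictionary could be inspected, the hypothetical helper below (`metrics_for_dataset` is not part of Ray or Comet) keeps only the entries for one dataset. It assumes the keys follow a `<dataset>-<metric>` pattern such as `valid-logloss`; check the keys in your own run before relying on it. The numbers in `example_metrics` are made-up placeholders, not results from this notebook.

```python
from typing import Any, Dict


def metrics_for_dataset(metrics: Dict[str, Any], dataset_name: str) -> Dict[str, Any]:
    """Return the entries of `metrics` whose key starts with `<dataset_name>-`.

    The prefix is stripped from the returned keys. The `<dataset>-<metric>`
    key layout is an assumption about what the trainer reports.
    """
    prefix = f"{dataset_name}-"
    return {
        key[len(prefix):]: value
        for key, value in metrics.items()
        if key.startswith(prefix)
    }


# Placeholder dictionary shaped like what `result.metrics` might contain.
# The values are invented for illustration only.
example_metrics = {
    "train-logloss": 0.12,
    "train-error": 0.02,
    "valid-logloss": 0.18,
    "valid-error": 0.05,
    "training_iteration": 10,
}

valid_metrics = metrics_for_dataset(example_metrics, "valid")
print(valid_metrics)
```

With a real run, the same helper would be called as `metrics_for_dataset(result.metrics, "valid")`, where `result` is the value returned by `train_xgboost`.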

# Train the model

Ray will wait indefinitely if we request more `num_workers` than the available resources allow. The code below ensures that we never request more CPUs than are available locally.

In [None]:
ideal_num_workers = 2

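# os.cpu_count() can return None, so fall back to 1. Leave one CPU free.
available_local_cpu_count = (os.cpu_count() or 1) - 1
num_workers = min(ideal_num_workers, available_local_cpu_count)

# Always request at least one worker.
if num_workers < 1:
    num_workers = 1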

train_xgboost(num_workers, use_gpu=False, num_boost_round=10)
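
The worker-count logic in the cell above can also be written as a small reusable function. This is only a sketch: `pick_num_workers` is a hypothetical helper name, and the optional `cpu_count` argument is added here so the behavior can be checked without depending on the machine it runs on. Like the cell above, it leaves one CPU free and never returns fewer than one worker.

```python
import os
from typing import Optional


def pick_num_workers(ideal: int, cpu_count: Optional[int] = None) -> int:
    """Clamp the requested worker count to the locally available CPUs.

    One CPU is left free, and at least one worker is always returned.
    """
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    available = cpu_count - 1
    return max(1, min(ideal, available))


# Examples with explicit CPU counts:
print(pick_num_workers(2, cpu_count=8))  # plenty of CPUs: keeps the ideal value
print(pick_num_workers(2, cpu_count=2))  # only one CPU left over: one worker
print(pick_num_workers(2, cpu_count=1))  # single CPU: still one worker
```

The result could then be passed to the training function, for example `train_xgboost(pick_num_workers(2), use_gpu=False, num_boost_round=10)`.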