<img src="https://cdn.comet.ml/img/notebook_logo.png">

[Comet](https://www.comet.com/site/products/ml-experiment-tracking/?utm_campaign=seldon-xgboost&utm_medium=colab) is an MLOps platform designed to help data scientists and teams build better models faster. Comet provides tooling to track, explain, manage, and monitor your models in a single place. It works with Jupyter notebooks and scripts and, most importantly, it's 100% free to get started.

[Seldon](https://www.seldon.io/solutions/open-source-projects/core) is an open source platform to deploy your machine learning models on Kubernetes at massive scale.

For a preview of what's to come, check out a completed experiment created from this notebook [here](https://www.comet.com/examples/comet-example-xgboost-seldon/bbca733c72b346809cd1a0aaccdc9a11).
 
You will need to install [S2I](https://github.com/openshift/source-to-image) in order to complete this example.

First, we make sure we have all dependencies installed:

In [None]:
%pip install comet_ml pandas pip scikit-learn seldon_core xgboost graphviz

## Train locally

In [None]:
#### Import Comet ####
from comet_ml import Experiment, init

#### Import Dependencies ####
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import pandas as pd
import os

Let's create a Comet Experiment so we can track the XGBoost hyperparameters and metrics, and save the trained model.

In [None]:
init()

experiment = Experiment(project_name="comet-example-xgboost-seldon")

Then load and prepare the data:

In [None]:
#### Load and configure the California housing dataset ####
california = fetch_california_housing()
data = pd.DataFrame(california.data)
data.columns = california.feature_names
data["Price"] = california.target
X, y = data.iloc[:, :-1], data.iloc[:, -1]

#### Split data into train and test sets ####
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

Then train the model:

In [None]:
#### Define hyperparameters for model ####
param = {
    "objective": "reg:squarederror",
    "colsample_bytree": 0.3,
    "learning_rate": 0.1,
    "max_depth": 5,
    "alpha": 10,
    "n_estimators": 50,
}

#### Initialize XGBoost Regressor ####
xg_reg = xgb.XGBRegressor(eval_metric="rmse", **param)

#### Train model ####
xg_reg.fit(
    X_train,
    y_train,
    eval_set=[(X_train, y_train), (X_test, y_test)],
)
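
The `eval_metric="rmse"` setting above makes XGBoost report the root mean squared error for each entry of `eval_set`. As a reminder of what that number measures, here is a minimal pure-Python sketch. The `rmse` helper is a hypothetical name used for illustration only; it is not part of XGBoost or Comet.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equally sized sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # Square each residual, average them, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration, not taken from the California housing data.
print(rmse([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))
```

On the fitted model, the same quantity could be computed with `rmse(list(y_test), list(xg_reg.predict(X_test)))`.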

Finally, save the model to Comet.ml:

In [None]:
os.makedirs("output", exist_ok=True)

xg_reg.save_model("output/0001.model")

model_name = "XGBoost Model (California)"

experiment.log_model(model_name, "output/0001.model")

experiment.end()

In [None]:
experiment.display(tab="assets")

Now you have a choice; **do one of the following**:


1. You can register the experiment model via the Comet User Interface:

You can check out the experiment assets above. To register an experiment model as a registry model, click on the `+ Register` link, then click `Register new model`, and note the exact name and version number of the registered model. Also, note the name of your workspace (usually your Comet ID). Refer to [the documentation](https://www.comet.ml/docs/user-interface/models/) for more information.

2. You can register the experiment model via the following code:

**BEGINNING OF OPTIONAL CODE**

To put an experiment model into the workspace registry, we will use the following helper function:

In [None]:
def register_model(experiment_id, model_name, registry_name):
    from comet_ml import API

    api = API()
    api_experiment = api.get_experiment_by_key(experiment_id)

    try:
        existing_models = api.get_registry_model_versions(
            workspace=api_experiment.workspace, registry_name=registry_name
        )
        max_model_version = max(existing_models)

        new_model_version = max_model_version.split(".")
        new_model_version[0] = str(int(new_model_version[0]) + 1)
        new_model_version = ".".join(new_model_version)
    except Exception:
        new_model_version = "1.0.0"

    api_experiment.register_model(
        model_name, registry_name=registry_name, version=new_model_version
    )

    return api_experiment.workspace, new_model_version
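
One caveat with the helper above: if `existing_models` is a list of version strings, `max()` compares them lexicographically, so `"9.0.0"` would rank above `"10.0.0"`. The following is a minimal sketch of a numeric comparison, assuming every version is a plain `MAJOR.MINOR.PATCH` string. `parse_version` and `next_major_version` are hypothetical helpers, not part of the Comet API.

```python
from typing import Iterable, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.12.0" into (1, 12, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def next_major_version(existing: Iterable[str]) -> str:
    """Return the highest existing version with its major component bumped.

    Falls back to "1.0.0" when there is no existing version, mirroring
    the fallback in the helper above.
    """
    versions = list(existing)
    if not versions:
        return "1.0.0"
    major, *rest = parse_version(max(versions, key=parse_version))
    return ".".join(str(n) for n in (major + 1, *rest))


print(next_major_version(["1.0.0", "9.0.0", "10.0.0"]))  # 11.0.0
print(next_major_version([]))  # 1.0.0
```

A version with a non-numeric suffix would raise `ValueError` in `parse_version`, so this sketch would need extending if your registry uses such versions.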

We need to pass the Experiment ID, the name of the model in the Experiment and the standardized name in the model registry:

In [None]:
WORKSPACE, registered_version = register_model(
    experiment.id, model_name, "xgboost-model-california"
)

Finally, we see that the workspace registry has the model, albeit under a standardized name, 'xgboost-model-california':

In [None]:
from comet_ml import API

api = API()
api.get_registry_model_names(WORKSPACE)

**END OF OPTIONAL CODE**

## Download the Model

Now that we have a trained model logged to Comet, let's see how to retrieve it and wrap it with Seldon.

To retrieve the model, you can use the following command:

In [None]:
import sys

! comet models download \
    --workspace "$WORKSPACE" \
    --model-name "xgboost-model-california" \
    --model-version "$registered_version"
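
Before wrapping the model, it can help to confirm where the downloaded files landed. The sketch below assumes the CLI wrote into a local `model` directory, which is the path that `MyModel.py` loads from later in this notebook. `list_model_files` is a hypothetical helper for illustration.

```python
from pathlib import Path
from typing import List


def list_model_files(root: str = "model") -> List[str]:
    """Return the sorted relative paths of all files under `root`."""
    base = Path(root)
    if not base.is_dir():
        # Nothing was downloaded to this location.
        return []
    return sorted(
        path.relative_to(base).as_posix()
        for path in base.rglob("*")
        if path.is_file()
    )


files = list_model_files("model")
print(files if files else "No files found under ./model")
```

If the list is empty, check the output of the download command for the directory it actually used.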

Next, we need to define a few files for building a Seldon-compatible Docker container. First, the model's Python file:

In [None]:
%%writefile MyModel.py
import xgboost as xgb
import numpy as np
from typing import Dict, List, Union, Iterable


class MyModel:

    def __init__(self):
        """
        Add any initialization parameters.
        These will be passed at runtime from the graph definition parameters defined in your seldondeployment kubernetes resource manifest.
        """
        self._model = xgb.Booster(model_file="model/0001.model")

    def predict(
        self, X: np.ndarray, names: Iterable[str], meta: Dict = None
    ) -> Union[np.ndarray, List, str, bytes]:
        """
        Return a prediction.

        Parameters
        ----------
        X : array-like
        names : iterable of feature names (optional)
        """
    
        dmatrix = xgb.DMatrix(X)
        result: np.ndarray = self._model.predict(dmatrix)
        return result


Check that the model file runs without syntax or import errors:

In [None]:
import sys

!{sys.executable} MyModel.py

Then the Python dependencies:

In [None]:
%%writefile requirements.txt
xgboost
pip
seldon_core

And finally the definition file for Seldon:

In [None]:
%%bash
mkdir -p .s2i

In [None]:
%%writefile .s2i/environment
MODEL_NAME=MyModel
API_TYPE=REST
SERVICE_TYPE=MODEL
PERSISTENCE=0

## Build the Docker image

The recommended way of building Seldon images is to use [s2i](https://github.com/openshift/source-to-image) with official ready-to-use definition images. Please refer to [the Seldon documentation](https://docs.seldon.io/projects/seldon-core/en/latest/python/python_wrapping_s2i.html) for more information.

In [None]:
!s2i build . seldonio/seldon-core-s2i-python3:1.16.0-dev comet_ml/xgboost_seldon:0.1

## Test locally

Once the Docker image has been built, we can start it locally and test it:

In [None]:
!docker run --name "xgboost_predictor" -d --rm -p 9000:9000 comet_ml/xgboost_seldon:0.1

Send a sample feature vector that conforms to the contract:

In [None]:
!curl -X POST http://localhost:9000/api/v1.0/predictions -H 'Content-Type: application/json' -d '{"data": {"names": ["message"], "ndarray": [[3.7917, 40.0, 4.959798994974874, 1.0301507537688441, 1039.0, 2.6105527638190953, 38.24, -122.64]]}}'
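
The same request can also be sent from Python, which is handy when you want to reuse the response in later cells. This is a minimal sketch using only the standard library; the URL and payload shape are copied from the `curl` command above, while `build_payload` and `predict` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Endpoint taken from the curl command above.
PREDICT_URL = "http://localhost:9000/api/v1.0/predictions"


def build_payload(
    rows: List[List[float]], names: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build a request body with the same shape as the curl example."""
    data: Dict[str, Any] = {"ndarray": rows}
    if names is not None:
        data["names"] = names
    return {"data": data}


def predict(
    rows: List[List[float]],
    names: Optional[List[str]] = None,
    url: str = PREDICT_URL,
    timeout: float = 10.0,
) -> Any:
    """POST the rows to the running container and return the decoded JSON reply."""
    body = json.dumps(build_payload(rows, names)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# One row with the eight California housing features, copied from the curl call.
sample = [[3.7917, 40.0, 4.959798994974874, 1.0301507537688441,
           1039.0, 2.6105527638190953, 38.24, -122.64]]
print(json.dumps(build_payload(sample), indent=2))
```

With the container running, `predict(sample)` sends the request. The layout of the reply is not shown in this notebook, so inspect it before relying on specific keys.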

In [None]:
!docker rm xgboost_predictor --force

## Push to production

Once you have validated locally that your model predicts correctly, you need to push your Docker image to a Kubernetes cluster where Seldon is installed.

Installing and configuring Seldon is outside the scope of this notebook, but you can refer to the [Seldon-Core installation page](https://docs.seldon.io/projects/seldon-core/en/latest/workflow/install.html).

Once your Kubernetes cluster is ready, you can follow one of the [cloud-specific example notebooks](https://docs.seldon.io/projects/seldon-core/en/latest/examples/notebooks.html#cloud-specific-examples) to learn how to push the built Docker image and deploy it to your cluster.