# XGBoost ModelBuilder

This notebook was tested with the `conda_python3` kernel on an Amazon SageMaker notebook instance of type `m5`.

In [None]:
!pip install boto3 sagemaker -U --quiet

# SageMaker Model Builder experience

In the new experience, we have introduced a few new constructs. Here we will focus on the following: 

1. ModelBuilder
2. SchemaBuilder
3. InferenceSpec

In the following section, we will define these constructs and provide examples to elaborate on each one.

4.1 ModelBuilder:

ModelBuilder is a Python class that takes a framework model (such as XGBoost or PyTorch) or an Inference Spec (more details below) and converts them into a SageMaker deployable model. ModelBuilder provides a `build` function that generates the artifacts for deployment. The model artifact generated is specific to the model server, which is also customizable as one of the inputs.

Class definition:
```python
class ModelBuilder(
    model_path: str | None = '/tmp/sagemaker/model-builder/' + uuid.uuid1().hex,
    role_arn: str | None = None,
    sagemaker_session: Session | None = None,
    name: str | None = 'model-name-' + uuid.uuid1().hex,
    mode: Mode | None = Mode.SAGEMAKER_ENDPOINT,
    shared_libs: List[str] = lambda : [],
    dependencies: Dict[str, Any] | None = lambda : { "auto": False },
    env_vars: Dict[str, str] | None = lambda : {},
    log_level: int | None = logging.DEBUG,
    content_type: str | None = None,
    accept_type: str | None = None,
    s3_model_data_url: str | None = None,
    instance_type: str | None = "ml.c5.xlarge",
    schema_builder: SchemaBuilder | None = None,
    model: Any | None = None,
    inference_spec: InferenceSpec = None,
    image_uri: str | None = None,
    model_server: str | None = None
)
```
Example:

The class definition above lists all the options available for customization. However, to deploy a framework model, ModelBuilder only needs the model, a sample input and output, and the role.

```python
model_builder = ModelBuilder(
    model=model,  # Pass in the actual model object. Its "predict" method will be invoked in the endpoint.
    schema_builder=SchemaBuilder(input, output), # Pass in a "SchemaBuilder" which will use the sample test input and output objects to infer the serialization needed.
    role_arn=role, # Pass in the role arn or update intelligent defaults.
    )
```

4.2 SchemaBuilder:

SchemaBuilder lets you define the input and output for your endpoint. From the sample input and output you provide, it generates the corresponding marshalling functions for serializing and deserializing requests and responses. For further details, please consult the notebook or refer to the video.

Class definition:
```python
class SchemaBuilder(
    sample_input: Any,
    sample_output: Any,
    input_translator: CustomPayloadTranslator = None,
    output_translator: CustomPayloadTranslator = None
)
```
Example:

The CustomPayloadTranslator class provides all the options for customization. However, for [common inference data formats](https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html), you only need to provide a sample input and output to the SchemaBuilder.
```python
input = "How is the demo going?"
output = "Comment la démo va-t-elle?"
schema = SchemaBuilder(input, output)
```
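
To make the idea of marshalling functions concrete, the following is a minimal sketch of how a sample object could drive the choice of serializer and deserializer. It is not the SDK's implementation: `build_marshallers` is a hypothetical helper, and JSON over UTF-8 is an assumed wire format chosen only for illustration.

```python
import json
from typing import Any, Callable, Tuple


def build_marshallers(sample: Any) -> Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]:
    """Hypothetical helper: pick a (serialize, deserialize) pair from a sample object."""
    if isinstance(sample, (str, list, dict, int, float)):
        # Assumption: JSON-compatible samples travel as UTF-8 encoded JSON.
        def serialize(obj: Any) -> bytes:
            return json.dumps(obj, ensure_ascii=False).encode("utf-8")

        def deserialize(payload: bytes) -> Any:
            return json.loads(payload.decode("utf-8"))

        return serialize, deserialize
    raise TypeError(f"No marshaller sketched for {type(sample).__name__}")


sample_input = "How is the demo going?"
sample_output = "Comment la démo va-t-elle?"

serialize_in, deserialize_in = build_marshallers(sample_input)
serialize_out, deserialize_out = build_marshallers(sample_output)

# Round trip: the bytes sent over the wire decode back to the original objects.
wire_request = serialize_in(sample_input)
wire_response = serialize_out(sample_output)
round_trip_input = deserialize_in(wire_request)
round_trip_output = deserialize_out(wire_response)
```

The point of the sketch is only that a single sample is enough to select a matching pair of functions, which is why the SchemaBuilder asks for sample objects rather than explicit serializers.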

4.3 InferenceSpec

If you want to specify custom functions to load and invoke the model instead of relying on the framework model's own functions, you can pass an InferenceSpec that provides your implementation of the `load` and `invoke` functions.

Class definition:
```python
class InferenceSpec(abc.ABC):
    @abc.abstractmethod
    def load(self, model_dir: str):
        pass

    @abc.abstractmethod
    def invoke(self, input_object: object, model: object):
        pass
```
Example:
```python
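from transformers import pipeline  # Hugging Face Transformers


class MyInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        # Load a small English-to-French translation pipeline.
        return pipeline("translation_en_to_fr", model="t5-small")

    def invoke(self, input, model):
        return model(input)


inf_spec = MyInferenceSpec()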
```
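
The sketch below illustrates the contract between `load` and `invoke` without any external dependencies. It redefines a stand-in `InferenceSpec` that mirrors the abstract class shown above so the snippet runs on its own. `UppercaseSpec` and `serve_once` are hypothetical names, and the assumed call order (load the model, then invoke it with a request) is an illustration rather than a description of the model server's internals.

```python
import abc


class InferenceSpec(abc.ABC):
    """Stand-in mirroring the abstract class shown above (not imported from the SDK)."""

    @abc.abstractmethod
    def load(self, model_dir: str):
        pass

    @abc.abstractmethod
    def invoke(self, input_object: object, model: object):
        pass


class UppercaseSpec(InferenceSpec):
    """Hypothetical spec whose "model" is a plain Python callable."""

    def load(self, model_dir: str):
        # A real implementation would read model artifacts from model_dir.
        return str.upper

    def invoke(self, input_object: object, model: object):
        return model(input_object)


def serve_once(spec: InferenceSpec, model_dir: str, request: object) -> object:
    """Assumed call order: load the model, then pass it to invoke with the request."""
    model = spec.load(model_dir)
    return spec.invoke(request, model)


result = serve_once(UppercaseSpec(), "/tmp/model", "hello")
print(result)  # HELLO
```

Whatever `load` returns is handed back to `invoke` as its `model` argument, so the two methods only need to agree with each other on the type of that object.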

In this example, we are using ModelBuilder to deploy an XGBoost model directly. You can use `Mode` to switch between local testing and deploying to a SageMaker Endpoint. 

### Prerequisites: Local model training and testing

We first use this notebook to train an XGBoost model and test the model inference locally.


In [None]:
from sagemaker import get_execution_role, Session, image_uris
import boto3

sagemaker_session = Session()
region = boto3.Session().region_name

# Use the notebook's execution role when one is available; otherwise replace the placeholder with your role ARN.
try:
    execution_role = get_execution_role()
except ValueError:  # raised when the current AWS identity is not a role
    execution_role = "your-role-arn"

#### XGBoost model training


In [None]:
# Install required packages
!sudo yum install -y aws-cli tar
!pip3 install xgboost==1.7.6 scikit-learn boto3

In [None]:
# clean up any working directories
!rm -rf ./working_dir/models

# Setup a working directory for the demo
model_dir = "./working_dir/models/xgboost_demo"
!mkdir -p {model_dir}

In [None]:
# Build the model (locally or in sagemaker) & store it in the working directory
import json
import pathlib
import shutil
import numpy as np
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


# load data
dataset = loadtxt('pima-indians-diabetes.data.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

# Train the model
model = XGBClassifier()
model.fit(X_train, y_train)
model.save_model(model_dir + "/my_model.xgb")

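# Make predictions for test data and check accuracy on the held-out set
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2%}")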



### SageMaker ModelBuilder: Local deployment

Now we will use the SageMaker ModelBuilder class to prepare the model for local and remote deployment.

In [None]:
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.spec.inference_spec import InferenceSpec
from sagemaker.serve.mode.function_pointers import Mode


model_builder_local = ModelBuilder(
    model=model,  # Pass in the actual model object. Its "predict" method will be invoked in the endpoint.
    schema_builder=SchemaBuilder(X_test, y_pred), # Pass in a "SchemaBuilder" which will use the sample test input and output objects to infer the serialization needed.
    role_arn=execution_role, # Pass in the role arn or update intelligent defaults.
    mode=Mode.LOCAL_CONTAINER, # the model will be deployed locally. 
    dependencies={
        "custom": [
            "boto3<1.29.0",
            "numpy",
            "pandas",
            "scipy",
        ],
    }
)



In [None]:
# Build the model according to the model server specification and save it as files in the working directory
xgb_local_builder = model_builder_local.build()

In [None]:
# deploy is an existing method on the model object; live logging is also enabled for easier debugging.
# note: all the serialization and deserialization is handled by the model builder.
predictor_local = xgb_local_builder.deploy(
    # instance_type='ml.c5.xlarge',
    # initial_instance_count=1
)

In [None]:
# Make prediction for test data. 
predictor_local.predict(X_test)

### SageMaker ModelBuilder: Deploy to a SageMaker Endpoint

Now that we have tested the model prediction locally, we can deploy the model to a SageMaker endpoint.

In [None]:
model_builder = ModelBuilder(
    model=model,  # Pass in the actual model object. Its "predict" method will be invoked in the endpoint.
    schema_builder=SchemaBuilder(X_test, y_pred), # Pass in a "SchemaBuilder" which will use the sample test input and output objects to infer the serialization needed.
    role_arn=execution_role, # Pass in the role arn or update intelligent defaults.
    mode=Mode.SAGEMAKER_ENDPOINT,
    dependencies={
        "custom": [
            "boto3<1.29.0",
            "numpy",
            "pandas",
            "scipy",
        ],
    }
)

In [None]:
# Build the model according to the model server specification and save it as files in the working directory
xgb_builder = model_builder.build()

In [None]:
# deploy is an existing method on the model object; live logging is also enabled for easier debugging.
# note: all the serialization and deserialization is handled by the model builder.
predictor = xgb_builder.deploy(
    instance_type='ml.c5.xlarge',
    initial_instance_count=1
)

In [None]:
# Make prediction for test data. 
predictor.predict(X_test)

## Clean up

In [None]:
predictor.delete_model()
predictor.delete_endpoint()
predictor_local.delete_predictor()
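
If one of the clean-up calls above raises, the calls after it never run and the remaining resources stay in place. The following is a minimal sketch of one way to guard against that, assuming it is acceptable to continue after a failed step. `run_cleanup` and `FakePredictor` are hypothetical names introduced only so the snippet runs without AWS access.

```python
from typing import Callable, List, Tuple


def run_cleanup(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run every cleanup step, even if an earlier one fails; return failure messages."""
    failures = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # keep going so later resources still get deleted
            failures.append(f"{name}: {exc}")
    return failures


class FakePredictor:
    """Hypothetical stand-in for a predictor, used only to make the sketch runnable."""

    def __init__(self):
        self.deleted = []

    def delete_model(self):
        raise RuntimeError("model already deleted")

    def delete_endpoint(self):
        self.deleted.append("endpoint")


fake = FakePredictor()
failures = run_cleanup([
    ("delete_model", fake.delete_model),
    ("delete_endpoint", fake.delete_endpoint),
])
print(failures)
```

With the objects from this notebook, the steps would be `predictor.delete_model`, `predictor.delete_endpoint`, and `predictor_local.delete_predictor`.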