## Use boto3 to train and update an existing SageMaker Endpoint with a newly trained Scikit-Learn Random Forest Model

In this notebook we show how to use Amazon SageMaker to develop, train, and deploy a Scikit-Learn-based ML model (Random Forest). Then we show how to update an existing SageMaker Endpoint with a newly trained model, <b>with no loss of availability</b>.

This is done using `boto3`, whose SageMaker client is a low-level interface to the Amazon SageMaker service. `boto3` is particularly useful when calling SageMaker from environments other than notebooks, such as Lambda functions, Airflow, or Jenkins.

You'll execute the following steps using `boto3`:
 - Prepare the training/testing data and write a Script Mode script.
 - Launch the 1st training job.
 - Deploy the 1st model to a SageMaker Endpoint, and make a few inference requests.
 - Launch the 2nd training job.
 - Update the SageMaker Endpoint with the 2nd model, and make a few inference requests.
 - Optional cleanup.


More info on boto3 can be found on the [Boto3 SageMaker documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#client).

More info on Scikit-Learn can be found on the [Scikit-Learn documentation page](https://scikit-learn.org/stable/index.html).

We use the [California housing dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html), present in Scikit-Learn.

More info on the dataset:

This dataset was obtained from the [`StatLib` repository](http://lib.stat.cmu.edu/datasets/).

The target variable is the median house value for California districts.

This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

It can be downloaded/loaded using the `sklearn.datasets.fetch_california_housing` function.
 
 
**This sample is provided for demonstration purposes. Make sure to conduct appropriate testing if you adapt this code for your own use cases!**

## Import Python libraries

Now you import Python libraries like `sklearn`, `pandas`, `numpy`, and `boto3`.

We also import `sagemaker`, the high-level SageMaker Python SDK, but only to get the default SageMaker bucket, the execution role, and the container image URI, and to upload the data to S3. Apart from that, all SageMaker functionality is demonstrated using `boto3`.

In [None]:
import datetime
import time
import tarfile

import boto3
import pandas as pd
import numpy as np
from sagemaker import get_execution_role
import sagemaker
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing


sm_boto3 = boto3.client("sagemaker")

sess = sagemaker.Session()

region = sess.boto_session.region_name

bucket = sess.default_bucket()  # this could also be a hard-coded bucket name

print("Using bucket " + bucket)

## Prepare data
We load a dataset from `sklearn`, split it into training and test sets, and upload both to S3.

In [None]:
# we use the California housing dataset
data = fetch_california_housing()

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=42
)

trainX = pd.DataFrame(X_train, columns=data.feature_names)
trainX["target"] = y_train

testX = pd.DataFrame(X_test, columns=data.feature_names)
testX["target"] = y_test

In [None]:
trainX.head()

In [None]:
trainX.to_csv("california_train.csv")
testX.to_csv("california_test.csv")

In [None]:
# send data to S3; SageMaker reads the training data from S3
trainpath = sess.upload_data(
    path="california_train.csv", bucket=bucket, key_prefix="sagemaker/sklearn-california"
)

testpath = sess.upload_data(
    path="california_test.csv", bucket=bucket, key_prefix="sagemaker/sklearn-california"
)

## Writing a *Script Mode* script
The script below contains both training and inference functionality and can run either on SageMaker Training hardware or locally (desktop, SageMaker notebook, on premises, etc.). Detailed guidance can be found on the [Scikit-learn SageMaker Python SDK documentation page](https://sagemaker.readthedocs.io/en/stable/using_sklearn.html#preparing-the-scikit-learn-training-script).

In [None]:
%%writefile script.py

import argparse
import joblib
import os

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


# inference functions ---------------
def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf


if __name__ == "__main__":

    print("extracting arguments")
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    # to simplify the demo we don't use all sklearn RandomForest hyperparameters
    parser.add_argument("--n-estimators", type=int, default=10)
    parser.add_argument("--min-samples-leaf", type=int, default=3)

    # Data, model, and output directories
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--train-file", type=str, default="california_train.csv")
    parser.add_argument("--test-file", type=str, default="california_test.csv")
    parser.add_argument(
        "--features", type=str
    )  # in this script we ask user to explicitly name features
    parser.add_argument(
        "--target", type=str
    )  # in this script we ask user to explicitly name the target

    args, _ = parser.parse_known_args()

    print("reading data")
    train_df = pd.read_csv(os.path.join(args.train, args.train_file))
    test_df = pd.read_csv(os.path.join(args.test, args.test_file))

    print("building training and testing datasets")
    X_train = train_df[args.features.split()]
    X_test = test_df[args.features.split()]
    y_train = train_df[args.target]
    y_test = test_df[args.target]

    # train
    print("training model")
    model = RandomForestRegressor(
        n_estimators=args.n_estimators, min_samples_leaf=args.min_samples_leaf, n_jobs=-1
    )

    model.fit(X_train, y_train)

    # compute the absolute error on the test set
    print("validating model")
    abs_err = np.abs(model.predict(X_test) - y_test)

    # print a few performance metrics (absolute error at the 10th, 50th and 90th percentiles)
    for q in [10, 50, 90]:
        print("AE-at-" + str(q) + "th-percentile: " + str(np.percentile(a=abs_err, q=q)))

    # persist model
    path = os.path.join(args.model_dir, "model.joblib")
    joblib.dump(model, path)
    print("model persisted at " + path)
    print(args.min_samples_leaf)
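
The script reads its hyperparameters as command-line options, so the key names sent by the client matter. The sketch below assumes the container turns each hyperparameter key into a `--key value` pair (the helper `to_cmd_args` is a hypothetical stand-in for that behavior, not the container's actual code) and reuses the two option definitions from `script.py`. It illustrates that a key spelled `n_estimators` is not recognized by `--n-estimators`, and that `parse_known_args` silently falls back to the default.

```python
import argparse


def to_cmd_args(hyperparameters):
    """Flatten a hyperparameter dict into ['--key', 'value', ...] (assumed container behavior)."""
    args = []
    for key, value in hyperparameters.items():
        args.extend(["--" + key, str(value)])
    return args


def parse(hyperparameters):
    # Same two options as script.py, with the same defaults.
    parser = argparse.ArgumentParser()
    parser.add_argument("--n-estimators", type=int, default=10)
    parser.add_argument("--min-samples-leaf", type=int, default=3)
    # parse_known_args collects unrecognized options instead of failing.
    return parser.parse_known_args(to_cmd_args(hyperparameters))


hyphen_args, hyphen_unknown = parse({"n-estimators": "300"})
underscore_args, underscore_unknown = parse({"n_estimators": "300"})

print(hyphen_args.n_estimators, hyphen_unknown)          # 300 []
print(underscore_args.n_estimators, underscore_unknown)  # 10 ['--n_estimators', '300']
```

Checking the second return value of `parse_known_args` is a cheap way to notice misspelled hyperparameter keys.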

## SageMaker Training

### Launching a training with `boto3`
`boto3` is more verbose than the SageMaker Python SDK, but gives more visibility into the low-level details of Amazon SageMaker.

In [None]:
# first compress the code and send to S3

source = "source.tar.gz"
project = "scikitlearn-california-train-from-boto3"

# the context manager closes the archive even if adding the file fails
with tarfile.open(source, "w:gz") as tar:
    tar.add("script.py")

s3 = boto3.client("s3")
s3.upload_file(source, bucket, project + "/" + source)
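
Before uploading, it can help to confirm that the archive holds the entry point at its top level. The following is a minimal local sketch, not part of the original workflow: `package_script` is a hypothetical helper, and it writes a throwaway `script.py` into a temporary directory so it runs without touching the notebook's files or S3.

```python
import os
import tarfile
import tempfile


def package_script(script_path, archive_path):
    """Write script_path into a gzip-compressed tar archive and return the member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the file name so the entry point sits at the archive root.
        tar.add(script_path, arcname=os.path.basename(script_path))
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


workdir = tempfile.mkdtemp()
demo_script = os.path.join(workdir, "script.py")
with open(demo_script, "w") as f:
    f.write("print('hello')\n")

members = package_script(demo_script, os.path.join(workdir, "source.tar.gz"))
print(members)  # ['script.py']
```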

When using `boto3` to launch a training job, we must explicitly point to a Docker image. Here we look up the image URI for the Scikit-Learn framework container.

In [None]:
from sagemaker import image_uris

FRAMEWORK_VERSION = "1.0-1"

training_image = image_uris.retrieve(
    framework="sklearn",
    region=region,
    version=FRAMEWORK_VERSION,
    py_version="py3",
    instance_type="ml.c5.xlarge",
)
print(training_image)

## Launch the 1st training job

This will start a model training job. After training completes, Amazon SageMaker saves the resulting model artifacts to an Amazon S3 location that you specify.

If you choose to host your model using Amazon SageMaker hosting services, you can use the resulting model artifacts as part of the model. You can also use the artifacts in a machine learning service other than Amazon SageMaker, provided that you know how to use them for inference.

More info on `create_training_job` can be found on the [Boto3 SageMaker documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_training_job).

In [None]:
training_job_1_name = "sklearn-boto3-1-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

training_job_1_response = sm_boto3.create_training_job(
    TrainingJobName=training_job_1_name,
    HyperParameters={
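        # keys must match the option names in script.py (--n-estimators, --min-samples-leaf);
        # underscore spellings are ignored by parse_known_args and the script defaults apply
        "n-estimators": "300",
        "min-samples-leaf": "3",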
        "sagemaker_program": "script.py",
        "features": "MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude",
        "target": "target",
        "sagemaker_submit_directory": "s3://" + bucket + "/" + project + "/" + source,
    },
    AlgorithmSpecification={
        "TrainingImage": training_image,
        "TrainingInputMode": "File",
        "MetricDefinitions": [
            {"Name": "median-AE", "Regex": "AE-at-50th-percentile: ([0-9.]+).*$"},
        ],
    },
    RoleArn=get_execution_role(),
    InputDataConfig=[
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": trainpath,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        },
        {
            "ChannelName": "test",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": testpath,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        },
    ],
    OutputDataConfig={"S3OutputPath": "s3://" + bucket + "/sagemaker-sklearn-artifact/"},
    ResourceConfig={"InstanceType": "ml.c5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 10},
    StoppingCondition={"MaxRuntimeInSeconds": 86400},
    EnableNetworkIsolation=False,
)

training_job_1_response
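
The `MetricDefinitions` entry above is a regular expression applied to the training logs, with the first capture group taken as the metric value. As a local illustration only, the sketch below applies the same pattern to log lines in the format printed by `script.py`. The numeric values are made up for the example, and `extract_metric` is a hypothetical helper rather than anything SageMaker exposes.

```python
import re

# Same pattern as the "median-AE" metric definition in the training job request.
METRIC_REGEX = r"AE-at-50th-percentile: ([0-9.]+).*$"


def extract_metric(log_line, pattern=METRIC_REGEX):
    """Return the captured value as a float, or None if the line does not match."""
    match = re.search(pattern, log_line)
    return float(match.group(1)) if match else None


# Illustrative log lines; the values are invented, only the format follows script.py.
sample_logs = [
    "AE-at-10th-percentile: 0.0312",
    "AE-at-50th-percentile: 0.2134",
    "AE-at-90th-percentile: 0.8871",
]

median_ae = [extract_metric(line) for line in sample_logs]
print(median_ae)  # [None, 0.2134, None]
```

Only the 50th-percentile line matches, which is why the job reports a single `median-AE` metric.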

## Wait for the 1st training job to end

In [None]:
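# boto3 and time were imported at the top of the notebook.
# `client` talks to the same SageMaker API as `sm_boto3`; later cells use it.
client = boto3.client("sagemaker")

training_job_1_details = client.describe_training_job(TrainingJobName=training_job_1_name)

# poll until the job leaves its transient states (it ends as Completed, Failed, or Stopped)
while training_job_1_details["TrainingJobStatus"] in ("InProgress", "Stopping"):
    time.sleep(15)
    training_job_1_details = client.describe_training_job(TrainingJobName=training_job_1_name)
    print(training_job_1_details["TrainingJobStatus"])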

training_job_1_details
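
The polling loop above runs until the status changes, however long that takes. A possible variant, sketched here under the assumption that you want an upper bound on the wait, wraps the loop in a helper with a timeout. `wait_until_done` and the canned responses are hypothetical; the stub stands in for `client.describe_training_job` so the sketch runs without AWS access.

```python
import time


def wait_until_done(describe, status_key, in_progress, poll_seconds=15,
                    timeout_seconds=3600, sleep=time.sleep):
    """Poll describe() until status_key leaves the in_progress set or the timeout elapses."""
    waited = 0
    while True:
        details = describe()
        if details[status_key] not in in_progress:
            return details
        if waited >= timeout_seconds:
            raise TimeoutError("still " + details[status_key] + " after " + str(waited) + "s")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for client.describe_training_job: returns canned responses in order.
canned = iter(
    [
        {"TrainingJobStatus": "InProgress"},
        {"TrainingJobStatus": "InProgress"},
        {"TrainingJobStatus": "Completed"},
    ]
)
sleeps = []  # record the requested sleeps instead of actually waiting

final = wait_until_done(
    lambda: next(canned),
    status_key="TrainingJobStatus",
    in_progress={"InProgress", "Stopping"},
    sleep=sleeps.append,
)
print(final["TrainingJobStatus"], sleeps)  # Completed [15, 15]
```

With a real client, `describe` would be something like `lambda: client.describe_training_job(TrainingJobName=training_job_1_name)`. `boto3` also provides built-in waiters (for example `client.get_waiter("training_job_completed_or_stopped")`) that cover the same need.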

## Create a Model for the 1st training job

This will create a model in Amazon SageMaker. In the request, you name the model and describe a primary container. For the primary container, you specify the Docker image that contains inference code, artifacts (from prior training), and a custom environment map that the inference code uses when you deploy the model for predictions.

More info on `create_model` can be found on the [Boto3 SageMaker documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model).

In [None]:
model_1_name = "sklearn-model-1-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

create_model_1_response = client.create_model(
    ModelName=model_1_name,
    PrimaryContainer={
        "Image": training_job_1_details["AlgorithmSpecification"]["TrainingImage"],
        "Mode": "SingleModel",
        "ModelDataUrl": training_job_1_details["ModelArtifacts"]["S3ModelArtifacts"],
        "Environment": {
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_PROGRAM": training_job_1_details["HyperParameters"]["sagemaker_program"],
            "SAGEMAKER_REGION": region,
            "SAGEMAKER_SUBMIT_DIRECTORY": training_job_1_details["HyperParameters"][
                "sagemaker_submit_directory"
            ],
        },
    },
    ExecutionRoleArn=get_execution_role(),
)

create_model_1_response

Now you'll describe the model that you created using the `describe_model` API.

More info on `describe_model` can be found on the [Boto3 SageMaker documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.describe_model).

In [None]:
client.describe_model(ModelName=model_1_name)
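
Because the container definition is derived entirely from the training job description, the mapping can be factored into a small function and reused for later models. This is a sketch only: `primary_container_from_job` is a hypothetical helper, and the job description below is a hand-written placeholder with made-up values, containing just the keys read by the `create_model` call above.

```python
def primary_container_from_job(job_details, region):
    """Build the PrimaryContainer argument of create_model from a training job description."""
    hyperparameters = job_details["HyperParameters"]
    return {
        "Image": job_details["AlgorithmSpecification"]["TrainingImage"],
        "Mode": "SingleModel",
        "ModelDataUrl": job_details["ModelArtifacts"]["S3ModelArtifacts"],
        "Environment": {
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_PROGRAM": hyperparameters["sagemaker_program"],
            "SAGEMAKER_REGION": region,
            "SAGEMAKER_SUBMIT_DIRECTORY": hyperparameters["sagemaker_submit_directory"],
        },
    }


# Placeholder description; every value here is invented for illustration.
fake_job = {
    "AlgorithmSpecification": {"TrainingImage": "example-image-uri"},
    "ModelArtifacts": {"S3ModelArtifacts": "s3://example-bucket/output/model.tar.gz"},
    "HyperParameters": {
        "sagemaker_program": "script.py",
        "sagemaker_submit_directory": "s3://example-bucket/project/source.tar.gz",
    },
}

container = primary_container_from_job(fake_job, region="us-east-1")
print(container["ModelDataUrl"])
```

The result could then be passed as `PrimaryContainer=container` to `client.create_model`.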

## Create an Endpoint Config from 1st model

This will create an endpoint configuration that Amazon SageMaker hosting services uses to deploy models. In the configuration, you identify one or more models, created using the `CreateModel` API, to deploy and the resources that you want Amazon SageMaker to provision. Then you call the `CreateEndpoint` API.

More info on `create_endpoint_config` can be found on the [Boto3 SageMaker documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config).

In [None]:
endpoint_config_1_name = "sklearn-endpoint-config-1-" + datetime.datetime.now().strftime(
    "%Y-%m-%d-%H-%M-%S"
)

endpoint_config_1_response = client.create_endpoint_config(
    EndpointConfigName=endpoint_config_1_name,
    ProductionVariants=[
        {
            "VariantName": "AllTrafficVariant",
            "ModelName": model_1_name,
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c5.large",
            "InitialVariantWeight": 1,
        },
    ],
)

endpoint_config_1_response
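
With a single production variant, `InitialVariantWeight` has no visible effect: the variant receives all traffic. When a configuration lists several variants, each variant's share of traffic is its weight divided by the sum of all weights. The sketch below computes those shares locally. `traffic_shares` is a hypothetical helper, and the second variant is invented for illustration; this notebook only ever deploys one.

```python
def traffic_shares(variants):
    """Return each variant's fraction of traffic: its weight over the sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# The configuration used in this notebook: one variant takes everything.
single = [{"VariantName": "AllTrafficVariant", "InitialVariantWeight": 1}]

# A hypothetical two-variant configuration, for comparison only.
split = [
    {"VariantName": "AllTrafficVariant", "InitialVariantWeight": 9},
    {"VariantName": "CandidateVariant", "InitialVariantWeight": 1},
]

print(traffic_shares(single))  # {'AllTrafficVariant': 1.0}
print(traffic_shares(split))   # {'AllTrafficVariant': 0.9, 'CandidateVariant': 0.1}
```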

## Deploy the 1st Endpoint Config to a real-time endpoint

This will create an endpoint using the endpoint configuration specified in the request. Amazon SageMaker uses the endpoint to provision resources and deploy models. Note that you have already created the endpoint configuration with the `CreateEndpointConfig` API in the previous step.

More info on `create_endpoint` can be found on the [Boto3 SageMaker documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint).

In [None]:
endpoint_name = "sklearn-endpoint-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_1_name,
)

create_endpoint_response

## Wait for Endpoint to be ready

In [None]:
describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)

while describe_endpoint_response["EndpointStatus"] == "Creating":
    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
    print(describe_endpoint_response["EndpointStatus"])
    time.sleep(15)

describe_endpoint_response

## Invoke Endpoint with `boto3`

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint.

For an overview of Amazon SageMaker, see [How It Works](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works.html).

Amazon SageMaker strips all POST headers except those supported by the API. Amazon SageMaker might add additional headers. You should not rely on the behavior of headers outside those enumerated in the request syntax.

Calls to `InvokeEndpoint` are authenticated by using AWS Signature Version 4. For information, see Authenticating Requests (AWS Signature Version 4) in the Amazon S3 API Reference.

Model containers must respond to requests within 60 seconds; the model itself can take at most 60 seconds of processing time before responding to an invocation. If your model needs 50-60 seconds of processing time, set the SDK socket timeout to 70 seconds.

More info on `invoke_endpoint` can be found on the [Boto3 SageMakerRuntime documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html#SageMakerRuntime.Client.invoke_endpoint).

In [None]:
runtime = boto3.client("sagemaker-runtime")

In [None]:
# csv serialization
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=testX[data.feature_names].to_csv(header=False, index=False).encode("utf-8"),
    ContentType="text/csv",
)

print(response["Body"].read())
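
The cell above sends the whole test set in a single request. Real-time endpoints cap the request payload size, so for larger inputs one option is to split the rows into several CSV payloads and invoke the endpoint once per payload. The following sketch uses only the standard library. `csv_batches` is a hypothetical helper, and `max_bytes` is a placeholder to be replaced with the current service limit.

```python
import csv
import io


def csv_batches(rows, max_bytes):
    """Yield header-less CSV payloads (bytes) whose size stays within max_bytes."""
    batch, size = [], 0
    for row in rows:
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerow(row)
        line = buffer.getvalue().encode("utf-8")
        if len(line) > max_bytes:
            raise ValueError("a single row exceeds max_bytes")
        # Start a new payload when adding this row would exceed the limit.
        if batch and size + len(line) > max_bytes:
            yield b"".join(batch)
            batch, size = [], 0
        batch.append(line)
        size += len(line)
    if batch:
        yield b"".join(batch)


rows = [[1.5, 2.0], [3.25, 4.0], [5.0, 6.5]]
payloads = list(csv_batches(rows, max_bytes=20))  # tiny limit, just to force a split
print(payloads)  # [b'1.5,2.0\n3.25,4.0\n', b'5.0,6.5\n']
```

Each payload could then be passed as `Body` to `runtime.invoke_endpoint` with `ContentType="text/csv"`, for example by iterating over `testX[data.feature_names].values.tolist()`.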

## Launch the 2nd training job

This will start the 2nd model training job. After training completes, you'll create a Model, an Endpoint Configuration, and update the existing Endpoint with the newly created Endpoint Configuration.

In [None]:
training_job_2_name = "sklearn-boto3-2-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

training_job_2_response = sm_boto3.create_training_job(
    TrainingJobName=training_job_2_name,
    HyperParameters={
        # keys must match the option names in script.py (--n-estimators, --min-samples-leaf)
        "n-estimators": "300",
        "min-samples-leaf": "3",
        "sagemaker_program": "script.py",
        "features": "MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude",
        "target": "target",
        "sagemaker_submit_directory": "s3://" + bucket + "/" + project + "/" + source,
    },
    AlgorithmSpecification={
        "TrainingImage": training_image,
        "TrainingInputMode": "File",
        "MetricDefinitions": [
            {"Name": "median-AE", "Regex": "AE-at-50th-percentile: ([0-9.]+).*$"},
        ],
    },
    RoleArn=get_execution_role(),
    InputDataConfig=[
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": trainpath,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        },
        {
            "ChannelName": "test",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": testpath,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        },
    ],
    OutputDataConfig={"S3OutputPath": "s3://" + bucket + "/sagemaker-sklearn-artifact/"},
    ResourceConfig={"InstanceType": "ml.c5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 10},
    StoppingCondition={"MaxRuntimeInSeconds": 86400},
    EnableNetworkIsolation=False,
)

training_job_2_response

## Wait for the 2nd training job to end

In [None]:
# boto3 and time were imported at the top of the notebook
client = boto3.client("sagemaker")

training_job_2_details = client.describe_training_job(TrainingJobName=training_job_2_name)

# poll until the job leaves its transient states (it ends as Completed, Failed, or Stopped)
while training_job_2_details["TrainingJobStatus"] in ("InProgress", "Stopping"):
    time.sleep(15)
    training_job_2_details = client.describe_training_job(TrainingJobName=training_job_2_name)
    print(training_job_2_details["TrainingJobStatus"])

training_job_2_details

## Create a Model for the 2nd training job

In [None]:
model_2_name = "sklearn-model-2-" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

create_model_2_response = client.create_model(
    ModelName=model_2_name,
    PrimaryContainer={
        "Image": training_job_2_details["AlgorithmSpecification"]["TrainingImage"],
        "Mode": "SingleModel",
        "ModelDataUrl": training_job_2_details["ModelArtifacts"]["S3ModelArtifacts"],
        "Environment": {
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_PROGRAM": training_job_2_details["HyperParameters"]["sagemaker_program"],
            "SAGEMAKER_REGION": region,
            "SAGEMAKER_SUBMIT_DIRECTORY": training_job_2_details["HyperParameters"][
                "sagemaker_submit_directory"
            ],
        },
    },
    ExecutionRoleArn=get_execution_role(),
)

create_model_2_response

In [None]:
client.describe_model(ModelName=model_2_name)

## Create an Endpoint Config from 2nd model

In [None]:
endpoint_config_2_name = "sklearn-endpoint-config-2-" + datetime.datetime.now().strftime(
    "%Y-%m-%d-%H-%M-%S"
)

endpoint_config_2_response = client.create_endpoint_config(
    EndpointConfigName=endpoint_config_2_name,
    ProductionVariants=[
        {
            "VariantName": "AllTrafficVariant",
            "ModelName": model_2_name,
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c5.large",
            "InitialVariantWeight": 1,
        },
    ],
)

endpoint_config_2_response

## Update the real-time endpoint with the 2nd Endpoint Config

This deploys the new `EndpointConfig` specified in the request, switches to the newly provisioned resources, and then deletes the resources provisioned for the endpoint under the previous `EndpointConfig` (there is no availability loss).

More info on `update_endpoint` can be found on the [Boto3 SageMaker documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.update_endpoint).

In [None]:
update_endpoint_response = client.update_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_2_name
)

update_endpoint_response

## Wait for Endpoint to be ready

If you navigate to the SageMaker Endpoints in the `SageMaker Components and registries` tab, you'll see the endpoint in `Updating` status.

In [None]:
describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)

while describe_endpoint_response["EndpointStatus"] == "Updating":
    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
    print(describe_endpoint_response["EndpointStatus"])
    time.sleep(15)

describe_endpoint_response
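
The loop above exits as soon as the status is no longer `Updating`, which does not by itself prove that the update succeeded. A minimal check, sketched here on a hand-written description with made-up values, is to confirm that the endpoint is `InService` and now points at the second endpoint configuration. `check_endpoint_updated` is a hypothetical helper; `EndpointStatus`, `EndpointConfigName`, and `FailureReason` are fields of the `describe_endpoint` response.

```python
def check_endpoint_updated(description, expected_config_name):
    """Raise if the endpoint is not in service or does not use the expected configuration."""
    status = description["EndpointStatus"]
    if status != "InService":
        reason = description.get("FailureReason", "no reason given")
        raise RuntimeError("endpoint is " + status + ": " + reason)
    actual = description["EndpointConfigName"]
    if actual != expected_config_name:
        raise RuntimeError("endpoint still uses " + actual)
    return True


# Placeholder description; the names are invented for illustration.
fake_description = {
    "EndpointStatus": "InService",
    "EndpointConfigName": "sklearn-endpoint-config-2-example",
}

updated = check_endpoint_updated(fake_description, "sklearn-endpoint-config-2-example")
print(updated)  # True
```

In the notebook this would be called as `check_endpoint_updated(describe_endpoint_response, endpoint_config_2_name)`.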

## Invoke Endpoint with `boto3`

In [None]:
runtime = boto3.client("sagemaker-runtime")

In [None]:
# csv serialization
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=testX[data.feature_names].to_csv(header=False, index=False).encode("utf-8"),
    ContentType="text/csv",
)

print(response["Body"].read())

## Clean up

Endpoints should be deleted when no longer in use, since (per the [SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/)) they're billed by time deployed.

In [None]:
sm_boto3.delete_endpoint(EndpointName=endpoint_name)
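
Deleting the endpoint stops the hosting charges, but the two endpoint configurations and the two models created above remain in the account. If you also want to remove them, a sketch along the following lines may help. `cleanup` is a hypothetical helper built on the `delete_endpoint`, `delete_endpoint_config`, and `delete_model` client methods, and `RecordingClient` is a stand-in that only records calls so the example runs without AWS access.

```python
def cleanup(client, endpoint_name, endpoint_config_names, model_names):
    """Delete an endpoint, then its endpoint configurations, then the models."""
    client.delete_endpoint(EndpointName=endpoint_name)
    for name in endpoint_config_names:
        client.delete_endpoint_config(EndpointConfigName=name)
    for name in model_names:
        client.delete_model(ModelName=name)


class RecordingClient:
    """Test double: records every operation instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, operation):
        # Only reached for attributes that are not set, i.e. the API operations.
        def call(**kwargs):
            self.calls.append((operation, kwargs))

        return call


fake_client = RecordingClient()
cleanup(fake_client, "endpoint", ["config-1", "config-2"], ["model-1", "model-2"])
print([operation for operation, _ in fake_client.calls])
```

With the real client, the call would be `cleanup(sm_boto3, endpoint_name, [endpoint_config_1_name, endpoint_config_2_name], [model_1_name, model_2_name])`. The training data and model artifacts in S3 are not touched by this sketch.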