# Deploying a custom R model behind a single model endpoint

This notebook walks through the steps to deploy a custom R model behind a single-model SageMaker endpoint for online inference.

The same custom container can be used for both training and inference; however, this example covers an inference-only container.

## Inference container requirements & process

* For inference, SageMaker runs the container as `docker run image serve`. If a single container handles both training and inference, the `serve` argument tells it which mode to run in.
* For inference, we must first create a SageMaker model that specifies the S3 location of the trained model artifacts, which must be a `tar.gz` file. Its contents are loaded into the `/opt/ml/model` directory in the container.
* The container serves requests by implementing the `/invocations` and `/ping` endpoints on port 8080.
* `/ping` should respond with an HTTP 200 status code and an empty body. This signals that the container is ready to accept inference requests.
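The contract above can be sketched in Python with only the standard library. This is not the notebook's implementation (the real container uses R and the plumber package in `endpoints.R`); it only illustrates what any inference container must do. The names `handle_request`, `Handler`, and `predict` are hypothetical, and `predict` is a placeholder rather than a real model.

```python
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    # Hypothetical stand-in for the real model: one placeholder label per row.
    return ["placeholder"] * len(rows)


def handle_request(method, path, body=b""):
    """Map a request to (status_code, response_bytes) per the SageMaker contract."""
    if method == "GET" and path == "/ping":
        # HTTP 200 with an empty body tells SageMaker the container is ready.
        return 200, b""
    if method == "POST" and path == "/invocations":
        try:
            rows = json.loads(body or b"{}")["features"]
        except (ValueError, KeyError, TypeError):
            return 400, b'{"error": "expected a JSON object with a features key"}'
        return 200, json.dumps(predict(rows)).encode()
    return 404, b""


class Handler(BaseHTTPRequestHandler):
    def _respond(self, method):
        # Read the request body (if any) and delegate to handle_request.
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else b""
        status, data = handle_request(method, self.path, body)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._respond("GET")

    def do_POST(self):
        self._respond("POST")


if __name__ == "__main__" and sys.argv[1:2] == ["serve"]:
    # SageMaker starts the container with the "serve" argument and
    # sends /ping and /invocations requests to port 8080.
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```

Keeping the routing in a plain function (`handle_request`) separates the contract from the HTTP plumbing, so it can be checked without starting a server.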

For more details, see https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html

## Deployment Process

### 1. Build the custom R model image

First, we define the Docker image that implements the required endpoints and directory structure. 

In this example, the custom image uses an R-based XGBoost model to predict iris species from the features in the Iris dataset.

In the r_xgb_iris_image directory:
* Dockerfile defines the image
* deploy.R includes the logic to load and run data through the model
* endpoints.R implements the web server endpoints we need to provide for use with SageMaker. This uses the R plumber package. 
* build_and_push_docker.sh is a helper script to aid in publishing the image to Amazon Elastic Container Registry
* xgb.model is the pretrained model, which we package as a tar.gz archive, upload to S3, and reference when creating the SageMaker model. In this example we also include the model in the image so that we can test the container locally; this is not required, because when the container is deployed with SageMaker the model is loaded from S3.


In [None]:
!docker build -t r-iris-inference ./r_xgb_iris_image/

### 2. Launch the inference container

Next, we can test the container by running it locally. Here we map the local port 5000 to the container port 8080 that is serving our endpoints. 

We pass the serve argument to simulate how SageMaker runs the image for inference. 

In [None]:
!docker run -d --rm -p 5000:8080 r-iris-inference:latest serve

List the running containers to confirm that it started:

In [None]:
!docker container list

When you have finished testing (step 4 below), stop the container:

In [None]:
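# Docker assigns a random name (e.g. brave_bassi) on each run, so stop by image instead of by name
!docker ps -q --filter ancestor=r-iris-inference:latest | xargs docker stop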

### 3. Load sample data

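We load the Iris dataset and keep only the four feature columns. We then convert the features to a list of rows (a list of lists), because this is the format the `/invocations` endpoint expects. Reading directly from an `s3://` path with pandas requires the `s3fs` package.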

In [None]:
import pandas as pd

In [None]:
column_names = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Label"]
iris = pd.read_csv(
    "s3://sagemaker-sample-files/datasets/tabular/iris/iris.data", names=column_names
)
iris_features = iris[["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]]
example_inputs = iris_features.values.tolist()
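To make the wire format concrete, here is a pandas-free sketch of the same conversion on three sample rows in the headerless `iris.data` layout (four features followed by a label). The helper name `build_payload` and the `SAMPLE_CSV` rows are illustrative assumptions; the notebook itself uses pandas as shown above.

```python
import csv
import io
import json

# Three rows in the same headerless layout as iris.data: features..., label.
SAMPLE_CSV = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def build_payload(csv_text):
    """Drop the label column and wrap the feature rows under the "features" key."""
    rows = csv.reader(io.StringIO(csv_text))
    features = [[float(v) for v in row[:4]] for row in rows if row]
    return {"features": features}


sample_payload = build_payload(SAMPLE_CSV)
print(json.dumps(sample_payload))
# {"features": [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]]}
```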

### 4. Test the local inference container

We run two tests:
1. Invoke the `/ping` endpoint and check for an HTTP 200 response
2. Invoke the `/invocations` endpoint and check for a predicted class in the response
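Because `docker run -d` returns as soon as the container starts, the R server may not be listening yet when the first test request is sent. A minimal sketch of a polling helper, using only the standard library so it does not depend on `requests`; the name `wait_until_healthy` is hypothetical, and the default URL assumes the `5000:8080` port mapping used above.

```python
import time
import urllib.request


def wait_until_healthy(url="http://localhost:5000/ping", timeout_s=30.0, interval_s=1.0):
    """Poll the /ping URL until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or a non-2xx status (HTTPError is an OSError).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_healthy()` before the `/invocations` test avoids a spurious connection error while the container is still starting.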

In [None]:
import requests

In [None]:
requests.get('http://localhost:5000/ping')

In [None]:
# the inference container reads the input rows from the "features" key, so the payload must use it
payload = {"features": example_inputs}
response = requests.post("http://localhost:5000/invocations", json=payload)

In [None]:
response.content

### 5. Deploy the inference container behind a SageMaker endpoint

Now that we have confirmed locally that the inference container behaves as expected, we can deploy it behind a SageMaker endpoint.
This involves several steps:
1. Push the custom inference image to an Amazon ECR repository
2. Package the model artifacts (in this case xgb.model) as a tar.gz archive and upload it to S3
3. Create a SageMaker model that references the image and the trained model artifact in S3
4. Create an endpoint configuration
5. Create the endpoint

#### Deploy to ECR

In [None]:
# Push the custom inference image to ECR using the helper script (requires ECR permissions).
# Pass the image name and tag as arguments.
! ./r_xgb_iris_image/build_and_push_docker.sh r-iris-inference latest

#### Upload model artifacts to S3

In [None]:
# package xgb.model as a tar.gz archive for upload to S3
! cd r_xgb_iris_image && tar czf xgb.tar.gz xgb.model
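The `cd` in the shell command matters: it puts `xgb.model` at the root of the archive rather than under `r_xgb_iris_image/`. Assuming `deploy.R` reads the model from `/opt/ml/model/xgb.model`, a nested path would break loading. A sketch of the same packaging with Python's `tarfile` module, which also lists the archive members so the layout can be checked; `package_model` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path


def package_model(model_path, archive_path):
    """Write model_path into a gzip-compressed tar at the archive root; return member names."""
    model_path = Path(model_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops the parent directories, like running tar from inside the folder.
        tar.add(model_path, arcname=model_path.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

For example, `package_model("r_xgb_iris_image/xgb.model", "r_xgb_iris_image/xgb.tar.gz")` should return `["xgb.model"]`.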

In [None]:
bucket_name = "<bucket-name>"
bucket_key = "models/iris/xgb.tar.gz"

In [None]:
import boto3
s3 = boto3.resource('s3')
s3.meta.client.upload_file('r_xgb_iris_image/xgb.tar.gz', bucket_name, bucket_key)

#### Create the SageMaker model

In [None]:
import sagemaker

aws_region = boto3.Session().region_name
# get_execution_role() works inside SageMaker notebooks/Studio; elsewhere, pass an IAM role ARN instead
sagemaker_role = sagemaker.get_execution_role()
account_id = boto3.client('sts').get_caller_identity().get('Account')

In [None]:
sm_client = boto3.client("sagemaker", region_name=aws_region)

model_url = f"s3://{bucket_name}/{bucket_key}"

image_uri = f"{account_id}.dkr.ecr.{aws_region}.amazonaws.com/r-iris-inference:latest"
container = {"Image": image_uri, "ModelDataUrl": model_url, "Mode": "SingleModel"}
model_name = "r-iris-inference"
create_model_response = sm_client.create_model(
    ModelName=model_name, 
    ExecutionRoleArn=sagemaker_role, 
    PrimaryContainer=container
)

#### Create an endpoint configuration

In [None]:
endpoint_config_name = "r-iris-inference-endpoint-config"
create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        "InstanceType": "ml.m4.xlarge", # the best instance type will depend on your use case and model
        "InitialInstanceCount": 1,
        "ModelName": model_name,
        "VariantName": "AllTraffic",
    }],
)

#### Create endpoint

In [None]:
endpoint_name = "r-iris-endpoint"

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, 
    EndpointConfigName=endpoint_config_name
)

# create our waiter to let us know when the endpoint is in service
print("Waiting for {} endpoint to be in service...".format(endpoint_name))
waiter = sm_client.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)
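The `endpoint_in_service` waiter polls `describe_endpoint` for you. If you would rather see the `FailureReason` when endpoint creation fails, you can poll it yourself. A minimal sketch, assuming `describe_endpoint` behaves like `sm_client.describe_endpoint(EndpointName=...)` and returns a dict with an `EndpointStatus` key; the helper name `wait_for_endpoint` and its default timings are hypothetical choices.

```python
import time


def wait_for_endpoint(describe_endpoint, endpoint_name, poll_s=30.0, timeout_s=1800.0):
    """Poll until the endpoint is InService; raise on Failed or on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        desc = describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = desc.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        # Any other status (e.g. Creating) means keep waiting until the deadline.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {timeout_s}s")
        time.sleep(poll_s)
```

Usage: `wait_for_endpoint(sm_client.describe_endpoint, endpoint_name)`. Passing the describe function in, rather than the client, keeps the helper easy to exercise with a stub.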

### 6. Test the endpoint

Once the endpoint is in service, we can send our sample data to it for inference.

In [None]:
import json

from botocore.exceptions import ClientError

runtime_client = boto3.client("sagemaker-runtime", region_name=aws_region)

payload = {"features": example_inputs}

try:
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
except ClientError as err:
    # invoke_endpoint raises an exception (e.g. ModelError) instead of
    # returning a non-200 status when the container responds with an error
    print("Not healthy:", err)
else:
    print("Healthy")
    print("Response: ", json.loads(response["Body"].read()))