## Inference Recommender

Locust works well for quick tests that scale up to high TPS, but if you want a holistic load test across multiple instance types and hyperparameter combinations, Inference Recommender simplifies the process. Inference Recommender takes two inputs: the model tarball and a sample payload tarball. If you use an inference script, make sure it is in this same directory.
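Both inputs are gzip-compressed tarballs, so a small helper can build either one. The following is a minimal sketch using only the standard library; the helper names `build_tarball` and `list_tarball`, and the file name `xgboost-model` in the usage comment, are illustrative assumptions and not part of the original notebook.

```python
import tarfile
from pathlib import Path


def build_tarball(archive_path, files):
    """Write the given files into a .tar.gz, placing each at the archive root."""
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        for file_path in files:
            file_path = Path(file_path)
            # arcname drops the parent directories so the archive stays flat
            tar.add(file_path, arcname=file_path.name)
    return archive_path


def list_tarball(archive_path):
    """Return the sorted member names of a .tar.gz, useful as a sanity check."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


# Hypothetical usage (file names are placeholders):
# build_tarball("model.tar.gz", ["xgboost-model"])
# build_tarball("payload.tar.gz", ["payload.json"])
```

Listing the members before uploading is a cheap way to confirm that the archive contains what you expect.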

### Create and Register Model

In [None]:
import time
model_package_group_name = "xgboost-multiple-models" + str(round(time.time()))

In [None]:
model_url = 'Add your model URL here'

In [None]:
import sagemaker
role = sagemaker.get_execution_role()
session = sagemaker.Session()

In [None]:
from sagemaker.model import Model
from sagemaker import image_uris

# Look up the XGBoost inference container image for the target region
image_uri = image_uris.retrieve(
    framework="xgboost",
    region="us-east-1",
    version="1.0-1",
    py_version="py3",
    image_scope="inference",
)

model = Model(
    model_data=model_url,
    entry_point="inference.py",
    role=role,
    image_uri=image_uri,
    sagemaker_session=session,
)

#### Register Model (Optional Step)

You can catalog your models in the Model Registry.

In [None]:
model_package = model.register(
    content_types=["application/json"],
    response_types=["application/json"],
    model_package_group_name=model_package_group_name,
    image_uri=model.image_uri,
    approval_status="Approved",
    framework="XGBOOST"
)

### Upload Payload to S3

In [None]:
# replace with your sample payload
payload = '{"input": ".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0", "models": ["xgboost-model-0", "xgboost-model-93", "xgboost-model-69", "xgboost-model-50", "xgboost-model-51", "xgboost-model-52", "xgboost-model-53", "xgboost-model-54", "xgboost-model-55"]}'

In [None]:
with open("payload.json", "w") as outfile:
    outfile.write(payload)
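A hand-written JSON string is easy to break with a stray quote or comma. As a hedged alternative, the payload can be built from Python values and serialized with `json.dumps`. The helper `build_payload` and the variable `example_payload` are illustrative names, and the field layout simply mirrors the sample payload above.

```python
import json


def build_payload(features, model_names):
    """Serialize one request: comma-joined features plus the target model names."""
    return json.dumps(
        {
            "input": ",".join(str(value) for value in features),
            "models": list(model_names),
        }
    )


# Same feature values as the sample payload, with a shortened model list
example_payload = build_payload(
    [0.345, 0.224414, 0.131102, 0.042329, 0.279923, -0.110329, -0.099358, 0.0],
    ["xgboost-model-0", "xgboost-model-93"],
)

# Round-trip through json.loads to confirm the string is valid JSON
parsed = json.loads(example_payload)
```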

In [None]:
payload_archive_name = "payload.tar.gz"

In [None]:
!tar -cvzf {payload_archive_name} payload.json

In [None]:
sample_payload_url = session.upload_data(
    path=payload_archive_name, key_prefix="xgboost-payload"
)

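In [None]:
# Optional: to reuse a payload archive that is already in S3, set its URI here
# instead of uploading. The value below is specific to one account and region.
# sample_payload_url = 's3://sagemaker-us-east-1-474422712127/xgboost-payload/payload.tar.gz'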

In [None]:
sample_payload_url

### Run a Default IR Job

In [None]:
# The job name is printed when this call starts; note it for the describe call below
model_package.right_size(
    sample_payload_url=sample_payload_url,
    supported_content_types=["application/json"],
    supported_instance_types=["ml.c5.2xlarge", "ml.c5.4xlarge", "ml.c5.9xlarge", "ml.c5.18xlarge", "ml.r5d.24xlarge",
                             "ml.r5d.2xlarge", "ml.r5d.4xlarge", "ml.m5d.2xlarge", "ml.m5d.4xlarge", "ml.m5d.24xlarge"],
    framework="XGBOOST",
)

In [None]:
import boto3
sm_client = boto3.client(service_name='sagemaker')

In [None]:
job_name = 'Add your job name here'  # printed when you start a job with the right_size call
inference_recommendation_res = sm_client.describe_inference_recommendations_job(JobName=job_name)
print(inference_recommendation_res['InferenceRecommendations'])
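If the job has not finished yet, the describe call returns no usable recommendations. A minimal polling sketch is shown below. It assumes the response carries a `Status` field whose terminal values include `COMPLETED`, `FAILED`, `STOPPED`, and `DELETED`; the function name, the default intervals, and the injectable `sleep` argument are illustrative choices, not part of the original notebook.

```python
import time

# Assumed terminal states of an Inference Recommender job
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "STOPPED", "DELETED"}


def wait_for_recommendation_job(
    client, job_name, poll_seconds=60, timeout_seconds=7200, sleep=time.sleep
):
    """Poll the describe API until the job reaches a terminal status."""
    waited = 0
    while True:
        response = client.describe_inference_recommendations_job(JobName=job_name)
        if response["Status"] in TERMINAL_STATUSES:
            return response
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Job {job_name} still {response['Status']} after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Hypothetical usage with the client created above:
# inference_recommendation_res = wait_for_recommendation_job(sm_client, job_name)
```

Passing `sleep` as a parameter keeps the function easy to exercise with a stub client, without waiting in real time.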

In [None]:
data = [
    {**x["EndpointConfiguration"], **x["ModelConfiguration"], **x["Metrics"]}
    for x in inference_recommendation_res['InferenceRecommendations']
]

In [None]:
import pandas as pd

df = pd.DataFrame(data)
# VariantName adds no information here, so drop it if it is present
df = df.drop(columns=["VariantName"], errors="ignore")
pd.set_option("display.max_colwidth", 400)
df
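Once the recommendations are flattened, a common next step is to pick a configuration by cost subject to a latency bound. The sketch below does this in plain Python. It assumes each recommendation's `Metrics` include `CostPerInference` and `ModelLatency`, and the sample values in the usage comment are made up for illustration, not real job output.

```python
def flatten(recommendation):
    """Merge the three nested sections of one recommendation into a single dict."""
    return {
        **recommendation["EndpointConfiguration"],
        **recommendation["ModelConfiguration"],
        **recommendation["Metrics"],
    }


def cheapest_within_latency(recommendations, max_latency):
    """Return the lowest CostPerInference row whose ModelLatency is within the bound.

    max_latency must use the same unit as the ModelLatency metric.
    Returns None when no recommendation satisfies the bound.
    """
    rows = [flatten(r) for r in recommendations]
    eligible = [row for row in rows if row["ModelLatency"] <= max_latency]
    if not eligible:
        return None
    return min(eligible, key=lambda row: row["CostPerInference"])


# Hypothetical usage with the response fetched above:
# best = cheapest_within_latency(
#     inference_recommendation_res["InferenceRecommendations"], max_latency=100
# )
```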

### Advanced Job

In [1]:
from sagemaker.parameter import CategoricalParameter 
from sagemaker.inference_recommender.inference_recommender_mixin import (  
    Phase,  
    ModelLatencyThreshold 
) 

# Adjust this as needed
hyperparameter_ranges = [ 
    { 
        "instance_types": CategoricalParameter(["ml.c5.2xlarge", "ml.c5.4xlarge"]), 
        'OMP_NUM_THREADS': CategoricalParameter(['1', '2', '3']),
        "SAGEMAKER_NUM_MODEL_WORKERS": CategoricalParameter(['2', '3'])
    } 
] 

phases = [ 
    Phase(duration_in_seconds=120, initial_number_of_users=2, spawn_rate=2), 
    Phase(duration_in_seconds=120, initial_number_of_users=6, spawn_rate=2) 
] 

model_latency_thresholds = [ 
    ModelLatencyThreshold(percentile="P95", value_in_milliseconds=800) 
]
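To get a feel for how large the search space above is, the categorical values can be expanded into their Cartesian product. This is a sketch over plain lists with an illustrative helper name, `expand_grid`; it only counts the combinations the ranges describe and makes no claim about how many of them the service actually benchmarks.

```python
from itertools import product


def expand_grid(ranges):
    """Return every combination of the listed values as a list of dicts."""
    keys = list(ranges)
    return [
        dict(zip(keys, combo))
        for combo in product(*(ranges[key] for key in keys))
    ]


# Same values as hyperparameter_ranges above, written as plain lists
grid = expand_grid(
    {
        "instance_types": ["ml.c5.2xlarge", "ml.c5.4xlarge"],
        "OMP_NUM_THREADS": ["1", "2", "3"],
        "SAGEMAKER_NUM_MODEL_WORKERS": ["2", "3"],
    }
)
print(len(grid))  # 2 * 3 * 2 = 12 combinations
```

Adding a value to any range multiplies the total, which is worth keeping in mind alongside `job_duration_in_seconds`.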

In [None]:
model_package.right_size( 
    sample_payload_url=sample_payload_url, 
    supported_content_types=["application/json"], 
    framework="XGBOOST",
    job_duration_in_seconds=3600, 
    hyperparameter_ranges=hyperparameter_ranges, 
    phases=phases, # TrafficPattern 
    max_invocations=100, # StoppingConditions 
    model_latency_thresholds=model_latency_thresholds
)

In [None]:
# Enter the default or advanced job name here; it is printed at the top of the right_size call output
job_name = 'Enter your advanced job name here'
inference_recommendation_res = sm_client.describe_inference_recommendations_job(JobName=job_name)


data = [
    {**x["EndpointConfiguration"], **x["ModelConfiguration"], **x["Metrics"]}
    for x in inference_recommendation_res['InferenceRecommendations']
]

In [None]:
import pandas as pd

df = pd.DataFrame(data)
# VariantName adds no information here, so drop it if it is present
df = df.drop(columns=["VariantName"], errors="ignore")
pd.set_option("display.max_colwidth", 400)
df.head()