# [Model Partner] Package a machine learning model for listing on Vulcan
* conda_pytorch_p310

The following diagram provides an overview of the ML model packaging process. In step 1, you store the model artifacts and the serving/scoring logic. In step 2, you create a container and push it to Amazon ECR; SageMaker uses this container to host your model, perform inference, and return the prediction. In step 3, you validate that the container can successfully host your model on SageMaker. This notebook assumes steps 1 to 3 are complete.


In **step 4** you will learn how to package the ML model into a Model Package. In **step 5** you will validate this ML model package by deploying it with Amazon SageMaker. In **step 6** you will learn about resources that guide you on how to list the ML model in AWS Marketplace.

<img src="images/ml-model-publishing-workflow.png"/>


**Table of contents**
1. [Step 4 - Create an ML Model Package](#step4):
    1. [Step 4.1 Define parameters](#step41)
    2. [Step 4.2 Create Model Package](#step42)
2. [Step 5 - Validate model in Amazon SageMaker environment](#step5):
    1. [Step 5.1 Validate Real-time inference via Amazon SageMaker Endpoint](#step51)
3. [Step 6 - List ML model on AWS Marketplace](#step6)

## AutoReload

In [1]:
%load_ext autoreload
%autoreload 2

## 0. Install packages

In [2]:
install_needed = True  # should only be True once
# install_needed = False

In [3]:
%%bash
#!/bin/bash

DAEMON_PATH="/etc/docker"
MEMORY_SIZE=10G

FLAG=$(cat $DAEMON_PATH/daemon.json | jq 'has("data-root")')
# echo $FLAG

if [ "$FLAG" == true ]; then
    echo "Already revised"
else
    echo "Add data-root and default-shm-size=$MEMORY_SIZE"
    sudo cp $DAEMON_PATH/daemon.json $DAEMON_PATH/daemon.json.bak
    sudo cat $DAEMON_PATH/daemon.json.bak | jq '. += {"data-root":"/home/ec2-user/SageMaker/.container/docker","default-shm-size":"'$MEMORY_SIZE'"}' | sudo tee $DAEMON_PATH/daemon.json > /dev/null
    sudo service docker restart
    echo "Docker Restart"
fi

Already revised
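
The `jq` merge in the cell above can be sketched in Python for readers who prefer to inspect the change before applying it. This is a minimal sketch, not part of the original notebook: the function name `add_docker_settings` and the example `runtimes` entry are assumptions made for illustration, and the sketch only builds the new dictionary without touching `/etc/docker/daemon.json`.

```python
import json
from typing import Dict


def add_docker_settings(config: Dict, data_root: str, shm_size: str) -> Dict:
    """Return a copy of a Docker daemon config with data-root and default-shm-size set.

    Mirrors the bash cell: if "data-root" is already present, nothing changes.
    """
    if "data-root" in config:
        return dict(config)  # already revised
    updated = dict(config)
    updated["data-root"] = data_root
    updated["default-shm-size"] = shm_size
    return updated


# Hypothetical starting config; a real daemon.json may contain other keys.
example = {"runtimes": {"nvidia": {"path": "nvidia-container-runtime"}}}
revised = add_docker_settings(
    example, "/home/ec2-user/SageMaker/.container/docker", "10G"
)
print(json.dumps(revised, indent=2))
```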


In [4]:
import sys
import IPython

if install_needed:
    print("installing deps and restarting kernel")
    !{sys.executable} -m pip install -U pip
    !{sys.executable} -m pip install -U boto3
    !{sys.executable} -m pip install -U sagemaker
    !{sys.executable} -m pip uninstall pycodestyle -y

    IPython.Application.instance().kernel.do_shutdown(True)

installing deps and restarting kernel
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting boto3
  Using cached boto3-1.26.162-py3-none-any.whl (135 kB)
Collecting botocore<1.30.0,>=1.29.162 (from boto3)
  Using cached botocore-1.29.162-py3-none-any.whl (11.0 MB)
Installing collected packages: botocore, boto3
  Attempting uninstall: botocore
    Found existing installation: botocore 1.29.150
    Uninstalling botocore-1.29.150:
      Successfully uninstalled botocore-1.29.150
  Attempting uninstall: boto3
    Found existing installation: boto3 1.26.150
    Uninstalling boto3-1.26.150:
      Successfully uninstalled boto3-1.26.150
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.27.150 requires botocore==1.29.150, but you have

## 1. Create an ML Model Artifact

### 1.1 model loading 

In [2]:
strPrefix = "triton-ncf"
strModelName = "ncf_food_model"
strTrainedModelDir = "./custom-model"
strModelServingFolder = "triton-docker-serve-pt"

In [3]:
from src.inference import model_fn

* If the prediction needs customization, modify the `forward` function of `class NCF(nn.Module)` in "./src/model.py".

In [4]:
ncf_food_model = model_fn(strTrainedModelDir)

######## Staring model_fn() ###############
--> model_dir : ./custom-model
model_config_path: :  /home/ec2-user/SageMaker/aws-llm-serve/jumpstart-onboarding/src/model_config.json
--> model network is loaded
model_file_path: :  ./custom-model/NeuMF-end.pth
####### Model is loaded #########


In [5]:
ncf_food_model

NCF(
  (embed_user_GMF): Embedding(6040, 32)
  (embed_item_GMF): Embedding(3706, 32)
  (embed_user_MLP): Embedding(6040, 128)
  (embed_item_MLP): Embedding(3706, 128)
  (linear): Linear(in_features=4, out_features=4, bias=True)
  (MLP_layers): Sequential(
    (0): Dropout(p=0.0, inplace=False)
    (1): Linear(in_features=256, out_features=128, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.0, inplace=False)
    (4): Linear(in_features=128, out_features=64, bias=True)
    (5): ReLU()
    (6): Dropout(p=0.0, inplace=False)
    (7): Linear(in_features=64, out_features=32, bias=True)
    (8): ReLU()
  )
  (predict_layer): Linear(in_features=64, out_features=1, bias=True)
)

### 1.2 Conversion to TorchScript

In [6]:
import torch
import numpy as np

In [7]:
def trace_model(mode, device, model, dummy_inputs, trace_model_name):
    """Convert `model` to TorchScript, save it, and run a load test."""
    model = model.eval()
    model.to(device)

    if mode == 'trace':
        IR_model = torch.jit.trace(model, dummy_inputs)
    elif mode == 'script':
        IR_model = torch.jit.script(model)
    else:
        # Without this branch, IR_model would be undefined below
        raise ValueError(f"Unsupported mode: {mode!r} (expected 'trace' or 'script')")

    print(f"As {mode} : Model is saved {trace_model_name}")
    torch.jit.save(IR_model, trace_model_name)

    # Reload the saved file to confirm it can be deserialized and executed
    print("#### Load Test ####")
    loaded_m = torch.jit.load(trace_model_name)
    print(loaded_m.code)
    dummy_user = dummy_inputs[0]
    dummy_item = dummy_inputs[1]

    result = loaded_m(dummy_user, dummy_item)
    print("Result shape: ", result.shape)

In [8]:
is_trace, is_script = True, False

if is_trace: mode = 'trace'    
elif is_script: mode = 'script'

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

user_np = np.zeros((1,100)).astype(np.int32)
item_np = np.random.randint(low=1, high=1000, size=(1,100)).astype(np.int32)

dummy_inputs = [
    torch.from_numpy(user_np).to(device),
    torch.from_numpy(item_np).to(device)
]

Using cuda device


In [9]:
strTraceFoodModelName = 'ncf_food_model.pt'
trace_model(mode, device, ncf_food_model, dummy_inputs, strTraceFoodModelName) 

As trace : Model is saved ncf_food_model.pt
#### Load Test ####
def forward(self,
    user: Tensor,
    item: Tensor) -> Tensor:
  predict_layer = self.predict_layer
  MLP_layers = self.MLP_layers
  embed_item_MLP = self.embed_item_MLP
  embed_user_MLP = self.embed_user_MLP
  embed_item_GMF = self.embed_item_GMF
  embed_user_GMF = self.embed_user_GMF
  output_GMF = torch.mul((embed_user_GMF).forward(user, ), (embed_item_GMF).forward(item, ))
  _0 = [(embed_user_MLP).forward(user, ), (embed_item_MLP).forward(item, )]
  input = torch.cat(_0, -1)
  _1 = [output_GMF, (MLP_layers).forward(input, )]
  input0 = torch.cat(_1, -1)
  return (predict_layer).forward(input0, )

Result shape:  torch.Size([1, 100, 1])


### 1.3 Create config.pbtxt

In [10]:
%%writefile ncf_food_config.pbtxt

name: "ncf_food_model"
platform: "pytorch_libtorch"
max_batch_size: 128
input [
  {
    name: "INPUT__0"
    data_type: TYPE_INT32
    dims: [100]
  },
  {
    name: "INPUT__1"
    data_type: TYPE_INT32
    dims: [100]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [-1]
  }
]

Writing ncf_food_config.pbtxt
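
Because `max_batch_size` is greater than zero, the `dims` entries in the config describe a single item and Triton adds the batch dimension in front. The sketch below illustrates how the `[100]` in the config relates to the `[1, 100]` shape sent in the request later in this notebook. The helper name `full_shape` is hypothetical and the check is only a simplified model of the rule, not Triton's own validation code.

```python
from typing import List


def full_shape(batch_size: int, dims: List[int], max_batch_size: int) -> List[int]:
    """Shape a request tensor should declare for a given config entry.

    When max_batch_size > 0, the config `dims` exclude the batch dimension,
    so the request shape is [batch_size] + dims.
    """
    if max_batch_size > 0:
        if not 1 <= batch_size <= max_batch_size:
            raise ValueError(
                f"batch_size {batch_size} is outside 1..{max_batch_size}"
            )
        return [batch_size] + list(dims)
    # max_batch_size == 0 means batching is disabled; dims are the full shape.
    return list(dims)


print(full_shape(1, [100], 128))  # [1, 100]
```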


### 1.4 Artifact packaging
- The following folder structure must be created.
```
model_serving_folder
    - model_name
        - version_number
            - model file
        - config file
        
# Example: 

triton-serve-pt
    - ncf_food
        - 1
            - model.pt
        - config.pbtxt

```
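
The `copy_artifact` helper used below comes from this repository's `utils.triton` module, whose source is not shown here. As a rough illustration of the layout above, the following sketch builds the same structure with the standard library. The function name `build_model_repository` and the placeholder files are assumptions, not the repository's implementation.

```python
import shutil
import tempfile
from pathlib import Path


def build_model_repository(serving_dir, model_name, model_file, config_file, version="1"):
    """Create <serving_dir>/<model_name>/<version>/model.pt and config.pbtxt."""
    model_root = Path(serving_dir) / model_name
    version_dir = model_root / version
    version_dir.mkdir(parents=True, exist_ok=True)
    # The model file is renamed to model.pt, as in the layout above.
    shutil.copy(model_file, version_dir / "model.pt")
    # The config sits next to the version folder, not inside it.
    shutil.copy(config_file, model_root / "config.pbtxt")
    return model_root


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "ncf_food_model.pt").write_bytes(b"placeholder")
    (tmp / "ncf_food_config.pbtxt").write_text('name: "ncf_food_model"\n')
    root = build_model_repository(
        tmp / "triton-docker-serve-pt",
        "ncf_food_model",
        tmp / "ncf_food_model.pt",
        tmp / "ncf_food_config.pbtxt",
    )
    for path in sorted(root.rglob("*")):
        print(path.relative_to(root.parent).as_posix())
```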

In [18]:
import os
from utils.triton import copy_artifact

In [19]:
# Create the ncf_food_model folder
food_config = 'ncf_food_config.pbtxt'
copy_artifact(strModelServingFolder, strModelName, strTraceFoodModelName, food_config)

triton-docker-serve-pt:
ncf_food_model

triton-docker-serve-pt/ncf_food_model:
1
config.pbtxt

triton-docker-serve-pt/ncf_food_model/1:
model.pt


### 1.5 Upload model packages

In [25]:
import os
import sagemaker
from utils.triton import tar_artifact, upload_tar_s3

In [26]:
sagemaker_session = sagemaker.Session()

In [27]:
strModelTarFile = tar_artifact(strModelServingFolder, strModelName)    
print("strModelTarFile: ", strModelTarFile)
strModelUriPt = upload_tar_s3(sagemaker_session, strModelTarFile, strPrefix)
print("strModelUriPt: ", strModelUriPt)

drwxrwxr-x ec2-user/ec2-user 0 2023-06-28 02:36 ncf_food_model/
-rw-rw-r-- ec2-user/ec2-user 307 2023-06-28 02:36 ncf_food_model/config.pbtxt
drwxrwxr-x ec2-user/ec2-user   0 2023-06-28 02:36 ncf_food_model/1/
-rw-rw-r-- ec2-user/ec2-user 6444775 2023-06-28 02:36 ncf_food_model/1/model.pt
strModelTarFile:  ncf_food_model.model.tar.gz
strModelUriPt:  s3://sagemaker-us-east-1-419974056037/triton-ncf/ncf_food_model.model.tar.gz
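
`tar_artifact` and `upload_tar_s3` are also helpers from `utils.triton`. A minimal sketch of the archiving step, assuming it simply tars the model folder so that `ncf_food_model/` is the top-level entry (as the listing above suggests), could look like this. The name `tar_model_folder` and its `output_dir` parameter are hypothetical.

```python
import tarfile
from pathlib import Path


def tar_model_folder(serving_dir, model_name, output_dir="."):
    """Archive <serving_dir>/<model_name> into <model_name>.model.tar.gz.

    The model folder becomes the top-level entry of the archive.
    """
    tar_path = Path(output_dir) / f"{model_name}.model.tar.gz"
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(str(Path(serving_dir) / model_name), arcname=model_name)
    return tar_path


def list_archive(tar_path):
    """Return the member names of a gzipped tarball, for a quick sanity check."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return tar.getnames()
```

The resulting file could then be uploaded with the SageMaker SDK's `Session.upload_data(path, bucket=..., key_prefix=...)`, which returns the S3 URI; whether `upload_tar_s3` does exactly that is an assumption.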


### 1.6 Remove files

In [28]:
listFilePath = [
    strTraceFoodModelName,
    f'{strModelName}.model.tar.gz',
    food_config
]
for strFilePath in listFilePath:
    if os.path.exists(strFilePath):
        os.remove(strFilePath)
    else:
        print(f"Cannot delete {strFilePath}: the file does not exist")

## 2. Create custom docker image

In [73]:
import boto3
from utils.ecr import ecr_handler
from utils.triton import account_id_map

In [74]:
ecr = ecr_handler()
strAccountID = boto3.client("sts").get_caller_identity().get("Account")
strRegion = boto3.Session().region_name
strBucketName = sagemaker_session.default_bucket()
strExecutionRole = sagemaker.get_execution_role()

### 2.1 dockerfile

* Deep learning containers
    - https://github.com/aws/deep-learning-containers/blob/master/available_images.md
* Triton versions
    - 23.01, 23.02, 23.03 and 22.07


In [80]:
strBase = "amazonaws.com.cn" if strRegion.startswith("cn-") else "amazonaws.com"
strTritonImageUri = (
    "{account_id}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:22.07-py3".format(
        account_id=account_id_map[strRegion],
        region=strRegion,
        base=strBase
    )
)
print(f'strTritonImageUri: {strTritonImageUri}')

strTritonImageUri: 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:22.07-py3
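
The URI construction above can be wrapped in a small function so the same logic is reusable, for example when writing the Dockerfile `FROM` line for another region. This is a sketch only: `triton_image_uri` is a hypothetical name, and the caller must supply the region-specific account ID (the notebook reads it from `account_id_map`).

```python
def triton_image_uri(region: str, account_id: str, tag: str = "22.07-py3") -> str:
    """Build the SageMaker Triton image URI for a region and registry account."""
    # China regions use a different DNS suffix.
    base = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account_id}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:{tag}"


# Account ID taken from the output shown above for us-east-1.
print(triton_image_uri("us-east-1", "785573368785"))
```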


In [81]:
%%writefile custom-docker/Dockerfile

FROM 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:22.07-py3
RUN pip install -U pip
RUN pip install -U sagemaker
RUN pip install -U boto3
ENV PYTHONUNBUFFERED=TRUE

Overwriting custom-docker/Dockerfile


### 2.2 docker build

In [82]:
strRepositoryName = "js-onboarding"  # <-- set the Docker repository name you want
strRepositoryName = strRepositoryName.lower()
strDockerFile = "Dockerfile"
strDockerDir = "./custom-docker/"
strTag = "latest"

In [83]:
ecr.build_docker(strDockerDir, strDockerFile, strRepositoryName, strRegionName="us-east-1", strAccountId="785573368785")

/home/ec2-user/SageMaker/aws-llm-serve/jumpstart-onboarding
/home/ec2-user/SageMaker/aws-llm-serve/jumpstart-onboarding/custom-docker
strDockerFile Dockerfile
aws ecr get-login --region 'us-east-1' --registry-ids '785573368785' --no-include-email


https://docs.docker.com/engine/reference/commandline/login/#credentials-store



Login Succeeded

Sending build context to Docker daemon  2.048kB

Step 1/5 : FROM 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:22.07-py3
 ---> 455241bf5ae4
Step 2/5 : RUN pip install -U pip
 ---> Running in aa7a0cff3829
Collecting pip
  Downloading pip-23.1.2-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 56.6 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 22.2.2
    Uninstalling pip-22.2.2:
      Successfully uninstalled pip-22.2.2
Successfully installed pip-23.1.2
Removing intermediate container aa7a0cff3829
 ---> 2d8b7bf59cad
Step 3/5 : RUN pip install -U sagemaker
 ---> Running in b5b562ee1bec
Collecting sagemaker
  Downloading sagemaker-2.168.0.tar.gz (844 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 844.7/844.7 kB 52.5 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting

### 2.3 Push to ECR

In [84]:
strEcrRepositoryUri = ecr.register_image_to_ecr(strRegion, strAccountID, strRepositoryName, strTag)

== REGISTER AN IMAGE TO ECR ==
  processing_repository_uri: 419974056037.dkr.ecr.us-east-1.amazonaws.com/js-onboarding:latest
aws ecr get-login --region 'us-east-1' --registry-ids '419974056037' --no-include-email


https://docs.docker.com/engine/reference/commandline/login/#credentials-store



Login Succeeded

aws ecr create-repository --repository-name 'js-onboarding'
docker tag 'js-onboarding:latest' '419974056037.dkr.ecr.us-east-1.amazonaws.com/js-onboarding:latest'
docker push '419974056037.dkr.ecr.us-east-1.amazonaws.com/js-onboarding:latest'
== REGISTER AN IMAGE TO ECR ==


In [85]:
strEcrRepositoryUri = "419974056037.dkr.ecr.us-east-1.amazonaws.com/js-onboarding:latest"
print(f'strEcrRepositoryUri: {strEcrRepositoryUri}')

strEcrRepositoryUri: 419974056037.dkr.ecr.us-east-1.amazonaws.com/js-onboarding:latest


## 3. Validation (Serving and Inference)

In [110]:
# Set to True to enable SageMaker to run locally
local_mode = False

if local_mode:
    
    from sagemaker.local import LocalSession
    
    strInstanceType = "local_gpu"
    sagemaker_session = LocalSession()
    sagemaker_session.config = {'local': {'local_code': True}}
    
else:
    strInstanceType = "ml.m5.2xlarge"  # GPU alternatives: "ml.p3.2xlarge", "ml.p3.16xlarge", "ml.g4dn.8xlarge"
    sagemaker_session = sagemaker.Session()

nInstanceCount = 1

### 3.1 Local mode
- Internally, when the Triton server starts, the script at the URL below is run.
    - Supply the environment variables that this script expects.
    - https://raw.githubusercontent.com/triton-inference-server/server/main/docker/sagemaker/serve

#### 3.1.1 Deploy

In [13]:
import time
import json
import numpy as np
from sagemaker.model import Model

In [116]:
ts = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

# endpoint variables
strSMModelName = f"{strPrefix}-mdl-{ts}" #sm_model_name
strEndpointConfigName = f"{strPrefix}-epc-{ts}" # endpoint_config_name
strEndpointName = f"{strPrefix}-ep-{ts}" # endpoint_name
strModelDataUrl = f"s3://{strBucketName}/{strPrefix}/" #model_data_url

In [102]:
dicContainerEnvs = {
                    "SAGEMAKER_TRITON_LOG_VERBOSE": "3",
                    "SAGEMAKER_TRITON_LOG_INFO": "1",
                    "SAGEMAKER_TRITON_LOG_WARNING" : "1",
                    "SAGEMAKER_TRITON_LOG_ERROR" : "1"
                 }

localPytorchModel = Model(
    model_data= strModelUriPt,
    image_uri = strEcrRepositoryUri,
    role=strExecutionRole,
    env = dicContainerEnvs
)

In [103]:
localPredictor = localPytorchModel.deploy(
    instance_type=strInstanceType,
    initial_instance_count=1,
    endpoint_name=strEndpointName,
    wait=True,
    log=False,
)

Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff49ed8e260>: Failed to establish a new connection: [Errno 111] Connection refused')': /ping
Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff49ed8e3e0>: Failed to establish a new connection: [Errno 111] Connection refused')': /ping
Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff49ed8e6e0>: Failed to establish a new connection: [Errno 111] Connection refused')': /ping
Attaching to usbfufy3dj-algo-1-stjkc
usbfufy3dj-algo-1-stjkc |
usbfufy3dj-algo-1-stjkc | == Triton Inference Server ==
usbfufy3dj-algo-1-stjkc |
usbfufy3dj-algo-1

#### 3.1.2 Inference

In [14]:
def create_sample_payload():
    # user
    user_np = np.zeros((1,100)).astype(np.int32)
    # item
    item_np = np.random.randint(low=1, high=1000, size=(1,100)).astype(np.int32)

    payload = {
        "inputs": [
            {"name": "INPUT__0", "shape": [1,100], 
             "datatype": "INT32", "data": user_np.tolist()},
            {"name": "INPUT__1", "shape": [1,100], 
             "datatype": "INT32", "data": item_np.tolist()},
        ]
    }
    
    return payload

payload = create_sample_payload()
print("payload: ", payload)

payload:  {'inputs': [{'name': 'INPUT__0', 'shape': [1, 100], 'datatype': 'INT32', 'data': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]}, {'name': 'INPUT__1', 'shape': [1, 100], 'datatype': 'INT32', 'data': [[568, 755, 898, 155, 244, 6, 25, 283, 256, 685, 65, 893, 19, 286, 149, 85, 556, 984, 264, 386, 556, 543, 714, 510, 160, 107, 721, 842, 332, 868, 472, 994, 146, 475, 995, 706, 891, 918, 555, 334, 942, 995, 556, 271, 715, 183, 84, 572, 914, 288, 871, 287, 465, 477, 158, 815, 201, 789, 777, 78, 839, 65, 118, 670, 342, 341, 615, 528, 729, 386, 445, 138, 793, 352, 349, 187, 23, 9, 88, 834, 826, 644, 230, 885, 834, 318, 536, 462, 792, 203, 995, 978, 51, 895, 748, 400, 212, 332, 962, 491]]}]}


In [104]:
def single_model_invoke_endpoint(client,endpoint_name, payload): 
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/octet-stream", 
        Body=json.dumps(payload),
    )

    result = json.loads(response["Body"].read().decode("utf8"))
    
    return result

runtime_client = sagemaker.local.LocalSagemakerRuntimeClient()    
result = single_model_invoke_endpoint(runtime_client,strEndpointName, payload)
print("result : ", result)

usbfufy3dj-algo-1-stjkc | I0628 03:05:40.224234 86 sagemaker_server.cc:190] SageMaker request: 2 /invocations
usbfufy3dj-algo-1-stjkc | I0628 03:05:40.224306 86 model_repository_manager.cc:773] GetModel() 'ncf_food_model' version -1
usbfufy3dj-algo-1-stjkc | I0628 03:05:40.224326 86 model_repository_manager.cc:773] GetModel() 'ncf_food_model' version -1
usbfufy3dj-algo-1-stjkc | I0628 03:05:40.224402 86 infer_request.cc:713] [request id: <id_unknown>] prepared: [0x0x7f2df4004c80] request id: , model: ncf_food_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
usbfufy3dj-algo-1-stjkc | original inputs:
usbfufy3dj-algo-1-stjkc | [0x0x7f2df4015678] input: INPUT__1, type: INT32, original shape: [1,100], batch + shape: [1,100], shape: [100]
usbfufy3dj-algo-1-stjkc | [0x0x7f2df4015258] input: INPUT__0, type: INT32, original shape: [1,100], batch + shape: [1,

#### 3.1.3 Delete endpoint

In [106]:
from utils.inference_utils import delete_endpoint

In [107]:
client = sagemaker.local.LocalSagemakerClient()
delete_endpoint(client, strEndpointName)

Gracefully stopping... (press Ctrl+C again to force)
--- Deleted model: js-onboarding-2023-06-28-03-05-04-782
--- Deleted endpoint_config: triton-ncf-ep-2023-06-28-03-05-03
--- Deleted endpoint: triton-ncf-ep-2023-06-28-03-05-03
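
`delete_endpoint` is imported from `utils.inference_utils`, which is not shown in this notebook. Judging by the log lines above, it removes the model, the endpoint configuration, and the endpoint in that order. A sketch of that behavior, assuming a client that exposes the standard SageMaker `describe_*` and `delete_*` operations, might look as follows; `delete_endpoint_resources` is a hypothetical name and not the repository's code.

```python
def delete_endpoint_resources(client, endpoint_name):
    """Delete the models, endpoint config, and endpoint behind `endpoint_name`.

    `client` is expected to behave like boto3.client("sagemaker").
    Returns the names of the deleted models.
    """
    endpoint = client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]

    config = client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    # Look everything up first, then delete in the order shown in the log.
    for model_name in model_names:
        client.delete_model(ModelName=model_name)
        print(f"--- Deleted model: {model_name}")

    client.delete_endpoint_config(EndpointConfigName=config_name)
    print(f"--- Deleted endpoint_config: {config_name}")

    client.delete_endpoint(EndpointName=endpoint_name)
    print(f"--- Deleted endpoint: {endpoint_name}")
    return model_names
```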


### 3.2 Cloud mode

#### 3.2.1 Deploy

In [117]:
dicContainer = {
    "Image": strEcrRepositoryUri,
    "ModelDataUrl": strModelUriPt
}

In [118]:
print(f'dicContainer: {dicContainer}')
print(f'strSMModelName: {strSMModelName}')

dicContainer: {'Image': '419974056037.dkr.ecr.us-east-1.amazonaws.com/js-onboarding:latest', 'ModelDataUrl': 's3://sagemaker-us-east-1-419974056037/triton-ncf/ncf_food_model.model.tar.gz'}
strSMModelName: triton-ncf-mdl-2023-06-28-03-08-35


In [119]:
sm_client = boto3.client(service_name="sagemaker")

create_model_response = sm_client.create_model(
    ModelName=strSMModelName,
    ExecutionRoleArn=strExecutionRole,
    PrimaryContainer=dicContainer
)

In [120]:
print(f'Model Arn: {create_model_response["ModelArn"]}')

Model Arn: arn:aws:sagemaker:us-east-1:419974056037:model/triton-ncf-mdl-2023-06-28-03-08-35


In [121]:
create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=strEndpointConfigName,
    ProductionVariants=[
        {
            "InstanceType": strInstanceType,
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": strSMModelName,
            "VariantName": "AllTraffic",
        }
    ],
)

In [122]:
print(f'Endpoint Config Arn: {create_endpoint_config_response["EndpointConfigArn"]}')

Endpoint Config Arn: arn:aws:sagemaker:us-east-1:419974056037:endpoint-config/triton-ncf-epc-2023-06-28-03-08-35


In [123]:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName=strEndpointName,
    EndpointConfigName=strEndpointConfigName
)

In [124]:
print(f'Endpoint Arn: {create_endpoint_response["EndpointArn"]}')

Endpoint Arn: arn:aws:sagemaker:us-east-1:419974056037:endpoint/triton-ncf-ep-2023-06-28-03-08-35


In [None]:
%%time 

resp = sm_client.describe_endpoint(EndpointName=strEndpointName)
status = resp["EndpointStatus"]
print("Status: " + status)

while status == "Creating":
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=strEndpointName)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)

Status: Creating
Status: Creating
Status: Creating
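
The polling loop above runs until the status leaves `Creating`, with no upper bound on the wait. One possible variant adds a timeout and reports the final status so a `Failed` endpoint is noticed. In this sketch the function name `wait_for_endpoint`, its parameters, and the injected `describe` callable are all assumptions made to keep the example testable without AWS access.

```python
import time


def wait_for_endpoint(describe, poll_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Poll `describe()` until EndpointStatus is no longer "Creating".

    `describe` is any zero-argument callable returning a dict with an
    "EndpointStatus" key, for example:
        lambda: sm_client.describe_endpoint(EndpointName=strEndpointName)
    Returns the final status, or raises TimeoutError.
    """
    waited = 0
    while True:
        status = describe()["EndpointStatus"]
        print("Status: " + status)
        if status != "Creating":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint still creating after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

boto3 also provides an `endpoint_in_service` waiter on the SageMaker client, which may be a simpler choice when custom status printing is not needed.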


#### 3.2.2 Inference

In [126]:
runtime_client = boto3.Session().client('sagemaker-runtime')
single_model_invoke_endpoint(runtime_client,strEndpointName, payload)

{'model_name': 'ncf_food_model',
 'model_version': '1',
 'outputs': [{'name': 'OUTPUT__0',
   'datatype': 'FP32',
   'shape': [1, 100, 1],
   'data': [-2.661646604537964,
    -0.9603654146194458,
    0.341450959444046,
    0.16189703345298767,
    -0.7379575967788696,
    -3.8147194385528564,
    0.43504074215888977,
    -2.7857987880706787,
    -0.7051801681518555,
    -1.9214153289794922,
    -2.4508779048919678,
    -1.4815970659255981,
    -0.840631365776062,
    -0.3676549792289734,
    -2.83782958984375,
    -4.166333198547363,
    0.8842571973800659,
    -2.8424930572509766,
    -0.37986236810684204,
    -1.6701970100402832,
    -2.5594990253448486,
    -0.49171993136405945,
    -0.19276882708072662,
    -2.5152997970581055,
    -3.843963146209717,
    -1.0526206493377686,
    0.024429410696029663,
    -0.3725481629371643,
    2.967946767807007,
    0.37049928307533264,
    -2.642673969268799,
    -2.519130229949951,
    -2.6336967945098877,
    -3.574371337890625,
    -3.840346
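
The response carries one score per candidate item as a flat `data` list, even though the reported `shape` is `[1, 100, 1]`. A small post-processing sketch, assuming the scores are in the same order as the item IDs sent in `INPUT__1`, pairs each score with its item and keeps the highest ones. The function name `top_k_items` and the three-item example response are hypothetical.

```python
from typing import Dict, List, Tuple


def top_k_items(response: Dict, item_ids: List[int], k: int = 5) -> List[Tuple[int, float]]:
    """Pair each score in the first output with its item ID and return the k best."""
    scores = response["outputs"][0]["data"]
    if len(scores) != len(item_ids):
        raise ValueError(
            f"Expected {len(item_ids)} scores but received {len(scores)}"
        )
    ranked = sorted(zip(item_ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical, shortened response in the same format as the output above.
example_response = {
    "model_name": "ncf_food_model",
    "outputs": [
        {"name": "OUTPUT__0", "datatype": "FP32", "shape": [1, 3, 1],
         "data": [-2.66, 0.34, -0.96]}
    ],
}
print(top_k_items(example_response, [568, 755, 898], k=2))
# [(755, 0.34), (898, -0.96)]
```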

#### 3.2.3 Delete endpoint

In [127]:
client = boto3.Session().client('sagemaker')
delete_endpoint(client, strEndpointName)

--- Deleted model: triton-ncf-mdl-2023-06-28-03-08-35
--- Deleted endpoint_config: triton-ncf-epc-2023-06-28-03-08-35
--- Deleted endpoint: triton-ncf-ep-2023-06-28-03-08-35


## 4. Create an ML Model Package

In [1]:
import json
import boto3
import sagemaker as sage
from sagemaker import get_execution_role, ModelPackage
import time

# Common variables
session = sage.Session()
s3_bucket = session.default_bucket()
region = session.boto_region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
role = get_execution_role()

s3_client = session.boto_session.client("s3")
sm_runtime = boto3.client("sagemaker-runtime")

In this section, we will see how you can package your artifacts (ECR image and the trained model artifacts) into a ModelPackage. Once you complete this, you can list your product as a pretrained model in the AWS Marketplace.

**NOTE:** If your model can be deployed on multiple hardware types (CPU/GPU/Inferentia), create a separate ModelPackage for each and add them to the AWS Marketplace listing as different versions, because the container image generally differs for each hardware type.

#### Model Package Definition
A Model Package is a reusable abstraction for model artifacts that packages all the ingredients necessary for inference. It consists of an inference specification that defines the inference image to use along with an optional model data location.

The ModelPackage must be created in the AWS account that will be registered as a seller on the AWS Marketplace.

### 4.1 Define parameters 

In [2]:
# Define parameters
model_name = "marketplace-model-test"#"<<YourModelName>>"
model_description = "marketplace-model-test"#"<<YourModelDescription>>"

# <<YourSupportedContentTypes>>
supported_content_types = ["application/octet-stream"]#["text/csv", "application/json", "application/jsonlines"]

# <<YourSupportedResponseMIMETypes>>
supported_response_MIME_types = [ 
    "application/json"
]

The Model Package creation process requires you to specify the following:
  1. Docker image
  2. Model artifacts
    - You can either package these inside the Docker image or provide them as a gzipped tarball in Amazon S3, as this example does through `ModelDataUrl`.
    - For large models, a gzipped tarball is required.
  3. Validation specification
        
In order to provide confidence to sellers (and buyers) that the products work in Amazon SageMaker, before listing them on AWS Marketplace SageMaker needs to perform basic validations. The product can be listed in AWS Marketplace only if this validation process succeeds. This validation process uses the validation profile and sample data provided by you to create a transform job in your account using the Model to verify your inference image works with SageMaker.

Next, you need to identify the right instance sizes for your ML model. You can do so by running performance tests against it.

**NOTE:** In addition to tuning, take into account the requirements of your model when identifying instance types.  If your model does not use GPU resources, then do not include GPU instance types.  Similarly, if your model does use GPU resources, but can only make use of a single GPU, do not include instance types that have multiple GPUs as it will lead to increased infrastructure charges for your customers with no performance benefit.

In [3]:
supported_realtime_inference_instance_types = ["ml.g5.12xlarge"]#["<<YourModelSupportInstanceType>>"]
supported_batch_transform_instance_types = ["ml.g4dn.12xlarge"] #  use either a g4dn.12xlarge or p3.8xlarge. However, the Batch Transform validation step is not required

In [4]:
validation_file_name = "input.jsonl"
validation_input_path = f"s3://{s3_bucket}/validation-input-json/"
validation_output_path = f"s3://{s3_bucket}/validation-output-jsonl/"

*(Not required)* First, we create sample data to be used in the validation stage of the ModelPackage creation and upload it to S3. This sample data must be in the format your model expects.

In [17]:
json_line = json.dumps(payload)
with open("input.jsonl", "w") as f:
    f.write(json_line)
s3_client.put_object(Bucket=s3_bucket, Key="validation-input-json/input.jsonl", Body=json_line)

{'ResponseMetadata': {'RequestId': 'N1C5PHHZKP8DAZBP',
  'HostId': 'CuwpNHIU7kpQ4DYMzfeq2FU97EtCNOJlCa0KgJpSrEa8DLLD3Gp2rzWFUZQXB5oKwCfhegdtUIRQQzkBRmHW8Q==',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'CuwpNHIU7kpQ4DYMzfeq2FU97EtCNOJlCa0KgJpSrEa8DLLD3Gp2rzWFUZQXB5oKwCfhegdtUIRQQzkBRmHW8Q==',
   'x-amz-request-id': 'N1C5PHHZKP8DAZBP',
   'date': 'Wed, 28 Jun 2023 04:07:13 GMT',
   'x-amz-server-side-encryption': 'AES256',
   'etag': '"1ac97b608a183346185ba9a2f1794283"',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'ETag': '"1ac97b608a183346185ba9a2f1794283"',
 'ServerSideEncryption': 'AES256'}
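
The cell above writes a single JSON record to `input.jsonl`. If more than one sample is ever needed, JSON Lines expects one JSON document per line. The following sketch shows that serialization with two hypothetical helper names, `to_jsonl` and `parse_jsonl`; note that the validation job in this notebook sends the whole file as one request (`SplitType` is `None`), so multi-record input would also require changing that setting.

```python
import json
from typing import Dict, List


def to_jsonl(records: List[Dict]) -> str:
    """Serialize records as JSON Lines: one compact JSON document per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def parse_jsonl(text: str) -> List[Dict]:
    """Parse JSON Lines text back into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical, shortened record in the same format as `payload`.
sample = {"inputs": [{"name": "INPUT__0", "shape": [1, 2],
                      "datatype": "INT32", "data": [[0, 0]]}]}
text = to_jsonl([sample])
print(parse_jsonl(text) == [sample])  # True
```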

In [16]:
# print (strEcrRepositoryUri)
# print (strModelUriPt)

### 4.2 Create Model Package 

In [7]:
docker_image_uri = "419974056037.dkr.ecr.us-east-1.amazonaws.com/js-onboarding:latest"
model_data_location = "s3://sagemaker-us-east-1-419974056037/triton-ncf/ncf_food_model.model.tar.gz"

In [8]:
# docker_image_uri = strEcrRepositoryUri#"<<YourModelImageURI>>" # ECR URI of Image used to host model
# model_data_location = strModelUriPt#"<<YourModelS3Location>>"

If you create the ModelPackage with an empty `ValidationProfiles` list, you will receive the following error:

```
~/anaconda3/envs/python3/lib/python3.8/site-packages/botocore/validate.py in serialize_to_request(self, parameters, operation_model)
    380             if report.has_errors():
--> 381                raise ParamValidationError(report=report.generate_report())
    382         return self._serializer.serialize_to_request(
    383             parameters, operation_model

ParamValidationError: Parameter validation failed:
Invalid length for parameter ValidationSpecification.ValidationProfiles, value: 0, valid min length: 1
```

To resolve this issue, open the `validate.py` file, which in this case is located at `~/anaconda3/envs/python3/lib/python3.8/site-packages/botocore/validate.py`, and remove or comment out the following code:

```
380 if report.has_errors():
381                 raise ParamValidationError(report=report.generate_report())
```

Restart the kernel, import boto3 again, and re-run the cell.



**Run `!which python` to find the active environment.** <BR>
**For pytorch_p310, the file is ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/botocore/validate.py**

In [36]:
model_package = session.sagemaker_client.create_model_package(
    ModelPackageName=model_name,
    ModelPackageDescription=model_description,
    InferenceSpecification={
        "Containers": [
            {
                "Image": docker_image_uri,
                "ModelDataUrl": model_data_location
            }
        ],
        "SupportedTransformInstanceTypes": supported_batch_transform_instance_types,
        "SupportedRealtimeInferenceInstanceTypes": supported_realtime_inference_instance_types,
        "SupportedContentTypes": supported_content_types,
        "SupportedResponseMIMETypes": supported_response_MIME_types,
    },
    CertifyForMarketplace=True,  # Make sure to set this to True
    #ValidationSpecification={
    #    "ValidationRole": role,
    #    "ValidationProfiles": [],
    #},
    ValidationSpecification={
        'ValidationRole': role,
        'ValidationProfiles': [
            {
                'ProfileName': "validation",
                'TransformJobDefinition': {
                    'MaxConcurrentTransforms': 1,
                    'MaxPayloadInMB': 64,
                    'BatchStrategy': 'SingleRecord',
                    #'Environment': {
                    #    'string': 'string'
                    #},
                    'TransformInput': {
                        'DataSource': {
                            'S3DataSource': {
                                'S3DataType': 'S3Prefix',
                                'S3Uri': "s3://sagemaker-us-east-1-419974056037/validation-input-json/input.jsonl"
                            }
                        },
                        'ContentType': 'application/octet-stream',
                        'CompressionType': 'None',
                        'SplitType': 'None'
                    },
                    'TransformOutput': {
                        'S3OutputPath': 's3://sagemaker-us-east-1-419974056037/validation-output-json/output.json',
                        'Accept': 'application/json',
                        'AssembleWith': 'None',
                        #'KmsKeyId': 'string'
                    },
                    'TransformResources': {
                        'InstanceType': 'ml.g4dn.12xlarge',
                        'InstanceCount': 1,
                        #'VolumeKmsKeyId': 'string'
                    }
                }
            },
        ]
    },
)
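
The call above hard-codes the S3 locations and the instance type inside the `ValidationSpecification`. One way to keep them in step with the parameters defined in section 4.1 is to assemble the dictionary in a helper. The sketch below uses the same keys as the call above; the function name `build_validation_spec` and its argument names are assumptions.

```python
from typing import Dict


def build_validation_spec(role: str, input_s3_uri: str, output_s3_path: str,
                          instance_type: str,
                          content_type: str = "application/octet-stream",
                          accept: str = "application/json") -> Dict:
    """Build a ValidationSpecification with a single batch transform profile."""
    return {
        "ValidationRole": role,
        "ValidationProfiles": [
            {
                "ProfileName": "validation",
                "TransformJobDefinition": {
                    "MaxConcurrentTransforms": 1,
                    "MaxPayloadInMB": 64,
                    "BatchStrategy": "SingleRecord",
                    "TransformInput": {
                        "DataSource": {
                            "S3DataSource": {
                                "S3DataType": "S3Prefix",
                                "S3Uri": input_s3_uri,
                            }
                        },
                        "ContentType": content_type,
                        "CompressionType": "None",
                        "SplitType": "None",
                    },
                    "TransformOutput": {
                        "S3OutputPath": output_s3_path,
                        "Accept": accept,
                        "AssembleWith": "None",
                    },
                    "TransformResources": {
                        "InstanceType": instance_type,
                        "InstanceCount": 1,
                    },
                },
            }
        ],
    }
```

With the notebook's variables, this could be called as `build_validation_spec(role, validation_input_path + validation_file_name, validation_output_path, supported_batch_transform_instance_types[0])` and passed as `ValidationSpecification=...`.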

In [37]:
session.wait_for_model_package(model_package_name=model_name) # If failure occurs navigate to SageMaker Console > My marketplace model packages > select the failed ModelPackage for details. 

................................................................................


{'ModelPackageName': 'marketplace-model-test',
 'ModelPackageArn': 'arn:aws:sagemaker:us-east-1:419974056037:model-package/marketplace-model-test',
 'ModelPackageDescription': 'marketplace-model-test',
 'CreationTime': datetime.datetime(2023, 6, 28, 4, 37, 6, 954000, tzinfo=tzlocal()),
 'InferenceSpecification': {'Containers': [{'Image': '419974056037.dkr.ecr.us-east-1.amazonaws.com/js-onboarding:latest',
    'ImageDigest': 'sha256:d129f5ed05ae9517a76176061bf0987eef0ea5a83002a142e100990c9acd392a',
    'ModelDataUrl': 's3://sagemaker-us-east-1-419974056037/triton-ncf/ncf_food_model.model.tar.gz'}],
  'SupportedTransformInstanceTypes': ['ml.g4dn.12xlarge'],
  'SupportedRealtimeInferenceInstanceTypes': ['ml.g5.12xlarge'],
  'SupportedContentTypes': ['application/octet-stream'],
  'SupportedResponseMIMETypes': ['application/json']},
 'ValidationSpecification': {'ValidationRole': 'arn:aws:iam::419974056037:role/service-role/AmazonSageMaker-ExecutionRole-20221206T163436',
  'ValidationProfil

Once you have executed the preceding cell, open the [Model Packages console from Amazon SageMaker](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/model-packages/my-resources) and check whether the model package was created successfully.

Choose the model package and then open the **Validation** tab to see the validation results.
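The same validation details shown on the **Validation** tab can also be read programmatically. The following is a minimal sketch, assuming the `describe_model_package` response carries `ModelPackageStatus` and `ModelPackageStatusDetails` as documented for boto3. The helper name `summarize_model_package_status` and the example dictionary are hypothetical and are not real output from this notebook.

```python
def summarize_model_package_status(description):
    """Return (overall_status, [(check_name, status, failure_reason), ...])."""
    overall = description.get("ModelPackageStatus", "Unknown")
    details = description.get("ModelPackageStatusDetails", {})
    checks = []
    # Both lists hold items with Name, Status and (on failure) FailureReason.
    for key in ("ValidationStatuses", "ImageScanStatuses"):
        for item in details.get(key, []):
            checks.append(
                (item.get("Name"), item.get("Status"), item.get("FailureReason"))
            )
    return overall, checks


# In the notebook (assumption: `session` is the sagemaker.Session used above):
# description = session.sagemaker_client.describe_model_package(
#     ModelPackageName=model_name
# )

# Offline illustration with a hand-written response fragment.
example = {
    "ModelPackageStatus": "Failed",
    "ModelPackageStatusDetails": {
        "ValidationStatuses": [
            {"Name": "validation", "Status": "Failed", "FailureReason": "example reason"}
        ],
        "ImageScanStatuses": [],
    },
}
overall, checks = summarize_model_package_status(example)
print(overall, checks)
```

This can be handy when `wait_for_model_package` raises on a failed package and you want the failure reason without leaving the notebook.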

## 5. Validate model in Amazon SageMaker environment

##### Create a deployable model from the model package.

In [39]:
model = ModelPackage(
    role=role,
    model_package_arn=model_package["ModelPackageArn"],
    sagemaker_session=session,
)

### 5.1 Validate Real-time inference via Amazon SageMaker Endpoint

##### Deploy the SageMaker model to an endpoint

In [40]:
model.deploy(
    initial_instance_count=1,
    instance_type=supported_realtime_inference_instance_types[0],
    endpoint_name=model_name,
)
model.endpoint_name

-----------!

'marketplace-model-test'

In [41]:
content_type = supported_content_types[0]

##### Example invocation via boto3

In [42]:
# Make use of your own example input data to test the Endpoint
#input_json = '{"text": "sample"}'

response = sm_runtime.invoke_endpoint(
    EndpointName=model.endpoint_name,
    ContentType=content_type,
    Accept="application/json",
    Body=json.dumps(payload),
)

json.load(response["Body"])

{'model_name': 'ncf_food_model',
 'model_version': '1',
 'outputs': [{'name': 'OUTPUT__0',
   'datatype': 'FP32',
   'shape': [1, 100, 1],
   'data': [-1.796771764755249,
    -1.6427546739578247,
    -2.4718635082244873,
    -1.281156063079834,
    2.3520302772521973,
    -0.7353585958480835,
    1.2487637996673584,
    1.7070531845092773,
    0.8721898794174194,
    -0.7815717458724976,
    -4.7818145751953125,
    -2.924468994140625,
    0.6013489961624146,
    -2.3256564140319824,
    1.301037311553955,
    1.1565217971801758,
    -0.6119728088378906,
    -2.7658627033233643,
    -0.13805341720581055,
    -1.6411755084991455,
    -0.6119728088378906,
    -3.5467724800109863,
    -0.9603652954101562,
    0.8064634799957275,
    -0.5977281332015991,
    -3.7661967277526855,
    -0.7232743501663208,
    -1.699474573135376,
    -0.7003310322761536,
    -1.4859449863433838,
    -0.3064301908016205,
    0.2435605823993683,
    -3.6060190200805664,
    -0.2855709195137024,
    -4.563620090
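The response above is a flat list of scores under `outputs[0]["data"]`. A small helper can rank them, as sketched below. This assumes that each position in the list corresponds to one candidate and that a higher score is better; neither is stated in this notebook, so adjust it to your model. The helper name `top_k_scores` and the shortened example response are hypothetical.

```python
def top_k_scores(response, k=5, output_name="OUTPUT__0"):
    """Return the k highest (index, score) pairs from one named output tensor."""
    for output in response["outputs"]:
        if output["name"] == output_name:
            data = output["data"]
            break
    else:
        raise KeyError(f"no output named {output_name!r}")
    # Sort positions by score, highest first (assumption: higher is better).
    ranked = sorted(enumerate(data), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Shortened, hand-written response in the same layout as the output above.
example_response = {
    "model_name": "ncf_food_model",
    "model_version": "1",
    "outputs": [
        {
            "name": "OUTPUT__0",
            "datatype": "FP32",
            "shape": [1, 5, 1],
            "data": [-1.8, -1.6, -2.5, -1.3, 2.4],
        }
    ],
}
top = top_k_scores(example_response, k=2)
print(top)  # [(4, 2.4), (3, -1.3)]
```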

##### Example invocation via the AWS CLI

In [43]:
# Perform inference
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name $model.endpoint_name \
    --body fileb://$validation_file_name \
    --content-type $content_type \
    --region $session.boto_region_name \
    out.out
    
    
# Print inference
!head out.out

{
    "ContentType": "application/json",
    "InvokedProductionVariant": "AllTraffic"
}
{"model_name":"ncf_food_model","model_version":"1","outputs":[{"name":"OUTPUT__0","datatype":"FP32","shape":[1,100,1],"data":[-1.796771764755249,-1.6427546739578248,-2.4718635082244875,-1.281156063079834,2.3520302772521974,-0.7353585958480835,1.2487637996673585,1.7070531845092774,0.8721898794174194,-0.7815717458724976,-4.7818145751953129,-2.924468994140625,0.6013489961624146,-2.3256564140319826,1.301037311553955,1.1565217971801758,-0.6119728088378906,-2.7658627033233644,-0.13805341720581056,-1.6411755084991456,-0.6119728088378906,-3.5467724800109865,-0.9603652954101563,0.8064634799957275,-0.5977281332015991,-3.7661967277526857,-0.7232743501663208,-1.699474573135376,-0.7003310322761536,-1.4859449863433838,-0.3064301908016205,0.24356058239936829,-3.6060190200805666,-0.2855709195137024,-4.563620090484619,-0.9588063955307007,-1.5518157482147217,-1.6185109615325928,-3.140951633453369,-1.8111308813095093,

Clean up the endpoint and endpoint configuration created.

In [44]:
model.sagemaker_session.delete_endpoint(model.endpoint_name)
model.sagemaker_session.delete_endpoint_config(model.endpoint_name)

Congratulations! Because the deployable model is no longer required, you can delete it. Note that this deletes only the deployable model, not the model package.

In [45]:
model.delete_model()

To publish the model to AWS Marketplace, you will need to specify the model package ARN. Copy the following model package ARN:

In [46]:
model_package["ModelPackageArn"]

'arn:aws:sagemaker:us-east-1:419974056037:model-package/marketplace-model-test'
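Before pasting the ARN elsewhere, a quick sanity check can catch a truncated copy or a region mismatch with the console links used in this notebook. The sketch below relies only on the general ARN layout `arn:partition:service:region:account-id:resource`; the helper name `parse_model_package_arn` is hypothetical.

```python
def parse_model_package_arn(arn):
    """Split a SageMaker model package ARN into its named parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"not a SageMaker ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "model-package" or not name:
        raise ValueError(f"not a model package ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account": parts[4],
        "name": name,
    }


info = parse_model_package_arn(
    "arn:aws:sagemaker:us-east-1:419974056037:model-package/marketplace-model-test"
)
print(info["region"], info["name"])  # us-east-1 marketplace-model-test
```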

### <a name="step6"></a>Step 6: List ML Model on AWS Marketplace


1. The Model Partner creates a [public profile](https://docs.aws.amazon.com/marketplace/latest/userguide/seller-registration-process.html#seller-public-profile) on AWS Marketplace and registers as a seller.
Tax information is not required because the product will be listed on AWS Marketplace as free.

2. In the [Model Packages](https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/model-packages/my-resources) section of the SageMaker console you'll find the entity you created in this notebook. If it was successfully created and validated, you should be able to select the entity and choose **Publish new ML Marketplace listing**.

<img src="images/publish-to-marketplace-action.png"/>

You will be redirected to the [AWS Marketplace Management portal](https://aws.amazon.com/marketplace/management/ml-products/) where you will be able to build a listing.

<img src="images/listing.png"/>

1. If your model targets multiple hardware types, remember to add each ModelPackage to the listing as a separate version.
2. Click Add and fill in the model information. Product visibility must be set to `Public`.
<img src="images/public.png"/>

3. Allowlist accounts `171503325295`, `572320329544` and `559110549532` for access to the model.
For region support, select: `us-east-1, us-west-2, eu-west-1, eu-central-1, eu-west-2, ap-northeast-1, ap-south-1, ca-central-1, us-east-2, ap-northeast-2`
<img src="images/allowlist-accs.png"/>

4. Under Pricing and terms, set pricing model as:
**Inference based pricing (custom metering) at $0**

You will see the following confirmation.
(Optional) If your container does not implement the response header described below, confirm and move forward.
```
I confirm that my model package supports the response header for custom metering. Example response header: X-Amzn-Inference-Metering:
{"Dimension": "inference.count", "ConsumedUnits": 3}
I understand that in absence of this header, default metering will be used instead.
```

<img src="images/inference-based-pricing.png"/>

5. Listing status should show as follows:
**Do not click Sign off and publish**

<img src="images/status-1.png"/>

6. Visibility status of the listing should be `Limited`.

<img src="images/status-2.png"/>
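For the custom metering confirmation in the pricing step above, the header value is a small JSON document. The following sketch builds it in the shape quoted in that confirmation text. The helper name `metering_header` is hypothetical, and how the header is attached to the HTTP response depends on your serving framework, which this notebook does not cover.

```python
import json


def metering_header(consumed_units, dimension="inference.count"):
    """Build the custom metering response header as a one-item dict."""
    value = json.dumps({"Dimension": dimension, "ConsumedUnits": consumed_units})
    return {"X-Amzn-Inference-Metering": value}


headers = metering_header(3)
print(headers)
```

Without this header, default metering is used, as the confirmation text states, so a free listing can skip it.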




**Resources**
* [Publishing your product in AWS Marketplace](https://docs.aws.amazon.com/marketplace/latest/userguide/ml-publishing-your-product-in-aws-marketplace.html)


