# CentML Yoxall Demo

In this demo we take a <b>ResNet-50</b> model in ONNX format and optimize it using the CentML APIs. Once the ONNX model is optimized, we compare the performance of the optimized model with that of the original model.

## Export PyTorch Model to ONNX

We will use the open-source PyTorch <b>ResNet-50</b> model, exported to ONNX, for this demo.
We have also created a `params.json` file which describes the model input (shape and dtype):
```
[
    {
        "input_shape":"1,3,224,224",
        "dtype":"float16"
    }
]
```
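
As a rough illustration of how the shape string above can be consumed, the sketch below writes the same content to a `params.json` file and parses `input_shape` into a tuple of integers. The helper name `parse_input_shape` is hypothetical and not part of any CentML API. How CentML itself reads the file is not shown in this demo.

```python
import json
import os
import tempfile

# Same content as the params.json shown above.
PARAMS = [{"input_shape": "1,3,224,224", "dtype": "float16"}]


def parse_input_shape(entry):
    """Turn a comma-separated shape string into a tuple of ints (hypothetical helper)."""
    return tuple(int(dim) for dim in entry["input_shape"].split(","))


# Write to a temporary directory so the example does not overwrite a real file.
with tempfile.TemporaryDirectory() as tmp_dir:
    params_path = os.path.join(tmp_dir, "params.json")
    with open(params_path, "w") as f:
        json.dump(PARAMS, f, indent=4)

    # Read the file back, as a consumer of params.json would.
    with open(params_path) as f:
        loaded = json.load(f)

shape = parse_input_shape(loaded[0])
print(shape, loaded[0]["dtype"])
```

The parsed shape should match the `dummy_input` used for the ONNX export, and the dtype should match the `.half()` conversion of the model.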

In [1]:
import torch
import onnx
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval().half().cuda()
dummy_input = torch.randn(1, 3, 224, 224).to(torch.float16).cuda()

input_names = [ "actual_input" ]
output_names = [ "output" ]
torch.onnx.export(model,
                 dummy_input,
                 "./model.onnx",
                 verbose=False,
                 input_names=input_names,
                 output_names=output_names,
                 export_params=True,
                 )

## Upload the model

In [4]:
import sagemaker
import boto3


s3_client = boto3.client("s3", region_name="us-east-1")
default_bucket = sagemaker.session.Session().default_bucket()

# Change model name here
MODEL_NAME = "resnet50"

# Set path to ONNX file
path_to_onnx = './model.onnx'

# Set path to params.json file
path_to_params = "./params.json"


s3_client.upload_file(path_to_onnx, default_bucket, f"{MODEL_NAME}/model.onnx")
s3_client.upload_file(path_to_params, default_bucket, f"{MODEL_NAME}/params.json")

## Submit optimize request

In [4]:
import sagemaker
import boto3
from sagemaker import get_execution_role
import time
s3_client = boto3.client("s3", region_name="us-east-1")
default_bucket = sagemaker.session.Session().default_bucket()

# Select hardware from https://aws.amazon.com/sagemaker/pricing/
instance = "ml.g4dn.2xlarge" # T4 instance
MODEL_NAME = "resnet50"

client = boto3.client('sagemaker', region_name="us-east-1")

training_job_name = f'{MODEL_NAME}-{str(time.time()).replace(".", "")}'

response = client.create_training_job(
    TrainingJobName=training_job_name,
    AlgorithmSpecification={
        'TrainingImage': '725137708992.dkr.ecr.us-east-1.amazonaws.com/centml:latest',
        'TrainingInputMode': 'File',
    },
    RoleArn='arn:aws:iam::725137708992:role/centml-yoxall-sageMakerRole-dev',
    InputDataConfig=[
        {
            'ChannelName': 'model',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': f's3://{default_bucket}/{MODEL_NAME}/model.onnx',
                    'S3DataDistributionType': 'FullyReplicated',
                },
            },
            "ContentType": "application/octet-stream",
            "CompressionType": "None",
            "InputMode": "File"
        },
        {    
            "ChannelName": "params",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": f"s3://{default_bucket}/{MODEL_NAME}/params.json",
                    "S3DataDistributionType": "FullyReplicated",    
                },
            },
            "ContentType": "application/json",
            "CompressionType": "None",
            "InputMode": "File"
        }
    ],
    OutputDataConfig={
        'S3OutputPath': f's3://{default_bucket}/outputs/'
    },
    ResourceConfig={
        'InstanceType': instance,
        'InstanceCount': 1,
        'VolumeSizeInGB': 225,
    },
    StoppingCondition={
        "MaxRuntimeInSeconds": 86400
    },
)

print(f"Created training job {response}")

Created training job {'TrainingJobArn': 'arn:aws:sagemaker:us-east-1:725137708992:training-job/resnet50-16803151213335679', 'ResponseMetadata': {'RequestId': '92e6b96e-5287-452f-85a7-bb00ea848b39', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '92e6b96e-5287-452f-85a7-bb00ea848b39', 'content-type': 'application/x-amz-json-1.1', 'content-length': '101', 'date': 'Sat, 01 Apr 2023 02:12:00 GMT'}, 'RetryAttempts': 0}}
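
The job name above is built by stripping the dot from `time.time()`. A slightly more defensive variant is sketched below, assuming the naming rules documented for `CreateTrainingJob` (at most 63 characters, alphanumerics and hyphens only, starting and ending with an alphanumeric). The helper `make_job_name` is hypothetical and only illustrates the idea.

```python
import re
import time

# Naming rule assumed from the CreateTrainingJob documentation.
_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_job_name(model_name, now=None):
    """Build a unique job name from a model name and a millisecond timestamp."""
    timestamp_ms = int((time.time() if now is None else now) * 1000)
    suffix = str(timestamp_ms)
    # Replace characters that are not allowed (underscores, dots, ...) with hyphens.
    prefix = re.sub(r"[^a-zA-Z0-9-]+", "-", model_name).strip("-")
    # Truncate the prefix so that prefix + "-" + suffix fits in 63 characters.
    name = f"{prefix[:63 - len(suffix) - 1]}-{suffix}"
    if not _NAME_PATTERN.match(name):
        raise ValueError(f"Cannot build a valid job name from {model_name!r}")
    return name


print(make_job_name("resnet50"))
```

The result can be passed as `TrainingJobName` in the request above. Keeping the name in a variable also makes it easy to reuse in the status and download cells.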


## Check status of optimization job

In [4]:
import boto3

client = boto3.client("sagemaker", region_name="us-east-1")
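# Name of the training job to query. The value below is from an earlier run,
# so substitute the name printed when your own job was created.
training_job_name = "resnet50-16801275426092536"
# Get SageMaker job status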
response = client.describe_training_job(TrainingJobName=training_job_name)

status = response["TrainingJobStatus"]

print(f"Optimization job status: {status}")

Optimization job status: Completed


## Wait for the optimization job to finish

<b>An optimization job can take up to several hours.</b>
We can check the status of an optimization job by calling `describe_training_job` with the training job name from above.
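
Rather than re-running the status cell by hand, the check can be wrapped in a polling loop. The following is a minimal sketch: `wait_for_job` and its parameters are hypothetical names, and the status source is passed in as a callable so the loop can be tried without AWS access. The terminal statuses are assumed to be `Completed`, `Failed`, and `Stopped`.

```python
import time

# Statuses after which the job will not change any more (assumed set).
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_job(fetch_status, poll_seconds=60, timeout_seconds=86400, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal status or the timeout elapses."""
    waited = 0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in status source for illustration. With boto3 this would be something like:
#   lambda: client.describe_training_job(
#       TrainingJobName=training_job_name)["TrainingJobStatus"]
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(f"Optimization job status: {final_status}")
```

boto3's SageMaker client also exposes a built-in waiter, `client.get_waiter("training_job_completed_or_stopped")`, which may be preferable when no custom handling is needed.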

## Download optimized model

In [6]:
import boto3
import tarfile

response = client.describe_training_job(TrainingJobName=training_job_name)
status = response["TrainingJobStatus"]
assert status == "Completed", "Optimization job failed or is still in progress"

downloadUrl = response["ModelArtifacts"]["S3ModelArtifacts"]

!aws s3 cp {downloadUrl} .

# Extract the downloaded archive into ./model
with tarfile.open('model.tar.gz') as archive:
    archive.extractall('./model')

download: s3://sagemaker-us-east-1-725137708992/outputs/resnet50-16801275426092536/output/model.tar.gz to ./model.tar.gz
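
The `!aws s3 cp` line relies on the AWS CLI and notebook shell magics. Outside a notebook, the same artifact could be fetched with boto3 after splitting the S3 URI into bucket and key. A minimal sketch follows, with `split_s3_uri` as a hypothetical helper and the URI copied from the output above.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


download_url = (
    "s3://sagemaker-us-east-1-725137708992/outputs/"
    "resnet50-16801275426092536/output/model.tar.gz"
)
bucket, key = split_s3_uri(download_url)
print(bucket)
print(key)

# With AWS credentials configured, the download itself would then be:
#   boto3.client("s3").download_file(bucket, key, "model.tar.gz")
```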


## Load optimized model and run CentML benchmark

In [7]:
%%time

import erin
import torch
import time
import numpy as np
from torchvision import transforms

# Set file paths
params = "params.json"
onnx_path = "model.onnx"

# Set the model in Hidet/Erin
model = erin.create_model(onnx_path, params, './model')

np_payload = np.random.rand(1,3,224, 224).astype("float16")
hidet_tensor = erin.from_numpy(np_payload).cuda()

# Configure number of iterations to run here
NUM_ITERATIONS = 100

hidet_time_durations = []
for i in range(0,NUM_ITERATIONS):
    # Start time
    start_time = time.time()
    
    # Prediction tensor
    output = model.predict(hidet_tensor)
    
    #End time
    end_time = time.time()
    
    duration = end_time - start_time
    hidet_time_durations.append(duration)

print("Average time: {:0.4f}s".format(sum(hidet_time_durations)/len(hidet_time_durations)))

Average time: 0.0024s
CPU times: user 2.97 s, sys: 380 ms, total: 3.35 s
Wall time: 9.95 s


## Run PyTorch benchmark

In [8]:
%%time

import torch
import onnx
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval().half().cuda()
pytorch_tensor = torch.from_numpy(np_payload).cuda()

pytorch_time_durations = []

for i in range(0, NUM_ITERATIONS):
    # Start time
    start_time = time.time()
    
    # Prediction tensor
    output = model(pytorch_tensor)
    
    #End time
    end_time = time.time()
    
    duration = end_time - start_time
    pytorch_time_durations.append(duration)

print("Average time: {:0.4f}s".format(sum(pytorch_time_durations)/len(pytorch_time_durations)))

Average time: 0.0102s
CPU times: user 2.02 s, sys: 90.7 ms, total: 2.11 s
Wall time: 1.48 s
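
CUDA kernels are launched asynchronously, so timing a GPU call with a wall clock alone can measure launch overhead rather than execution time. The sketch below shows one way to structure the timing loop with warm-up iterations and an optional synchronization hook. The `benchmark` helper and its parameters are hypothetical. For the PyTorch model, `torch.cuda.synchronize` could be passed as `sync`. Whether `model.predict` in the CentML cell already blocks until the result is ready is not stated in this demo. The example runs on a CPU-only stand-in workload and then computes the speedup from the two averages reported above.

```python
import statistics
import time


def benchmark(fn, iterations=100, warmup=10, sync=None):
    """Return per-call durations in seconds, excluding warm-up iterations."""

    def synced_call():
        fn()
        if sync is not None:
            sync()  # e.g. torch.cuda.synchronize for GPU workloads

    # Warm-up runs absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        synced_call()

    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        synced_call()
        durations.append(time.perf_counter() - start)
    return durations


# CPU-only stand-in workload, used here so the sketch runs anywhere.
durations = benchmark(lambda: sum(range(1000)), iterations=20, warmup=2)
print("Average time: {:0.6f}s".format(statistics.mean(durations)))
print("Median time:  {:0.6f}s".format(statistics.median(durations)))

# Speedup computed from the averages reported in the two cells above.
pytorch_avg = 0.0102
centml_avg = 0.0024
speedup = pytorch_avg / centml_avg
print("Speedup: {:0.2f}x".format(speedup))
```

With the rounded averages from this run, the optimized model comes out at roughly 4.25 times faster than the PyTorch baseline. The exact figure will vary with hardware and with how the timings are taken.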
