# Check that inference works locally:

First, we want to check that our inference code works properly before deploying anything. Note that my inference output is a JSON string; yours might be a different type.

In [None]:
import json
from inference import model_fn, predict_fn, input_fn, output_fn

from PIL import Image
import numpy as np
image = Image.open('IMG1.jpg')

inputs = {"input": image}

response = output_fn(
    predict_fn(
        input_fn(inputs, "application/x-image"),
        model_fn("./")  # put your (untarred) model file in the same folder as this notebook;
                        # no need to package it as model.tar.gz for this local test.
    ),
    "application/json"
)
print(json.loads(response))
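The local test above calls the same four handlers that the SageMaker PyTorch serving container looks for in inference.py: `model_fn(model_dir)` loads the model once, then each request goes through `input_fn(request_body, content_type)`, `predict_fn(input_data, model)` and `output_fn(prediction, accept)`. Below is a minimal sketch of that contract, assuming a hypothetical toy model (`_ToyModel`) and a JSON request of the form `{"input": [...]}` instead of an image, so it runs without PyTorch. It is not my actual inference code.

```python
import json

class _ToyModel:
    """Hypothetical stand-in for a real PyTorch model: doubles every input value."""
    def __call__(self, values):
        return [2 * v for v in values]

def model_fn(model_dir):
    # Real code would load weights from model_dir (e.g. with torch.load).
    return _ToyModel()

def input_fn(request_body, content_type):
    # Decode the raw request into the object predict_fn expects.
    if content_type == "application/json":
        return json.loads(request_body)["input"]
    raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(input_data, model):
    # Run the model on the decoded input.
    return model(input_data)

def output_fn(prediction, accept):
    # Encode the prediction for the response; here it is a JSON string.
    if accept == "application/json":
        return json.dumps({"output": prediction})
    raise ValueError(f"Unsupported accept type: {accept}")

# Same call chain as the local test: output_fn(predict_fn(input_fn(...), model_fn(...)), ...)
response = output_fn(
    predict_fn(input_fn('{"input": [1, 2, 3]}', "application/json"), model_fn("./")),
    "application/json",
)
print(json.loads(response))  # {'output': [2, 4, 6]}
```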

# Deploy your model and test:

Do all the imports, and especially do not forget the serializers and deserializers. We also use the current date and time to name our resources, so we can recognize them easily.

In [38]:
import os
import json
import boto3
import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role, Session
from sagemaker.deserializers import JSONDeserializer
from sagemaker.serializers import JSONSerializer
from datetime import datetime

# Timestamp used to give each deployment a unique, recognizable name.
current_datetime = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
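Endpoint names are restricted to letters, digits and hyphens (at most 63 characters), so a timestamped name should use `-` rather than `_` as its separator. A minimal sketch of building and checking such a name; `timestamped_name` is a hypothetical helper, and the regular expression is the endpoint-name pattern from the SageMaker API reference.

```python
import re
from datetime import datetime

# Endpoint-name pattern: alphanumerics separated by optional hyphens.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")

def timestamped_name(prefix, when=None):
    """Return '<prefix>-YYYY-mm-dd-HH-MM-SS', raising ValueError if it is not a valid name."""
    when = when or datetime.now()
    name = f"{prefix}-{when.strftime('%Y-%m-%d-%H-%M-%S')}"
    if len(name) > 63 or not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid endpoint name: {name!r}")
    return name

print(timestamped_name("your-model-name", datetime(2024, 1, 31, 9, 5, 0)))
# your-model-name-2024-01-31-09-05-00
```

Calling it with a prefix such as `"your_model_name"` raises `ValueError` before any AWS call is made.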

Here we build our PyTorch model. Before running this, make sure you have uploaded your model archive (model.tar.gz, plus your inference code) to your S3 bucket.
I have set some environment variables, such as TS_MAX_RESPONSE_SIZE, to increase my endpoint's maximum response size. Feel free to change or remove them.
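For the PyTorch serving container, model.tar.gz is expected to hold the model files at the archive root and the inference code under `code/`. A minimal packaging sketch using only the standard library; `package_model` and the file name `model.pth` are hypothetical, so substitute your own weights file.

```python
import tarfile
import tempfile
from pathlib import Path

def package_model(model_file, inference_file, out_path):
    """Write a model.tar.gz with the weights at the root and inference.py under code/."""
    with tarfile.open(str(out_path), "w:gz") as tar:
        tar.add(str(model_file), arcname=Path(model_file).name)
        tar.add(str(inference_file), arcname="code/" + Path(inference_file).name)
    return out_path

# Demo with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp, "model.pth")
    model_path.write_bytes(b"placeholder weights")
    code_path = Path(tmp, "inference.py")
    code_path.write_text("# model_fn, input_fn, predict_fn, output_fn go here\n")
    archive = package_model(model_path, code_path, Path(tmp, "model.tar.gz"))
    with tarfile.open(str(archive)) as tar:
        names = sorted(tar.getnames())
print(names)  # ['code/inference.py', 'model.pth']
```

In the notebook, the archive can then be uploaded with `Session().upload_data("model.tar.gz", key_prefix="path-to-model")`, which returns the S3 URI to pass as `model_data` (the key prefix here is a placeholder).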

In [40]:
sess = Session()
role = get_execution_role()

model = PyTorchModel(entry_point='inference.py',
                     model_data='s3://your-bucket/path-to-model/model.tar.gz',
                     framework_version='2.0',  # PyTorch version
                     py_version='py310',       # Python version
                     role=role,
                     sagemaker_session=sess,
                     env={
                         'SAGEMAKER_MODEL_SERVER_TIMEOUT': '180',  # model server timeout (seconds)
                         'TS_MAX_RESPONSE_SIZE': '20000000',      # max response size (bytes)
                         'TS_MAX_REQUEST_SIZE': '20000000',       # max request size (bytes)
                     })
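The two `TS_*` limits are in bytes, so 20000000 is about 20 MB, well above TorchServe's default of 6553500 bytes. A rough way to check whether an input fits is to measure it in the `.npy` format that the default PyTorch predictor serializer sends. A minimal sketch, assuming a hypothetical 1080p RGB input; `npy_payload_size` is my own helper, not part of the SageMaker SDK.

```python
import io
import numpy as np

# Mirrors the 'TS_MAX_REQUEST_SIZE' value passed in env (bytes).
TS_MAX_REQUEST_SIZE = 20_000_000

def npy_payload_size(array):
    """Bytes needed to send `array` in .npy format (data plus a small header)."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return len(buffer.getvalue())

# Hypothetical 1080p RGB frame, stored as uint8 and as float32.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
for candidate in (frame, frame.astype(np.float32)):
    size = npy_payload_size(candidate)
    verdict = "fits" if size <= TS_MAX_REQUEST_SIZE else "too large"
    print(candidate.dtype, f"{size / 1e6:.1f} MB", verdict)
```

The pixel data alone is 1080 x 1920 x 3 bytes (about 6.2 MB) for uint8 and four times that (about 24.9 MB) for float32, so the dtype you send matters as much as the resolution. SageMaker's InvokeEndpoint API also applies its own payload limit to real-time requests, independently of these TorchServe settings.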

Select your endpoint instance type and deploy your model to AWS. This may take a few minutes. You can check the progress and logs by going to SageMaker -> Inference -> Endpoints in the AWS console.
We also name our endpoint so that we can easily find it later.

In [None]:
INSTANCE_TYPE = 'ml.g5.xlarge'
# Endpoint names may only contain letters, digits and hyphens, so use '-' rather than '_'.
ENDPOINT_NAME = 'your-model-name-' + current_datetime

predictor = model.deploy(initial_instance_count=1,
                         instance_type=INSTANCE_TYPE,
                         # serializer=JSONSerializer(),  # feel free to use any serializer/deserializer
                         #                               # that suits your scenario!
                         deserializer=JSONDeserializer(),
                         endpoint_name=ENDPOINT_NAME)
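`model.deploy` waits for the endpoint by default, but if you deploy with `wait=False` or want to check the endpoint from another session, its status can also be read with boto3. A minimal sketch; the helper names `endpoint_status` and `wait_until_in_service` are hypothetical, while `describe_endpoint` and the `endpoint_in_service` waiter are standard boto3 SageMaker client features.

```python
def endpoint_status(sm_client, endpoint_name):
    """Return the endpoint's status, e.g. 'Creating', 'InService' or 'Failed'."""
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = description["EndpointStatus"]
    if status == "Failed":
        # FailureReason explains why the deployment failed.
        print("Deployment failed:", description.get("FailureReason"))
    return status

def wait_until_in_service(sm_client, endpoint_name):
    """Block until the endpoint is InService, using boto3's built-in waiter."""
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)

# Usage in the notebook (requires AWS credentials):
# sm_client = boto3.client("sagemaker")
# print(endpoint_status(sm_client, ENDPOINT_NAME))
```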

Now everything is ready for you to use your endpoint. Load your model input, and don't worry about serializing your data if you have set a suitable SageMaker serializer.

In [None]:
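# The default PyTorchPredictor serializer (NumpySerializer) converts the image to a NumPy array
# and sends it as application/x-npy, so no manual serialization is needed here.
from PIL import Image
import numpy as np
image = Image.open('IMG.jpg')
# bytes_data = image.tobytes()  # you may want to serialize your data manually instead.

result = predictor.predict(image)
print("Prediction:", result)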
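With the default serializer, the request body your `input_fn` receives is the array written in `.npy` format. A minimal sketch of both sides of that exchange, assuming a random array as a stand-in for `np.array(image)`; `serialize_npy` and `npy_input_fn` are hypothetical helpers that mimic the serializer and a custom `input_fn`, not SDK functions.

```python
import io
import numpy as np

def serialize_npy(array):
    """Roughly what NumpySerializer sends: the array saved in .npy format."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getvalue()

def npy_input_fn(request_body, content_type):
    """Hypothetical input_fn branch that decodes an application/x-npy request body."""
    if content_type == "application/x-npy":
        return np.load(io.BytesIO(request_body), allow_pickle=False)
    raise ValueError(f"Unsupported content type: {content_type}")

# Stand-in for np.array(image): a small random RGB array.
image_array = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
decoded = npy_input_fn(serialize_npy(image_array), "application/x-npy")
print(decoded.shape, decoded.dtype, np.array_equal(decoded, image_array))
# (64, 64, 3) uint8 True
```

If your own `input_fn` expects a different format (for example raw image bytes), set a matching serializer in `model.deploy` instead.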

Hopefully you have successfully got your desired output. Now it's time to delete the resources, since a running endpoint keeps incurring charges.

In [None]:
sm_client = boto3.client(service_name="sagemaker")

# Look up the endpoint configuration (and, through it, the models) behind the endpoint.
endpoint_config_name = sm_client.describe_endpoint(EndpointName=ENDPOINT_NAME)['EndpointConfigName']
endpoint_config = sm_client.describe_endpoint_config(EndpointConfigName=endpoint_config_name)

# Delete Endpoint
sm_client.delete_endpoint(EndpointName=ENDPOINT_NAME)

# Delete Endpoint Configuration
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

# Delete the Model(s) referenced by the endpoint configuration
for prod_var in endpoint_config['ProductionVariants']:
    model_name = prod_var['ModelName']
    sm_client.delete_model(ModelName=model_name)
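Alternatively, the `predictor` returned by `model.deploy` can clean up after itself through the SageMaker SDK's `Predictor.delete_model()` and `Predictor.delete_endpoint()` methods. A minimal sketch; `cleanup` is a hypothetical wrapper. The model is deleted first because, as I understand the SDK, `delete_model` looks up the model names through the endpoint, which must still exist at that point.

```python
def cleanup(predictor):
    """Delete the model(s), then the endpoint together with its endpoint configuration."""
    # delete_model finds the model names via the endpoint, so call it before delete_endpoint.
    predictor.delete_model()
    predictor.delete_endpoint(delete_endpoint_config=True)

# Usage in the notebook:
# cleanup(predictor)
```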