# Deploying a Fine-Tuned Model to a SageMaker Endpoint for Inference (Local Mode)
---


### Inference

Local mode lets you verify that your inference code works before you deploy it to a hosted SageMaker endpoint. The serving container runs on the machine that hosts this notebook, so you do not have to wait for a separate EC2 instance to be provisioned when you call deploy().
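Local mode is selected through the instance type passed to deploy(): 'local' runs the container on CPU and 'local_gpu' runs it on GPU. The following is a minimal sketch, not part of the original notebook, that picks between the two by checking whether `nvidia-smi` runs successfully. The helper name `pick_local_instance_type` is hypothetical.

```python
import shutil
import subprocess


def pick_local_instance_type():
    """Return 'local_gpu' if nvidia-smi runs successfully, otherwise 'local'."""
    # No nvidia-smi binary on PATH: assume there is no usable GPU.
    if shutil.which("nvidia-smi") is None:
        return "local"
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # The binary exists but could not be executed or timed out.
        return "local"
    return "local_gpu" if result.returncode == 0 else "local"


instance_type = pick_local_instance_type()
print(instance_type)
```

The resulting value could then be passed as the instance_type argument of deploy() instead of a hard-coded string.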

In [1]:
import sagemaker
from sagemaker.mxnet.model import MXNetModel
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()
role = get_execution_role()
# Replace the URI below with the S3 location of your own model artifact, for example:
# model_data = 's3://<YOUR BUCKET>/<YOUR MODEL PATH>/model.tar.gz'
model_data = 's3://sagemaker-us-east-1-143656149352/mxnet-training-2020-05-19-01-08-59-727/output/model.tar.gz'

mxnet_model = MXNetModel(model_data=model_data,
                         role=role,
                         entry_point='inference.py',
                         source_dir='./src',
                         py_version='py3',
                         framework_version='1.6.0'
                        )
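The contents of inference.py are not shown in this notebook. As a rough, hypothetical sketch of what such an entry point might look like: the SageMaker MXNet serving container lets the script define `model_fn(model_dir)` to load the model and `transform_fn(model, request_body, content_type, accept_type)` to handle a request. The version below only illustrates the request handling and the `{'score': ..., 'time': ...}` response shape seen in the outputs further down. It assumes the loaded model is a callable that maps a raw sentence to two logits, which is a simplification: the real script presumably tokenizes the text with KoBERT first. `stand_in_model` is a placeholder, and `model_fn` is omitted.

```python
import json
import math
import time


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transform_fn(model, request_body, content_type, accept_type):
    """Decode a JSON-encoded sentence, score it, and return a JSON response."""
    sentence = json.loads(request_body)
    start = time.time()
    logits = model(sentence)  # assumption: model is callable on raw text
    scores = softmax(logits)
    payload = {"score": scores, "time": time.time() - start}
    return json.dumps(payload), accept_type


def stand_in_model(sentence):
    """Placeholder for the fine-tuned network; always returns fixed logits."""
    return [0.0, 2.0]


body, response_type = transform_fn(
    stand_in_model, json.dumps("example sentence"), "application/json", "application/json"
)
print(body)
```

Calling the handler directly with a stand-in model, as above, is a quick way to check the response format before the container is started.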

In [2]:
%%time
predictor = mxnet_model.deploy(instance_type='local_gpu', initial_instance_count=1)
print(predictor.endpoint)

Attaching to tmppmnd454m_algo-1-4z4hp_1
algo-1-4z4hp_1  | Collecting git+https://****@github.com/SKTBrain/KoBERT.git@master (from -r /opt/ml/model/code/requirements.txt (line 5))
algo-1-4z4hp_1  |   Cloning https://****@github.com/SKTBrain/KoBERT.git (to revision master) to /home/model-server/tmp/pip-req-build-m1b9hftq
algo-1-4z4hp_1  |   Running command git clone -q 'https://****@github.com/SKTBrain/KoBERT.git' /home/model-server/tmp/pip-req-build-m1b9hftq
algo-1-4z4hp_1  | Collecting sentencepiece
algo-1-4z4hp_1  |   Downloading sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 14.1 MB/s eta 0:00:01
algo-1-4z4hp_1  | Collecting onnxruntime
algo-1-4z4hp_1  |   Downloading onnxruntime-1.3.0-cp36-cp36m-manylinux1_x86_64.whl (3.9 MB)
     |████████████████████████████████| 3.9 MB 40.4 MB/s eta 0:00:01
algo-1-4z4hp_1  | Collecting transforme

algo-1-4z4hp_1  | 2020-05-28 11:51:04,043 [INFO ] W-9003-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9003
algo-1-4z4hp_1  | 2020-05-28 11:51:04,043 [INFO ] W-9003-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID]173
algo-1-4z4hp_1  | 2020-05-28 11:51:04,043 [INFO ] W-9003-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MXNet worker started.
algo-1-4z4hp_1  | 2020-05-28 11:51:04,044 [INFO ] W-9003-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.8
algo-1-4z4hp_1  | 2020-05-28 11:51:04,044 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
algo-1-4z4hp_1  | 2020-05-28 11:51:04,044 [INFO ] W-9002-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9002
algo-1-4z4hp_1  | 2020-05-28 11:51:

The code cell below performs real-time prediction.

In [3]:
# English: "Wow, this story really delivers twist after twist. Highly recommended."
input_sentence = '우와, 정말 반전에 반전을 거듭하는 스토리입니다. 강력 추천합니다.'
pred_out = predictor.predict(input_sentence)
print(pred_out)

algo-1-4z4hp_1  | 2020-05-28 11:51:15,497 [INFO ] W-9002-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 91
algo-1-4z4hp_1  | 2020-05-28 11:51:15,497 [INFO ] W-9002-model ACCESS_LOG - /172.18.0.1:54172 "POST /invocations HTTP/1.1" 200 94
{'score': [0.030505415052175522, 0.9694945812225342], 'time': 0.08880257606506348}


In [4]:
# English: "Haha, it is a complete mess, and the acting is terrible too."
input_sentence = '하하, 정말 엉망진창에 배우 연기력도 꽝이에요.'
pred_out = predictor.predict(input_sentence)
print(pred_out)

algo-1-4z4hp_1  | 2020-05-28 11:51:21,400 [INFO ] W-9003-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 87
algo-1-4z4hp_1  | 2020-05-28 11:51:21,401 [INFO ] W-9003-model ACCESS_LOG - /172.18.0.1:54172 "POST /invocations HTTP/1.1" 200 89
{'score': [0.9753417372703552, 0.024658288806676865], 'time': 0.08612608909606934}
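In both responses the score list holds two values that sum to roughly one, so they look like class probabilities. The sketch below turns a response into a readable label. It assumes index 0 means negative and index 1 means positive, which matches the two examples above but is not stated anywhere in this notebook. The names `LABELS` and `interpret` are hypothetical.

```python
# Assumption: index 0 = negative sentiment, index 1 = positive sentiment.
LABELS = ("negative", "positive")


def interpret(pred_out, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    scores = pred_out["score"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Responses copied from the two predictions above.
first = {'score': [0.030505415052175522, 0.9694945812225342], 'time': 0.08880257606506348}
second = {'score': [0.9753417372703552, 0.024658288806676865], 'time': 0.08612608909606934}

for response in (first, second):
    label, prob = interpret(response)
    print(label, round(prob, 4))
```

The time field is left untouched here. It is close to the backend response time in the log lines (about 0.09 s against 91 ms), so it appears to be the model-side latency in seconds.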


### Optional: Delete Endpoint

In [5]:
predictor.delete_endpoint()
predictor.delete_model()

Gracefully stopping... (press Ctrl+C again to force)
