# Deploying a Fine-Tuned Model to a SageMaker Endpoint for Inference (Script Mode)
---


### Inference

Amazon SageMaker provides the built-in deep learning frameworks, including MXNet, TensorFlow, PyTorch, and Chainer, as managed Docker containers. You can therefore train a model or deploy it to an endpoint with just a script, without building your own Docker container through BYOC (Bring Your Own Container).

There are several ways to install the packages that a script depends on.

- ***Option 1.*** Place a `requirements.txt` file in the directory that contains the entry point script. However, as of May 2020 the TensorFlow framework does not support `requirements.txt` in script mode, so use another option if you use TensorFlow.
- ***Option 2.*** Run a package installation command from the inference script. For example:

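```python
import subprocess
import sys
# Install the dependencies with the same interpreter that runs this script.
subprocess.call([sys.executable, '-m', 'pip', 'install', 'gluonnlp', 'torch', 'sentencepiece',
                 'onnxruntime', 'transformers', 'git+https://git@github.com/SKTBrain/KoBERT.git@master'])
```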
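
Because the installation command runs each time the script is loaded, it can help to install only what is missing. The following is a minimal sketch and not part of the original notebook: `missing_packages` and `ensure_installed` are hypothetical helper names, and the sketch assumes that each pip package name matches its top-level import name, which does not hold for every package.

```python
import importlib.util
import subprocess
import sys


def missing_packages(import_names):
    """Return the top-level module names that cannot currently be imported."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


def ensure_installed(import_names):
    """Install missing packages with pip; return pip's exit code (0 if nothing to do)."""
    to_install = missing_packages(import_names)
    if not to_install:
        return 0
    return subprocess.call([sys.executable, '-m', 'pip', 'install'] + to_install)
```

For example, `ensure_installed(['gluonnlp', 'sentencepiece'])` would invoke pip only if one of those modules cannot be imported.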

Note that deploying the endpoint takes about 9-11 minutes on a GPU instance and about 7-8 minutes on a CPU instance.

In [1]:
import sagemaker
from sagemaker.mxnet.model import MXNetModel
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()
role = get_execution_role()
#model_data = 's3://<YOUR BUCKET>/<YOUR MODEL PATH>/model.tar.gz'
model_data = 's3://sagemaker-us-east-1-143656149352/mxnet-training-2020-05-19-01-08-59-727/output/model.tar.gz'

mxnet_model = MXNetModel(model_data=model_data,
                         role=role,
                         entry_point='inference.py',
                         source_dir='./src',
                         py_version='py3',
                         framework_version='1.6.0'
                        )
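
The `entry_point` script itself is not shown in this notebook. As a rough sketch of what such a script can look like, the SageMaker MXNet serving container looks for a `model_fn(model_dir)` function to load the model and, optionally, a `transform_fn(model, request_body, content_type, accept_type)` function to handle each request. Everything else below is an assumption made for illustration: the stub model, the `softmax` helper, the JSON request format, and the response keys, which merely mirror the output printed later in this notebook.

```python
import json
import math
import time


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def model_fn(model_dir):
    """Load the model once when the container starts.

    Hypothetical stand-in: a real script would load the fine-tuned
    weights stored under model_dir instead of returning this stub.
    """
    def stub_model(sentence):
        # Always returns the same two logits, regardless of the input.
        return [0.0, 0.0]
    return stub_model


def transform_fn(model, request_body, content_type, accept_type):
    """Handle one request: decode the input, predict, encode the response."""
    sentence = json.loads(request_body)  # assumes the text arrives as a JSON string
    start = time.time()
    scores = softmax(model(sentence))
    elapsed = time.time() - start
    response_body = json.dumps({'score': scores, 'time': elapsed})
    return response_body, accept_type
```

Splitting the work this way keeps the expensive model loading in `model_fn`, which runs once, and leaves only the per-request work in `transform_fn`.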

In [2]:
%%time
predictor = mxnet_model.deploy(instance_type='ml.p2.xlarge', initial_instance_count=1)
print(predictor.endpoint)  # endpoint name; SageMaker Python SDK v2 renamed this attribute to endpoint_name

---------------!mxnet-inference-2020-05-28-14-36-38-048
CPU times: user 31.7 s, sys: 2.63 s, total: 34.3 s
Wall time: 8min


If the endpoint already exists and you restart the Jupyter notebook, you can re-create the predictor with the code cell below.

In [3]:
# import sagemaker
# from sagemaker.mxnet.model import MXNetPredictor
# sagemaker_session = sagemaker.Session()
# endpoint_name = '<YOUR ENDPOINT NAME>'
# predictor = MXNetPredictor(endpoint_name, sagemaker_session)

The code cell below performs real-time prediction.

In [7]:
# English: "Wow, the story really has twist after twist. Highly recommended."
input_sentence = '우와, 정말 반전에 반전을 거듭하는 스토리입니다. 강력 추천합니다.'
pred_out = predictor.predict(input_sentence)
print(pred_out)

{'score': [0.03050542250275612, 0.9694945812225342], 'time': 0.01893901824951172}


In [8]:
# English: "Haha, it's a complete mess, and the acting is terrible too."
input_sentence = '하하, 정말 엉망진창에 배우 연기력도 꽝이에요.'
pred_out = predictor.predict(input_sentence)
print(pred_out)

{'score': [0.9753417372703552, 0.02465830184519291], 'time': 0.019909381866455078}
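
Judging from the two examples above, the second entry of `score` is high for the positive review and the first entry is high for the negative one. The following is a minimal sketch for turning a response into a label, assuming that index 0 means negative and index 1 means positive. The notebook does not state this mapping, and `LABELS` and `to_label` are hypothetical names.

```python
LABELS = ['negative', 'positive']  # assumed order, inferred from the two examples


def to_label(pred_out):
    """Return (label, probability) for the highest-scoring class."""
    scores = pred_out['score']
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best], scores[best]
```

Under that assumption, calling `to_label(pred_out)` on the second response above would yield the `'negative'` label together with its score.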


### Delete Endpoint

In [9]:
# predictor.delete_endpoint()
# predictor.delete_model()