# TensorFlow training and serving

Script mode is a training script format for TensorFlow that lets you execute any TensorFlow training script in SageMaker with minimal modification. The [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) handles transferring your script to a SageMaker training instance. On the training instance, SageMaker's native TensorFlow support sets up training-related environment variables and executes your training script. In this tutorial, we use the SageMaker Python SDK to launch a training job and deploy the trained model.

Script mode supports training with a Python script, a Python module, or a shell script. This example shows how to train a model on SageMaker using a TensorFlow 2.1 script and the SageMaker Python SDK. In addition, this notebook demonstrates how to perform real-time inference with the [SageMaker TensorFlow Serving container](https://github.com/aws/sagemaker-tensorflow-serving-container). The TensorFlow Serving container is the default inference method for script mode. For full documentation on the TensorFlow Serving container, see the [SDK documentation](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst).


## 1. About SageMaker notebooks
<p>SageMaker notebooks are a fully managed, container-based service. You cannot see the containers directly, but internally the service works as shown below.</p>
<p><img src="./fig/fig00.png" width="800" height="80"></p>

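- **S3 (Simple Storage Service)**: Object storage used to hold the training data files and the training outputs: the model, checkpoints, event files for TensorBoard, and logs.
- **SageMaker Notebook**: Provides an environment for writing and debugging training scripts and for developing the Python code that runs the actual training.
- **Amazon Elastic Container Registry (ECR)**: A fully managed Docker container registry that makes it easy to store, manage, and deploy Docker container images. SageMaker provides default containers, so you do not need to register a container image in ECR yourself. If you need a separate training or deployment environment, however, you can build a custom container image, register it in ECR, and use that environment.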

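<p>Training and the hosting service for inference can each run in a different container environment. Because container environments scale out easily, many training jobs and hosted endpoints can run at the same time.
</p>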

# Set up the environment

Let's start by setting up the environment:

In [1]:
import sys

In [2]:
!{sys.executable} -m pip install --upgrade pip
# !{sys.executable} -m pip install tensorflow-gpu==2.1.0
!{sys.executable} -m pip install tensorflow-datasets --use-feature=2020-resolver
!{sys.executable} -m pip install sagemaker-experiments
!{sys.executable} -m pip install smdebug
# !{sys.executable} -m pip install grpcio==1.24.3



In [3]:
import os
import time
import sagemaker
import boto3
import tensorflow_datasets as tfds
import tensorflow as tf
from PIL import Image

from sagemaker.tensorflow import TensorFlow
from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.analytics import ExperimentAnalytics

from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

from sagemaker.debugger import Rule, DebuggerHookConfig, TensorBoardOutputConfig, CollectionConfig, rule_configs

%matplotlib inline

In [4]:
sagemaker_session = sagemaker.Session()

role = get_execution_role()
region = sagemaker_session.boto_session.region_name

sess = boto3.Session()
sm = sess.client('sagemaker')

## Create an S3 bucket to hold data

In [5]:
# Create an S3 bucket to hold data; your account may already have a bucket with the same name
account_id = sess.client('sts').get_caller_identity()["Account"]
data_bucket = 'sagemaker-experiments-{}-{}'.format(sess.region_name, account_id)
bucket = 'sagemaker-{}-{}'.format(sess.region_name, account_id)
prefix = 'image_segmentation/oxford_iiit_pet/3.2.0'

try:
    if sess.region_name == "us-east-1":
        sess.client('s3').create_bucket(Bucket=data_bucket)
    else:
        sess.client('s3').create_bucket(Bucket=data_bucket, 
                                        CreateBucketConfiguration={'LocationConstraint': sess.region_name})
except Exception as e:
    print(e)

An error occurred (BucketAlreadyOwnedByYou) when calling the CreateBucket operation: Your previous request to create the named bucket succeeded and you already own it.


<p>The image data used for this training is the <strong><a href="https://www.robots.ox.ac.uk/~vgg/data/pets/" target="_blank" class ='btn-default'>Oxford-IIIT Pet Dataset</a></strong>. It provides about 200 images for each of <strong>37</strong> breeds of dogs and cats. The ground truth covers classification, object detection, and segmentation; in this exercise we train on a subset of the images across the 37 classes to solve a classification problem.</p>
<p><img src="./fig/pet_annotations.jpg" width="700" height="70"></p>

## Data Generator

In [6]:
# dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)
builder = tfds.builder('oxford_iiit_pet:3.*.*')
info = builder.info
print(info)
# register_checksums=True records the checksums of the downloaded files instead of validating them
config = tfds.download.DownloadConfig(register_checksums=True)
builder.download_and_prepare(download_config=config)
dataset = builder.as_dataset()

INFO:absl:Load dataset info from /home/ec2-user/tensorflow_datasets/oxford_iiit_pet/3.2.0
INFO:absl:Reusing dataset oxford_iiit_pet (/home/ec2-user/tensorflow_datasets/oxford_iiit_pet/3.2.0)
INFO:absl:Constructing tf.data.Dataset for split None, from /home/ec2-user/tensorflow_datasets/oxford_iiit_pet/3.2.0


tfds.core.DatasetInfo(
    name='oxford_iiit_pet',
    version=3.2.0,
    description='The Oxford-IIIT pet dataset is a 37 category pet image dataset with roughly 200
images for each class. The images have large variations in scale, pose and
lighting. All images have an associated ground truth annotation of breed.',
    homepage='http://www.robots.ox.ac.uk/~vgg/data/pets/',
    features=FeaturesDict({
        'file_name': Text(shape=(), dtype=tf.string),
        'image': Image(shape=(None, None, 3), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=37),
        'segmentation_mask': Image(shape=(None, None, 1), dtype=tf.uint8),
        'species': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    }),
    total_num_examples=7349,
    splits={
        'test': 3669,
        'train': 3680,
    },
    supervised_keys=('image', 'label'),
    citation="""@InProceedings{parkhi12a,
      author       = "Parkhi, O. M. and Vedaldi, A. and Zisserman, A. and Ja

### Upload dataset to S3

Next, we'll upload the TFRecord datasets to S3 so that we can use them in training and batch transform jobs.

In [7]:
inputs = "s3://{}/{}/".format(data_bucket, prefix)
!aws s3 cp /home/ec2-user/tensorflow_datasets/oxford_iiit_pet/3.2.0/ $inputs --recursive
print('input spec: {}'.format(inputs))


upload: ../../tensorflow_datasets/oxford_iiit_pet/3.2.0/image.image.json to s3://sagemaker-experiments-ap-northeast-2-322537213286/image_segmentation/oxford_iiit_pet/3.2.0/image.image.json
upload: ../../tensorflow_datasets/oxford_iiit_pet/3.2.0/dataset_info.json to s3://sagemaker-experiments-ap-northeast-2-322537213286/image_segmentation/oxford_iiit_pet/3.2.0/dataset_info.json
upload: ../../tensorflow_datasets/oxford_iiit_pet/3.2.0/label.labels.txt to s3://sagemaker-experiments-ap-northeast-2-322537213286/image_segmentation/oxford_iiit_pet/3.2.0/label.labels.txt
upload: ../../tensorflow_datasets/oxford_iiit_pet/3.2.0/oxford_iiit_pet-test.tfrecord-00003-of-00004 to s3://sagemaker-experiments-ap-northeast-2-322537213286/image_segmentation/oxford_iiit_pet/3.2.0/oxford_iiit_pet-test.tfrecord-00003-of-00004
upload: ../../tensorflow_datasets/oxford_iiit_pet/3.2.0/oxford_iiit_pet-test.tfrecord-00000-of-00004 to s3://sagemaker-experiments-ap-northeast-2-322537213286/image_segmentation/oxford_i

# Construct a script for training

Here is the entire script:

In [8]:
# TensorFlow 2.1 script (the entry point passed to the estimator below)
!pygmentize 'source_dir/image_segmentation_base.py'


# Create a training job using the `TensorFlow` estimator

The `sagemaker.tensorflow.TensorFlow` estimator handles locating the script mode container, uploading your script to an S3 location, and creating a SageMaker training job. Let's call out an important parameter here:

* `py_version` is set to `'py3'` to indicate that we are using script mode, since legacy mode supports only Python 2. Although Python 2 will soon be deprecated, you can use script mode with Python 2 by setting `py_version` to `'py2'` and `script_mode` to `True`.

### SageMaker Experiments
- Provides features for managing and tracking experiments
<center><img src="./fig/experiments_fig.png" width="900" height="700"></center>


- trial components: pre-processing jobs, training jobs, and batch transform jobs

#### Track an Experiment
- Use a tracker to record experiment information
- Either load an existing trial component (`Tracker.load`) or create a new one (`Tracker.create`)
- The example below logs the URI of the S3 bucket that holds the uploaded dataset, along with dataset-related information

In [9]:
## Dataset location
inputs = 's3://{}/{}'.format(data_bucket, prefix)
# inputs

#### Create an Experiment
- The top level entity as a collection of trials that are observed, compared, and evaluated as a group

In [10]:
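experiment_name = "experiments-v2"  # change to the experiment name you want

experiment_existed = True
try:
    experiment = sm.describe_experiment(ExperimentName=experiment_name)
except sm.exceptions.ResourceNotFound:
    # The experiment does not exist yet, so it is created below
    experiment_existed = False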

if not experiment_existed:
    experiment = Experiment.create(
        experiment_name=experiment_name, 
        description="Segmentation of skincare images", 
        sagemaker_boto_client=sm)
print(experiment)

{'ExperimentName': 'experiments-v2', 'ExperimentArn': 'arn:aws:sagemaker:ap-northeast-2:322537213286:experiment/experiments-v2', 'DisplayName': 'experiments-v2', 'Description': 'Segmentation of skincare images', 'CreationTime': datetime.datetime(2020, 9, 8, 23, 57, 39, 332000, tzinfo=tzlocal()), 'CreatedBy': {}, 'LastModifiedTime': datetime.datetime(2020, 12, 25, 2, 34, 33, 797000, tzinfo=tzlocal()), 'LastModifiedBy': {}, 'ResponseMetadata': {'RequestId': '2b66d77b-9de5-425e-a859-6ee9eacf4487', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '2b66d77b-9de5-425e-a859-6ee9eacf4487', 'content-type': 'application/x-amz-json-1.1', 'content-length': '307', 'date': 'Mon, 04 Jan 2021 14:28:07 GMT'}, 'RetryAttempts': 0}}


#### Create Trials
- Each trial represents a training run with a different set of hyperparameters.

In [11]:
trial_name = f"{int(time.time())}-{experiment_name}"
    
train_trial = Trial.create(
    trial_name=trial_name, 
    experiment_name=experiment_name,
    sagemaker_boto_client=sm,
)

with Tracker.create(display_name="dataset-info", sagemaker_boto_client=sm) as tracker:
    tracker.log_parameters({
        "dataset": "oxford_iiit_pet",
        "resize" : 128
    })
    # we can log the s3 uri to the dataset we just uploaded
    tracker.log_input(name="oxford_iiit_pet/3.2.0", media_type="s3/uri", value=inputs)

# associate the preprocessing trial component with the current trial
train_trial.add_trial_component(tracker.trial_component)

### SageMaker Debugger
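- Monitors, records, and analyzes the tensor data captured from a training job, giving more visibility into training
- Uses the smdebug library, which works as a two-step process
  1. Saving tensors (and scalars): tensors define the state of the training job at a given moment, and the library lets you save them so they can be captured and analyzed
  2. Analysis: the saved tensors are picked up by the pre-packaged rules and analyzed according to each rule's conditions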
<center><img src="https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/images/how-debugger-works-4.png" width="700" height="500"></center>
- hook_config definition: https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md

#### Create Hook

In [12]:
## Artifact locations
training_job_name = "{}-imgsegmentation-training-job".format(int(time.time()))

tensorboard_output= 's3://{}/{}/{}'.format(bucket, training_job_name, 'Tensorboard')
debugger_output_path = 's3://{}/{}/output/debug'.format(bucket, training_job_name)
print('input spec: \n{}  \n{}'.format(tensorboard_output,debugger_output_path))

input spec: 
s3://sagemaker-ap-northeast-2-322537213286/1609770489-imgsegmentation-training-job/Tensorboard  
s3://sagemaker-ap-northeast-2-322537213286/1609770489-imgsegmentation-training-job/output/debug


In [13]:
hook_config = DebuggerHookConfig(
    s3_output_path=debugger_output_path,
    hook_parameters={
        "save_interval": "40"
    },
    collection_configs=[
        CollectionConfig("weights"),
        CollectionConfig("biases"),
    ]
)

#### Define Rules

https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html

In [14]:
rules = [
        Rule.sagemaker(rule_configs.vanishing_gradient()),
        Rule.sagemaker(rule_configs.loss_not_decreasing()),
]

In [15]:
hyperparameters = {
        'OUTOUT_CHANNELS' : 3,
        'RESIZE_WIDTH' : 128,
        'RESIZE_HEIGHT' : 128,
        'EPOCHS' : 35,
        'VAL_SUBSPLITS': 5,
        'BATCH_SIZE': 32,
        'BUFFER_SIZE': 1000,
        'DATASET_NAME': 'oxford_iiit_pet',
        'SAVE_INTERVAL' : 3
    }

In [16]:
estimator = TensorFlow(entry_point='image_segmentation_base.py',
                       source_dir='source_dir',
                       role=role,
                       instance_count=1,
                       volume_size=400,
                       instance_type='ml.p3.16xlarge',  # comment out when running in local mode
#                        instance_type='local',  # use when running in local mode
#                        use_spot_instances=True,  # enable to use spot instances
                       max_run=12*60*60,         # maximum training time in seconds; also applies to spot instances
#                        max_wait=12*60*60,        # needed with spot instances; must be >= max_run
                       framework_version='2.1.0',
                       py_version='py3',
                       hyperparameters=hyperparameters,
                       tensorboard_output_config=TensorBoardOutputConfig(tensorboard_output),
                       rules = rules,
                       debugger_hook_config=hook_config,
                       metric_definitions=[
                             {'Name': 'train:loss', 'Regex': 'Train loss: ([0-9\\.]+)'},
                             {'Name': 'validation:loss', 'Regex': 'Val loss: ([0-9\\.]+)'}
                       ],
                       enable_sagemaker_metrics=True
                      )
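
To see what the `metric_definitions` regular expressions capture, the sketch below applies them to a made-up log line using Python's `re` module. The log line format is an assumption; the actual training script may print its losses differently, and `extract_metric` is a hypothetical helper.

```python
import re

# Regular expressions copied from metric_definitions above.
TRAIN_LOSS = r'Train loss: ([0-9\.]+)'
VAL_LOSS = r'Val loss: ([0-9\.]+)'


def extract_metric(pattern, line):
    """Return the first captured number in `line` as a float, or None if absent."""
    match = re.search(pattern, line)
    return float(match.group(1)) if match else None


# Hypothetical log line; the real script's output format may differ.
sample_line = "Epoch 3 - Train loss: 0.4213 - Val loss: 0.5120"
train_loss = extract_metric(TRAIN_LOSS, sample_line)
val_loss = extract_metric(VAL_LOSS, sample_line)
print(train_loss, val_loss)
```

Checking a pattern locally this way is a quick way to confirm that a metric will actually be populated before launching a long training job.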

## Calling ``fit``

To start a training job, we call `estimator.fit(training_data_uri)`.

An S3 location is used here as the input. `fit` creates a default channel named `'training'`, which points to this S3 location. In the training script we can then access the training data from the location stored in `SM_CHANNEL_TRAINING`. `fit` accepts a couple of other types of input as well. See the API doc [here](https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.EstimatorBase.fit) for details.

When training is complete, the training job will upload the saved model for TensorFlow serving.
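
As a rough illustration of how an entry point might pick up these locations, the sketch below reads `SM_CHANNEL_TRAINING` and `SM_MODEL_DIR` from the environment with `argparse`. It is not the training script used in this notebook: the argument names, the local fallback paths, and the `parse_args` helper are all assumptions, and the environment variable is set by hand only to simulate the training container.

```python
import argparse
import os


def parse_args(argv=None):
    """Read data and model locations the way a script-mode entry point might."""
    parser = argparse.ArgumentParser()
    # SageMaker sets SM_CHANNEL_TRAINING for the 'training' channel and
    # SM_MODEL_DIR for the model output directory. The fallbacks are
    # illustrative local paths, not values from this notebook.
    parser.add_argument('--train', default=os.environ.get('SM_CHANNEL_TRAINING', './data'))
    parser.add_argument('--model-dir', default=os.environ.get('SM_MODEL_DIR', './model'))
    # parse_known_args ignores hyperparameters this sketch does not declare.
    args, _ = parser.parse_known_args(argv)
    return args


# Simulate the variable that the training container would provide.
os.environ['SM_CHANNEL_TRAINING'] = '/opt/ml/input/data/training'
args = parse_args([])
print(args.train)
```

Falling back to local paths keeps the same script runnable outside SageMaker, which is convenient for debugging before submitting a job.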

In [17]:
estimator.fit(
    inputs = {'training': inputs},
    job_name=training_job_name,
    logs='All',
    experiment_config={
            "TrialName": train_trial.trial_name,
            "TrialComponentDisplayName": "Training",
        },
    wait=False
)

INFO:sagemaker:Creating training-job with name: 1609770489-imgsegmentation-training-job


ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: The account-level service limit 'ml.p3dn.24xlarge for training job usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit.

In [None]:
sagemaker_session.logs_for_job(estimator.latest_training_job.name, wait=True)

After training is complete, it is always a good idea to take a look at training curves to diagnose problems, if any, during training and determine the representativeness of the training and validation datasets. We can do this with TensorBoard, and also with the Keras API: conveniently, the Keras fit invocation returns a data structure with the training history. In our training script, this history is saved on the lead training node, then uploaded with the model when training is complete.
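
A training script could persist the history roughly as follows. This is a sketch under assumptions, not the script used here: the file name `model_history.p` matches the file this notebook loads later, but the output directory, the `save_history` helper, and the sample values are invented for illustration.

```python
import json
import os
import tempfile


def save_history(history_dict, output_dir, filename='model_history.p'):
    """Write a Keras-style history dict to `output_dir` as JSON and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    # Cast to float: Keras may report NumPy scalars, which json cannot serialize.
    serializable = {key: [float(v) for v in values] for key, values in history_dict.items()}
    with open(path, 'w') as f:
        json.dump(serializable, f)
    return path


# Made-up values standing in for `model.fit(...).history`.
fake_history = {'loss': [0.9, 0.6], 'accuracy': [0.55, 0.71]}
saved_path = save_history(fake_history, tempfile.mkdtemp())

with open(saved_path) as f:
    restored = json.load(f)
print(restored)
```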

To retrieve the history, we first download the training artifacts (`model.tar.gz` and `output.tar.gz`) locally, then extract them to gain access to the history file. We can then simply load the history as JSON:

In [None]:
artifacts_dir = estimator.model_data.replace('model.tar.gz', '')
print(artifacts_dir)
!aws s3 ls --human-readable {artifacts_dir}

In [None]:
!rm -rf ./artifacts/
!rm -rf ./models/

In [None]:
import json
import os

path = './models'
os.makedirs(path, exist_ok=True)

!aws s3 cp {artifacts_dir}model.tar.gz {path}/model.tar.gz
!tar -xzf {path}/model.tar.gz -C {path}

In [None]:
import json
import os

path = './artifacts'
os.makedirs(path, exist_ok=True)

!aws s3 cp {artifacts_dir}output.tar.gz {path}/output.tar.gz
!tar -xzf {path}/output.tar.gz -C {path}

with open(os.path.join(path, 'model_history.p'), "r") as f:
    model_history = json.load(f)

Now we can plot the history with two graphs, one for accuracy and another for loss. Each graph shows the results for both the training and validation datasets. Although training is a stochastic process that can vary significantly between training jobs, overall you are likely to see that the training curves are converging smoothly and steadily to higher accuracy and lower loss, while the validation curves are more jagged. This is due to the validation dataset being relatively small and thus not as representative as the training dataset.

In [None]:
import matplotlib.pyplot as plt

def plot_training_curves(history): 
    
    fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharex=True)
    ax = axes[0]
    ax.plot(history['accuracy'], label='train')
    ax.plot(history['val_accuracy'], label='validation')
    ax.set(
        title='model accuracy',
        ylabel='accuracy',
        xlabel='epoch')
    ax.legend()

    ax = axes[1]
    ax.plot(history['loss'], label='train')
    ax.plot(history['val_loss'], label='validation')
    ax.set(
        title='model loss',
        ylabel='loss',
        xlabel='epoch')
    ax.legend()
    fig.tight_layout()
    
plot_training_curves(model_history)

In [None]:
def display(display_list, title):
    plt.figure(figsize=(15, 15))
    
    for i in range(len(display_list)):
        plt.subplot(1, len(display_list), i+1)
        plt.title(title[i])
        plt.imshow(Image.open(display_list[i]))
        plt.axis('off')
    plt.show()

In [None]:
title = ['Input Image', 'True Mask' ,'Predicted Mask']

if hyperparameters['EPOCHS']%hyperparameters['SAVE_INTERVAL'] == 0:
    err = hyperparameters['SAVE_INTERVAL']
else:
    err = hyperparameters['EPOCHS']%hyperparameters['SAVE_INTERVAL']

last_epoch = hyperparameters['EPOCHS'] - err

sample_image = os.path.join(path, 'sample_image.jpg')
real_mask = os.path.join(path, 'sample_mask.png')
predicted_mask = os.path.join(path, str(last_epoch) +'-predicted_mask.png')
display([sample_image, real_mask, predicted_mask], title)


predict_imgs = []
titles = []

for i in range(0,last_epoch+1,hyperparameters['SAVE_INTERVAL']):
    predict_mask = os.path.join(path, str(i) +'-predicted_mask.png')
    predict_imgs.append(predict_mask)
    titles.append('epoch -' +str(i))
display(predict_imgs, titles)
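
The file names in the cell above imply that the training script writes a predicted mask at every epoch index divisible by `SAVE_INTERVAL`, counting from 0. Under that assumption (the script itself is not shown here), the saved epochs can be listed directly; `saved_epochs` is a hypothetical helper.

```python
def saved_epochs(epochs, save_interval):
    """Return the 0-based epoch indices that are multiples of save_interval."""
    return list(range(0, epochs, save_interval))


# Values taken from the hyperparameters dict: EPOCHS=35, SAVE_INTERVAL=3.
checkpoints = saved_epochs(35, 3)
last_saved = checkpoints[-1]
print(checkpoints, last_saved)
```

The last element agrees with the `last_epoch` arithmetic in the cell above, including the case where `EPOCHS` is an exact multiple of `SAVE_INTERVAL`.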

In [None]:
import time

description = sm.describe_training_job(TrainingJobName=estimator.latest_training_job.name)
# description = sm.describe_training_job(TrainingJobName=job_name)
print('Debug Hook configuration: ')
print(description['DebugHookConfig'])
print()
print('Debug rules configuration: ')
print(description['DebugRuleConfigurations'])
print()
print('Training job status')
print(description['TrainingJobStatus'])

In [None]:
sm_sess = sagemaker.Session()
sm_sess.logs_for_job(estimator.latest_training_job.name, wait=True)

In [None]:
import time


iterate = True
while iterate:
    description = sm.describe_training_job(TrainingJobName=estimator.latest_training_job.name)
    eval_status_1 = description['DebugRuleEvaluationStatuses'][0]
    eval_status_2 = description['DebugRuleEvaluationStatuses'][1]
    print(eval_status_1)
    print(eval_status_2)
    # Stop polling only once both rule evaluations have left the 'InProgress' state
    if eval_status_1['RuleEvaluationStatus'] != 'InProgress' and eval_status_2['RuleEvaluationStatus'] != 'InProgress':
        iterate = False
    else:
        time.sleep(60)
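
The same check can be written for any number of rules as a small pure function. The sketch below runs on a hand-written sample shaped like `description['DebugRuleEvaluationStatuses']`; the sample values and the `rules_still_running` helper are illustrative, not output from a real job.

```python
def rules_still_running(statuses):
    """Return the names of rules whose evaluation is still 'InProgress'."""
    return [
        status['RuleConfigurationName']
        for status in statuses
        if status['RuleEvaluationStatus'] == 'InProgress'
    ]


# Hand-written sample; a real response carries more fields per entry.
sample_statuses = [
    {'RuleConfigurationName': 'VanishingGradient', 'RuleEvaluationStatus': 'InProgress'},
    {'RuleConfigurationName': 'LossNotDecreasing', 'RuleEvaluationStatus': 'NoIssuesFound'},
]
pending = rules_still_running(sample_statuses)
print(pending)
```

A polling loop could then continue while this list is non-empty, which avoids indexing each rule by position.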

In [None]:
processing_job_arn = eval_status_1['RuleEvaluationJobArn']
processing_job_name = processing_job_arn[processing_job_arn.rfind('/') + 1 :]
print(processing_job_name)

client = sm_sess.sagemaker_client
descr = client.describe_processing_job(ProcessingJobName=processing_job_name)
descr

In [None]:
tensors_path = estimator.latest_job_debugger_artifacts_path()

import smdebug.trials as smd
trial = smd.create_trial(path=tensors_path)
print(f"Saved these tensors: {trial.tensor_names()}")

In [None]:
print(f"Validation accuracy values during evaluation were {trial.tensor('val_accuracy').values()}")



# Deploy the trained model to an endpoint

The `deploy()` method creates a SageMaker model, which is then deployed to an endpoint to serve prediction requests in real time. We will use the TensorFlow Serving container for the endpoint, because we trained with script mode. This serving container runs an implementation of a web server that is compatible with the SageMaker hosting protocol. The "Using your own inference code" document in the SageMaker documentation explains how SageMaker runs inference containers.

In [None]:
predictor = estimator.deploy(initial_instance_count=1, 
                             instance_type='ml.t2.medium', 
                             endpoint_name=training_job_name + '-t2medium')

# Invoke the endpoint (without inference.py)

Let's use an image from the downloaded training data as input for inference.

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt

In [None]:
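def input_handler(img_path):
    """Load an image and shape it into a (1, 128, 128, 3) float32 batch scaled to [0, 1]."""
    sample_img = cv2.imread(img_path)  # OpenCV loads images in BGR channel order
    sample_img = cv2.cvtColor(sample_img, cv2.COLOR_BGR2RGB)  # the TFDS training images are RGB
    sample_img = cv2.resize(sample_img, dsize=(128, 128), interpolation=cv2.INTER_CUBIC)
    sample_img = np.float32(sample_img)
    sample_img = np.expand_dims(sample_img, axis=0)
    sample_img = sample_img / 255.0
    return sample_img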

In [None]:
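def output_handler(pred_mask):
    """Convert the endpoint's per-pixel class scores into a displayable mask image."""
    import tensorflow as tf

    pred_mask = np.array(pred_mask)
    pred_mask = tf.argmax(pred_mask, axis=-1)  # most likely class for each pixel
    pred_mask = pred_mask[..., tf.newaxis]     # restore the channel dimension
    pred_mask = pred_mask[0]                   # first (only) image in the batch
    pred_mask = tf.keras.preprocessing.image.array_to_img(pred_mask)
    return pred_mask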

In [None]:
img_path = '/home/ec2-user/tensorflow_datasets/downloads/extracted/TAR.robots.ox.ac.uk_vgg_pets_imagesZxlcXhwB8atfm2pdIrjCelgNiW7ORYkX5h1Fkzf6MY0.tar.gz/images/Abyssinian_1.jpg'

In [None]:
Image.open(img_path)

In [None]:
%%time
sample_img = input_handler(img_path)
predictions = predictor.predict(sample_img)
pred_mask=output_handler(predictions['predictions'])
plt.imshow(pred_mask)
plt.axis('off')
plt.show()

The formats of the input and the output data correspond directly to the request and response formats of the `Predict` method in the [TensorFlow Serving REST API](https://www.tensorflow.org/serving/api_rest). SageMaker's TensorFlow Serving endpoints can also accept additional input formats that are not part of the TensorFlow REST API, including the simplified JSON format, line-delimited JSON objects ("jsons" or "jsonlines"), and CSV data.

In this example we are using a `numpy` array as input, which will be serialized into the simplified JSON format. In addition, TensorFlow Serving can process multiple items in a single request. You can find the complete documentation on how to make predictions against a TensorFlow Serving SageMaker endpoint [here](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst#making-predictions-against-a-sagemaker-endpoint).
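
A request carrying several items can be sketched with the standard library alone. The snippet below builds the row-format body (`{"instances": [...]}`) accepted by the TensorFlow Serving REST `Predict` method. The tiny 2x2 images and the helper name `build_predict_request` are placeholders invented for illustration; real inputs here would be 128x128 RGB arrays.

```python
import json


def build_predict_request(images):
    """Wrap a list of images (nested lists of floats) in the TF Serving row format."""
    return json.dumps({'instances': images})


# Two tiny 2x2 RGB "images" standing in for real preprocessed inputs.
image_a = [[[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]], [[0.6, 0.7, 0.8], [0.9, 1.0, 0.0]]]
image_b = [[[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]], [[0.5, 0.5, 0.5], [0.2, 0.2, 0.2]]]

body = build_predict_request([image_a, image_b])
payload = json.loads(body)
print(len(payload['instances']))
```

With the `predictor` object, the equivalent would presumably be to stack several preprocessed arrays along the first axis and pass the result to `predict`; the response then contains one prediction per item.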

# Invoke the endpoint (with inference.py)

In [None]:
import cv2
import numpy as np
import json
import io
import base64
import matplotlib.pyplot as plt

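# SageMaker Python SDK v2 exposes the serving model as TensorFlowModel; aliased to keep the name used below
from sagemaker.tensorflow import TensorFlowModel as Model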
%matplotlib inline

### Build a custom serving image
<p>The current TensorFlow Serving image does not include OpenCV, so you need to build a custom serving container with OpenCV installed. To do so, build a custom container image from the Dockerfile in the custom-serving-container folder.</p>
<p>Before building the image, create a repository in the ECR service of the region you are working in.
The steps below assume a repository named tensorflow200-opencv341-inference-eia in us-east-2.</p>

<p>The <strong><a href="https://github.com/aws/deep-learning-containers/blob/master/available_images.md" target="_blank" class ='btn-default'>Docker images</a></strong> provided by default are listed at the link.</p>

<p>Run the following commands in a terminal where docker and the <strong><a href="https://docs.aws.amazon.com/ko_kr/cli/latest/userguide/cli-chap-install.html" target="_blank" class ='btn-default'>aws cli</a></strong> are available.</p>
    
<pre>
<code>
   - $(aws ecr get-login --no-include-email --registry-ids 763104351884 --region us-east-2) 
   - docker build -f Dockerfile.eia -t tensorflow200-opencv341-inference-eia:2.0.0-cpu . 
   - $(aws ecr get-login --no-include-email)
   - docker image tag tensorflow200-opencv341-inference-eia:2.0.0-cpu [my-account-id].dkr.ecr.us-east-2.amazonaws.com/tensorflow200-opencv341-inference-eia:2.0.0-cpu 
   - docker push [my-account-id].dkr.ecr.us-east-2.amazonaws.com/tensorflow200-opencv341-inference-eia:2.0.0-cpu
</code>
</pre>

<p>Set container_image, training_job, and model_path to match your environment.</p>

In [None]:
container_image = 'XXXXXXXXXX.dkr.ecr.us-east-2.amazonaws.com/tensorflow200-opencv341-inference-eia:2.0.0-cpu'
training_job = '[TTTTTTTT]-SkinCare-training-job'
model_path='s3://sagemaker-us-east-2-XXXXXXXXXXXXXX/[TTTTTTTT]-training-job/output/model.tar.gz'

<p>Create the model and the endpoint as follows.</p>

In [None]:
# SDK v2 takes the container image as image_uri
model = Model(model_data=model_path, role=role, framework_version='2.0.0',
              entry_point='inference.py', source_dir='./source_dir', image_uri=container_image)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.t2.medium',
                         accelerator_type='ml.eia2.medium',
                         endpoint_name=training_job + '-t2me-eia2-invoke')

In [None]:
runtime_client = boto3.client('runtime.sagemaker')
accept_content_type='application/json'
input_content_type='application/x-image'

endpoint_name='1590218539-SkinCare-training-job-t2me-eia2-invoke'

## Image to test
image_path='/home/ec2-user/tensorflow_datasets/downloads/extracted/TAR.robots.ox.ac.uk_vgg_pets_imagesZxlcXhwB8atfm2pdIrjCelgNiW7ORYkX5h1Fkzf6MY0.tar.gz/images/Abyssinian_1.jpg'

with open(image_path, mode='rb') as file:
    img = file.read()

file_byte_string = base64.encodebytes(img).decode("utf-8")

response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    Accept=accept_content_type,
    ContentType=input_content_type,
    Body=file_byte_string
)

result = json.loads(response['Body'].read().decode())
image= io.BytesIO(base64.decodebytes(result.encode('utf-8')))
pred_mask = Image.open(image)

plt.imshow(pred_mask)
plt.axis('off')
plt.show()
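
The request and response bodies in the cell above are base64 text rather than raw bytes. The round trip can be checked offline with the sketch below, which uses placeholder bytes instead of a real image; the helper names are hypothetical, and how `inference.py` decodes the payload on the server side is not shown in this notebook.

```python
import base64


def encode_image_bytes(raw):
    """Encode raw image bytes as base64 text, as done for the request body."""
    return base64.encodebytes(raw).decode('utf-8')


def decode_image_text(text):
    """Decode base64 text back to raw bytes, as done for the response body."""
    return base64.decodebytes(text.encode('utf-8'))


# Placeholder bytes: a PNG signature followed by filler, not a valid image.
fake_png = b'\x89PNG\r\n\x1a\n' + bytes(range(64))
encoded = encode_image_bytes(fake_png)
decoded = decode_image_text(encoded)
print(decoded == fake_png)
```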

# Delete the endpoint

Let's delete the endpoint we just created to prevent incurring any extra costs.

In [None]:
# sagemaker.Session().delete_endpoint(predictor.endpoint_name)