# Hands-on: Training and deploying GluonNLP models on AWS SageMaker

In this hands-on session you will:

- fine-tune BERT for sentiment classification,
- export the model in a self-contained way, and
- create a SageMaker endpoint that serves your model.

In [1]:
# this notebook requires mxnet-cu101 >= 1.6.0b20191102, gluonnlp >= 0.8.1
# you can create a sagemaker notebook instance with the lifecycle configuration file: sagemaker-lifecycle.config
!pip list | grep mxnet
!pip list | grep gluonnlp

keras-mxnet                        2.2.4.2       
mxnet-cu101                        1.6.0b20191122
mxnet-model-server                 1.0.5         
gluonnlp                           0.9.0.dev0    


In [2]:
import argparse, time
import numpy as np
import mxnet as mx
import gluonnlp as nlp

# Hyperparameters
parser = argparse.ArgumentParser('BERT finetuning')
parser.add_argument('--batch_size', type=int, default=32)
parser.add_argument('--num_epochs', type=int, default=1)
parser.add_argument('--lr', type=float, default=5e-5)
args = parser.parse_args([])

batch_size = args.batch_size
num_epochs = args.num_epochs
lr = args.lr

### Get Pre-trained BERT Model

We can easily load the pre-trained BERT model with the GluonNLP model API, which returns the vocabulary along with the model. The pooler layer of the pre-trained model is kept (`use_pooler` defaults to `True`), while the decoder and classifier, which are only needed for pre-training, are disabled.
The list of pre-trained BERT models available in GluonNLP can be found [here](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html).

In [3]:
ctx = mx.gpu(0)
bert, vocabulary = nlp.model.get_model('bert_12_768_12', # the 12-layer BERT Base model
                                        dataset_name='book_corpus_wiki_en_uncased',
                                        # use pre-trained weights
                                        pretrained=True, ctx=ctx,
                                        # decoder and classifier are for pre-training only
                                        use_decoder=False, use_classifier=False)

Now that we have loaded the BERT model, we only need to attach an additional layer for classification.
The `BERTClassifier` class uses a BERT base model to encode the sentence representation, followed by an `nn.Dense` layer for classification. Only the classification layer needs to be initialized; the encoding layers already carry the pre-trained weights.

In [4]:
net = nlp.model.BERTClassifier(bert, num_classes=2)
net.classifier.initialize(ctx=ctx)  # only initialize the classification layer from scratch
net.hybridize()  # compile the model, required for deployment

## Data Preprocessing

To use the pre-trained BERT model, we need to:
- tokenize the inputs into words,
- insert [CLS] at the beginning of a sentence, 
- insert [SEP] at the end of a sentence, and
- generate segment ids
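
The four steps above can be sketched in plain Python. This is only an illustration under simplifying assumptions: the vocabulary `toy_vocab` and the helper `encode_single_sentence` are hypothetical, and whitespace splitting stands in for the WordPiece tokenization that a real BERT tokenizer performs.

```python
# Hypothetical toy vocabulary; a real BERT vocabulary is far larger.
toy_vocab = {'[PAD]': 0, '[UNK]': 1, '[CLS]': 2, '[SEP]': 3,
             'this': 4, 'movie': 5, 'is': 6, 'great': 7}


def encode_single_sentence(text, vocab, max_seq_length=128):
    """Return (token ids, valid length, segment ids) for one sentence."""
    # Step 1: tokenize (whitespace splitting stands in for WordPiece here).
    tokens = text.lower().split()
    # Leave room for the two special tokens.
    tokens = tokens[:max_seq_length - 2]
    # Steps 2 and 3: wrap the sentence in [CLS] ... [SEP].
    tokens = ['[CLS]'] + tokens + ['[SEP]']
    # Vocabulary lookup; unknown words map to [UNK].
    token_ids = [vocab.get(tok, vocab['[UNK]']) for tok in tokens]
    # Step 4: a single sentence uses segment id 0 for every position.
    segment_ids = [0] * len(token_ids)
    return token_ids, len(token_ids), segment_ids


token_ids, valid_length, segment_ids = encode_single_sentence(
    'This movie is great', toy_vocab)
print(token_ids)
print(valid_length)
print(segment_ids)
```

The three returned values mirror the data, valid length, and segment type that the model consumes later in this notebook.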

### Data Transformations

We again use the IMDB dataset, this time downloading it with the GluonNLP data API. We then use the dataset's transform API to turn the raw review scores into positive and negative labels.
To process sentences with the BERT-style '[CLS]' and '[SEP]' tokens, you can use the `data.BERTSentenceTransform` API.

In [5]:
train_dataset_raw = nlp.data.IMDB('train')
test_dataset_raw = nlp.data.IMDB('test')

# tokenize texts into words
tokenizer = nlp.data.BERTTokenizer(vocabulary)
# add begin-of-sentence, end-of-sentence tokens and perform vocabulary lookup
transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=128, pair=False, pad=False)

def transform_fn(data):
    # transform texts to tensors
    text, label = data
    # transform label into positive / negative
    label = 1 if label >= 5 else 0
    data, length, segment_type = transform([text])
    return data.astype('float32'), length.astype('float32'), segment_type.astype('float32'), label

In [6]:
train_dataset = train_dataset_raw.transform(transform_fn)
test_dataset = test_dataset_raw.transform(transform_fn)

data, length, _, label = train_dataset[0]
print('original sentence = \n{}'.format(train_dataset_raw[0][0]))
print('\nword indices = \n{}'.format(data.astype('int32')))

original sentence = 
Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!

word indices = 
[    2 22953  2213  4381  2152  2003  1037  9476  4038  1012  2009  2743
  2012  1996  2168  2051  2004  2070  2060  3454  2055  2082  2166  1010
  2107  2

### Let's Train the Model

Now we have all the pieces in place and can finally start fine-tuning the
model for a few epochs.

In [7]:
padding_id = vocabulary[vocabulary.padding_token]
batchify_fn = nlp.data.batchify.Tuple(
        nlp.data.batchify.Pad(axis=0, pad_val=padding_id), # words
        nlp.data.batchify.Stack(), # valid length
        nlp.data.batchify.Pad(axis=0, pad_val=0), # segment type
        nlp.data.batchify.Stack(np.float32)) # label

train_data = mx.gluon.data.DataLoader(train_dataset,
                               batchify_fn=batchify_fn, shuffle=True,
                               batch_size=batch_size, num_workers=4)
test_data = mx.gluon.data.DataLoader(test_dataset,
                              batchify_fn=batchify_fn,
                              shuffle=False, batch_size=batch_size, num_workers=4)
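
The batchify function pads the word indices and segment types to the longest sequence in each batch and stacks the valid lengths and labels. A minimal pure-Python sketch of that idea follows; `pad_batch` is a hypothetical helper written for illustration, not the GluonNLP implementation, and it returns nested lists rather than NDArrays.

```python
def pad_batch(samples, pad_val=0):
    """Pad (token_ids, valid_length, segment_ids, label) samples to equal length."""
    max_len = max(len(token_ids) for token_ids, _, _, _ in samples)
    words, valid_lengths, segments, labels = [], [], [], []
    for token_ids, valid_length, segment_ids, label in samples:
        n_pad = max_len - len(token_ids)
        # Word indices are padded with the padding token id.
        words.append(list(token_ids) + [pad_val] * n_pad)
        # Segment types are padded with 0.
        segments.append(list(segment_ids) + [0] * n_pad)
        # Valid lengths and labels are scalars, so they are simply stacked.
        valid_lengths.append(valid_length)
        labels.append(float(label))
    return words, valid_lengths, segments, labels


# Two toy samples of different length; pad_val=1 is an arbitrary choice here.
samples = [([2, 4, 5, 3], 4, [0, 0, 0, 0], 1),
           ([2, 6, 3], 3, [0, 0, 0], 0)]
words, valid_lengths, segments, labels = pad_batch(samples, pad_val=1)
print(words)
print(valid_lengths)
```

Because the valid lengths are kept alongside the padded sequences, the model can ignore the padded positions.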

In [8]:
from mxnet.gluon.contrib.estimator import TrainBegin, BatchBegin, LoggingHandler


class MyLearningRateHandler(TrainBegin, BatchBegin):
    """Warm-up learning rate handler.

    Parameters
    ----------
    trainer: gluon.Trainer
        Trainer object to adjust the learning rate on.
    num_warmup_steps: int
        Number of initial steps during which the learning rate is linearly
        increased to its target.
    num_train_steps: int
        Total number of steps to be taken during training. Should be equal to
        the number of batches * number of epochs.
    lr: float
        Base learning rate to reach after warmup.
    """

    def __init__(self, trainer, num_warmup_steps, num_train_steps, lr):
        self.trainer = trainer
        self.num_warmup_steps = num_warmup_steps
        self.num_train_steps = num_train_steps
        self.lr = lr

        self.step_num = 0

    def train_begin(self, estimator, *args, **kwargs):
        self.step_num = 0

    def batch_begin(self, estimator, *args, **kwargs):
        self.step_num += 1
        if self.step_num < self.num_warmup_steps:
            new_lr = self.lr * self.step_num / self.num_warmup_steps
        else:
            non_warmup_steps = self.step_num - self.num_warmup_steps
            offset = non_warmup_steps / (self.num_train_steps - self.num_warmup_steps)
            new_lr = self.lr - offset * self.lr
        self.trainer.set_learning_rate(new_lr)
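
To see the shape of the schedule without running the estimator, the same arithmetic can be written as a standalone function. This is a sketch: `warmup_linear_decay` is a hypothetical name, and the total of 750 steps is an assumed value chosen for illustration only.

```python
def warmup_linear_decay(step_num, num_warmup_steps, num_train_steps, base_lr):
    """Linear warm-up to base_lr, then linear decay to zero."""
    if step_num < num_warmup_steps:
        # Warm-up phase: scale the base rate by the fraction of warm-up done.
        return base_lr * step_num / num_warmup_steps
    # Decay phase: subtract the fraction of remaining training already used.
    non_warmup_steps = step_num - num_warmup_steps
    offset = non_warmup_steps / (num_train_steps - num_warmup_steps)
    return base_lr - offset * base_lr


# Assumed values for illustration: 50 warm-up steps out of 750 total steps.
for step in (1, 25, 50, 400, 750):
    print(step, warmup_linear_decay(step, 50, 750, 5e-5))
```

The rate rises linearly during the first 50 steps, peaks at the base rate, and reaches zero at the final step.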

In [9]:
from mxnet.gluon.contrib import estimator
from mxnet.gluon.utils import split_and_load

class MyEstimator(estimator.Estimator):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # params for grad clipping
        self.params = [p for p in self.net.collect_params().values() if p.grad_req != 'null']
        
    def fit_batch(self, train_batch, batch_axis=0):
        train_batch = [split_and_load(x, ctx_list=self.context, batch_axis=batch_axis) for x in train_batch]
        with mx.autograd.record():
            pred = [self.net(inp, token_type, seq_len) for inp, seq_len, token_type, _ in zip(*train_batch)]
            loss = [self.loss(out, label.astype('float32')) for out, _, _, _, label in zip(pred, *train_batch)]
        mx.autograd.backward(loss)

        # Gradient clipping
        self.trainer.allreduce_grads()
        nlp.utils.clip_grad_global_norm(self.params, 1)
        self.trainer.update(1)

        return train_batch[:3], train_batch[3], pred, loss

    def evaluate_batch(self, val_batch, val_metrics, batch_axis=0):
        val_batch = [split_and_load(x, ctx_list=self.context, batch_axis=batch_axis) for x in val_batch]
        pred = [self.net(inp, token_type, seq_len) for inp, seq_len, token_type, _ in zip(*val_batch)]
        label = [l for _, _, _, l in zip(*val_batch)]
        # update metrics
        for metric in val_metrics:
            metric.update(label, pred)
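
The gradient clipping step in `fit_batch` rescales all gradients together so that their joint L2 norm does not exceed a threshold. A minimal sketch of that idea on plain Python lists is shown below; `clip_by_global_norm` is a hypothetical helper for illustration and is not the GluonNLP implementation, which operates on NDArrays in place.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most max_norm."""
    # The global norm treats all gradients as one long concatenated vector.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Only shrink the gradients; never scale them up.
    scale = max_norm / total_norm if total_norm > max_norm else 1.0
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, total_norm = clip_by_global_norm(grads, max_norm=1.0)
print(total_norm)
print(clipped)
```

Scaling every gradient by the same factor preserves the direction of the overall update while bounding its size.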

In [10]:
trainer = mx.gluon.Trainer(net.collect_params(), 'bertadam',
                        {'learning_rate': lr, 'wd':0.01})
loss_fn = mx.gluon.loss.SoftmaxCELoss()
metrics = [mx.metric.Loss(), mx.metric.Accuracy()]
lr_handler = MyLearningRateHandler(trainer=trainer, num_warmup_steps=50, lr=lr,
                                   num_train_steps = len(train_data) * num_epochs)
logging_handler = LoggingHandler(train_metrics=metrics, verbose=LoggingHandler.LOG_PER_BATCH)
event_handlers = [lr_handler, logging_handler]

est = MyEstimator(net=net, loss=loss_fn, metrics=metrics, trainer=trainer, context=ctx)
est.fit(train_data=train_data, epochs=num_epochs, event_handlers=event_handlers)

Training begin: using optimizer BERTAdam with current learning rate 0.0001 
Train for 1 epochs.
[Epoch 0] Begin, current learning rate: 0.0001
[Epoch 0][Batch 0][Samples 32] time/batch: 4.138s training loss: 0.7866, training accuracy: 0.4062
[Epoch 0][Batch 1][Samples 64] time/batch: 0.277s training loss: 0.7901, training accuracy: 0.4062
[Epoch 0][Batch 2][Samples 96] time/batch: 0.265s training loss: 0.7752, training accuracy: 0.4167
[Epoch 0][Batch 3][Samples 128] time/batch: 0.254s training loss: 0.7569, training accuracy: 0.4531
[Epoch 0][Batch 4][Samples 160] time/batch: 0.261s training loss: 0.7446, training accuracy: 0.4625
[Epoch 0][Batch 5][Samples 192] time/batch: 0.256s training loss: 0.7480, training accuracy: 0.4167
[Epoch 0][Batch 6][Samples 224] time/batch: 0.264s training loss: 0.7369, training accuracy: 0.4375
[Epoch 0][Batch 7][Samples 256] time/batch: 0.254s training loss: 0.7301, training accuracy: 0.4531
[Epoch 0][Batch 8][Samples 288] time/batch: 0.270s training 

[Epoch 0][Batch 80][Samples 2592] time/batch: 0.264s training loss: 0.5644, training accuracy: 0.7022
[Epoch 0][Batch 81][Samples 2624] time/batch: 0.258s training loss: 0.5633, training accuracy: 0.7031
[Epoch 0][Batch 82][Samples 2656] time/batch: 0.253s training loss: 0.5606, training accuracy: 0.7048
[Epoch 0][Batch 83][Samples 2688] time/batch: 0.270s training loss: 0.5606, training accuracy: 0.7057
[Epoch 0][Batch 84][Samples 2720] time/batch: 0.252s training loss: 0.5587, training accuracy: 0.7074
[Epoch 0][Batch 85][Samples 2752] time/batch: 0.262s training loss: 0.5600, training accuracy: 0.7082
[Epoch 0][Batch 86][Samples 2784] time/batch: 0.267s training loss: 0.5579, training accuracy: 0.7101
[Epoch 0][Batch 87][Samples 2816] time/batch: 0.265s training loss: 0.5571, training accuracy: 0.7113
[Epoch 0][Batch 88][Samples 2848] time/batch: 0.259s training loss: 0.5572, training accuracy: 0.7117
[Epoch 0][Batch 89][Samples 2880] time/batch: 0.254s training loss: 0.5550, traini

[Epoch 0][Batch 160][Samples 5152] time/batch: 0.257s training loss: 0.4869, training accuracy: 0.7663
[Epoch 0][Batch 161][Samples 5184] time/batch: 0.264s training loss: 0.4866, training accuracy: 0.7662
[Epoch 0][Batch 162][Samples 5216] time/batch: 0.269s training loss: 0.4873, training accuracy: 0.7659
[Epoch 0][Batch 163][Samples 5248] time/batch: 0.257s training loss: 0.4873, training accuracy: 0.7656
[Epoch 0][Batch 164][Samples 5280] time/batch: 0.260s training loss: 0.4864, training accuracy: 0.7657
[Epoch 0][Batch 165][Samples 5312] time/batch: 0.254s training loss: 0.4872, training accuracy: 0.7654
[Epoch 0][Batch 166][Samples 5344] time/batch: 0.256s training loss: 0.4873, training accuracy: 0.7657
[Epoch 0][Batch 167][Samples 5376] time/batch: 0.255s training loss: 0.4858, training accuracy: 0.7666
[Epoch 0][Batch 168][Samples 5408] time/batch: 0.259s training loss: 0.4853, training accuracy: 0.7666
[Epoch 0][Batch 169][Samples 5440] time/batch: 0.254s training loss: 0.48

[Epoch 0][Batch 240][Samples 7712] time/batch: 0.257s training loss: 0.4553, training accuracy: 0.7841
[Epoch 0][Batch 241][Samples 7744] time/batch: 0.262s training loss: 0.4551, training accuracy: 0.7842
[Epoch 0][Batch 242][Samples 7776] time/batch: 0.254s training loss: 0.4549, training accuracy: 0.7847
[Epoch 0][Batch 243][Samples 7808] time/batch: 0.267s training loss: 0.4546, training accuracy: 0.7850
[Epoch 0][Batch 244][Samples 7840] time/batch: 0.266s training loss: 0.4550, training accuracy: 0.7846
[Epoch 0][Batch 245][Samples 7872] time/batch: 0.269s training loss: 0.4547, training accuracy: 0.7849
[Epoch 0][Batch 246][Samples 7904] time/batch: 0.266s training loss: 0.4542, training accuracy: 0.7852
[Epoch 0][Batch 247][Samples 7936] time/batch: 0.260s training loss: 0.4536, training accuracy: 0.7857
[Epoch 0][Batch 248][Samples 7968] time/batch: 0.270s training loss: 0.4531, training accuracy: 0.7860
[Epoch 0][Batch 249][Samples 8000] time/batch: 0.260s training loss: 0.45

[Epoch 0][Batch 320][Samples 10272] time/batch: 0.259s training loss: 0.4295, training accuracy: 0.7996
[Epoch 0][Batch 321][Samples 10304] time/batch: 0.265s training loss: 0.4292, training accuracy: 0.7997
[Epoch 0][Batch 322][Samples 10336] time/batch: 0.264s training loss: 0.4293, training accuracy: 0.7995
[Epoch 0][Batch 323][Samples 10368] time/batch: 0.259s training loss: 0.4288, training accuracy: 0.7999
[Epoch 0][Batch 324][Samples 10400] time/batch: 0.255s training loss: 0.4282, training accuracy: 0.8003
[Epoch 0][Batch 325][Samples 10432] time/batch: 0.257s training loss: 0.4279, training accuracy: 0.8006
[Epoch 0][Batch 326][Samples 10464] time/batch: 0.259s training loss: 0.4275, training accuracy: 0.8007
[Epoch 0][Batch 327][Samples 10496] time/batch: 0.267s training loss: 0.4277, training accuracy: 0.8009
[Epoch 0][Batch 328][Samples 10528] time/batch: 0.261s training loss: 0.4274, training accuracy: 0.8011
[Epoch 0][Batch 329][Samples 10560] time/batch: 0.261s training 

[Epoch 0][Batch 399][Samples 12800] time/batch: 0.263s training loss: 0.4121, training accuracy: 0.8107
[Epoch 0][Batch 400][Samples 12832] time/batch: 0.259s training loss: 0.4118, training accuracy: 0.8108
[Epoch 0][Batch 401][Samples 12864] time/batch: 0.262s training loss: 0.4113, training accuracy: 0.8111
[Epoch 0][Batch 402][Samples 12896] time/batch: 0.259s training loss: 0.4111, training accuracy: 0.8112
[Epoch 0][Batch 403][Samples 12928] time/batch: 0.266s training loss: 0.4107, training accuracy: 0.8113
[Epoch 0][Batch 404][Samples 12960] time/batch: 0.259s training loss: 0.4104, training accuracy: 0.8114
[Epoch 0][Batch 405][Samples 12992] time/batch: 0.261s training loss: 0.4102, training accuracy: 0.8114
[Epoch 0][Batch 406][Samples 13024] time/batch: 0.256s training loss: 0.4100, training accuracy: 0.8115
[Epoch 0][Batch 407][Samples 13056] time/batch: 0.262s training loss: 0.4096, training accuracy: 0.8117
[Epoch 0][Batch 408][Samples 13088] time/batch: 0.258s training 

[Epoch 0][Batch 478][Samples 15328] time/batch: 0.262s training loss: 0.3992, training accuracy: 0.8180
[Epoch 0][Batch 479][Samples 15360] time/batch: 0.257s training loss: 0.3987, training accuracy: 0.8182
[Epoch 0][Batch 480][Samples 15392] time/batch: 0.258s training loss: 0.3985, training accuracy: 0.8183
[Epoch 0][Batch 481][Samples 15424] time/batch: 0.260s training loss: 0.3980, training accuracy: 0.8185
[Epoch 0][Batch 482][Samples 15456] time/batch: 0.258s training loss: 0.3978, training accuracy: 0.8186
[Epoch 0][Batch 483][Samples 15488] time/batch: 0.269s training loss: 0.3978, training accuracy: 0.8186
[Epoch 0][Batch 484][Samples 15520] time/batch: 0.259s training loss: 0.3980, training accuracy: 0.8187
[Epoch 0][Batch 485][Samples 15552] time/batch: 0.257s training loss: 0.3977, training accuracy: 0.8189
[Epoch 0][Batch 486][Samples 15584] time/batch: 0.261s training loss: 0.3972, training accuracy: 0.8190
[Epoch 0][Batch 487][Samples 15616] time/batch: 0.262s training 

[Epoch 0][Batch 557][Samples 17856] time/batch: 0.256s training loss: 0.3891, training accuracy: 0.8236
[Epoch 0][Batch 558][Samples 17888] time/batch: 0.258s training loss: 0.3891, training accuracy: 0.8237
[Epoch 0][Batch 559][Samples 17920] time/batch: 0.256s training loss: 0.3890, training accuracy: 0.8238
[Epoch 0][Batch 560][Samples 17952] time/batch: 0.264s training loss: 0.3889, training accuracy: 0.8239
[Epoch 0][Batch 561][Samples 17984] time/batch: 0.256s training loss: 0.3888, training accuracy: 0.8238
[Epoch 0][Batch 562][Samples 18016] time/batch: 0.260s training loss: 0.3890, training accuracy: 0.8239
[Epoch 0][Batch 563][Samples 18048] time/batch: 0.259s training loss: 0.3887, training accuracy: 0.8240
[Epoch 0][Batch 564][Samples 18080] time/batch: 0.262s training loss: 0.3884, training accuracy: 0.8240
[Epoch 0][Batch 565][Samples 18112] time/batch: 0.257s training loss: 0.3884, training accuracy: 0.8240
[Epoch 0][Batch 566][Samples 18144] time/batch: 0.256s training 

[Epoch 0][Batch 636][Samples 20384] time/batch: 0.256s training loss: 0.3795, training accuracy: 0.8283
[Epoch 0][Batch 637][Samples 20416] time/batch: 0.260s training loss: 0.3794, training accuracy: 0.8284
[Epoch 0][Batch 638][Samples 20448] time/batch: 0.256s training loss: 0.3789, training accuracy: 0.8286
[Epoch 0][Batch 639][Samples 20480] time/batch: 0.261s training loss: 0.3788, training accuracy: 0.8288
[Epoch 0][Batch 640][Samples 20512] time/batch: 0.258s training loss: 0.3785, training accuracy: 0.8289
[Epoch 0][Batch 641][Samples 20544] time/batch: 0.276s training loss: 0.3782, training accuracy: 0.8290
[Epoch 0][Batch 642][Samples 20576] time/batch: 0.271s training loss: 0.3781, training accuracy: 0.8291
[Epoch 0][Batch 643][Samples 20608] time/batch: 0.256s training loss: 0.3781, training accuracy: 0.8290
[Epoch 0][Batch 644][Samples 20640] time/batch: 0.258s training loss: 0.3778, training accuracy: 0.8291
[Epoch 0][Batch 645][Samples 20672] time/batch: 0.261s training 

[Epoch 0][Batch 715][Samples 22912] time/batch: 0.259s training loss: 0.3683, training accuracy: 0.8340
[Epoch 0][Batch 716][Samples 22944] time/batch: 0.262s training loss: 0.3680, training accuracy: 0.8342
[Epoch 0][Batch 717][Samples 22976] time/batch: 0.256s training loss: 0.3678, training accuracy: 0.8344
[Epoch 0][Batch 718][Samples 23008] time/batch: 0.260s training loss: 0.3678, training accuracy: 0.8344
[Epoch 0][Batch 719][Samples 23040] time/batch: 0.261s training loss: 0.3679, training accuracy: 0.8344
[Epoch 0][Batch 720][Samples 23072] time/batch: 0.266s training loss: 0.3678, training accuracy: 0.8343
[Epoch 0][Batch 721][Samples 23104] time/batch: 0.258s training loss: 0.3679, training accuracy: 0.8342
[Epoch 0][Batch 722][Samples 23136] time/batch: 0.278s training loss: 0.3677, training accuracy: 0.8343
[Epoch 0][Batch 723][Samples 23168] time/batch: 0.261s training loss: 0.3675, training accuracy: 0.8344
[Epoch 0][Batch 724][Samples 23200] time/batch: 0.260s training 

### Validation and Inference

In [11]:
val_metric = mx.metric.Accuracy()
est.evaluate(test_data, val_metrics=[val_metric])
print('Validation {} = {}'.format(*val_metric.get()))

Validation accuracy = 0.879


In [12]:
def predict_sentiment(net, ctx, transform, sentence):
    ctx = ctx[0] if isinstance(ctx, list) else ctx
    inputs, seq_len, token_types = transform([sentence])
    inputs = mx.nd.array([inputs], ctx=ctx)
    token_types = mx.nd.array([token_types], ctx=ctx)
    seq_len = mx.nd.array([seq_len], ctx=ctx)
    out = net(inputs, token_types, seq_len)
    label = mx.nd.argmax(out, axis=1)
    return 'positive' if label.asscalar() == 1 else 'negative'

In [13]:
predict_sentiment(net, ctx, transform, 'this movie is so great')

'positive'

## Deploy on SageMaker

Deploying the model on SageMaker takes four steps:

1. Save the model parameters
2. Write the code for data pre-processing and model inference
3. Build a Docker container with the dependencies installed
4. Launch a serving endpoint with the SageMaker SDK

### 1. Save Model Parameters

In [14]:
# save parameters, model definition and vocabulary in a tar.gz archive
net.export('checkpoint')
with open('vocab.json', 'w') as f:
    f.write(vocabulary.to_json())
import tarfile
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("checkpoint-0000.params") 
    tar.add("checkpoint-symbol.json") 
    tar.add("vocab.json")
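
Before uploading the archive, it can be worth checking that it contains the three files the inference code expects. The sketch below uses only the standard library; `missing_from_archive` is a hypothetical helper, and the demonstration builds a throwaway archive of empty placeholder files rather than touching the real `model.tar.gz`.

```python
import os
import tarfile
import tempfile

EXPECTED = {'checkpoint-0000.params', 'checkpoint-symbol.json', 'vocab.json'}


def missing_from_archive(path, expected=EXPECTED):
    """Return the expected file names that are absent from a .tar.gz archive."""
    with tarfile.open(path, 'r:gz') as tar:
        return sorted(set(expected) - set(tar.getnames()))


# Demonstration on a temporary archive with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED:
        open(os.path.join(tmp, name), 'w').close()
    archive = os.path.join(tmp, 'model.tar.gz')
    with tarfile.open(archive, 'w:gz') as tar:
        for name in sorted(EXPECTED):
            # arcname stores the file at the archive root, as in the cell above.
            tar.add(os.path.join(tmp, name), arcname=name)
    missing = missing_from_archive(archive)
    missing_extra = missing_from_archive(archive, EXPECTED | {'extra.json'})
print(missing)
```

An empty result means every expected file is present at the root of the archive.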

### 2. The Code for Inference

The inference script defines two functions:
1. model_fn() to load model parameters
2. transform_fn() to run model inference given an input
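
The way the hosting service uses these two functions can be simulated locally with a stand-in model. The sketch below is an illustration under stated assumptions: the rule-based `toy_model` and the directory path are hypothetical, and no MXNet or SageMaker code is involved. It only shows the call order (load once, then transform each request) and the JSON round trip.

```python
import json


def model_fn(model_dir):
    """Stand-in loader: returns a rule-based 'model' instead of reading model_dir."""
    def toy_model(sentence):
        return 'positive' if 'great' in sentence.lower() else 'negative'
    return toy_model


def transform_fn(model, data, input_content_type, output_content_type):
    """Decode the JSON request, run the model, and encode the JSON response."""
    sentence = json.loads(data)
    return json.dumps(model(sentence)), output_content_type


# Simulate the hosting service: load the model once, then serve a request.
model = model_fn('/hypothetical/model/dir')
body, content_type = transform_fn(model, json.dumps('this movie is so great'),
                                  'application/json', 'application/json')
print(body, content_type)
```

Whatever `model_fn` returns is passed unchanged as the first argument of `transform_fn`, which is why the real script below can return a tuple and unpack it per request.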

In [15]:
%%writefile serve.py
import json, logging, warnings
import gluonnlp as nlp
import mxnet as mx


def model_fn(model_dir):
    """
    Load the Gluon model. Called once when the hosting service starts.
    :param: model_dir The directory where model files are stored.
    :return: the Gluon model, the vocabulary, and the sentence transform
    """
    # resolve all files relative to model_dir instead of the working directory
    prefix = '%s/checkpoint' % model_dir
    net = mx.gluon.nn.SymbolBlock.imports(prefix + '-symbol.json',
                                          ['data0', 'data1', 'data2'],
                                          prefix + '-0000.params',
                                          ctx=mx.cpu())
    # close the vocabulary file as soon as it has been read
    with open('%s/vocab.json' % model_dir) as f:
        vocab = nlp.Vocab.from_json(f.read())
    tokenizer = nlp.data.BERTTokenizer(vocab)
    transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=128,
                                               pair=False, pad=False)
    return net, vocab, transform


def transform_fn(model, data, input_content_type, output_content_type):
    """
    Transform a request using the Gluon model. Called once per request.
    :param model: The Gluon model, the vocab and the transform returned by model_fn
    :param data: The request payload.
    :param input_content_type: The request content type.
    :param output_content_type: The (desired) response content type.
    :return: response payload and content type.
    """
    # we can use content types to vary input/output handling, but
    # here we just assume json for both
    net, vocabulary, transform = model
    sentence = json.loads(data)
    result = predict_sentiment(net, mx.cpu(), transform, sentence)
    response_body = json.dumps(result)
    return response_body, output_content_type


def predict_sentiment(net, ctx, transform, sentence):
    ctx = ctx[0] if isinstance(ctx, list) else ctx
    inputs, seq_len, token_types = transform([sentence])
    inputs = mx.nd.array([inputs], ctx=ctx)
    token_types = mx.nd.array([token_types], ctx=ctx)
    seq_len = mx.nd.array([seq_len], ctx=ctx)
    out = net(inputs, token_types, seq_len)
    label = mx.nd.argmax(out, axis=1)
    return 'positive' if label.asscalar() == 1 else 'negative'

Overwriting serve.py


### 3. Build a Docker Container for Serving

Let's prepare a Docker container with all the dependencies required for model inference. We build it on top of the SageMaker MXNet inference container; the list of all available inference containers is at https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html

Here we use local mode for demonstration purposes. To deploy on actual instances, you need to log in to the Amazon Elastic Container Registry (ECR) service and push the container to ECR.

```
docker build -t $YOUR_ECR_DOCKER_TAG . -f Dockerfile
$(aws ecr get-login --no-include-email --region $YOUR_REGION)
docker push $YOUR_ECR_DOCKER_TAG
```

In [16]:
%%writefile Dockerfile

ARG REGION
FROM 763104351884.dkr.ecr.$REGION.amazonaws.com/mxnet-inference:1.4.1-gpu-py3

RUN pip install --upgrade --user --pre 'mxnet-mkl' 'https://github.com/dmlc/gluon-nlp/tarball/v0.9.x'

RUN pip list | grep mxnet

COPY *.py /opt/ml/model/code/

Overwriting Dockerfile


In [17]:
!export REGION=$(wget -qO- http://169.254.169.254/latest/meta-data/placement/availability-zone) &&\
 docker build --no-cache --build-arg REGION=${REGION::-1} -t my-docker:inference . -f Dockerfile

Sending build context to Docker daemon  845.1MB
Step 1/5 : ARG REGION
Step 2/5 : FROM 763104351884.dkr.ecr.$REGION.amazonaws.com/mxnet-inference:1.4.1-gpu-py3
 ---> d9dd4dcfe0c2
Step 3/5 : RUN pip install --upgrade --user --pre 'mxnet-mkl' 'https://github.com/dmlc/gluon-nlp/tarball/v0.9.x'
 ---> Running in 6429cf000b5e
Collecting https://github.com/dmlc/gluon-nlp/tarball/v0.9.x
  Downloading https://github.com/dmlc/gluon-nlp/tarball/v0.9.x (2.4MB)
Collecting mxnet-mkl
  Downloading https://files.pythonhosted.org/packages/64/72/c5566aabde6ee0bda1f09d026603169a717dbd9f26f6be85ee2b4ed2cf03/mxnet_mkl-1.6.0b20191025-py2.py3-none-manylinux1_x86_64.whl (64.9MB)
ERROR: mxnet-mkl 1.6.0b20191025 has requirement numpy<2.0.0,>1.16.0, but you'll have numpy 1.14.6 which is incompatible.
Installing collected packages: mxnet-mkl, gluonnlp
  Running setup.py install for gluonnlp: started
    Running setup.py install for gluonnlp: finished with status 'done'
Successfully installed gluonnlp-0.9.

### 4. Use the SageMaker SDK to Deploy the Model

We create an MXNet model that can be deployed later by specifying the Docker image and the entry point for the inference code. If serve.py does not work, use dummy_hosting_module.py for debugging purposes.

In [23]:
import sagemaker
from sagemaker.mxnet.model import MXNetModel
sagemaker_model = MXNetModel(model_data='file:///home/ec2-user/SageMaker/reinvent19-gluonnlp/tutorial/model.tar.gz',
                             image='my-docker:inference', # Docker image built above
                             role=sagemaker.get_execution_role(), 
                             py_version='py3',            # python version
                             entry_point='serve.py',
                             source_dir='.')

We use 'local' mode to test our deployment code, so inference runs on the current instance.
When you are ready to deploy the model on a new instance, change the `instance_type` argument to a value such as `ml.c4.xlarge`.

In [24]:
# Here we use 'local' mode for testing; for real instances use ml.c5.2xlarge, ml.p2.xlarge, etc.
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='local')

Attaching to tmpftop30rv_algo-1-f5zzv_1
algo-1-f5zzv_1  | 2019-12-04 04:55:10,714 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
algo-1-f5zzv_1  | MMS Home: /usr/local/lib/python3.6/site-packages
algo-1-f5zzv_1  | Current directory: /
algo-1-f5zzv_1  | Temp directory: /home/model-server/tmp
algo-1-f5zzv_1  | Number of GPUs: 0
algo-1-f5zzv_1  | Number of CPUs: 8
algo-1-f5zzv_1  | Max heap size: 13646 M
algo-1-f5zzv_1  | Python executable: /usr/local/bin/python3.6
algo-1-f5zzv_1  | Config file: /etc/sagemaker-mms.properties
algo-1-f5zzv_1  | Inference address: http://0.0.0.0:8080
algo-1-f5zzv_1  | Management address: http://127.0.0.1:8081
algo-1-f5zzv_1  | Model Store: /.sagemaker/mms/models
algo-1-f5zzv_1  | Initial Models: ALL
algo-1-f5zzv_1  | Log dir: /logs
algo-1-f5zzv_1  | Metrics dir: /logs
algo-1-f5zzv_1  | Netty threads: 0
algo-1-

algo-1-f5zzv_1  | 2019-12-04 04:55:14,017 [INFO ] W-9006-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2960
algo-1-f5zzv_1  | 2019-12-04 04:55:14,027 [INFO ] W-9002-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2979
algo-1-f5zzv_1  | 2019-12-04 04:55:14,045 [INFO ] W-9001-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2999
algo-1-f5zzv_1  | 2019-12-04 04:55:14,063 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 3011
algo-1-f5zzv_1  | 2019-12-04 04:55:14,065 [INFO ] W-9004-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2997
algo-1-f5zzv_1  | 2019-12-04 04:55:14,072 [INFO ] W-9007-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 3027
algo-1-f5zzv_1  | 2019-12-04 04:55:14,078 [INFO ] W-9003-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 3022
algo-1-f5zzv_1 

In [25]:
output = predictor.predict('The model is deployed. Great!')  
print('\nPrediction output: {}\n\n'.format(output))

algo-1-f5zzv_1  | 2019-12-04 04:55:27,000 [WARN ] W-9006-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 	data0: None
algo-1-f5zzv_1  | 2019-12-04 04:55:27,000 [WARN ] W-9006-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   input_sym_arg_type = in_param.infer_type()[0]
algo-1-f5zzv_1  | 2019-12-04 04:55:28,077 [INFO ] W-9006-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1100
algo-1-f5zzv_1  | 2019-12-04 04:55:28,077 [INFO ] W-9006-model ACCESS_LOG - /172.18.0.1:35848 "POST /invocations HTTP/1.1" 200 1104

Prediction output: positive




### Clean Up

Remove the endpoint after we are done. 

In [26]:
predictor.delete_endpoint()

Gracefully stopping... (press Ctrl+C again to force)


# Resources
- Amazon SageMaker https://aws.amazon.com/sagemaker/
- Amazon SageMaker Python SDK https://sagemaker.readthedocs.io/
- GluonNLP http://gluon-nlp.mxnet.io/
- GluonCV http://gluon-cv.mxnet.io/
- GluonTS https://gluon-ts.mxnet.io/
- Dive into Deep Learning http://d2l.ai/
- MXNet Forum https://discuss.mxnet.io/

For more fine-tuning scripts, visit the [BERT model zoo webpage](http://gluon-nlp.mxnet.io/model_zoo/bert/index.html).
