# AWS Machine Learning Purpose-built Accelerators Tutorial
## Learn how to use [AWS Trainium](https://aws.amazon.com/machine-learning/trainium/) and [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/) with [Amazon SageMaker](https://aws.amazon.com/sagemaker/) to optimize your ML workloads
## Part 3/3 - Compiling and deploying a BERT model to AWS Inferentia1 with SageMaker + [Hugging Face Optimum Neuron](https://huggingface.co/docs/optimum-neuron/index)

**SageMaker Studio Kernel: PyTorch 1.13 Python 3.9 CPU - ml.t3.medium**

In this tutorial, you'll learn how to compile a model for AWS Inferentia and then deploy it to a SageMaker real-time endpoint powered by AWS Inferentia1. First, we'll go through the steps required to compile the model, which only needs to happen once. To save time, the notebook then downloads a pre-compiled model instead of running the compilation. After that, we'll deploy the model to a SageMaker endpoint and get some predictions.

In section 2, you render a table, built from metadata extracted from the Optimum Neuron API, that lists the currently tested/supported models (similar models that are not listed may also be compatible, but you need to verify that yourself). This table helps you understand which models can be selected for deployment. If you also need to fine-tune your model, check the similar table in the **Part 2** notebook to see which models can be fine-tuned on AWS Trainium with HF Optimum Neuron. That way you can plan your end-to-end solution and start implementing it right away.

## 1) Install some required packages

In [None]:
%pip install -U sagemaker

## 2) Supported models/tasks

Models with **[TP]** after the name support Tensor Parallelism

In [None]:
from IPython.display import Markdown, display

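# pass the path as filename=...; a positional string would be rendered as literal markdown text
display(Markdown(filename="../docs/optimum_neuron_models.md"))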

## 3) Prepare the model to deploy to Inferentia 1

In [None]:
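import os
import boto3
import shutil
import sagemaker
from packaging import version

print(sagemaker.__version__)
# compare parsed versions; a plain string comparison would rank "2.99.0" above "2.146.0"
if version.parse(sagemaker.__version__) < version.parse("2.146.0"):
    print("You need to upgrade the sagemaker package, or restart the kernel if you already upgraded it")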

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()
region = sess.boto_region_name

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {bucket}")
print(f"sagemaker session region: {region}")

### 3.1) Model compilation instructions

We won't compile the model now, since it takes some time. However, the steps below show what is required to prepare the model before deploying it to Inferentia1.

#### inference.py
```python
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import os
os.environ['NEURON_RT_NUM_CORES'] = '1'
import json
import torch
from optimum.neuron import NeuronModelForSequenceClassification
from transformers import AutoTokenizer

def model_fn(model_dir, context=None):
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = NeuronModelForSequenceClassification.from_pretrained(model_dir)
    return model,tokenizer

def input_fn(input_data, content_type, context=None):
    if content_type == 'application/json':
        req = json.loads(input_data)
        prompt = req.get('prompt')
        if prompt is None or len(prompt) < 3:
            raise ValueError("Invalid prompt. Provide an input like: {'prompt': 'text text text'}")
        return prompt
    else:
        raise Exception(f"Unsupported mime type: {content_type}. Supported: application/json")

def predict_fn(input_object, model_tokenizer, context=None):
    try:
        model,tokenizer = model_tokenizer
        inputs = tokenizer(input_object, truncation=True, return_tensors="pt")
        logits = model(**inputs).logits
        idx = logits.argmax(1, keepdim=True)
        conf = torch.gather(logits, 1, idx)
        return torch.cat([idx,conf], 1)
    except Exception as e:
        print(e)
        return None
```
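The request validation in `input_fn` can be exercised locally, without Neuron hardware or a running endpoint. The following is a minimal sketch under that assumption: `parse_request` is a hypothetical standalone function that mirrors the checks above (content type, presence of `prompt`, minimum length) and is not part of the SageMaker inference toolkit.

```python
import json


def parse_request(input_data, content_type):
    """Hypothetical stand-in for input_fn: validate a JSON body and return the prompt."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported mime type: {content_type}. Supported: application/json")
    req = json.loads(input_data)
    prompt = req.get("prompt")
    # same rule as the handler above: the prompt must exist and have at least 3 characters
    if prompt is None or len(prompt) < 3:
        raise ValueError("Invalid prompt. Provide an input like: {'prompt': 'text text text'}")
    return prompt


# a valid request returns the prompt unchanged
print(parse_request(json.dumps({"prompt": "free money now"}), "application/json"))

# a missing or too-short prompt is rejected
for bad_body in ('{"prompt": "hi"}', "{}"):
    try:
        parse_request(bad_body, "application/json")
    except ValueError as err:
        print(f"rejected: {err}")
```

Checking the handler logic this way is cheaper than discovering a malformed payload only after the endpoint is deployed.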
#### requirements.txt
```text
optimum-neuron==0.0.10
```

#### commands
```python
# download trained model
!aws s3 cp $checkpoint_s3_uri /tmp/
# extract the content to a local dir
!mkdir -p bert_spam
!tar -xzvf /tmp/model.tar.gz -C bert_spam
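# run the compiler on the extracted checkpoint
# (assumes the checkpoint files were extracted directly into ./bert_spam)
!optimum-cli export neuron \
    --model ./bert_spam \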
    --sequence_length 512 \
    --disable-validation \
    --dynamic-batch-size \
    --batch_size 1 \
    --task text-classification \
    ./neuron_model
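# copy inference.py and requirements.txt to neuron_model/code
# (assumes both files are in the current directory)
!mkdir -p neuron_model/code
!cp inference.py requirements.txt neuron_model/code/
# create the .tar.gz archive with the compiled model and the code at its root
!cd neuron_model && tar -czvf ../model.tar.gz *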
```

#### model.tar.gz structure
```text
 |- config.json
 |- model.neuron
 |- code/
 |  |- inference.py
 |  |- requirements.txt
```
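
Before uploading an archive, it can help to confirm that it matches the layout above. The following is a minimal sketch using only the standard library. The helper name `missing_entries` and the `REQUIRED_ENTRIES` set are assumptions made for illustration, and the demo builds a tiny throwaway archive with empty files instead of a real compiled model.

```python
import io
import os
import tarfile
import tempfile

# entries expected at the root of model.tar.gz, taken from the structure shown above
REQUIRED_ENTRIES = {"config.json", "model.neuron", "code/inference.py", "code/requirements.txt"}


def missing_entries(archive_path, required=REQUIRED_ENTRIES):
    """Return the sorted list of required paths that are absent from the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # normalize names such as "./config.json" to "config.json"
        names = {n[2:] if n.startswith("./") else n for n in tar.getnames()}
    return sorted(set(required) - names)


# demo: an incomplete archive with two empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(path, "w:gz") as tar:
        for name in ("config.json", "code/inference.py"):
            tar.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    print(missing_entries(path))  # ['code/requirements.txt', 'model.neuron']
```

For a real archive, calling `missing_entries("model.tar.gz")` should return an empty list if the layout is as expected.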


### 3.2) Download a pre-compiled model

In [None]:
import urllib.request
cache_url = 'https://d1bu2r8jxe4p17.cloudfront.net/models/bert_base_uncased_1_512_dyn_inf1.tar.gz'
urllib.request.urlretrieve(cache_url, "model.tar.gz")
# now we upload the model to our S3 bucket
model_data = sess.upload_data('model.tar.gz', bucket=bucket, key_prefix='models/bert-spam')
print(model_data)

## 4) Deploy a SageMaker real-time endpoint

In [None]:
import logging
from sagemaker.utils import name_from_base
from sagemaker.pytorch.model import PyTorchModel

# the number of accelerators (and NeuronCores) available depends on the inf1 instance size you deploy to
# we'll ask SageMaker to launch 1 worker per core

pytorch_model = PyTorchModel(    
    image_uri=f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-inference-neuron:1.13.1-neuron-py310-sdk2.12.0-ubuntu20.04",
    model_data=model_data,
    role=role,
    name=name_from_base('bert-spam-classifier'),
    sagemaker_session=sess,
    container_log_level=logging.DEBUG,
    model_server_workers=4, # 1 worker per core
    framework_version="1.13.1",
    env={
        'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600'
    },
    # for production it is important to define vpc_config and use a vpc_endpoint
    #vpc_config={
    #    'Subnets': ['<SUBNET1>', '<SUBNET2>'],
    #    'SecurityGroupIds': ['<SECURITYGROUP1>', '<DEFAULTSECURITYGROUP>']
    #}
)
pytorch_model._is_compiled_model = True
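
The "1 worker per core" rule means `model_server_workers` should follow the instance size. The following is a minimal sketch of that arithmetic, assuming each Inferentia1 chip exposes 4 NeuronCores and the chip counts listed for the inf1 family (1 for xlarge and 2xlarge, 4 for 6xlarge, 16 for 24xlarge). The helper `workers_for` is hypothetical and not part of the SageMaker SDK.

```python
# Inferentia1 chips per inf1 instance size; each chip has 4 NeuronCores
INF1_CHIPS = {"xlarge": 1, "2xlarge": 1, "6xlarge": 4, "24xlarge": 16}
NEURON_CORES_PER_CHIP = 4


def workers_for(instance_type):
    """Return one worker per NeuronCore for a type such as 'ml.inf1.xlarge'."""
    parts = instance_type.split(".")
    if len(parts) < 2 or parts[-2] != "inf1" or parts[-1] not in INF1_CHIPS:
        raise ValueError(f"Not a known inf1 instance type: {instance_type}")
    return INF1_CHIPS[parts[-1]] * NEURON_CORES_PER_CHIP


print(workers_for("ml.inf1.xlarge"))   # 4, the value used for model_server_workers above
print(workers_for("ml.inf1.6xlarge"))  # 16
```

This only works because `inference.py` sets `NEURON_RT_NUM_CORES` to `1`, so each worker claims a single core. Whether one worker per core is the best setting for a given workload is something to measure rather than assume.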

In [None]:
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type='ml.inf1.xlarge'
)

## 5) Run a simple test

In [None]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

In [None]:
import time

labels={0: "not spam", 1: "spam"}
not_spam=" Deezer.com 10,406,168 Artist DB\n\nWe have scraped the Deezer Artist DB, right now there are 10,406,168 listings according to Deezer.com\n\nPlease note in going through part of the list, it is obvious there are mistakes inside their system.\n\nExamples include and Artist with &amp; in its name might also be found with "and" but the Albums for each have different totals etc. Have no clue if there are duplicate albums etc do this error in their system. Even a comma in a name could mean the Artist shows up more than once, I saw in 1 instance that 1 Artist had 6 different ArtistIDs due to spelling errors.\n\nSo what is this DB, very simple, it gives you the ArtistID and the actual name of the Artist in another column. If you want to see the artist you add the baseurl to the ArtistID\n\nAn example is ArtistID 115 is AC/DC\n\n[https://www.deezer.com/us/artist/115](https://www.deezer.com/us/artist/115)\n\nYou do not have to use [https://www.deezer.com/us/artist/](https://www.deezer.com/us/artist/) if your first language is other than English, just see if Deezer supports your language and use that baseref\n\nFrench for example is [https://www.deezer.com/fr/artist/115](https://www.deezer.com/fr/artist/115)\n\nI am providing the DB in 3 different formats:\n\n \n\nI tried posting download links here but it seems Reddit does not like that so get them here:\n\n[https://pastebin\\[DOT\\]com/V3KJbgif](https://pastebin.com/V3KJbgif)\n\n&amp;#x200B;\n\n**Special thanks go to** [**/user/KoalaBear84**](https://www.reddit.com/user/KoalaBear84) **for writing the scraper.**\n\n&amp;#x200B;\n\n**Cross Posted to related Reddit Groups**"
spam="🚨 ATTENTION ALL USERS! 🚨\n\n🆘 Are you looking for a way to GET RICH QUICK? 🆘\n\n💰 Don't waste your time with boring old jobs! 💰\n\n💸 Join our CRAZY MONEY-MAKING SYSTEM today! 💸\n\n🤑 Just sign up and start earning BIG BUCKS right away! 🤑\n\n👉 Plus, if you refer your friends, you'll get even MORE CASH! 👈\n\n🔥 This is the HOTTEST OFFER of the year! 🔥\n\n👍 Don't wait"
for i,text in enumerate([not_spam, spam]):
    t=time.time()
    pred = predictor.predict({"prompt": text})
    elapsed = (time.time()-t)*1000
    print(f"Elapsed time: {elapsed:.2f} ms")
    # the endpoint returns [[class_index, logit]]; JSON may deliver the index as a float
    print(f"Pred: {i} - {labels[int(pred[0][0])]} / score: {pred[0][1]}")
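
The response parsing can also be isolated into a small helper, so that an unexpected payload fails loudly. The following is a minimal sketch, assuming the endpoint returns a nested list shaped like `[[class_index, logit]]`, as built by `predict_fn`. The name `decode_prediction` and the sample values are illustrative, not real endpoint output.

```python
LABELS = {0: "not spam", 1: "spam"}


def decode_prediction(pred, labels=LABELS):
    """Turn a response shaped like [[class_index, logit]] into (label, logit)."""
    # predict_fn returns None when inference fails, so guard against empty responses
    if not pred or len(pred[0]) != 2:
        raise ValueError(f"Unexpected response: {pred!r}")
    class_index, logit = pred[0]
    # the index may arrive as a float (e.g. 1.0) after JSON serialization
    return labels[int(class_index)], float(logit)


print(decode_prediction([[1.0, 3.87]]))  # ('spam', 3.87)
```

Note that the score is the raw logit of the winning class, not a probability. Turning it into a probability would require the logits of all classes, which this handler does not return.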