# Codegen Sagemaker inference with Intel optimizations

## Agenda
0. Prerequisites
1. Build Deep Learning Container and push it to AWS ECR
2. Create a Torchserve file and put it on S3 bucket
3. Create AWS Sagemaker endpoint
4. Invoke the endpoint

### Prerequisites

Install all libraries required to run the example.

In [1]:
!pip install "sagemaker>=2.175.0" --upgrade --quiet
! pip install awscli boto3 botocore numpy s3transfer torch-model-archiver==0.8.1 torchserve==0.8.2 --upgrade



Remember also that you have all required accesses on you AWS account. To run this example you're going to need following accesses:
- AmazonEC2ContainerRegistryFullAccess
- AmazonEC2FullAccess
- AmazonS3FullAccess

### Build Deep Learning Container and push it to AWS ECR

If you don't have Docker image prepared beforehand, clone the Deep Learning Containers repository and build the image with all required intel optimizations.

In [2]:
!git clone https://github.com/aalbersk/deep-learning-containers
!cd deep-learning-containers && git checkout intel_pytorch_ipex

fatal: destination path 'deep-learning-containers' already exists and is not an empty directory.
Already on 'intel_pytorch_ipex'
Your branch is up to date with 'origin/intel_pytorch_ipex'.


By default the image will build `2.2` version of Pytorch+IPEX image. If you'd like to build another version, modify fields `version` and `short_version` in [pytorch/inference/buildspec-intel.yml](https://github.com/aalbersk/deep-learning-containers/blob/intel_pytorch_ipex/pytorch/inference/buildspec-intel.yml). The command below will automatically build the image and push it into your ECR.

In [None]:
!cd deep-learning-containers && PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/src INTEL_DEDICATED=true python src/main.py --buildspec pytorch/inference/buildspec-intel.yml --framework pytorch --image_types inference --device_types cpu

### Create a Torchserve file and put it on S3 bucket

# **<span style="background: yellow">Todo: plan how to get the model and describe it</span>**

If you'd like to use your own version of Codegen, here's how to create a torchserve file and put it on S3 bucket.

As default Intel DLC has only essential Pytorch libraries + latest Transformers (4.37), Codegen requires requirements with following libraries additionaly:
```python
transformers==4.33.2
tiktoken
```

To generate a Torchserve MAR file use following command:

In [None]:
!torch-model-archiver --model-name codegen25 --version 1.0 --handler codegen_handler.py --config-file model-config.yaml --extra-files codegen25.py -r requirements.txt --archive-format tgz

Next, copy the model into an S3 bucket of your choice:

In [None]:
!aws s3 cp codegen25.tar.gz s3://<s3 bucket name>/

### Create AWS Sagemaker endpoint

# **<span style="background: yellow">Todo: describe creating the endpoint</span>**
# **<span style="background: yellow">Todo: prepare variables to change based on user needs</span>**
# **<span style="background: yellow">Todo: initialize with codegen not bert</span>**

In [3]:
from datetime import datetime

current_datetime = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')

In [5]:
import sagemaker
import boto3

boto3_session = boto3.session.Session(region_name="us-west-2")
smr = boto3.client('sagemaker-runtime')
sm = boto3.client('sagemaker')
role = sagemaker.get_execution_role()
sess = sagemaker.session.Session(boto3_session, sagemaker_client=sm, sagemaker_runtime_client=smr)
region = sess._region_name
account = sess.account_id()

bucket_name = sess.default_bucket()
prefix = "torchserve"
output_path = f"s3://{bucket_name}/{prefix}"
print(f'account={account}, region={region}, role={role}, output_path={output_path}')

account=205130860845, region=us-west-2, role=arn:aws:iam::205130860845:role/sagemaker_fullaccess, output_path=s3://sagemaker-us-west-2-205130860845/torchserve


In [12]:
from sagemaker import Model

instance_type = "ml.m7i.8xlarge"
endpoint_name = sagemaker.utils.name_from_base("bert-ipex")
s3_url = "s3://intel-sagemaker/bert_ts_clean.tar.gz"

container = "205130860845.dkr.ecr.us-west-2.amazonaws.com/pytorch_inference:2.2.0-cpu-intel-py310-ubuntu20.04-sagemaker-2024-02-28-13-36-20"
model = Model(
    name="torchserve-bert-ipex" + datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
    # Enable SageMaker uncompressed model artifacts
    model_data=s3_url,
    image_uri=container,
    role=role,
    sagemaker_session=sess,
    env={"TS_INSTALL_PY_DEP_PER_MODEL": "true",
         "SAGEMAKER_CONTAINER_LOG_LEVEL": "0",
         "SAGEMAKER_REGION": region},
)
print(endpoint_name)
print(model)

bert-ipex-2024-03-04-17-46-00-450
<sagemaker.model.Model object at 0x7f3dc5adfdf0>


In [13]:
model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
    #volume_size=32, # increase the size to store large model
    model_data_download_timeout=3600, # increase the timeout to download large model
    container_startup_health_check_timeout=600, # increase the timeout to load large model
)

----!

### Invoke the endpoint

# **<span style="background: yellow">Todo: describe invoking the endpoint</span>**
# **<span style="background: yellow">Todo: use humaneval to generate</span>**

In [16]:
import time, json

client = boto3.client('sagemaker-runtime')
context="The Panthers finished the regular season with a 15-1 record, and quarterback Cam Newton was named the NFL Most Valuable Player (MVP). They defeated the Arizona Cardinals 49-15 in the NFC Championship Game and advanced to their second Super Bowl appearance since the franchise was founded in 1995. The Broncos finished the regular season with a 12-4 record, and denied the New England Patriots a chance to defend their title from Super Bowl XLIX by defeating them 20-18 in the AFC Championship Game. They joined the Patriots, Dallas Cowboys, and Pittsburgh Steelers as one of four teams that have made eight appearances in the Super Bowl."

question="Who was named the MVP?"

custom_attributes = "c000b4f9-df62-4c85-a0bf-7c525f9104a4"  # An example of a trace ID.
# endpoint_name = "dlc-test"                               # Your endpoint name.
content_type = "application/json"                           # The MIME type of the input data in the request body.
accept = "*/*"                                              # The desired MIME type of the inference in the response.

import io

class Parser:
    def __init__(self):
        self.buff = io.BytesIO()
        self.read_pos = 0
        
    def write(self, content):
        self.buff.seek(0, io.SEEK_END)
        self.buff.write(content)
        data = self.buff.getvalue()
        
    def scan_lines(self):
        self.buff.seek(self.read_pos)
        for line in self.buff.readlines():
            if line[-1] != b'\n':
                self.read_pos += len(line)
                yield line[:-1]
                
    def reset(self):
        self.read_pos = 0

start_time = time.time()
response = client.invoke_endpoint_with_response_stream(
    EndpointName=endpoint_name, 
    CustomAttributes=custom_attributes, 
    ContentType=content_type,
    Accept=accept,
    Body=json.dumps({'context': context, 'question': question})
    )
print("--- %s seconds ---" % (time.time() - start_time))

parser = Parser()
for event in response['Body']:
    parser.write(event['PayloadPart']['Bytes'])
    for line in parser.scan_lines():
        print("\n", line.decode("utf-8"), end=' \n')

--- 0.3708372116088867 seconds ---

 was named the nfl most valuable player ( mvp ). they defeated the arizona cardinals 49 - 15 in the nfc championship game and advanced to their second super bowl appearance since the franchise was founded in 1995. the broncos finis 

 ed the regular season with a 12 - 4 record, and denied the new england patriots a chance to defen 
