# Codegen SageMaker inference with Intel optimizations

## Agenda
0. Prerequisites
1. Build Deep Learning Container and push it to AWS ECR
2. Create a TorchServe file and upload it to an S3 bucket
3. Create AWS SageMaker endpoint
4. Invoke the endpoint

### Prerequisites

Install all libraries required to run the example.

In [12]:
!pip install "sagemaker>=2.175.0" --upgrade --quiet

Also make sure your AWS account has all the required permissions. To run this example you need the following access policies:
- AmazonEC2ContainerRegistryFullAccess
- AmazonEC2FullAccess
- AmazonS3FullAccess

### Build Deep Learning Container and push it to AWS ECR

If you don't have a Docker image prepared beforehand, clone the Deep Learning Containers repository and build the image with all required Intel optimizations.

In [19]:
!git clone https://github.com/aalbersk/deep-learning-containers
!cd deep-learning-containers && git checkout intel_pytorch_ipex

fatal: destination path 'deep-learning-containers' already exists and is not an empty directory.
branch 'intel_pytorch_ipex' set up to track 'origin/intel_pytorch_ipex'.
Switched to a new branch 'intel_pytorch_ipex'


By default, the build produces version `2.2` of the PyTorch+IPEX image. To build another version, modify the `version` and `short_version` fields in [pytorch/inference/buildspec-intel.yml](https://github.com/aalbersk/deep-learning-containers/blob/intel_pytorch_ipex/pytorch/inference/buildspec-intel.yml). The command below builds the image and pushes it to your ECR.

In [None]:
!cd deep-learning-containers && PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/src INTEL_DEDICATED=true python src/main.py --buildspec pytorch/inference/buildspec-intel.yml --framework pytorch --image_types inference --device_types cpu

### Create a TorchServe file and upload it to an S3 bucket
If you'd like to use your own version of Codegen, here's how to create a TorchServe file and upload it to an S3 bucket.

By default, the Intel DLC contains only the essential PyTorch libraries plus the latest Transformers (4.37). Codegen additionally needs a requirements file listing the following libraries:
```python
transformers==4.33.2
tiktoken
```

To generate a TorchServe MAR file, use the following command:
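
A minimal sketch of such a command, built in Python so that each argument is explicit. It assumes the `torch-model-archiver` CLI is installed; the model name `codegen`, the handler `handler.py`, the extra files under `model/` and the output directory `model_store` are hypothetical placeholders, not names taken from this example.

```python
import shlex


def archiver_command(model_name, handler, extra_files, requirements,
                     export_path, version="1.0"):
    """Build the torch-model-archiver argument list for a MAR file."""
    return [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--handler", handler,
        # --extra-files takes a single comma-separated list of paths
        "--extra-files", ",".join(extra_files),
        # the requirements file is bundled into the archive
        "--requirements-file", requirements,
        "--export-path", export_path,
        "--force",  # overwrite an existing archive with the same name
    ]


# Hypothetical file layout: adjust to your own handler and model files.
cmd = archiver_command(
    model_name="codegen",
    handler="handler.py",
    extra_files=["model/config.json", "model/tokenizer.json"],
    requirements="requirements.txt",
    export_path="model_store",
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Depending on how the handler loads the weights, a `--serialized-file` argument may also be needed. TorchServe installs the bundled requirements only when `install_py_dep_per_model=true` is set in its `config.properties`.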

### Create AWS SageMaker endpoint

In [11]:
REGION = ""   # AWS region of your ECR registry, e.g. "us-west-2"
ACCOUNT = ""  # your 12-digit AWS account ID

!aws configure

# Log in to your private Amazon ECR registry
!aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT.dkr.ecr.$REGION.amazonaws.com

AWS Access Key ID [None]: ^C

Note: AWS CLI version 2, the latest major version of the AWS CLI, is now stable and recommended for general use. For more information, see the AWS CLI version 2 installation instructions at: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html

usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help
aws: error: argument --region: expected one argument
Error: Cannot perform an interactive login from a non TTY device
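
The `argument --region: expected one argument` error in the output above is what the AWS CLI reports when `REGION` is left empty. A small sketch that validates both values before building the registry host name; the helper `ecr_registry` is hypothetical, and the commented-out lookup assumes `boto3` credentials are already configured.

```python
import re


def ecr_registry(account, region):
    """Return the private ECR registry host for an account and region."""
    if not re.fullmatch(r"\d{12}", account):
        raise ValueError("ACCOUNT must be a 12-digit AWS account ID")
    if not region:
        raise ValueError("REGION must not be empty, e.g. 'us-west-2'")
    return f"{account}.dkr.ecr.{region}.amazonaws.com"


# Placeholder account ID, for illustration only.
print(ecr_registry("123456789012", "us-west-2"))
# 123456789012.dkr.ecr.us-west-2.amazonaws.com

# With credentials configured, both values can be looked up instead of typed:
# import boto3
# ACCOUNT = boto3.client("sts").get_caller_identity()["Account"]
# REGION = boto3.session.Session().region_name
```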


In [14]:
from datetime import datetime

current_datetime = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')

In [13]:
import sagemaker
import boto3
sess = sagemaker.Session()
# SageMaker session bucket, used for uploading data, models and logs
# SageMaker creates this bucket automatically if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker role arn: arn:aws:iam::205130860845:role/sagemaker_fullaccess
sagemaker session region: us-west-2


In [16]:
model_name = f"bert-test-model-{current_datetime}"

primary_container = {
    "Image": "205130860845.dkr.ecr.us-west-2.amazonaws.com/pytorch_inference:2.2.0-cpu-intel-py310-ubuntu20.04-sagemaker-2024-02-28-13-36-20:latest",
    "ModelDataUrl": "s3://aalbersk-dlc-ipex-models/bert_ts_clean.tar.gz"
}

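# sagemaker.Session.create_model() does not accept boto3-style keyword arguments
# ("TypeError: Session.create_model() got an unexpected keyword argument 'ModelName'"),
# so call the underlying boto3 SageMaker client instead.
create_model_response = sess.sagemaker_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer=primary_container)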
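
The remaining agenda steps, creating the endpoint and invoking it, can be sketched with the boto3 SageMaker calls `create_endpoint_config`, `create_endpoint` and `invoke_endpoint`. This is a sketch under assumptions, not the setup used for this example: the instance type `ml.c6i.4xlarge`, the variant name, the `-config` and `-endpoint` name suffixes and the `text/plain` content type are placeholders, and the payload format depends on the TorchServe handler packaged with the model.

```python
def endpoint_config_request(model_name, instance_type="ml.c6i.4xlarge", count=1):
    """Build the arguments for SageMaker's create_endpoint_config call."""
    return {
        "EndpointConfigName": f"{model_name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,  # assumed CPU instance type
            "InitialInstanceCount": count,
        }],
    }


def deploy_and_invoke(model_name, prompt, region):
    """Create an endpoint for an existing model and send one request."""
    import boto3  # imported here so the helper above has no AWS dependency

    sm = boto3.client("sagemaker", region_name=region)
    config = endpoint_config_request(model_name)
    sm.create_endpoint_config(**config)

    endpoint_name = f"{model_name}-endpoint"
    sm.create_endpoint(EndpointName=endpoint_name,
                       EndpointConfigName=config["EndpointConfigName"])
    # Block until the endpoint reaches the InService state.
    sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                       ContentType="text/plain",  # assumed
                                       Body=prompt.encode("utf-8"))
    return response["Body"].read().decode("utf-8")


# Inspect the request without touching AWS:
print(endpoint_config_request("my-model"))
```

Calling `deploy_and_invoke(model_name, "def hello_world():", "us-west-2")` creates billable resources; remove them afterwards with the client's `delete_endpoint` and `delete_endpoint_config` calls.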