# Serve OPT-30B on SageMaker With PyTorch PiPPy Using TorchServe 


## Contents

This notebook demonstrates how to serve a large model on SageMaker with TorchServe and PiPPy, PyTorch's native large model inference solution. In this example, OPT-30B is deployed on an ml.g5.24xlarge instance.

In [None]:
!pip install numpy
!pip install pillow
!pip install -U sagemaker

In [None]:
# Python Built-Ins:
from datetime import datetime
import os
import json
import logging
import time

# External Dependencies:
import boto3
from botocore.exceptions import ClientError
import sagemaker
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.model import Model

sess = boto3.Session()
sm = sess.client("sagemaker")
region = sess.region_name
account = boto3.client("sts").get_caller_identity().get("Account")

smsess = sagemaker.Session(boto_session=sess)
role = sagemaker.get_execution_role()

# Configuration:
bucket_name = smsess.default_bucket()
prefix = "torchserve"
output_path = f"s3://{bucket_name}/{prefix}"
print(f"account={account}, region={region}, role={role}")

## Create Model Artifacts
This example packages OPT-30B as a TorchServe model artifact in tar.gz format.
### Install torch-model-archiver

!pip install torch-model-archiver

### Download OPT-30B from HuggingFace

#### Implement customized handler
This [readme](https://github.com/pytorch/serve/blob/e205e6b9836a881dea6b5d2afee20570b1280f36/docs/large_model_inference.md?plain=1#L1) describes how to implement a handler for large models in TorchServe.
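To make the handler structure concrete, here is a minimal, dependency-free sketch of the lifecycle a TorchServe handler conventionally follows (`initialize`, `preprocess`, `inference`, `postprocess`, `handle`). It is not the notebook's `pippy_handler.py` and does not import TorchServe: the class name `SketchTextHandler` and the upper-casing stub model are hypothetical placeholders.

```python
class SketchTextHandler:
    """Hypothetical stand-in that mirrors the TorchServe handler lifecycle."""

    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # A real handler would read settings from `context` and load the
        # (partitioned) model here. This stub just upper-cases its input.
        self.model = lambda texts: [t.upper() for t in texts]
        self.initialized = True

    def preprocess(self, requests):
        # Each request is a dict; the payload sits under "data" or "body".
        texts = []
        for req in requests:
            raw = req.get("data") or req.get("body")
            if isinstance(raw, (bytes, bytearray)):
                raw = raw.decode("utf-8")
            texts.append(raw)
        return texts

    def inference(self, texts):
        return self.model(texts)

    def postprocess(self, outputs):
        # One response entry per request in the batch.
        return list(outputs)

    def handle(self, requests, context):
        if not self.initialized:
            self.initialize(context)
        return self.postprocess(self.inference(self.preprocess(requests)))


handler = SketchTextHandler()
result = handler.handle([{"data": b"hello"}, {"body": "world"}], context=None)
print(result)
```

The handler in `workspace/opt/pippy_handler.py` follows the same stages but loads OPT-30B in `initialize` and runs generation in `inference`.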

In [None]:
!cat workspace/opt/pippy_handler.py

#### Download OPT-30B from HuggingFace, Create and Upload opt-30b.tar.gz file 

In [None]:
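# Create the unarchived model directory (opt-30b/) and its manifest
!torch-model-archiver --model-name opt-30b --version 1.0 --handler workspace/opt/pippy_handler.py --config-file workspace/opt/model-config.yaml --archive-format no-archive

# Copy the supporting code into opt-30b/ (after the cd, the source path is relative to opt-30b/)
!cd opt-30b && cp -rp ../workspace/opt/code/ .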

# Download OPT-30B from HuggingFace into opt-30b/model (the script path is relative to opt-30b/ after the cd)
!cd opt-30b && mkdir model && python ../workspace/opt/Download_model.py --model_name facebook/opt-30b
# Replace symbolic links with the files they point to, because SageMaker does not allow symbolic links in the tgz file
!cd opt-30b/model/models--facebook--opt-30b/snapshots/ceea0a90ac0f6fae7c2c34bcb40477438c152546 && for f in $(find . -type l); do cp --remove-destination "$(readlink -f "$f")" "$f"; done

# Create model tgz file
# Each `!` line runs in its own subshell, so GZIP is set inline rather than with a separate `export`
!cd opt-30b && GZIP='--fast' tar cvzf opt-30b.tar.gz .
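
The symlink replacement and fast-compression steps above can also be sketched in plain Python, which may be easier to debug than the shell one-liners. This is only an illustration under assumptions: the helper names `materialize_symlinks` and `make_fast_targz` are hypothetical, the demo uses a tiny temporary directory rather than the real model, and it expects a POSIX file system where symlinks can be created.

```python
import os
import shutil
import tarfile
import tempfile


def materialize_symlinks(root):
    """Replace every symlinked file under `root` with a copy of its target."""
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.path.realpath(path)  # resolves relative links
                os.unlink(path)
                shutil.copy2(target, path)
                replaced += 1
    return replaced


def make_fast_targz(source_dir, archive_path):
    """Archive `source_dir` with the fastest gzip level (like `gzip --fast`)."""
    # Keep archive_path outside source_dir so the archive never includes itself.
    with tarfile.open(archive_path, "w:gz", compresslevel=1) as tar:
        tar.add(source_dir, arcname=".")


# Tiny demo layout that imitates a cache with blobs and a symlinked snapshot.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "model")
    os.makedirs(os.path.join(model_dir, "blobs"))
    os.makedirs(os.path.join(model_dir, "snapshot"))
    with open(os.path.join(model_dir, "blobs", "abc"), "w") as fh:
        fh.write("weights")
    link_path = os.path.join(model_dir, "snapshot", "config.json")
    os.symlink(os.path.join("..", "blobs", "abc"), link_path)

    replaced = materialize_symlinks(model_dir)
    still_link = os.path.islink(link_path)
    with open(link_path) as fh:
        copied_text = fh.read()

    archive = os.path.join(tmp, "model.tar.gz")
    make_fast_targz(model_dir, archive)
    with tarfile.open(archive) as tar:
        member_names = sorted(tar.getnames())

print(replaced, still_link, member_names)
```

Compression level 1 trades archive size for speed, which matters when the weights are tens of gigabytes and already compress poorly.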

In [None]:
# Upload model tgz to S3
!cd opt-30b && aws s3 cp opt-30b.tar.gz {output_path}/opt-30b.tar.gz

## Create the Model Endpoint with the SageMaker SDK

In [None]:
# Use SageMaker PyTorch DLC as base image
baseimage = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=region,
    py_version="py310",
    image_scope="inference",
    version="2.0",
    instance_type="ml.g5.24xlarge",
)
print(baseimage)

In [None]:
# S3 location that holds the model artifact the endpoint will load.
model_s3uri = output_path
print(model_s3uri)
model = Model(
    name="torchserve-opt-" + datetime.now().strftime("%Y-%m-%d-%H-%M-%S"),
    model_data=f"{model_s3uri}/opt-30b.tar.gz",
    image_uri=baseimage,
    role=role,
    sagemaker_session=smsess,
    env={"GZIP": "--fast"},
)

print(model)

### Deploy the Endpoint

Choose an instance type whose combined GPU memory can hold the model. This example hosts a single large model, OPT-30B, on one ml.g5.24xlarge instance, with PiPPy partitioning the model across that instance's GPUs. The long `model_data_download_timeout` and `container_startup_health_check_timeout` values below give SageMaker enough time to download and load the large artifact.
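
As a rough sanity check on instance sizing, the memory needed for the weights alone can be estimated from the parameter count and the bytes per parameter. The numbers below are assumptions rather than values taken from this notebook: a nominal 30 billion parameters, fp16 weights (2 bytes each), and 4 GPUs with 24 GB each on the instance.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumptions (not from the notebook): nominal parameter count,
# fp16 weights, and 4 GPUs with 24 GB of memory each.
NUM_PARAMS = 30e9
BYTES_PER_PARAM_FP16 = 2
NUM_GPUS = 4
GPU_MEMORY_GB = 24

weights_gb = weight_memory_gb(NUM_PARAMS, BYTES_PER_PARAM_FP16)
total_gpu_gb = NUM_GPUS * GPU_MEMORY_GB
per_gpu_gb = weights_gb / NUM_GPUS  # even split across pipeline stages
fits = weights_gb < total_gpu_gb

print(f"weights: {weights_gb:.0f} GB, total GPU memory: {total_gpu_gb} GB")
print(f"per-GPU share with an even split: {per_gpu_gb:.0f} GB, fits: {fits}")
```

This estimate ignores activations, the generation cache, and framework overhead, so the remaining headroom should not be read as free memory.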

In [None]:
try:
    predictor.delete_endpoint(delete_endpoint_config=True)
    print("Deleting previous endpoint...")
    time.sleep(10)
except (NameError, ClientError):
    pass

model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.24xlarge",
    endpoint_name="torchserve-opt-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime()),
    model_data_download_timeout=3600,
    container_startup_health_check_timeout=1800,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)

## Get predictions from the endpoint

In [None]:
predictor = sagemaker.predictor.Predictor(
    endpoint_name=model.endpoint_name, sagemaker_session=smsess
)
print(predictor)

### OPT Inference Request

In [None]:
import json

payload = json.dumps({"data": "Hey, are you conscious? Can you talk to me?"}).encode("utf-8")


response = predictor.predict(data=payload).decode("utf-8")
print(response)
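
The same request can be sent without the SageMaker SDK, for example from a client application that only has boto3. The sketch below assumes the low-level `invoke_endpoint` call of the `sagemaker-runtime` client and mirrors the raw-bytes payload used above. The function name `invoke_text_endpoint` and the `FakeRuntime` class are hypothetical, and the fake client exists only so the example runs without AWS access.

```python
import io
import json


def invoke_text_endpoint(runtime_client, endpoint_name, prompt):
    """Send one prompt to the endpoint and return the decoded response body."""
    body = json.dumps({"data": prompt}).encode("utf-8")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/octet-stream",  # raw bytes, as in the cell above
        Body=body,
    )
    return response["Body"].read().decode("utf-8")


class FakeRuntime:
    """Hypothetical stand-in for the runtime client, for a dry run only."""

    def __init__(self):
        self.last_request = None

    def invoke_endpoint(self, **kwargs):
        self.last_request = kwargs
        return {"Body": io.BytesIO(b"stub completion")}


fake = FakeRuntime()
reply = invoke_text_endpoint(fake, "my-endpoint", "Hello")
print(reply)
```

Against a live endpoint, pass `boto3.client("sagemaker-runtime")` and the deployed endpoint name instead of the fake client.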

## Clean up

Endpoints should be deleted when no longer in use because, per the [SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/), they are billed for the time they are deployed. Here we also delete the endpoint configuration to keep things tidy.

In [None]:
predictor.delete_endpoint(delete_endpoint_config=True)

## Notebook CI Test Results

This notebook was tested in multiple regions.