# LAB: Foundation Models

In this lab, we explore SageMaker JumpStart Foundation Models: we select a pre-trained model, deploy it to an endpoint, query it, and clean up.

# Set Up

In [3]:
!pip install jupyterlab --quiet
!pip install ipywidgets==7.0.0 --quiet
!pip install --upgrade sagemaker --quiet

Permissions and environment variables

In [5]:
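import sagemaker, boto3, json
from sagemaker.session import Session

# Session used to look up the identity this notebook runs as.
sagemaker_session = Session()
# ARN of the caller's identity; passed as the role when the model is created.
aws_role = sagemaker_session.get_caller_identity_arn()
# Region configured for the default boto3 session.
aws_region = boto3.Session().region_name
sess = sagemaker.Session()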

# Select a pre-trained model

You can continue with the default model, or assign a different JumpStart model ID to model_id in the next cell. A complete list of SageMaker pre-trained models can also be accessed at SageMaker pre-trained Models.

In [6]:
model_id, model_version = (
    "huggingface-text2text-flan-t5-xl",
    "*",
)

# Retrieve Artifacts & Deploy an Endpoint

Using SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the deploy_image_uri, deploy_source_uri, and model_uri for the pre-trained model. To host the pre-trained model, we create an instance of [sagemaker.model.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it. This may take a few minutes.



In [7]:
def get_sagemaker_session(local_download_dir) -> sagemaker.Session:
    """Return the SageMaker session."""

    sagemaker_client = boto3.client(
        service_name="sagemaker", region_name=boto3.Session().region_name
    )

    session_settings = sagemaker.session_settings.SessionSettings(
        local_download_dir=local_download_dir
    )

    # the unit test will ensure you do not commit this change
    session = sagemaker.session.Session(
        sagemaker_client=sagemaker_client, settings=session_settings
    )

    return session

We need to create a directory to host the downloaded model.

In [8]:
!mkdir -p download_dir
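
The same directory can also be created from Python, which works outside a notebook where the `!` shell escape is not available. This is a minimal sketch; `ensure_download_dir` is a hypothetical helper name, not part of the SageMaker SDK, and it assumes the download_dir name used above.

```python
from pathlib import Path


def ensure_download_dir(path: str = "download_dir") -> Path:
    """Create the directory if it does not exist and return it as a Path."""
    directory = Path(path)
    # parents=True and exist_ok=True mirror the behaviour of `mkdir -p`.
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Calling `ensure_download_dir()` with no arguments creates download_dir in the current working directory, and calling it again is harmless.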

The code below defines configuration settings for deploying. Each model ID maps to a specific instance type (like "ml.m5.4xlarge") and to a set of environment variables, which mainly control the number of worker processes serving the model.


In [13]:
_large_model_env = {"SAGEMAKER_MODEL_SERVER_WORKERS": "1", "TS_DEFAULT_WORKERS_PER_MODEL": "1"}

_model_config_map = {
    "huggingface-text2text-flan-t5-xxl": {
        "instance_type": "ml.m5.4xlarge",
        "env": _large_model_env,
    },
    "huggingface-text2text-flan-t5-xxl-fp16": {
        "instance_type": "ml.m5.4xlarge",
        "env": _large_model_env,
    },
    "huggingface-text2text-flan-t5-xxl-bnb-int8": {
        "instance_type": "ml.m5.4xlarge",
        "env": _large_model_env,
    },
    "huggingface-text2text-flan-t5-xl": {
        "instance_type": "ml.m5.4xlarge",
        "env": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
    },
    "huggingface-text2text-flan-t5-large": {
        "instance_type": "ml.m5.4xlarge",
        "env": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
    },
    "huggingface-text2text-flan-ul2-bf16": {
        "instance_type": "ml.m5.xlarge",
        "env": _large_model_env,
    },
}
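
A lookup like this can be wrapped in a small function that also supplies defaults for model IDs missing from the map. The sketch below is only an illustration: `resolve_deploy_config` and `DEFAULT_INSTANCE_TYPE` are hypothetical names, the fallback instance type is an assumption, and `example_map` is a one-entry copy of the map above so the block runs on its own.

```python
from typing import Dict, Tuple

# Assumed fallback for model IDs that are not in the map.
DEFAULT_INSTANCE_TYPE = "ml.m5.4xlarge"


def resolve_deploy_config(
    model_id: str,
    config_map: Dict[str, dict],
    default_instance_type: str = DEFAULT_INSTANCE_TYPE,
) -> Tuple[str, Dict[str, str]]:
    """Return (instance_type, env) for model_id, with defaults for unknown IDs."""
    entry = config_map.get(model_id, {})
    instance_type = entry.get("instance_type", default_instance_type)
    # Copy so callers can modify the result without changing the shared map.
    env = dict(entry.get("env", {}))
    return instance_type, env


example_map = {
    "huggingface-text2text-flan-t5-xl": {
        "instance_type": "ml.m5.4xlarge",
        "env": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
    },
}

print(resolve_deploy_config("huggingface-text2text-flan-t5-xl", example_map))
print(resolve_deploy_config("some-other-model", example_map))
```

Returning an empty env for unknown models keeps the caller free of key checks; whether an empty environment is appropriate for a given model is something to confirm against that model's documentation.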

The code below sets up and deploys the foundation model to Amazon SageMaker for inference. It looks up the instance type for model_id in the configuration map, retrieves the container image, inference script, and model artifact URIs, and then deploys the model as a SageMaker endpoint.

In [15]:
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

if model_id in _model_config_map:
    inference_instance_type = _model_config_map[model_id]["instance_type"]
else:
    inference_instance_type = "ml.m5.4xlarge"

# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
if model_id in _model_config_map:
    # For those large models, we already repack the inference script and model
    # artifacts for you, so the `source_dir` argument to Model is not required.
    model = Model(
        image_uri=deploy_image_uri,
        model_data=model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        env=_model_config_map[model_id]["env"],
    )
else:
    model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        model_data=model_uri,
        entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        sagemaker_session=get_sagemaker_session("download_dir"),
    )

# Deploy the model. We pass the Predictor class when deploying through the Model class
# so that we can run inference through the SageMaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

------!

# Query endpoint and parse response

The code below defines two utilities for interacting with a SageMaker endpoint: query_endpoint sends encoded text for inference, and parse_response decodes the JSON response and returns the generated text. It also defines text formatting constants, which are used when printing results rather than inside these functions.

In [16]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"


def query_endpoint(encoded_text, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/x-text", Body=encoded_text
    )
    return response


def parse_response(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_text"]
    return generated_text
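
To check the request and response handling without a live endpoint, the same logic can be exercised against a stand-in client. This is a sketch under stated assumptions: `FakeRuntimeClient`, `query_with_client`, and `parse_generated_text` are hypothetical names introduced here, and the fake response only mimics the JSON shape with a generated_text field that parse_response reads above.

```python
import io
import json


class FakeRuntimeClient:
    """Hypothetical stand-in for the boto3 runtime client, for offline checks."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        # Echo the prompt back in the JSON shape that parse_response expects.
        payload = {"generated_text": f"echo: {Body.decode('utf-8')}"}
        return {"Body": io.BytesIO(json.dumps(payload).encode("utf-8"))}


def query_with_client(client, encoded_text, endpoint_name):
    """Same call as query_endpoint, but with the client passed in."""
    return client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/x-text", Body=encoded_text
    )


def parse_generated_text(query_response):
    """Decode the JSON body and return the generated_text field."""
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions["generated_text"]


response = query_with_client(FakeRuntimeClient(), "Hello".encode("utf-8"), "fake-endpoint")
print(parse_generated_text(response))
```

Passing the client as an argument is a design choice for testability; with a real endpoint, the client created in query_endpoint could be passed in its place.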

The code below defines text formatting constants and two sample texts. It then loops through these texts, sending each to a SageMaker endpoint for inference. The responses are parsed, and both the original and generated texts are printed, with the generated text displayed in bold.


In [17]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"

text1 = "Translate to German:  My name is Arthur"
text2 = "A step by step recipe to make ice cream:"


for text in [text1, text2]:
    query_response = query_endpoint(text.encode("utf-8"), endpoint_name=endpoint_name)
    generated_text = parse_response(query_response)
    print(
        f"Inference:{newline}"
        f"input text: {text}{newline}"
        f"generated text: {bold}{generated_text}{unbold}{newline}"
    )

Inference:
input text: Translate to German:  My name is Arthur
generated text: Ich bin Arthur.

Inference:
input text: A step by step recipe to make ice cream:
generated text: In a medium bowl combine 4 cups chilled whipping cream, 1 cup each vanilla and chocolate
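
The bold and unbold constants are ANSI escape sequences, so they render as bold in a terminal or notebook but can show up as stray markers when output is saved as plain text. A minimal sketch for removing them before logging or saving follows; `strip_ansi` and `ANSI_SGR` are hypothetical names, and the pattern only covers the styling (SGR) sequences used here.

```python
import re

# Matches SGR sequences such as ESC[1m (bold) and ESC[0m (reset).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Remove ANSI SGR styling sequences from text."""
    return ANSI_SGR.sub("", text)


bold, unbold = "\033[1m", "\033[0m"
styled = f"generated text: {bold}Ich bin Arthur.{unbold}"
print(strip_ansi(styled))
```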



# Clean up the endpoint

In [20]:
model_predictor.delete_model()
model_predictor.delete_endpoint()

Exception: One or more models cannot be deleted, please retry. 
Failed models: jumpstart-example-huggingface-text2text-2023-08-25-15-10-13-512
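
Because the exception from delete_model stops the cell, delete_endpoint does not run in that execution, and an endpoint that is left running continues to incur charges. One possible cause of the failure, stated here as an assumption, is that the model was already deleted by an earlier run of the cell. The sketch below makes cleanup best-effort: `cleanup_predictor` is a hypothetical helper that calls both methods on the predictor and reports failures instead of stopping at the first one.

```python
def cleanup_predictor(predictor):
    """Try to delete the model, then always attempt to delete the endpoint.

    Returns a list of (step, error message) pairs for the steps that failed.
    """
    failures = []
    # Steps run in the same order as the cell above: model first, then endpoint.
    for step_name in ("delete_model", "delete_endpoint"):
        try:
            getattr(predictor, step_name)()
        except Exception as exc:  # broad on purpose: this is best-effort cleanup
            failures.append((step_name, str(exc)))
    return failures
```

A call such as `cleanup_predictor(model_predictor)` returns an empty list when both steps succeed. Any reported failure is still worth checking in the SageMaker console to confirm that no endpoint remains.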