# Using Jamba 1.5 Large on SageMaker through Model Packages

This sample notebook shows you how to deploy **Jamba 1.5 Large** using Amazon SageMaker.


--------------------
## <font color='orange'>Important:</font>
Please visit the model detail page at <a href="https://aws.amazon.com/marketplace/pp/prodview-bf26px7gdisek">https://aws.amazon.com/marketplace/pp/prodview-bf26px7gdisek</a> to learn more. <font color='orange'>If you do not have access to the link, please contact your account administrator for help.</font>

You will find details about the model, including pricing, supported regions, and the end user license agreement. To use the model, click “<font color='orange'>Continue to Subscribe</font>” on the detail page, then come back here to learn how to deploy the model and run inference.


-------------------

Jamba 1.5 Large uses a first-of-its-kind, production-grade hybrid Mamba-Transformer architecture built for efficiency. With a long context window (256K) in a comparatively small model, it offers high-quality output for tasks that need large input context and low latency, at a competitive price point.


## Pre-requisites:
1. Before running this notebook, make sure you obtained it from the model catalog in the SageMaker AWS Management Console.
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role you use has the **AmazonSageMakerFullAccess** policy attached.
1. This notebook is intended to work with **boto3 v1.35.68** or higher.

## Contents:
1. [Select a model package](#1.-Select-a-model-package)
1. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
   1. [Create an endpoint](#Create-an-endpoint)
   1. [Interact with the model](#Interact-with-the-model)
1. [Clean-up](#3.-Clean-up)
   1. [Delete the endpoint](#Delete-the-endpoint)
   1. [Delete the endpoint configuration](#Delete-the-endpoint-configuration)
   1. [Delete the model](#Delete-the-model)
    

## Usage instructions
You can run this notebook one cell at a time by pressing Shift+Enter in each cell.

## Installations

In [None]:
!pip install boto3 requests --upgrade

## Imports

In [None]:
import json
import boto3
import requests

### Check the version of boto3 - must be v1.35.68 or higher

In [None]:
boto3.__version__
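
The version requirement can also be checked programmatically instead of by eye. The following is a minimal sketch; the helper names `parse_version` and `meets_minimum` are hypothetical, and it assumes a plain dotted numeric version string such as `1.35.68`.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string like '1.35.68' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = "1.35.68") -> bool:
    """Return True when `installed` is at least `minimum`.

    Comparing tuples of ints avoids the pitfall of string comparison,
    where '1.9.0' would sort after '1.35.68'.
    """
    return parse_version(installed) >= parse_version(minimum)
```

In this notebook, the check could be run as `meets_minimum(boto3.__version__)`.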

## 1. Select a model package
Confirm that you received this notebook from the model catalog in the SageMaker AWS Management Console.

Get the latest model package ARN:

In [None]:
region = boto3.Session().region_name
model_name = "jamba-1-5-large"
version = "latest"

# Get the updated ARN
model_package_arn_payload = {"modelName": model_name, "version": version, "region": region}
get_model_package_arn_url = "https://api.ai21.com/studio/v1/jumpstart/get_model_version_arn"
model_package_arn_response = requests.post(get_model_package_arn_url, json=model_package_arn_payload, timeout=30)
model_package_arn_response.raise_for_status()  # fail early on a non-2xx response
model_package_arn = model_package_arn_response.json()["arn"]
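
Before creating the model, it can help to confirm that the returned value looks like a SageMaker model package ARN for the region you are working in. The sketch below relies only on the general ARN layout (`arn:partition:service:region:account-id:resource`); the helper names are hypothetical, and the ARN in the usage note is a made-up placeholder, not a real model package.

```python
def parse_arn(arn: str) -> dict:
    """Split an ARN into its six colon-separated components."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region", "account_id", "resource")
    return dict(zip(keys, parts))


def check_model_package_arn(arn: str, expected_region: str) -> dict:
    """Validate that `arn` names a SageMaker model package in `expected_region`."""
    parsed = parse_arn(arn)
    if parsed["service"] != "sagemaker":
        raise ValueError(f"Unexpected service: {parsed['service']!r}")
    if not parsed["resource"].startswith("model-package/"):
        raise ValueError(f"Not a model package: {parsed['resource']!r}")
    if parsed["region"] != expected_region:
        raise ValueError(
            f"ARN region {parsed['region']!r} does not match {expected_region!r}"
        )
    return parsed
```

For example, `check_model_package_arn("arn:aws:sagemaker:us-east-1:123456789012:model-package/example", "us-east-1")` returns the parsed components, while a mismatched region raises `ValueError`. In this notebook it could be called as `check_model_package_arn(model_package_arn, region)`.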

## 2. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html).

In [None]:
# Set SageMaker execution role ARN
sagemaker_execution_role_arn = "<<INSERT-YOUR-SAGEMAKER-EXECUTION-ROLE>>"

In [None]:
sagemaker_base_name = "jamba-1-5-large"
sagemaker_model_name = f"{sagemaker_base_name}-model"
sagemaker_endpoint_config_name = f"{sagemaker_base_name}-endpoint-config"
sagemaker_endpoint_name = f"{sagemaker_base_name}-endpoint"

content_type = "application/json"

real_time_inference_instance_type = (
    "ml.p4de.24xlarge"
)

### Create an endpoint

In [None]:
sm_client = boto3.client('sagemaker', region_name=region)

In [None]:
# create model
sm_client.create_model(
    ModelName=sagemaker_model_name,
    ExecutionRoleArn=sagemaker_execution_role_arn,
    PrimaryContainer={
        'ModelPackageName': model_package_arn,
    },
    EnableNetworkIsolation=True,
)

In [None]:
# create endpoint config
endpoint_config = sm_client.create_endpoint_config(
    EndpointConfigName=sagemaker_endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': real_time_inference_instance_type,
        'InitialInstanceCount': 1,
        'ModelName': sagemaker_model_name,
        'VariantName': 'variant1',
        'ModelDataDownloadTimeoutInSeconds': 3600,
        'InferenceAmiVersion': 'al2-ami-sagemaker-inference-gpu-2',
    }]
)

In [None]:
# create endpoint
endpoint = sm_client.create_endpoint(
    EndpointName=sagemaker_endpoint_name,
    EndpointConfigName=sagemaker_endpoint_config_name,
)

`create_endpoint` returns immediately while the endpoint is provisioned in the background. Once the endpoint status is `InService`, you can perform real-time inference.
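
Because endpoint creation is asynchronous, invoking the endpoint too early fails. One way to block until it is ready is to poll `describe_endpoint`, as in the sketch below. The function name `wait_for_endpoint`, the polling interval, and the timeout are illustrative assumptions; the client is passed in as a parameter so the helper does not depend on how it was created.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30,
                      timeout_seconds=3600, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService or has failed."""
    waited = 0
    while True:
        description = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return description
        if status == "Failed":
            reason = description.get("FailureReason", "unknown")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {waited}s"
            )
        sleep(poll_seconds)  # still creating or updating; check again later
        waited += poll_seconds
```

In this notebook it could be called as `wait_for_endpoint(sm_client, sagemaker_endpoint_name)`. boto3 also provides a built-in waiter, `sm_client.get_waiter("endpoint_in_service")`, which serves the same purpose.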

### Interact with the model

In [None]:
sm_runtime = boto3.client("sagemaker-runtime", region_name=region)

For advanced usage, you can check out the [AI21 Jamba 1.5 documentation](https://docs.ai21.com/reference/jamba-15-api-ref).

#### Non-streaming

In [None]:
messages = [{"role": "user", "content": "Tell me a joke about pokemons"}]
input_json = json.dumps({"messages": messages})

response = sm_runtime.invoke_endpoint(
    EndpointName=sagemaker_endpoint_name,
    ContentType=content_type,
    Accept="application/json",
    Body=input_json,
)

print(json.load(response["Body"]))
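
Printing the whole response is useful for inspection, but usually only the generated text is needed. The sketch below assumes the response follows a chat-completions layout in which the reply lives at `choices[0]["message"]["content"]`; that layout is an assumption here and should be checked against the AI21 documentation linked above. The helper name `extract_reply` is hypothetical.

```python
def extract_reply(result: dict) -> str:
    """Return the assistant text from a parsed response body.

    Assumes a chat-completions style layout:
    {"choices": [{"message": {"role": "assistant", "content": "..."}}]}
    """
    choices = result.get("choices") or []
    if not choices:
        raise ValueError("Response contains no choices")
    return choices[0]["message"]["content"]
```

Note that `response["Body"]` is a stream and can be read only once, so parse it a single time, for example `result = json.load(response["Body"])`, and then call `extract_reply(result)`.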

#### Streaming

In [None]:
messages = [{"role": "user", "content": "How are you?"}]
input_json = json.dumps({"messages": messages, "max_tokens": 10})

response = sm_runtime.invoke_endpoint_with_response_stream(
    EndpointName=sagemaker_endpoint_name,
    ContentType=content_type,
    Accept="application/json",
    Body=input_json,
)

for event in response['Body']:
    print(event['PayloadPart']['Bytes'].decode('utf-8'))
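
Stream events are not guaranteed to align with message boundaries: a single message, or even a single multi-byte UTF-8 character, may be split across two `PayloadPart` events, in which case decoding each chunk on its own can fail or print fragments. The sketch below buffers the raw bytes and yields only complete lines. It assumes the model emits newline-delimited messages, which should be verified against the actual output; the helper name `iter_stream_lines` is hypothetical.

```python
def iter_stream_lines(event_stream):
    """Yield complete, decoded text lines from a response event stream."""
    buffer = b""
    for event in event_stream:
        part = event.get("PayloadPart")
        if part is None:
            continue  # skip events that carry no payload bytes
        buffer += part["Bytes"]
        # Emit every complete line currently held in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield line.decode("utf-8")
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.decode("utf-8")
```

In this notebook it could replace the loop above with `for line in iter_stream_lines(response["Body"]): print(line)`.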

## 3. Clean-up

### Delete the endpoint

In [None]:
sm_client.delete_endpoint(EndpointName=sagemaker_endpoint_name)

### Delete the endpoint configuration

In [None]:
sm_client.delete_endpoint_config(EndpointConfigName=sagemaker_endpoint_config_name)

### Delete the model

In [None]:
sm_client.delete_model(ModelName=sagemaker_model_name)