# Deploy Liquid AI 7B Model Package from AWS Marketplace

This notebook covers the Liquid 7B model, which targets the NVIDIA L40S GPU.

Learn more at https://www.liquid.ai/lfm-7b.


The model runs on every instance type in the [G6e family](https://aws.amazon.com/ec2/instance-types/g6e/),
from the smallest `ml.g6e.xlarge` instance to the multi-GPU instances, though a single-GPU instance is usually the more cost-effective choice. The single-GPU instance types are:

| Instance type   | GPUs | vCPUs | RAM (GB) |
|-----------------|------|-------|----------|
| ml.g6e.xlarge   |    1 |     4 |       32 |
| ml.g6e.2xlarge  |    1 |     8 |       64 |
| ml.g6e.4xlarge  |    1 |    16 |      128 |
| ml.g6e.8xlarge  |    1 |    32 |      256 |
| ml.g6e.16xlarge |    1 |    64 |      512 |

## 1. Subscribe to the model package

1. Open the model package [listing page](https://aws.amazon.com/marketplace/pp/prodview-a7hfjtjhloy36).
2. On the AWS Marketplace listing, click the **Continue to subscribe** button.
3. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you and your organization agree with the EULA, pricing, and support terms.
4. Click the **Continue to configuration** button and choose a **region**. A **Product Arn** is then displayed. This is the model package ARN that you need to specify when creating a deployable model with the SageMaker Python SDK (or Boto3). Copy the ARN corresponding to your region and paste it into the following cell.

In [1]:
model_package_arn = "<Customer to specify Model package ARN corresponding to their AWS region>"
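
A pasted ARN for the wrong region is an easy mistake to make in step 4. The following is a minimal sketch of a sanity check, assuming the general ARN shape `arn:<partition>:sagemaker:<region>:<account-id>:model-package/<name>`. The helper name `check_model_package_arn` is hypothetical and not part of any AWS SDK.

```python
import re

# Assumed general shape of a SageMaker model package ARN:
# arn:<partition>:sagemaker:<region>:<account-id>:model-package/<name>
_ARN_PATTERN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):sagemaker:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):model-package/(?P<name>.+)$"
)


def check_model_package_arn(arn: str, expected_region: str) -> str:
    """Return the ARN unchanged if it looks valid for expected_region, else raise ValueError."""
    match = _ARN_PATTERN.match(arn)
    if match is None:
        raise ValueError(f"Not a model package ARN: {arn!r}")
    if match.group("region") != expected_region:
        raise ValueError(
            f"ARN is for region {match.group('region')!r}, expected {expected_region!r}"
        )
    return arn
```

Once the session from section 2.1 exists, the check could be called as `check_model_package_arn(model_package_arn, sagemaker_session.boto_region_name)`.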

## 2. Set up the endpoint

### 2.1 Initiate a session

In [None]:
%pip install sagemaker boto3

In [None]:
# initiate a session
import sagemaker

role = sagemaker.get_execution_role()

sagemaker_session = sagemaker.Session()

### 2.2 Specify the instance type

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

Smallest single-GPU instance:

In [5]:
instance_type = "ml.g6e.xlarge"

More powerful single-GPU instance (run this cell instead of the previous one if you need more vCPUs and RAM):

In [4]:
instance_type = "ml.g6e.16xlarge"

### 2.3 Create an endpoint

In [13]:
# create a deployable model from the model package.
model = sagemaker.ModelPackage(
    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session
)

In [None]:
# deploy the model
predictor = model.deploy(
    initial_instance_count=1, instance_type=instance_type, endpoint_name=model.name
)
# endpoint name is available after the deployment
endpoint_name = model.endpoint_name

Once the endpoint has been created, you can perform real-time inference.
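
If the notebook kernel was restarted, or the endpoint was created elsewhere, it can be useful to confirm that the endpoint is in service before sending requests. The sketch below polls the SageMaker `describe_endpoint` API and reads its `EndpointStatus` field. The helper name `wait_for_endpoint`, the polling interval, and the timeout are illustrative assumptions.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30.0, timeout_seconds=1800.0):
    """Poll describe_endpoint until the endpoint is InService.

    sm_client is any object exposing describe_endpoint(EndpointName=...),
    such as a boto3 SageMaker client.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name!r} failed to deploy")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name!r} still {status} after timeout")
        time.sleep(poll_seconds)
```

In this notebook the call could look like `wait_for_endpoint(sagemaker_session.sagemaker_client, endpoint_name)`.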

## 3. Inference examples

### 3.1 Simple example

The model is best used with chat-style requests, that is, a list of role-tagged messages. For details of the request format, see https://platform.openai.com/docs/api-reference/chat

In [None]:
import json

body = {
    # must always be supplied
    "model": "/opt/ml/model",
    # your conversation
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning in 2 sentences?"
        }
    ],
    # extras
    "max_tokens": 512,
    "temperature": 0.8,
    "top_p": 0.9,
}

response = sagemaker_session.sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(body),
)

json.load(response["Body"])
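
The cell above displays the whole response object. To pull out only the generated text, something like the following sketch may help. It assumes the response follows the OpenAI chat completions shape (`choices[0].message.content`), which the request format suggests but this notebook does not confirm. The helper name `extract_reply` is hypothetical.

```python
def extract_reply(result: dict) -> str:
    """Return the assistant text from a chat-completions-style response dict.

    Assumes the shape {"choices": [{"message": {"content": "..."}}]}.
    """
    choices = result.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0].get("message") or {}
    return message.get("content") or ""
```

Note that `response["Body"]` is a stream and can be read only once, so parse it into a variable first, for example `result = json.load(response["Body"])`, and then call `extract_reply(result)`.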

### 3.2 Streaming example

In [None]:
import json

body = {
    # different from the previous example
    "stream": True,
    # must always be supplied
    "model": "/opt/ml/model",
    # your conversation
    "messages": [
        {
            "role": "user",
            "content": "What is deep learning in 2 sentences?"
        }
    ],
    # extras
    "max_tokens": 512,
    "temperature": 0.8,
    "top_p": 0.9,
}

response = sagemaker_session.sagemaker_runtime_client.invoke_endpoint_with_response_stream(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(body),
)

def walk():
    # text carried over from chunks that did not yet form a complete JSON event
    last_chunk = ""
    for chunk in response["Body"]:
        proc_chunk = chunk["PayloadPart"]["Bytes"].decode("utf_8").replace("data:", "").replace("[DONE]", "").strip()
        payload = last_chunk + proc_chunk
        try:
            response_json = json.loads(payload)
        except json.decoder.JSONDecodeError:
            # partial response: keep everything received so far and retry on the next iteration
            last_chunk = payload
            continue
        print(response_json)
        last_chunk = ""
        # some events (for example the first or last one) may carry no "content" key
        yield response_json["choices"][0]["delta"].get("content") or ""

total_response = ''.join(walk())
total_response
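
The loop above treats each network chunk as roughly one event. A chunk can also hold several events, or split one event (or even one multi-byte character) in the middle. The following sketch is a line-buffered alternative, assuming the endpoint emits server-sent events as `data: <json>` lines terminated by `data: [DONE]`, as the string handling above implies. The function name `iter_stream_text` is hypothetical.

```python
import codecs
import itertools
import json
from typing import Iterable, Iterator


def iter_stream_text(byte_chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an SSE-style stream of raw byte chunks."""
    # The incremental decoder copes with UTF-8 characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    # A trailing newline flushes a final line that lacks its own terminator.
    for raw in itertools.chain(byte_chunks, [b"\n"]):
        buffer += decoder.decode(raw)
        # Parse complete lines only; the trailing partial line stays buffered.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            line = line.strip()
            if not line.startswith("data:"):
                continue
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return
            event = json.loads(data)
            choices = event.get("choices") or []
            if choices:
                text = (choices[0].get("delta") or {}).get("content")
                if text:
                    yield text
```

With a fresh streaming response, it could be used as `"".join(iter_stream_text(e["PayloadPart"]["Bytes"] for e in response["Body"] if "PayloadPart" in e))`.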

## 4. Clean-up

### 4.1 Delete the endpoint

Now that you have successfully performed real-time inference, you no longer need the endpoint. Delete it to avoid further charges.

In [None]:
model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)

### 4.2 Delete the model

In [None]:
model.delete_model()

### 4.3 Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

