## Pixtral 12b LMI v12 Deployment Guide

This notebook demonstrates how to deploy the [Pixtral-12B](https://huggingface.co/mistralai/Pixtral-12B-2409) model using the LMI v12 container. The model is not stored in the typical Hugging Face pretrained format, so additional configuration is required to deploy it successfully. Community versions of the model have been converted into the Hugging Face pretrained format, but those checkpoints are not compatible with vLLM and therefore cannot be deployed with LMI v12.

If you have fine-tuned this model and saved the artifacts in the Hugging Face pretrained format, you will need to convert them back into the Mistral format before deploying. You can read more about that process in this discussion: https://huggingface.co/mistral-community/pixtral-12b/discussions/4.

### Install required dependencies

In [None]:
%pip install sagemaker boto3 pillow requests

## Create the SageMaker model object

In [None]:
from sagemaker import image_uris
from sagemaker.djl_inference import DJLModel

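# LMI v12 container image; the region in the URI must match the region you deploy to
image_uri = "763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.30.0-lmi12.0.0-cu124"

# Replace with your own SageMaker execution role ARN; inside SageMaker Studio or a
# notebook instance, sagemaker.get_execution_role() returns the attached role
role = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"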

# Once the SageMaker Python SDK PR is merged, we can use image_uris directly
# image_uri = image_uris.retrieve(framework="djl-lmi", version="0.30.0", region="us-west-2")

model = DJLModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "mistralai/Pixtral-12B-2409",
        "HF_TOKEN": "<huggingface hub token>",
        "OPTION_ENGINE": "Python",
        "OPTION_MPI_MODE": "true",
        "OPTION_ROLLING_BATCH": "lmi-dist",
        "OPTION_MAX_MODEL_LEN": "8192", # tune based on instance type and available GPU memory
        "OPTION_MAX_ROLLING_BATCH_SIZE": "16", # tune based on instance type and available GPU memory
        "OPTION_TOKENIZER_MODE": "mistral",
        "OPTION_ENTRYPOINT": "djl_python.huggingface",
        "OPTION_TENSOR_PARALLEL_DEGREE": "max", # shard the model across all GPUs on the instance
        "OPTION_LIMIT_MM_PER_PROMPT": "image=4", # maximum number of images allowed per prompt
    }
)
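To get a feel for how `OPTION_MAX_MODEL_LEN`, `OPTION_MAX_ROLLING_BATCH_SIZE`, and `OPTION_TENSOR_PARALLEL_DEGREE` interact with GPU memory, the sketch below makes a back-of-envelope estimate of weight memory and worst-case KV-cache memory. The architecture numbers (about 12.4B parameters, 40 decoder layers, 8 KV heads, head dimension 128, bf16 weights) are assumptions, not values read from the container; check them against the model's `params.json` before relying on the result. `ModelShape` and `estimate_memory_gib` are hypothetical helpers, not part of LMI or the SageMaker SDK.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    # Assumed Pixtral-12B values; verify against params.json in the model repository
    num_params: float = 12.4e9  # ~12B decoder + ~0.4B vision encoder (assumption)
    num_layers: int = 40
    num_kv_heads: int = 8
    head_dim: int = 128
    dtype_bytes: int = 2  # bf16 / fp16


def kv_cache_bytes_per_token(shape: ModelShape) -> int:
    # One K vector and one V vector per layer per KV head
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * shape.dtype_bytes


def estimate_memory_gib(shape: ModelShape, max_model_len: int,
                        max_batch_size: int, tp_degree: int) -> dict:
    """Rough memory estimate (weights + worst-case KV cache), ignoring activations and overhead."""
    weights = shape.num_params * shape.dtype_bytes
    # Worst case: every sequence in the batch is at the maximum context length
    kv_cache = kv_cache_bytes_per_token(shape) * max_model_len * max_batch_size
    return {
        "weights_total_gib": weights / GIB,
        "kv_cache_total_gib": kv_cache / GIB,
        # Tensor parallelism roughly splits weights and KV heads evenly across GPUs
        "per_gpu_gib": (weights + kv_cache) / tp_degree / GIB,
    }


# Values matching the configuration above; ml.g6.12xlarge has 4 GPUs, so "max" uses all 4
estimate = estimate_memory_gib(ModelShape(), max_model_len=8192, max_batch_size=16, tp_degree=4)
for name, value in estimate.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the bf16 weights alone (roughly 23 GiB) exceed a single 24 GB L4 GPU, which is one reason to shard across all four GPUs. Serving engines typically reserve KV-cache space up front from a fraction of GPU memory, so treat this as a sizing aid rather than a prediction of actual usage.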

## Deploy the model

In [None]:
predictor = model.deploy(instance_type="ml.g6.12xlarge", initial_instance_count=1)
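Once the endpoint is `InService`, clients that do not use the SageMaker Python SDK can call it through the SageMaker Runtime API with boto3. The sketch below assumes the endpoint accepts and returns JSON in the same chat-completions shape that `predictor.predict` uses in this notebook; `build_request` and `invoke_chat` are hypothetical helper names, and the `us-west-2` default simply mirrors the image region used above.

```python
import json


def build_request(payload: dict) -> dict:
    """Build keyword arguments for invoke_endpoint (the endpoint name is passed separately)."""
    return {
        "ContentType": "application/json",
        "Accept": "application/json",
        "Body": json.dumps(payload),
    }


def invoke_chat(endpoint_name: str, payload: dict, region_name: str = "us-west-2") -> str:
    """Send a chat-style payload to the endpoint and return the first choice's message text."""
    import boto3  # imported lazily so build_request works without AWS dependencies

    runtime = boto3.client("sagemaker-runtime", region_name=region_name)
    response = runtime.invoke_endpoint(EndpointName=endpoint_name, **build_request(payload))
    # The response body is a streaming object; read it and parse the JSON document
    result = json.loads(response["Body"].read())
    return result["choices"][0]["message"]["content"]
```

For example, `invoke_chat(predictor.endpoint_name, {"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 64})` should return the same kind of text as `predictor.predict`, assuming your AWS credentials allow `sagemaker:InvokeEndpoint`.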

## Test prompts

The following prompts demonstrate how to use the Pixtral-12B model for:
- Text-only inference
- Single-image inference
- Multi-image inference

The multi-image example uses two images, but the endpoint is configured to accept up to 4 images per prompt. You can change this limit with the `OPTION_LIMIT_MM_PER_PROMPT` configuration.
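The payloads in the next cell are written out by hand. A small helper can build the same OpenAI-style message structure for any number of image URLs and reject requests that exceed the per-prompt image limit before they reach the endpoint. `build_image_payload` is a hypothetical name, and `MAX_IMAGES_PER_PROMPT = 4` is an assumption that mirrors `OPTION_LIMIT_MM_PER_PROMPT="image=4"`; keep the two in sync if you change the endpoint configuration.

```python
from typing import Sequence

# Assumed to mirror OPTION_LIMIT_MM_PER_PROMPT="image=4" in the model configuration
MAX_IMAGES_PER_PROMPT = 4


def build_image_payload(
    prompt: str,
    image_urls: Sequence[str],
    max_tokens: int = 1024,
    temperature: float = 0.6,
    top_p: float = 0.9,
) -> dict:
    """Build a chat payload with one text part followed by one image_url part per image."""
    if len(image_urls) > MAX_IMAGES_PER_PROMPT:
        raise ValueError(
            f"{len(image_urls)} images given, but the endpoint allows at most {MAX_IMAGES_PER_PROMPT}"
        )
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    return {
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
```

Calling `build_image_payload("Describe these images.", [IMAGE_1_KITTEN, IMAGE_2_TRUCK])` yields the same structure as `multi_image_payload`, with a different prompt text. Failing fast on the client gives a clearer error than a rejected request from the endpoint.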

In [None]:
IMAGE_1_KITTEN = "https://resources.djl.ai/images/kitten.jpg"
IMAGE_2_TRUCK = "https://resources.djl.ai/images/truck.jpg"

text_only_payload = {
    "messages": [
        {
            "role": "user",
            "content": "I would like to get better at basketball. Can you provide me a 3 month plan to improve my skills?"
        }
    ],
    "max_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.9,
}

single_image_payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Can you describe the following image and tell me what it contains?",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": IMAGE_1_KITTEN
                    }
                }
            ]
        }
    ],
    "max_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.9,
}

multi_image_payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Can you describe the following images and tell me what they have in common? If they have nothing in common, please explain why.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": IMAGE_1_KITTEN
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": IMAGE_2_TRUCK
                    }
                }
            ]
        }
    ],
    "max_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.9,
}

# Text Only Inference

In [None]:
print(f"Prompt is:\n {text_only_payload['messages'][0]['content']}")
text_only_output = predictor.predict(text_only_payload)
print("Response is:\n")
print(text_only_output['choices'][0]['message']['content'])
print('----------------------------')
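Because the endpoint receives the full `messages` list on every request, a follow-up question is sent by appending the model's previous answer as an `assistant` message and then the new `user` message. The sketch below assumes the endpoint applies the model's chat template to multi-turn conversations the same way it does for single-turn ones; `add_follow_up` is a hypothetical helper, and the example conversation is made up for illustration.

```python
import copy


def add_follow_up(payload: dict, assistant_reply: str, follow_up: str) -> dict:
    """Return a new payload with the assistant's reply and a follow-up user turn appended."""
    new_payload = copy.deepcopy(payload)  # leave the original payload untouched
    new_payload["messages"].append({"role": "assistant", "content": assistant_reply})
    new_payload["messages"].append({"role": "user", "content": follow_up})
    return new_payload


# Illustrative local example; no endpoint call is made here
example = {"messages": [{"role": "user", "content": "Suggest a shooting drill."}], "max_tokens": 256}
follow_up = add_follow_up(example, "Try form shooting close to the basket.", "How many reps per day?")
print([m["role"] for m in follow_up["messages"]])  # ['user', 'assistant', 'user']
```

With the deployed endpoint, you would pass `text_only_payload` and `text_only_output['choices'][0]['message']['content']`, then call `predictor.predict` on the returned payload. Keeping user and assistant turns strictly alternating matches what Mistral-style chat templates expect.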

In [None]:
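from io import BytesIO

import requests
from IPython.display import display
from PIL import Image

# Download the example images so they can be displayed alongside the model output
response_kitten = requests.get(IMAGE_1_KITTEN, timeout=30)
response_kitten.raise_for_status()
img_kitten = Image.open(BytesIO(response_kitten.content))
response_truck = requests.get(IMAGE_2_TRUCK, timeout=30)
response_truck.raise_for_status()
img_truck = Image.open(BytesIO(response_truck.content))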
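The example images are public URLs, so the endpoint can fetch them itself. For images the endpoint cannot reach (local files, private buckets), a common pattern with OpenAI-style chat APIs is to inline the image as a base64 `data:` URL in the `image_url` field. Whether this LMI version accepts data URLs is an assumption to verify against the LMI multimodal documentation; `bytes_to_data_url` and `read_file_as_data_url` are hypothetical helpers that use only the Python standard library.

```python
import base64
import mimetypes
from typing import Optional


def bytes_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL for an image_url content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def read_file_as_data_url(path: str, mime_type: Optional[str] = None) -> str:
    """Read a local image file and return it as a data URL."""
    # Guess the MIME type from the file extension when it is not given explicitly
    if mime_type is None:
        mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Could not determine an image MIME type for {path!r}")
    with open(path, "rb") as f:
        return bytes_to_data_url(f.read(), mime_type)
```

For example, `bytes_to_data_url(response_kitten.content)` could replace `IMAGE_1_KITTEN` as the `url` value in `single_image_payload`. Note that base64 inflates the request body by about a third, so large images may run into request size limits.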

# Single Image Inference

In [None]:
print("This is the image provided to the model")
display(img_kitten)  # render inline; Image.show() would open an external viewer instead
single_image_output = predictor.predict(single_image_payload)
print(single_image_output['choices'][0]['message']['content'])
print('----------------------------')

# Multi Image Inference

In [None]:
print("These are the images provided to the model")
display(img_kitten)  # render inline in the notebook
display(img_truck)
multi_image_output = predictor.predict(multi_image_payload)
print(multi_image_output['choices'][0]['message']['content'])
print('----------------------------')

In [None]:
# Clean up: delete the endpoint (and its endpoint configuration) and the model to stop incurring charges
predictor.delete_endpoint()
model.delete_model()