# Deploying Multimodal Models with LMI Deep Learning Containers

In this tutorial, you will use the Large Model Inference (LMI) Deep Learning Container (DLC) available on SageMaker to host a multimodal model and serve inference requests. We will use the [Llava-v1.6](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf) model available on the Hugging Face Hub.

Before proceeding with this example, make sure that you have an IAM role with SageMaker access enabled.

For a list of supported multimodal models in LMI, please see the documentation [here]()

## Step 1: Install Notebook Python Dependencies

In [None]:
%pip install sagemaker --upgrade --quiet

## Step 2: Use the SageMaker Python SDK to deploy your endpoint

In [None]:
import sagemaker
from sagemaker.djl_inference import DJLModel

role = sagemaker.get_execution_role()  # IAM role for the endpoint
session = sagemaker.session.Session()  # SageMaker session for interacting with AWS APIs

In [None]:
# Choose a specific version of LMI image directly:
image_uri = "763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124"
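
The URI above is pinned to `us-west-2`. As a minimal sketch, assuming the same account ID and image tag are published in the region you deploy to (this does not hold for every region, so check the list of available DLC images first), the URI can be derived from the session's region instead. `lmi_image_uri` is a hypothetical helper, not part of the SageMaker SDK.

```python
def lmi_image_uri(region, tag="0.29.0-lmi11.0.0-cu124", account="763104351884"):
    """Build the djl-inference image URI for a region.

    Assumption: the account ID and tag used in this tutorial exist in `region`.
    """
    return f"{account}.dkr.ecr.{region}.amazonaws.com/djl-inference:{tag}"


# In the notebook you could pass the session's region instead of a literal:
#   image_uri = lmi_image_uri(session.boto_region_name)
print(lmi_image_uri("us-west-2"))
```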

In [None]:
model = DJLModel(
    model_id="llava-hf/llava-v1.6-mistral-7b-hf",
    role=role,
    image_uri=image_uri,
)

In [None]:
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g6.4xlarge")
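
By default, `deploy` waits until the endpoint is ready. If you later want to check on the endpoint from another notebook or script, the sketch below may help. It assumes a boto3 SageMaker client and relies on its `describe_endpoint` call; `endpoint_status` is a hypothetical helper name.

```python
def endpoint_status(sagemaker_client, endpoint_name):
    """Return the endpoint status string, for example "Creating" or "InService"."""
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


# In the notebook (requires AWS credentials):
#   import boto3
#   print(endpoint_status(boto3.client("sagemaker"), predictor.endpoint_name))
```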

## Step 3: Make Inference Predictions

For multimodal models, LMI containers support the [OpenAI Chat Completions Schema](https://platform.openai.com/docs/guides/chat-completions). You can find specific details about LMI's implementation of this spec [here](https://docs.djl.ai/docs/serving/serving/docs/lmi/user_guides/chat_input_output_schema.html).

The OpenAI Chat Completions Schema allows two methods of specifying the image data:

* an image URL (e.g. https://resources.djl.ai/images/dog_bike_car.jpg)
* a base64-encoded string of the image data

If an image URL is provided, the container makes a network call to fetch the image. This is fine for small applications and experimentation, but it is not recommended in a production setting. If you are in a network-isolated environment, you must use the base64-encoded string representation.

We will demonstrate both mechanisms here.
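
Both options use the same request shape; only the value of `url` changes. The sketch below wraps that shape in a hypothetical helper, `build_chat_request`, so that either kind of image reference can go through one code path. The optional `max_tokens` field comes from the OpenAI schema; whether the container honors it is an assumption to verify against the LMI schema documentation linked above.

```python
def build_chat_request(prompt, image_ref, **generation_params):
    """Build a Chat Completions style request with one text part and one image part.

    image_ref may be an http(s) URL or a base64 data URL.
    generation_params (e.g. max_tokens) are copied into the request unchanged.
    """
    request = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_ref}},
                ],
            }
        ]
    }
    request.update(generation_params)
    return request


example = build_chat_request(
    "What is this image of?",
    "https://resources.djl.ai/images/dog_bike_car.jpg",
    max_tokens=256,  # assumption: supported by the container
)
print(example["messages"][0]["content"][1]["image_url"]["url"])
```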

### Getting a Test Image

You are free to use any image that you want. In this example, we'll be using the following image.

In [None]:
%pip install Pillow

In [None]:
import urllib.request
from PIL import Image

image_url = "https://resources.djl.ai/images/dog_bike_car.jpg"
image_path = "dog_bike_car.jpg"
# download the image locally
urllib.request.urlretrieve(image_url, image_path)

In [None]:
img = Image.open(image_path)
img.show()  # opens an external viewer; in a notebook, display(img) renders inline

### Using the image HTTP URL directly

In [None]:
messages = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is this image of?"
                }, 
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url
                    }
                }
            ]
        }
    ]
}

In [None]:
response = predictor.predict(messages)

In [None]:
print(response["choices"][0]["message"]["content"])
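
Indexing straight into `choices` raises an unhelpful `KeyError` if the container returns an error body instead of a completion. Here is a defensive sketch, assuming the response is a dict in the OpenAI Chat Completions shape used above. `first_choice_text` is a hypothetical helper, and the `sample` dict is made-up illustration data, not real model output.

```python
def first_choice_text(response):
    """Return the text of the first choice, or raise with the full response attached."""
    choices = response.get("choices") if isinstance(response, dict) else None
    if not choices:
        raise ValueError(f"No choices in response: {response!r}")
    return choices[0]["message"]["content"]


# Made-up response for illustration only
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "A dog next to a bicycle."}}
    ]
}
print(first_choice_text(sample))
```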

### Using the base64-encoded image

In [None]:
import base64

def encode_image_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
    
encoded_image = encode_image_base64(image_path)
base64_image_url = f"data:image/jpeg;base64,{encoded_image}"
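
The cell above hardcodes `image/jpeg`. For other formats, the MIME type can be inferred from the file extension with the standard library's `mimetypes` module. This is a sketch with a hypothetical helper name, `to_data_url`. Keep in mind that base64 encoding grows the payload by roughly a third, which matters for large images.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Encode a local image file as a data URL, guessing the MIME type from the extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Could not determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# In the notebook:
#   base64_image_url = to_data_url(image_path)
```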

In [None]:
messages = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is this image of?"
                }, 
                {
                    "type": "image_url",
                    "image_url": {
                        "url": base64_image_url
                    }
                }
            ]
        }
    ]
}

In [None]:
response = predictor.predict(messages)

In [None]:
print(response["choices"][0]["message"]["content"])

## Clean Up Resources

This example demonstrated how to use the LMI container to deploy multimodal models and serve inference requests. The 0.29.0-lmi container supports a variety of multimodal models using the OpenAI Chat Completions API spec. In the future, we plan to increase the set of supported multimodal architectures and to provide additional API specs for making inference requests.

When you are finished, delete the endpoint and the model so that they stop incurring charges.

In [None]:
predictor.delete_endpoint()
model.delete_model()