# Containerized Deployment and Inference with Pixtral on EC2

In this notebook we deploy [Mistral's Pixtral model](https://huggingface.co/mistralai/Pixtral-12B-2409) as a containerized inference server on Amazon EC2. Pixtral is trained to understand both natural images and documents, achieving 52.5% on the MMMU reasoning benchmark and surpassing a number of larger models. The model shows strong abilities in tasks such as chart and figure understanding, document question answering, multimodal reasoning, and instruction following. Pixtral can ingest images at their natural resolution and aspect ratio, giving the user flexibility in the number of tokens used to process an image. Pixtral can also process any number of images that fit within its long context window of 128K tokens. Unlike previous open-source models, Pixtral does not compromise on text benchmark performance to excel in multimodal tasks.

## Prerequisites
Follow the steps below to set up your EC2 instance with the Ubuntu DLAMI, which comes pre-installed with the NVIDIA drivers, Docker, and the other tools you will need.

#### Create Your EC2 instance
##### Follow the steps here for a detailed set up of your EC2 instance: [setup](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html)

##### Steps:
- Navigate to the EC2 dashboard from the AWS Management Console and launch your instance.
- Search for the `Deep Learning Base OSS Nvidia driver GPU AMI (Ubuntu 22.04)` and select it.
- Choose `g5.12xlarge` or any larger instance size.
> [!NOTE]  
> Depending on the instance size, you will need to adjust the `tensor_parallel_size` and `max_model_len` parameters. On the g5.12xlarge, the maximum KV cache is smaller than the model's maximum context length of 128K tokens, so `max_model_len` has to be reduced. With larger instance sizes, you can leave `max_model_len` unspecified to use the full context window.
- Set the inbound rule for `ssh` to your local machine's IP address or to `anywhere`. Note that allowing traffic from any IPv4 address is not in accordance with security best practices; please secure these ports once you are done testing.
- Create and specify your SSH key in the instance configuration step. You will need your `.pem` file.
- Set the EBS volume size to 100 GB `gp3`.
- Create your instance.
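
The note above ties `tensor_parallel_size` and `max_model_len` to the instance size. As a minimal sketch of that relationship, the hypothetical helper below (`build_vllm_args` is not part of vLLM or this notebook) assembles the two flags from a GPU count, assuming one tensor-parallel shard per GPU, and only adds a context cap when one is given.

```python
from typing import List, Optional


def build_vllm_args(num_gpus: int, max_model_len: Optional[int] = None) -> List[str]:
    """Assemble vLLM server flags for a given GPU count (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    args = [
        "--model", "mistralai/Pixtral-12B-2409",
        # Assumption: one tensor-parallel shard per GPU on the instance
        "--tensor_parallel_size", str(num_gpus),
    ]
    # Only cap the context length when a limit is given; omitting the flag
    # leaves the server to use the model's own maximum.
    if max_model_len is not None:
        args += ["--max_model_len", str(max_model_len)]
    return args


# A g5.12xlarge has 4 GPUs; 60000 is the context cap used in the launch cell
print(" ".join(build_vllm_args(4, max_model_len=60000)))
```

On a larger instance, calling `build_vllm_args(8)` would drop the `--max_model_len` flag entirely, matching the suggestion in the note.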

Once you have launched your instance, open either your terminal or VS Code and follow the steps below:

<b>ssh for powershell:</b>
```
$PUBLIC_DNS="paste your public ipv4 dns here" # public ipv4 DNS, e.g. ec2-3-80-.... from ec2 console
$KEY_PATH="paste ssh key path here" # local path to key, e.g. ssh/trn.pem

ssh -i $KEY_PATH -L 8080:localhost:8080 ubuntu@$PUBLIC_DNS
```
<b>ssh for linux/macOS:</b>
```
export PUBLIC_DNS="paste your public ipv4 dns here" # public ipv4 DNS, e.g. ec2-3-80-.... from ec2 console
export KEY_PATH="paste ssh key path here" # local path to key, e.g. ssh/trn.pem

ssh -i $KEY_PATH -L 8080:localhost:8080 ubuntu@$PUBLIC_DNS
``` 
You should now be connected to your EC2 instance over SSH.
Next, we change to the home directory, create a directory for our notebook, install Jupyter, and launch the Jupyter environment.
```
ubuntu@ip-172-31-0-170:~$ cd
ubuntu@ip-172-31-0-170:~$ mkdir -p jupyter-pixtral/notebooks
ubuntu@ip-172-31-0-170:~$ cd jupyter-pixtral/notebooks/
ubuntu@ip-172-31-0-170:~/jupyter-pixtral/notebooks$ sudo pip3 install notebook
ubuntu@ip-172-31-0-170:~/jupyter-pixtral/notebooks$ python3 -m notebook --allow-root --port=8080 
```
You should see a familiar jupyter output with a URL to the notebook.

`http://localhost:8080/....`

Click on it, and a Jupyter environment opens in your local browser. Upload this notebook to your Jupyter environment and run the steps in the cells below.

In [None]:
import subprocess

# Define the bash script
bash_script = """
export HF_TOKEN="provide your hf token"

docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env HUGGING_FACE_HUB_TOKEN=${HF_TOKEN} \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:v0.6.1.post2 \
    --model mistralai/Pixtral-12B-2409 \
    --tokenizer_mode mistral \
    --load_format mistral \
    --config_format mistral \
    --tensor_parallel_size 4 \
    --gpu_memory_utilization 0.9 \
    --max_model_len 60000
"""
# Run the bash script and stream its output in real time.
# stderr is merged into stdout so that a full stderr pipe cannot block the process.
process = subprocess.Popen(bash_script, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

while True:
    output = process.stdout.readline()
    if output == b'' and process.poll() is not None:
        break
    if output:
        print(output.decode().strip())

# Error messages appear in the streamed output above; report a failing exit code
if process.returncode != 0:
    print("Process exited with code:", process.returncode)
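
The cell above streams the container logs for as long as the server runs, so any readiness check has to run from a separate notebook or terminal session. The sketch below polls an HTTP endpoint until it answers with status 200. It assumes the server exposes a `/health` route on port 8000; `wait_for_server` and its default values are hypothetical and not part of vLLM.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/health",
                    timeout_s: float = 600.0,
                    interval_s: float = 5.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(interval_s)
    return False
```

Calling `wait_for_server()` before sending the first request avoids connection errors while the model weights are still being downloaded and loaded.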

In [None]:
!pip install --upgrade httpx --quiet
!pip install --upgrade gradio --quiet
!pip install --upgrade jupyter ipywidgets --quiet

In [37]:
import httpx

url = "http://localhost:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
timeout = httpx.Timeout(250.0)  # increase timeout as needed
data = {
    "model": "mistralai/Pixtral-12B-2409",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe and identify the location in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://huggingface.co/datasets/nithiyn/bounding-box/resolve/main/mykonos-2.jpeg"},
                },
            ],
        }
    ],
}

# Pass the timeout explicitly; otherwise httpx applies its short default
response = httpx.post(url, headers=headers, json=data, timeout=timeout)

print(response.json())

{'id': 'chat-62d19c41603f4796b3f10d3e15677d67', 'object': 'chat.completion', 'created': 1727113071, 'model': 'mistralai/Pixtral-12B-2409', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'The image depicts a coastal town at sunset, characterized by its white buildings with flat roofs and some blue accents on the shutters. The town is situated along the seafront, with a prominent pier extending into the calm waters. The sky transitions from light hues near the horizon to deeper blue above, indicating the setting sun. The overall ambiance is serene and picturesque, typical of a Mediterranean or Greek island setting. The architecture, combined with the location and atmosphere, suggests that this could be Mykonos, a popular Greek island known for its white-washed buildings and picturesque sunsets.', 'tool_calls': []}, 'logprobs': None, 'finish_reason': 'stop', 'stop_reason': None}], 'usage': {'prompt_tokens': 2701, 'total_tokens': 2823, 'completion_tokens': 122}, 'prom
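
The request above passes the image as a public URL. For an image stored on disk, one option is to embed it as a base64 `data:` URL in the same `image_url` field. The sketch below assumes the server accepts such data URLs, which this notebook does not demonstrate; both helper functions are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_file_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 `data:` URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_content_part(path: str) -> dict:
    """Build an `image_url` content entry shaped like the one in the request above."""
    return {"type": "image_url", "image_url": {"url": image_file_to_data_url(path)}}
```

The dictionary returned by `image_content_part` can take the place of the `image_url` entry in the `content` list of the request payload.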

In [38]:
response_json = response.json()
print(response_json['choices'][0]['message']['content'])

The image depicts a coastal town at sunset, characterized by its white buildings with flat roofs and some blue accents on the shutters. The town is situated along the seafront, with a prominent pier extending into the calm waters. The sky transitions from light hues near the horizon to deeper blue above, indicating the setting sun. The overall ambiance is serene and picturesque, typical of a Mediterranean or Greek island setting. The architecture, combined with the location and atmosphere, suggests that this could be Mykonos, a popular Greek island known for its white-washed buildings and picturesque sunsets.
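
Besides the generated text, the response also carries a `usage` field with token counts, which is useful when tuning `max_model_len`. The following is a minimal sketch of pulling both out in one step; `summarize_response` is a hypothetical helper, and `sample` is a trimmed-down stand-in for the response printed earlier.

```python
from typing import Any, Dict


def summarize_response(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the reply text and token counts from a chat-completion dict."""
    choice = resp["choices"][0]
    usage = resp.get("usage", {})
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }


# Trimmed-down stand-in for the response shown earlier
sample = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "The image depicts a coastal town at sunset."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 2701, "total_tokens": 2823, "completion_tokens": 122},
}
print(summarize_response(sample))
```

In the example response, most of the 2701 prompt tokens come from the image itself, since the text prompt is only one short sentence.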


In [39]:
import gradio as gr
import httpx

# Function that sends the input to the API and returns the response
def call_api(text_prompt, image_urls):
    url = "http://localhost:8000/v1/chat/completions"
    headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
    timeout=httpx.Timeout(250.0) #increase timeout as needed
    # Create the list of image content messages
    content_list = [{"type": "text", "text": text_prompt}]
    
    # Add each image URL as a message with its type
    for image_url in image_urls:
        content_list.append({
            "type": "image_url",
            "image_url": {"url": image_url}
        })
    
    # Construct the request payload
    data = {
        "model": "mistralai/Pixtral-12B-2409",
        "messages": [
            {
                "role": "user",
                "content": content_list,  # Pass the list of text and image URLs
            }
        ],
    }

    # Send the request to the API
    response = httpx.post(url, headers=headers, json=data, timeout=timeout)

    # Return the API response (assuming the response is in JSON format)
    if response.status_code == 200:
        return response.json()["choices"][0]["message"]["content"]
    else:
        return f"Error: {response.status_code}, {response.text}"

# Define Gradio interface
def gradio_interface():
    # Inputs: Textbox for prompt and Textbox for multiple image URLs
    text_prompt = gr.Textbox(label="Enter text prompt", placeholder="Describe and identify the location in these images.")
    
    # Add a new input for multiple URLs using Textbox and set it as a list
    image_urls = gr.Textbox(label="Enter image URLs (comma separated)", placeholder="Enter your URL")
    
    # Output: The generated content from the model
    output_text = gr.Textbox(label="Generated Description")

    # Create the Gradio interface
    interface = gr.Interface(
        fn=lambda text, urls: call_api(text, [u.strip() for u in urls.split(',') if u.strip()]),  # Split URLs by comma, trim whitespace, and drop empty entries
        inputs=[text_prompt, image_urls],  # Inputs to the function
        outputs=output_text,   # Output returned by the function
        title="Multi-Image Description Generator",  # Title for the interface
        description="Enter a text prompt and multiple image URLs, and the model will describe the images and their locations."
    )
    
    # Launch the interface
    interface.launch(share=True)

# Run the Gradio interface
if __name__ == "__main__":
    gradio_interface()


Running on local URL:  http://127.0.0.1:7866
Running on public URL: https://01f633a521e020e4f5.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
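
The interface takes its image URLs as a single comma-separated string, so stray spaces or a trailing comma can easily end up in the request. The hypothetical helper below makes that parsing step explicit and easy to test on its own. It assumes that none of the URLs contain a comma.

```python
from typing import List


def parse_image_urls(raw: str) -> List[str]:
    """Split a comma-separated string into clean, non-empty URLs."""
    urls = [part.strip() for part in raw.split(",")]
    return [u for u in urls if u]


print(parse_image_urls("https://example.com/a.jpg, https://example.com/b.jpg,"))
# → ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```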


---
### Distributors
- AWS
- Mistral