<a href="https://colab.research.google.com/github/HSV-AI/presentations/blob/master/2024/240925_Pixtral.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![Logo](https://camo.githubusercontent.com/455da7518417340e112a473e3bdd91dae3dc8fda296d247ad3f3bc95cced8738/68747470733a2f2f6873762e61692f77702d636f6e74656e742f75706c6f6164732f323032322f30332f6c6f676f5f7631315f323032322e706e67)

# Welcome
- Vision
- Mission
- How to Connect - [Signup](https://hsv.ai/subscribe/)




# Pixtral

Here are the key points from the [Mistral AI post](https://mistral.ai/news/pixtral-12b/) announcing Pixtral:

- Natively multimodal, trained with interleaved image and text data
- Strong performance on multimodal tasks, excels in instruction following
- Maintains state-of-the-art performance on text-only benchmarks
- Architecture:
  - New 400M parameter vision encoder trained from scratch
  - 12B parameter multimodal decoder based on Mistral Nemo
  - Supports **variable image sizes and aspect ratios**
  - Supports **multiple images** in the long context window of 128k tokens

## Image Encoding

From the post, on **variable image size**:

> Pixtral is designed to optimize for both speed and performance. We trained a new vision encoder that natively supports variable image sizes:
>
> - We simply pass images through the vision encoder at their native resolution and aspect ratio, converting them into image tokens for each 16x16 patch in the image.
> - These tokens are then flattened to create a sequence, with [IMG BREAK] and [IMG END] tokens added between rows and at the end of the image.
> - [IMG BREAK] tokens let the model distinguish between images of different aspect ratios with the same number of tokens.
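The token layout described in the post can be sketched in a few lines of Python. This is a minimal illustration, not Mistral's preprocessing code: `image_token_layout` is a hypothetical helper, `[IMG]`, `[IMG_BREAK]`, and `[IMG_END]` are placeholder labels for the patch, [IMG BREAK], and [IMG END] tokens, and partial patches at the image edges are assumed to be padded up (ceiling division). Any resizing the API may apply before encoding is ignored.

```python
import math

PATCH = 16  # patch size in pixels, from the Mistral post


def image_token_layout(width: int, height: int, patch: int = PATCH) -> list[str]:
    """Return the flattened token sequence for one image (illustrative labels).

    Assumes edge patches are padded up (ceiling division); the real
    preprocessing may resize the image instead.
    """
    cols = math.ceil(width / patch)
    rows = math.ceil(height / patch)
    tokens: list[str] = []
    for row in range(rows):
        tokens.extend(["[IMG]"] * cols)  # one token per 16x16 patch in this row
        # A break separates rows; the last row is followed by the end marker instead
        tokens.append("[IMG_END]" if row == rows - 1 else "[IMG_BREAK]")
    return tokens


wide = image_token_layout(64, 32)  # 4 patches across, 2 rows
tall = image_token_layout(32, 64)  # 2 patches across, 4 rows
print(wide.count("[IMG]"), tall.count("[IMG]"))  # 8 8
print(len(wide), len(tall))  # 10 12
print(len(image_token_layout(1024, 1024)))  # 4160
```

Both images produce 8 patch tokens, but the wide one is split into 2 rows and the tall one into 4, so the break tokens fall in different places. Without them, the two flattened sequences of patch tokens would have the same length and no explicit row structure.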

## Full Model

This is the best diagram I could find showing how the pieces fit together:

![Image](https://mistral.ai/images/news/pixtral-12b/pixtral-model-architecture.png)




# Le Chat

My first attempt at using Pixtral was through Mistral's Le Chat site: [https://chat.mistral.ai/chat](https://chat.mistral.ai/chat)

# La Plateforme

I signed up for free to get API access. The site asked for a phone number for verification, and I chose the "Experiment for Free" option. After creating an API key, I was off to the races. The first step was to recreate the example from the API documentation, which analyzes this image:

![Image](https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg)

In [1]:
!pip install mistralai

Collecting mistralai
  Downloading mistralai-1.1.0-py3-none-any.whl.metadata (23 kB)
Collecting httpx<0.28.0,>=0.27.0 (from mistralai)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jsonpath-python<2.0.0,>=1.0.6 (from mistralai)
  Downloading jsonpath_python-1.0.6-py3-none-any.whl.metadata (12 kB)
Collecting typing-inspect<0.10.0,>=0.9.0 (from mistralai)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting httpcore==1.* (from httpx<0.28.0,>=0.27.0->mistralai)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<0.28.0,>=0.27.0->mistralai)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<0.10.0,>=0.9.0->mistralai)
  Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)
Downloading mistralai-1.1.0-py3-none-any.whl (229 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m229.7/22

In [6]:
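import os
from mistralai import Mistral

# Read the API key from Colab's secrets store when running in Colab;
# otherwise fall back to the MISTRAL_API_KEY environment variable
try:
    from google.colab import userdata
    api_key = userdata.get('MISTRAL_API_KEY')
except ImportError:
    api_key = os.environ["MISTRAL_API_KEY"]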

# Specify model
model = "pixtral-12b-2409"

# Initialize the Mistral client
client = Mistral(api_key=api_key)

# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the content of the response
print(chat_response.choices[0].message.content)


The image features a serene landscape with a vast expanse of snow-covered terrain. There are distinct features resembling sand dunes, but the dunes are covered in snow. The sky overhead has a gradient of soft pastel colors, suggesting either sunrise or sunset. The horizon line is visible, and there's a sense of calm and tranquility in the scene.


# Recreating the announcement post examples

Using this image, what are the top 5 economies in Europe?

![Image](https://mistral.ai/images/news/pixtral-12b/gdp.png)

In [4]:
# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "List the top 5 countries in Europe with the highest GDP"
            },
            {
                "type": "image_url",
                "image_url": "https://mistral.ai/images/news/pixtral-12b/gdp.png"
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the content of the response
print(chat_response.choices[0].message.content)


The top 5 countries in Europe with the highest GDP, based on the diagram, are:

1. **Germany**
   - GDP: $3.99T
   - GDP Percentage: 4.65%

2. ** United Kingdom**
   - GDP: $2.82T
   - GDP Percentage: 3.29%

3. **France**
   - GDP: $2.78T
   - GDP Percentage: 3.24%

4. **Italy**
   - GDP: $2.07T
   - GDP Percentage: 2.42%

5. **Spain**
   - GDP: $1.43T
   - GDP Percentage: 1.66%

These countries are highlighted in green on the diagram.
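Every example here rebuilds the same single-turn message by hand, so a small helper can cut the repetition. This is a sketch: `build_image_message` and `ask_about_image` are hypothetical names, and accepting several image URLs in one message assumes the chat endpoint takes multiple `image_url` entries in the `content` list (the announcement says multiple images are supported; this is the assumed way to pass them). The helper relies only on the `client.chat.complete(model=..., messages=...)` call already used above.

```python
from typing import Any


def build_image_message(prompt: str, *image_urls: str) -> list[dict[str, Any]]:
    """Build a single user message with one text part and one part per image URL."""
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    # Assumption: several image_url parts may share one message (multi-image input)
    content += [{"type": "image_url", "image_url": url} for url in image_urls]
    return [{"role": "user", "content": content}]


def ask_about_image(client: Any, model: str, prompt: str, *image_urls: str) -> str:
    """Send the prompt and image(s) through client.chat.complete and return the reply text.

    `client` is expected to be the mistralai.Mistral instance created earlier.
    """
    chat_response = client.chat.complete(
        model=model,
        messages=build_image_message(prompt, *image_urls),
    )
    return chat_response.choices[0].message.content
```

With the `client` and `model` from the earlier cell, the GDP example becomes `print(ask_about_image(client, model, "List the top 5 countries in Europe with the highest GDP", "https://mistral.ai/images/news/pixtral-12b/gdp.png"))`.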


Create a website based on this diagram:

![Image](https://mistral.ai/images/news/pixtral-12b/image-to-code.jpg)

In [5]:
# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Write HTML code to create a website like this"
            },
            {
                "type": "image_url",
                "image_url": "https://mistral.ai/images/news/pixtral-12b/image-to-code.jpg"
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the content of the response
print(chat_response.choices[0].message.content)


To create a website similar to the one depicted in the image, you'll need a basic HTML structure. Below is an example of HTML code to create a simple webpage with a title, a dropdown menu for selecting an ice cream flavor, and a "Next" button.

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Pick an Ice Cream Flavor</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 0;
            padding: 0;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh;
            background-color: #f0f0f0;
        }
        .container {
            background-color: #fff;
            padding: 20px;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
            border-radius: 8px;
            text-align: center;
        }
        h1 {
            margin-bottom: 20px;
        }
```
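The reply wraps the page in a Markdown code fence, so the HTML has to be pulled out before it can be opened in a browser. Below is a minimal sketch using a regular expression; `extract_code_block` is a hypothetical helper. It also accepts a reply whose closing fence is missing (for example, when the output was cut off) and returns everything after the opening fence in that case.

```python
import re
from typing import Optional

# Three backticks, built programmatically so this snippet doesn't end its own Markdown fence
FENCE = "`" * 3


def extract_code_block(reply: str, lang: str = "html") -> Optional[str]:
    """Return the body of the first fenced code block tagged `lang`, or None.

    If the closing fence is missing (e.g. a truncated reply), everything
    after the opening fence is returned.
    """
    pattern = re.compile(
        rf"{FENCE}{re.escape(lang)}[ \t]*\n(.*?)(?:\n{FENCE}|\Z)",
        re.DOTALL,
    )
    match = pattern.search(reply)
    return match.group(1) if match else None
```

For instance, `html = extract_code_block(chat_response.choices[0].message.content)` returns the page source, which can then be written to a file such as `pixtral_site.html` (an arbitrary name) and opened in a browser.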

# Different images

Let's see what happens if we use an image without a descriptive filename:

URL: https://picsum.photos/id/30/600/300

![Image](https://picsum.photos/id/30/600/300)

In [7]:
# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": "https://picsum.photos/id/30/600/300"
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the content of the response
print(chat_response.choices[0].message.content)


The image features a white ceramic mug with a red design. The design appears to be a vintage or retro-style postage stamp. It has an image of what looks like Che Guevara, and the text "CUBA" and "CUBANO 1960" are prominently displayed. The mug is placed on a surface, and there is a blurred background that prevents identification of the specific setting.
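To take the URL out of the experiment entirely, the image can be sent inline instead. Mistral's vision documentation describes passing a base64-encoded image as a `data:` URL in the `image_url` field; the sketch below assumes that format. `to_data_url` and `file_to_data_url` are hypothetical helpers built only on the Python standard library.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read a local image and encode it, guessing the MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {path!r}")
    return to_data_url(Path(path).read_bytes(), mime_type)
```

The result goes where the URL went before, e.g. `"image_url": file_to_data_url("mug.jpg")`, where `mug.jpg` is a hypothetical local copy of the image.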


# Quantization

