# Mistral AI: Open Models in Action on AWS

In this notebook, we will explore the capabilities of Mistral AI's open language and vision models, and demonstrate how to leverage services such as Amazon Bedrock and Amazon SageMaker to build powerful AI-powered applications.

Mistral AI is a leading provider of state-of-the-art AI models for a wide range of use cases. Their open models are designed to be highly flexible and customizable, allowing developers to quickly integrate advanced AI capabilities into their applications.

We will cover the following topics:

1. **Using Pixtral Large to Explore and Implement Vision Capabilities on Amazon Bedrock**:
   - Introduce Pixtral Large, Mistral AI's state-of-the-art text and vision model
   - Demonstrate how to use Amazon Bedrock to leverage Pixtral Large for fully serverless inference
   - Explore Pixtral Large's capabilities on two use cases: interpreting a data visualization and extracting an organizational hierarchy from an org chart

2. **Leveraging Mistral Small 3 to Analyze Text Data and Generate Insights on Amazon SageMaker and Amazon Bedrock**:
   - Introduce Mistral Small 3, Mistral AI's versatile language model
   - Show how to deploy Mistral Small 3 on a SageMaker-backed Amazon Bedrock Marketplace endpoint and invoke it through the Amazon Bedrock Converse API for natural language processing tasks such as text classification and fraudulent call detection

3. **Building a Pipeline of Open Models Combining Multiple AWS Services**:
   - Demonstrate how to combine Pixtral Large and Mistral Small 3 in a **unified AI pipeline**
   - Leverage additional AWS services (e.g., Amazon S3, AWS Lambda) to create a complete end-to-end solution for a specific use case
   - Discuss the benefits and challenges of using open-source models in a production environment

By the end of this notebook, you will have a solid understanding of how to leverage Mistral AI's open models and integrate them with AWS services to build powerful, scalable AI-powered applications.

## Getting Started
To begin, let's import the necessary libraries and set up our environment.

In [None]:
from IPython.display import display, Image as IPImage
import boto3
import json

## 1. Using Pixtral Large to Explore and Implement Vision Capabilities on Amazon Bedrock
Pixtral Large represents a significant advancement in multimodal AI technology, combining sophisticated image understanding capabilities with powerful language processing. Built upon the foundation of Mistral Large 2, this 124B parameter model (123B multimodal decoder + 1B parameter vision encoder) demonstrates exceptional performance across various visual and textual understanding tasks while maintaining a substantial 128K context window that can accommodate at least 30 high-resolution images.

### Model Architecture and Capabilities

- Total Parameters: 124B
- Multimodal Decoder: 123B
- Vision Encoder: 1B
- Context Window: 128K
- Image Capacity: at least 30 high-resolution images within the 128K context window

In [None]:
from botocore.exceptions import ClientError

def get_mistral_response(prompt_text, image_path=None, show_image=True, temperature=0.6):
    """Send a text prompt, plus optional image(s), to Pixtral Large via the Bedrock Converse API."""
    model_id = "us.mistral.pixtral-large-2502-v1:0"
    bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Accept a single path, a list of paths, or None
    if image_path is None:
        image_paths = []
    elif isinstance(image_path, list):
        image_paths = image_path
    else:
        image_paths = [image_path]

    if image_paths and show_image:
        for img_path in image_paths:
            print("Input Image:\n")
            display(IPImage(filename=img_path))
            print("\n")

    # Initialize message content with the prompt text
    message_content = [{"text": prompt_text}]

    # Append each image as a Converse image block
    for img_path in image_paths:
        # Normalize the extension; Converse expects "jpeg" rather than "jpg"
        image_ext = img_path.rsplit(".", 1)[-1].lower()
        if image_ext == "jpg":
            image_ext = "jpeg"

        with open(img_path, "rb") as f:
            image = f.read()

        message_content.append({
            "image": {
                "format": image_ext,
                "source": {"bytes": image}
            }
        })

    message = {
        "role": "user",
        "content": message_content
    }

    try:
        response = bedrock_client.converse(
            modelId=model_id,
            messages=[message],
            inferenceConfig={
                "temperature": temperature
            }
        )
    except ClientError as err:
        return f"A client error occurred: {err}"

    # Join the text blocks of the assistant's reply
    content_blocks = response["output"]["message"]["content"]
    return "\n".join(block["text"] for block in content_blocks if "text" in block)
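The Converse API accepts only a fixed set of image formats (`png`, `jpeg`, `gif`, and `webp`). An unsupported file such as a `.bmp` is only rejected once the request reaches Bedrock. The sketch below checks the format before the file is sent. The helper names `image_format_for` and `build_image_block` are hypothetical and are not part of the notebook or of boto3.

```python
from pathlib import Path

# Image formats accepted by the Bedrock Converse API image content block
SUPPORTED_IMAGE_FORMATS = {"png", "jpeg", "gif", "webp"}

def image_format_for(path: str) -> str:
    """Map a file path to a Converse image format (hypothetical helper)."""
    ext = Path(path).suffix.lower().lstrip(".")
    # Converse expects "jpeg", so treat ".jpg" as an alias
    if ext == "jpg":
        ext = "jpeg"
    if ext not in SUPPORTED_IMAGE_FORMATS:
        raise ValueError(f"Unsupported image format {ext!r} for {path}")
    return ext

def build_image_block(path: str) -> dict:
    """Read an image from disk and wrap it in a Converse image content block."""
    with open(path, "rb") as f:
        data = f.read()
    return {"image": {"format": image_format_for(path), "source": {"bytes": data}}}
```

Validating locally turns a bad path into an immediate `ValueError`, which is easier to debug than a service-side validation error.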

### Use Cases

In [None]:
prompt = """Examine this visualization:

1. First, describe what this visualization represents:
   - What information is being shown?
   - How is the data displayed?
   - What do the different components represent?
   - What does the size variation indicate?

2. Analyze specific patterns:
   - Which region shows the highest proportion of the first category?
   - Which region shows the highest proportion of the second category?
   - Where do you observe the largest total values?
   - Which areas show the most balanced distribution?

3. Compare regional trends:
   - How do the proportions differ between continents?
   - What patterns emerge between different hemispheres?
   - Are there clear differences between different economic zones?

4. Identify interesting outliers:
   - Which regions stand out from their neighbors?
   - Can you identify any unexpected patterns?
   - Where do you notice significant data variations?

5. Consider geographical and demographic factors:
   - How might local conditions influence these patterns?
   - What socioeconomic factors might explain the variations?
   - Can you identify any correlation between size and proportions?

6. Make comparisons between:
   - Different geographical zones
   - Various population densities
   - Different economic development levels"""
image_path = "../Pixtral-samples/Pixtral_data/Map_Motorcycles_vs_cars_by_population_millions_2002.png"
response = get_mistral_response(prompt, image_path)

print("\nModel Response:")
print(response)

Image source: Dennis Bratland, own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15186498

In [None]:
prompt = """
Extract the organizational hierarchy from the given org chart. Provide the response in structured JSON format with the following fields:
- role
- name
- reporting_manager

"""
image_path = "../Pixtral-samples/Pixtral_data/org_hierarchy.jpeg"
response = get_mistral_response(prompt, image_path, temperature=0.1)

print("\nModel Response:")
print(response)
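Models often wrap JSON in a Markdown code fence or surround it with a sentence, so the response string may not parse directly. The sketch below first extracts the JSON and then groups employees by manager. It assumes the model returns a JSON array of objects with the `role`, `name`, and `reporting_manager` fields requested in the prompt, and that top-level roles use `null` as the manager. Both helpers are hypothetical.

```python
import json
import re
from collections import defaultdict

def extract_json(text: str):
    """Parse the first JSON array or object found in a model response."""
    # Strip an optional Markdown fence such as a json-tagged code block
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Fall back to the span from the first "[" or "{" to the last "]" or "}"
    start = min((i for i in (text.find("["), text.find("{")) if i != -1), default=-1)
    if start == -1:
        raise ValueError("No JSON found in response")
    end = max(text.rfind("]"), text.rfind("}"))
    return json.loads(text[start:end + 1])

def reports_by_manager(records):
    """Group employee names under their reporting manager (None = top level)."""
    tree = defaultdict(list)
    for rec in records:
        tree[rec.get("reporting_manager")].append(rec["name"])
    return dict(tree)
```

Usage in the notebook would look like `reports_by_manager(extract_json(response))`. A `json.JSONDecodeError` here is a useful signal to lower the temperature or tighten the prompt.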

## 2. Leveraging Mistral Small 3 to Analyze Text Data and Generate Insights on Amazon SageMaker and Amazon Bedrock
Mistral Small 3 is a 24B-parameter large language model designed for efficiency. It is released under the **Apache 2.0 license**. According to Mistral AI, it scores over 81% on MMLU and generates around 150 tokens per second. Mistral AI also reports that it is competitive with larger models such as Llama 3.3 70B while running more than three times faster on the same hardware. The examples below cover text classification, fraudulent call detection, and virtual customer service, all enterprise tasks where fast responses matter.


---

### Model Card
---

**Available regions:** *us-east-2, eu-west-3*

**Model ID:** [*Mistral-Small-24B-Instruct-2501*](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)

**Multilingual:** *Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.*

**Agent-Centric:** *Offers best-in-class agentic capabilities with native function calling and JSON outputting.*

**Advanced Reasoning:** *State-of-the-art conversational and reasoning capabilities.*

**Apache 2.0 License:** *Open license allowing usage and modification for both commercial and non-commercial purposes.*

**Context Window:** *A 32k context window.*

**System Prompt:** *Maintains strong adherence and support for system prompts.*

**Tokenizer:** *Utilizes a Tekken tokenizer with a 131k vocabulary size.*

Let's import the additional libraries we need.

In [None]:
import boto3
import json
import sagemaker
import time
from botocore.exceptions import ClientError

In [None]:
# Set configuration
MODEL_SOURCE_ID = 'huggingface-llm-mistral-small-24B-Instruct-2501'
MODEL_SOURCE_ARN = 'arn:aws:sagemaker:{region}:aws:hub-content/SageMakerPublicHub/Model/huggingface-llm-mistral-small-24B-Instruct-2501/2.0.3'
INSTANCE_TYPE = 'ml.g6.12xlarge'
ENDPOINT_NAME = 'mistral-small-serverfull'

In [None]:
# Look up the AWS account ID, SageMaker execution role, and region
def get_current_session_info():
    sagemaker_role_arn = sagemaker.get_execution_role()
    session = sagemaker.Session()
    account_id = session.account_id()
    # Use the public attribute rather than the private _region_name
    region = session.boto_region_name

    return account_id, region, sagemaker_role_arn

aws_account_id, aws_region, sagemaker_role_arn = get_current_session_info()

print(f'aws region: {aws_region}')

MODEL_SOURCE_ARN = MODEL_SOURCE_ARN.format(region=aws_region)
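The model card below lists `us-east-2` and `eu-west-3` as available regions, while the ARN above is built from whatever region the SageMaker session uses. A small pre-flight check can catch a mismatch before the endpoint request. This sketch assumes the listed regions are still current. Availability can change, so the check only warns instead of failing. `check_region` is a hypothetical helper.

```python
import warnings

# Regions listed in the model card; availability may have changed since
LISTED_REGIONS = {"us-east-2", "eu-west-3"}

def check_region(region: str, listed=LISTED_REGIONS) -> bool:
    """Return True if the region is listed; otherwise emit a warning and return False."""
    if region in listed:
        return True
    warnings.warn(f"{region} is not in the listed regions {sorted(listed)}; "
                  "endpoint creation may fail.")
    return False
```

In the notebook, calling `check_region(aws_region)` before creating the endpoint is enough.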


In [None]:
# Create a Bedrock control-plane client in the same region as the SageMaker session
bedrock_client = boto3.client('bedrock', region_name=aws_region)

In [None]:
# Create a Bedrock Marketplace endpoint backed by a SageMaker instance
def create_endpoint(model_source_arn: str,
                    endpoint_name: str,
                    instance_type: str,
                    instance_count: int = 1):

    response = bedrock_client.create_marketplace_model_endpoint(
        modelSourceIdentifier=model_source_arn,
        endpointConfig={
            'sageMaker': {
                'initialInstanceCount': instance_count,
                'instanceType': instance_type,
                'executionRole': sagemaker_role_arn,
            }
        },
        acceptEula=True,
        endpointName=endpoint_name
    )
    return response

create_response = create_endpoint(model_source_arn=MODEL_SOURCE_ARN, endpoint_name=ENDPOINT_NAME, instance_type=INSTANCE_TYPE)


In [None]:
# Create a Bedrock runtime client in the same region as the endpoint
bedrock_runtime = boto3.client("bedrock-runtime", region_name=aws_region)

# Helper function to invoke the model using the Converse API (without streaming)
def invoke_model(system_prompt: str, messages: list, display_usage=False,
                 max_tokens: int = 2000, temperature: float = 1.0):
    system = [{"text": system_prompt}]

    # Standard Converse inference parameters
    inf_params = {"maxTokens": max_tokens, "temperature": temperature}

    response = bedrock_runtime.converse(modelId=endpoint_arn,
                                        messages=messages,
                                        system=system,
                                        inferenceConfig=inf_params)

    # Concatenate the text blocks of the reply (str.join on a single string would interleave characters)
    output_message = response['output']['message']
    output_content = ''.join(
        block['text'] for block in output_message['content'] if 'text' in block)

    if display_usage:
        token_usage = response['usage']
        print("\t--- Token Usage ---")
        print(f"\tInput tokens:  {token_usage['inputTokens']}")
        print(f"\tOutput tokens: {token_usage['outputTokens']}")
        print(f"\tTotal tokens:  {token_usage['totalTokens']}")
        print(f"\tLatency: {response['metrics']['latencyMs']} ms")

    return output_content

# Helper function to invoke the model using the Converse API with streaming
def invoke_model_with_stream(system_prompt: str, messages: list,
                             max_tokens: int = 2000, temperature: float = 0.4):
    system = [{"text": system_prompt}]

    inf_params = {"maxTokens": max_tokens, "temperature": temperature}

    response = bedrock_runtime.converse_stream(modelId=endpoint_arn,
                                               messages=messages,
                                               system=system,
                                               inferenceConfig=inf_params)
    stream = response.get('stream')
    output_content = ''
    if stream:
        for event in stream:

            if 'messageStart' in event:
                print(f"\nRole: {event['messageStart']['role']}")

            if 'contentBlockDelta' in event:
                # Print each text chunk as it arrives and append it to the full reply
                text = event['contentBlockDelta']['delta'].get('text', '')
                print(text, end="")
                output_content += text

            if 'messageStop' in event:
                print(f"\nStop reason: {event['messageStop']['stopReason']}")

            if 'metadata' in event:
                metadata = event['metadata']
                if 'usage' in metadata:
                    print("\nToken usage")
                    print(f"Input tokens: {metadata['usage']['inputTokens']}")
                    print(f"Output tokens: {metadata['usage']['outputTokens']}")
                    print(f"Total tokens: {metadata['usage']['totalTokens']}")
                if 'metrics' in metadata:
                    print(f"Latency: {metadata['metrics']['latencyMs']} milliseconds")

    return output_content
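Streaming responses arrive as `contentBlockDelta` events, and the full reply is rebuilt by concatenating their `text` fields in order. The sketch below separates that accumulation from printing so it can be checked against hand-written events shaped like the Converse stream output. `collect_stream` is a hypothetical name and is not part of boto3.

```python
def collect_stream(events):
    """Accumulate text, stop reason, and usage from Converse stream events.

    `events` is any iterable of dicts shaped like the items yielded by
    response["stream"] from bedrock_runtime.converse_stream().
    """
    chunks = []
    stop_reason = None
    usage = None
    for event in events:
        if "contentBlockDelta" in event:
            # Non-text deltas (for example tool-use input) have no "text" key
            chunks.append(event["contentBlockDelta"]["delta"].get("text", ""))
        elif "messageStop" in event:
            stop_reason = event["messageStop"]["stopReason"]
        elif "metadata" in event:
            usage = event["metadata"].get("usage")
    # Joining once at the end avoids repeated string concatenation
    return "".join(chunks), stop_reason, usage
```

Against a live endpoint, this would be called as `collect_stream(response["stream"])`. The trade-off is that it consumes the stream without printing as tokens arrive, so it suits batch processing better than interactive display.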


In [None]:
# Retrieve the endpoint ARN from the create_endpoint response

endpoint_arn = create_response['marketplaceModelEndpoint']['endpointArn']


In [None]:
# Poll the endpoint status until it is no longer 'Creating'
while True:
    endpoint_response = bedrock_client.get_marketplace_model_endpoint(endpointArn=endpoint_arn)
    status = endpoint_response['marketplaceModelEndpoint']['endpointStatus']
    print(f'endpoint status: {status}')
    if status != 'Creating':
        break

    # Wait 10 seconds before checking again
    time.sleep(10)
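A polling loop with no upper bound can hang a notebook if provisioning stalls. The sketch below adds a timeout to the same poll-and-sleep pattern. `get_status` is any zero-argument callable, for example a lambda wrapping `bedrock_client.get_marketplace_model_endpoint`. The helper name, the default timeout, and the treatment of `Creating` as the only in-progress state are assumptions for illustration.

```python
import time

def wait_until_settled(get_status, pending=("Creating",),
                       interval=10.0, timeout=1800.0, sleep=time.sleep):
    """Poll get_status() until it leaves the pending states or time runs out.

    Returns the first non-pending status; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        print(f"endpoint status: {status}")
        if status not in pending:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout} seconds")
        sleep(interval)
```

With the notebook's objects, the call would be `wait_until_settled(lambda: bedrock_client.get_marketplace_model_endpoint(endpointArn=endpoint_arn)["marketplaceModelEndpoint"]["endpointStatus"])`. Injecting `sleep` makes the helper testable without real waiting.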

### Use Cases




### Text Classification


In [None]:
system_prompt = '''
You are an AI assistant tasked with classifying data based on its sensitivity level. The sensitivity levels and their definitions are:

Sensitive: Data that must have the most limited access and requires a high degree of integrity. This is typically data that will do the most damage to the organization should it be disclosed.
Confidential: Data that might be less restrictive within the company but might cause damage if disclosed.
Private: Private data is usually compartmental data that might not do the company damage but must be kept private for other reasons. Human resources data is one example of data that can be classified as private.
Proprietary: Proprietary data is data that is disclosed outside the company on a limited basis or contains information that could reduce the company's competitive advantage, such as the technical specifications of a new product.
Public: Public data is the least sensitive data used by the company and would cause the least harm if disclosed. This could be anything from data used for marketing to the number of employees in the company.

For each user inquiry provided, classify it into one of the above sensitivity levels. Do not include the word "Category". Do not provide explanations or notes.
'''

In [None]:
messages = [
    {
        "role": "user",
        "content": [{"text": "I'm an HR recruiter. What data classification category applies to resumes gathered through employee referrals?"}]
    }
]

invoke_model_with_stream(system_prompt, messages)

In [None]:
messages = [
    {
        "role": "user",
        "content": [{"text": "I require the financial statements for the past three fiscal years."}]
    }
]
invoke_model_with_stream(system_prompt, messages)

In [None]:
messages = [
    {
        "role": "user",
        "content": [{"text": "Company's source code"}]
    }
]
invoke_model_with_stream(system_prompt, messages)
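The system prompt asks for a bare label, but generated text can still include extra whitespace, punctuation, or a short preamble. The sketch below maps a raw response to exactly one of the five labels defined in the system prompt, or to `None` when no label is found. `normalize_label` is a hypothetical helper, and it assumes the first label mentioned is the intended one.

```python
import re

# Sensitivity labels defined in the system prompt, most to least sensitive
LABELS = ("Sensitive", "Confidential", "Private", "Proprietary", "Public")

def normalize_label(raw: str):
    """Return the earliest known label that appears in the model output, else None."""
    hits = []
    for label in LABELS:
        # Whole-word, case-insensitive match so "insensitive" does not count
        m = re.search(rf"\b{label}\b", raw, re.IGNORECASE)
        if m:
            hits.append((m.start(), label))
    return min(hits)[1] if hits else None
```

Combined with the non-streaming helper, a single call would be `normalize_label(invoke_model(system_prompt, messages))`. A `None` result flags a response that needs a retry or manual review.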

### Fraudulent Call Detection

Mistral Small 3 analyzes transcripts of suspicious phone calls, identifying common deception tactics and social engineering patterns used by scammers. The system prompt below scores five fraud indicators: missing caller identification, too-good-to-be-true offers, vague references instead of account details, redirection to unofficial channels, and urgency pressure. Financial institutions and call centers can use the resulting score to quickly flag likely scam attempts and protect vulnerable customers.

In [None]:
system_prompt = '''
Please analyze this call/message for potential scam activity. Rate each indicator as 1 (present) or 0 (absent):

[ID] Missing/incomplete identification (name/company/ID): [0/1]
[OFFER] Suspicious offers or too-good-to-be-true promises: [0/1]
[VAGUE] Non-specific references instead of account details: [0/1]
[REDIRECT] Unsolicited direction to unofficial channels: [0/1]
[URGENT] Pressure tactics or urgent deadlines: [0/1]
Total flags: [X/5]
Brief analysis: [1-2 sentence conclusion]'''

user_prompt = ''' 
Hi there, this is Jessie calling in regards to your Honda warranty. The warranty is up for renewal. 
I'd like to congratulate you on your $2,000 instant rebate and free maintenance and oil change package for being a loyal customer. 
Call me back at 934-153-XXXX to redeem now. Once again that number was 934-153-XXXX. Thank you so much. Have a great day. 
'''

In [None]:
messages = [
    {"role": "user", "content": [{"text": user_prompt}]}
]
invoke_model_with_stream(system_prompt, messages)
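The system prompt requests a fixed, line-oriented format, which makes the result machine-readable. The parsing sketch below assumes the model follows that format: each tag in square brackets, with a 0 or 1 (optionally bracketed) at the end of its line. `parse_scam_flags` is a hypothetical helper. It recomputes the total instead of trusting the model's own `Total flags` line.

```python
import re

# Indicator tags defined in the fraud-detection system prompt
TAGS = ("ID", "OFFER", "VAGUE", "REDIRECT", "URGENT")

def parse_scam_flags(text: str) -> dict:
    """Extract a 0/1 score per indicator and recompute the total."""
    flags = {}
    for tag in TAGS:
        # Matches lines such as "[URGENT] Pressure tactics ...: 1" or "...: [1]"
        pattern = rf"\[{tag}\][^\n]*?\[?([01])\]?[ \t]*$"
        m = re.search(pattern, text, re.MULTILINE)
        if m:
            flags[tag] = int(m.group(1))
    # Recompute rather than trusting the model's "Total flags" line
    flags["total"] = sum(flags[t] for t in TAGS if t in flags)
    return flags
```

Missing tags are simply absent from the result. That makes it easy to detect a response that drifted from the requested format, for example `len(flags) < 6`.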

### Virtual Customer Service
This scenario shows how Mistral Small 3 can keep track of context during technical support conversations and provide accurate AWS-specific guidance over several turns. The Converse API is stateless, so the model does not remember earlier turns on its own: the application keeps the conversation history and resends it with each request. Because every turn reprocesses this growing history, the model's fast responses are what keep multi-turn customer support usable in practice.
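Multi-turn state with the stateless Converse API comes down to keeping an alternating user/assistant message list and sending all of it on every call. The sketch below uses a hypothetical `Conversation` class to hold that history in Converse message format. Its `send` callable takes `(system_prompt, messages)` and returns text, matching the notebook's `invoke_model` helper. The class itself is not part of any library.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]

class Conversation:
    """Keep Converse-format history and send it in full on every turn."""

    def __init__(self, system_prompt: str,
                 send: Callable[[str, List[Message]], str]):
        self.system_prompt = system_prompt
        self.send = send  # e.g. invoke_model(system_prompt, messages)
        self.messages: List[Message] = []

    def ask(self, text: str) -> str:
        # Add the user turn and call the model with the whole history
        self.messages.append({"role": "user", "content": [{"text": text}]})
        try:
            reply = self.send(self.system_prompt, self.messages)
        except Exception:
            self.messages.pop()  # keep user/assistant turns alternating
            raise
        # Record the assistant reply so the next turn has context
        self.messages.append({"role": "assistant", "content": [{"text": reply}]})
        return reply
```

With the deployed endpoint, this could be used as `support = Conversation("You are an AWS support assistant.", invoke_model)`, then `support.ask("How do I enable versioning on an S3 bucket?")`. A follow-up question can then refer back to "that bucket" without restating it.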

## 3. Building a Pipeline of Open Models Combining Multiple AWS Services

### Multimodal Content Generation for Technical Documentation

Combining a vision language model (VLM) with a large language model (LLM) is an effective approach for generating comprehensive, user-friendly technical documentation. In this pipeline, Pixtral Large serves as the VLM. It analyzes technical images, diagrams, and schematics and extracts the key components, systems, and features they show. Mistral Small 3 then serves as the LLM, turning that extracted information into a clear explanation for a non-technical audience.



In [None]:
# Vision Language Model (VLM) Prompt
vlm_prompt = """
Analyze the provided technical image and identify the key components, systems, and features present. For each identified element, provide a concise label or description that accurately represents its function and role within the overall system or design.
"""

# Large Language Model (LLM) Prompt
llm_prompt = """
Based on the technical elements identified in the provided image, please generate a detailed, user-friendly explanation of the overall system or device. The explanation should cover the purpose, functionality, and interrelationships of the key components in a clear and concise manner, suitable for a non-technical audience.
"""

def get_vlm_response(image_path):
    """
    Call the vision language model (Pixtral Large) to extract the technical components.
    """
    vlm_response = get_mistral_response(vlm_prompt, image_path, temperature=0.1)
    return vlm_response

def get_llm_response(vlm_response):
    """
    Call the language model (Mistral Small 3) to generate the technical documentation.
    """
    llm_messages = [
        {
            "role": "user",
            "content": [{"text": vlm_response}]
        }
    ]
    # Stream the documentation to the notebook and return the streamed reply
    return invoke_model_with_stream(llm_prompt, llm_messages)

def run_pipeline(image_path):
    """
    Run the two-stage pipeline: image -> extracted components -> documentation.
    """
    vlm_response = get_vlm_response(image_path)
    print("VLM Response:")
    print(vlm_response)

    # Pass the extracted components to the LLM and return its documentation
    return get_llm_response(vlm_response)

In [None]:

# Example usage
image_path = "technical_diagram.jpg"
run_pipeline(image_path)

In [None]:

# Example usage
image_path = "robotics.png"
run_pipeline(image_path)
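The introduction lists Amazon S3 and AWS Lambda as services for an end-to-end solution, but the pipeline above runs inside the notebook. The sketch below shows one possible wiring, under stated assumptions. A Lambda function is subscribed to S3 `ObjectCreated` notifications. For each uploaded image it reads the bucket and key from the event, downloads the object to `/tmp`, and passes the local path to a pipeline function. The names `s3_objects_from_event`, `make_handler`, and `process_image` are hypothetical. The pipeline and S3 client are injected so the event handling can be tested without AWS.

```python
import os
import urllib.parse

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".webp")

def s3_objects_from_event(event: dict):
    """Yield (bucket, key) pairs for image uploads in an S3 notification event."""
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+")
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        if key.lower().endswith(IMAGE_SUFFIXES):
            yield bucket, key

def make_handler(s3_client, process_image):
    """Build a Lambda handler; process_image(local_path) runs the VLM -> LLM pipeline."""
    def handler(event, context):
        results = []
        for bucket, key in s3_objects_from_event(event):
            # /tmp is the writable scratch space inside Lambda
            local_path = os.path.join("/tmp", os.path.basename(key))
            s3_client.download_file(bucket, key, local_path)
            results.append({"key": key, "documentation": process_image(local_path)})
        return {"processed": results}
    return handler
```

In a deployment, `s3_client` would be `boto3.client("s3")`. `process_image` would be a function that runs the two-stage pipeline on a local file and returns the generated text. The Lambda role would also need S3 read access and permission to invoke the Bedrock models.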

### Agentic Pipeline

In [None]:
# TODO

### Clean up

In [None]:
# Delete the marketplace endpoint so its SageMaker instance stops incurring charges
bedrock_client.delete_marketplace_model_endpoint(endpointArn=endpoint_arn)