# Optimizing Enterprise Use Cases with Mistral Small 3: High-Performance Large Language Model with Low Latency


This notebook explores Mistral Small 3, a 24B-parameter large language model that combines strong benchmark performance with high efficiency. Released under the Apache 2.0 license, the model reaches 81% accuracy on MMLU and generates 150 tokens per second, rivaling larger models such as Llama 3.3 70B while running three times faster on identical hardware. Through practical examples in fraud detection, customer service, sentiment analysis, and healthcare document analysis, we showcase its versatility in handling complex enterprise tasks while maintaining rapid response times.

---

## Model Card

**Available regions:** *us-east-2, eu-west-3*

**Model ID:** [*Mistral-Small-24B-Instruct-2501*](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)

**Multilingual:** *Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.*

**Agent-Centric:** *Offers best-in-class agentic capabilities with native function calling and JSON outputting.*

**Advanced Reasoning:** *State-of-the-art conversational and reasoning capabilities.*

**Apache 2.0 License:** *Open license allowing usage and modification for both commercial and non-commercial purposes.*

**Context Window:** *A 32k context window.*

**System Prompt:** *Maintains strong adherence and support for system prompts.*

**Tokenizer:** *Utilizes a Tekken tokenizer with a 131k vocabulary size.*

---
## Benchmarks

### Human evaluations

| Category | Gemma-2-27B | Qwen-2.5-32B | Llama-3.3-70B | Gpt4o-mini |
|----------|-------------|--------------|---------------|------------|
| Mistral is better | 0.536 | 0.496 | 0.192 | 0.200 |
| Mistral is slightly better | 0.196 | 0.184 | 0.164 | 0.204 |
| Ties | 0.052 | 0.060 | 0.236 | 0.160 |
| Other is slightly better | 0.060 | 0.088 | 0.112 | 0.124 |
| Other is better | 0.156 | 0.172 | 0.296 | 0.312 |

### Instruct performance

**Reasoning & Knowledge**

| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|---------------|--------------|---------------|---------------|-------------|
| mmlu_pro_5shot_cot_instruct | 0.663 | 0.536 | 0.666 | 0.683 | 0.617 |
| gpqa_main_cot_5shot_instruct | 0.453 | 0.344 | 0.531 | 0.404 | 0.377 |

**Math & Coding**

| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|---------------|--------------|---------------|---------------|-------------|
| humaneval_instruct_pass@1 | 0.848 | 0.732 | 0.854 | 0.909 | 0.890 |
| math_instruct | 0.706 | 0.535 | 0.743 | 0.819 | 0.761 |

**Instruction following**

| Evaluation | mistral-small-24B-instruct-2501 | gemma-2-27b | llama-3.3-70b | qwen2.5-32b | gpt-4o-mini-2024-07-18 |
|------------|---------------|--------------|---------------|---------------|-------------|
| mtbench_dev | 8.35 | 7.86 | 7.96 | 8.26 | 8.33 |
| wildbench | 52.27 | 48.21 | 50.04 | 52.73 | 56.13 |
| arena_hard | 0.873 | 0.788 | 0.840 | 0.860 | 0.897 |
| ifeval | 0.829 | 0.8065 | 0.8835 | 0.8401 | 0.8499 |


---
## Deploy model from Amazon Bedrock Marketplace

Follow these steps:

1. On the Amazon Bedrock console, choose Model catalog under Foundation models in the navigation pane.
2. Search for "Mistral-Small-24B-Instruct-2501" in the ‘Filter for a model’ input box.
![Model Catalog](./images/search-model.png)

<br>

3. To begin using the Mistral Small 3 model, choose Deploy.

![Model Information](./images/model-information.png)

<br>

4. For Endpoint name, enter an endpoint name (between 1–50 alphanumeric characters).
5. For Number of instances, enter a number of instances (between 1–100).
6. For Instance type, choose your instance type. For optimal performance with Mistral Small 3, a GPU-based instance type like ml.g6.12xlarge is recommended.

Optionally, you can configure advanced security and infrastructure settings, including virtual private cloud (VPC) networking, service role permissions, and encryption settings. For most use cases, the default settings will work well. However, for production deployments, you might want to review these settings to align with your organization’s security and compliance requirements.

![Model Configuration](./images/model-configuration.png)


**❗Note: Model deployment may take about 10 minutes.**

In [None]:
import boto3
import json
import sagemaker
import time
from PIL import Image
from botocore.exceptions import ClientError

**Copy the endpoint ARN from the AWS Console and assign it to the `endpoint_arn` variable below.**

![Model Endpoint](./images/model-endpoint.png)

In [None]:
endpoint_arn = '<REPLACE THIS WITH ENDPOINT ARN FROM AWS CONSOLE>'
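Before invoking the model, it can help to sanity-check the value pasted into `endpoint_arn`. The sketch below splits an ARN into the standard `arn:partition:service:region:account-id:resource` fields and rejects strings that do not follow that layout, such as the unreplaced placeholder. The helper name `parse_endpoint_arn` and the example ARN are hypothetical and only serve as illustration.

```python
def parse_endpoint_arn(arn: str) -> dict:
    """Split an ARN into its six standard, colon-separated fields."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    return {
        "partition": partition,
        "service": service,
        "region": region,
        "account_id": account_id,
        "resource": resource,
    }


# Hypothetical ARN, for illustration only.
example_arn = "arn:aws:sagemaker:us-east-2:123456789012:endpoint/my-mistral-endpoint"
info = parse_endpoint_arn(example_arn)
print(info["region"], info["resource"])
```

The `region` field is the one that matters most here, because the Bedrock runtime client has to be created in the same region as the endpoint.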

In [None]:
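# Set up a function to invoke the Mistral model via the Bedrock Converse API

inferenceConfig = {
    'maxTokens': 2000,
    'temperature': 0.1,
}

def run_mistral_converse(user_prompt, system_prompt=None, model_id=endpoint_arn, inferenceConfig=inferenceConfig, additionalConfig=None):
    # The client must be created in the region where the endpoint is deployed.
    # An ARN carries the region in its fourth field; otherwise use the session default.
    region = model_id.split(':')[3] if model_id.startswith('arn:') else None
    bedrock = boto3.client('bedrock-runtime', region_name=region)

    request_payload = {
        "modelId": model_id,
        "messages": [
            {
                "role": "user",
                "content": [{"text": user_prompt}]
            }
        ],
        "inferenceConfig": inferenceConfig,
    }

    # Only add 'system' if system_prompt is provided
    if system_prompt:
        request_payload["system"] = [{"text": system_prompt}]

    # Only add model-specific request fields if they are provided
    if additionalConfig:
        request_payload["additionalModelRequestFields"] = additionalConfig

    response = bedrock.converse(**request_payload)

    # Return the text of the first content block in the reply
    return response['output']['message']['content'][0]['text']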
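The function above returns only the generated text, but a Converse response also carries token counts under `usage` and a server-side latency under `metrics`. Since this notebook emphasizes low latency, the following sketch shows one way to pull those fields out of a response dictionary. The helper name `summarize_converse_response` and the sample response are hypothetical; the derived tokens-per-second figure is a rough estimate because the reported latency covers the whole call, not only generation.

```python
def summarize_converse_response(response: dict) -> dict:
    """Extract the reply text plus token and latency statistics from a Converse response."""
    text = response["output"]["message"]["content"][0]["text"]
    usage = response.get("usage", {})
    latency_ms = response.get("metrics", {}).get("latencyMs")
    output_tokens = usage.get("outputTokens")

    # Rough throughput estimate; None when the response lacks the needed fields.
    tokens_per_second = None
    if latency_ms and output_tokens is not None:
        tokens_per_second = output_tokens * 1000 / latency_ms

    return {
        "text": text,
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
        "tokens_per_second": tokens_per_second,
    }


# Hand-written sample shaped like a Converse response, for illustration only.
sample_response = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Hello!"}]}},
    "usage": {"inputTokens": 12, "outputTokens": 30, "totalTokens": 42},
    "metrics": {"latencyMs": 600},
}
stats = summarize_converse_response(sample_response)
print(stats)
```

In the notebook, the dictionary to pass in would be the value returned by `bedrock.converse(...)`.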


## Use Case 1: Fraud Detection

In this example, we evaluate Mistral Small 3's ability to detect financial fraud and security threats. The model demonstrates advanced pattern recognition capabilities essential for financial services, including real-time phishing detection, transaction anomaly identification, and risk assessment. This application highlights the model's strength in providing structured reasoning with low-latency responses, crucial for time-sensitive security decisions.

### Phishing Email Detection

Mistral Small 3 analyzes email content and detects sophisticated phishing attempts that evade traditional rule-based filters. The model examines linguistic patterns, sender impersonation tactics, and psychological manipulation techniques to identify both common and novel phishing strategies.

In [None]:
system_prompt = '''
Please analyze this email for potential phishing activity. Rate each indicator as 1 (present) or 0 (absent):

[SENSITIVE] Requests for sensitive data (passwords, financial, personal): [0/1]
[DOMAIN] Sender's email domain mismatches claimed company: [0/1]
[LINKS] Embedded links don't match legitimate organization domain: [0/1]
[GREETING] Generic greetings used instead of personal name: [0/1]
[ERRORS] Presence of spelling or grammatical mistakes: [0/1]
[URGENT] Uses urgent or threatening language to create panic: [0/1]

Total flags: [X/6]
Brief analysis: [1-2 sentence conclusion on whether this appears to be a phishing attempt and why]
'''
user_prompt = '''
To: john.doe@company.com
Subject: URGENT: Your Octank account has been locked

Dear Valued Customer,

We have detected unusual login activity on your Octank account from an unrecognized device. As a precautionary measure, we have temporarily limited access to your account.

To restore full account access, please verify your identity within 24 hours by clicking below:

https://0ctank-account-verify.com/unlock-account

If you do not complete this verification process within 24 hours, your account will be permanently suspended.

Best regards,
Account Security Team

-------------------
This is an automated security alert from Octank
For support: support@octank-accounts.net

'''

In [None]:
print(run_mistral_converse(user_prompt, system_prompt))
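Because the system prompt asks for one `[NAME] ...: 0/1` line per indicator, the reply can be turned into structured data for downstream rules. The sketch below is one possible parser. It assumes the model follows the requested layout, with the score at the end of each line and optional square brackets around it. The function name `parse_flags` and the sample text are hypothetical, and the sample is hand-written rather than real model output.

```python
import re

# One indicator per line: "[NAME] description: 0" or "[NAME] description: [1]"
FLAG_PATTERN = re.compile(
    r"^[ \t]*\[([A-Z]+)\][^\n]*:[ \t]*\[?([01])\]?[ \t]*$", re.MULTILINE
)


def parse_flags(model_output: str) -> dict:
    """Map each indicator name to its 0/1 score."""
    return {name: int(value) for name, value in FLAG_PATTERN.findall(model_output)}


# Hand-written sample in the requested format, for illustration only.
sample_output = """[SENSITIVE] Requests for sensitive data (passwords, financial, personal): 0
[DOMAIN] Sender's email domain mismatches claimed company: 1
[LINKS] Embedded links don't match legitimate organization domain: [1]
[GREETING] Generic greetings used instead of personal name: 1
[ERRORS] Presence of spelling or grammatical mistakes: 0
[URGENT] Uses urgent or threatening language to create panic: 1

Total flags: 4/6
Brief analysis: Likely phishing."""

flags = parse_flags(sample_output)
print(flags, sum(flags.values()))
```

Recomputing the total from the parsed flags, rather than trusting the model's own "Total flags" line, gives a simple consistency check.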

### Fraudulent Call Detection

Mistral Small 3 analyzes transcripts of suspicious phone calls, identifying common deception tactics and social engineering patterns used by scammers. The system automatically flags potential fraud indicators - like urgency manipulation, impersonation of authority figures, and unusual payment requests - helping financial institutions and call centers rapidly detect and respond to emerging scam attempts while protecting vulnerable customers.

In [None]:
system_prompt = '''
Please analyze this call/message for potential scam activity. Rate each indicator as 1 (present) or 0 (absent):

[ID] Missing/incomplete identification (name/company/ID): [0/1]
[OFFER] Suspicious offers or too-good-to-be-true promises: [0/1]
[VAGUE] Non-specific references instead of account details: [0/1]
[REDIRECT] Unsolicited direction to unofficial channels: [0/1]
[URGENT] Pressure tactics or urgent deadlines: [0/1]
Total flags: [X/5]
Brief analysis: [1-2 sentence conclusion]'''

user_prompt = ''' 
Hi there, this is Jessie calling in regards to your Honda warranty. The warranty is up for renewal. 
I’d like to congratulate you on your $2,000 instant rebate and free maintenance and oil change package for being a loyal customer. 
Call me back at 934-153-XXXX to redeem now. Once again that number was 934-153-XXXX. Thank you so much. Have a great day. 
'''

In [None]:
print(run_mistral_converse(user_prompt, system_prompt))

## Use Case 2: Virtual Customer Service

The customer service demonstration showcases Mistral Small 3's prowess in maintaining contextual awareness during technical support conversations. We test its ability to provide accurate AWS-specific guidance while maintaining conversation flow and context. This example particularly highlights the model's fast-response capabilities and efficient memory management in multi-turn dialogues, essential features for real-world customer support applications.

In [None]:
from typing import Dict, List
from IPython.display import Markdown, display


def chat_history_to_string(memory):
    history_str = ""
    for chat_item in memory:
        role = chat_item.get("role", "")
        content = chat_item.get("content", "")
        history_str += f"{role}: {content}\n\n"
    return history_str.strip()


def format_conversation(user_input: str, memory: List[Dict[str, str]] = []) -> str:
    
    history = chat_history_to_string(memory)
    
    prompt = f"""
    You are a knowledgeable, helpful AWS customer service assistant. Provide general guidance based on the conversation so far, and keep each answer under 100 words.
    {history} 
    {user_input}
    """
    return prompt


def chat_with_agent(user_input: str, memory: List[Dict[str, str]]):
    # Format the conversation using the helper functions
    formatted_prompt = format_conversation(user_input, memory)
    
    # Get response
    response_content = run_mistral_converse(user_prompt = formatted_prompt)
    
    # Display response
    display(Markdown(response_content))
    
    # Update memory
    memory.append({"role": "user", "content": user_input})
    memory.append({"role": "assistant", "content": response_content})
    
    return response_content

In [None]:
memory = []
chat_with_agent("Hi there", memory);

In [None]:
chat_with_agent("How to select an EC2 instance type?", memory);

In [None]:
chat_with_agent("Cool. Will that work for my Linux workload?", memory);
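The cells above fold the chat history into a single prompt string. The Converse API also accepts a list of alternating user and assistant messages, so the same `memory` list could be passed as structured turns instead. The sketch below shows that conversion, with an optional cap on the number of past turns to keep prompts short. The function name `build_converse_messages` and the `max_turns` parameter are hypothetical and not part of the notebook's helpers.

```python
from typing import Dict, List


def build_converse_messages(
    memory: List[Dict[str, str]], user_input: str, max_turns: int = 5
) -> List[dict]:
    """Convert stored chat turns plus the new user input into Converse-style messages.

    Assumes memory holds alternating user/assistant entries, appended in pairs,
    so keeping the last 2 * max_turns entries still starts with a user message.
    """
    recent = memory[-2 * max_turns:] if max_turns > 0 else []
    messages = [
        {"role": item["role"], "content": [{"text": item["content"]}]}
        for item in recent
    ]
    messages.append({"role": "user", "content": [{"text": user_input}]})
    return messages


# Hand-written history, for illustration only.
example_memory = [
    {"role": "user", "content": "Hi there"},
    {"role": "assistant", "content": "Hello! How can I help with AWS today?"},
    {"role": "user", "content": "How to select an EC2 instance type?"},
    {"role": "assistant", "content": "Start from your workload's CPU and memory needs."},
]
messages = build_converse_messages(example_memory, "Will that work for Linux?", max_turns=1)
for message in messages:
    print(message["role"], "->", message["content"][0]["text"])
```

With this layout, the assistant instructions would go in the request's `system` field rather than being repeated inside every prompt.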

## Use Case 3: Sentiment Analysis

Our sentiment analysis implementation leverages Mistral Small 3's high-throughput text processing capabilities, focusing on financial news and market sentiment. The model exhibits consistent performance in categorical classification while maintaining rapid processing speeds. This example demonstrates the model's ability to handle varied content types while providing reliable, structured outputs suitable for automated trading and market analysis systems.

In [None]:
system_prompt = """ Respond with "positive" or "neutral" or "negative" and nothing else for the following 3 statements. """

user_prompt1 = """ I absolutely love this new phone, it's amazing how fast it works and the camera quality is incredible. """
user_prompt2 = """ The weather today is cloudy with some sun, temperature around 70 degrees. """
user_prompt3 = """ This is the worst customer service I've ever experienced. They lost my order and refused to help me resolve the issue. """

In [None]:
user_prompt = user_prompt1+user_prompt2+user_prompt3
print(run_mistral_converse(user_prompt,system_prompt))
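For automated pipelines, the free-text reply usually needs to be validated before it is used. The sketch below extracts the labels from a reply and checks that the count matches the number of statements sent. The function name `parse_sentiment_labels` and the sample reply are hypothetical; the sample is hand-written rather than real model output.

```python
import re
from typing import List


def parse_sentiment_labels(model_output: str, expected_count: int) -> List[str]:
    """Return the sentiment labels found in the reply, in order of appearance."""
    labels = re.findall(r"\b(positive|neutral|negative)\b", model_output.lower())
    if len(labels) != expected_count:
        raise ValueError(
            f"Expected {expected_count} labels but found {len(labels)}: {labels}"
        )
    return labels


# Hand-written reply, for illustration only.
sample_reply = "Positive\nNeutral\nNegative"
labels = parse_sentiment_labels(sample_reply, expected_count=3)
print(labels)
```

Raising on a count mismatch makes it obvious when the model adds commentary or skips a statement, instead of silently misaligning labels with inputs.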

## Use Case 4: Healthcare

Healthcare professionals struggle to efficiently process large volumes of medical text under time pressure. These examples showcase how Mistral Small 3 can transform complex medical content - from patient records to research papers - into focused, actionable insights. Mistral extracts and prioritizes critical information while maintaining clinical accuracy and relevance, enabling healthcare providers to make better-informed decisions quickly.

### Medical Research Paper Analysis

Medical professionals face an overwhelming challenge keeping pace with the exponentially growing body of clinical research, where thousands of new studies are published daily across numerous specialties. Mistral Small 3 can help clinicians efficiently process and evaluate research papers by automatically identifying crucial elements including study design, population characteristics, primary outcomes, statistical significance, and potential methodological limitations.

**Dataset Details:**

+ **Repository:** [PubMed Central (PMC)](https://pmc.ncbi.nlm.nih.gov/) 
+ **Research article:** [Clinical Trials and Clinical Research: A Comprehensive Review](https://pmc.ncbi.nlm.nih.gov/articles/PMC10023071/pdf/cureus-0015-00000035077.pdf)

**Description:**

PubMed Central (PMC) is a free full-text archive of biomedical and life sciences literature maintained by the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM), enabling broad access to scientific knowledge in the medical and life sciences domains.

Key aspects:

+ Acts as a freely accessible digital repository of full-text biomedical and life sciences research articles, promoting open access to scientific literature
+ Maintained by NIH/NLM, ensuring high-quality curation and reliable archiving of scientific publications
+ Facilitates comprehensive access to clinical trials and research papers, supporting evidence-based medical practice and research advancement

In [None]:
import requests

def download_pdf(url, output_filename):
    """Download a PDF and return its raw bytes, or None if the download fails.

    Note: output_filename is accepted but nothing is written to disk.
    """
    # Headers to mimic a browser request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Referer': 'https://www.google.com'
    }

    try:
        # Send GET request with headers; the timeout avoids hanging on a stalled connection
        response = requests.get(url, headers=headers, timeout=60)

        # Check if the request was successful
        if response.status_code == 200:
            return response.content
        print(f"Failed to download PDF. Status code: {response.status_code}")
    except requests.RequestException as e:
        print(f"An error occurred: {e}")
    return None

In [None]:
url = "https://pmc.ncbi.nlm.nih.gov/articles/PMC10023071/pdf/cureus-0015-00000035077.pdf"
filename = 'Clinical Trials and Clinical Research.pdf'
research_article = download_pdf(url, filename)

In [None]:
import io
from PyPDF2 import PdfReader

def parse_pdf_bytes(pdf_bytes):
    """Parses PDF content from bytes and returns the text."""
    pdf_file = io.BytesIO(pdf_bytes)
    reader = PdfReader(pdf_file)
    text_content = ""
    for page in reader.pages:
        text_content += page.extract_text() + "\n"
    return text_content

In [None]:
doc = parse_pdf_bytes(research_article)
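The model card lists a 32k context window, and the text extracted from a full paper can exceed it once the instructions and the response budget are added. The sketch below trims text to an approximate token budget, cutting at the last line break before the limit. The function name `truncate_to_token_budget`, the default budget, and the four-characters-per-token ratio are all assumptions for illustration; the ratio is a crude heuristic, not a property of the Tekken tokenizer.

```python
def truncate_to_token_budget(
    text: str, max_tokens: int = 24000, chars_per_token: float = 4.0
) -> str:
    """Trim text to roughly max_tokens, using a characters-per-token estimate."""
    max_chars = int(max_tokens * chars_per_token)
    if len(text) <= max_chars:
        return text

    # Prefer cutting at a line break so the text does not end mid-sentence.
    cut = text.rfind("\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars
    return text[:cut]


# Synthetic text, for illustration only.
sample_text = "line\n" * 100
trimmed = truncate_to_token_budget(sample_text, max_tokens=10)
print(len(sample_text), "->", len(trimmed))
```

For exact counts, the text would need to be measured with the model's own tokenizer; the heuristic only guards against grossly oversized prompts.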

In [None]:
prompt = f"""
You are a medical research analyst helping clinicians evaluate scientific papers. For each paper:
- Summarize the key findings and clinical implications
- Evaluate the methodology and study design
- Highlight potential limitations or biases
- Compare findings with existing literature
- Suggest practical applications
Use evidence-based analysis and maintain scientific rigor. Flag any potential conflicts of interest or statistical concerns.

<RESEARCH_PAPER>
{doc}
</RESEARCH_PAPER>
"""

In [None]:
%%time
print(run_mistral_converse(user_prompt=prompt))
# this step may take 1 min

### Clinical Note Summarization

Emergency department physicians face the critical challenge of rapidly processing extensive patient histories while making time-sensitive medical decisions. This example showcases how to transform lengthy clinical documentation into concise, clinically relevant summaries. Mistral Small 3 analyzes multiple sources including previous visit notes, discharge summaries, and specialist consultations, extracting key information about past medical conditions, current medications, allergies, and recent interventions.

**Dataset Details:**

+ **Dataset Name:** [aisc-team-a1/augmented-clinical-notes](https://huggingface.co/datasets/aisc-team-a1/augmented-clinical-notes) 
+ **Language(s):** English only
+ **Repository:** [EPFL-IC-Make-Team/ClinicalNotes](https://github.com/EPFL-IC-Make-Team/medinote)
+ **Paper:** MediNote: Automated Clinical Notes

**Description:**

This is a medical dataset developed for the AISC class at Harvard Medical School, comprising 30,000 clinical note triplets. The dataset combines real clinical notes from PMC-Patients, synthetic doctor-patient dialogues (NoteChat), and structured patient information, serving as the foundation for training MediNote-7B and MediNote-13B clinical note generators.

Key components:

+ Real clinical notes are sourced from PubMed Central case studies through the PMC-Patients dataset, providing detailed patient summaries, medical histories, treatments, and outcomes
+ Synthetic dialogues were generated using GPT-3.5 to address the scarcity of real patient-doctor conversation data due to confidentiality constraints
+ Structured patient information was extracted from the 30,000 longest clinical notes using GPT-4, following a specialized medical information template

In [None]:
from datasets import load_dataset

In [None]:
dataset = load_dataset("aisc-team-a1/augmented-clinical-notes", split='train[:1%]')

In [None]:
medical_note = dataset['full_note'][0]

prompt = f"""
You are a clinical documentation specialist tasked with summarizing patient records. Focus on:
- Key diagnoses and dates
- Current medications and allergies
- Recent procedures or hospitalizations
- Relevant family history
- Critical lab values
Present information in a structured format prioritizing recent and clinically significant findings. Flag any critical values or concerning trends. Use standard medical terminology and maintain all dates and specific values exactly as provided.

<MEDICAL_NOTE>
{medical_note}
</MEDICAL_NOTE>
"""

In [None]:
%%time
print(run_mistral_converse(user_prompt=prompt))
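The `%%time` magic reports the duration of a single cell. To compare latency across several notes, a small wrapper that times each call separately may be more convenient. The sketch below is one way to do that; the function name `time_calls` is hypothetical, and a trivial stand-in function replaces the model call so the example runs without an endpoint.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def time_calls(fn: Callable[[Any], Any], inputs: Iterable[Any]) -> List[Tuple[Any, float]]:
    """Call fn on each input and record the wall-clock seconds each call took."""
    results = []
    for item in inputs:
        start = time.perf_counter()
        output = fn(item)
        results.append((output, time.perf_counter() - start))
    return results


# Stand-in for the model call, for illustration only.
timed = time_calls(str.upper, ["first note", "second note"])
for output, seconds in timed:
    print(f"{seconds:.6f}s  {output}")
```

In the notebook, `fn` could be a lambda that builds the summarization prompt for a note and passes it to `run_mistral_converse`.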

## Conclusion

Mistral Small 3 establishes itself as a powerful, versatile model that successfully balances performance with efficiency. Its ability to match larger models while maintaining significantly lower latency makes it particularly suitable for production deployments. The model's Apache 2.0 license and efficient resource utilization make it an attractive option for both enterprise applications and local deployments. Our testing confirms its capability to handle complex tasks while maintaining the speed and reliability necessary for real-world applications.

## Cleanup

It's important to clean up the provisioned resources to avoid incurring costs. You have two options for deleting the endpoint created in this notebook.

In the AWS Console, navigate to the Amazon Bedrock service and choose Marketplace deployments under Foundation models. Select the deployed endpoint and choose Delete.

![Delete-Endpoint](./images/endpoint-delete.png)

After you choose Delete, a confirmation dialog appears. Read the warning carefully and confirm the deletion.


Alternatively, run the cell below to delete the endpoint.

In [None]:
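# Create the client in the endpoint's region, taken from the fourth field of its ARN
bedrock_client = boto3.client('bedrock', region_name=endpoint_arn.split(':')[3])
bedrock_client.delete_marketplace_model_endpoint(endpointArn=endpoint_arn)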