# Transitioning from OpenAI to Mistral: A Guide

*This notebook should work well with the Data Science 3.0 kernel in SageMaker Studio*

This guide is designed to assist you in transitioning your OpenAI prompts and workloads to Mistral Models. Our goal is to demonstrate how Mistral models can match the capabilities of their OpenAI counterparts in handling specific tasks. While this notebook may not cover every edge case or unique use case scenario, it will provide a strong foundation for starting your journey with Mistral.

We will begin by reviewing how to format prompts and create a chat completion interface. We will then work through examples that illustrate the performance and efficiency of Mistral models in different applications.

## Mistral Large

Mistral Large is the top-tier reasoning model for high-complexity tasks and the most powerful model in the Mistral AI family.

- Fluent in English, French, Italian, German, and Spanish, and strong in code
- Context window of 32k tokens, with excellent recall for retrieval augmentation
- Native function calling capabilities (coming soon to Bedrock) and JSON outputs
- Concise, useful, and unopinionated, with fully modular moderation control

Mistral Large has the following inference parameters on Bedrock:<br>

{<br>
    "prompt": string,<br>
    "max_tokens" : int,<br>
    "stop" : [string],<br>
    "temperature": float,<br>
    "top_p": float,<br>
    "top_k": int<br>
}<br>

Context window: 32k
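
As a minimal sketch, the parameters listed above can be assembled into a JSON request body as follows. The helper name `build_request_body` and its default values are illustrative assumptions, not recommended settings; only the field names come from the parameter list above.

```python
import json


def build_request_body(prompt, max_tokens=512, temperature=0.5,
                       top_p=0.9, top_k=50, stop=None):
    """Serialize the inference parameters into a JSON request body.

    Hypothetical helper: the defaults are placeholders for illustration only.
    """
    body = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
    }
    # "stop" is a list of strings; include it only when the caller provides one
    if stop:
        body["stop"] = list(stop)
    return json.dumps(body)


body = build_request_body("<s>[INST] Say hello. [/INST]", stop=["</s>"])
print(body)
```

The resulting string is what gets passed as the `body` argument when invoking the model.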

## Mixtral 8x7b

Mixtral is a high-quality sparse mixture-of-experts (SMoE) model with open weights, licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.


- It gracefully handles a context of 32k tokens.
- It handles English, French, Italian, German, and Spanish.
- It shows strong performance in code generation.

Mixtral 8x7b has the following inference parameters on Bedrock:<br>

{<br>
    "prompt": string,<br>
    "max_tokens" : int,<br>
    "stop" : [string],<br>
    "temperature": float,<br>
    "top_p": float,<br>
    "top_k": int<br>
}<br>

Context window: 32k


## Benchmarks


Mistral-Large


| Model        | MMLU  | HellaSwag | WinoGrande | ARC Challenge (5-shot) | ARC Challenge (25-shot) | TriviaQA | TruthfulQA |
|--------------|-------|-----------|------------|------------------------|-------------------------|----------|------------|
| Mistral Large| 81.2% | 89.2%  | 86.7% | 94.2%     | 94.0%      | 82.7% | 50.5%      |
| LLAMA 2 70B  | 69.9% | 87.1%  | 83.2% | 86.0%     | 85.1%      | 77.6% | 44.7%      |
| GPT 3.5      | 70.0% | 85.5%  | 81.6% | 85.2%     | 85.2%      | —     | —          |
| GPT 4        | 86.4% | 95.3%  | 87.5% | —         | 96.3%      | —     | —          |
| Claude 2     | 78.5% | —      | —     | 91.0%     | —          | 87.5% | —          |
| Gemini Pro 1.0 | 71.8% | 84.7% | —     | —         | —          | —     | —          |



Mixtral 8x7b

|                    | LLAMA 2 70B | GPT-3.5 | Mixtral 8x7B |
|--------------------|-------------|---------|--------------|
| **MMLU (MCQ in 57 subjects)** | 69.9%      | 70.0%   | 70.6%        |
| **HellaSwag (10-shot)**       | 87.1%      | 85.5%   | 86.7%        |
| **ARC Challenge (25-shot)**   | 85.1%      | 85.2%   | 85.8%        |
| **WinoGrande (5-shot)**       | 83.2%      | 81.6%   | 81.2%        |
| **MBPP (pass@1)**             | 49.8%      | 52.2%   | 60.7%        |
| **GSM-8K (5-shot)**           | 53.6%      | 57.1%   | 58.4%        |
| **MT Bench (for Instruct Models)** | 6.86       | 8.32    | 8.30         |


In [None]:
import boto3
import json
import time

In [None]:
DEFAULT_MODEL = "mistral.mistral-large-2402-v1:0"
mixtral_model = "mistral.mixtral-8x7b-instruct-v0:1"

In [None]:
class LLM:
    """Thin wrapper around the Bedrock runtime client for Mistral text models."""

    def __init__(self, model_id):
        self.model_id = model_id
        # Uses the region and credentials from the default boto3 configuration
        self.bedrock = boto3.client(service_name="bedrock-runtime")

    def invoke(self, prompt, temperature=0.0, max_tokens=3000):
        """Send a formatted prompt to the model and return the generated text."""
        body = json.dumps({
            "temperature": temperature,
            "max_tokens": max_tokens,
            "prompt": prompt,
            "stop": ["</s>"]
        })
        response = self.bedrock.invoke_model(
            body=body,
            modelId=self.model_id)

        # The response body is a stream; read it and parse it as JSON
        response_body = json.loads(response.get("body").read())
        return response_body['outputs'][0]['text']


# One wrapper per model, using the model IDs defined above
llm_mistral_large = LLM(DEFAULT_MODEL)
llm_mixtral = LLM(mixtral_model)

## How to Format Prompts for Mistral on Bedrock

- Start with `<s>` to indicate the beginning of your input; `</s>` indicates the end of a sequence.
- Include your instructions between `[INST]` and `[/INST]`.

*The `<s>` token denotes the beginning of a prompt. Mistral will understand that there are no other tokens before this special beginning-of-sequence token.*

*The `</s>` token denotes the end of a sequence. Mistral will understand that there are no other tokens after this special end-of-sequence token.*
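
A minimal sketch of this format as a helper function is shown below. The function name `format_instruction` and the sample instruction are hypothetical and used only for illustration.

```python
def format_instruction(instruction):
    """Wrap a single instruction in the Mistral prompt markers."""
    # <s> opens the sequence; the instruction sits between [INST] and [/INST]
    return f"<s>[INST] {instruction.strip()} [/INST]"


prompt = format_instruction("Summarize the plot of Hamlet in two sentences.")
print(prompt)
```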

### Comparison with OpenAI Prompts

OpenAI models use a chat completion API where your instructions for the model go into:

System Role: This role sets the context or instructions for the model.<br>
User Role: This role includes the specific task or question posed to the model.<br>
Assistant Role: This role is the model's response.

Mistral models on Bedrock behave differently. There are no assigned roles - your instructions for the model go between the `[INST]` strings.
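
To make the difference concrete, here is a rough sketch of how an OpenAI-style message list could be flattened into a single Mistral prompt string. The function name is hypothetical, and folding the system message into the first user instruction is an assumption made for this sketch, since the format described here has no separate system role.

```python
def messages_to_mistral_prompt(messages):
    """Convert OpenAI-style chat messages into one Mistral prompt string.

    Assumption: system content is prepended to the first user instruction.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    pending_system = "\n\n".join(system_parts)
    prompt = "<s>"
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            if pending_system:
                content = f"{pending_system}\n\n{content}"
                pending_system = ""
            prompt += f"[INST] {content} [/INST]"
        elif role == "assistant":
            # A completed model turn is closed with the end-of-sequence token
            prompt += f" {content}</s> "
    return prompt


openai_style = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(messages_to_mistral_prompt(openai_style))
```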

Mistral models are designed for text input and text output only; they do not currently support multimodal capabilities.

### Best Practices for Prompt Engineering

Like many LLMs today, Mistral Models generally benefit from the following:

- Be clear in your instructions.
- Use delimiters like `###` or `<<< >>>` to mark the boundary between different sections of the text.
- A few-shot approach (multiple examples in context) is typically more effective than a zero-shot approach (no examples).
- For more complex reasoning tasks, chain-of-thought prompting (asking the model to think step by step) can help it reason through problems.
- Bad data in can lead to poor data out: preprocess your data and, if it is larger than the context window, chunk your text.
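
As one possible illustration of the chunking advice, the sketch below splits text into overlapping chunks by word count. Word count is only a rough stand-in for tokens, and the function name and the `max_words` and `overlap` values are assumptions chosen for the example.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping chunks of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current chunk has reached the end of the text
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"word{i}" for i in range(450))
chunks = chunk_words(sample, max_words=200, overlap=20)
print(len(chunks), [len(c.split()) for c in chunks])
```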

The typical rules of prompt engineering that you are familiar with apply to Mistral as they do to OpenAI.
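
The sketch below combines two of these practices, few-shot examples and `###` delimiters, into a single prompt. The helper name, the task wording, and the example reviews are all made up for illustration.

```python
def build_few_shot_prompt(task, examples, query):
    """Build a few-shot prompt with ### delimiters between sections."""
    lines = [task, ""]
    for text, label in examples:
        lines += ["###", f"Text: {text}", f"Label: {label}"]
    # The final section leaves the label blank for the model to complete
    lines += ["###", f"Text: {query}", "Label:"]
    return "<s>[INST] " + "\n".join(lines) + " [/INST]"


few_shot_prompt = build_few_shot_prompt(
    task="Classify each text as Positive or Negative.",
    examples=[
        ("The keyboard feels great to type on.", "Positive"),
        ("The keys stopped working after a week.", "Negative"),
    ],
    query="Setup was quick and the lighting looks fantastic.",
)
print(few_shot_prompt)
```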

## Use Mistral Large for your Reasoning Tasks

Mistral Large is the closest comparison to the GPT-4 family of models offered at the time of this writing. If you are looking to migrate from GPT-4 and you want to use Bedrock, then Mistral Large is the Mistral model you should evaluate first. Similar to GPT-4, Mistral Large excels in handling complex reasoning and mathematical tasks. If you're looking for Mixtral 8x22b, please use SageMaker JumpStart. Let's examine a few examples:

In [None]:
reasoning_problem = "<s>[INST] Sara, Mary, and Jane are at a party. Sara is wearing red pants, and the person wearing green pants is standing next to the person in blue pants. Jane is standing next to Sara. What color pants is Jane wearing? [/INST] "

In [None]:
reasoning_prompt = llm_mistral_large.invoke(reasoning_problem, temperature=0.0, max_tokens=300)
print(reasoning_prompt)

Mistral Large thought out loud and correctly answered our trick question.

In [None]:
car_calculation = """<s>[INST]

Sara bought a car that is valued at $20,000 today in 2024. According to her research, the car will depreciate in value at a rate of 7% per year. She also went to a car dealership that told her the car would decrease at a rate of 8.5% per year.

Sara wants to plan for the future. She is planning to sell her car in 6 years directly to a friend. What will the car be worth then? Walk Sara through the math and how much the car will be worth starting today to six years from now. [/INST] """

In [None]:
reasoning_prompt_2= llm_mistral_large.invoke(car_calculation, temperature=0.0, max_tokens=500)
print(reasoning_prompt_2)

Mistral Large correctly calculated the depreciation for both Sara & the dealership.
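
To check the model's arithmetic independently, the same calculation can be reproduced in a few lines. This sketch assumes simple compound depreciation, value = initial × (1 − rate)^years, which is the usual reading of the problem; the function name is hypothetical.

```python
def depreciated_value(initial_value, annual_rate, years):
    """Value after compounding a fixed annual depreciation rate."""
    return initial_value * (1 - annual_rate) ** years


results = {}
for label, rate in [("Sara's research (7%)", 0.07), ("Dealership (8.5%)", 0.085)]:
    results[label] = depreciated_value(20_000, rate, 6)
    print(f"{label}: ${results[label]:,.2f}")
```

Under this assumption the car is worth roughly $12,939.80 after six years at 7% and roughly $11,736.99 at 8.5%, which gives a reference point for the model's output.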

In [None]:
horizon_calculation = """<s>[INST]

Calculate the visible horizon for the following people: 

Preston is 6'3
Amy is 6'1
John is 5'11
Emma is 6'5
Tim is 6'5

Show your work for each person and then decide who can see the farthest.[/INST]"""

In [None]:
horizon_distance= llm_mistral_large.invoke(horizon_calculation, temperature=0.0, max_tokens=500)
print(horizon_distance)
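
A reference calculation helps when judging the model's answer. The sketch below uses the common approximation d ≈ sqrt(2·R·h) and makes several simplifying assumptions: eye height equals the stated height, the Earth is a sphere with a mean radius of 6,371 km, and atmospheric refraction is ignored.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed)


def height_to_metres(feet, inches):
    """Convert a height in feet and inches to metres."""
    return (feet * 12 + inches) * 0.0254


def horizon_distance_km(eye_height_m):
    """Approximate distance to the horizon: d = sqrt(2 * R * h)."""
    return math.sqrt(2 * EARTH_RADIUS_M * eye_height_m) / 1000


people = {"Preston": (6, 3), "Amy": (6, 1), "John": (5, 11),
          "Emma": (6, 5), "Tim": (6, 5)}
distances = {name: horizon_distance_km(height_to_metres(*h))
             for name, h in people.items()}
for name, d in distances.items():
    print(f"{name}: {d:.2f} km")

# Several people can tie for the farthest view
farthest = max(distances.values())
winners = [n for n, d in distances.items() if math.isclose(d, farthest)]
print("Farthest:", winners)
```

Because Emma and Tim are the same height, a correct answer should report a tie rather than a single winner.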

## Creating a Chatbot with Mistral Large on Bedrock

On Bedrock, you can create a chat completion by following these steps:

1. **Send the Initial Prompt**: Start by sending your initial prompt to the model.
2. **Receive the Response**: Get the response from the model.
3. **Append and Format**: For the next turn, append the initial prompt and the response. Wrap each prompt in `[INST]` and `[/INST]`, and end each response with `</s>`.

For more detailed guidance and resources, visit: [How to Prompt Mistral AI Models and Why](https://community.aws/content/2dFNOnLVQRhyrOrMsloofnW0ckZ/how-to-prompt-mistral-ai-models-and-why?lang=en).

### Example

Below is an example demonstrating how this process works:


In [None]:
# Initial prompt
prompt_3 = "<s>[INST] I enjoy hiking in the mountains. [/INST]"

# Get the first response from the model
chat_completion = llm_mistral_large.invoke(prompt_3, temperature=0.3, max_tokens=100)
print(chat_completion)

# Append the first prompt and the response to create the next prompt
next_prompt = (
    f"<s>[INST] I enjoy hiking in the mountains. [/INST] {chat_completion}</s> "
    "[INST] What are some tips for mountain hiking? [/INST]"
)

# Get the response for the next prompt
next_response = llm_mistral_large.invoke(next_prompt, temperature=0.3, max_tokens=100)
print(next_response)
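
The append-and-format step in the example above can be generalized to any number of turns. The following is a sketch only: `build_chat_prompt` is a hypothetical helper, and the model response in the usage example is a placeholder string rather than real model output.

```python
def build_chat_prompt(turns, next_user_message):
    """Build a multi-turn prompt.

    turns: list of (user_message, model_response) pairs already exchanged.
    """
    prompt = "<s>"
    for user_message, model_response in turns:
        # Each completed exchange: instruction, response, end-of-sequence token
        prompt += f"[INST] {user_message} [/INST] {model_response}</s> "
    prompt += f"[INST] {next_user_message} [/INST]"
    return prompt


history = [("I enjoy hiking in the mountains.", "That sounds great!")]
chat_prompt = build_chat_prompt(history, "What are some tips for mountain hiking?")
print(chat_prompt)
```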

## Complex Reasoning Task: Bedrock Pricing Calculations

Lastly, we'll perform a complex reasoning task involving Bedrock pricing for models. We'll follow a specific formula for calculations, reasoning, and decision making. This will demonstrate how Mistral Large can handle intricate tasks involving both reasoning and mathematical computations within a chatbot interaction.

In [None]:
def multiturn_conversation(llm, conversation_history):
    """
    Conducts a multi-turn conversation using the provided LLM instance.

    Args:
    - llm: An instance of the LLM class.
    - conversation_history: The conversation history containing prompts and responses.

    Returns:
    - A response from the model for the latest prompt.
    """
    # Join conversation history to create prompt
    prompt = " ".join(conversation_history)

    # Invoke the model with the prompt
    response = llm.invoke(prompt)

    return response

In [None]:
model_data = [
    {"Model": "Claude 3", "Version": "Haiku", "Pricing Metric": "per 1k tokens", "Input Price": 0.00025, "Output Price": 0.00125, "Input Task Size": 0, "Output Task Size": 0, "Throughput": 0, "Input Usage": 0, "Output Usage": 0, "Estimated Cost": 0, "Pricing Model": "On-Demand", "Fine-tuning Available": "No", "Available": "Yes"},
    {"Model": "Claude 3", "Version": "Sonnet", "Pricing Metric": "per 1k tokens", "Input Price": 0.00300, "Output Price": 0.01500, "Input Task Size": 0, "Output Task Size": 0, "Throughput": 0, "Input Usage": 0, "Output Usage": 0, "Estimated Cost": 0, "Pricing Model": "On-Demand", "Fine-tuning Available": "No", "Available": "Yes"},
    {"Model": "Claude 3", "Version": "Opus", "Pricing Metric": "per 1k tokens", "Input Price": 0.01500, "Output Price": 0.07500, "Pricing Model": "On-Demand", "Fine-tuning Available": "No", "Available": "Yes"},
    {"Model": "Claude Instant", "Version": "Claude Instant", "Pricing Metric": "per 1k tokens", "Input Price": 0.00080, "Output Price": 0.00240, "Pricing Model": "On-Demand", "Fine-tuning Available": "No"},
    {"Model": "Claude 2.0/2.1", "Version": "Claude 2.0/2.1", "Pricing Metric": "per 1k tokens", "Input Price": 0.00800, "Output Price": 0.02400, "Pricing Model": "On-Demand", "Fine-tuning Available": "No"},
    {"Model": "Jurassic", "Version": "Jurassic-2 Mid", "Pricing Metric": "per 1k tokens", "Input Price": 0.01250, "Output Price": 0.01250, "Pricing Model": "On-Demand", "Fine-tuning Available": "No"},
    {"Model": "Jurassic", "Version": "Jurassic-2 Ultra", "Pricing Metric": "per 1k tokens", "Input Price": 0.01880, "Output Price": 0.01880, "Pricing Model": "On-Demand", "Fine-tuning Available": "No"},
    {"Model": "Command", "Version": "Command-Light", "Pricing Metric": "per 1k tokens", "Input Price": 0.00030, "Output Price": 0.00060, "Pricing Model": "On-Demand", "Price to Train 1,000 Tokens": 0.0010, "Price to Store per Month": 1.95, "Fine-tuning Available": "Yes"},
    {"Model": "Command", "Version": "Command", "Pricing Metric": "per 1k tokens", "Input Price": 0.00150, "Output Price": 0.00200, "Pricing Model": "On-Demand", "Price to Train 1,000 Tokens": 0.0040, "Price to Store per Month": 1.95, "Fine-tuning Available": "Yes"},
    {"Model": "Llama-2", "Version": "Llama-2 Chat (13b)", "Pricing Metric": "per 1k tokens", "Input Price": 0.00075, "Output Price": 0.00100, "Pricing Model": "On-Demand", "Fine-tuning Available": "No"},
    {"Model": "Llama-2", "Version": "Llama-2 Chat (70b)", "Pricing Metric": "per 1k tokens", "Input Price": 0.00195, "Output Price": 0.00256, "Pricing Model": "On-Demand", "Fine-tuning Available": "No"},
    {"Model": "Llama-2", "Version": "Llama 2 Pre-trained (13b)", "Pricing Metric":"per 1k tokens", "Price to Train 1,000 Tokens": 0.0015, "Price to Store per Month": 1.95, "Fine-tuning Available": "Yes"} 
]

In [None]:
prompt_4 =f"""<s>[INST]

Data:{model_data}

The table includes model details, such as model version, pricing metric, input and output prices, average token counts for input and output tasks, and requests per minute. Based on this, you will compute the monthly input and output token usage and the estimated cost for each model.

Objective:

You are a generative AI expert and an expert in Amazon ML and AI products and services. Focus only on utilizing models available on Amazon Bedrock provided in the model data.

User Interaction Guide:

1. Understanding the Use Case: Engage with users to comprehend their specific generative AI needs.
2. Data Availability: Inquire if they possess substantial in-house data. This helps determine the necessity for model fine-tuning. Typically, fine-tuning is not recommended unless required for extensive classification tasks or specific copywriting applications.
3. Task Sizes: Ascertain the average token counts for both input (the task size) and output (the result size).
4. Request Frequency: Determine their expected number of requests per minute.

Calculations:

Using the gathered information, calculate the following for the models that you choose for the user:
- Monthly input usage in tokens. (Input task size * Throughput * 60 * 24 * 30)
- Monthly output usage in tokens. (Output task size * Throughput * 60 * 24 * 30)
- Estimated monthly cost. (Monthly input usage in tokens * Input price) + (Monthly output usage in tokens * Output price)
- Prices should be calculated per 1000 tokens

It's ok for you to make some assumptions and use averages based on what people tell you. Always round up from one token to many and when you give a recommendation make sure it's two different model providers. Always show the customer the math you are doing.

[/INST]"""


In [None]:
initial_response = llm_mistral_large.invoke(prompt_4, temperature=0.1, max_tokens=1000)
print(initial_response)

Let's run a quick test to confirm that the setup works as expected:

In [None]:
# prompt_4 already starts with <s>[INST] and ends with [/INST], so append to it directly
next_prompt = f"{prompt_4} {initial_response}</s> [INST] We have a classification use case with the following details: each input is a paragraph, each output is a single word in English, and we expect to classify 100 inputs per minute. [/INST]"

# Invoke the model for the next response
next_response = llm_mistral_large.invoke(next_prompt, temperature=0.1, max_tokens=2000)
print("Next response from model:", next_response)
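
To sanity-check the math the model shows, the formulas from the prompt can be reproduced directly. In this sketch the task sizes (150 input tokens, 5 output tokens) are assumptions picked for illustration; the request rate of 100 per minute comes from the use case above, and the prices are the Claude 3 Haiku entries from `model_data`.

```python
MINUTES_PER_MONTH = 60 * 24 * 30


def monthly_cost(input_tokens, output_tokens, requests_per_minute,
                 input_price_per_1k, output_price_per_1k):
    """Return (monthly input tokens, monthly output tokens, estimated cost)."""
    monthly_input = input_tokens * requests_per_minute * MINUTES_PER_MONTH
    monthly_output = output_tokens * requests_per_minute * MINUTES_PER_MONTH
    # Prices are quoted per 1,000 tokens, so scale the usage down first
    cost = ((monthly_input / 1000) * input_price_per_1k
            + (monthly_output / 1000) * output_price_per_1k)
    return monthly_input, monthly_output, cost


monthly_in, monthly_out, cost = monthly_cost(
    input_tokens=150, output_tokens=5, requests_per_minute=100,
    input_price_per_1k=0.00025, output_price_per_1k=0.00125,
)
print(f"Input tokens/month:  {monthly_in:,}")
print(f"Output tokens/month: {monthly_out:,}")
print(f"Estimated cost:      ${cost:,.2f}")
```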

Let's put everything we just reviewed together. Below we will create a multiturn chatbot that takes the user input and the model responses and appends them to the next prompt.

In [None]:
# Initialize conversation history with the initial prompt and its response,
# closing the response with the end-of-sequence token
conversation_history = [prompt_4, f"{initial_response}</s>"]

# Conduct the conversation interactively
while True:
    # Get user input
    user_input = input("You: ")

    # Check for exit condition
    if user_input.lower() == 'exit':
        print("Exiting conversation...")
        break

    # Every stored response already ends with </s>, so only wrap the new input
    conversation_history.append(f"[INST] {user_input} [/INST]")

    # Conduct multiturn conversation
    response = multiturn_conversation(llm_mistral_large, conversation_history)

    # Print model response
    print("Model:", response)

    # Update conversation history with the model response
    conversation_history.append(f"{response}</s>")


## Additional Examples

Let's examine several examples of Mistral Large completing tasks that would be beneficial to an enterprise, involving translation, classification, and Q&A.

We'll use Mistral Large to accomplish common downstream tasks needed to extract insights from a call transcript. The transcript is synthetic, generated at the author's request. JSON mode for Mistral Large is coming soon to Bedrock, but Large can still output reliable JSON objects when prompted to do so.

In [None]:
transcript = """Customer: Hi, I purchased a green widget from your store about three months ago, and it's not working properly. I need some urgent help troubleshooting it.

Customer Service Agent: I'm sorry to hear that your green widget isn't functioning as expected. I'll do my best to assist you. Could you please provide me with your order number or any other details about your purchase?

Customer: My order number is 123456789. I just can't believe this widget stopped working already. I spent a lot of money on it and expected better quality.

Customer Service Agent: I understand your frustration, and I apologize for the inconvenience. Let's see what we can do to resolve this issue. Can you describe the problem you're encountering with the green widget?

Customer: Well, it seems like the widget doesn't hold a charge anymore. I've tried charging it multiple times, but it still won't turn on. It's incredibly frustrating, especially since I rely on it for my daily tasks.

Customer Service Agent: I see. Let's try a few troubleshooting steps to see if we can find a solution. Could you please try using a different charging cable and power source to rule out any issues with the charger?

Customer: Alright, I'll give it a shot. No luck, it's still not turning on.

Customer Service Agent: Thank you for trying that. It sounds like there might be a problem with the widget itself. Since you purchased it three months ago, it should still be under warranty. I can assist you with initiating a warranty claim for a replacement.

Customer: I appreciate that, but I really need a solution as soon as possible. I can't afford to be without a functioning widget for too long, especially with my workload.

Customer Service Agent: I completely understand your urgency, and I'll do my best to expedite the process for you. Can you confirm the serial number of the green widget for me? It should be located on the back or bottom of the device.

Customer: Let me check... Okay, the serial number is GW123456789.

Customer Service Agent: Thank you for providing that. I'll prioritize your warranty claim and ensure it's processed as quickly as possible. You should receive an email confirmation shortly with further instructions on how to proceed.

Customer: Alright, I'll keep an eye out for the email. I hope this gets resolved soon.

Customer Service Agent: I understand, and I'll make sure to follow up to ensure everything is resolved to your satisfaction. If you have any further questions or concerns, feel free to reach out to us. We're here to help."""

In [None]:
transcript_extraction=f"""<s>[INST]

Transcript: {transcript}

You are an expert in working with transcripts. Given the transcript above, I need you to accomplish the following tasks:

1. Extract the intent of the customer/caller
2. Extract the order number, serial number, customer name and details if provided and label them as such
3. Answer the question: Did the service agent resolve the customer issue? Yes/No & why with specific examples from the transcript
4. Grade the overall interaction on a scale of 1-5 with 5 being excellent. Explain why you gave the interaction that grade.

Format your response as the following: 

1: Customer Intent, Customer Issue
2. Product Data
3. Interaction Grade

Format #2 as a JSON object.

[/INST]"""

In [None]:
transcription= llm_mistral_large.invoke(transcript_extraction, temperature=0.0, max_tokens=2000)
print(transcription)
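
Since the JSON object arrives embedded in a longer free-text answer, it usually needs to be pulled out before downstream use. The sketch below shows one way to do that with the standard library. The `sample_response` string is a made-up stand-in for model output, not an actual completion, and the key names in it are assumptions.

```python
import json


def extract_first_json_object(text):
    """Return the first JSON object embedded in text, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None


# Hypothetical response text, written by hand for this example
sample_response = """1. Customer Intent: troubleshoot a widget that will not charge.
2. Product Data:
{"order_number": "123456789", "serial_number": "GW123456789", "customer_name": null}
3. Interaction Grade: 4"""

product_data = extract_first_json_object(sample_response)
print(product_data)
```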

In [None]:
emails= """

"I recently bought your RGB gaming keyboard and absolutely love the customizable lighting features! Can you guide me on how to set up different profiles for each game I play?"
"I'm trying to use the macro keys on the gaming keyboard I just purchased, but they don't seem to be registering my inputs. Could you help me figure out what might be going wrong?"
"I'm considering buying your gaming keyboard and I'm curious about the key switch types. What options are available and what are their main differences?"
"I wanted to report a small issue where my keyboard's space bar is a bit squeaky. However, your quick-start guide was super helpful and I fixed it easily by following the lubrication tips. Just thought you might want to know!"
"My new gaming keyboard stopped working within a week of purchase. None of the keys respond, and the lights don't turn on. I need a solution or a replacement as soon as possible."
"I've noticed that the letters on the keys of my gaming keyboard are starting to fade after several months of use. Is this covered by the warranty?"
"I had an issue where my keyboard settings would reset every time I restarted my PC. I figured out it was due to a software conflict and resolved it by updating the firmware. Just wanted to ask if there are any new updates coming soon?"
"I've been having trouble with the keyboard software not saving my configurations, and it's starting to get frustrating. What can be done to ensure my settings are saved permanently?"
"""

In [None]:
email_tasks = f"""<s>[INST]

You're a customer service expert. Your job is to follow my instructions and perform the actions described below.

email={emails}

1. [Task 1] Classify the customer service emails above into two categories - Inquiry (a customer question) or Issue (a customer has a problem we need to help them solve).
2. [Task 2] Once you classify each customer service email, translate the email into French.

The format for output should look like:

<Label>
<English version of email, French translation>

[/INST]"""

In [None]:
multi_step = llm_mistral_large.invoke(email_tasks,temperature=0.1, max_tokens=2000)
print(multi_step)

## Using Mixtral 8x7b for Your GPT-3.5-Turbo Tasks

Here we'll review common tasks that GPT-3.5-Turbo would typically handle and show how Mixtral can accomplish similar tasks. We'll also measure the time it takes to execute our API calls, referred to as wall time, to illustrate how fast Mixtral is. For models in this class, the comparison generally hinges on fast responses and time to first token.

In our examples below, we are looking at the total time to execute our specific code blocks, which includes waiting for external resources and networking requests.
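
Outside a notebook, where the `%time` magic is not available, wall time can be measured with the standard library. This is a minimal sketch: `timed_call` is a hypothetical helper, and `fake_model_call` is a stand-in that sleeps briefly instead of calling Bedrock.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall time in seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_model_call(prompt):
    """Stand-in for a model invocation; sleeps to simulate latency."""
    time.sleep(0.05)
    return f"echo: {prompt}"


result, elapsed = timed_call(fake_model_call, "hello")
print(f"{result!r} took {elapsed:.3f} s wall time")
```

Like `%time`, this measures the whole round trip, including network latency, not just model compute.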

In [None]:
fast_objects="<s>[INST] List two very fast objects, either man-made or created by nature, that exist on Earth only, not in space. [/INST]"

In [None]:
%time mixtral_response_3=llm_mixtral.invoke(fast_objects, temperature=0.5, max_tokens=500)
print(mixtral_response_3)

### Answering Questions from a Transcript

In this example, we provide a transcript and ask Mixtral to answer specific questions based on the information given.

In [None]:
transcript_mixtral=f"""<s>[INST]

transcript={transcript}

Given the transcript above, answer the following questions. If you do not know the answer, then state 'Answer not provided in transcript'.

1. What was the customer's name?
2. What color was the widget that the customer was calling about?
3. What was the order number?
4. Did we resolve the customer issue?
5. What was the solution to the customer's problem?
[/INST]"""

In [None]:
%time mixtral_response_1=llm_mixtral.invoke(transcript_mixtral, temperature=0.0, max_tokens=500)
print(mixtral_response_1)

Mixtral was able to provide us with the correct information here; however, Mistral Large would most likely give us a more comprehensive completion.

### Translation

We'll use the generated emails as our context and translate them into French, German, and Spanish:

In [None]:
translate_mixtral=f"""<s>[INST]

emails={emails}

Translate the following customer emails into these languages only: 

1. French
2. German
3. Spanish

Label each language section accordingly [/INST]"""

In [None]:
%time mixtral_prompt_2=llm_mixtral.invoke(translate_mixtral, temperature=0.0, max_tokens=3000)
print(mixtral_prompt_2)

Mixtral continues to demonstrate fast and efficient performance even with complex and lengthy tasks.