# Getting Started with `mistral-inference`

This notebook will guide you through the process of running Mistral models locally. We will cover the following:
- How to chat with Mistral 7B Instruct
- How to run Mistral 7B Instruct with function calling capabilities

We recommend using a GPU such as the A100 to run this notebook.

In [1]:
!pip install mistral-inference

Collecting mistral-inference
  Downloading mistral_inference-1.6.0-py3-none-any.whl.metadata (17 kB)
Collecting fire>=0.6.0 (from mistral-inference)
  Downloading fire-0.7.0.tar.gz (87 kB)
  Preparing metadata (setup.py) ... done
Collecting mistral_common>=1.5.4 (from mistral-inference)
  Downloading mistral_common-1.5.4-py3-none-any.whl.metadata (4.5 kB)
Collecting xformers>=0.0.24 (from mistral-inference)
  Downloading xformers-0.0.29.post3-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting tiktoken>=0.7.0 (from mistral_common>=1.5.4->mistral-inference)
  Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch==2.6.0->xformers>=0.0.24->mistral-inference)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.w

## Download Mistral 7B Instruct

In [2]:
!wget https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar

--2025-04-10 20:56:30--  https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar
Resolving models.mistralcdn.com (models.mistralcdn.com)... 172.67.70.68, 104.26.6.117, 104.26.7.117, ...
Connecting to models.mistralcdn.com (models.mistralcdn.com)|172.67.70.68|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14496675840 (14G) [application/x-tar]
Saving to: ‘mistral-7B-Instruct-v0.3.tar’


2025-04-10 21:03:19 (33.9 MB/s) - ‘mistral-7B-Instruct-v0.3.tar’ saved [14496675840/14496675840]
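
Before extracting a 14 GB archive, it can be worth confirming that the download finished. The following is a minimal sketch that compares the file size on disk with the `Length` value printed by `wget` above. The helper name `download_complete` is our own and is not part of `mistral-inference`.

```python
import os

# Size in bytes reported by wget in the log above ("Length: 14496675840").
EXPECTED_BYTES = 14_496_675_840


def download_complete(path, expected_bytes=EXPECTED_BYTES):
    """Return True only if `path` is a file of exactly `expected_bytes` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes


# Prints False if the archive is missing or truncated.
print(download_complete("mistral-7B-Instruct-v0.3.tar"))
```

A size match does not prove the content is intact, but it catches interrupted downloads cheaply.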



In [None]:
!DIR=$HOME/mistral_7b_instruct_v3 && mkdir -p $DIR && tar -xf mistral-7B-Instruct-v0.3.tar -C $DIR

In [None]:
!ls $HOME/mistral_7b_instruct_v3
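
The same check can be done from Python, which is handy if you want the notebook to fail early when a file is missing. This is a sketch only: `tokenizer.model.v3` is the file name used later in this notebook, while `params.json` and `consolidated.safetensors` are assumptions about the archive layout that you should adjust to match what `ls` shows. The helper `missing_files` is hypothetical.

```python
import os

# Only "tokenizer.model.v3" is confirmed by this notebook; the other two
# names are assumptions about what the archive contains.
EXPECTED_FILES = ("params.json", "consolidated.safetensors", "tokenizer.model.v3")


def missing_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]


model_dir = os.path.join(os.path.expanduser("~"), "mistral_7b_instruct_v3")
print(missing_files(model_dir) or "all expected files found")
```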

## Chat with the model

In [None]:
import os

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# load tokenizer
mistral_tokenizer = MistralTokenizer.from_file(os.path.expanduser("~")+"/mistral_7b_instruct_v3/tokenizer.model.v3")
# chat completion request
completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])
# encode message
tokens = mistral_tokenizer.encode_chat_completion(completion_request).tokens
# load model
model = Transformer.from_folder(os.path.expanduser("~")+"/mistral_7b_instruct_v3")
# generate results
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=mistral_tokenizer.instruct_tokenizer.tokenizer.eos_id)
# decode generated tokens
result = mistral_tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)

## Function calling

Mistral 7B Instruct v0.3 also supports function calling!

Let's start by creating a chat completion request that declares a tool the model may call:

In [None]:
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the user's location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

Since we have already loaded the tokenizer and the model in the example above, we will skip these steps here.

Now we can encode the request with the `MistralTokenizer` instance created earlier.

In [None]:
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokens = mistral_tokenizer.encode_chat_completion(completion_request).tokens

Then run `generate` to get a response. Don't forget to pass the EOS id!

In [None]:
from mistral_inference.generate import generate

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=mistral_tokenizer.instruct_tokenizer.tokenizer.eos_id)
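
To see why the EOS id matters, here is a toy greedy loop. It is not the implementation used by `mistral_inference`, only an illustration of the two stopping conditions that `max_tokens` and `eos_id` express. The names `toy_generate` and `fake_model` and all token ids are made up for this sketch.

```python
from typing import Callable, List


def toy_generate(next_token: Callable[[List[int]], int], prompt: List[int],
                 max_tokens: int, eos_id: int) -> List[int]:
    """Append tokens until `eos_id` is produced or `max_tokens` is reached."""
    generated: List[int] = []
    for _ in range(max_tokens):
        token = next_token(prompt + generated)
        if token == eos_id:
            break  # the model signalled that its answer is finished
        generated.append(token)
    return generated


def fake_model(context: List[int]) -> int:
    """Hypothetical stand-in model: emits 10, 11, 12, then id 2 forever."""
    script = [10, 11, 12, 2]
    produced = len(context) - 1  # the prompt below has exactly one token
    return script[min(produced, len(script) - 1)]


print(toy_generate(fake_model, [1], max_tokens=64, eos_id=2))  # [10, 11, 12]
print(toy_generate(fake_model, [1], max_tokens=2, eos_id=2))   # [10, 11]
```

In this toy setting, a stop id that never matches makes the loop run for the full `max_tokens` budget, which wastes compute and pads the output with tokens produced after the answer was complete.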

Finally, we can decode the generated tokens.

In [None]:
# out_tokens holds one token list per prompt, so decode the first list
result = mistral_tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
result
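
The decoded text only describes the call; your own code still has to run it. Below is a minimal sketch of that step, assuming the decoded text is a JSON list of objects with `name` and `arguments` keys, optionally preceded by a `[TOOL_CALLS]` marker. That output shape is an assumption, so inspect `result` before relying on it. `parse_tool_calls`, `AVAILABLE_TOOLS`, and the stub `get_current_weather` are hypothetical names, and `sample` is hand-written rather than real model output.

```python
import json

TOOL_CALLS_PREFIX = "[TOOL_CALLS]"  # assumption: may or may not appear in the text


def parse_tool_calls(text):
    """Parse decoded model output into a list of (name, arguments) pairs."""
    text = text.strip()
    if text.startswith(TOOL_CALLS_PREFIX):
        text = text[len(TOOL_CALLS_PREFIX):].strip()
    calls = json.loads(text)
    return [(call["name"], call["arguments"]) for call in calls]


def get_current_weather(location, format):
    """Hypothetical stub; a real version would query a weather service."""
    return f"(stub) weather for {location} in {format}"


# Map tool names declared in the request to local Python callables.
AVAILABLE_TOOLS = {"get_current_weather": get_current_weather}

# Hand-written sample in the assumed format, not actual model output.
sample = ('[{"name": "get_current_weather", '
          '"arguments": {"location": "Paris, France", "format": "celsius"}}]')

for name, arguments in parse_tool_calls(sample):
    print(AVAILABLE_TOOLS[name](**arguments))
```

Looking the name up in an explicit dictionary, rather than evaluating it, keeps the model from invoking anything you did not deliberately expose.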