# Mistral.rs Python API Cookbook

The Mistral.rs API is broken into two parts: loading and running. We provide several loader classes, each of which can create a `Runner`.
Let's look at an example of loading a Mistral GGUF model.

```python
from mistralrs import MistralLoader, QuantizedLoader, ChatCompletionRequest

loader = QuantizedLoader(
    MistralLoader,
    is_gguf=True,
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    no_kv_cache=False,
    repeat_last_n=64,
    quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
)
runner = loader.load()
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res)
```

Let's walk through this code.
```python
from mistralrs import MistralLoader, QuantizedLoader, ChatCompletionRequest
```

This imports the required classes for our example. The `QuantizedLoader` needs to know which model architecture to load, so we also import `MistralLoader`, which implements that functionality. `ChatCompletionRequest` is an OpenAI-API-compatible class for sending requests.

```python
loader = QuantizedLoader(
    MistralLoader,
    is_gguf=True,
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    no_kv_cache=False,
    repeat_last_n=64,
    quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
)
```

Here, we tell the `QuantizedLoader` to prepare to load a Mistral model (because we gave it a `MistralLoader`) using the model ID and other information that we specified. It is also important to specify whether the quantized model is GGUF. If it is not (i.e. it is GGML), pass `is_gguf=False`.
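
If the quantized file name is not known ahead of time, the `is_gguf` flag could be derived from it rather than hard-coded. The sketch below assumes that GGUF files carry a `.gguf` extension and that any other file is GGML. `infer_is_gguf` is a hypothetical helper written for illustration, not part of `mistralrs`.

```python
def infer_is_gguf(quantized_filename: str) -> bool:
    """Guess the value for `is_gguf` from a quantized file name.

    Assumption: GGUF files end in ".gguf"; any other name is treated as GGML.
    """
    return quantized_filename.lower().endswith(".gguf")


# The file name used in this cookbook ends in ".gguf".
print(infer_is_gguf("mistral-7b-instruct-v0.1.Q4_K_M.gguf"))  # True
```

The result can then be passed as `is_gguf=infer_is_gguf(quantized_filename)` when constructing the loader. If a file does not follow the naming convention, set the flag explicitly instead.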

```python
runner = loader.load()
```

This tells the `QuantizedLoader` to actually load the model. It will use a CUDA, Metal, or CPU device depending on which `features` you set during compilation; see the list of [supported accelerators](https://github.com/EricLBuehler/mistral.rs?tab=readme-ov-file#supported-accelerators).

```python
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res)
```

Now we actually send a request! We can specify the messages just as we would with the OpenAI API.
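
Because the messages are plain dictionaries with `role` and `content` keys, a multi-turn conversation can be assembled with ordinary Python before it is handed to `ChatCompletionRequest`. The following is a minimal sketch: `build_messages` is a hypothetical helper, and the `assistant` role name follows the OpenAI convention. Whether a given model handles multi-turn input well depends on its chat template, which this sketch does not check.

```python
from typing import Dict, List, Tuple


def build_messages(history: List[Tuple[str, str]], prompt: str) -> List[Dict[str, str]]:
    """Turn past (user, assistant) turns plus a new prompt into a message list."""
    messages: List[Dict[str, str]] = []
    for user_text, assistant_text in history:
        # Each past exchange becomes a user message followed by the assistant reply.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new prompt is always the final user message.
    messages.append({"role": "user", "content": prompt})
    return messages


messages = build_messages(
    history=[("What is Rust?", "Rust is a systems programming language.")],
    prompt="Tell me a story about the Rust type system.",
)
print(messages)
```

The resulting list can be passed as the `messages=` argument in place of the literal list used above.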

## Loading a Mistral + GGUF model

```python
from mistralrs import MistralLoader, QuantizedLoader

loader = QuantizedLoader(
    MistralLoader,
    is_gguf=True,
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    no_kv_cache=False,
    repeat_last_n=64,
    quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
)
runner = loader.load()
```

## Loading a plain Mistral model

```python
from mistralrs import MistralLoader, NormalLoader

loader = NormalLoader(
    MistralLoader,
    is_gguf=True,
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    no_kv_cache=False,
    repeat_last_n=64,
)
```

## Loading an X-LoRA Zephyr model

```python
from mistralrs import MistralLoader, XLoraLoader

loader = XLoraLoader(
    MistralLoader,
    model_id="HuggingFaceH4/zephyr-7b-beta",
    no_kv_cache=False,
    repeat_last_n=64,
    xlora_model_id="lamm-mit/x-lora",
    order_file="xlora-paper-ordering.json",
)
```

## Running the Runner

```python
from mistralrs import ChatCompletionRequest

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res)
```
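
The example above only prints `res`, and its exact type is not shown here. If `res` turns out to be a JSON string shaped like an OpenAI chat completion response (an assumption, not something this cookbook confirms), the generated text could be pulled out along these lines. `extract_reply` is a hypothetical helper and the sample payload is made up for illustration.

```python
import json


def extract_reply(res: str) -> str:
    """Return the first choice's message text.

    Assumption: `res` is a JSON string in the OpenAI chat completion shape,
    i.e. {"choices": [{"message": {"content": ...}}]}.
    """
    body = json.loads(res)
    return body["choices"][0]["message"]["content"]


# Made-up payload standing in for a real response.
sample = json.dumps(
    {
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "Once upon a time..."},
            }
        ]
    }
)
print(extract_reply(sample))
```

If the actual return value is an object rather than a string, inspect it first (for example with `type(res)`) and adapt the access pattern accordingly.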
