<img src="images/ragna-logo.png" width="200px" align="right"/>

# Use Local LLM with Ragna

<hr>

## Create a new Ragna assistant

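The `Mistral7BInstruct` class in `local_llm.py` subclasses Ragna's `Assistant`. `display_name`, `requirements`, and `is_available` tell Ragna what the assistant is called and whether it can run: it needs `torch`, `exllamav2`, and a CUDA-capable GPU. `__init__` loads the quantized model weights from `~/shared/analyst/models/turboderp/Mistral-7B-v0.2-exl2` and sets up a streaming generator with the sampler temperature set to `0.0`. The two methods that do the actual work are:

- `make_prompt` builds a single prompt string in Mistral's `[INST] ... [/INST]` instruction format. It tells the model to answer only from the retrieved sources, wraps each source's content in `<doc>` and `</doc>` tags, and primes the conversation with an `OK` reply from the model before appending the user's actual question.
- `answer` tokenizes that prompt, starts a streaming generation, and yields the generated text chunk by chunk until the model emits its end-of-sequence token or `max_new_tokens` chunks have been produced.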


<details>
<summary> <b>Expand to read <code>local_llm.py</code> 👇🏼 </b></summary>

```python
from pathlib import Path
from typing import Iterator

from ragna.core import Assistant, PackageRequirement, Source


class Mistral7BInstruct(Assistant):
    @classmethod
    def display_name(cls):
        return "turboderp/Mistral-7B-v0.2-exl2"

    @classmethod
    def requirements(cls):
        return [
            PackageRequirement("torch"),
            PackageRequirement("exllamav2"),
        ]

    @classmethod
    def is_available(cls):
        requirements_available = super().is_available()
        if not requirements_available:
            return False

        import torch

        return torch.cuda.is_available()

    def __init__(self):
        super().__init__()
        from exllamav2 import (
            ExLlamaV2,
            ExLlamaV2Cache,
            ExLlamaV2Config,
            ExLlamaV2Tokenizer,
        )
        from exllamav2.generator import ExLlamaV2Sampler, ExLlamaV2StreamingGenerator

        config = ExLlamaV2Config()
        config.model_dir = str(Path.home() / "shared/analyst/models" / self.display_name())
        config.prepare()

        self.tokenizer = ExLlamaV2Tokenizer(config)

        model = ExLlamaV2(config)
        cache = ExLlamaV2Cache(model, lazy=True)
        model.load_autosplit(cache)
        self.generator = ExLlamaV2StreamingGenerator(model, cache, self.tokenizer)
        self.generator.set_stop_conditions({self.tokenizer.eos_token})

        self.settings = ExLlamaV2Sampler.Settings()
        self.settings.temperature = 0.0

    def make_prompt(self, prompt: str, sources: list[Source]) -> str:
        # Mistral instruction format: instructions and sources in the first [INST]
        # block, a primed "OK" reply, then the user's question in a second [INST] block.
        return "".join(
            [
                "<s>[INST] ",
                "You are a helpful assistant that answers prompts by only using the documents listed below. ",
                "Each individual document starts with <doc> and ends with </doc>. ",
                "If you can't answer a question based on the sources you are given, just say so. Do not make up information. ",
                # Trailing space keeps consecutive documents and the next sentence separated.
                *[f"<doc> {source.content} </doc> " for source in sources],
                "Reply with OK if you have understood these instructions.",
                f" [/INST]OK</s>[INST] {prompt} [/INST]",
            ]
        )

    def answer(
        self, prompt: str, sources: list[Source], *, max_new_tokens: int = 256
    ) -> Iterator[str]:
        input_ids = self.tokenizer.encode(
            self.make_prompt(prompt, sources), add_bos=False
        )

        self.generator.begin_stream_ex(input_ids, self.settings)

        for _ in range(max_new_tokens):
            result = self.generator.stream_ex()
            if result["eos"]:
                break
            yield result["chunk"]
```

</details>
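To see the prompt layout that `make_prompt` produces without a GPU or the model weights, the following standalone sketch reproduces the same string-building logic. It is a hypothetical reimplementation for illustration only: `FakeSource` stands in for Ragna's `Source`, of which only the `content` attribute is used, and the document texts are placeholders.

```python
from dataclasses import dataclass


@dataclass
class FakeSource:
    """Hypothetical stand-in for ragna.core.Source; only `content` is needed here."""

    content: str


def make_prompt(prompt: str, sources: list[FakeSource]) -> str:
    """Build a Mistral-style [INST] prompt that embeds each source in <doc> tags."""
    return "".join(
        [
            "<s>[INST] ",
            "You are a helpful assistant that answers prompts by only using the documents listed below. ",
            "Each individual document starts with <doc> and ends with </doc>. ",
            "If you can't answer a question based on the sources you are given, just say so. Do not make up information. ",
            *[f"<doc> {source.content} </doc> " for source in sources],
            "Reply with OK if you have understood these instructions.",
            f" [/INST]OK</s>[INST] {prompt} [/INST]",
        ]
    )


sources = [FakeSource("First document text."), FakeSource("Second document text.")]
result = make_prompt("Who is the Python Developer in Residence?", sources)
print(result)
```

Printing the result makes it easy to check that each source is delimited and that the user's question ends up in its own `[INST]` block after the primed `OK`.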

## Use the assistant

You can directly import and start using Mistral 7B.

```python
from local_llm import Mistral7BInstruct
```

```python
Mistral7BInstruct.display_name()
```

```python
Mistral7BInstruct.is_available()
```
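`is_available()` returns `True` only if the required packages are installed and PyTorch can see a CUDA GPU. The sketch below mirrors that logic outside Ragna. It assumes that a `PackageRequirement` check roughly amounts to "can this package be found for import"; the `is_available` function here and its `packages` parameter are illustrative, not Ragna's API.

```python
import importlib.util


def is_available(packages: tuple[str, ...] = ("torch", "exllamav2")) -> bool:
    """Hypothetical sketch: True only if all packages are importable and a CUDA GPU is visible."""
    # Approximates PackageRequirement with an importability check (an assumption).
    if any(importlib.util.find_spec(name) is None for name in packages):
        return False

    import torch  # imported only once we know it is installed

    return torch.cuda.is_available()


print(is_available())
```

Checking the packages first means `import torch` is never attempted on a machine where it is missing, which is the same ordering the assistant's `is_available` uses.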

```python
assistant = Mistral7BInstruct()
```

Let's share the Python Software Foundation's (PSF) annual reports for 2021 and 2022 with the assistant and ask questions about the PSF.

```python
from ragna import Rag, source_storages

documents = [
    "files/psf-report-2021.pdf",
    "files/psf-report-2022.pdf",
]

chat = Rag().chat(
    documents=documents,
    source_storage=source_storages.Chroma,
    assistant=assistant,
)

await chat.prepare()
```

```python
message = await chat.answer("Who is the Python Developer in Residence?", stream=True)

async for chunk in message:
    print(chunk, end="")
```
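Each streamed chunk comes from the assistant's `answer` generator, which keeps requesting text until the generator reports end-of-sequence or `max_new_tokens` iterations have run. The sketch below replays that loop without a GPU, using a hypothetical `FakeStreamingGenerator` whose `stream_ex` returns dicts with the same `"chunk"` and `"eos"` keys that `answer` reads. It is an illustration of the loop's control flow, not a model of ExLlamaV2's internals.

```python
from typing import Iterator


class FakeStreamingGenerator:
    """Hypothetical stand-in returning dicts with "chunk" and "eos" keys."""

    def __init__(self, chunks: list[str]) -> None:
        self._chunks = iter(chunks)

    def stream_ex(self) -> dict:
        chunk = next(self._chunks, None)
        if chunk is None:
            # Out of text: report end-of-sequence with an empty chunk.
            return {"chunk": "", "eos": True}
        return {"chunk": chunk, "eos": False}


def answer(generator: FakeStreamingGenerator, *, max_new_tokens: int = 256) -> Iterator[str]:
    # Same loop shape as Mistral7BInstruct.answer: stop on EOS or once the budget is used up.
    for _ in range(max_new_tokens):
        result = generator.stream_ex()
        if result["eos"]:
            break
        yield result["chunk"]


full = "".join(answer(FakeStreamingGenerator(["The ", "answer ", "is ", "here."])))
truncated = "".join(answer(FakeStreamingGenerator(["a", "b", "c"]), max_new_tokens=2))
print(full)  # The answer is here.
print(truncated)  # ab
```

The second call shows why answers can be cut off mid-sentence: with a small `max_new_tokens`, the loop ends before the generator ever signals end-of-sequence.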

We can also inspect which sources were retrieved and passed to the assistant to generate the answer.

```python
for idx, source in enumerate(message.sources, 1):
    print(f"{idx}. {source.document.name}: {source.location}:\n")
    print(source.content)
    print("#" * 80)
```

<hr>

_❗️ **Warning:** Make sure to shut down the Jupyter kernel (in the JupyterLab menu bar, click "Kernel" -> "Shut Down Kernel") before proceeding._

<br>

**✨ Next: [RAG and LLM Experiments](04-UI-and-experiments.ipynb) →**

<hr>