<img src="images/ragna-logo.png" width=15% align="right"/>

# Use a Local LLM with Ragna

<hr>

## Create a new Ragna assistant

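The assistant is defined in `local_llm.py`. Two methods do most of the work:

- `make_prompt` builds a single prompt string in Mistral's instruction format. The first `[INST]` turn carries the instructions and every retrieved source wrapped in `<doc>` and `</doc>`, the assistant turn is pre-filled with `OK`, and the second `[INST]` turn carries the user's prompt.
- `answer` tokenizes that string, starts a stream on the generator, and yields text chunks one at a time until the model signals end-of-sequence or `max_new_tokens` (256 by default) chunks have been produced.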


<details>
<summary> <b>Expand to read <code>local_llm.py</code> → </b></summary>

```python
from pathlib import Path
from typing import Iterator

from ragna.core import Assistant, PackageRequirement, Source


class Mistral7BInstruct(Assistant):
    @classmethod
    def display_name(cls):
        return "turboderp/Mistral-7B-v0.2-exl2"

    @classmethod
    def requirements(cls):
        return [
            PackageRequirement("exllamav2"),
            PackageRequirement("torch"),
        ]

    @classmethod
    def is_available(cls):
        requirements_available = super().is_available()
        if not requirements_available:
            return False

        import torch

        return torch.cuda.is_available()

    def __init__(self):
        super().__init__()
        from exllamav2 import (
            ExLlamaV2,
            ExLlamaV2Cache,
            ExLlamaV2Config,
            ExLlamaV2Tokenizer,
        )
        from exllamav2.generator import ExLlamaV2Sampler, ExLlamaV2StreamingGenerator

        config = ExLlamaV2Config()
        config.model_dir = str(Path.home() / "shared/analyst/models" / self.display_name())
        config.prepare()

        self.tokenizer = ExLlamaV2Tokenizer(config)

        model = ExLlamaV2(config)
        cache = ExLlamaV2Cache(model, lazy=True)
        model.load_autosplit(cache)
        self.generator = ExLlamaV2StreamingGenerator(model, cache, self.tokenizer)
        self.generator.set_stop_conditions({self.tokenizer.eos_token})

        self.settings = ExLlamaV2Sampler.Settings()
        self.settings.temperature = 0.0

    def make_prompt(self, prompt: str, sources: list[Source]) -> str:
        # Mistral instruct format: the first [INST] turn carries the instructions
        # and the sources, the assistant turn is pre-filled with "OK", and the
        # second [INST] turn carries the user's prompt.
        return "".join(
            [
                "<s>[INST] ",
                "You are a helpful assistant that answers prompts by only using the documents listed below. ",
                "Each individual document starts with <doc> and ends with </doc>. ",
                "If you can't answer a question based on the sources you are given, just say so. Do not make up information. ",
                *[f"<doc> {source.content} </doc>" for source in sources],
                " Reply with OK if you have understood these instructions.",
                f" [/INST]OK</s>[INST] {prompt} [/INST]",
            ]
        )

    def answer(
        self, prompt: str, sources: list[Source], *, max_new_tokens: int = 256
    ) -> Iterator[str]:
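        # Tokenize the full prompt. make_prompt already writes the <s> marker,
        # so the tokenizer is told not to add a BOS token of its own.
        input_ids = self.tokenizer.encode(
            self.make_prompt(prompt, sources), add_bos=False
        )

        # Start a streaming generation with the sampler settings from __init__.
        self.generator.begin_stream_ex(input_ids, self.settings)

        # Yield at most max_new_tokens chunks, stopping early on end-of-sequence.
        for _ in range(max_new_tokens):
            result = self.generator.stream_ex()
            if result["eos"]:
                break
            yield result["chunk"]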
```

</details>
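
To see what string `make_prompt` hands to the tokenizer, the sketch below mirrors its layout with plain strings in place of `Source` objects. `build_prompt` and the shortened `instructions` text are hypothetical stand-ins for illustration, not part of Ragna or of `local_llm.py`.

```python
def build_prompt(
    prompt: str,
    documents: list[str],
    instructions: str = "Answer only from the documents below.",
) -> str:
    """Mirror the turn layout of make_prompt (hypothetical helper)."""
    # Each document is wrapped in <doc> ... </doc>, as in make_prompt.
    docs = "".join(f"<doc> {content} </doc>" for content in documents)
    return (
        # Turn 1: instructions plus all documents.
        f"<s>[INST] {instructions} {docs}"
        " Reply with OK if you have understood these instructions. [/INST]"
        # Pre-filled assistant reply, then turn 2 with the actual question.
        f"OK</s>[INST] {prompt} [/INST]"
    )


example = build_prompt(
    "What is the Developer in Residence program?",
    ["first chunk of text", "second chunk of text"],
)
print(example)
```

The string starts with an explicit `<s>` marker, which is presumably why `answer` calls the tokenizer with `add_bos=False`.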

## Use the assistant

```python
from local_llm import Mistral7BInstruct
```

```python
Mistral7BInstruct.display_name()
```

```
'turboderp/Mistral-7B-v0.2-exl2'
```

```python
Mistral7BInstruct.is_available()
```

```
False
```
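
A bare `False` from `is_available()` does not say whether a package is missing or CUDA is not visible. The sketch below separates the two checks. `diagnose_availability` is a hypothetical helper, not part of Ragna, and it assumes the two package names listed in `requirements()`.

```python
from importlib.util import find_spec


def diagnose_availability(packages=("exllamav2", "torch")) -> dict[str, bool]:
    """Report which packages are importable and whether CUDA is visible."""
    # find_spec locates a top-level package without importing it.
    report = {name: find_spec(name) is not None for name in packages}
    if report.get("torch"):
        import torch

        report["cuda"] = torch.cuda.is_available()
    else:
        # Without torch there is no way to query CUDA from Python here.
        report["cuda"] = False
    return report


print(diagnose_availability())
```

Any `False` entry in the printed dictionary points at the condition that makes `is_available()` fail.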

```python
assistant = Mistral7BInstruct()
```

Top-level `await` works in the cells below because they run in a Jupyter/IPython session.

```python
from ragna import Rag, source_storages

documents = [
    "files/psf-report-2022.pdf",
]

chat = Rag().chat(
    documents=documents,
    source_storage=source_storages.Chroma,
    assistant=assistant,
)

await chat.prepare()
```

```
Message(content=How can I help you with the documents?, role=MessageRole.SYSTEM, sources=[])
```

```python
message = await chat.answer("What is the Developer in Residence program?", stream=True)

async for chunk in message:
    print(chunk, end="")
```

The Developer in Residence program is a new initiative to help the Python community grow and thrive. The program will provide funding for developers to work on projects that benefit the Python community. The program will be open to all developers, regardless of their experience level or background. The program will be open to all developers, regardless of their experience level or background. The program will be open to all developers, regardless of their experience level or background. The program will be open to all developers, regardless of their experience level or background. The program will be open to all developers, regardless of their experience level or background. The program will be open to all developers, regardless of their experience level or background. The program will be open to all developers, regardless of their experience level or background. The program will be open to all developers, regardless of their experience level or background. The program will be open to 
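
The answer above stops mid-sentence, which is consistent with the `max_new_tokens` cap in `answer` being reached before the model emitted an end-of-sequence token. The sketch below reproduces the loop from `answer` against a fake generator so the cap can be observed without a GPU. `FakeGenerator` and `stream_answer` are hypothetical stand-ins, not part of exllamav2 or Ragna.

```python
from typing import Iterator


class FakeGenerator:
    """Stand-in for the streaming generator: returns fixed chunks, then EOS."""

    def __init__(self, chunks: list[str]):
        self._chunks = list(chunks)

    def stream_ex(self) -> dict:
        if self._chunks:
            return {"chunk": self._chunks.pop(0), "eos": False}
        return {"chunk": "", "eos": True}


def stream_answer(generator, max_new_tokens: int = 256) -> Iterator[str]:
    # Same control flow as Mistral7BInstruct.answer: stop on EOS or at the cap.
    for _ in range(max_new_tokens):
        result = generator.stream_ex()
        if result["eos"]:
            break
        yield result["chunk"]


chunks = ["The ", "program ", "is ", "new."]
full = "".join(stream_answer(FakeGenerator(chunks)))
capped = "".join(stream_answer(FakeGenerator(chunks), max_new_tokens=2))
print(repr(full))    # 'The program is new.'
print(repr(capped))  # 'The program '
```

With the cap set to 2, the stream ends after two chunks even though the generator has more to give, which is the same kind of truncation seen in the answer above.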

```python
for source in message.sources:
    print(source.content)
```

```
 1,281K
Grants (9.7%)   |   $ 215K
Packaging Work Group/Infrastructure/Other (26.6%)   |    $ 589K
Fiscal Sponsorees (5.2%)   |    $ 115K
Community Awards & Expenses (0.6%)   |    $ 14K
Code of Conduct (0.1%)   |    $ 1K
Total Program Service Expenses: $2,215K
16
Growth of Assets
Grant Disbursement
PSF Asset Trends from 2017-2022
PSF Grant Disbursement from 2016-2022
($ in thousands)
$5,764
$6,000
$4,903
$4,710
$5,000
$4,134
Python Conference Grants (52.9%)
PyLadies Workshops (1.1%)
$3,531
Outreach & Education (5.7%)
$3,300
$4,000
Other Grants (2.6%)
Meetup Subscription Grant (4.0%)
Kids Coding Camp (1.6%)
$3,000
Equipment & Hardware (0.4%)
Django Girls Workshops (11.8%)
Development (6.0%)
$2,000
Ambassador Program (1.7%)
Workshops (3.4%)
Training (2.7%)
Python Sprints (6.2%)
$1,000
$0
2017
2018
2020
2021
2022
2019
17
Grants by Continents
In 2022, the grants program focused on virtual events, as well as in-person events. 
The PSF distributed $215k in grants in 2022, to 138 groups in 42
```

<hr>

**✨ Next: TODO →**

<hr>