# Retrieval augmented chat - Mistral-7B

This notebook is the primary demonstration of the project using the baseline model, which has not been instruction tuned. Here we bring up the baseline model and the vector database, then ask questions both with and without retrieval from the vector database.

In [1]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

## Initialize some variables

In [2]:
collection_name = "Corpus"
model_dir = "mistralai/Mistral-7B-v0.1"
device_map = {"": 0}
device = "cuda"
database_top_n_results = 2

## Load shared code
The `shared_code.ipynb` notebook defines the `ChatModel` and `RetrievalAugmentedChat` classes used below.

In [3]:
%run shared_code.ipynb

## Load the model

In [4]:
language_model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)

tokenizer = AutoTokenizer.from_pretrained(model_dir)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"


## Load the model into the ChatModel class from `shared_code.ipynb`

In [5]:
# We need a different instruction separator because the untuned
# model formats the output incorrectly.
chat_model = ChatModel(language_model, tokenizer, "[/INST]")
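The separator matters because a causal language model's generated sequence contains the prompt followed by the continuation, so a wrapper has to cut the reply out of the decoded text. How `ChatModel` does this is defined in `shared_code.ipynb` and is not shown here. The following is only a minimal sketch of the idea, where `extract_reply` is a hypothetical helper name and the sample string is made up for illustration.

```python
def extract_reply(decoded_text: str, separator: str = "[/INST]") -> str:
    """Return the text after the last occurrence of `separator`.

    Hypothetical helper for illustration; not the code in shared_code.ipynb.
    """
    # str.rpartition returns ("", "", decoded_text) when the separator is
    # absent, so `after` always holds the text we want to keep.
    _before, _found, after = decoded_text.rpartition(separator)
    return after.strip()


# Made-up decoded output: prompt, separator, then the model's continuation.
decoded = "[INST] How do I loop a GIF? [/INST] Use the loop option."
print(extract_reply(decoded))
```

Splitting on the last occurrence is a design choice in this sketch: if the prompt itself happens to contain the separator, only the text after the final one is treated as the reply.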

## Try the model without access to the vector database

In [6]:
chat_model.basic_chat("How do I loop a GIF?")



To stop a gif from looping, you would use the following code:

```
<img src="http://example.com/myimage.gif" width="123" height="456" loop="-1"/>
```

The “loop” attribute is set to “-1”, which will stop the gif from looping.


## Load the collection into the RetrievalAugmentedChat class from `shared_code.ipynb`

In [7]:
rac = RetrievalAugmentedChat("db/", collection_name, database_top_n_results, chat_model)
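Conceptually, retrieval augmented chat looks up the documents closest to the question and places their text in the prompt ahead of the question. The actual `RetrievalAugmentedChat` implementation lives in `shared_code.ipynb`. The sketch below is an assumption-laden illustration in plain Python: the `Hit` structure, the `build_augmented_prompt` helper, the prompt template, the placeholder document texts, and the third path are all hypothetical. It keeps the `top_n` hits with the smallest distance, mirroring `database_top_n_results = 2`.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One retrieved document (hypothetical structure for illustration)."""
    path: str
    text: str
    distance: float  # smaller means closer to the query


def build_augmented_prompt(question, hits, top_n=2):
    """Keep the `top_n` nearest hits and place their text ahead of the question."""
    nearest = sorted(hits, key=lambda hit: hit.distance)[:top_n]
    context = "\n\n".join(hit.text for hit in nearest)
    # Assumed prompt template; the real one is defined in shared_code.ipynb.
    prompt = (
        "[INST] Answer the question using the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question} [/INST]"
    )
    return prompt, nearest


# Placeholder texts stand in for the real corpus files.
hits = [
    Hit("corpus/imagemagick/compress-animated-gif.md", "<document text B>", 1.28),
    Hit("corpus/imagemagick/set-a-gif-to-loop.md", "<document text A>", 0.97),
    Hit("corpus/placeholder/unrelated.md", "<document text C>", 1.90),  # made-up path
]
prompt, nearest = build_augmented_prompt("How do I loop a GIF?", hits, top_n=2)
for hit in nearest:
    print(f"{hit.path} distance: {hit.distance:.2f}")
```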

## Run retrieval augmented chat
Notice that the responses are empty. Across several runs, the model either stops generating almost immediately or returns a verbatim copy of the input without summarizing it.

In [14]:
rac.markdown_chat("How do I loop a GIF?")



 **Reference documents:** 

* [corpus/imagemagick/set-a-gif-to-loop.md](corpus/imagemagick/set-a-gif-to-loop.md) distance: 0.97
* [corpus/imagemagick/compress-animated-gif.md](corpus/imagemagick/compress-animated-gif.md) distance: 1.28

**Inference time in seconds 0.2102**


In [15]:
rac.markdown_chat("Can I use npx with GitHub actions?")



 **Reference documents:** 

* [corpus/github-actions/npm-cache-with-npx-no-package.md](corpus/github-actions/npm-cache-with-npx-no-package.md) distance: 0.88
* [corpus/github-actions/attach-generated-file-to-release.md](corpus/github-actions/attach-generated-file-to-release.md) distance: 1.11

**Inference time in seconds 0.1733**
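
When running many questions, the two failure modes described in this section (an empty reply, or a verbatim copy of the input) can be flagged programmatically. The following is a minimal sketch and is not part of `shared_code.ipynb`; `classify_response` is a hypothetical helper, and it counts a reply as an echo only when the whole reply appears verbatim in the prompt.

```python
def classify_response(prompt: str, response: str) -> str:
    """Label a reply as "empty", "echo", or "answer" (hypothetical helper)."""
    reply = response.strip()
    if not reply:
        return "empty"
    # A reply that appears verbatim inside the prompt adds no new text.
    if reply in prompt:
        return "echo"
    return "answer"


# Made-up prompt and replies for illustration.
prompt = "Context:\n<document text A>\n\nQuestion: How do I loop a GIF?"
for reply in ["", "<document text A>", "Pass a loop option when saving."]:
    print(repr(reply), "->", classify_response(prompt, reply))
```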
