# Show-Us-Your-RAG-Skillz
## Alessandro Corvi

In order to develop a fast and reliable RAG (retrieval-augmented generation) system that impersonates a fast-food cashier, some design choices had to be made. The most impactful one is the use of the LLM framework `haystack`, which simplifies the whole process and yields a more scalable product.

In [1]:
%%capture
! pip install farm-haystack[inference]

The choice of model also has a great impact on the system's performance and accuracy. Mixtral-8x7B-Instruct-v0.1 is used here because of its strong benchmark scores: Mistral AI reports that it matches or outperforms considerably larger models such as Llama 2 70B.

To access the model, a Hugging Face access token is required.
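Rather than typing the token on every run, it can also be read from an environment variable, falling back to an interactive prompt as the notebook does with `getpass`. A minimal sketch, assuming the token has been exported under the hypothetical variable name `HF_TOKEN` (in Colab it could equally come from the Secrets panel):

```python
import os
from getpass import getpass


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return a Hugging Face token from the environment, or ask for it interactively."""
    # Assumption: the token was exported beforehand, e.g. `export HF_TOKEN=hf_...`
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fall back to a hidden interactive prompt
        token = getpass("Please input your HF access token: ").strip()
    return token
```

Calling `HF_TOKEN = get_hf_token()` would then replace the interactive cell whenever the variable is set.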

In [2]:
from haystack.nodes import PromptNode
from getpass import getpass

In [3]:
HF_TOKEN = getpass("Please input your HF access token: ")

Please input your HF access token: ··········


In [4]:
pn = PromptNode(model_name_or_path="mistralai/Mixtral-8x7B-Instruct-v0.1",
                max_length=800,
                api_key=HF_TOKEN)

## Load and prepare data

After loading `menu.json`, some preprocessing is needed. When parsed, the JSON structure becomes a deeply nested dict which, if passed to the LLM untouched, would result in very poor accuracy.
To avoid this, the JSON hierarchy is flattened: every menu item is turned into its own short text description and stored as a separate `Document`. Each description is then converted into a numerical vector with the Sentence-BERT embedding model `deepset/sentence_bert`, so that the retriever can select the items most semantically similar to a query and pass only those to the LLM as context.
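To make the retrieval idea concrete, the sketch below mimics what the embedding retriever does: each flattened menu line becomes a vector and the query is matched against them by cosine similarity. It is only an illustration under loud assumptions: a simple bag-of-words count vector stands in for the Sentence-BERT embeddings, and the three menu lines are hypothetical, not taken from `menu.json`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


# Hypothetical flattened menu lines, one per item (one Document each in the notebook)
docs = [
    "Colonel Burger: Grilled chicken burger, Price: 5.99, calories: 540",
    "Small Fries: Salted potato fries, Price: 1.99, calories: 230",
    "Cola: Soft drink, Price: 1.49, calories: 150",
]
print(retrieve("How many calories does the Colonel Burger have?", docs, top_k=1))
```

Word counts only match exact tokens; real Sentence-BERT vectors also capture related wording, which is why an embedding model is used in the notebook.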

#### Note:
Even though the vector database created here cannot be maintained directly from this notebook, it can be interacted with via the Streamlit application.

In [5]:
import json
from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, PromptTemplate

# Load the raw, nested menu
with open('menu.json', 'r') as file:
    menu_data = json.load(file)

def create_description(item_name, details):
    """Flatten one menu entry into a single line of text."""
    description = f"{item_name}: "
    if isinstance(details, list):
        # Single item: [description, price, {"nutritionalInfo": {...}, "available": ...}]
        description += f"{details[0]}, Price: {details[1]}, "
        if len(details) > 2 and isinstance(details[2], dict):
            for key, value in details[2].get("nutritionalInfo", {}).items():
                description += f"{key}: {value}, "
            description += f"Available: {details[2]['available']}"
    elif isinstance(details, dict):
        # Meal/bundle: {"name": ..., "price": ..., "contents": [...]}
        description += f"Name: {details.get('name', '')}, Price: {details.get('price', '')}"
        if 'contents' in details:
            description += ", Contents: ["
            for item in details['contents']:
                if isinstance(item, list):
                    description += f"({item[0]}, {item[1]}), "
                elif isinstance(item, dict):
                    description += f"({item.get('from', '')}, Size: {item.get('size', '')}), "
            description = description.rstrip(", ")
            description += "]"
    return description

# One Document per menu item, so that each item can be retrieved on its own
documents = []
for category, items in menu_data.items():
    for item_name, details in items.items():
        description = create_description(item_name, details)
        documents.append(Document(content=description))

# Keep the documents in memory and embed them with Sentence-BERT
document_store = InMemoryDocumentStore()
document_store.write_documents(documents)

retriever = EmbeddingRetriever(document_store=document_store,
                               embedding_model="deepset/sentence_bert",
                               progress_bar=False)

document_store.update_embeddings(retriever)


Next, the prompt template, the LLM node and the pipeline connecting it to the retriever are created and customised.

In [6]:
qa_template = PromptTemplate(prompt=
  """<s>[INST] You are working as a drive-in cashier at a fast-food restaurant, so act like one: be accommodating and pay attention to what you are asked. Using only the information contained in the context, answer the question. If the question is not related to fast food, remind the customer of your role.
  If the answer cannot be deduced from the context, answer \"I don't know.\"
  Context: {join(documents)};
  Question: {query}
  [/INST]""")
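Conceptually, at query time the retrieved documents are joined into one context string and substituted, together with the question, into the template. The sketch below reproduces that substitution with plain `str.format`; it is not Haystack's own template engine, and both the newline delimiter and the shortened instruction text are assumptions made for illustration.

```python
TEMPLATE = (
    "<s>[INST] You are working as a drive-in cashier at a fast-food restaurant. "
    "Using only the information contained in the context, answer the question.\n"
    "If the answer cannot be deduced from the context, answer \"I don't know.\"\n"
    "Context: {context};\n"
    "Question: {query}\n"
    "[/INST]"
)


def render_prompt(documents: list[str], query: str, delimiter: str = "\n") -> str:
    """Join the documents into one context string and fill the template slots."""
    context = delimiter.join(doc.strip() for doc in documents)
    return TEMPLATE.format(context=context, query=query)


prompt = render_prompt(
    ["Colonel Burger: Grilled chicken burger, Price: 5.99"],  # hypothetical retrieved document
    "How many calories does the Colonel Burger have?",
)
print(prompt)
```

Printing the rendered prompt like this is a cheap way to check what the model actually receives before spending API calls.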

In [7]:
prompt_node = PromptNode(model_name_or_path="mistralai/Mixtral-8x7B-Instruct-v0.1",
                         api_key=HF_TOKEN,
                         default_prompt_template=qa_template,
                         max_length=5500,
                         model_kwargs={"model_max_length":8000})
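The two length settings interact: `max_length` reserves tokens for the generated answer, while `model_max_length` caps prompt plus answer together. As far as we understand Haystack v1's invocation layers, a prompt that does not fit in the remaining budget is truncated with a warning, so here only 8000 - 5500 = 2500 tokens are left for the instructions, the retrieved menu lines and the question. A minimal sketch of that arithmetic (the helper names are hypothetical, and real token counts would come from the Mixtral tokenizer):

```python
def prompt_budget(model_max_length: int, max_length: int) -> int:
    """Tokens left for the prompt once `max_length` tokens are reserved for the answer."""
    budget = model_max_length - max_length
    if budget <= 0:
        raise ValueError("max_length leaves no room for the prompt")
    return budget


def fits(prompt_tokens: int, model_max_length: int, max_length: int) -> bool:
    """True if a prompt of `prompt_tokens` tokens fits without truncation."""
    return prompt_tokens <= prompt_budget(model_max_length, max_length)


print(prompt_budget(model_max_length=8000, max_length=5500))  # 2500
```

Lowering `max_length` is the simplest way to leave more room for retrieved context if prompts start getting truncated.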

In [8]:
from haystack import Pipeline

rag_pipeline = Pipeline()
rag_pipeline.add_node(component=retriever, name="retriever", inputs=["Query"])
rag_pipeline.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])
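The pipeline wires `Query → retriever → prompt_node`: each node receives the output of the node named in its `inputs`. The plain-Python sketch below mirrors that flow with stub components; `TinyPipeline`, `stub_retriever` and `stub_llm` are hypothetical stand-ins, not Haystack classes, and unlike Haystack's pipeline this one is strictly linear.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TinyPipeline:
    """Hypothetical linear pipeline: each node sees the state produced so far."""
    nodes: list[tuple[str, Callable[[dict], dict]]] = field(default_factory=list)

    def add_node(self, name: str, component: Callable[[dict], dict]) -> None:
        self.nodes.append((name, component))

    def run(self, query: str) -> dict:
        state = {"query": query}
        for _name, component in self.nodes:
            state = {**state, **component(state)}  # merge node output into the running state
        return state


def stub_retriever(state: dict) -> dict:
    # Stand-in for EmbeddingRetriever: return a fixed "relevant" document
    return {"documents": ["Colonel Burger: Price: 5.99, calories: 540"]}


def stub_llm(state: dict) -> dict:
    # Stand-in for PromptNode: report how much context the model would have seen
    return {"results": [f" Using {len(state['documents'])} document(s) to answer: {state['query']} "]}


pipe = TinyPipeline()
pipe.add_node("retriever", stub_retriever)
pipe.add_node("prompt_node", stub_llm)
out = pipe.run("How many calories does the Colonel Burger have?")
print(out["results"][0].strip())
```

Stubs like these make it possible to test the plumbing (and helpers such as `print_answer`) without calling the hosted model.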

In [9]:
from pprint import pprint
print_answer = lambda out: pprint(out["results"][0].strip())

### Here queries can be run:

In [13]:
import time

# perf_counter is the recommended clock for measuring short intervals
start_time = time.perf_counter()
answer = rag_pipeline.run(query="How many calories does the Colonel have?")
end_time = time.perf_counter()

print_answer(answer)
total_time = end_time - start_time
print("Execution time: " + str(total_time*1000) + "ms")
# Note: execution time fluctuates right after the application has been launched.

("I'm sorry, I need a bit more information to answer your question. The menu "
 'items listed have calorie information, but "Colonel" could refer to multiple '
 'items such as the Colonel Stacker, Colonel Burger, or even a meal that '
 'includes one of those items. Could you please clarify which "Colonel" item '
 "you're interested in? That way, I can provide a more accurate answer. Thank "
 'you!')
Execution time: 351.99642181396484ms
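Because the first calls after launch are slower, a single timing like the one above can be misleading. A minimal sketch that discards warm-up runs and reports the median latency, using `time.perf_counter` and a hypothetical `fake_query` stand-in for `lambda: rag_pipeline.run(query=...)`:

```python
import statistics
import time
from typing import Callable


def median_latency_ms(run_query: Callable[[], object], warmup: int = 1, repeats: int = 5) -> float:
    """Median wall-clock latency in milliseconds, ignoring the first `warmup` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run_query()  # warm-up calls absorb one-off start-up costs
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def fake_query() -> None:
    # Hypothetical stand-in for a pipeline call
    time.sleep(0.01)


print(f"Median execution time: {median_latency_ms(fake_query):.1f} ms")
```

The median is less sensitive than the mean to the occasional slow network round trip to the hosted model.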
