# fastRAG Overview

## Lexical Search

In [1]:
from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_gpu=False, use_bm25=True)

In [3]:
from haystack.nodes import BM25Retriever

retriever = BM25Retriever(document_store=document_store)
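
To make the lexical scoring idea concrete, here is a minimal, self-contained sketch of Okapi BM25 in plain Python. It is an illustration only, not the code used by `BM25Retriever`: the whitespace tokenizer, the `k1` and `b` defaults, and the toy documents are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for t in tokenized for term in set(t))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


toy_docs = [
    "python is a programming language created by guido van rossum",
    "arya stark is the daughter of eddard stark",
    "dry cleaning are an english post-punk band",
]
scores = bm25_scores("father of arya stark", toy_docs)
best = max(range(len(toy_docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Only documents that share terms with the query receive a non-zero score, which is why lexical search cannot match a query to a document that uses different wording.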

In [2]:
from haystack.schema import Document

# 3 example documents to index
examples = [
    "Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace.",
    "Princess Arya Stark is the third child and second daughter of Lord Eddard Stark and his wife, Lady Catelyn Stark. She is the sister of the incumbent Westerosi monarchs, Sansa, Queen in the North, and Brandon, King of the Andals and the First Men. After narrowly escaping the persecution of House Stark by House Lannister, Arya is trained as a Faceless Man at the House of Black and White in Braavos, using her abilities to avenge her family. Upon her return to Westeros, she exacts retribution for the Red Wedding by exterminating the Frey male line.",
    "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] In February 2021, the band shared details of their debut studio album, New Long Leg. They also shared the single 'Strong Feelings'.[9] The album, which was produced by John Parish, was released on 2 April 2021.[10]"
]
# examples = [
#     "There is a blue house on Oxford street",
#     "Paris is the capital of France",
#     "fastRAG had its first commit in 2022"
# ]

documents = []
for i, d in enumerate(examples):
    documents.append(Document(content=d, id=i))

document_store.write_documents(documents)

Updating BM25 representation...: 100%|██████████| 3/3 [00:00<00:00, 16152.65 docs/s]


In [10]:
res = retriever.retrieve("Who is the father of Arya Stark?")
[doc.content for doc in res]

['Princess Arya Stark is the third child and second daughter of Lord Eddard Stark and his wife, Lady Catelyn Stark. She is the sister of the incumbent Westerosi monarchs, Sansa, Queen in the North, and Brandon, King of the Andals and the First Men. After narrowly escaping the persecution of House Stark by House Lannister, Arya is trained as a Faceless Man at the House of Black and White in Braavos, using her abilities to avenge her family. Upon her return to Westeros, she exacts retribution for the Red Wedding by exterminating the Frey male line.',
 "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic o

## Enhancing the Pipeline with Re-ranker and Reader

We can add a state-of-the-art re-ranker, [ColBERT](https://github.com/IntelLabs/fastRAG/blob/main/models.md#colbert-v2-with-plaid-engine), which scores documents with token-wise max-similarity, and a [FiD](https://github.com/IntelLabs/fastRAG/blob/main/models.md#colbert-v2-with-plaid-engine) reader, which processes all retrieved documents in parallel and generates an answer.
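
The token-wise max-similarity idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the `ColBERTRanker` implementation: the 2-dimensional token embeddings below are hypothetical, hand-picked values, whereas a real model produces them with a trained encoder.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take its best-matching
    document token, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical 2-d token embeddings, chosen by hand for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]
doc_b = [[0.2, 0.1], [0.3, 0.2]]

ranked = sorted(
    {"doc_a": doc_a, "doc_b": doc_b}.items(),
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Because every query token is matched independently, a document scores well only if it contains a good match for each part of the query.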

In [5]:
from fastrag.readers import FiDReader
from fastrag.rankers import ColBERTRanker

retriever = BM25Retriever(document_store=document_store)
reranker = ColBERTRanker(checkpoint_path="Intel/ColBERT-NQ")
reader = FiDReader(model_name_or_path="Intel/fid_flan_t5_base_nq", num_beams=1, min_length=2, max_length=50, use_gpu=False)

[07/17/2023 09:03:23] {utils.py:130} INFO - Using devices: CPU - Number of GPUs: 0

[Jul 17, 09:03:25] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...


[07/17/2023 09:03:28] {utils.py:130} INFO - Using devices: CPU - Number of GPUs: 0
[07/17/2023 09:03:28] {FiD.py:108} INFO - tokenizer max length is:256


### Defining a Pipeline

In [6]:
from haystack import Pipeline

p = Pipeline()
p.add_node(component=retriever, name="Retriever", inputs=["Query"])
p.add_node(component=reranker, name="Reranker", inputs=["Retriever"])
p.add_node(component=reader, name="Reader", inputs=["Reranker"])
p.save_to_yaml('my_new_pipeline.yaml')
p.get_config()
# p = p.load_from_yaml('my_new_pipeline.yaml')

({'components': [{'name': 'InMemoryDocumentStore',
    'type': 'InMemoryDocumentStore',
    'params': {'use_gpu': False, 'use_bm25': True}},
   {'name': 'Retriever',
    'type': 'BM25Retriever',
    'params': {'document_store': 'InMemoryDocumentStore'}},
   {'name': 'Reranker',
    'type': 'ColBERTRanker',
    'params': {'checkpoint_path': 'Intel/ColBERT-NQ'}},
   {'name': 'Reader',
    'type': 'FiDReader',
    'params': {'model_name_or_path': 'Intel/fid_flan_t5_base_nq',
     'min_length': 2,
     'max_length': 50,
     'use_gpu': False}}],
  'pipelines': [{'name': 'query',
    'nodes': [{'name': 'Retriever', 'inputs': ['Query']},
     {'name': 'Reranker', 'inputs': ['Retriever']},
     {'name': 'Reader', 'inputs': ['Reranker']}]}],
  'version': '1.17.2'},
 None)
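
As a rough mental model of the `inputs` wiring, each node consumes what the previous node produced. The toy sketch below mimics that chaining with plain functions. It is not Haystack's `Pipeline` implementation; `run_chain` and the three `toy_*` functions are hypothetical stand-ins invented for this example.

```python
def run_chain(nodes, query):
    """Run nodes in order; each node reads the shared state and adds to it."""
    state = {"query": query}
    for _name, fn in nodes:
        state.update(fn(state))
    return state


def query_words(state):
    return set(state["query"].lower().rstrip("?").split())


# Hypothetical stand-ins for the Retriever, Reranker and Reader nodes.
def toy_retriever(state):
    corpus = [
        "Python was created by Guido van Rossum",
        "Paris is the capital of France",
    ]
    hits = [d for d in corpus if query_words(state) & set(d.lower().split())]
    return {"documents": hits}


def toy_reranker(state):
    # Order documents by word overlap with the query, highest first.
    words = query_words(state)
    docs = sorted(state["documents"], key=lambda d: -len(words & set(d.lower().split())))
    return {"documents": docs}


def toy_reader(state):
    # A real reader generates an answer; this stub returns the top document.
    return {"answers": state["documents"][:1]}


result = run_chain(
    [("Retriever", toy_retriever), ("Reranker", toy_reranker), ("Reader", toy_reader)],
    "Who created Python?",
)
print(result["answers"])
```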

Run queries using the pipeline:

In [11]:
res = p.run(query="Who created Python?")
print(res['answers'][0].answer)
res = p.run(query="Who is the father of Arya Stark?")
print(res['answers'][0].answer)



Gudo van Rossum




Eddard Stark


# Semantic Search using Qdrant Vector Store

This example uses the [Qdrant](https://qdrant.tech/) vector store with fastRAG, via the `qdrant_haystack` dependency and the `qdrant_client` Python connector. We assume you have a running server, which can be started locally with, e.g., `docker run -p 6333:6333 qdrant/qdrant`.

Two important settings are the dimension of the vectors and the HNSW parameters. Qdrant uses an HNSW index for faster search, which trades accuracy against latency. In general, higher parameter values mean better accuracy, at the cost of higher latency and larger RAM usage.

The parameters are specified when connecting to the server and creating a new index; they cannot be changed after the index has been created.

In [12]:
from fastrag.stores import QdrantDocumentStore
import numpy as np

In [13]:
dim = 384
index_name = "fastrag_demo_index"

## Creating a Qdrant Index

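We need to specify the location of the Qdrant service, the vector dimension, the index name, the similarity metric and, optionally, the HNSW configuration. The cell below passes `":memory:"` as the location, which `qdrant_client` treats as a local in-memory instance rather than a remote server.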

In [45]:
q = QdrantDocumentStore(":memory:",
                        embedding_dim=dim,
                        timeout=60,
                        index=index_name,
                        embedding_field="embedding",
                        hnsw_config={"m": 128, "ef_construct": 100},
                        similarity='dot_product',
                        recreate_index=True)

## Insertion and Searching of Documents

We'll create a few documents; each must have `id`, `content` and `embedding` keys, but can contain more data, such as text titles.

In [46]:
# docs = [{"id": 1, "content": "I like to go to the beach", "embedding": np.ones(dim)},
#         {"id": 2, "content": "Where is my hat?", "embedding": np.ones(dim) * 2},
#         {"id": 3, "content": "GPT4 is very nice", "embedding": np.ones(dim) * 3},]
docs = [{"id": 1, "content": examples[0], "embedding": np.ones(dim)},
        {"id": 2, "content": examples[1], "embedding": np.ones(dim) * 2},
        {"id": 3, "content": examples[2], "embedding": np.ones(dim) * 3},]

Write the documents to the index in batches; deduplication of documents is on by default.

In [47]:
q.write_documents(docs, index_name, batch_size=500)

500it [00:00, 127030.83it/s]         
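
For intuition, batched writing with id-based deduplication could look roughly like the sketch below. This is not the `QdrantDocumentStore` implementation: the dict-backed `store`, the helper names, and the policy of skipping duplicate ids (rather than overwriting them) are assumptions made for the example.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def write_in_batches(store, docs, batch_size=500):
    """Write docs into `store` (a dict keyed by id), one batch at a time.

    Assumed policy for this sketch: a document whose id is already
    present is skipped.
    """
    written = 0
    for batch in batched(docs, batch_size):
        for doc in batch:
            if doc["id"] in store:
                continue  # duplicate id: skip it
            store[doc["id"]] = doc
            written += 1
    return written


store = {}
toy_docs = [{"id": 1}, {"id": 2}, {"id": 1}]
print(write_in_batches(store, toy_docs, batch_size=2), len(store))
```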


In [48]:
q.get_document_count(index=index_name)

3

## Query by Embedding

We need to provide a vector and a `top_k` value. It is also possible to query by text search, which we won't show here.

In [38]:
q.query_by_embedding(np.ones(dim), top_k=1, index=index_name)

[<Document: {'content': "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] In February 202
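
The result follows from the `dot_product` similarity: the three placeholder embeddings differ only in magnitude, so the document with the largest vector (id 3) scores highest against a vector of ones. The sketch below reproduces that ranking with an exhaustive scan in plain Python. It is an illustration only, since Qdrant searches an HNSW index rather than scanning every vector, and `top_k_by_dot_product` is a hypothetical helper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_k_by_dot_product(query_vec, docs, top_k=1):
    """Return the `top_k` docs with the largest dot product against `query_vec`."""
    ranked = sorted(docs, key=lambda d: dot(query_vec, d["embedding"]), reverse=True)
    return ranked[:top_k]


dim = 384
toy_docs = [
    {"id": 1, "embedding": [1.0] * dim},
    {"id": 2, "embedding": [2.0] * dim},
    {"id": 3, "embedding": [3.0] * dim},
]
best = top_k_by_dot_product([1.0] * dim, toy_docs, top_k=1)
print(best[0]["id"])  # id 3: its dot product (3 * 384) is the largest
```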

## Adding an Embedder

In [39]:
from haystack.nodes import EmbeddingRetriever

embedder = EmbeddingRetriever(embedding_model="cross-encoder/ms-marco-MiniLM-L-12-v2", document_store=q)

[07/17/2023 09:37:45] {utils.py:130} INFO - Using devices: CPU - Number of GPUs: 0
[07/17/2023 09:37:46] {dense.py:1552} INFO - Init retriever using embeddings of model cross-encoder/ms-marco-MiniLM-L-12-v2
[07/17/2023 09:37:46] {utils.py:130} INFO - Using devices: CPU - Number of GPUs: 0
[07/17/2023 09:37:46] {language_model.py:900} INFO -  * LOADING MODEL: 'cross-encoder/ms-marco-MiniLM-L-12-v2' (Bert)
[07/17/2023 09:37:46] {language_model.py:977} INFO - Auto-detected model language: english
[07/17/2023 09:37:46] {language_model.py:913} INFO - Loaded 'cross-encoder/ms-marco-MiniLM-L-12-v2' (Bert model) from model hub.
[07/17/2023 09:37:46] {processor.py:1880} INFO - Initialized processor without tasks. Supply `metric` and `label_list` to the constructor for using the default task or add a custom task later via processor.add_task()
[07/17/2023 09:37:46] {utils.py:130} INFO - Using devices: CPU - Number of GPUs: 0


### Embed all the documents in the index with the Dense Model

In [49]:
q.update_embeddings(embedder)

[07/17/2023 09:38:40] {qdrant.py:358} INFO - Updating embeddings for 3 docs...
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.44 Batches/s]
Updating embeddings: 32 Docs [00:00, 76.43 Docs/s]           


In [54]:
embedder.retrieve("Band from England?")

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  7.56 Batches/s]


[<Document: {'content': "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] In February 202

### Define a pipeline

In [53]:
from haystack import Pipeline

p = Pipeline()
p.add_node(component=embedder, name="Retriever", inputs=["Query"])

p.run("Band from England?")

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  7.03 Batches/s]


{'documents': [<Document: {'content': "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] I

## Adding an LLM

### Define a Prompt Template

In [55]:
import torch
from haystack.nodes import PromptNode, PromptTemplate
prompt_template = PromptTemplate(name="qa",
                             prompt_text="Answer the question using the provided context. Your answer should be in your own words and be no longer than 2 or 3 sentences.\n\n### Instruction:\n {query}\n\n### Input:\n{join(documents)}\n\n### Response:",
                             output_parser={"type": "AnswerParser"}) 
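
To see what the template expands to, the sketch below fills the two placeholders with plain string replacement. It only mimics the effect and is not how `PromptTemplate` renders templates internally; `fill_template` is a hypothetical helper, the template text is shortened, and joining documents with a single space is an assumption.

```python
def fill_template(template, query, documents):
    """Substitute the query and the joined document texts into the template."""
    joined = " ".join(documents)  # assumed delimiter for this sketch
    return template.replace("{query}", query).replace("{join(documents)}", joined)


template = (
    "Answer the question using the provided context.\n\n"
    "### Instruction:\n {query}\n\n### Input:\n{join(documents)}\n\n### Response:"
)
prompt = fill_template(
    template,
    "Band from England?",
    ["Dry Cleaning are an English post-punk band."],
)
print(prompt)
```

The model then continues the text after `### Response:`, so the retrieved documents act as the context for the generated answer.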


### Load an LLM and attach the prompt template

In [56]:
prompt = PromptNode(model_name_or_path="MBZUAI/LaMini-Flan-T5-783M", default_prompt_template=prompt_template,
                    model_kwargs={"model_max_length": 2048, "torch_dtype": torch.bfloat16}, use_gpu=False)

[07/17/2023 09:39:23] {utils.py:130} INFO - Using devices: CPU - Number of GPUs: 0
[07/17/2023 09:39:23] {hugging_face.py:130} INFO - Using model input kwargs {'torch_dtype': torch.bfloat16} in HFLocalInvocationLayer


Other LLM backends can be supported through additional `InvocationLayer` implementations.

See: https://github.com/deepset-ai/haystack/tree/v1.17.2/haystack/nodes/prompt/invocation_layer

In [57]:
from haystack import Pipeline
p = Pipeline()
p.add_node(component=embedder, name="Retriever", inputs=["Query"])
p.add_node(component=prompt, name="prompt_node", inputs=["Retriever"])

In [61]:
res = p.run("Band from England?", params={"Retriever": {"top_k": 1}, "prompt_node": {"generation_kwargs": {"max_new_tokens": 50, "do_sample": True}}})

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  5.80 Batches/s]
Both `max_new_tokens` (=50) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In [62]:
print("Prompt: " + res['answers'][0].meta['prompt'])
print("---")
print("Answer: " + res['answers'][0].answer)

Prompt: Answer the question using the provided context. Your answer should be in your own words and be no longer than 2 or 3 sentences.

### Instruction:
 Band from England?

### Input:
Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 1