In [29]:
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

In [1]:
!git clone https://github.com/IntelLabs/fastRAG.git


Cloning into 'fastRAG'...
remote: Enumerating objects: 502, done.
remote: Counting objects: 100% (52/52), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 502 (delta 32), reused 30 (delta 23), pack-reused 450
Receiving objects: 100% (502/502), 1.12 MiB | 24.88 MiB/s, done.
Resolving deltas: 100% (172/172), done.


In [None]:
!pip install -e fastRAG
!pip install -e fastRAG[qdrant]
!pip install -e fastRAG/libs/colbert


In [6]:
!pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install -U transformers

Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch
  Downloading https://download.pytorch.org/whl/cu118/torch-2.0.1%2Bcu118-cp310-cp310-linux_x86_64.whl (2267.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 GB 453.6 kB/s eta 0:00:00
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu118/torchvision-0.15.2%2Bcu118-cp310-cp310-linux_x86_64.whl (6.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.1/6.1 MB 67.4 MB/s eta 0:00:00
Installing collected packages: torch, torchvision
  Attempting uninstall: torch
    Found existing installation: torch 2.0.0
    Uninstalling torch-2.0.0:
      Successfully uninstalled torch-2.0.0
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.15.1
    Uninstalling torchvision-0.15.1:
      Successfully uninstalled torchvision-0.15.1
ERROR: pip's dependency resolver does not currently

# fastRAG Overview

## Lexical Search

In [1]:
from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_gpu=False, use_bm25=True)

In [2]:
from haystack.nodes import BM25Retriever

retriever = BM25Retriever(document_store=document_store)

In [3]:
from haystack.schema import Document

# 3 example documents to index
examples = [
    "Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace.",
    "Princess Arya Stark is the third child and second daughter of Lord Eddard Stark and his wife, Lady Catelyn Stark. She is the sister of the incumbent Westerosi monarchs, Sansa, Queen in the North, and Brandon, King of the Andals and the First Men. After narrowly escaping the persecution of House Stark by House Lannister, Arya is trained as a Faceless Man at the House of Black and White in Braavos, using her abilities to avenge her family. Upon her return to Westeros, she exacts retribution for the Red Wedding by exterminating the Frey male line.",
    "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] In February 2021, the band shared details of their debut studio album, New Long Leg. They also shared the single 'Strong Feelings'.[9] The album, which was produced by John Parish, was released on 2 April 2021.[10]"
]

documents = []
for i, d in enumerate(examples):
    documents.append(Document(content=d, id=i))

document_store.write_documents(documents)

Updating BM25 representation...:   0%|          | 0/3 [00:00<?, ? docs/s]

In [4]:
res = retriever.retrieve("Who is the father of Arya Stark?")
[doc.content for doc in res]

['Princess Arya Stark is the third child and second daughter of Lord Eddard Stark and his wife, Lady Catelyn Stark. She is the sister of the incumbent Westerosi monarchs, Sansa, Queen in the North, and Brandon, King of the Andals and the First Men. After narrowly escaping the persecution of House Stark by House Lannister, Arya is trained as a Faceless Man at the House of Black and White in Braavos, using her abilities to avenge her family. Upon her return to Westeros, she exacts retribution for the Red Wedding by exterminating the Frey male line.',
 "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic o

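To make the lexical ranking above more concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not Haystack's implementation: the `tokenize` helper, the shortened toy documents and the `k1`/`b` values are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Shortened stand-ins for the three example documents (assumption).
toy_docs = [
    "Python was created by Guido van Rossum.",
    "Arya Stark is the daughter of Lord Eddard Stark.",
    "Dry Cleaning are an English post-punk band.",
]
scores = bm25_scores("Who is the father of Arya Stark?", toy_docs)
best = max(range(len(toy_docs)), key=scores.__getitem__)
print(best, scores)
```

Any shared term contributes to the score, including common words such as "is", "the" and "of". That is a plausible reason why the unrelated Dry Cleaning document also shows up in the retriever output above, although the exact scores depend on the tokenizer and parameters Haystack uses.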
## Enhancing the Pipeline with Re-ranker and Reader

We can add a state-of-the-art ranker, [ColBERT](https://github.com/IntelLabs/fastRAG/blob/main/models.md#colbert-v2-with-plaid-engine), which scores documents by token-wise max-similarity, and a [FiD](https://github.com/IntelLabs/fastRAG/blob/main/models.md#colbert-v2-with-plaid-engine) reader, which processes all retrieved documents in parallel and generates an answer.
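As a rough illustration of token-wise max-similarity, the sketch below scores a document by taking, for each query token vector, its best dot product with any document token vector and summing those maxima. The 2-dimensional vectors are made up for illustration; a real ColBERT model produces the token embeddings, and `maxsim_score` is a hypothetical helper, not part of fastRAG.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical token embeddings (assumption, not model output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]
doc_b = [[0.2, 0.1], [0.3, 0.3]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Here `doc_a` ranks higher because each query token finds a closely matching document token, which is the behaviour late-interaction rankers rely on.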

In [5]:
from fastrag.readers import FiDReader
from fastrag.rankers import ColBERTRanker

retriever = BM25Retriever(document_store=document_store)
reranker = ColBERTRanker(checkpoint_path="Intel/ColBERT-NQ")
reader = FiDReader(model_name_or_path="Intel/fid_flan_t5_base_nq", num_beams=1, min_length=2, max_length=50, use_gpu=False)




### Defining a Pipeline

In [6]:
from haystack import Pipeline

p = Pipeline()
p.add_node(component=retriever, name="Retriever", inputs=["Query"])
p.add_node(component=reranker, name="Reranker", inputs=["Retriever"])
p.add_node(component=reader, name="Reader", inputs=["Reranker"])
p.save_to_yaml('my_new_pipeline.yaml')
p.get_config()

{'components': [{'name': 'InMemoryDocumentStore',
   'type': 'InMemoryDocumentStore',
   'params': {'use_gpu': False, 'use_bm25': True}},
  {'name': 'Retriever',
   'type': 'BM25Retriever',
   'params': {'document_store': 'InMemoryDocumentStore'}},
  {'name': 'Reranker',
   'type': 'ColBERTRanker',
   'params': {'checkpoint_path': 'Intel/ColBERT-NQ'}},
  {'name': 'Reader',
   'type': 'FiDReader',
   'params': {'model_name_or_path': 'Intel/fid_flan_t5_base_nq',
    'min_length': 2,
    'max_length': 50,
    'use_gpu': False}}],
 'pipelines': [{'name': 'query',
   'nodes': [{'name': 'Retriever', 'inputs': ['Query']},
    {'name': 'Reranker', 'inputs': ['Retriever']},
    {'name': 'Reader', 'inputs': ['Reranker']}]}],
 'version': '1.17.2'}
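Conceptually, the configuration above describes a linear chain: each node receives the output of the node listed in its `inputs`. The toy sketch below mimics that flow with plain functions. `ToyPipeline`, `overlap` and the three stand-in components are hypothetical and only handle a linear chain; Haystack's `Pipeline` supports more general graphs.

```python
class ToyPipeline:
    """Runs components in insertion order, feeding each the previous output."""

    def __init__(self):
        self.nodes = []

    def add_node(self, component, name):
        self.nodes.append((name, component))

    def run(self, query, documents):
        state = {"query": query, "documents": documents}
        for _name, component in self.nodes:
            state = component(state)
        return state


def overlap(query, text):
    # Number of distinct words shared by the query and the text.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retriever(state):
    hits = [d for d in state["documents"] if overlap(state["query"], d) > 0]
    return {**state, "documents": hits}


def reranker(state):
    ranked = sorted(state["documents"],
                    key=lambda d: overlap(state["query"], d), reverse=True)
    return {**state, "documents": ranked}


def reader(state):
    # Stand-in for a generative reader: return the top document as the answer.
    answer = state["documents"][0] if state["documents"] else None
    return {**state, "answers": [answer]}


toy = ToyPipeline()
toy.add_node(retriever, "Retriever")
toy.add_node(reranker, "Reranker")
toy.add_node(reader, "Reader")

result = toy.run(
    "who created python",
    ["python was created by guido van rossum",
     "arya stark is the daughter of eddard stark",
     "dry cleaning are an english band"],
)
print(result["answers"])
```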

Run queries using the pipeline:

In [7]:
res = p.run(query="Who created Python?")
print(res['answers'][0].answer)


Guido van Rossum


In [8]:
res = p.run(query="Who is the father of Arya Stark?")
print(res['answers'][0].answer)

Lord Eddard Stark


# Semantic Search using Qdrant Vector Store

This is an example of using the [Qdrant](https://qdrant.tech/) vector store with fastRAG. It relies on the `qdrant_haystack` dependency and the `qdrant_client` Python connector. We assume you have a running server, e.g. started locally with `docker run -p 6333:6333 qdrant/qdrant`.

Two important settings are the dimension of the vectors and the HNSW parameters. Qdrant uses an HNSW index for faster search, which trades accuracy against latency. In general, higher parameter values mean better accuracy, but also higher latency and larger RAM usage.

The parameters are specified when connecting to the server and creating a new index; they cannot be changed after the index has been created.

In [9]:
from fastrag.stores import QdrantDocumentStore
import numpy as np

In [10]:
dim = 384
index_name = "fastrag_demo_index"

## Creating a Qdrant Index

We need to specify the location of the Qdrant service, the vector dimension, the index name, the similarity metric and, optionally, the HNSW configuration.

In [11]:
q = QdrantDocumentStore(":memory:",
                        embedding_dim=dim,
                        timeout=60,
                        index=index_name,
                        embedding_field="embedding",
                        hnsw_config={"m": 128, "ef_construct": 100},
                        similarity='dot_product',
                        recreate_index=True)

## Insertion and Searching of Documents

We'll create a few documents; each must have `id`, `content` and `embedding` keys, but can contain more data such as text titles.

In [12]:
docs = [{"id": 1, "content": examples[0], "embedding": np.ones(dim)},
        {"id": 2, "content": examples[1], "embedding": np.ones(dim) * 2},
        {"id": 3, "content": examples[2], "embedding": np.ones(dim) * 3},]

Write the documents to the index in batches; deduplication of documents is on by default.

In [13]:
q.write_documents(docs, index_name, batch_size=500)

500it [00:00, 42287.26it/s]          


In [14]:
q.get_document_count(index=index_name)

3

## Query by embedding
We need to provide a vector and a `top_k` value. It is also possible to query by text search, which we won't show here.

In [15]:
q.query_by_embedding(np.ones(dim), top_k=1, index=index_name)

[<Document: {'content': "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] In February 202
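The top hit is the third document because, with `dot_product` similarity, the score grows with the magnitude of the stored vector. The brute-force sketch below reproduces that ranking in plain Python with the same constant toy embeddings; it does not use Qdrant or HNSW, and the `dot` helper is only for illustration.

```python
dim = 384

# Same toy embeddings as above: constant vectors scaled by 1, 2 and 3.
embeddings = {doc_id: [float(doc_id)] * dim for doc_id in (1, 2, 3)}
query = [1.0] * dim


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


scores = {doc_id: dot(query, vec) for doc_id, vec in embeddings.items()}
top_id = max(scores, key=scores.get)
print(scores)  # {1: 384.0, 2: 768.0, 3: 1152.0}
print(top_id)  # 3
```

With cosine similarity, the three vectors would tie, since they all point in the same direction as the query.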

## Adding an Embedder

In [16]:
from haystack.nodes import EmbeddingRetriever

embedder = EmbeddingRetriever(embedding_model="cross-encoder/ms-marco-MiniLM-L-12-v2", document_store=q)


### Embed all the documents in the index with the Dense Model

In [17]:
q.update_embeddings(embedder)

Updating embeddings:   0%|          | 0/3 [00:00<?, ? Docs/s]

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

Updating embeddings: 32 Docs [00:02, 14.46 Docs/s]


In [18]:
embedder.retrieve("Band from England?")

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

[<Document: {'content': "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] In February 202

### Define a pipeline

In [19]:
from haystack import Pipeline

p = Pipeline()
p.add_node(component=embedder, name="Retriever", inputs=["Query"])

p.run("Band from England?")

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

{'documents': [<Document: {'content': "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] I

## Adding an LLM

### Define a Prompt Template

In [20]:
import torch
from haystack.nodes import PromptNode, PromptTemplate

prompt_template = PromptTemplate(name="qa",
                                 prompt_text="Answer the question using the provided context. Your answer should be in your own words and be no longer than 2 or 3 sentences.\n\n### Instruction:\n {query}\n\n### Input:\n{join(documents)}\n\n### Response:",
                                 output_parser={"type": "AnswerParser"})
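To show roughly what the template expands to at query time, here is a plain-Python sketch that fills the same text with a query and some documents. `render_prompt` and the `{context}` placeholder are hypothetical. The sketch assumes that `join(documents)` concatenates document contents with a single space, and the two document strings are shortened stand-ins.

```python
TEMPLATE = (
    "Answer the question using the provided context. "
    "Your answer should be in your own words and be no longer than 2 or 3 sentences."
    "\n\n### Instruction:\n {query}\n\n### Input:\n{context}\n\n### Response:"
)


def render_prompt(query, documents):
    # Assumption: documents are joined with a single space.
    context = " ".join(documents)
    return TEMPLATE.format(query=query, context=context)


rendered = render_prompt(
    "What was Dry Cleaning's debut single",
    ["Dry Cleaning are an English post-punk band.",
     "The band released their debut single, 'Magic of Meghan' in 2019."],
)
print(rendered)
```

The model is then expected to continue the text after `### Response:`.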


### Load an LLM and attach the prompt template

In [24]:
prompt = PromptNode(model_name_or_path="MBZUAI/LaMini-Flan-T5-783M", default_prompt_template=prompt_template,
                    model_kwargs={"model_max_length": 2048, "torch_dtype": torch.bfloat16}, use_gpu=True)

Other LLM backends can be supported through additional `InvocationLayer` implementations.

See: https://github.com/deepset-ai/haystack/tree/v1.17.2/haystack/nodes/prompt/invocation_layer

In [26]:
from haystack import Pipeline
p = Pipeline()
p.add_node(component=embedder, name="Retriever", inputs=["Query"])
p.add_node(component=prompt, name="prompt_node", inputs=["Retriever"])

In [43]:
res = p.run("What was Dry Cleaning's debut single", params={"Retriever": {"top_k": 1}, "prompt_node": {"generation_kwargs": {"max_new_tokens": 100, "do_sample": False, "temperature": 1.0}}})

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

Both `max_new_tokens` (=100) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In [44]:
print("Prompt: " + res['answers'][0].meta['prompt'])
print("---")
print("Answer: " + res['answers'][0].answer)

Prompt: Answer the question using the provided context. Your answer should be in your own words and be no longer than 2 or 3 sentences.

### Instruction:
 What was Dry Cleaning's debut single

### Input:
Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as