# A simple open-domain QA pipeline

This notebook demonstrates how to build an open-domain question-answering (QA) pipeline with components from fastRAG.

The pipeline uses a `BM25Retriever`, a neural re-ranker based on SBERT (`SentenceTransformersRanker`), and a Fusion-in-Decoder (FiD) model that generates answers from the retrieved evidence.

## Build a local in-memory index and store sample documents

In [1]:
from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_gpu=False, use_bm25=True)


In [2]:
from haystack.schema import Document

# 3 example documents to index
examples = [
    "There is a blue house on Oxford street",
    "Paris is the capital of France",
    "fastRAG had its first commit in 2022"
]

documents = []
for i, d in enumerate(examples):
    documents.append(Document(content=d, id=i))

document_store.write_documents(documents)

Updating BM25 representation...: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 41255.45 docs/s]
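
To make the retrieval step more concrete, the following is a minimal pure-Python sketch of BM25 scoring over the same three example documents. It is an illustration only, not the code Haystack runs internally: the `tokenize` and `bm25_scores` helpers, the whitespace tokenizer, and the `k1` and `b` defaults are all assumptions chosen for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, strip basic punctuation.
    tokens = (t.strip("?.,!").lower() for t in text.split())
    return [t for t in tokens if t]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Score every document against the query with a common BM25 variant.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


examples = [
    "There is a blue house on Oxford street",
    "Paris is the capital of France",
    "fastRAG had its first commit in 2022",
]
scores = bm25_scores("What is Paris?", examples)
best = examples[scores.index(max(scores))]
print(best)
```

In this sketch the document about Paris scores highest because it matches both "is" and the rarer term "paris", while the fastRAG document shares no query terms and scores zero.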


## Initialize the pipeline components

Initialize the retriever, the re-ranker, and the reader used in the pipeline.

In [4]:
from haystack.nodes import BM25Retriever, SentenceTransformersRanker

# define a BM25 retriever and a Sentence Transformers cross-encoder re-ranker
retriever = BM25Retriever(document_store=document_store)
reranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")
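
The re-ranking step can be pictured as scoring each (query, document) pair and re-sorting the retrieved candidates. The sketch below shows only that control flow. The `rerank` helper and the `toy_overlap_score` function are hypothetical; the overlap score is a crude stand-in for a cross-encoder, which would instead feed the query and document through a neural model together.

```python
def toy_overlap_score(query, doc):
    # Hypothetical stand-in for a cross-encoder: fraction of query tokens found in the document.
    q_tokens = set(query.lower().strip("?").split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens)


def rerank(query, docs, score_fn, top_k=2):
    # Score each (query, document) pair, sort by descending score, keep the top_k documents.
    scored = [(score_fn(query, d), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


candidates = [
    "There is a blue house on Oxford street",
    "Paris is the capital of France",
    "fastRAG had its first commit in 2022",
]
reranked = rerank("What is Paris?", candidates, toy_overlap_score, top_k=2)
print(reranked)
```

Because the scoring function is passed in as an argument, the same sorting logic would apply unchanged if a real model-based scorer were substituted.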


In [5]:
from fastrag.prompters.invocation_layers import fid
from haystack.nodes import PromptModel
from haystack.nodes.prompt.prompt_template import PromptTemplate
from haystack.nodes.prompt import PromptNode
import torch

# Load the FiD model through fastRAG's FiD invocation layer
prompt_model = PromptModel(
    model_name_or_path="Intel/fid_flan_t5_base_nq",
    use_gpu=True,
    invocation_layer_class=fid.FiDHFLocalInvocationLayer,
    model_kwargs=dict(
        model_kwargs=dict(
            device_map={"": 0},  # place the whole model on GPU 0
            torch_dtype=torch.bfloat16,
            do_sample=False,
        ),
        generation_kwargs=dict(
            max_length=10,  # keep generated answers short
        ),
    ),
)

# Wrap the model in a PromptNode; the template passes the query through unchanged
reader = PromptNode(
    model_name_or_path=prompt_model,
    default_prompt_template=PromptTemplate("{query}"),
)
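
As background for the reader, the general Fusion-in-Decoder idea is that each retrieved passage is paired with the question and encoded independently, and the decoder then attends over the concatenation of all encoded passages. The sketch below illustrates only that data flow. The helper names, the `question: ... context: ...` input format, and the token-list "encoder" are assumptions made for this example and may differ from what fastRAG's invocation layer does.

```python
def build_fid_inputs(question, passages):
    # One encoder input per passage, each paired with the same question.
    return [f"question: {question} context: {p}" for p in passages]


def toy_encode(text):
    # Hypothetical stand-in for a transformer encoder: one "state" per whitespace token.
    return text.split()


def fuse(encoded_passages):
    # Concatenate the per-passage encoder states along the sequence axis,
    # producing the single sequence the decoder would attend over.
    fused = []
    for states in encoded_passages:
        fused.extend(states)
    return fused


passages = [
    "Paris is the capital of France",
    "There is a blue house on Oxford street",
]
inputs = build_fid_inputs("What is Paris?", passages)
encoded = [toy_encode(text) for text in inputs]
fused = fuse(encoded)
print(len(inputs), len(fused))
```

Encoding passages separately keeps the encoder cost roughly linear in the number of passages, while the decoder can still combine evidence from all of them.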


## Create a pipeline

In [6]:
from haystack import Pipeline

p = Pipeline()

### Add the components in the right order

In [7]:
p.add_node(component=retriever, name="Retriever", inputs=["Query"])
p.add_node(component=reranker, name="Reranker", inputs=["Retriever"])
p.add_node(component=reader, name="Reader", inputs=["Reranker"])
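
The order matters because each node consumes the output of the node named in its `inputs`. The toy class below mimics that chaining for a strictly linear pipeline. It is a simplified sketch, not Haystack's `Pipeline` implementation: `ToyPipeline` and the lambda components are hypothetical, and each component is assumed to be a plain function of the query and the upstream output.

```python
class ToyPipeline:
    # Minimal linear pipeline: each node reads the output of one upstream node.

    def __init__(self):
        self.nodes = {}   # node name -> (component, list of input names)
        self.order = []   # node names in insertion order

    def add_node(self, component, name, inputs):
        # An input must be the root "Query" or a node that was already added.
        for parent in inputs:
            if parent != "Query" and parent not in self.nodes:
                raise ValueError(f"Unknown input node: {parent}")
        self.nodes[name] = (component, inputs)
        self.order.append(name)

    def run(self, query):
        outputs = {"Query": query}
        for name in self.order:
            component, inputs = self.nodes[name]
            # Pass the original query plus the output of the first upstream node.
            outputs[name] = component(query, outputs[inputs[0]])
        return outputs[self.order[-1]]


docs = [
    "There is a blue house on Oxford street",
    "Paris is the capital of France",
]
toy = ToyPipeline()
toy.add_node(lambda q, _: docs, "Retriever", ["Query"])
toy.add_node(lambda q, d: sorted(d, key=len), "Reranker", ["Retriever"])
toy.add_node(lambda q, d: d[0], "Reader", ["Reranker"])
result = toy.run("What is Paris?")
print(result)
```

Adding the "Reader" before the "Reranker" in this toy version raises a `ValueError`, which is the kind of ordering mistake the heading above warns about.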

### Run a query through the pipeline

In [8]:
res = p.run(query="What is Paris?")

### Display the answer

In [10]:
res['results'][0]

'the capital of France'