# A simple open-domain QA pipeline

This notebook demonstrates how to build an open-domain question answering (QA) pipeline that combines standard Haystack nodes with a fastRAG component.

The pipeline uses a `BM25Retriever` to fetch candidate documents, a `SentenceTransformersRanker` (a neural re-ranker based on SBERT) to reorder them, and a Fusion-in-Decoder (FiD) reader, `FiDReader`, to generate answers from the retrieved evidence.

## Build a local in-memory index and store sample documents

In [1]:
from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_gpu=False, use_bm25=True)

In [2]:
from haystack.schema import Document

# 3 example documents to index
examples = [
    "There is a blue house on Oxford street",
    "Paris is the capital of France",
    "fastRAG had its first commit in 2022"
]

documents = []
for i, d in enumerate(examples):
    documents.append(Document(content=d, id=i))

document_store.write_documents(documents)

Updating BM25 representation...:   0%|          | 0/3 [00:00<?, ? docs/s]
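
To make the retrieval step more concrete, here is a minimal, dependency-free sketch of Okapi BM25 scoring over the same three example documents. It is not the code the document store runs internally. The `tokenize` and `bm25_scores` helpers, the smoothed IDF variant, and the `k1`/`b` defaults are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, split on whitespace,
    # and strip a few punctuation characters from token edges.
    return [t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF variant that is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


examples = [
    "There is a blue house on Oxford street",
    "Paris is the capital of France",
    "fastRAG had its first commit in 2022",
]

scores = bm25_scores("What is Paris?", examples)
best = max(range(len(examples)), key=scores.__getitem__)
print(examples[best])  # prints "Paris is the capital of France"
```

The third document shares no term with the query, so its score is exactly zero. The first document matches only the common word "is", which is why it ranks below the Paris document.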

## Initialize the pipeline components

Initialize the components we are going to use in our pipeline.

In [8]:
from haystack.nodes import BM25Retriever, SentenceTransformersRanker
from fastrag.readers import FiDReader

fid_model_path = None  # change this to the path of your local FiD model
assert fid_model_path is not None, "Please change fid_model_path to the path of your trained FiD model"

# define a BM25 retriever, a Sentence Transformers re-ranker and a FiD reader based on a local model
retriever = BM25Retriever(document_store=document_store)
reranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2")
reader = FiDReader(model_name_or_path=fid_model_path, num_beams=1, min_length=2, max_length=50, use_gpu=False)

## Create a pipeline

In [9]:
from haystack import Pipeline

p = Pipeline()

### Add the components in the right order

In [10]:
p.add_node(component=retriever, name="Retriever", inputs=["Query"])
p.add_node(component=reranker, name="Reranker", inputs=["Retriever"])
p.add_node(component=reader, name="Reader", inputs=["Reranker"])
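
The `inputs` argument is what wires each node to the one before it, which is why the order of the `add_node` calls matters. The toy sketch below mimics that chaining in plain Python. `ToyPipeline` and the three `toy_*` functions are hypothetical stand-ins written for this illustration, not Haystack classes, and they only support a single linear chain.

```python
class ToyPipeline:
    """Hypothetical, simplified stand-in for a linear pipeline."""

    def __init__(self):
        self.nodes = {}  # node name -> (component, name of upstream node)
        self.order = []  # node names in insertion order

    def add_node(self, component, name, inputs):
        upstream = inputs[0]
        # A node can only consume the query or a node that was already added.
        if upstream != "Query" and upstream not in self.nodes:
            raise ValueError(f"Unknown input node: {upstream}")
        self.nodes[name] = (component, upstream)
        self.order.append(name)

    def run(self, query):
        outputs = {"Query": {"query": query}}
        for name in self.order:
            component, upstream = self.nodes[name]
            outputs[name] = component(outputs[upstream])
        return outputs[self.order[-1]]


def toy_retriever(state):
    # Returns candidates in an arbitrary order, like a first-stage retriever.
    docs = ["There is a blue house on Oxford street", "Paris is the capital of France"]
    return {**state, "documents": docs}


def toy_reranker(state):
    # Reorders candidates by word overlap with the query (a crude proxy).
    query_terms = set(state["query"].lower().strip("?").split())
    ranked = sorted(
        state["documents"],
        key=lambda d: len(query_terms & set(d.lower().split())),
        reverse=True,
    )
    return {**state, "documents": ranked}


def toy_reader(state):
    # Uses the top-ranked document as the "answer".
    return {**state, "answers": [state["documents"][0]]}


toy = ToyPipeline()
toy.add_node(component=toy_retriever, name="Retriever", inputs=["Query"])
toy.add_node(component=toy_reranker, name="Reranker", inputs=["Retriever"])
toy.add_node(component=toy_reader, name="Reader", inputs=["Reranker"])

toy_res = toy.run(query="What is Paris?")
print(toy_res["answers"][0])  # prints "Paris is the capital of France"
```

Adding the reader before the re-ranker in this sketch would raise a `ValueError`, because the reader's input node would not exist yet.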

### Run a query through the pipeline

In [11]:
res = p.run(query="What is Paris?")



### Display the answer

In [12]:
res['answers'][0].answer

'the capital of France'
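
When a query produces several candidate answers, it can help to list them instead of reading only the first. The sketch below assumes the result is a dictionary whose `answers` entry holds objects with an `answer` attribute and an optional `score` attribute. `summarize_answers` is a hypothetical helper, and `fake_res` is a stand-in built with `SimpleNamespace` so the example runs without a trained FiD model.

```python
from types import SimpleNamespace


def summarize_answers(result, top_n=3):
    """Format up to top_n answers as numbered lines."""
    lines = []
    for rank, ans in enumerate(result.get("answers", [])[:top_n], start=1):
        # The score may be missing or None, so fall back to "n/a".
        score = getattr(ans, "score", None)
        score_text = "n/a" if score is None else f"{score:.3f}"
        lines.append(f"{rank}. {ans.answer} (score: {score_text})")
    return lines


# Stand-in for the real pipeline output (an assumption, not actual results).
fake_res = {"answers": [SimpleNamespace(answer="the capital of France", score=None)]}

for line in summarize_answers(fake_res):
    print(line)  # prints "1. the capital of France (score: n/a)"
```

With the real pipeline, the same helper could be called as `summarize_answers(res)`, provided the answer objects expose the attributes assumed above.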