# Retriever and Reader
In this notebook we set up a dense passage retriever, which returns the documents and passages most likely to contain the answer to a question. The reader then processes each retrieved passage to extract the specific answer span.

---

In [2]:
import os

from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import DensePassageRetriever, ElasticsearchRetriever
from haystack.nodes import FARMReader, TransformersReader
from haystack.pipelines import ExtractiveQAPipeline

# Retriever

<b>ds_astronomy:</b> document store with processed data 

In [4]:
host = os.environ.get("ELASTICSEARCH_HOST", "localhost")

ds_astronomy = ElasticsearchDocumentStore(
    host=host,
    username="",
    password="",
    index="ds_astronomy"
)

curr_store = ds_astronomy
localhost = 'http://localhost:9200/ds_astronomy/_count'
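The `localhost` variable holds the URL of the index's `_count` endpoint, but the notebook never uses it. One plausible use is a sanity check that the index actually contains documents before building a retriever. The sketch below assumes the standard Elasticsearch `_count` response, a JSON object with a top-level `count` field. The helper names `parse_count` and `fetch_count` are hypothetical, and the sample body contains made-up numbers.

```python
import json
from urllib.request import urlopen


def parse_count(body: str) -> int:
    """Extract the document count from an Elasticsearch _count response body."""
    return int(json.loads(body)["count"])


def fetch_count(url: str, timeout: float = 5.0) -> int:
    """Query a _count endpoint; requires a reachable Elasticsearch instance."""
    with urlopen(url, timeout=timeout) as response:
        return parse_count(response.read().decode("utf-8"))


# Shape of a typical _count response (illustrative numbers, not real data).
sample_body = (
    '{"count": 42, "_shards": '
    '{"total": 1, "successful": 1, "skipped": 0, "failed": 0}}'
)
print(parse_count(sample_body))

# With Elasticsearch running, the notebook's URL could be checked like this:
# n_docs = fetch_count('http://localhost:9200/ds_astronomy/_count')
```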

In [30]:
# Dense Passage Retriever (DPR)
retriever = DensePassageRetriever(
    document_store=ds_astronomy,
    query_embedding_model='facebook/dpr-question_encoder-single-nq-base',
    passage_embedding_model='facebook/dpr-ctx_encoder-single-nq-base',
    use_gpu=True,
    embed_title=True
)

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DPRQuestionEncoderTokenizer'. 
The class this function is called from is 'DPRContextEncoderTokenizerFast'.


Update the document store with the embeddings of the previously stored passages.

In [8]:
ds_astronomy.update_embeddings(retriever=retriever)

Updating embeddings:   0%|          | 0/271 [00:00<?, ? Docs/s]

Create embeddings:   0%|          | 0/272 [00:00<?, ? Docs/s]
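Once every passage has an embedding, dense retrieval reduces to a similarity search: the question is encoded into a vector and passages are ranked by their similarity to it. DPR models are trained with dot-product similarity. The toy sketch below illustrates only that ranking step. The three-dimensional vectors are made up, and `rank_passages` is a hypothetical helper, not part of Haystack.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(
    query_vec: List[float],
    passage_vecs: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (passage_id, score) pairs, highest dot product first."""
    scored = [(pid, dot(query_vec, vec)) for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings standing in for the passage encoder's output.
toy_passages = {
    "dark_matter": [0.9, 0.1, 0.0],
    "red_dwarfs": [0.1, 0.8, 0.2],
    "inflation": [0.0, 0.2, 0.9],
}
# Made-up embedding standing in for the question encoder's output.
toy_query = [1.0, 0.0, 0.1]

ranking = rank_passages(toy_query, toy_passages, top_k=2)
for passage_id, score in ranking:
    print(f"{passage_id}: {score:.2f}")
```

In the real pipeline the vectors come from the two DPR encoders and the search runs inside Elasticsearch; only the scoring idea carries over.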

In [38]:
# BM25 Retriever (reassigns `retriever`, replacing the DPR instance created above)
retriever = ElasticsearchRetriever(
    ds_astronomy
)
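Unlike DPR, BM25 is a sparse, term-matching method: a passage scores highly when it contains the query's terms, with rarer terms weighted more and long passages penalised. The sketch below implements the textbook BM25 formula on whitespace-tokenised toy passages. It assumes the commonly used defaults `k1=1.2` and `b=0.75`. Elasticsearch's actual analysis chain and scoring details differ, so treat this as an illustration rather than a reimplementation.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    docs: List[List[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> List[float]:
    """Score every tokenised document against the query with textbook BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # absent terms contribute nothing
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Toy passages, made up for illustration.
toy_docs = [
    "dark matter does not emit light".split(),
    "red dwarfs are small cool stars".split(),
    "black holes bend light".split(),
]
toy_scores = bm25_scores("dark matter".split(), toy_docs)
best_index = max(range(len(toy_docs)), key=lambda i: toy_scores[i])
print(best_index, toy_scores)
```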



---

<h1>Reader</h1>

In [9]:
roberta = "deepset/roberta-base-squad2"

reader = FARMReader(
    model_name_or_path=roberta, 
    use_gpu=True, 
    return_no_answer=True, 
    no_ans_boost=0, 
    top_k=5
)
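An extractive reader such as this one does not generate text. It scores every token of a passage as a possible answer start and as a possible answer end, then returns the highest-scoring span. The sketch below shows that span-selection step on made-up logits. The `best_span` helper and its `max_answer_len` parameter are hypothetical, and the real FARMReader logic also handles the no-answer case, sliding windows over long passages, and aggregation across passages.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best span with start <= end."""
    best = (0, 0, float("-inf"))
    for start, start_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logit + end_logits[end]
            if score > best[2]:
                best = (start, end, score)
    return best


# Made-up tokens and logits standing in for the model's output.
tokens = ["dark", "matter", "is", "invisible", "mass"]
start_logits = [0.1, 0.2, 0.3, 2.5, 0.4]
end_logits = [0.0, 0.1, 0.2, 1.0, 3.0]

start, end, score = best_span(start_logits, end_logits)
span_text = " ".join(tokens[start:end + 1])
print(span_text, score)
```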

---

<h3>Questions</h3>

In [32]:
query1 = 'What is dark matter?'
query2 = 'When was the theory of inflation developed?'
query3 = 'What size is the smallest black hole?'
query4 = 'What is the mass of red dwarfs?'
query5 = 'How massive are red dwarfs?'

In [74]:
pipeline = ExtractiveQAPipeline(reader, retriever)

result = pipeline.run(query=query1)

Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.39 Batches/s]

In [75]:
# Take the first answer with at least two words. This skips single-word spans
# and the empty "no answer" prediction (the reader uses return_no_answer=True).
answer = None
for candidate in result['answers']:
    text = candidate.answer or ''
    if ' ' in text:
        answer = text
        break

print(f'Q: {result["query"]}\nA: {answer}')

Q: What is dark matter?
A: an opportunity to learn more about the fundamental order of the Universe
