# Retriever Reader Pipeline

We've set up our Elasticsearch instance and are now ready to prepare the rest of our open-domain question answering (ODQA) pipeline: our *retriever* and *reader*.

First we initialize the connection to our Elasticsearch document store.

In [1]:
from haystack.document_stores.elasticsearch import ElasticsearchDocumentStore

doc_store = ElasticsearchDocumentStore(
    host='localhost',
    username='', password='',
    index='aiqrunbook'
)



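Then we initialize our retriever and reader models, starting with the retriever. We embed documents and queries with [sentence-transformers/multi-qa-mpnet-base-dot-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1) and use [deepset/tinyroberta-squad2](https://huggingface.co/deepset/tinyroberta-squad2) as our reader model.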

In [2]:
from haystack.nodes import EmbeddingRetriever

retriever = EmbeddingRetriever(
    document_store=doc_store,
    model_format="sentence_transformers",
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)
doc_store.update_embeddings(retriever)
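
To make the retrieval step concrete, here is a minimal sketch of dense retrieval, assuming dot-product scoring as the `-dot-` in the model name suggests. The three-dimensional vectors and the document ids are hypothetical stand-ins for the embeddings that `update_embeddings` stores. This is plain Python for illustration, not Haystack's implementation.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_by_dot_product(
    query_vec: List[float],
    doc_vecs: Dict[str, List[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the top_k best."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real embedding model produces far more dimensions.
toy_doc_vecs = {
    "logs-missing": [0.9, 0.1, 0.0],
    "hazelcast-failure": [0.1, 0.8, 0.2],
    "cpu-overhead": [0.3, 0.2, 0.7],
}
toy_query_vec = [1.0, 0.0, 0.1]

ranked = top_k_by_dot_product(toy_query_vec, toy_doc_vecs, top_k=2)
print([doc_id for doc_id, _ in ranked])
```

Asking for a larger `top_k` than there are documents simply returns every document, ranked.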


Next we initialize the reader, which extracts answer spans from the documents the retriever returns.

In [3]:
from haystack.nodes import FARMReader
reader = FARMReader(model_name_or_path="deepset/tinyroberta-squad2", use_gpu=True)



And now we initialize our ODQA pipeline.

In [4]:
from haystack.pipelines import ExtractiveQAPipeline

pipe = ExtractiveQAPipeline(reader=reader, retriever=retriever)
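
As a rough mental model of what the pipeline does, the sketch below chains a toy retriever and a toy reader and routes a per-stage `top_k` to each, mirroring the `params` dictionary used in this notebook's queries. Everything here is hypothetical: the corpus, the word-overlap scoring, and the `toy_*` helpers are illustrative stand-ins, not how `ExtractiveQAPipeline`, `EmbeddingRetriever`, or `FARMReader` work internally.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy corpus; the real pipeline reads from the Elasticsearch index.
TOY_DOCS = [
    "Each node ships logs to the log store. Missing logs do not appear in the dashboard.",
    "The cache cluster restarts after a node failure.",
    "High CPU usage slows down every request.",
]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, text: str) -> int:
    """Number of distinct tokens shared by query and text."""
    return len(tokens(query) & tokens(text))


def toy_retrieve(query: str, docs: List[str], top_k: int) -> List[str]:
    """Stage 1: narrow the corpus to the top_k most similar documents."""
    return sorted(docs, key=lambda d: overlap(query, d), reverse=True)[:top_k]


def toy_read(query: str, docs: List[str], top_k: int) -> List[Dict[str, object]]:
    """Stage 2: pick the top_k best-matching sentences from the candidates."""
    sentences = [s for d in docs for s in re.split(r"(?<=[.!?])\s+", d) if s]
    scored = [{"answer": s, "score": overlap(query, s)} for s in sentences]
    scored.sort(key=lambda a: a["score"], reverse=True)
    return scored[:top_k]


def toy_run(query: str, params: Dict[str, Dict[str, int]]) -> Dict[str, object]:
    """Run both stages, giving each one its own top_k."""
    candidates = toy_retrieve(query, TOY_DOCS, params["Retriever"]["top_k"])
    answers = toy_read(query, candidates, params["Reader"]["top_k"])
    return {"query": query, "answers": answers}


result = toy_run(
    "What happens when logs are missing?",
    params={"Retriever": {"top_k": 2}, "Reader": {"top_k": 1}},
)
print(result["answers"])
```

The point of the two stages is cost: the cheap first stage scans everything, so the expensive second stage only has to look at a handful of candidates.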

Now we can begin asking questions!

In [6]:
from haystack.utils import print_answers
prediction = pipe.run(query="what are the Possible Impact for Logs Missing In Teletraan?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}})
print_answers(prediction)



Batches: 100%|██████████| 1/1 [00:00<00:00,  3.22it/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.92s/ Batches]


Query: what are the Possible Impact for Logs Missing In Teletraan?
Answers:
[   <Answer {'answer': 'All transient logs shall be lost and does not appear in Kibana', 'type': 'extractive', 'score': 0.6503626108169556, 'context': '1) There shall be no logs sent to Teletraan via this node. All transient logs shall be lost and does not appear in Kibana.', 'offsets_in_document': [{'start': 59, 'end': 121}], 'offsets_in_context': [{'start': 59, 'end': 121}], 'document_ids': ['a40e81afa7e77d547f1d45221915692e'], 'meta': {'source': 'aiqrunbook'}}>,
    <Answer {'answer': 'an overhead on resources and waiting for CPU cycles', 'type': 'extractive', 'score': 0.1814725399017334, 'context': 'd Free memory available \non instance. It could be an overhead on resources and waiting for CPU cycles.\nMonitoring and other troubleshooting\nCheck Kiba', 'offsets_in_document': [{'start': 1843, 'end': 1894}], 'offsets_in_context': [{'start': 50, 'end': 101}], 'document_ids': ['d9f03760e8c25c2bc328ce4f1f620773
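
The `offsets_in_context` field in the output above gives character positions of the answer inside `context`. The sketch below checks this against the first printed answer, with the values copied into a plain dictionary. In Haystack the answers are `Answer` objects rather than dictionaries, and the `span_from_offsets` and `highlight` helpers are hypothetical names introduced only for this illustration.

```python
from typing import Dict, List

# Field values copied from the first answer printed above.
first_answer = {
    "answer": "All transient logs shall be lost and does not appear in Kibana",
    "score": 0.6503626108169556,
    "context": (
        "1) There shall be no logs sent to Teletraan via this node. "
        "All transient logs shall be lost and does not appear in Kibana."
    ),
    "offsets_in_context": [{"start": 59, "end": 121}],
}


def span_from_offsets(text: str, offsets: List[Dict[str, int]]) -> List[str]:
    """Slice the text at each (start, end) character offset pair."""
    return [text[o["start"]:o["end"]] for o in offsets]


def highlight(text: str, start: int, end: int) -> str:
    """Wrap the span between start and end in square brackets."""
    return text[:start] + "[" + text[start:end] + "]" + text[end:]


recovered = span_from_offsets(
    first_answer["context"], first_answer["offsets_in_context"]
)
offset = first_answer["offsets_in_context"][0]
highlighted = highlight(first_answer["context"], offset["start"], offset["end"])

print(recovered[0] == first_answer["answer"])
print(highlighted)
```

Slicing by offsets is handy when displaying results, because the answer can be marked up in place within its surrounding context.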




In [8]:
from haystack.utils import print_answers
prediction = pipe.run(query="what are impact of Logs missing in ONE region?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}})
print_answers(prediction)


Batches: 100%|██████████| 1/1 [00:00<00:00,  3.29it/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.39s/ Batches]


Query: what are impact of Logs missing in ONE region?
Answers:
[   <Answer {'answer': 'AIQ Production', 'type': 'extractive', 'score': 0.08877314627170563, 'context': '1.3  AIQ Production - Logs Missing In Teletraan In One Region  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .', 'offsets_in_document': [{'start': 5, 'end': 19}], 'offsets_in_context': [{'start': 5, 'end': 19}], 'document_ids': ['23b984665cfae131314a6de6dd231fe3'], 'meta': {'source': 'aiqrunbook'}}>,
    <Answer {'answer': 'latency factor', 'type': 'extractive', 'score': 0.0048276325687766075, 'context': 'Kibana on the other region shall contain the logs pertinent to datacenter. However, there could be a latency factor, since they are in-time writes ', 'offsets_in_document': [{'start': 101, 'end': 115}], 'offsets_in_context': [{'start': 101, 'end': 115}], 'document_ids': ['87c851bc810d6def0b453134831667fa'], 'meta': {'source': 'aiqrunbook'}}>,
    <Answer {'answer': 'AIQ Productio
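
The scores in this result (about 0.09 and 0.005) are far lower than the 0.65 of the top answer to the earlier Teletraan question, and the top span is just a table-of-contents heading. One way to handle such cases is to apply a score cutoff before showing an answer. The sketch below is an assumption-laden illustration: the answers are plain dictionaries with values copied from the printed output, and both the `best_confident_answer` helper and the 0.5 cutoff are hypothetical choices that would need tuning on real data.

```python
from typing import Dict, List, Optional

MIN_SCORE = 0.5  # hypothetical cutoff, not a recommended value


def best_confident_answer(
    answers: List[Dict[str, object]], min_score: float
) -> Optional[Dict[str, object]]:
    """Return the highest-scoring answer at or above min_score, else None."""
    confident = [a for a in answers if a["score"] >= min_score]
    if not confident:
        return None
    return max(confident, key=lambda a: a["score"])


# Answer text and scores copied from the two printed results.
teletraan_answers = [
    {"answer": "All transient logs shall be lost and does not appear in Kibana",
     "score": 0.6503626108169556},
    {"answer": "an overhead on resources and waiting for CPU cycles",
     "score": 0.1814725399017334},
]
one_region_answers = [
    {"answer": "AIQ Production", "score": 0.08877314627170563},
    {"answer": "latency factor", "score": 0.0048276325687766075},
]

for name, answers in [("teletraan", teletraan_answers),
                      ("one region", one_region_answers)]:
    best = best_confident_answer(answers, MIN_SCORE)
    print(name, "->", best["answer"] if best else "no confident answer")
```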




In [6]:
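# Assumes `qa` is an ExtractiveQAPipeline built like `pipe` above; its definition is not shown here.
qa.run(query='What are Einsteins daughter name?',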
       params={"Reader": {"top_k": 3}})

Inferencing Samples: 100%|██████████| 1/1 [00:04<00:00,  4.67s/ Batches]


{'query': 'What are Einsteins daughter name?',
 'no_ans_gap': 11.428391933441162,
 'answers': [<Answer {'answer': 'Lieserl', 'type': 'extractive', 'score': 0.9841803312301636, 'context': 'Comcast Confidential - Any unauthorized disclosure or use is strictly prohibited\nComcast Confidential - Any unauthorized disclosure or use is strictly prohibitedAlbert Einstein and Einstein, 1912\nEarly correspondence between Einstein and Mari was discovered and published in 1987 which revealed that the couple had a daughter named , \xa0"Lieserl"\nborn in early 1902 in where Mari was staying with her parents. Mari returned to Switzerland without the child, whose real name and fate are \xa0Novi Sad\xa0\nunknown. The contents of Einstein\'s letter in September 1903 suggest that the girl was either given up for adoption or died of in infancy. \xa0scarlet fever \xa0[48][49]\nEinstein and Mari married in January 1903. In May 1904, their son was born in , Switzerland. Their son was born in \xa0Hans Albert 

In [11]:
qa.run(query='what is Column Based database?',
       params={"Reader": {"top_k": 3}})

Inferencing Samples: 100%|██████████| 1/1 [00:08<00:00,  8.61s/ Batches]


{'query': 'what is Column Based database?',
 'no_ans_gap': -1.461627721786499,
 'answers': [<Answer {'answer': 'RDBMS', 'type': 'extractive', 'score': 0.25249844789505005, 'context': 'Comcast Confidential - Any unauthorized disclosure or use is strictly prohibited\nComcast Confidential - Any unauthorized disclosure or use is strictly prohibitedType Column Based\nThis model looks at first glance at a table in \nRDBMS except that with a column-oriented NoSQL \nDB, the number of columns is dynamic. Indeed, in \na relational table, the number of columns is fixed \nfrom the creation of the table schema and this \nnumber remains the same for all the records in this \nThis model is based on the key-value paradigm. \nThe value, in this case, is a JSON or XML \ndocument. The advantage is to be able to \nretrieve, via a single key, a hierarchically \nstructured information set. The same operation \nin the relational world would involve several \njoins.Document +Key / Value Based\nKey-Value: \xa0

In [8]:


qa.run(query='What position does Harry play on the Gryffindor Quidditch team?',
       params={"Reader": {"top_k": 3}})

Inferencing Samples: 100%|██████████| 3/3 [00:36<00:00, 12.02s/ Batches]


{'query': 'What position does Harry play on the Gryffindor Quidditch team?',
 'no_ans_gap': 15.92770516872406,
 'answers': [<Answer {'answer': 'Captain', 'type': 'extractive', 'score': 0.9925570487976074, 'context': 'osing the person she thought would annoy Ron the most. He accepted, but spent most of the evening boasting about himself, to her irritation. He was very forward with her under the mistletoe, prompting Hermione to abandon him to seek out Harry Potter and Luna Lovegood and to avoid him for the rest of the party. He was very annoyed when he was unable to find her. This indicates that while McLaggen was romantically interested in Hermione, she found him to be a rude braggart.\n\n\n===Harry Potter===\nAlthough they were in the same House, Harry strongly disliked McLaggen. The two first met in 1996 when they were both questioned by Professor Slughorn. They met again when McLaggen tried out for the Gryffindor Quidditch team, of which Harry was the Captain. It seemed that McLaggen