# SQuAD 2.0 - Question answering

## Open Domain - Retriever-reader architecture
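In this notebook, we implement a simple open-domain question answering system using the retriever-reader architecture: a retriever first selects candidate passages from the indexed contexts, and a reader then extracts an answer span from those passages. The figure below shows the overall design: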

![photo](https://d3i71xaburhd42.cloudfront.net/2fe7dba5a58aee5156594b4d78634ecd6c7dcabd/2-Figure1-1.png)

[photo source](https://www.semanticscholar.org/paper/End-to-End-Open-Domain-Question-Answering-with-Yang-Xie/2fe7dba5a58aee5156594b4d78634ecd6c7dcabd)
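
To make the two stages concrete, here is a minimal toy sketch in plain Python. It is not how Haystack implements either component: the function names (`tokenize`, `retrieve`, `read`) are hypothetical, the passages are made-up examples, and simple word overlap stands in for the learned retriever and reader models used later in this notebook.

```python
# Toy retriever-reader sketch. All names are hypothetical and the scoring is
# plain word overlap, a stand-in for real retriever and reader models.
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, top_k=2):
    """Retriever stage: keep the top_k passages sharing the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(query_tokens & tokenize(p)), reverse=True)
    return ranked[:top_k]


def read(query, passages):
    """Reader stage: return the sentence in the retrieved passages that best matches the query."""
    query_tokens = tokenize(query)
    sentences = [
        s.strip()
        for p in passages
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]
    return max(sentences, key=lambda s: len(query_tokens & tokenize(s)))


# Made-up passages for illustration only.
passages = [
    "The Space Age began with the launch of Sputnik 1. It changed science policy.",
    "The landing of Apollo 11 was watched by over 500 million people. It was broadcast live.",
    "Elasticsearch is a search engine. It stores documents in indices.",
]
query = "How many people watched the Apollo 11 landing?"

candidates = retrieve(query, passages, top_k=2)  # stage 1: narrow down the corpus
answer_sentence = read(query, candidates)        # stage 2: look only at the candidates
print(answer_sentence)
```

The point of the split is cost: the retriever is cheap and scans the whole collection, while the more expensive reader only sees the few passages the retriever returns.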


## Index contexts

In [1]:
from haystack.document_stores import ElasticsearchDocumentStore


In [23]:
document_store = ElasticsearchDocumentStore(
    host='localhost',
    username='', password='',
    index='squad2'
)

### Load data

In [25]:
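import pickle as pkl
# Load the pickled training DataFrame; the context manager closes the file afterwards.
with open('data/train-df.pkl', 'rb') as f:
    data = pkl.load(f)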

In [26]:
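# Haystack expects the passage text under the 'content' key.
data = data.rename(columns={'context': 'content'})
# Keep each question and its gold answer as per-document metadata.
data['meta'] = data[['question', 'answer']].to_dict('records')
data = data.drop(columns=['question', 'answer'])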

In [27]:
docs = data.to_dict("records")
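
As a rough illustration of the record shape these steps produce, the sketch below performs the same transformation on a single row using plain dictionaries instead of pandas. The helper name `to_haystack_docs` is hypothetical, and the row is a shortened example built from the Apollo 11 question used later in this notebook, not an actual row of `train-df.pkl`.

```python
# Plain-Python equivalent of the DataFrame steps above (hypothetical helper).
# Assumes every row has 'context', 'question' and 'answer' keys, as in the notebook.
def to_haystack_docs(rows):
    """Turn rows into dicts with the text under 'content' and the QA pair under 'meta'."""
    docs = []
    for row in rows:
        # Any other columns are carried over unchanged, as DataFrame.to_dict would do.
        doc = {k: v for k, v in row.items() if k not in ("context", "question", "answer")}
        doc["content"] = row["context"]
        doc["meta"] = {"question": row["question"], "answer": row["answer"]}
        docs.append(doc)
    return docs


# Illustrative row only; the context is shortened.
rows = [
    {
        "context": "The landing of Apollo 11 was an event watched by over 500 million people around the world.",
        "question": "How many people watched the Apollo 11 landing?",
        "answer": "500 million",
    }
]

example_docs = to_haystack_docs(rows)
print(example_docs[0])
```

Each resulting dictionary has the text under `content` and the question/answer pair under `meta`, which is why the answers returned by the pipeline later carry the original question and gold answer in their `meta` field.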

### Index data

In [29]:
document_store.write_documents(docs, index='squad2')

### Configure dense retriever

In [32]:
from haystack.nodes import DensePassageRetriever

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model='facebook/dpr-question_encoder-single-nq-base',
    passage_embedding_model='facebook/dpr-ctx_encoder-single-nq-base'
)

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DPRQuestionEncoderTokenizer'. 
The class this function is called from is 'DPRContextEncoderTokenizerFast'.


### Create embeddings

In [33]:
document_store.update_embeddings(retriever=retriever)
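
This step encodes every indexed passage into a vector so that, at query time, the retriever can compare the query embedding against the stored passage embeddings. The sketch below illustrates that comparison with a dot product, which is the similarity DPR-style retrievers typically use. It is a toy under stated assumptions: the three-dimensional vectors and the helper `top_k_by_dot_product` are made up, whereas the real encoders produce learned vectors with many more dimensions.

```python
# Toy illustration of dense retrieval: rank passages by the dot product of
# their embedding with the query embedding. All vectors here are made up.
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_by_dot_product(query_vec, passage_vecs, k=2):
    """Return the ids of the k passages whose embeddings score highest against the query."""
    scored = [(dot(query_vec, vec), pid) for pid, vec in passage_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]


# Hypothetical passage embeddings keyed by a readable id.
passage_vecs = {
    "apollo": [0.9, 0.1, 0.0],
    "sputnik": [0.6, 0.3, 0.1],
    "cooking": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.1]  # hypothetical embedding of a space-related question

top_ids = top_k_by_dot_product(query_vec, passage_vecs, k=2)
print(top_ids)
```

Because the passage vectors are computed once and stored, only the query needs to be encoded when a question arrives.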

## Create reader

In [35]:
from haystack.nodes import FARMReader


In [50]:
model = "deepset/roberta-base-squad2"
reader = FARMReader(model, use_gpu=False)

## QA pipeline

In [51]:
from haystack.pipelines import ExtractiveQAPipeline

qa = ExtractiveQAPipeline(reader=reader, retriever=retriever)

In [52]:
qa.run(
    query='How many people watched the Apollo 11 landing?',
    params={"Retriever": {"top_k": 3}, "Reader": {"top_k": 1}},
)

{'query': 'How many people watched the Apollo 11 landing?',
 'no_ans_gap': 10.19414234161377,
 'answers': [<Answer {'answer': 'over 500 million', 'type': 'extractive', 'score': 0.6901662349700928, 'context': "rld's population. The landing of Apollo 11 was an event watched by over 500 million people around the world and is widely recognized as one of the def", 'offsets_in_document': [{'start': 931, 'end': 947}], 'offsets_in_context': [{'start': 67, 'end': 83}], 'document_id': '7fd2d8c6a839512ffe0cfcb090d9e34c', 'meta': {'question': 'How many people watched the Apollo 11 landing?', 'answer': '500 million'}}>],
 'documents': [<Document: {'content': "The Space Age is a period encompassing the activities related to the Space Race, space exploration, space technology, and the cultural developments influenced by these events. The Space Age began with the development of several technologies that culminated with the launch of Sputnik 1 by the Soviet Union. This was the world's first artificial 