Let's implement a QA system based on a retriever-reader pipeline. I will try to answer questions about "Harry Potter and The Sorcerer’s Stone" (HP).
I will preprocess the HP PDF with the Haystack suite and then store the documents in Elasticsearch.
First, I will use a standard sparse retriever, and then I will apply a Dense Passage Retriever (DPR) to compare the results.

We start by setting up Haystack and Elasticsearch.

In [2]:
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install the latest master of Haystack
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab,ocr]

!wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.03.tar.gz
!tar -xvf xpdf-tools-linux-4.03.tar.gz && sudo cp xpdf-tools-linux-4.03/bin64/pdftotext /usr/local/bin


In [3]:
# In Colab / No Docker environments: Start Elasticsearch from source
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(
    ["elasticsearch-7.9.2/bin/elasticsearch"], stdout=PIPE, stderr=STDOUT, preexec_fn=lambda: os.setuid(1)  # as daemon
)
# wait until ES has started
! sleep 30
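A fixed `sleep 30` can be too short on a slow machine and needlessly long on a fast one. As an alternative, here is a minimal sketch that polls the server until it responds. The helper names `es_is_up` and `wait_until` are hypothetical, and the URL assumes Elasticsearch is listening on its default port 9200 on localhost.

```python
import time
import urllib.error
import urllib.request


def es_is_up(url="http://localhost:9200"):
    """Return True if an HTTP GET on `url` succeeds (assumed ES address)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, etc.: the server is not ready yet.
        return False


def wait_until(check, timeout=60.0, interval=1.0):
    """Call `check()` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


# Usage (replaces the fixed sleep):
# if not wait_until(es_is_up, timeout=60):
#     raise RuntimeError("Elasticsearch did not start in time")
```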

In [4]:
# Connect to Elasticsearch

from haystack.document_stores import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")


INFO - haystack.modeling.model.optimization -  apex not found, won't use it. See https://nvidia.github.io/apex/
INFO - haystack.telemetry -  Haystack sends anonymous usage data to understand the actual usage and steer dev efforts towards features that are most meaningful to users. You can opt-out at anytime by calling disable_telemetry() or by manually setting the environment variable HAYSTACK_TELEMETRY_ENABLED as described for different operating systems on the documentation page. More information at https://haystack.deepset.ai/guides/telemetry


Let's preprocess our Harry Potter PDF: http://www.passuneb.com/elibrary/ebooks/Harry%20Potter%20and%20The%20Sorcerer%E2%80%99s%20Stone.pdf

In [5]:
# Here are the imports we need
from haystack.nodes import PDFToTextConverter, PreProcessor
from haystack.utils import convert_files_to_docs

In [6]:
converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=["en"])
doc_pdf = converter.convert(file_path="/content/Harry Potter and The Sorcerer’s Stone.pdf", meta=None)[0]

In [7]:
preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=False,
    split_by="word",
    split_length=100,
    split_respect_sentence_boundary=True,
)
docs = preprocessor.process([doc_pdf])
print(f"n_docs_input: 1\nn_docs_output: {len(docs)}")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


100%|██████████| 1/1 [00:00<00:00,  2.74docs/s]

n_docs_input: 1
n_docs_output: 887
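To illustrate what `split_by="word"`, `split_length=100` and `split_respect_sentence_boundary=True` are asking for, here is a rough sketch of greedy chunking that never cuts a sentence in half. It is not Haystack's implementation; the function name `split_by_words` and the regex-based sentence splitter are simplifying assumptions.

```python
import re


def split_by_words(text, split_length=100):
    """Greedily pack whole sentences into chunks of roughly `split_length` words."""
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n_words = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the limit.
        if current and count + n_words > split_length:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six. Seven eight."
print(split_by_words(sample, split_length=6))
```

A sentence longer than `split_length` still ends up in a chunk of its own in this sketch, so chunks can exceed the limit.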





Let's add the preprocessed docs to Elasticsearch.

In [8]:
# Now, let's write the preprocessed documents to our document store.
document_store.write_documents(docs)

Now I will set up the pipeline with the sparse retriever.

Step 1: retriever. BM25, as implemented by Elasticsearch.

In [9]:
from haystack.nodes import ElasticsearchRetriever

retriever = ElasticsearchRetriever(document_store=document_store)
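To give an intuition for what BM25 rewards, here is a small, self-contained sketch of the textbook scoring formula. It only illustrates the idea: tokenization is a plain whitespace split, the function name `bm25_scores` is hypothetical, and `k1=1.2`, `b=0.75` are the commonly used defaults. Elasticsearch's actual analysis and scoring differ in the details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


toy_docs = [
    "harry met hagrid",
    "dudley ate cake",
    "hagrid fed norbert the dragon",
]
print(bm25_scores("norbert dragon", toy_docs))
```

Only documents that share exact terms with the query get a non-zero score, which is the main limitation a dense retriever tries to address.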

Step 2: reader. Let's try roberta-base-squad2, the model suggested in the docs.

In [10]:
from haystack.nodes import FARMReader, TransformersReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

INFO - haystack.modeling.utils -  Using devices: CUDA
INFO - haystack.modeling.utils -  Number of GPUs: 1
INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find deepset/roberta-base-squad2 locally.
INFO - haystack.modeling.model.language_model -  Looking on Transformers Model Hub (in local cache and online)...


INFO - haystack.modeling.model.language_model -  Loaded deepset/roberta-base-squad2


INFO - haystack.modeling.logger -  ML Logging is turned off. No parameters, metrics or artifacts will be logged to MLFlow.
INFO - haystack.modeling.utils -  Using devices: CUDA
INFO - haystack.modeling.utils -  Number of GPUs: 1
INFO - haystack.modeling.infer -  Got ya 2 parallel workers to do inference ...
INFO - haystack.modeling.infer -   0     0  
INFO - haystack.modeling.infer -  /w\   /w\ 
INFO - haystack.modeling.infer -  /'\   / \ 
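An extractive reader such as roberta-base-squad2 does not generate text: it scores each token of a passage as a possible answer start and as a possible answer end, and returns the best-scoring span. The sketch below shows that selection step on made-up scores. The function name `best_span` and the `max_answer_len` parameter are hypothetical, and real readers add further details (for example, handling of unanswerable questions) that are left out here.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) of the highest-scoring span with start <= end."""
    best = (0, 0, float("-inf"))
    for i, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(i + max_answer_len, len(end_logits))
        for j in range(i, last):
            score = start_score + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Made-up scores for a four-token passage.
tokens = ["name", "Albus", "Dumbledore", "."]
start, end, score = best_span([0.1, 2.0, 0.3, 0.0], [0.0, 0.5, 3.0, 0.1])
print(" ".join(tokens[start:end + 1]), score)
```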


For this first attempt, I will use the ready-made ExtractiveQAPipeline.

In [11]:
from haystack.pipelines import ExtractiveQAPipeline

pipe = ExtractiveQAPipeline(reader, retriever)
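Conceptually, a retriever-reader pipeline narrows the corpus down to a few candidate passages and then lets the reader extract answers from each of them. The toy sketch below mimics that flow with stand-in functions. None of it is Haystack code: `run_pipeline`, `toy_retrieve` and `toy_read` are hypothetical names, and the "reader" simply picks the last word of a passage.

```python
def run_pipeline(query, retrieve, read, retriever_top_k=10, reader_top_k=5):
    """Retrieve candidate passages, read each one, return the best answers."""
    docs = retrieve(query, retriever_top_k)
    candidates = []
    for doc in docs:
        candidates.extend(read(query, doc))
    # Rank all (answer, score) pairs across passages by score.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:reader_top_k]


corpus = [
    "Norbert is a dragon",
    "Hagrid keeps Norbert in a crate",
    "Dudley got a racing bike",
]


def toy_retrieve(query, top_k):
    """Stand-in retriever: keep passages sharing at least one word with the query."""
    words = set(query.lower().split())
    hits = [doc for doc in corpus if words & set(doc.lower().split())]
    return hits[:top_k]


def toy_read(query, doc):
    """Stand-in reader: 'answer' with the last word, scored by passage brevity."""
    words = doc.split()
    return [(words[-1], 1.0 / len(words))]


print(run_pipeline("who is norbert", toy_retrieve, toy_read, reader_top_k=2))
```

The two `top_k` values play the same role as in the `params` passed to `pipe.run`: one limits how many passages are retrieved, the other how many answers are returned.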


Let's now ask our system some questions.

In [12]:
questions = [
    'Who is Dumbledore?',
    "How is it called Harry's aunt?",
    "what are the four houses names?",
    "How is it called Harry's uncle?",
    "who is Norbert",
]

In [13]:
from haystack.utils import print_answers
QA_set = {}
for q in questions:
  prediction = pipe.run(
    query=q, params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
  )
  QA_set[q] = [(answer.answer, answer.score) for answer in prediction['answers']]
  print_answers(prediction, details="minimum")

Query: Who is Dumbledore?
Answers:
[   {   'answer': 'Albus Dumbledore',
        'context': 'ver hair, beard, and mustache. Underneath the picture was '
                   'the name Albus Dumbledore. "So this is Dumbledore!" said '
                   'Harry. "Don\'t tell me you\'d never h'},
    {   'answer': 'a very great wizard',
        'context': '\'s gone?" said Harry frantically. "Now?" "Professor '
                   'Dumbledore is a very great wizard, Potter, he has many '
                   'demands on his time  "\n'
                   '"But this is importa'},
    {   'answer': 'Professor Dumbledore',
        'context': 'rey as she straightened his many candy boxes. "I can, '
                   'can\'t I?" "Professor Dumbledore says you are to be '
                   'allowed to go," she said stiffly, as though i'},
    {   'answer': 'the only one You-Know-Who was ever afraid of',
        'context': 'www.passuneb.com\n'
                   '\n'
                   '"Harry, ever

Query: How is it called Harry's aunt?
Answers:
[   {   'answer': 'Aunt Petunia',
        'context': 'in my cupboard. There was suddenly a loud tapping noise. '
                   "And there's Aunt Petunia knocking on the door, Harry "
                   'thought, his heart sinking. But he still'},
    {   'answer': 'Aunt Petunia',
        'context': 'At that moment the telephone rang and Aunt Petunia went to '
                   'answer it while Harry and Uncle Vernon watched Dudley '
                   'unwrap the racing bike, a video camer'},
    {   'answer': 'Nothing, nothing..."',
        'context': 'groaned. "What did you say?" his aunt snapped through the '
                   'door. "Nothing, nothing..."\n'
                   "Dudley's birthday -- how could he have forgotten? Harry "
                   'got slow'},
    {   'answer': 'Aunt Petunia',
        'context': 'ver, he had gotten up to find his hair exactly as it had '
                   'been before Aunt Petunia had she

Query: what are the four houses names?
Answers:
[   {   'answer': 'Gryffindor, Hufflepuff, Ravenclaw, and Slytherin',
        'context': 'our house common room. "The four houses are called '
                   'Gryffindor, Hufflepuff, Ravenclaw, and Slytherin. Each '
                   'house has its own noble history and each has'},
    {   'answer': 'School houses',
        'context': 'r explain the rules." "And what are Slytherin and '
                   'Hufflepuff?" Get free e-books and video tutorials at '
                   'www.passuneb.com\n'
                   '\n'
                   '"School houses. There\'s four.'},
    {   'answer': 'Gryffindor',
        'context': ' cup here needs awarding, and the points stand thus: In '
                   'fourth place, Gryffindor, with three hundred and twelve '
                   'points; in third, Hufflepuff, with thr'},
    {   'answer': 'Some sort of test',
        'context': 'allowed. "How exactly do they sort us into houses?" he

Query: How is it called Harry's uncle?
Answers:
[   {   'answer': 'Uncle Vernon',
        'context': "Uncle Vernon dumped Harry's trunk onto a cart and wheeled "
                   'it into the station for him. Harry thought this was '
                   'strangely kind until Uncle Vernon stoppe'},
    {   'answer': 'Uncle Vernon',
        'context': 'latform what?" "Nine and three-quarters." "Don\'t talk '
                   'rubbish," said Uncle Vernon. "There is no platform nine '
                   'and three-quarters." "It\'s on my ticket.'},
    {   'answer': 'Uncle Vernon',
        'context': ' like he was wearing bits of old elephant skin, probably. '
                   'Dudley and Uncle Vernon came in, both with wrinkled noses '
                   "because of the smell from Harry's "},
    {   'answer': 'Uncle Vernon',
        'context': 'rry went back to the kitchen, still staring at his letter. '
                   'He handed Uncle Vernon the bill and the postcard, sat '
  

Query: who is Norbert
Answers:
[   {   'answer': 'eating dead rats by the crate',
        'context': " down at Hagrid's hut, helping him feed Norbert, who was "
                   'now eating dead rats by the crate. "It bit me!" he said, '
                   'showing them his hand, which was wra'},
    {   'answer': 'his crate',
        'context': 'd her. Chuckling about Malfoy, they waited, Norbert '
                   'thrashing about in his crate. About ten minutes later, '
                   'four broomsticks came swooping down out of '},
    {   'answer': 'Hagrid had Norbert packed and ready in a large crate',
        'context': "s to get out of their way in the entrance hall, where he'd "
                   'been playing tennis against the wall. Hagrid had Norbert '
                   'packed and ready in a large crate.'},
    {   'answer': 'a baby',
        'context': "Aargh! It's all right, he only got my boot -- jus' playin' "
                   '-- he\'s only a baby, after a




In [14]:
for k, v in QA_set.items():
  print(k)
  for answer in v:
    print('\t{}'.format(answer))
  print("\n")

Who is Dumbledore?
	('Albus Dumbledore', 0.8194479644298553)
	('a very great wizard', 0.4702693670988083)
	('Professor Dumbledore', 0.23260757327079773)
	('the only one You-Know-Who was ever afraid of', 0.19613751024007797)
	('Lily and James Potter', 0.19224070757627487)


How is it called Harry's aunt?
	('Aunt Petunia', 0.7305570542812347)
	('Aunt Petunia', 0.2939871773123741)
	('Nothing, nothing..."', 0.27101390808820724)
	('Aunt Petunia', 0.21833521127700806)
	("Devil's Snare", 0.05505102686583996)


what are the four houses names?
	('Gryffindor, Hufflepuff, Ravenclaw, and Slytherin', 0.9716241955757141)
	('School houses', 0.6905372440814972)
	('Gryffindor', 0.5084449350833893)
	('Some sort of test', 0.10287788510322571)
	('Houses', 0.06106731854379177)


How is it called Harry's uncle?
	('Uncle Vernon', 0.704220324754715)
	('Uncle Vernon', 0.6269576549530029)
	('Uncle Vernon', 0.4061504751443863)
	('Uncle Vernon', 0.24284610897302628)
	('Quirrell', 0.04651808366179466)


who is Nor

The answers obtained are mostly reasonable. I also tried other questions (for example, who Harry Potter's parents are, or Hermione Granger's hair color), but the results were not as good.

Now let's try a Dense Passage Retriever (DPR):

1.   Load the new retriever
2.   Update the stored embeddings
3.   Rebuild the pipeline with the new retriever but the same reader
4.   Run the questions and check the results



In [15]:
from haystack.nodes import DensePassageRetriever

retriever_DPR = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
    max_seq_len_query=64,
    max_seq_len_passage=256,
    batch_size=16,
    use_gpu=True,
    embed_title=True,
    use_fast_tokenizers=True,
)
# Important:
# Now that the DPR is initialized, we need to call update_embeddings() to iterate over all
# previously indexed documents and update their embedding representation.
# While this can be a time-consuming operation (depending on corpus size), it only needs to be done once.
# At query time, we only need to embed the query and compare it with the existing doc embeddings, which is very fast.
document_store.update_embeddings(retriever_DPR)

INFO - haystack.modeling.utils -  Using devices: CUDA:0
INFO - haystack.modeling.utils -  Number of GPUs: 1


INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find facebook/dpr-question_encoder-single-nq-base locally.
INFO - haystack.modeling.model.language_model -  Looking on Transformers Model Hub (in local cache and online)...


INFO - haystack.modeling.model.language_model -  Loaded facebook/dpr-question_encoder-single-nq-base


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DPRQuestionEncoderTokenizer'. 
The class this function is called from is 'DPRContextEncoderTokenizerFast'.
INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find facebook/dpr-ctx_encoder-single-nq-base locally.
INFO - haystack.modeling.model.language_model -  Looking on Transformers Model Hub (in local cache and online)...


INFO - haystack.modeling.model.language_model -  Loaded facebook/dpr-ctx_encoder-single-nq-base
INFO - haystack.document_stores.elasticsearch -  Updating embeddings for all 887 docs ...


Updating embeddings:   0%|          | 0/887 [00:00<?, ? Docs/s]

Create embeddings:   0%|          | 0/896 [00:00<?, ? Docs/s]
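The idea behind dense retrieval is that the question and every passage are mapped to vectors (here by the two DPR encoders loaded above), and passages are ranked by how similar their vector is to the question vector. The sketch below shows only the ranking step, using the dot product as the similarity and made-up two-dimensional vectors; the function name `rank_by_dot_product` is hypothetical and real DPR embeddings have far more dimensions.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_dot_product(query_vec, doc_vecs, top_k=2):
    """Return (doc_index, score) pairs for the top_k most similar documents."""
    scored = [(idx, dot(query_vec, vec)) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings: three passages and one question.
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
query_vec = [1.0, 0.0]
print(rank_by_dot_product(query_vec, doc_vecs))
```

Because the passage vectors do not depend on the query, they can be computed once and stored, which is what `update_embeddings()` is for.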

In [17]:
pipe_DPR = ExtractiveQAPipeline(reader, retriever_DPR)

QA_set_DPR = {}
for q in questions:
  prediction = pipe_DPR.run(
    query=q, params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
  )
  print_answers(prediction, details='medium')
  QA_set_DPR[q] = [(answer.answer, answer.score) for answer in prediction['answers']]

for k, v in QA_set_DPR.items():
  print(k)
  for answer in v:
    print('\t{}'.format(answer))
  print("\n")

Query: Who is Dumbledore?
Answers:
[   {   'answer': 'Albus Dumbledore',
        'context': ", as though it had been broken at least twice. This man's "
                   "name was Albus Dumbledore. Albus Dumbledore didn't seem to "
                   'realize that he had just arrived ',
        'score': 0.821822464466095},
    {   'answer': 'Professor Dumbledore, sir',
        'context': 'e celebrations." "Yeah," said Hagrid in a very muffled '
                   'voice, "I best get this bike away. G\'night, Professor '
                   'McGonagall -- Professor Dumbledore, sir."',
        'score': 0.401757150888443},
    {   'answer': 'Hagrid',
        'context': ' to Dumbledore, though, because he put it back in his '
                   'pocket and said, "Hagrid\'s late. I suppose it was he who '
                   'told you I\'d be here, by the way?" "Yes',
        'score': 0.3709588944911957},
    {   'answer': 'Hagrid',
        'context': 'suddenly as though she thought he mig

Query: How is it called Harry's aunt?
Answers:
[   {   'answer': 'Aunt Petunia',
        'context': 'Once, Aunt Petunia, tired of Harry coming back from the '
                   "barbers looking as though he hadn't been at all, had taken "
                   'a pair of kitchen scissors and cut ',
        'score': 0.8935014009475708},
    {   'answer': 'Aunt Petunia',
        'context': " made him look at photographs of all the cats she'd ever "
                   'owned. "Now what?" said Aunt Petunia, looking furiously at '
                   "Harry as though he'd planned this.",
        'score': 0.8920823037624359},
    {   'answer': 'Great Auntie Enid',
        'context': ' he was hanging me out of an upstairs window by the ankles '
                   'when my Great Auntie Enid offered him a meringue and he '
                   'accidentally let go. But I bounced ',
        'score': 0.5380969643592834},
    {   'answer': 'Hagrid',
        'context': 'flepuff," said Harry gloomily. "

Query: what are the four houses names?
Answers:
[   {   'answer': 'Gryffindor, Hufflepuff, Ravenclaw, and Slytherin',
        'context': 'our house common room. "The four houses are called '
                   'Gryffindor, Hufflepuff, Ravenclaw, and Slytherin. Each '
                   'house has its own noble history and each has',
        'score': 0.9716241955757141},
    {   'answer': 'Gryffindor',
        'context': ' cup here needs awarding, and the points stand thus: In '
                   'fourth place, Gryffindor, with three hundred and twelve '
                   'points; in third, Hufflepuff, with thr',
        'score': 0.5084449350833893},
    {   'answer': 'Gryffindor',
        'context': ' through it -- Neville needed a leg up -- and found '
                   'themselves in the Gryffindor common room, a cozy, round '
                   'room full of squashy armchairs. Percy dire',
        'score': 0.07008694484829903},
    {   'answer': 'your houses',
        'context':

Query: How is it called Harry's uncle?
Answers:
[   {   'answer': 'Uncle Vernon',
        'context': 'arts." Get free e-books and video tutorials at '
                   'www.passuneb.com\n'
                   '\n'
                   "But Uncle Vernon wasn't going to give in without a fight. "
                   '"Haven\'t I told you he\'s no',
        'score': 0.8343906700611115},
    {   'answer': 'Uncle Vernon',
        'context': 'Harry had learned from Uncle Vernon that people liked to '
                   'be left alone while they did this, but it was very '
                   "difficult, he'd never had so many question",
        'score': 0.7464466989040375},
    {   'answer': 'Harvey',
        'context': "phew was called Harry. He'd never even seen the boy. It "
                   'might have been Harvey. Or Harold. There was no point in '
                   'worrying Mrs. Dursley; she always got',
        'score': 0.04913013614714146},
    {   'answer': 'Dudley',
        'cont

Query: who is Norbert
Answers:
[   {   'answer': 'Norwegian Ridgeback',
        'context': 'nd video tutorials at www.passuneb.com\n'
                   '\n'
                   'CHAPTER FOURTEEN\n'
                   'Norbert the Norwegian Ridgeback\n'
                   "Quirrell, however, must have been braver than they'd "
                   'thought.',
        'score': 0.7710871398448944},
    {   'answer': 'GRYFFINDOR',
        'context': 'When it finally shouted, "GRYFFINDOR," Neville ran off '
                   'still wearing it, and had to jog back amid gales of '
                   'laughter to give it to "MacDougal, Morag." ',
        'score': 0.34064121544361115},
    {   'answer': 'going... going... gone',
        'context': 'ith the others and thanked them very much. At last, '
                   'Norbert was going... going... gone. They slipped back down '
                   'the spiral staircase, their hearts as l',
        'score': 0.28310466930270195},
    {   'answe




The results are quite similar, I'd say. The main difference is in the scores, which in some cases are higher than with the sparse retriever (for example, 0.89 versus 0.73 for the top "Aunt Petunia" answer).
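To make this comparison less of an eyeball exercise, the top answer and score of each retriever can be put side by side. The sketch below assumes the same structure as `QA_set` and `QA_set_DPR` (question mapped to a list of `(answer, score)` tuples) and, to stay self-contained, uses only the top answers copied from the outputs above; the function name `compare_top_answers` is hypothetical.

```python
def compare_top_answers(sparse, dense):
    """Return (question, sparse_answer, dense_answer, score_delta) per shared question."""
    rows = []
    for question, sparse_answers in sparse.items():
        dense_answers = dense.get(question)
        if not sparse_answers or not dense_answers:
            continue  # skip questions missing from either run
        s_answer, s_score = sparse_answers[0]
        d_answer, d_score = dense_answers[0]
        rows.append((question, s_answer, d_answer, round(d_score - s_score, 3)))
    return rows


# Top answers copied from the two runs above.
sparse_top = {
    "Who is Dumbledore?": [("Albus Dumbledore", 0.8194479644298553)],
    "How is it called Harry's aunt?": [("Aunt Petunia", 0.7305570542812347)],
    "what are the four houses names?": [
        ("Gryffindor, Hufflepuff, Ravenclaw, and Slytherin", 0.9716241955757141)
    ],
    "How is it called Harry's uncle?": [("Uncle Vernon", 0.704220324754715)],
}
dense_top = {
    "Who is Dumbledore?": [("Albus Dumbledore", 0.821822464466095)],
    "How is it called Harry's aunt?": [("Aunt Petunia", 0.8935014009475708)],
    "what are the four houses names?": [
        ("Gryffindor, Hufflepuff, Ravenclaw, and Slytherin", 0.9716241955757141)
    ],
    "How is it called Harry's uncle?": [("Uncle Vernon", 0.8343906700611115)],
}

for row in compare_top_answers(sparse_top, dense_top):
    print(row)
```

In the notebook itself, `compare_top_answers(QA_set, QA_set_DPR)` would cover all five questions.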

Now I will try some questions that failed with the sparse retriever.

In [18]:
questions_hard = [
    "Who are Harry Potter's parents",
    "lily hair color",
    "who is Norbert?",
]

In [19]:
QA_set_DPR_hard = {}
for q in questions_hard:
  prediction = pipe_DPR.run(
    query=q, params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
  )
  print_answers(prediction, details='minimum')
  QA_set_DPR_hard[q] = [(answer.answer, answer.score) for answer in prediction['answers']]

for k, v in QA_set_DPR_hard.items():
  print(k)
  for answer in v:
    print('\t{}'.format(answer))
  print("\n")

Query: Who are Harry Potter's parents
Answers:
[   {   'answer': "mum an' dad",
        'context': "thumpin' good `un, I'd say, once yeh've been trained up a "
                   "bit. With a mum an' dad like yours, what else would yeh "
                   "be? An' I reckon it's abou' time yeh"},
    {   'answer': 'Mr. and Mrs. Dursley',
        'context': "e Sorcerer's Stone\n"
                   'By J.K. Rowling\n'
                   'CHAPTER ONE\n'
                   'The Boy Who Lived\n'
                   'Mr. and Mrs. Dursley, of number four, Privet Drive, were '
                   'proud to say that they were '},
    {   'answer': 'Weasley twins',
        'context': 't. Percy the Prefect got up and shook his hand vigorously, '
                   'while the Weasley twins yelled, "We got Potter! We got '
                   'Potter!" Harry sat down opposite the'},
    {   'answer': 'aunt and uncle',
        'context': ' here!" "It\'s the best place for him," said Dumbledore '
    

Query: lily hair color
Answers:
[   {   'answer': 'ebony and unicorn',
        'context': 'hen it, too, was snatched back by Mr. Ollivander. "No, no '
                   '-- here, ebony and unicorn hair, eight and a half inches, '
                   'springy. Go on, go on, try it out.'},
    {   'answer': 'dark red',
        'context': 'She had dark red hair and her eyes --her eyes are just '
                   'like mine, Harry thought, edging a little closer to the '
                   'glass. Bright green -- exactly the same'},
    {   'answer': 'silver',
        'context': "Dumbledore's silver hair was the only thing in the whole "
                   'hall that shone as brightly as the ghosts. Harry spotted '
                   'Professor Quirrell, too, the nervous'},
    {   'answer': 'holly',
        'context': 'ere somewhere -- I wonder, now -- yes, why not -- unusual '
                   'combination -- holly and phoenix feather, eleven inches, '
                   'nice and s

Query: who is Norbert?
Answers:
[   {   'answer': 'Norwegian Ridgeback',
        'context': 'nd video tutorials at www.passuneb.com\n'
                   '\n'
                   'CHAPTER FOURTEEN\n'
                   'Norbert the Norwegian Ridgeback\n'
                   "Quirrell, however, must have been braver than they'd "
                   'thought.'},
    {   'answer': 'GRYFFINDOR',
        'context': 'When it finally shouted, "GRYFFINDOR," Neville ran off '
                   'still wearing it, and had to jog back amid gales of '
                   'laughter to give it to "MacDougal, Morag." '},
    {   'answer': 'Malfoy',
        'context': 'at the other two agreed with him. Anything to get rid of '
                   'Norbert -- and Malfoy. There was a hitch. By the next '
                   "morning, Ron's bitten hand had swollen "},
    {   'answer': "Hagrid hadn't been doing his gamekeeping duties because the "
                  'dragon was keeping him so busy. There were




I am very surprised by the answer to the last question! Even though it is not perfect, it shows that the system was able to associate the dragon's breed with its name.
The top answers to the first two questions are still incorrect, unfortunately, although "dark red" does appear as the second answer to the hair color question. I will try a generative model to see whether it brings any improvement.