# Haystack Question-Answering Framework

**March 2023 update by Denis Rothman**: The goal of introducing Haystack in *Transformers for NLP, 2nd Edition, Chapter 11, Let Your Data DO the Talking: Story, Questions and Answers*, was to *introduce the reader to several platforms beyond using only one set of tools*.

Haystack is an interesting platform to explore for Q&A, and more.

[01_Basic_QA_Pipeline.ipynb](https://github.com/Denis2054/Transformers-for-NLP-2nd-Edition/blob/main/Chapter11/01_Basic_QA_Pipeline.ipynb), a Haystack Q&A program, replaces this notebook, whose earlier installation steps cause issues on Google Colab.

_________________________________________________________________
Former Notebook resources

Notebook Author: [Malte Pietsch](https://www.linkedin.com/in/maltepietsch/)

[Deepset AI Haystack GitHub Repository](https://github.com/deepset-ai/haystack/)


In [1]:
# Install Haystack
!pip install farm-haystack==0.6.0

# Install specific versions of urllib3 and torch to avoid conflicts with preinstalled versions on Colab
!pip install urllib3==1.25.4
!pip install torch==1.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html


Collecting farm-haystack==0.6.0
  Downloading https://files.pythonhosted.org/packages/6d/c1/004081bfe50c20433718812321044b9d9dc7cf73bc5a63a2b335227bd21c/farm_haystack-0.6.0-py3-none-any.whl (104kB)
Collecting uvloop; sys_platform != "win32" and sys_platform != "cygwin"
  Downloading https://files.pythonhosted.org/packages/41/48/586225bbb02d3bdca475b17e4be5ce5b3f09da2d6979f359916c1592a687/uvloop-0.14.0-cp36-cp36m-manylinux2010_x86_64.whl (3.9MB)
Collecting elasticsearch<=7.10,>=7.7
  Downloading https://files.pythonhosted.org/packages/14/ba/f950bdd9164fb2bbbe5093700162234fbe61f446fe2300a8993761c132ca/elasticsearch-7.10.0-py2.py3-none-any.whl (321kB)
Collecting farm==0.5.0
  Downloading https://files.pythonhosted.org/packages/a3/e4/2f47c850732a1d729e74add867e967f058370f29a313da05

# Extractive QA in a closed domain (single text)

In [2]:
# Load a local model or any of the QA models on Hugging Face's model hub (https://huggingface.co/models)
from haystack.reader.farm import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True, no_ans_boost=0, return_no_answer=False)


# Create document which the model should scan for answers.
from haystack import Document

text = "The traffic began to slow down on Pioneer Boulevard in Los Angeles, making it difficult to get out of the city. However, WBGO was playing some cool jazz, and the weather was cool, making it rather pleasant to be making it out of the city on this Friday afternoon. Nat King Cole was singing as Jo and Maria slowly made their way out of LA and drove toward Barstow. They planned to get to Las Vegas early enough in the evening to have a nice dinner and go see a show."
doc = Document(text=text)

12/31/2020 16:02:14 - INFO - faiss -   Loading faiss with AVX2 support.
12/31/2020 16:02:14 - INFO - faiss -   Loading faiss.
12/31/2020 16:02:15 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
12/31/2020 16:02:15 - INFO - farm.infer -   Could not find `deepset/roberta-base-squad2` locally. Try to download from model hub ...
Some weights of RobertaModel were not initialized from the model checkpoint at deepset/roberta-base-squad2 and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
	 We guess it's an *ENGLISH* model ... 
	 If not: Init the language model by supplying the 'language' param.
12/31/2020 16:02:48 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
12/31/2020 16:02:48 - INFO - farm.infer -   Got ya 1 parallel workers to do inference ...
12/31/2020 16:02:48 - INFO - farm.infer -    0 
12/31/2020 16:02:48 - INFO - farm.infer -   /w\
12/31/2020 16:02:48 - INFO - farm.infer -   /'\
12/31/2020 16:02:48 - INFO - farm.infer -   


In [3]:
# Some questions that "work":
reader.predict(query="Where is Pioneer Boulevard located?", documents=[doc])

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 22.86 Batches/s]


{'answers': [{'answer': 'Los Angeles',
   'context': 'The traffic began to slow down on Pioneer Boulevard in Los Angeles, making it difficult to get out of the city. However, WBGO was playing some cool ja',
   'document_id': '4fa8dd28-9694-47cb-bc5a-19a74f357403',
   'offset_end': 66,
   'offset_end_in_doc': 66,
   'offset_start': 55,
   'offset_start_in_doc': 55,
   'probability': 0.8022719840448774,
   'score': 11.204442024230957}],
 'no_ans_gap': 10.05622935295105,
 'query': 'Where is Pioneer Boulevard located?'}
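
The `offset_start_in_doc` and `offset_end_in_doc` fields in the result above appear to be character positions in the original passage. The following minimal sketch checks that reading with plain string slicing. It needs no Haystack import: the text is only the first sentence of the passage, the dictionary copies three fields from the output above, and `span_from_offsets` is a hypothetical helper name, not part of Haystack.

```python
# First sentence of the passage and three answer fields, copied from the cell output above.
text = ("The traffic began to slow down on Pioneer Boulevard in Los Angeles, "
        "making it difficult to get out of the city.")
answer = {
    "answer": "Los Angeles",
    "offset_start_in_doc": 55,
    "offset_end_in_doc": 66,
}


def span_from_offsets(document_text, ans):
    """Slice the document with the answer's document-level character offsets (hypothetical helper)."""
    return document_text[ans["offset_start_in_doc"]:ans["offset_end_in_doc"]]


recovered = span_from_offsets(text, answer)
print(recovered)
```

In this first result the context window starts at the beginning of the document, so `offset_start` and `offset_start_in_doc` coincide. In the later results the two differ because `context` is only a window around the answer.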

In [4]:
reader.predict(query="Who drove to Las Vegas?", documents=[doc])


Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 34.52 Batches/s]


{'answers': [{'answer': 'Jo and Maria',
   'context': 't of the city on this Friday afternoon. Nat King Cole was singing as Jo and Maria slowly made their way out of LA and drove toward Barstow. They plann',
   'document_id': '4fa8dd28-9694-47cb-bc5a-19a74f357403',
   'offset_end': 81,
   'offset_end_in_doc': 305,
   'offset_start': 69,
   'offset_start_in_doc': 293,
   'probability': 0.8081116565023317,
   'score': 11.50229263305664}],
 'no_ans_gap': 3.7832298278808594,
 'query': 'Who drove to Las Vegas?'}

In [5]:
reader.predict(query="Who is singing?", documents=[doc])


Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 34.25 Batches/s]


{'answers': [{'answer': 'Nat King Cole',
   'context': 'r pleasant to be making it out of the city on this Friday afternoon. Nat King Cole was singing as Jo and Maria slowly made their way out of LA and dro',
   'document_id': '4fa8dd28-9694-47cb-bc5a-19a74f357403',
   'offset_end': 82,
   'offset_end_in_doc': 277,
   'offset_start': 69,
   'offset_start_in_doc': 264,
   'probability': 0.8818636635368704,
   'score': 16.081584930419922}],
 'no_ans_gap': 12.141630411148071,
 'query': 'Who is singing?'}

In [6]:
reader.predict(query="What is the plan for the night?", documents=[doc])


Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 32.32 Batches/s]


{'answers': [{'answer': 'They planned to get to Las Vegas early enough in the evening to have a nice dinner and go see a show',
   'context': 'de their way out of LA and drove toward Barstow. They planned to get to Las Vegas early enough in the evening to have a nice dinner and go see a show.',
   'document_id': '4fa8dd28-9694-47cb-bc5a-19a74f357403',
   'offset_end': 149,
   'offset_end_in_doc': 464,
   'offset_start': 49,
   'offset_start_in_doc': 364,
   'probability': 0.7315710454025786,
   'score': 8.020864486694336}],
 'no_ans_gap': 6.077347040176392,
 'query': 'What is the plan for the night?'}
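
The `probability` values in the four results above are consistent with a logistic squashing of the raw `score` divided by 8. The sketch below checks that relation against the printed numbers. The divisor of 8 and the helper name `score_to_probability` are assumptions inferred from these outputs, not something the notebook states.

```python
import math


def score_to_probability(score, scale=8.0):
    """Logistic function of score / scale; the scale of 8 is an assumption inferred from the outputs."""
    return 1.0 / (1.0 + math.exp(-score / scale))


# (score, probability) pairs copied from the four outputs above.
pairs = [
    (11.204442024230957, 0.8022719840448774),
    (11.50229263305664, 0.8081116565023317),
    (16.081584930419922, 0.8818636635368704),
    (8.020864486694336, 0.7315710454025786),
]

for score, reported in pairs:
    print(f"computed {score_to_probability(score):.4f}  reported {reported:.4f}")
```

Under this reading, a score of 0 maps to a probability of 0.5, so probabilities close to 0.5 signal a weak answer.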

In [7]:
# Some questions where the answer is not in the text (and the model therefore cannot find it)
# If you inspect the results, you will see that the value "no_ans_gap" is negative for all these questions and actually indicates that the likelihood of "no answer" is higher than the best textual answer
questions = ["Where is Los Angeles located?","Where is LA located?","Where is Barstow located?","Where is Las Vegas located ?"]
for q in questions:
  result = reader.predict(query=q, documents=[doc])
  print(result)
  print("\n")

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 32.07 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 31.49 Batches/s]


{'query': 'Where is Los Angeles located?', 'no_ans_gap': -0.41483497619628906, 'answers': [{'answer': 'Pioneer Boulevard in Los Angeles, making it difficult to get out of the city. However, WBGO was playing some cool jazz, and the weather was cool, making it rather pleasant to be making it out of the city on this Friday afternoon. Nat King Cole was singing as Jo and Maria slowly made their way out of LA and drove toward Barstow', 'score': 1.0702476501464844, 'probability': 0.5333954464343146, 'context': 'Pioneer Boulevard in Los Angeles, making it difficult to get out of the city. However, WBGO was playing some cool jazz, and the weather was cool, making it rather pleasant to be making it out of the city on this Friday afternoon. Nat King Cole was singing as Jo and Maria slowly made their way out of LA and drove toward Barstow', 'offset_start': 0, 'offset_end': 328, 'offset_start_in_doc': 34, 'offset_end_in_doc': 362, 'document_id': '4fa8dd28-9694-47cb-bc5a-19a74f357403'}]}


{'query':

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 30.91 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 32.63 Batches/s]

{'query': 'Where is Barstow located?', 'no_ans_gap': -1.593643844127655, 'answers': [{'answer': 'Las Vegas', 'score': 0.7261489033699036, 'probability': 0.522676586113031, 'context': 'de their way out of LA and drove toward Barstow. They planned to get to Las Vegas early enough in the evening to have a nice dinner and go see a show.', 'offset_start': 72, 'offset_end': 81, 'offset_start_in_doc': 387, 'offset_end_in_doc': 396, 'document_id': '4fa8dd28-9694-47cb-bc5a-19a74f357403'}]}


{'query': 'Where is Las Vegas located ?', 'no_ans_gap': -2.1370767652988434, 'answers': [{'answer': 'Los Angeles', 'score': -0.025329262018203735, 'probability': 0.49920846122316637, 'context': 'The traffic began to slow down on Pioneer Boulevard in Los Angeles, making it difficult to get out of the city. However, WBGO was playing some cool ja', 'offset_start': 55, 'offset_end': 66, 'offset_start_in_doc': 55, 'offset_end_in_doc': 66, 'document_id': '4fa8dd28-9694-47cb-bc5a-19a74f357403'}]}
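
As the comment in the cell above notes, a negative `no_ans_gap` means that "no answer" is more likely than the best textual answer. A minimal sketch of that check follows, using `no_ans_gap` values copied from the outputs in this notebook. The helper name `prefers_no_answer` is hypothetical, and the dictionaries are trimmed to the two fields the check needs.

```python
def prefers_no_answer(result):
    """True when no_ans_gap is negative, i.e. 'no answer' beat the best text span."""
    return result["no_ans_gap"] < 0


# Values copied from the outputs above: one answerable question, three unanswerable ones.
results = [
    {"query": "Where is Pioneer Boulevard located?", "no_ans_gap": 10.05622935295105},
    {"query": "Where is Los Angeles located?", "no_ans_gap": -0.41483497619628906},
    {"query": "Where is Barstow located?", "no_ans_gap": -1.593643844127655},
    {"query": "Where is Las Vegas located ?", "no_ans_gap": -2.1370767652988434},
]

flags = [prefers_no_answer(r) for r in results]
for r, flag in zip(results, flags):
    print(f"{r['query']!r}: {'no answer preferred' if flag else 'text answer preferred'}")
```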







In [8]:
# We can also use this "no answer" option directly and let the reader return "no answer" (shown as 'answer': None in the results) by setting return_no_answer=True in FARMReader:
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True, no_ans_boost=0, return_no_answer=True)
for q in questions:
  result = reader.predict(query=q, documents=[doc])
  print(result)
  print("\n")

12/31/2020 16:02:49 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
12/31/2020 16:02:49 - INFO - farm.infer -   Could not find `deepset/roberta-base-squad2` locally. Try to download from model hub ...
Some weights of RobertaModel were not initialized from the model checkpoint at deepset/roberta-base-squad2 and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
	 We guess it's an *ENGLISH* model ... 
	 If not: Init the language model by supplying the 'language' param.
12/31/2020 16:03:00 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
12/31/2020 16:03:00 - INFO - farm.infer -   Got ya 1 parallel workers to do inference ...
12/31/2020 16:03:00 - INFO - farm.infer -    0 
12/31/2020 16:03:00 - INFO - farm.infer 

{'query': 'Where is Los Angeles located?', 'no_ans_gap': -0.41483497619628906, 'answers': [{'answer': None, 'score': 1.4850826263427734, 'probability': 0.5462760172072342, 'context': None, 'offset_start': 0, 'offset_end': 0, 'document_id': None, 'meta': None}, {'answer': 'Pioneer Boulevard in Los Angeles, making it difficult to get out of the city. However, WBGO was playing some cool jazz, and the weather was cool, making it rather pleasant to be making it out of the city on this Friday afternoon. Nat King Cole was singing as Jo and Maria slowly made their way out of LA and drove toward Barstow', 'score': 1.0702476501464844, 'probability': 0.5333954464343146, 'context': 'Pioneer Boulevard in Los Angeles, making it difficult to get out of the city. However, WBGO was playing some cool jazz, and the weather was cool, making it rather pleasant to be making it out of the city on this Friday afternoon. Nat King Cole was singing as Jo and Maria slowly made their way out of LA and drove toward

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 33.63 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 32.21 Batches/s]

{'query': 'Where is Barstow located?', 'no_ans_gap': -1.593643844127655, 'answers': [{'answer': None, 'score': 2.3197927474975586, 'probability': 0.5719897905641838, 'context': None, 'offset_start': 0, 'offset_end': 0, 'document_id': None, 'meta': None}, {'answer': 'Las Vegas', 'score': 0.7261489033699036, 'probability': 0.522676586113031, 'context': 'de their way out of LA and drove toward Barstow. They planned to get to Las Vegas early enough in the evening to have a nice dinner and go see a show.', 'offset_start': 72, 'offset_end': 81, 'offset_start_in_doc': 387, 'offset_end_in_doc': 396, 'document_id': '4fa8dd28-9694-47cb-bc5a-19a74f357403'}]}


{'query': 'Where is Las Vegas located ?', 'no_ans_gap': -2.1370767652988434, 'answers': [{'answer': None, 'score': 2.1370767652988434, 'probability': 0.5663893175959525, 'context': None, 'offset_start': 0, 'offset_end': 0, 'document_id': None, 'meta': None}, {'answer': 'Los Angeles', 'score': -0.025329262018203735, 'probability': 0.49920846
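
With `return_no_answer=True`, the outputs above list the `None` entry first whenever it has the higher score. A caller can therefore treat a leading `None` as "the text does not contain the answer". The sketch below assumes the `answers` list is ordered best-first, as it appears in these outputs. `top_answer` is a hypothetical helper, and the dictionary is a trimmed copy of the Barstow result.

```python
def top_answer(result):
    """Return the first answer string, or None if 'no answer' is ranked first or the list is empty."""
    answers = result.get("answers", [])
    if not answers:
        return None
    return answers[0]["answer"]


# Trimmed copy of the "Where is Barstow located?" result shown above.
barstow = {
    "query": "Where is Barstow located?",
    "no_ans_gap": -1.593643844127655,
    "answers": [
        {"answer": None, "score": 2.3197927474975586, "probability": 0.5719897905641838},
        {"answer": "Las Vegas", "score": 0.7261489033699036, "probability": 0.522676586113031},
    ],
}

top = top_answer(barstow)
print("no answer" if top is None else top)
```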


