# Question Answering

In [1]:
# Install the latest main of Haystack
!pip install --upgrade pip
!pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]


In [2]:
import logging

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)

In [3]:
# Imports needed to run this notebook
from pprint import pprint
from tqdm import tqdm
from haystack.nodes import QuestionGenerator, BM25Retriever, FARMReader
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.pipelines import (
    QuestionGenerationPipeline,
    RetrieverQuestionGenerationPipeline,
    QuestionAnswerGenerationPipeline,
)
from haystack.utils import launch_es, print_questions

In [4]:
# In Colab / no-Docker environments: download and unpack the Elasticsearch 7.9.2 archive
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2


In [5]:
import os
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(
    ["elasticsearch-7.9.2/bin/elasticsearch"], stdout=PIPE, stderr=STDOUT, preexec_fn=lambda: os.setuid(1)  # as daemon
)
# wait until ES has started
! sleep 30
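
The fixed 30-second sleep above can be too short on a slow machine and wasteful on a fast one. A minimal sketch of a polling alternative follows, using only the standard library. The helper name `wait_for_http` and its parameters are hypothetical and not part of Haystack. The URL in the usage comment assumes Elasticsearch is listening on its default port 9200.

```python
import http.client
import time
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse.

    Hypothetical helper for illustration; returns True once the server
    responds with status 200, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError and connection errors are all OSError subclasses:
            # the server is not ready yet, so fall through and retry.
            pass
        time.sleep(interval)
    return False


# Usage (assumes the default Elasticsearch address):
# if not wait_for_http("http://localhost:9200", timeout=120):
#     raise RuntimeError("Elasticsearch did not start in time")
```

Polling returns as soon as the server is reachable and fails loudly if it never comes up, instead of letting a later cell fail with a connection error.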

In [6]:
text1 = "machine learning ml is a field of inquiry devoted to understanding and building methods that learn, that is, methods that leverage data to improve performance on some set of tasks. it is seen as a part of artificial intelligence."
text2 = "the study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning. some implementations of machine learning use data and neural networks in a way that mimics the working of a biological brain.  in its application across business problems, machine learning is also referred to as predictive analytics.overviewlearning algorithms work on the basis that strategies, algorithms, and inferences that worked well in the past are likely to continue working well in the future. these inferences can be obvious, such as since the sun rose every morning for the last , days, it will probably rise tomorrow morning as well. they can be nuanced, such as x of families have geographically separate species with color variants, so there is a y chance that undiscovered black swans exist."
text3 = "machine learning programs can perform tasks without being explicitly programmed to do so. it involves computers learning from data provided so that they carry out certain tasks. for simple tasks assigned to computers, it is possible to program algorithms telling the machine how to execute all steps required to solve the problem at hand on the computers part, no learning is needed. for more advanced tasks, it can be challenging for a human to manually create the needed algorithms. in practice, it can turn out to be more effective to help the machine develop its own algorithm, rather than having human programmers specify every needed step.the discipline of machine learning employs various approaches to teach computers to accomplish tasks where no fully satisfactory algorithm is available. in cases where vast numbers of potential answers exist, one approach is to label some of the correct answers as valid. this can then be used as training data for the computer to improve the algorithms it uses to determine correct answers. for example, to train a system for the task of digital character recognition, the mnist dataset of handwritten digits has often been used.history and relationships to other fieldsthe term machine learning was coined in  by arthur samuel, an ibm employee and pioneer in the field of computer gaming and artificial intelligence. also the synonym selfteaching computers were used in this time period.by the early s an experimental learning machine with punched tape memory, called cybertron, had been developed by raytheon company to analyze sonar signals, electrocardiograms and speech patterns using rudimentary reinforcement learning. it was repetitively trained by a human operatorteacher to recognize patterns and equipped with a goof button to cause it to reevaluate incorrect decisions."
text4 = "a representative book on research into machine learning during the s was nilssons book on learning machines, dealing mostly with machine learning for pattern classification. interest related to pattern recognition continued into the s, as described by duda and hart in . in  a report was given on using teaching strategies so that a neural network learns to recognize  characters  letters,  digits, and  special symbols from a computer terminal.tom m. mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field a computer program is said to learn from experience e with respect to some class of tasks t and performance measure p if its performance at tasks in t, as measured by p,  improves with experience e. this definition of the tasks in which machine learning is concerned offers a fundamentally operational definition rather than defining the field in cognitive terms. this follows alan turings proposal in his paper computing machinery and intelligence, in which the question can machines think is replaced with the question can machines do what we as thinking entities can do."
text5 = "modern day machine learning has two objectives, one is to classify data based on models which have been developed, the other purpose is to make predictions for future outcomes based on these models."
docs = [{"content": text1}, {"content": text2}, {"content": text3}, {"content": text4}, {"content": text5}]

# Initialize document store and write in the documents
document_store = ElasticsearchDocumentStore()
document_store.write_documents(docs)

# Initialize Question Generator
question_generator = QuestionGenerator()


In [7]:
retriever = BM25Retriever(document_store=document_store)
rqg_pipeline = RetrieverQuestionGenerationPipeline(retriever, question_generator)

print("\n * Generating questions for documents matching the query 'Machine Learning'\n")
result = rqg_pipeline.run(query="Machine Learning")
print_questions(result)


 * Generating questions for documents matching the query 'Machine Learning'


Generated questions:
 - What was a representative book on research into machine learning during the s?
 - What did nilssons book on learning machines deal mostly with?
 - Interest related to pattern recognition continued into what decade?
 - How did a neural network learn to recognize characters letters, digits, and special symbols from a computer terminal?
 - Tom m. mitchell provided a more formal definition of what?
 - What is the formal definition of the algorithms studied in the machine learning field?
 - A computer program is said to learn from experience e with respect to some class of tasks?
 - What does t, as measured by p, improve with experience?
 - What offers a fundamental operational definition of the tasks in which machine learning is concerned?
 - What is replaced by the question can machines think?
 - What type of programs can perform tasks without being explicitly programmed to do so?
 - Wha
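
`print_questions` only prints. To reuse the generated questions, for example to review or deduplicate them, they can be collected into a plain list. The sketch below assumes the pipeline result is a dict with a `"generated_questions"` key holding one entry per document, each with a `"questions"` list. That shape is an assumption to check against your Haystack version, and `collect_questions` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def collect_questions(result: Dict[str, Any]) -> List[str]:
    """Flatten generated questions into one list, dropping exact duplicates.

    Assumes result["generated_questions"] is a list of dicts that each
    carry a "questions" list of strings (assumed shape, not verified here).
    """
    seen = set()
    questions: List[str] = []
    for entry in result.get("generated_questions", []):
        for question in entry.get("questions", []):
            if question not in seen:  # keep the first occurrence, preserve order
                seen.add(question)
                questions.append(question)
    return questions


# Usage with the result from the cell above (shape is an assumption):
# unique_questions = collect_questions(result)
```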

In [8]:
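# Load an extractive QA reader and pair it with the question generator,
# then generate question-answer pairs for every document in the store
reader = FARMReader("deepset/roberta-base-squad2")
qag_pipeline = QuestionAnswerGenerationPipeline(question_generator, reader)
for idx, document in enumerate(tqdm(document_store)):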
    print(f"\n * Generating questions and answers for document {idx}: {document.content[:100]}...\n")
    result = qag_pipeline.run(documents=[document])
    print_questions(result)


 * Generating questions and answers for document 0: machine learning ml is a field of inquiry devoted to understanding and building methods that learn, ...






Generated pairs:
 - Q: Machine learning ml is a field of inquiry devoted to understanding and building methods that learn.
      A: artificial intelligence
 - Q: Machine learning is seen as a part of what?
      A: artificial intelligence

 * Generating questions and answers for document 1: the study of mathematical optimization delivers methods, theory and application domains to the field...






Generated pairs:
 - Q: The study of mathematical optimization delivers methods, theory and application domains to what field?
      A: machine learning
 - Q: Data mining is a related field of study, focusing on exploratory data analysis through what?
      A: unsupervised learning
 - Q: Some implementations of machine learning use data and what else?
      A: neural networks
 - Q: Machine learning is also referred to as what?
      A: predictive analytics
 - Q: Overviewlearning algorithms work on the basis that what works well in the past is likely to continue?
      A: strategies, algorithms, and inferences
 - Q: What are the inferences that the sun rose every morning for the last, days?
      A: obvious
 - Q: What is an example of a nuanced inference?
      A: x of families have geographically separate species with color variants
 - Q: What is the chance that undiscovered black swans exist?
      A: y
 - Q: What can be nuanced?
      A: x of families have geographically separate spe




Generated pairs:
 - Q: What type of programs can perform tasks without being explicitly programmed to do so?
      A: machine learning
      A: artificial intelligence
 - Q: What involves computers learning from data provided so that they carry out certain tasks?
      A: machine learning
      A: training
 - Q: No learning is needed for a machine to execute all steps required to solve a problem at hand on what part?
      A: computers
      A: algorithm is available
 - Q: For more advanced tasks, it can be challenging for what person to manually create the needed algorithms?
      A: human
      A: human operatorteacher
 - Q: What can be more effective than having human programmers specify every needed step?
      A: help the machine develop its own algorithm
      A: a goof button
 - Q: What discipline uses various approaches to teach computers to accomplish tasks where no fully satisfactory algorithm is available?
      A: machine learning
      A: machine learning
 - Q: What can b




Generated pairs:
 - Q: What was a representative book on research into machine learning during the s?
      A: nilssons book on learning machines
 - Q: What did nilssons book on learning machines deal mostly with?
      A: machine learning for pattern classification
 - Q: Interest related to pattern recognition continued into what decade?
      A: the s
 - Q: How did a neural network learn to recognize characters letters, digits, and special symbols from a computer terminal?
      A: using teaching strategies
 - Q: Tom m. mitchell provided a more formal definition of what?
      A: the algorithms studied in the machine learning field
 - Q: What is the formal definition of the algorithms studied in the machine learning field?
      A: a computer program is said to learn from experience e
 - Q: A computer program is said to learn from experience e with respect to some class of tasks?
      A: t
 - Q: What does t, as measured by p, improve with experience?
      A: e
 - Q: What offers a 




Generated pairs:
 - Q: How many objectives does modern day machine learning have?
      A: two
 - Q: What is one of the objectives of modern day Machine Learning?
      A: to classify data based on models which have been developed
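
The printed pairs above can also be turned into plain Python data, for instance to build a small QA dataset. The sketch below assumes the result of `qag_pipeline.run` contains parallel `"queries"` and `"answers"` lists, where each item in `"answers"` is a list of objects exposing an `.answer` string. That shape is an assumption, and `pair_questions_with_answers` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def pair_questions_with_answers(result: Dict[str, Any]) -> List[Tuple[str, List[str]]]:
    """Return (question, [answer texts]) tuples from a pipeline result.

    Assumes result["queries"] and result["answers"] are parallel lists and
    that each answer object has an `.answer` attribute (assumed shape).
    Empty or missing answer texts are skipped.
    """
    pairs: List[Tuple[str, List[str]]] = []
    for query, answers in zip(result.get("queries", []), result.get("answers", [])):
        texts = [a.answer for a in answers if getattr(a, "answer", None)]
        pairs.append((query, texts))
    return pairs


# Usage inside the loop above (shape is an assumption):
# for question, answers in pair_questions_with_answers(result):
#     print(question, "->", answers)
```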



