# Question Generation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial13_Question_generation.ipynb)

This is a bare-bones tutorial showing what is possible with the QuestionGenerator node and its pipelines, which automatically
generate questions that the question generation model thinks a given document can answer.

### Prepare environment

#### Colab: Enable the GPU runtime
Make sure you enable the GPU runtime to experience decent speed in this tutorial.  
**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/_src/img/colab_gpu_runtime.jpg">

In [None]:
# Install needed libraries

!pip install grpcio-tools==1.34.1
!pip install git+https://github.com/deepset-ai/haystack.git

# If you run this notebook on Google Colab, you might need to
# restart the runtime after installing haystack.

In [3]:
# Imports needed to run this notebook

from pprint import pprint
from tqdm import tqdm
from haystack.nodes import QuestionGenerator, ElasticsearchRetriever, FARMReader
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.pipelines import QuestionGenerationPipeline, RetrieverQuestionGenerationPipeline, QuestionAnswerGenerationPipeline
from haystack.utils import launch_es

Let's start an Elasticsearch instance with one of the options below:

In [4]:
# Option 1: Start Elasticsearch service via Docker
launch_es()



In [None]:
# Option 2: In Colab / No Docker environments: Start Elasticsearch from source
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT
es_server = Popen(['elasticsearch-7.9.2/bin/elasticsearch'],
                   stdout=PIPE, stderr=STDOUT,
                   preexec_fn=lambda: os.setuid(1)  # as daemon
                  )
# wait until ES has started
! sleep 30
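
A fixed `sleep 30` can be too short on a slow machine and wastefully long on a fast one. As a hedged alternative, the sketch below polls an HTTP endpoint until it responds. The helper name `wait_for_service`, its parameters, and the default URL `http://localhost:9200` (the usual Elasticsearch port) are assumptions made for illustration and are not part of Haystack.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url="http://localhost:9200", timeout=60.0, interval=2.0, probe=None):
    """Poll until `probe(url)` returns True or `timeout` seconds have elapsed.

    Hypothetical helper, not a Haystack API. `probe` can be swapped out,
    which also makes the function easy to test without a running server.
    """

    def http_ok(target):
        # Treat any connection problem as "not ready yet".
        try:
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status == 200
        except (urllib.error.URLError, OSError):
            return False

    probe = probe or http_ok
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_service()` after starting Elasticsearch returns `True` once the server answers, or `False` if it is still unreachable after the timeout.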

Let's initialize some core components:

In [5]:
text1 = "Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace."
text2 = "Princess Arya Stark is the third child and second daughter of Lord Eddard Stark and his wife, Lady Catelyn Stark. She is the sister of the incumbent Westerosi monarchs, Sansa, Queen in the North, and Brandon, King of the Andals and the First Men. After narrowly escaping the persecution of House Stark by House Lannister, Arya is trained as a Faceless Man at the House of Black and White in Braavos, using her abilities to avenge her family. Upon her return to Westeros, she exacts retribution for the Red Wedding by exterminating the Frey male line."
text3 = "Dry Cleaning are an English post-punk band who formed in South London in 2018.[3] The band is composed of vocalist Florence Shaw, guitarist Tom Dowse, bassist Lewis Maynard and drummer Nick Buxton. They are noted for their use of spoken word primarily in lieu of sung vocals, as well as their unconventional lyrics. Their musical stylings have been compared to Wire, Magazine and Joy Division.[4] The band released their debut single, 'Magic of Meghan' in 2019. Shaw wrote the song after going through a break-up and moving out of her former partner's apartment the same day that Meghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] In February 2021, the band shared details of their debut studio album, New Long Leg. They also shared the single 'Strong Feelings'.[9] The album, which was produced by John Parish, was released on 2 April 2021.[10]"

docs = [{"content": text1},
        {"content": text2},
        {"content": text3}]

# Initialize document store and write in the documents
document_store = ElasticsearchDocumentStore()
document_store.write_documents(docs)

# Initialize Question Generator
question_generator = QuestionGenerator()

## Question Generation Pipeline

The most basic version of a question generator pipeline takes a document as input and outputs generated questions
which the document can answer.

In [7]:
question_generation_pipeline = QuestionGenerationPipeline(question_generator)
for idx, document in enumerate(document_store):
    print(f"\n * Generating questions for document {idx}: {document.content[:50]}...")
    result = question_generation_pipeline.run(documents=[document])

    print("Generated questions:")
    # Use a separate loop variable so that `result` is not overwritten
    for generated in result["generated_questions"]:
        for question in generated["questions"]:
            print(f" - {question}")


 * Generating questions for document 0: Python is an interpreted, high-level, general-purp...
Generated questions:
 -  Who created Python?
 -  When was Python first released?
 -  What is Python's design philosophy?

 * Generating questions for document 1: Princess Arya Stark is the third child and second ...
Generated questions:
 -  Who is the third child and second daughter of Lord Eddard Stark and his wife, Lady Catelyn Stark?
 -  Princess Arya Stark is the sister of what Westerosi monarchs?
 -  What is Sansa, Queen in the North, and Brandon, King of the Andals?
 -  What is Arya trained as?
 -  Where is the House of Black and White located?
 -  What is the name of the first men?
 -  What is the name of the line that Frey exterminates?
 -  Where does the Red Wedding take place?

 * Generating questions for document 2: Dry Cleaning are an English post-punk band who for...
Generated questions:
 -  What is the name of the English post-punk band that formed in South London in 2018?
 -  W
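
The loop above relies on the shape of the pipeline result: a `generated_questions` list whose entries each hold a `questions` list. As a minimal sketch, the helper below flattens such a result into a single de-duplicated list. The name `collect_questions` and the hand-written `example_result` are illustrative assumptions; only the two keys used in the tutorial code are read, and any other keys in a real result are ignored.

```python
from typing import Any, Dict, List


def collect_questions(result: Dict[str, Any]) -> List[str]:
    """Flatten a question generation result into a list of unique questions."""
    questions: List[str] = []
    for generated in result.get("generated_questions", []):
        for question in generated.get("questions", []):
            cleaned = question.strip()  # generated questions start with a space
            if cleaned and cleaned not in questions:
                questions.append(cleaned)
    return questions


# Hand-written stand-in for a pipeline result, not real model output.
example_result = {
    "generated_questions": [
        {
            "questions": [
                " Who created Python?",
                " When was Python first released?",
                " Who created Python?",
            ]
        }
    ]
}

print(collect_questions(example_result))
# ['Who created Python?', 'When was Python first released?']
```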

## Retriever Question Generation Pipeline

This pipeline takes a query as input. It retrieves relevant documents and then generates questions based on these.

In [8]:
retriever = ElasticsearchRetriever(document_store=document_store)
rqg_pipeline = RetrieverQuestionGenerationPipeline(retriever, question_generator)

print(f"\n * Generating questions for documents matching the query 'Arya Stark'")
result = rqg_pipeline.run(query="Arya Stark")

print("Generated questions:")
# Use a separate loop variable so that `result` is not overwritten
for generated in result["generated_questions"]:
    for question in generated["questions"]:
        print(f" - {question}")


 * Generating questions for documents matching the query 'Arya Stark'
Generated questions:
 -  Who is the third child and second daughter of Lord Eddard Stark and his wife, Lady Catelyn Stark?
 -  Princess Arya Stark is the sister of what Westerosi monarchs?
 -  What is Sansa, Queen in the North, and Brandon, King of the Andals?
 -  What is Arya trained as?
 -  Where is the House of Black and White located?
 -  What is the name of the first men?
 -  What is the name of the line that Frey exterminates?
 -  Where does the Red Wedding take place?
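
To make the retrieve-then-generate flow concrete, here is a toy sketch in plain Python. It is not how Haystack or Elasticsearch implement retrieval: `keyword_retrieve` is a hypothetical stand-in that ranks documents by simple word overlap, and `generate` is any callable you pass in that maps a document to a list of questions.

```python
from typing import Callable, Dict, List


def keyword_retrieve(query: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Rank documents by how many query words they contain (toy retriever)."""
    terms = query.lower().split()
    scored = [(sum(term in doc.lower() for term in terms), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop documents that match none of the query words.
    return [doc for score, doc in ranked[:top_k] if score > 0]


def retrieve_then_generate(
    query: str,
    documents: List[str],
    generate: Callable[[str], List[str]],
    top_k: int = 1,
) -> Dict[str, List[str]]:
    """Retrieve the best-matching documents, then generate questions for each."""
    return {doc: generate(doc) for doc in keyword_retrieve(query, documents, top_k)}
```

The design mirrors the pipeline in this section: the query only decides which documents are selected, and question generation then runs on those documents independently of the query.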


## Question Answer Generation Pipeline

This pipeline takes a document as input, generates questions on it, and attempts to answer these questions using
a Reader model.

In [6]:
reader = FARMReader("deepset/roberta-base-squad2")
qag_pipeline = QuestionAnswerGenerationPipeline(question_generator, reader)
for idx, document in enumerate(tqdm(document_store)):

    print(f"\n * Generating questions and answers for document {idx}: {document.content[:20]}...")
    result = qag_pipeline.run(documents=[document])

    for pair in result["results"]:
        print(f" - Q:{pair['query']}")
        for answer in pair["answers"]:
            print(f"      A: {answer.answer}")



 * Generating questions and answers for document 0: Python is an interpr...



 - Q: Who created Python?
      A: Guido van Rossum
 - Q: When was Python first released?
      A: 1991
 - Q: What is Python's design philosophy?
      A: emphasizes code readability

 * Generating questions and answers for document 1: Princess Arya Stark ...



 - Q: Who is the third child and second daughter of Lord Eddard Stark and his wife, Lady Catelyn Stark?
      A: Princess Arya Stark
 - Q: Princess Arya Stark is the sister of what Westerosi monarchs?
      A: Sansa, Queen in the North, and Brandon, King of the Andals and the First Men
 - Q: What is Sansa, Queen in the North, and Brandon, King of the Andals?
      A: sister
 - Q: What is Arya trained as?
      A: Faceless Man
 - Q: Where is the House of Black and White located?
      A: Braavos
 - Q: What is the name of the first men?
      A: Brandon
 - Q: What is the name of the line that Frey exterminates?
      A: Frey male line
 - Q: Where does the Red Wedding take place?
      A: Westeros

 * Generating questions and answers for document 2: Dry Cleaning are an ...



 - Q: What is the name of the English post-punk band that formed in South London in 2018?
      A: Dry Cleaning
      A: Boundary Road Snacks and Drinks
 - Q: Who is the vocalist of Dry Cleaning?
      A: Florence Shaw
      A: ghan Markle and Prince Harry announced they were engaged.[5] This was followed by the release of two EPs that year: Sweet Princess in August and Boundary Road Snacks and Drinks in October. The band were included as part of the NME 100 of 2020,[6] as well as DIY magazine's Class of 2020.[7] The band signed to 4AD in late 2020 and shared a new single, 'Scratchcard Lanyard'.[8] In February 2021, the band shared details of their debut studio album, New Long Leg. They also shared the single 'Strong Feelings'.[9] The album, which was produced by John Parish
 - Q: Where did Dry Cleaning form?
      A: South London
      A: Boundary Road
 - Q: What does the band use instead of sung vocals?
      A: spoken word
      A: 2020,[6] as well as DIY magazine's Class of 2020.[7
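
As the output for the last document shows, lower-ranked answers can be noisy. One possible post-processing step, sketched below, keeps only the highest-scoring answer per question. It assumes each answer object exposes `answer` and `score` attributes; the `SimpleNamespace` objects and their scores are made-up stand-ins for illustration, not real reader output, and `best_answers` is a hypothetical helper rather than a Haystack function.

```python
from types import SimpleNamespace


def best_answers(results, min_score=0.0):
    """Return (question, answer) tuples using the top-scoring non-empty answer."""
    pairs = []
    for pair in results:
        candidates = [a for a in pair["answers"] if a.answer]
        if not candidates:
            continue  # the reader found no answer for this question
        top = max(candidates, key=lambda a: a.score)
        if top.score >= min_score:
            pairs.append((pair["query"].strip(), top.answer))
    return pairs


# Mock results with invented scores, shaped like result["results"] above.
mock_results = [
    {
        "query": " Where did Dry Cleaning form?",
        "answers": [
            SimpleNamespace(answer="South London", score=0.9),
            SimpleNamespace(answer="Boundary Road", score=0.1),
        ],
    },
    {"query": " Unanswerable question?", "answers": [SimpleNamespace(answer="", score=0.5)]},
]

print(best_answers(mock_results))
# [('Where did Dry Cleaning form?', 'South London')]
```

Raising `min_score` additionally drops pairs whose best answer the reader is not confident about.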




## About us

This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany.

We bring NLP to the industry via open source!
Our focus: Industry-specific language models & large-scale QA systems.

Some of our other work:
- [German BERT](https://deepset.ai/german-bert)
- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)
- [FARM](https://github.com/deepset-ai/FARM)

Get in touch:
[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Slack](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)

By the way: [we're hiring!](https://www.deepset.ai/jobs)