# Answering questions from a document corpus in an abstractive manner

In the previous recipe, we learned how to build a QA system over a document corpus. The answers it retrieved were extractive: each answer snippet was a piece of text copied verbatim from the source document. It is also possible to generate abstractive answers, which are phrased in new words and are often more readable for end users than extractive ones.

## Getting ready

For this recipe, we will build a QA system that will provide answers that are abstractive in nature. We will load the bilgeyucel/seven-wonders dataset from the Hugging Face site and initialize a retriever from it. This dataset has content about the seven wonders of the ancient world. To generate the answers, we will use the PromptNode component from the Haystack framework to set up a pipeline that can generate answers in an abstractive fashion.

### Imports

In [19]:
#%pip install farm-haystack

In [4]:
from datasets import load_dataset
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import(
    BM25Retriever, PromptNode,
    PromptTemplate, AnswerParser
)
from haystack.pipelines import Pipeline

In this step, we load the bilgeyucel/seven-wonders dataset into an in-memory document store. The dataset was created from the Wikipedia pages on the Seven Wonders of the Ancient World (https://en.wikipedia.org/wiki/Wonders_of_the_World). It has been preprocessed and uploaded to the Hugging Face site, and can be downloaded with the datasets module from Hugging Face. We use InMemoryDocumentStore as our document store, with BM25 as the search algorithm, and write the documents from the dataset into it. The write_documents method automatically optimizes how the documents are written so that queries stay fast. Once the documents are written, we initialize a BM25-based retriever, as in the previous recipe:

In [6]:
dataset = load_dataset("bilgeyucel/seven-wonders",
                       split = "train")
document_store = InMemoryDocumentStore(use_bm25 = True)
document_store.write_documents(dataset)
retriever = BM25Retriever(document_store = document_store)

Updating BM25 representation...: 100%|██████████| 151/151 [00:00<00:00, 11824.43 docs/s]
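
To make the ranking step less of a black box, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not Haystack's implementation: the function name bm25_scores, the whitespace tokenizer, the parameter values k1 = 1.5 and b = 0.75, and the three toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency is damped and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "The Great Pyramid of Giza is the oldest of the seven wonders",
    "The Colossus of Rhodes was a statue of the sun god Helios",
    "The Hanging Gardens of Babylon were described as a feat of engineering",
]
scores = bm25_scores("pyramid giza", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Documents that share no term with the query score zero, so only the first toy document is a candidate here. A retriever built on this idea would return the top-scoring documents rather than a single one.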


In this step, we initialize a prompt template. The prompt argument describes the task we want the model to perform as a plain-English instruction. The prompt contains two inline placeholders, documents and query, whose values are expected to be present in the execution context at runtime. The second argument, output_parser, takes an AnswerParser object, which instructs the PromptNode object to store the results in the answers element.

After defining the prompt, we initialize a PromptNode object with a model and the prompt template. We use the google/flan-t5-large model as the answer generator. This model is based on the Google T5 language model and has been instruction fine-tuned (FLAN stands for Finetuned Language Net). Fine-tuning a language model on an instruction dataset allows it to follow simple instructions and generate text from the given context and instruction. Part of this model's training used human-written instructions as tasks, which lets the model perform different downstream tasks from instructions alone and reduces the need for few-shot examples.

In [16]:
from haystack.nodes.prompt.invocation_layer.hugging_face import HFLocalInvocationLayer
from haystack.nodes import PromptNode, PromptModel

rag_prompt = PromptTemplate(
    prompt = """Synthesize a comprehensive answer from the following text for the given question.
    Provide a clear and concise response that summarizes the key points and information presented in the text.
    Your answer should be in your own words and be no longer than 50 words.
    \n\n Related text:
    {join(documents)} \n\n Question: {query}
    \n\n Answer:""",
    output_parser = AnswerParser(),

)

# Create a PromptModel
prompt_model = PromptModel(
    model_name_or_path = "google/flan-t5-large",
    invocation_layer_class = HFLocalInvocationLayer,
    model_kwargs = {
        "task_name": "text2text-generation",
        "device": 0,  # 0 selects the first GPU; -1 selects the CPU
    }
)

# Pass the model into PromptNode

prompt_node = PromptNode(
    model_name_or_path = prompt_model,
    default_prompt_template = rag_prompt
)

Device set to use cpu
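
The way the template placeholders are filled at runtime can be pictured with a small sketch. This is only an illustration of the idea, not Haystack's template engine: the helper render_prompt is hypothetical, the shortened template and the sample documents are made up, and joining the documents with a single space is an assumption.

```python
def render_prompt(template, documents, query):
    """Fill a template by joining the document texts and inserting the query."""
    # Assumption: documents are concatenated with a single space.
    joined = " ".join(documents)
    return template.replace("{join(documents)}", joined).replace("{query}", query)


# A shortened version of the recipe's prompt, for illustration only.
template = (
    "Synthesize a comprehensive answer from the following text for the given question.\n\n"
    "Related text: {join(documents)}\n\n"
    "Question: {query}\n\n"
    "Answer:"
)

prompt = render_prompt(
    template,
    documents=["The Great Pyramid of Giza is in Egypt.", "It was built for Khufu."],
    query="Where is the Great Pyramid?",
)
print(prompt)
```

The rendered string is what the generator model receives as input, which is why the number and length of retrieved documents directly affect the prompt length.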


We now create a pipeline and add the retriever and prompt_node components that we initialized in the previous steps. The retriever operates on the query supplied by the user and returns a set of matching documents. These documents are passed to the prompt node, which uses the configured flan-t5-large model to generate the answer:

In [17]:
pipe = Pipeline()
pipe.add_node(
    component = retriever,
    name = "retriever",
    inputs = ["Query"]
)
pipe.add_node(
    component = prompt_node,
    name = "prompt_node",
    inputs = ["retriever"]
)
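
The data flow through the two nodes can be sketched without any framework. The classes ToyRetriever and ToyGenerator and the function run_chain below are hypothetical stand-ins, not Haystack classes; the "generator" simply echoes the first retrieved document so that the example stays runnable without a model.

```python
class ToyRetriever:
    """Hypothetical retriever: keeps documents that share a word with the query."""

    def __init__(self, docs):
        self.docs = docs

    def run(self, query):
        words = set(query.lower().split())
        hits = [d for d in self.docs if words & set(d.lower().split())]
        return {"query": query, "documents": hits}


class ToyGenerator:
    """Hypothetical generator: echoes the first document instead of calling a model."""

    def run(self, query, documents):
        answer = documents[0] if documents else "No relevant text found."
        return {"query": query, "answers": [answer]}


def run_chain(nodes, query):
    """Pass each node's output dictionary as keyword arguments to the next node."""
    state = {"query": query}
    for node in nodes:
        state = node.run(**state)
    return state


docs = [
    "the lighthouse of alexandria stood on pharos",
    "the colossus stood in rhodes",
]
output = run_chain([ToyRetriever(docs), ToyGenerator()], "lighthouse alexandria")
print(output["answers"][0])
```

Each node only needs to agree with its neighbour on the keys of the dictionary it passes along, which is what makes it possible to swap one component for another.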

Once the pipeline is set up, we use it to answer questions about the content of the dataset:

In [18]:
output = pipe.run(query = "What is the great Pyramid of Giza?")
print(output["answers"][0].answer)

output = pipe.run(query = "Where are the hanging gardens?")
print(output["answers"][0].answer)

Token indices sequence length is longer than the specified maximum sequence length for this model (2916 > 512). Running this sequence through the model will result in indexing errors
Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


The Great Pyramid of Giza is the oldest of the Seven Wonders of the Ancient World.


The Hanging Gardens are the only one of the Seven Wonders for which the location has not been definitively established.
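
The first warning in the output above reports that the rendered prompt (2916 tokens) is longer than the model's maximum sequence length (512). One possible way to stay within such a limit is to keep only as many top-ranked documents as fit a budget. The sketch below is an assumption-laden illustration: fit_to_budget is a hypothetical helper, and counting whitespace-separated words is only a rough proxy for the model's own tokenizer, which would give different counts.

```python
def fit_to_budget(documents, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep documents in ranked order until the next one would exceed max_tokens."""
    kept, used = [], 0
    for doc in documents:
        cost = count_tokens(doc)
        if used + cost > max_tokens:
            break  # Stop at the first document that does not fit.
        kept.append(doc)
        used += cost
    return kept


# Toy ranked documents; the word counts are 4, 3, and 5.
ranked = ["one two three four", "five six seven", "eight nine ten eleven twelve"]
print(fit_to_budget(ranked, max_tokens=8))
```

In practice, the budget would also need to leave room for the instruction text and the question, so it should be set below the model's limit.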
