# Natural Language Understanding 

In this chapter, we will explore recipes that allow us to interpret and understand the text contained in short as well as long passages. Natural Language Understanding (NLU) is a very broad term, and the various systems developed as part of NLU do not interpret or understand a passage of text the same way a human reader would. However, depending on the specificity of the task, we can create applications that can be combined to generate an interpretation or understanding that solves a given text-processing problem. As part of this chapter, we will build recipes for the following tasks.

* Answer questions from a short text passage
* Answer questions from a long text passage
* Answer questions from a document corpus in an extractive manner
* Answer questions from a document corpus in an abstractive manner
* Summarize text using pre-trained models based on Transformers
* Detect sentence entailment

# <h1><center>Question Answering</center></h1>

To get started with question answering, we will start with a simple recipe which can answer a question from a short passage.

## Getting Ready

As part of this chapter, we will use the libraries from the HuggingFace site (huggingface.co). For this recipe, we will use the `BertForQuestionAnswering` and `BertTokenizer` modules from the transformers package. We load `BertForQuestionAnswering` with a checkpoint of the BERT large uncased model that has been fine-tuned on the SQuAD dataset for the question answering task. This pre-trained model can be given a text passage and used to answer questions based on the contents of the passage.

`pip install transformers`

`pip install datasets`

## How to do it

In this recipe, we will load a pretrained model that has been trained on the SQuAD dataset (https://huggingface.co/datasets/squad). We will initialize a context passage and use the model to answer a couple of questions based on the passage.

1. Do the necessary imports

In [None]:
from transformers import pipeline, BertForQuestionAnswering, BertTokenizer
import torch

2. Initialize the model and tokenizer.

In [None]:
# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
qa_model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad', device_map=device)
# The tokenizer holds no weights on a device, so it takes no device argument
qa_tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

3. Initialize a pipeline

In [None]:
question_answer_pipeline = pipeline("question-answering", model=qa_model, tokenizer=qa_tokenizer)

4. Initialize a text passage. This passage will be used as the context to answer questions.

In [None]:
context = "The cat had no business entering the neighbors garage, but she was there to help. The neighbor, who asked not to be identified, said she didn't know what to make of the cat's behavior. She said it seemed like it was trying to get into her home, and that she was afraid for her life. The neighbor said that when she went to check on her cat, it ran into the neighbor's garage and hit her in the face, knocking her to the ground."

5. Initialize a question with relevant text and use it to generate an answer via the pipeline and the context.

In [None]:
question = "Where was the cat trying to enter?"
result = question_answer_pipeline(question=question, context=context)

6. Print the result.

In [None]:
print(result)

`{'score': 0.25550469756126404, 'start': 33, 'end': 54, 'answer': 'the neighbors garage,'}`

7. Print the exact text answer.

In [None]:
print(result['answer'])

`the neighbors garage,`

8. Ask another question for the same context.

In [None]:
question = "What did the cat do after entering the garage"
result = question_answer_pipeline(question=question, context=context)
print(result['answer'])

`hit her in the face, knocking her to the ground.`

## How it works

The program does the following things.

1. It initializes a `question-answering` pipeline based on the pre-trained `BertForQuestionAnswering` model and `BertTokenizer` tokenizer.

2. It further initializes a context passage and a question and emits the output of the answer based on these two parameters. It also prints the exact text of the answer.

3. It asks a follow-up question to the same pipeline by changing only the question text and prints the exact text of the answer.

In step 1, we do the necessary imports of the required modules and packages.

In step 2, we initialize the model and tokenizer using the pre-trained `bert-large-uncased-whole-word-masking-finetuned-squad` artifacts. As part of these calls, they are downloaded from the HuggingFace site if they are not present locally on the machine. We have chosen this specific model and tokenizer for our recipe, but feel free to explore the other models on the HuggingFace site that might suit your needs. As a generic step for this as well as the following recipes, we check whether there are any GPU devices in the system and attempt to use one. If a GPU is not detected, we use the CPU instead.

In step 3, we initialize a `question-answering` pipeline with the model and tokenizer. The task type for this pipeline is set to `question-answering`.

In step 4, we initialize a context passage. This passage was generated as part of our `Text Generation via Transformers` example in the earlier chapter on Transformers.

In step 5, we initialize a question text, invoke the pipeline with the context and the question, and store the result in a variable. The type of the result is a Python `dict` object.

In step 6, we print the value of the result. The `score` value shows the probability of the answer. The `start` and `end` values denote the start and end character indices in the context passage that constitute the answer. The `answer` value denotes the actual text of the answer.
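
To make the `start` and `end` fields concrete, the short sketch below slices the context with the indices from the result printed earlier in this recipe. It does not load a model: the `result` dictionary is copied by hand from that output, and only the first sentence of the context is reproduced because the answer lies inside it. The `rstrip` cleanup at the end is an addition for illustration, not something the pipeline does.

```python
# First sentence of the recipe's context; the answer span lies inside it.
context = "The cat had no business entering the neighbors garage, but she was there to help."

# Result copied by hand from the output printed in the recipe (no model is run here).
result = {'score': 0.25550469756126404, 'start': 33, 'end': 54, 'answer': 'the neighbors garage,'}

# `start` is inclusive and `end` is exclusive, so a plain slice recovers the answer.
span = context[result['start']:result['end']]
print(span == result['answer'])

# Optional cleanup (an addition for illustration): drop trailing punctuation.
clean_answer = span.rstrip(' ,.;:')
print(clean_answer)
```

Because the indices refer to character positions in the original context, they can also be used to highlight the answer in the source passage.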

In step 7, we print the exact text of the answer.

In step 8, we initialize a different question text and print its exact text answer.


<div class="alert alert-block alert-info">
Below is the whole code as a single script that can be validated by a development editor.
</div>

In [None]:
import torch
from transformers import pipeline, BertForQuestionAnswering, BertTokenizer

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
qa_model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad', device_map=device)
# The tokenizer holds no weights on a device, so it takes no device argument
qa_tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

question_answer_pipeline = pipeline("question-answering", model=qa_model, tokenizer=qa_tokenizer)

context = "The cat had no business entering the neighbors garage, but she was there to help. The neighbor, who asked not to be identified, said she didn't know what to make of the cat's behavior. She said it seemed like it was trying to get into her home, and that she was afraid for her life. The neighbor said that when she went to check on her cat, it ran into the neighbor's garage and hit her in the face, knocking her to the ground."

question = "Where was the cat trying to enter?"
result = question_answer_pipeline(question=question, context=context)

print(result)

print(result['answer'])

question = "What did the cat do after entering the garage"
result = question_answer_pipeline(question=question, context=context)
print(result['answer'])



# <h1><center>Answer questions from a long text passage</center></h1>

In the previous recipe, we learnt an approach to extract the answer to a question, given a context. In this pattern, the model retrieves the answer from the given context and cannot answer a question whose answer is not contained in it. This serves the purpose when we want an answer from a given context. This type of question-answering system is defined as `Closed Domain Question Answering (CDQA)`.

There is another system of question answering that can answer questions that are general in nature. These systems are trained on larger corpora. This training provides them the ability to answer questions that are open in nature. These systems are called `Open Domain Question Answering` systems or ODQA.

In [None]:
!python -m deeppavlov install en_odqa_infer_wiki

In [None]:
from deeppavlov import build_model

odqa = build_model('en_odqa_infer_wiki', download=True)
result = odqa(['What is the name of the Doctor in Star Trek?'])
print(result)

In [None]:
!python -m deeppavlov install kbqa_cq_en

## Getting Ready

As part of this recipe, we will use the **Deep Pavlov** (https://deeppavlov.ai) based ODQA system to answer an open question. We will use the `deeppavlov` library along with the Knowledge Base Question Answering (KBQA) model. This model has been trained on English Wikidata as a knowledge base. It uses various NLP techniques, such as entity linking and disambiguation and knowledge graphs, to extract the exact answer to the question.

*Install the deeppavlov library*

`pip install deeppavlov` 

*Install the model*

`python -m deeppavlov install kbqa_cq_en` 

## How to do it

In this recipe, we will initialize the KBQA model based on the DeepPavlov library and use it to answer an open question. The steps for the recipe are as follows.

1. Do the necessary imports

In [None]:
from deeppavlov import build_model

2. Load the KBQA model. The identifier for the model is *kbqa_cq_en* and that is passed to the *build_model* method as an argument.

In [None]:
kbqa_model = build_model('kbqa_cq_en', download=True)

3. We use the initialized model and pass it a couple of questions that we want to be answered.

In [None]:
result = kbqa_model(['What is the capital of Egypt?', 'Who is Bill Clinton\'s wife?'])

4. We print the result as returned by the model.

In [None]:
print(result)

`[['Cairo', 'Hillary Clinton'], [['Q85'], ['Q6294']], [['SELECT ?answer WHERE { wd:Q79 wdt:P36 ?answer. }'], ['SELECT ?answer WHERE { wd:Q1124 wdt:P26 ?answer. }']]]`

## How it works

The program does the following things:

In step 1, we import the necessary module from the deeppavlov library.

In step 2, we initialize the KBQA model using the `build_model` method. We set the *download* argument to *True* so that the model is downloaded in case it is missing locally.

In step 3, we invoke the QA model and pass two open-ended questions to it in a Python list of strings.

In step 4, we print the result returned by the model. The result contains three arrays.

- The first array contains the exact answers to the questions, ordered in the same way as the original input. In this case, the answers "Cairo" and "Hillary Clinton" are in the same order as the questions they pertain to.
- The other two arrays contain the internal artifacts that are used by the model to generate the answer. For more information on the internal workings of DeepPavlov, we recommend that the reader refer to the package reference at https://deeppavlov.ai.
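
As a small illustration of the structure described above, the following sketch pairs each question with its answer. It does not call DeepPavlov: `result` is copied by hand from the output shown in step 4. The names `answer_ids` and `queries` are chosen here because the second and third arrays look like Wikidata entity IDs and SPARQL queries; that reading is an assumption, not something the recipe confirms.

```python
questions = ['What is the capital of Egypt?', "Who is Bill Clinton's wife?"]

# Copied by hand from the output shown in step 4 (no model call is made here).
result = [
    ['Cairo', 'Hillary Clinton'],
    [['Q85'], ['Q6294']],
    [['SELECT ?answer WHERE { wd:Q79 wdt:P36 ?answer. }'],
     ['SELECT ?answer WHERE { wd:Q1124 wdt:P26 ?answer. }']],
]

# The three arrays are parallel: index i in each one belongs to question i.
answers, answer_ids, queries = result

# Pair every question with the answer at the same position.
qa_pairs = dict(zip(questions, answers))
for question, answer in qa_pairs.items():
    print(f"{question} -> {answer}")
```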

<div class="alert alert-block alert-info">
Below is the whole code as a single script that can be validated by a development editor.
</div>

In [None]:
from deeppavlov import build_model

# download=True fetches the model files if they are missing locally
kbqa_model = build_model('kbqa_cq_en', download=True)

result = kbqa_model(['What is the capital of Egypt?', 'Who is Bill Clinton\'s wife?'])

print(result)

# <h1><center>Question Answering on a Document Corpus</center></h1>

For use cases where we have a document corpus that contains a large number of documents, it is not feasible to load the document content at runtime to answer a question. Such an approach would lead to long query times and would not be suitable for production-grade systems. In this recipe, we will learn how to pre-process the documents and transform them into a form suited to faster reading, indexing, and retrieval, which allows the system to extract the answer to a given question with short query times.


## Getting Ready

As part of this recipe, we will use the **Haystack** (https://haystack.deepset.ai/) framework to build a QA system that can answer questions from a document corpus. We will download a dataset based on Game of Thrones and index it. For our QA system to be performant, we will need to index the documents beforehand. Once the documents are indexed, answering a question follows a two-step process.

1. Retriever - Since we have a large number of documents, scanning each document to fetch an answer is not a feasible approach. We will first retrieve a set of candidate documents that could contain an answer to our question. This step is performed using a Retriever component. It searches through the pre-created index to narrow down the documents that we need to scan to retrieve the exact answer.

2. Reader - Once we have a candidate set of documents which can contain the answer, we will search these documents to retrieve the exact answer to our question.
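
The two-step process above can be pictured with a toy sketch in plain Python. It is not how Haystack implements these components: the retriever here ranks documents by simple word overlap, the reader returns the best-matching sentence rather than an exact answer span, and the mini-corpus and function names are made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Cheap first stage: rank whole documents by word overlap with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_tokens & tokenize(doc)), reverse=True)
    return ranked[:top_k]

def read(query, candidates):
    """Second stage: scan only the candidates and return the best-matching sentence."""
    query_tokens = tokenize(query)
    sentences = [s.strip() for doc in candidates for s in doc.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(query_tokens & tokenize(s)))

# Hypothetical mini-corpus written for this sketch.
documents = [
    "Arya Stark is the daughter of Eddard Stark. She owns a sword called Needle.",
    "Winterfell is a castle in the North. It is the seat of House Stark.",
    "Dragons are large flying reptiles. They breathe fire.",
]

query = "Who is the father of Arya Stark?"
candidates = retrieve(query, documents, top_k=2)   # narrow the corpus first
best_sentence = read(query, candidates)            # then look closely at the few candidates
print(best_sentence)
```

The point of the split is cost: the first stage is cheap enough to run over the whole corpus, while the more expensive second stage only ever sees a handful of candidates.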

We will discuss the details of these components throughout this recipe. To start with, let's set up the prerequisites.

*Install the latest release of haystack.*

`pip install farm-haystack` 


In [None]:
!pip install farm-haystack

## How to do it

1. Do the necessary imports.

In [None]:
import os
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline
from haystack.pipelines.standard_pipelines import TextIndexingPipeline
from haystack.utils import fetch_archive_from_http, print_answers

2. Fetch the documents from the source and save them to a local folder. Once the documents are fetched, load the files from the dataset that we need to index. We also print the number of files that we have in the dataset.

In [2]:
doc_dir = "data/got_dataset"
fetch_archive_from_http(
            url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip",
            output_dir=doc_dir,
        )
files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)]
print(len(files_to_index))

183


3. We initialize a document store based on the files. We create an indexing pipeline based on the document store and execute the indexing operation.

In [None]:
document_store = InMemoryDocumentStore(use_bm25=True)
indexing_pipeline = TextIndexingPipeline(document_store)
indexing_pipeline.run_batch(file_paths=files_to_index)

4. Once we have loaded the documents, we initialize our retriever and reader instances.

In [4]:
retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

5. We now create a pipeline that we can use to answer questions.

In [None]:
pipe = ExtractiveQAPipeline(reader, retriever)
prediction = pipe.run(
            query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
        )

6. We print the answers to our question.

In [6]:
print_answers(prediction, details="all")  ## Choose from `minimum`, `medium`, and `all`

'Query: Who is the father of Arya Stark?'
'Answers:'
[   <Answer {'answer': 'Eddard', 'type': 'extractive', 'score': 0.993372917175293, 'context': "s Nymeria after a legendary warrior queen. She travels with her father, Eddard, to King's Landing when he is made Hand of the King. Before she leaves,", 'offsets_in_document': [{'start': 207, 'end': 213}], 'offsets_in_context': [{'start': 72, 'end': 78}], 'document_ids': ['9e3c863097d66aeed9992e0b6bf1f2f4'], 'meta': {'_split_id': 3}}>,
    <Answer {'answer': 'Ned', 'type': 'extractive', 'score': 0.9753613471984863, 'context': "k in the television series.\n\n====Season 1====\nArya accompanies her father Ned and her sister Sansa to King's Landing. Before their departure, Arya's h", 'offsets_in_document': [{'start': 630, 'end': 633}], 'offsets_in_context': [{'start': 74, 'end': 77}], 'document_ids': ['7d3360fa29130e69ea6b2ba5c5a8f9c8'], 'meta': {'_split_id': 10}}>,
    <Answer {'answer': 'Lord Eddard Stark', 'type': 'extractive', 'score': 0.91

## How it works

In step 1, we do the necessary imports.

In step 2, we specify a folder that will be used to save our dataset and retrieve the dataset from the source. The second parameter to the `fetch_archive_from_http` method is the folder to which the dataset will be downloaded. We set this parameter to the folder that we defined on the first line of the step. The `fetch_archive_from_http` method decompresses the `.zip` archive and extracts all of its files into the same folder. We then read from the folder and create a list of the files it contains.

In step 3, we initialize an `InMemoryDocumentStore` instance. In this method call, we set the argument `use_bm25` to `True`, so the document store uses *bm25* as the algorithm for the retriever step. The *bm25* (Best Match 25) algorithm is a simple bag-of-words algorithm whose scoring function is based on the number of times a term occurs in a document and on the length of the document. Note that there are various other DocumentStore options, such as `ElasticSearch` and `OpenSearch`. We used an `InMemoryDocumentStore` to keep the recipe simple and focus on the retriever and reader concepts. For a QA system to work in a high-performance production setting, a document store other than an in-memory one is recommended. We recommend that the reader refer to https://docs.haystack.deepset.ai/docs/document_store and choose an appropriate document store based on their production requirements. Once the document store is initialized, we also create an indexing pipeline to index the files into it. Indexing the files allows us to search through the content faster.
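
To give a feel for how a BM25-style score combines term counts with document length, here is a minimal sketch of the textbook Okapi BM25 formula in plain Python. It is an illustration only: the parameter values `k1=1.5` and `b=0.75`, the whitespace tokenization, and the three sample documents are assumptions made for this sketch, and the implementation inside Haystack may differ in its details.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with a textbook Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Terms that occur in fewer documents get a higher weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # k1 makes repeated terms saturate; b normalizes by document length.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical documents written for this sketch.
documents = [
    "arya stark is the daughter of eddard stark",
    "winterfell is the seat of house stark in the north",
    "dragons breathe fire",
]
scores = bm25_scores("arya stark", documents)
best = max(range(len(documents)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The first document wins because it contains both query terms, while the third scores zero because it shares no term with the query.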

In step 4, we initialize the retriever and the reader components. The `BM25Retriever` uses the *bm25* scoring function to retrieve the initial set of documents. For the reader, we initialize the `FARMReader` object. It is based on deepset's FARM framework, which can utilize the QA models from HuggingFace. In our case, we use the `deepset/roberta-base-squad2` model as the reader. The `use_gpu` argument can be set appropriately based on whether your device has a GPU.

In step 5, having initialized the retriever and reader in the previous step, we combine them for querying. The `pipeline` abstraction from the Haystack framework allows us to integrate the reader and retriever using a series of pipelines that address different use cases. In this instance, we use the `ExtractiveQAPipeline` for our QA system. After initializing the pipeline, we generate the answer to a question from the "Game of Thrones" series. The `run` method takes the question as the query. The second argument, `params`, dictates how the results from the retriever and reader are combined to present the answer.

- "Retriever": {"top_k": 10} - The `top_k` keyword argument specifies that the top-k (in this case, 10) results from the retriever are used by the reader to search for the exact answer.
- "Reader": {"top_k": 5} - The `top_k` keyword argument specifies that the top-k (in this case, 5) results from the reader are presented as the output of the method.

In step 6, we print the answers as processed and returned by the pipeline. The system prints the exact answers along with the context from which each answer was extracted. The `details` argument controls how much is printed. The value *minimum* presents only the answer and the context, and the consolidated script at the end of this recipe uses it. The value *medium* adds the relative score of each answer. This score can be used to filter the results further based on the accuracy requirements of the system. The value *all*, which we use in this step, additionally prints the start and end spans for the answer along with all the auxiliary information. We encourage the reader to make a suitable choice on the basis of their requirements.
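
As a rough sketch of filtering on the score, the snippet below keeps only the answers above a threshold. It does not use Haystack: `ToyAnswer` is a hypothetical stand-in that mimics only the `answer` and `score` fields visible in the printed output, the first two entries reuse the values from that output, and both the third entry and the threshold of 0.9 are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ToyAnswer:
    """Hypothetical stand-in mimicking the `answer` and `score` fields of the printed output."""
    answer: str
    score: float

# The first two entries reuse values from the recipe's output; the third is invented.
answers = [
    ToyAnswer("Eddard", 0.993372917175293),
    ToyAnswer("Ned", 0.9753613471984863),
    ToyAnswer("Robert", 0.12),  # invented low-confidence answer
]

MIN_SCORE = 0.9  # assumed threshold; tune it to your accuracy requirements

# Keep only the answers the reader was confident about.
confident = [a.answer for a in answers if a.score >= MIN_SCORE]
print(confident)
```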




<div class="alert alert-block alert-info">
Below is the whole code as a single script that can be validated by a development editor.
</div>

In [None]:
import os
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline
from haystack.pipelines.standard_pipelines import TextIndexingPipeline
from haystack.utils import fetch_archive_from_http, print_answers

doc_dir = "data/got_dataset"
fetch_archive_from_http(
            url="https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip",
            output_dir=doc_dir,
        )
files_to_index = [doc_dir + "/" + f for f in os.listdir(doc_dir)]
print(len(files_to_index))

document_store = InMemoryDocumentStore(use_bm25=True)
indexing_pipeline = TextIndexingPipeline(document_store)
indexing_pipeline.run_batch(file_paths=files_to_index)


retriever = BM25Retriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

pipe = ExtractiveQAPipeline(reader, retriever)
prediction = pipe.run(
    query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
)

print_answers(prediction, details="minimum")  ## Choose from `minimum`, `medium`, and `all`

# <h1><center>Abstractive Question Answering on a Document Corpus</center></h1>

In the previous recipe, we learnt how to build a QA system based on a document corpus. The answers that were retrieved were extractive in nature, i.e. the answer snippet was a piece of text copied verbatim from the document source.

In this recipe, we will build a QA system that provides answers that are abstractive in nature. An abstractive answer is more readable for end users than an extractive one. We will load the `bilgeyucel/seven-wonders` dataset from the HuggingFace site and initialize a retriever from it. This dataset has content about the seven wonders of the ancient world. For generating the answers, we will use the PromptNode component from the Haystack framework to set up a pipeline that can generate answers in an abstractive fashion.

1. Do the necessary imports.

In [None]:
from datasets import load_dataset
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, PromptNode, PromptTemplate, AnswerParser
from haystack.pipelines import Pipeline

2. As part of this step, we load the `bilgeyucel/seven-wonders` dataset. Once the dataset is downloaded, we initialize an in-memory document store based on the BM25 algorithm, as we did in the previous recipe. We write the documents in the dataset to the document store. Once the document store is initialized, we create a retriever component based on the document store.

In [None]:
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents(dataset)
retriever = BM25Retriever(document_store=document_store)

3. As part of this step, we initialize a prompt template. We can define the task we want the model to perform as a simple instruction in English. We use this *PromptTemplate* instance to create a *PromptNode* component instance. This instance is initialized with the *google/flan-t5-large* model and the *PromptTemplate* we created in the previous step.

In [None]:
rag_prompt = PromptTemplate(
    prompt="""Synthesize a comprehensive answer from the following text for the given question.
                             Provide a clear and concise response that summarizes the key points and information presented in the text.
                             Your answer should be in your own words and be no longer than 50 words.
                             \n\n Related text: {join(documents)} \n\n Question: {query} \n\n Answer:""",
    output_parser=AnswerParser(),
)
prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", default_prompt_template=rag_prompt, use_gpu=True)

4. We now create a pipeline and add the `retriever` and `prompt_node` components that we initialized in the previous steps.

In [None]:
pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

5. Once the pipeline is set up, we use it to answer questions on the content based on the dataset.  

In [None]:
output = pipe.run(query="What is the Great Pyramid of Giza?")
print(output["answers"][0].answer)
output = pipe.run(query="Where are the hanging gardens?")
print(output["answers"][0].answer)

The Great Pyramid of Giza was built in the early 26th century BC during a period of around 27 years.[3]

The Hanging Gardens of Semiramis are the only one of the Seven Wonders for which the location has not been definitively established.

## How it works

In step 1, we do the necessary imports.

In step 2, we load the `bilgeyucel/seven-wonders` dataset into an in-memory document store. This dataset was created from the Wikipedia pages of the Seven Wonders of the Ancient World (https://en.wikipedia.org/wiki/Wonders_of_the_World). It has been pre-processed and uploaded to the HuggingFace site, and can be downloaded easily using the `datasets` module from HuggingFace. We use the `InMemoryDocumentStore` as our document store, with BM25 as the choice of search algorithm. We write the documents from the dataset into the document store. To keep query times short, `write_documents` automatically optimizes how the documents are written. Once the documents are written, we initialize the retriever based on BM25, similar to our previous recipe.

In step 3, unlike the previous recipe, we don't initialize a reader. Instead, we initialize a `PromptTemplate` that allows us to define the way the answers will be generated. The `prompt` argument defines the task that we want to perform. It also references two variables, *documents* and *query*, which are expected to be in the execution context at runtime. In our example, we join all the documents together to form one string that serves as the related text for the question, and the question itself is supplied through the *query* variable. Please refer to the prompt engineering guide on Haystack for how to write prompts for your use cases (https://docs.haystack.deepset.ai/docs/prompt-engineering-guidelines).
The second argument, `output_parser`, takes an *AnswerParser* object. This object instructs the PromptNode object to store the results in the `answers` element.
After defining the prompt, we initialize a PromptNode object with a model and the prompt template. We use the `google/flan-t5-large` model as the answer generator. This model is based on the Google T5 language model and has been fine-tuned (*flan* stands for `Fine-tuned Language Net`). One of the fine-tuning steps in the training of this model was to operate on human-written instructions as tasks. This allowed the model to perform different downstream tasks from instructions alone and reduced the need for few-shot examples.
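
To visualize what the model receives once the template variables are filled in, here is a simplified sketch in plain Python. It is not Haystack's template engine: ordinary `str.format` placeholders stand in for `{join(documents)}` and `{query}`, the single-space separator is an assumption, and the retrieved passages are written by hand for illustration.

```python
# Simplified template: plain str.format placeholders stand in for
# Haystack's `{join(documents)}` and `{query}` template variables.
TEMPLATE = (
    "Generate an answer for the given question in less than 50 words."
    "\n\n Related text: {documents} \n\n Question: {query} \n\n Answer:"
)

def build_prompt(documents, query):
    """Join the retrieved passages into one string and fill in the template."""
    joined = " ".join(documents)  # assumed separator: a single space
    return TEMPLATE.format(documents=joined, query=query)

# Hypothetical retrieved passages written for this sketch.
retrieved = [
    "The Great Pyramid of Giza is the largest Egyptian pyramid.",
    "It was built in the early 26th century BC.",
]
prompt = build_prompt(retrieved, "When was the Great Pyramid of Giza built?")
print(prompt)
```

The prompt ends with `Answer:` so that the text the model generates next reads as the answer itself.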

In step 4, we initialize a pipeline and add the retriever and prompt node components to it. The retriever component operates on the query supplied by the user and generates a set of results. These results are passed to the prompt node, which uses the configured `google/flan-t5-large` model to generate the answer.

In step 5, we query the QA system with two questions about the wonders, and it returns the answers that it generated from the retrieved documents.



<div class="alert alert-block alert-info">
Below is the whole code as a single script that can be validated by a development editor.
</div>

In [None]:
from datasets import load_dataset
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, PromptNode, PromptTemplate, AnswerParser
from haystack.pipelines import Pipeline


dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents(dataset)
retriever = BM25Retriever(document_store=document_store)

rag_prompt = PromptTemplate(
    prompt="""Generate an answer for the given question in less than 50 words.
                \n\n Related text: {join(documents)} \n\n Question: {query} \n\n Answer:""",
    output_parser=AnswerParser(),
)
prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", default_prompt_template=rag_prompt, use_gpu=True)

pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])


output = pipe.run(query="When was the Great Pyramid of Giza built?")
print(output["answers"][0].answer)

output = pipe.run(query="Where are the hanging gardens?")
print(output["answers"][0].answer)





# <h1><center>Text Summarization</center></h1>

We will now explore techniques for performing text summarization. Generating a summary for a long passage of text allows NLP practitioners to extract the relevant information for their use cases and use these summaries for other downstream tasks. As part of summarization, we will explore recipes that use Transformer models for generating the summaries. 

<h2><center>Text Summarization with Google T5 model</center></h2>

Our first recipe for summarization will use the Google T5 (Text-to-Text Transfer Transformer) model for summarization. Let's get started.

1. Do the necessary imports

In [None]:
from transformers import pipeline

2. As part of this step, we initialize a passage of text. We also initialize a pipeline instance. This pipeline has the task defined as `summarization`, and the model used in the pipeline is the Google `t5-large`.

In [None]:
passage = "The color of animals is by no means a matter of chance; it depends on many considerations, but in the majority of cases tends to protect the animal from danger by rendering it less conspicuous. Perhaps it may be said that if coloring is mainly protective, there ought to be but few brightly colored animals. There are, however, not a few cases in which vivid colors are themselves protective. The kingfisher itself, though so brightly colored, is by no means easy to see. The blue harmonizes with the water, and the bird as it darts along the stream looks almost like a flash of sunlight."
pipeline_instance = pipeline("summarization", model="t5-large")

3. We now use the pipeline_instance initialized in the previous step and pass the text passage to it to perform the summarization step.

In [None]:
pipeline_result = pipeline_instance(passage, max_length=512)

4. Once the summarization step is complete, we extract the result out of the output and print it.

In [None]:
result = pipeline_result[0]["summary_text"]
print(result)

`the color of animals is by no means a matter of chance; it depends on many considerations . in the majority of cases, coloring tends to protect the animal from danger . there are, however, not a few cases in which vivid colors are themselves protective .`

## How it works

In step 1, we do the necessary imports.

In step 2, we initialize the input passage that we need to summarize, along with the pipeline. Since we defined the task as `summarization`, the object returned by the pipeline module is of type `SummarizationPipeline`. We also passed `t5-large` as the model parameter for the pipeline. This model is based on the Encoder-Decoder Transformer architecture and acts as a pure sequence-to-sequence model. That means both the input to and the output from the model are text sequences. This model was pre-trained using the denoising objective of finding masked words in a sentence, followed by fine-tuning on specific downstream tasks such as summarization, textual entailment, and language translation.
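
The denoising objective mentioned above can be pictured with a small sketch in plain Python. Spans of the input are replaced by sentinel tokens, and the target consists of the sentinels followed by the text they replaced. The `<extra_id_N>` names follow the sentinel tokens of the T5 tokenizer; the `span_corrupt` helper and the hand-picked spans are illustrative assumptions, since actual pre-training selects spans at random and works on subword tokens.

```python
def span_corrupt(tokens, spans):
    """Build a (corrupted input, target) pair from sorted, non-overlapping (start, end) spans."""
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])   # keep the text before the span
        inputs.append(sentinel)               # replace the span with one sentinel
        targets.append(sentinel)              # the target names the sentinel...
        targets.extend(tokens[start:end])     # ...followed by the hidden words
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the target
    return " ".join(inputs), " ".join(targets)

tokens = "Thank you for inviting me to your party last week .".split()
# Hand-picked spans: tokens 2-3 ("for inviting") and token 8 ("last").
corrupted, target = span_corrupt(tokens, [(2, 4), (8, 9)])
print(corrupted)
print(target)
```

The model is trained to produce the target sequence from the corrupted input, which is why both sides of the task are plain text.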

In step 3, we execute the summarization step using the pipeline. We pass the passage string as the first argument, but a list of strings can be passed as well if multiple sequences are to be summarized. We pass `max_length=512` as the second argument, which caps the length of the generated summary in tokens. The T5 model is memory intensive, and its compute requirements grow quadratically with the length of the input text.

In step 4, we extract the summary text from the result emitted by the pipeline. The pipeline returns a list of dictionaries, and each list item corresponds to one input sequence. In this case, since we passed only one string as input, the first item in the list is the output dictionary that contains our summary. The summary can be retrieved by indexing the dictionary on the `summary_text` key.
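The shape of the pipeline output described above can be sketched without loading a model. The list below is a hand-written stand-in with placeholder strings, not real model output, and the helper name `extract_summaries` is our own; it simply shows how the same indexing extends to several inputs.

```python
def extract_summaries(pipeline_result):
    """Collect the `summary_text` entry of every item in the result list."""
    return [item["summary_text"] for item in pipeline_result]


# Stand-in for what the pipeline would return for two input strings.
# The texts are placeholders, not summaries produced by a model.
mock_result = [
    {"summary_text": "placeholder summary for the first input"},
    {"summary_text": "placeholder summary for the second input"},
]

summaries = extract_summaries(mock_result)
for index, summary in enumerate(summaries):
    print(index, summary)
```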

<div class="alert alert-block alert-info">
Below is the whole code as a single script that can be validated by a development editor.
</div>

In [None]:
from transformers import pipeline
import torch

passage = "The color of animals is by no means a matter of chance; it depends on many considerations, but in the majority of cases tends to protect the animal from danger by rendering it less conspicuous. Perhaps it may be said that if coloring is mainly protective, there ought to be but few brightly colored animals. There are, however, not a few cases in which vivid colors are themselves protective. The kingfisher itself, though so brightly colored, is by no means easy to see. The blue harmonizes with the water, and the bird as it darts along the stream looks almost like a flash of sunlight."
pipeline_instance = pipeline("summarization", model="t5-large")

pipeline_result = pipeline_instance(passage, max_length=512)

result = pipeline_result[0]["summary_text"]
print(result)

the color of animals is by no means a matter of chance; it depends on many considerations . in the majority of cases, coloring tends to protect the animal from danger . there are, however, not a few cases in which vivid colors are themselves protective .

# There's more

Now that we have seen how we can generate a summary using the T5 model, we can use the same generic code framework and tweak it slightly to use other models to generate summaries.

The lines below are common to the other summarization recipes that we use. We added an extra variable named `device` that we will use in our pipelines. We set this variable to the device that we will use to generate the summary. If a `GPU` is present and configured in the system, it will be used; otherwise the summarization will be performed using the `CPU`.

In [9]:
from transformers import pipeline
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
passage = "The color of animals is by no means a matter of chance; it depends on many considerations, but in the majority of cases tends to protect the animal from danger by rendering it less conspicuous. Perhaps it may be said that if coloring is mainly protective, there ought to be but few brightly colored animals. There are, however, not a few cases in which vivid colors are themselves protective. The kingfisher itself, though so brightly colored, is by no means easy to see. The blue harmonizes with the water, and the bird as it darts along the stream looks almost like a flash of sunlight."
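If the same script also has to start in an environment where PyTorch is not installed, the device selection can be wrapped in a small helper. This is only a sketch: the function name `pick_device_name` is hypothetical, and it returns a plain string, on the assumption that the installed version of transformers accepts a device string such as `"cpu"` or `"cuda"` for the pipeline's `device` argument.

```python
def pick_device_name():
    """Return "cuda" when a GPU is usable through PyTorch, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to query.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device_name = pick_device_name()
print(f"Summaries will be generated on: {device_name}")
```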

In the example below, we use the BART model from Facebook. This model was pre-trained by applying a noising function to a piece of text and setting the training objective to regenerate the original text by removing the noise. The model was further fine-tuned on the `CNN DailyMail` dataset for summarization.

In [None]:
pipeline_instance = pipeline("summarization", model="facebook/bart-large-cnn", device=device)
pipeline_result = pipeline_instance(passage, max_length=512)
result = pipeline_result[0]["summary_text"]
print(result)

`The color of animals is by no means a matter of chance; it depends on many considerations, but in the majority of cases tends to protect the animal from danger by rendering it less conspicuous. There are, however, not a few cases in which vivid colors are themselves protective. The blue harmonizes with the water, and the bird as it darts along the stream looks almost like a flash of sunlight.`

As we observe from the generated summary, it is verbose and extractive in nature. Let's try generating a summary with another model.
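The "extractive" observation about the BART output can be checked roughly by testing how many sentences of the summary occur verbatim in the passage. The helper names `split_sentences`, `copied_fraction`, and `compression_ratio`, as well as the naive regex sentence splitter, are assumptions made for this sketch; the two strings are the passage and the BART summary shown above.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def copied_fraction(summary, source):
    """Fraction of summary sentences that appear verbatim in the source."""
    sentences = split_sentences(summary)
    if not sentences:
        return 0.0
    copied = sum(1 for sentence in sentences if sentence in source)
    return copied / len(sentences)


def compression_ratio(summary, source):
    """Summary length divided by source length, both counted in words."""
    return len(summary.split()) / len(source.split())


passage = (
    "The color of animals is by no means a matter of chance; it depends on many "
    "considerations, but in the majority of cases tends to protect the animal from "
    "danger by rendering it less conspicuous. Perhaps it may be said that if coloring "
    "is mainly protective, there ought to be but few brightly colored animals. "
    "There are, however, not a few cases in which vivid colors are themselves "
    "protective. The kingfisher itself, though so brightly colored, is by no means "
    "easy to see. The blue harmonizes with the water, and the bird as it darts along "
    "the stream looks almost like a flash of sunlight."
)
bart_summary = (
    "The color of animals is by no means a matter of chance; it depends on many "
    "considerations, but in the majority of cases tends to protect the animal from "
    "danger by rendering it less conspicuous. There are, however, not a few cases "
    "in which vivid colors are themselves protective. The blue harmonizes with the "
    "water, and the bird as it darts along the stream looks almost like a flash of "
    "sunlight."
)

print(f"copied sentences: {copied_fraction(bart_summary, passage):.0%}")
print(f"compression ratio: {compression_ratio(bart_summary, passage):.2f}")
```

A copied fraction close to 100% indicates that the summary was assembled from sentences of the passage rather than rephrased.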

In the example below, we use the `PEGASUS` model from Google for summarization. This model is a Transformer-based encoder-decoder model that was pre-trained on a large news and web page corpus with a training objective of detecting important sentences. This model was further fine-tuned for summarization on a small dataset. It is designed to generate abstractive summaries.

In [None]:
pipeline_instance = pipeline("summarization", model="google/pegasus-large", device=device)
pipeline_result = pipeline_instance([passage, passage], max_length=512)
result = pipeline_result[0]["summary_text"]
print(result)

`Perhaps it may be said that if coloring is mainly protective, there ought to be but few brightly colored animals.`

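As we observe from the generated summary, it is far more concise than the BART output, although for this passage it is a single sentence taken verbatim from the input rather than a rephrasing.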

Since new and improved models for summarization are always in the works, we recommend that the reader refer to the models on the HuggingFace site (https://huggingface.co/models?pipeline_tag=summarization) and choose one based on their requirements.

<h1><center>Textual Entailment</center></h1>

In this section, we will explore techniques to detect textual entailment, given a pair of sentences. The first sentence in the pair is the `premise`, which sets up a context. The second sentence is the `hypothesis`. Textual entailment identifies the contextual relationship between the `premise` and the `hypothesis`. This relationship can be one of three types, defined as follows:

* Entailment - The hypothesis follows from the premise.
* Contradiction - The hypothesis contradicts the premise.
* Neutral - The premise neither supports nor contradicts the hypothesis.

In this recipe, we will initialize different sets of sentences that are related in each of the above defined relationships and explore methods to detect these relationships. Let's get started.

1. Do the necessary imports

In [7]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

2. Initialize the device, the tokenizer and the model. In this case, we are using the Google `t5-small` model.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")       
tokenizer = T5Tokenizer.from_pretrained('t5-small', legacy=False, device=device)
model = T5ForConditionalGeneration.from_pretrained('t5-small', return_dict=True, device_map=device)

3. We initialize the premise and the hypothesis sentences. In this case, the hypothesis supports the premise.

In [9]:
premise = "The corner coffee shop serves the most awesome coffee I have ever had."
hypothesis = "I love the coffee served by the corner coffee shop."

4. In this step we tokenize the premise and the hypothesis sentences and have the model generate the output tensors.

In [10]:
input_ids = tokenizer("mnli premise: " + premise + " hypothesis: " + hypothesis, return_tensors="pt").input_ids
entailment_ids = model.generate(input_ids.to(device), max_new_tokens=20)  

5. In this step we decode the prediction tokens generated by the model and print the result. In this case, the prediction generated by the model is `entailment`.

In [None]:
prediction = tokenizer.decode(entailment_ids[0], skip_special_tokens=True, device=device)
print(prediction)

entailment

# How it works

In step 1, we do the necessary imports.

In step 2, we initialize the tokenizer with the `t5-small` model. We set the `legacy` flag to *False* since we don't need the legacy behavior of the tokenizer. We set the device value based on whatever device is available in our execution environment. For the model, we pass the same model name and device, and we set the parameter *return_dict* to *True* so that we get the model results as a dictionary instead of a tuple.

In step 3, we initialize the premise and the hypothesis sentences.

In step 4, we call the tokenizer with a single string that concatenates the `mnli premise:` prefix, the premise, the `hypothesis:` marker, and the hypothesis. This simple text concatenation sets up the input for the `entailment` task. We read the `input_ids` property to get the token identifiers for the concatenated string. Once we have the token IDs, we use the model to generate the entailment prediction. This returns a tensor of predicted token IDs, with one row per input, that we use in the next step.
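The concatenation from step 4 can be isolated into a small helper so that the exact string handed to the tokenizer is easy to inspect. The function name `build_mnli_prompt` is our own; the prefix and marker are the ones used in the recipe.

```python
def build_mnli_prompt(premise, hypothesis):
    """Build the text-to-text input for the entailment task."""
    return f"mnli premise: {premise} hypothesis: {hypothesis}"


premise = "The corner coffee shop serves the most awesome coffee I have ever had."
hypothesis = "I love the coffee served by the corner coffee shop."

prompt = build_mnli_prompt(premise, hypothesis)
print(prompt)
```

The resulting string is identical to the one produced by the `+` concatenation in step 4, so it can be passed to the tokenizer in the same way.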

In step 5, we call the `decode` method of the tokenizer and pass it the first tensor (or vector) of the tensors that were returned by the `generate` call of the model. We also instruct the tokenizer to skip the special tokens that are used by the tokenizer internally. The tokenizer generates the string label from the vector that is passed in. We print the prediction result.
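Because the label is produced as free text by a generative model, it can be worth validating the decoded string before using it downstream. The sketch below is one possible guard; the helper name `normalize_label` and the `"unknown"` fallback value are assumptions for this example, and the three labels are the relationship types listed at the start of this section.

```python
LABELS = ("entailment", "contradiction", "neutral")


def normalize_label(decoded_text):
    """Map decoded model output to one of the known labels, else "unknown"."""
    text = decoded_text.strip().lower()
    return text if text in LABELS else "unknown"


print(normalize_label("entailment"))
print(normalize_label("  Contradiction "))
print(normalize_label("coffee"))
```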



# There's more

Now that we have shown an example of entailment with a single sentence pair, the same framework can be used to process a batch of sentences to generate entailment predictions. We will tailor steps 3, 4, and 5 from the previous recipe for this example. We initialize an array of two sentences each for the premise and the hypothesis. Both premise sentences are the same, while the hypothesis sentences represent `entailment` and `contradiction` respectively.

In [None]:
premise = ["The corner coffee shop serves the most awesome coffee I have ever had.", "The corner coffee shop serves the most awesome coffee I have ever had."]
hypothesis = ["I love the coffee served by the corner coffee shop.", "I find the coffee served by the corner coffee shop too bitter for my taste."]

Since we have an array of sentences for both premises and hypotheses, we create an array of concatenated inputs that include the task prefix. We pass this array to the tokenizer and use the token IDs that it returns in the next step.

In [None]:
premises_and_hypotheses = [f"mnli premise: {pre} hypothesis: {hyp}" for pre, hyp in zip(premise, hypothesis)]
input_ids = tokenizer(text=premises_and_hypotheses, padding=True,
                      return_tensors="pt").input_ids
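To see what `padding=True` does conceptually, here is a minimal pure-Python sketch that right-pads sequences of token IDs to the length of the longest one and builds the matching attention mask. The token IDs are made-up values rather than real tokenizer output, the helper name `pad_batch` is hypothetical, and the pad ID of `0` is an assumption for this example.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence to the longest length and build a mask.

    The mask holds 1 for real tokens and 0 for padding positions.
    """
    longest = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
    attention_mask = [
        [1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences
    ]
    return input_ids, attention_mask


# Made-up token IDs for two inputs of different lengths.
batch = [[5, 6, 7], [8]]
padded_ids, mask = pad_batch(batch)
print(padded_ids)
print(mask)
```

Padding is what allows inputs of different lengths to be stacked into a single rectangular tensor when `return_tensors="pt"` is requested.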

We now generate the predictions using the same methodology that we used earlier. However, in this step, we generate the inference labels by iterating through the tensors returned in the model's output and printing each prediction.

In [None]:
entailment_ids = model.generate(input_ids.to(device), max_new_tokens=20)
for _tensor in entailment_ids:
    entailment = tokenizer.decode(_tensor, skip_special_tokens=True, device=device)
    print(entailment)

<div class="alert alert-block alert-info">
Below is the whole code as a single script that can be validated by a development editor.
</div>

In [None]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = T5Tokenizer.from_pretrained('t5-small', legacy=False, device=device)
model = T5ForConditionalGeneration.from_pretrained('t5-small', return_dict=True, device_map=device)
premise = "The corner coffee shop serves the most awesome coffee I have ever had."
# Alternative hypotheses to experiment with; the last assignment is the one that is used.
hypothesis = "I love the coffee served by the corner coffee shop."
# hypothesis = "I love the waffle served by the corner coffee shop."
hypothesis = "I find the coffee served by the corner coffee shop too bitter for my taste."
input_ids = tokenizer("mnli premise: " + premise + " hypothesis: " + hypothesis,
                      return_tensors="pt").input_ids
entailment_ids = model.generate(input_ids.to(device), max_new_tokens=20)
entailment = tokenizer.decode(entailment_ids[0], skip_special_tokens=True, device=device)
print(entailment)