# Summarizing text using pre-trained models based on Transformers

In this section, we explore techniques for text summarization. Summarizing a long passage lets NLP practitioners extract the information relevant to their use case and feed the summaries into other downstream tasks. The recipes here use pre-trained Transformer models to generate the summaries.

### Getting ready

Our first summarization recipe uses Google's Text-to-Text Transfer Transformer (T5) model.

Imports

In [2]:
from transformers import pipeline



In this step, we initialize the input passage to summarize, along with the pipeline. We also compute the length of the passage in words (by splitting on spaces), because we pass it to the pipeline as an argument when we run the task in the next step. Since we specify the task as summarization, the `pipeline` function returns an object of type SummarizationPipeline. We pass t5-large as the model parameter. T5 is an encoder-decoder Transformer that acts as a pure sequence-to-sequence model: both its input and its output are text sequences.

In [3]:
passage = "The color of animals is by no means a matter of chance; it depends on many considerations, but in the majority of cases tends to protect the animal from danger by rendering it less conspicuous. Perhaps it may be said that if coloring is mainly protective, there ought to be but few brightly colored animals. There are, however, not a few cases in which vivid colors are themselves protective. The kingfisher itself, though so brightly colored, is by no means easy to see. The blue harmonizes with the water, and the bird as it darts along the stream looks almost like a flash of sunlight."
passage_length = len(passage.split(' '))
pipeline_instance = pipeline("summarization", model="t5-large")

Device set to use cpu
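Note that passage_length is a word count, not a token count: T5 tokenizes text into SentencePiece subword units, so the number of tokens the model actually sees typically differs from the number of words. In addition, `split(' ')` splits on single space characters only. The minimal sketch below (plain Python; the helper names are hypothetical and not part of the recipe) contrasts it with `split()`, which splits on any run of whitespace:

```python
def count_words_single_space(text: str) -> int:
    """Mirror the recipe's approach: split on a single space character.

    Consecutive spaces produce empty strings, which are counted as words.
    """
    return len(text.split(' '))


def count_words_whitespace(text: str) -> int:
    """Split on any run of whitespace (spaces, tabs, newlines)."""
    return len(text.split())


sample = "The  kingfisher darts"  # note the double space after "The"
print(count_words_single_space(sample))
print(count_words_whitespace(sample))
```

For a passage that uses single spaces throughout, as the one above appears to, both approaches agree; `split()` is the safer choice for text with irregular spacing or line breaks.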


We now pass the text passage to the pipeline_instance initialized in the previous step to perform the summarization. A list of strings can be passed instead if multiple sequences need to be summarized. We pass max_length=passage_length as a keyword argument to bound the length of the generated summary (note that max_length is measured in tokens, whereas passage_length counts words). As the warning in the output shows, max_new_tokens is also set, and it takes precedence over max_length. The T5 model is memory-intensive: the cost of its self-attention grows quadratically with the input length. This step might take a few minutes to complete, depending on the compute capability of your environment:

In [4]:
pipeline_result = pipeline_instance(
    passage, max_length=passage_length  # upper bound on summary length, in tokens
)

Both `max_new_tokens` (=256) and `max_length`(=105) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
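To make the quadratic growth concrete: self-attention computes a score for every pair of input positions, so a sequence of n tokens produces an n × n score matrix per attention head in each layer. The sketch below counts only those entries, a deliberate simplification that ignores the number of heads and layers and the parts of the model whose cost grows linearly, and assumes 4-byte float32 scores. It shows that doubling the input length quadruples the matrix size:

```python
def attention_score_entries(seq_len: int) -> int:
    """Number of entries in one seq_len x seq_len self-attention score matrix."""
    return seq_len * seq_len


# Simplified model: one attention head in one layer, float32 scores (4 bytes each).
for n in (128, 256, 512):
    entries = attention_score_entries(n)
    print(f"{n:>4} tokens -> {entries:>7} scores, {entries * 4 / 1024:.0f} KiB")
```

Real memory use is far higher, since this matrix exists for every head in every layer, but the ratio between input lengths follows the same n² pattern.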


Once summarization completes, we extract the result from the output and print it. The pipeline returns a list of dictionaries, with one entry per input string. Since we passed a single string, the first (and only) item in the list is the dictionary that holds our summary, which we retrieve with the summary_text key:

In [5]:
result = pipeline_result[0]["summary_text"]
print(result)

the color of animals is by no means a matter of chance; it depends on many considerations . in the majority of cases, coloring tends to protect the animal from danger . there are, however, not a few cases in which vivid colors are themselves protective .
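Because the pipeline returns one dictionary per input, the same indexing pattern extends to batches: when a list of passages is passed, the summaries come back in the same order as the inputs. A minimal sketch, using a hypothetical `extract_summaries` helper and placeholder dictionaries shaped like the pipeline output (no model is called here):

```python
from typing import Dict, List


def extract_summaries(pipeline_output: List[Dict[str, str]]) -> List[str]:
    """Return the 'summary_text' of each result dict, preserving input order."""
    return [item["summary_text"] for item in pipeline_output]


# With the real pipeline, a batch call would look like:
#   pipeline_result = pipeline_instance([passage_a, passage_b])
# where passage_a and passage_b are hypothetical input strings.
# Here we use placeholder dictionaries of the same shape instead.
mock_output = [
    {"summary_text": "placeholder summary for the first passage"},
    {"summary_text": "placeholder summary for the second passage"},
]

for index, summary in enumerate(extract_summaries(mock_output)):
    print(index, summary)
```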
