# Abstractive text summarization

In this tutorial, you'll learn how to create abstractive text summaries with a local transformer model from the Hugging Face library.

[Text summarization](https://www.ibm.com/think/topics/text-summarization) is a core task in artificial intelligence (AI) and natural language processing ([NLP](https://www.ibm.com/think/topics/natural-language-processing)) that turns long, complex documents into short, easy-to-read forms while preserving the main ideas. Modern transformer models make this possible by encoding the text, identifying the most important ideas, and generating clear, comprehensible summaries.


## What is abstractive text summarization?

Abstractive summarization is a type of automatic text summarization in which a system generates new sentences that paraphrase and condense the meaning of a source text. The goal is to produce a summary that captures the core ideas using different wording and structure, rather than copying sentences verbatim.

From a tooling perspective, abstractive summarization typically combines NLP preprocessing steps with neural language models that perform the actual generation. 
Traditional NLP techniques such as tokenization, sentence segmentation and word [embedding](https://www.ibm.com/think/topics/embedding) representations are used to structure and encode the input text, while the summarization model learns how to generate new sentences from these representations. 

There are two main approaches to automatic text summarization: 

**Extractive text summarization**: Selects and copies the most important sentences directly from the original
text, similar to highlighting key sentences with a marker. This method is faster and simpler to
implement, but it is limited to the wording and structure of the source document. NLP techniques here typically involve sentence scoring, keyword extraction, or graph-based ranking algorithms.

**Abstractive text summarization**: Generates new sentences that capture the core meaning of the text, much
like how a human would write a summary in their own words. This approach is more flexible and
natural-sounding, but it is also more computationally intensive. NLP techniques used include encoder-decoder models, attention mechanisms, and contextual embeddings, which allow the system to understand relationships between words and generate new text. 
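To make the contrast concrete, here is a minimal sketch of the extractive approach, assuming a naive word-frequency score. The function name `extractive_summary` and the scoring rule are illustrative choices, not a standard algorithm from a library. Note that every sentence it returns is copied verbatim from the input, which is exactly what an abstractive model avoids.

```python
import re
from collections import Counter


def extractive_summary(document: str, num_sentences: int = 1) -> str:
    """Return the highest-scoring sentences, copied verbatim from the document."""
    # Naive sentence segmentation: split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    # Word frequencies over the whole document (lowercased).
    freqs = Counter(re.findall(r"[a-z']+", document.lower()))

    def score(sentence: str) -> float:
        # Average document frequency of the words in the sentence.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freqs[w] for w in words) / max(len(words), 1)

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)


doc = "Cats sleep a lot. Cats eat fish and cats chase mice. Dogs bark."
print(extractive_summary(doc, num_sentences=1))
```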

## The evolution of modern text summarization

This technical progress wasn't achieved overnight. Early NLP systems focused on
explicitly modeling linguistic structure. Techniques from information extraction ([IE](https://www.ibm.com/think/topics/information-extraction)) were used to identify entities, relations, and events using hand-crafted rules or statistical models.<sup>1</sup> During this period, most text summarization methods were extractive, selecting important sentences rather than generating new text.

Neural extractive models represented the next step. One influential example, SummaRuNNer, a recurrent neural network ([RNN](https://www.ibm.com/think/topics/recurrent-neural-networks)) based sequence model, showed
that neural models could capture document-level context and outperform traditional extractive techniques.<sup>2</sup> Early neural models, including RNN and [LSTM](https://www.ibm.com/think/topics/lstm) networks,
helped capture sequential dependencies across long documents. Convolutional neural networks (CNNs) were also applied to text for local syntactic feature extraction, complementing sequential models.<sup>3</sup>

The idea of abstractive summarization became more practical with the introduction of [encoder-decoder](https://www.ibm.com/think/topics/encoder-decoder-model) neural models, which can map an input sequence to a variable-length output sequence suitable for tasks such as summarization. In these models, the encoder processes the input text and converts it into a series of contextual representations that capture the meaning and relationships between words. The decoder generates the output sequence token by token, attending to relevant parts of the input through attention mechanisms to ensure coherence and to preserve information. This structure allows the model to produce entirely new sentences rather than relying on predefined templates or extracted facts.

In recent years, state-of-the-art transformer-based models have achieved strong results on large datasets such as Gigaword or collections of news articles (CNN/DailyMail training data). Pretraining on large corpora enabled these models to generalize across domains and produce fluent summaries.

Some systems incorporate a knowledge base or learning-based lexical modules to improve factual correctness and contextual understanding, particularly in specialized domains. These ideas are closely related to Retrieval-Augmented Generation (RAG) approaches, where a model can retrieve relevant documents or facts from an external source and then generate abstractive summaries that integrate this information. More broadly, abstractive summarization underlies many modern applications, from RAG-based QA systems to automated report generation, demonstrating its role as a building block in practical AI systems.

Abstractive text summarization is a form
of document summarization, closely related to tasks like [machine translation](https://www.ibm.com/think/topics/machine-translation) and natural language generation. Earlier syntactic text summarization techniques
relied on grammatical rules, whereas modern approaches leverage neural architectures for summary generation and rewriting.

## How does abstractive summarization work?

Abstractive summarization relies on advanced language models such as BART, T5, or PEGASUS, which are implemented as sequence-to-sequence (seq2seq) transformer models. These models transform input documents into numerical representations that capture contextual meaning, then generate concise summaries that convey the same ideas in new words.

The summarization process begins with tokenization, where the input text is split into tokens (words or subwords). These
tokens are converted into numerical representations and processed by the encoder, which uses [self-attention](https://www.ibm.com/think/topics/self-attention) to understand how different parts of the text relate to each other. Self-attention allows the model to weigh the importance of each token relative to every other token in the sequence. This way, the model can capture long-range dependencies and contextual relationships across the document. The encoder produces contextual representations that capture the document's information, which the decoder then uses to generate the final summary.
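The weighting step in self-attention can be sketched in a few lines of plain Python. This is a simplified illustration, assuming hypothetical 2-dimensional embeddings and using the same vectors as queries, keys, and values. Real transformer layers apply learned projections and multiple attention heads, which are omitted here.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        # New representation: weighted average of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, vectors)) for i in range(d)])
    return outputs, weights


tokens = ["the", "model", "summarizes"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical toy embeddings
outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence, and each row sums to 1.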

The decoder generates the summary token by token, using the encoder's contextual representations and attention mechanisms to focus on the most relevant parts of the input. It also considers previously generated tokens to maintain coherence. Some models may directly copy certain words or phrases from the input, which is useful for names, numbers, or technical terms.
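The token-by-token loop can be illustrated with a toy sketch. The score table `NEXT_TOKEN_SCORES` below is entirely hypothetical and conditions only on the previous token. A real decoder conditions on all previously generated tokens and on the encoder's representations, but the loop structure (score candidates, pick one, append, stop at an end marker) is the same idea.

```python
# Hypothetical next-token scores, keyed by the previously generated token.
NEXT_TOKEN_SCORES = {
    "<s>": {"the": 0.9, "model": 0.1},
    "the": {"model": 0.8, "summary": 0.2},
    "model": {"summarizes": 0.7, "</s>": 0.3},
    "summarizes": {"text": 0.6, "</s>": 0.4},
    "text": {"</s>": 1.0},
}


def greedy_decode(max_new_tokens: int = 10) -> list[str]:
    """Generate tokens one at a time, always taking the highest-scoring candidate."""
    generated = ["<s>"]  # start-of-sequence marker
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_SCORES[generated[-1]]
        next_token = max(candidates, key=candidates.get)
        if next_token == "</s>":  # end-of-sequence marker stops generation
            break
        generated.append(next_token)
    return generated[1:]  # drop the start marker


print(" ".join(greedy_decode()))  # prints "the model summarizes text"
```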

By combining these techniques, the model produces human-like summaries that paraphrase and condense the original text instead of copying it verbatim.

## What kind of models are best for abstractive summarization?

Modern abstractive summarization is dominated by transformer-based sequence-to-sequence models, which treat summarization as a generation task: given an input sequence (the corpus or document), the model generates an output sequence (the summary). 

### BART (Bidirectional and Auto-Regressive Transformers)

BART is a transformer-based encoder-decoder model designed for text generation tasks. Its encoder is bidirectional, meaning it reads the entire input sequence both left-to-right and right-to-left, allowing it to fully understand context around each word. 

BART is pretrained using denoising objectives, where the model learns to reconstruct original text from corrupted versions (e.g., with masked tokens, deleted spans, or shuffled sentences). This pretraining strategy makes BART well-suited for abstractive summarization tasks, and fine-tuned BART models achieve strong performance on standard benchmarks.<sup>4</sup> For this tutorial, we're using `facebook/bart-large-cnn`, a BART checkpoint fine-tuned on the CNN/DailyMail news dataset and one of the most widely used open-source models for abstractive text summarization.
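The corruption strategies mentioned above can be sketched as simple list operations. This is an illustration of the idea only, assuming whitespace tokens and a `<mask>` placeholder. The helper names are hypothetical and this is not BART's actual preprocessing code. During pretraining, the model would receive the corrupted version as input and be trained to output the original.

```python
import random


def mask_tokens(tokens, mask_prob, rng):
    """Token masking: replace random tokens with a <mask> placeholder."""
    return ["<mask>" if rng.random() < mask_prob else t for t in tokens]


def delete_span(tokens, start, length):
    """Span deletion: remove a contiguous run of tokens."""
    return tokens[:start] + tokens[start + length:]


def shuffle_sentences(sentences, rng):
    """Sentence shuffling: return the sentences in a random order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)  # seeded so the sketch is repeatable
tokens = "the model learns to reconstruct the original text".split()
print(mask_tokens(tokens, 0.3, rng))
print(delete_span(tokens, start=2, length=3))
print(shuffle_sentences(["First sentence.", "Second sentence.", "Third sentence."], rng))
```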



### Other popular seq2seq models for summarization

While this tutorial example uses BART, several other transformer-based seq2seq models are commonly used for abstractive summarization: 

- **T5 (Text-to-Text Transfer Transformer)**: Uses transfer learning and is pretrained on a large mixture of tasks framed in a text-to-text format.<sup>5</sup> Because every NLP task is handled with the same input-output format, T5 is highly flexible.
- **PEGASUS**: Specifically designed for summarization, PEGASUS introduces a pretraining objective that trains models to generate important missing sentences from documents, often yielding strong results on long-form content.<sup>6</sup> 
- **Long-form models (e.g., Long-T5, LED)**: These models extend the seq2seq methodology with more efficient attention mechanisms (such as local or sparse attention) to handle much longer documents, making them well-suited for summarizing reports, research papers, or legal texts.
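As a small illustration of the text-to-text idea behind T5, the sketch below shows how a task can be expressed purely as an input string by prepending a task prefix. The helper `to_text_to_text` is hypothetical. The prefixes follow the convention described in the T5 paper, but which prefixes a given checkpoint understands depends on how it was trained.

```python
def to_text_to_text(task_prefix: str, text: str) -> str:
    """Prepend a task prefix so a single model can be told which task to perform."""
    return f"{task_prefix}: {text.strip()}"


examples = [
    to_text_to_text("summarize", "Abstractive summarization generates new sentences."),
    to_text_to_text("translate English to German", "The house is wonderful."),
]
for example in examples:
    print(example)
```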

## Requirements

- **Python**: version 3.9 or higher (check yours with `python --version`)
- **RAM**: 8 GB or more, enough to load the model, store intermediate data, and handle small-to-medium texts
- **Storage**: ~2-3 GB free for model download(s)
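If you prefer to check the Python requirement from inside the notebook, a minimal sketch such as the following works. The helper name `python_is_supported` is an illustrative choice.

```python
import sys

MIN_VERSION = (3, 9)  # minimum version listed in the requirements


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


print("Python", sys.version.split()[0], "supported:", python_is_supported())
```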

## Steps

### Step 1. Clone the GitHub Repository

To get hands-on and run this project, clone the GitHub repository by using 
[https://github.com/IBM/ibmdotcom-tutorials](https://github.com/IBM/ibmdotcom-tutorials) as the HTTPS
URL. For detailed steps on how to clone a repository, refer to the 
[GitHub documentation](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository). You can find this specific tutorial inside the ibmdotcom-tutorials repo
under the [generative AI directory](https://github.com/IBM/ibmdotcom-tutorials/tree/main/generative-ai).

### Step 2. Set up your environment

This tutorial uses a Jupyter Notebook to demonstrate abstractive text summarization with pretrained
transformer models from Hugging Face. Jupyter Notebooks are versatile tools that allow you to combine
code, text, and visualization in a single environment. You can run this notebook in your local IDE or
explore cloud-based options like [watsonx.ai Runtime](https://cloud.ibm.com/catalog/services/watsonxai-runtime), which provides a managed environment for running Jupyter Notebooks.

### Step 3. Install dependencies for abstractive text summarization

Before we run our abstractive summarization example, we need to install a few Python libraries from
Hugging Face and PyTorch. These libraries provide the tools and pretrained models needed to process
text, run neural networks, and generate summaries.


```python
!pip install -q transformers torch sentencepiece
print("Dependencies installed!")
```

- **transformers**:  Hugging Face's library that provides access to pretrained models like BART, T5, and many more, along with easy-to-use pipelines for tasks such as summarization.
- **torch**: The underlying deep-learning framework (PyTorch) used to run these models efficiently on CPUs or GPUs. 
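- **sentencepiece**: A tokenization library required by some transformer models (for example, T5 and PEGASUS) to split text into tokens the model can process. BART uses its own byte-level BPE tokenizer, so this package matters mainly if you try other models.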
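To confirm that the packages are visible to the notebook kernel without importing them fully, you can use a small check like the one below. This is a minimal sketch using only the standard library, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["transformers", "torch", "sentencepiece"])
print("Missing packages:", missing or "none")
```

If any package is reported as missing, rerun the install cell and restart the kernel.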

### Step 4. Import the pipeline function

The `pipeline` function from Hugging Face's Transformers library is a ready-to-use interface for running
machine learning models on common tasks. For abstractive summarization, it automatically loads the proper
tokenizer and pretrained model for summarizing text and handles all the steps from preprocessing to
output generation. This allows us to generate summaries with just a few lines of code.



```python
from transformers import pipeline
```

### Step 5. Load summarization pipeline

This step sets up the summarization pipeline using the pretrained BART model. 

```python
summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn"
)
```

The `summarizer` object now contains the tokenizer, model, and all necessary post-processing so you can
input text and get a summary. Note that this cell may take a few minutes to download the model. 

Other summarization models from the Hugging Face Hub can be used by changing the `model`
argument. Different models may vary in speed, summary length, and output style.
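As a sketch of how you might switch between models, the snippet below keeps a small lookup table of checkpoint names and loads one on demand. The alternative checkpoint names are assumptions about what is available on the Hugging Face Hub and should be verified before use. The helper `load_summarizer` is hypothetical.

```python
# Checkpoint names assumed to be available on the Hugging Face Hub.
CANDIDATE_MODELS = {
    "bart": "facebook/bart-large-cnn",              # the model used in this tutorial
    "distilbart": "sshleifer/distilbart-cnn-12-6",  # assumed: smaller distilled BART
    "pegasus": "google/pegasus-xsum",               # assumed: PEGASUS fine-tuned on XSum
}


def load_summarizer(name: str = "bart"):
    """Build a summarization pipeline for one of the candidate checkpoints."""
    # Imported lazily so this cell can be defined before the model is needed.
    from transformers import pipeline

    return pipeline("summarization", model=CANDIDATE_MODELS[name])
```

Calling `load_summarizer("distilbart")` would download that checkpoint on first use, so expect the same kind of delay as with the BART model.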

### Step 6. Prepare your text

Let's start with a short passage about abstractive summarization itself. You can replace this with any
text you want, such as an article excerpt, a blog post, or your own notes. For best results, use clean,
well-formed prose.

This variable `text` will be passed into the summarization pipeline in the next step.

```python
text = """
Abstractive summarization is a technique in natural language processing
that generates concise summaries by rephrasing the original content.
Rather than copying sentences directly, the model creates new text
that captures the core meaning of the source.
"""
```

### Step 7. Generate the summary

In this step, we pass our prepared text (`text`) into the `summarizer` pipeline. The model reads the
input text, identifies the main points, and generates a shorter version in its own words.

The `max_length` and `min_length` parameters control how long the generated summary can be, allowing
us to balance brevity and completeness. Setting `do_sample=False` disables randomness during generation,
ensuring the model produces the same summary each time the cell is run. 

The summarization pipeline always returns a list of results so it can handle multiple input texts at
once. Since we provide a single input here, we extract the first result from the list and print the 
generated text, the summary.

```python
summary = summarizer(
    text,
    max_length=50,    # maximum number of tokens in the generated summary
    min_length=20,    # minimum number of tokens in the generated summary
    do_sample=False,  # False: deterministic output; True: sampling, so output varies between runs
)

print(summary[0]["summary_text"])
```
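Because the pipeline returns a list with one dictionary per input, a small helper can collect the summaries when you pass several texts at once. The sketch below uses hypothetical example results to show the shape of the output without running the model. `extract_summaries` is an illustrative helper name.

```python
def extract_summaries(results):
    """Pull the generated text out of each result dictionary."""
    return [item["summary_text"] for item in results]


# Hypothetical pipeline output for two inputs, used only to show the result shape.
example_results = [
    {"summary_text": "First summary."},
    {"summary_text": "Second summary."},
]
print(extract_summaries(example_results))

# With the real pipeline, the call would look like this (text_a and text_b are placeholders):
# results = summarizer([text_a, text_b], max_length=50, min_length=20, do_sample=False)
# print(extract_summaries(results))
```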

### Example abstractive summary

Below is an example of the kind of output you might see after running the summarization pipeline on the
original text:

```text
The model creates new text that captures the core meaning of the source. It generates concise summaries by rephrasing the original content.
```

This summary illustrates abstractive summarization because the model does not return the input sentences
unchanged. Instead, it condenses the original content and recombines its ideas into new sentences.
Much of the wording is reused from the source, but the sentence structure is new and the meaning is preserved.

Try modifying the input text or adjusting the parameters to see how the summary changes. You can also
experiment with longer paragraphs or different writing styles to observe how the model adapts.

The `do_sample` parameter controls whether the model introduces randomness during text generation: 

- `do_sample = True`: The model samples from multiple possible tokens at each step, introducing randomness. This can produce different summaries each time you run it, while usually preserving the core meaning of the text.
- `do_sample = False`: The model decodes deterministically, choosing the most likely continuation at each step (greedy or beam search, depending on the model's generation settings). Running the same input with the same parameters will produce the same summary.
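The difference between the two settings can be illustrated at the level of a single decoding step. The probabilities below are hypothetical, and the sketch covers only the simplest deterministic strategy (always taking the most likely token). It is not how the Transformers library implements generation internally.

```python
import random

# Hypothetical probabilities for the next token at one decoding step.
next_token_probs = {"summary": 0.6, "text": 0.3, "document": 0.1}


def pick_greedy(probs):
    """Deterministic choice: always take the most likely token."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Stochastic choice: draw a token in proportion to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


rng = random.Random()  # unseeded, so sampled picks can change between runs
print("greedy: ", [pick_greedy(next_token_probs) for _ in range(5)])
print("sampled:", [pick_sampled(next_token_probs, rng) for _ in range(5)])
```

The greedy line is identical on every run, while the sampled line can differ, which mirrors the behavior of the two `do_sample` settings.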

For example, below is one output generated from the same input text, but with `do_sample=True`. Because sampling is random, your output will likely differ:

```text
Rephrasing is a technique in natural language processing that generates concise summaries. Rather than copying sentences directly, the model creates new text that captures the core meaning of the source.
```

This tradeoff between consistency (`do_sample=False`) and variability (`do_sample=True`) is common in text generation tasks.

## Practical limitations of abstractive summarization 

While models like BART or T5 generate fluent and concise summaries, they have some practical limitations:

- Factual consistency and hallucination: Sometimes, models generate content that sounds plausible but isn't supported by the original text. This phenomenon, called hallucination, is especially important in sensitive domains like medicine, finance, or law.
- Context length: Summaries may miss important details when the input text is very long.
- Domain specificity: Models trained on general datasets may struggle with specialized texts. 
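For the context-length limitation, one common workaround is to split a long document into overlapping chunks, summarize each chunk, and join the partial summaries. The sketch below is an assumption-laden illustration: it counts words rather than model tokens (a rough proxy, since model limits are defined in tokens), and `chunk_words` and `summarize_long` are hypothetical helpers. Any function that maps a string to a string can be passed as `summarize_fn`.

```python
def chunk_words(text: str, max_words: int = 400, overlap: int = 50):
    """Split text into overlapping chunks of at most max_words words."""
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # the last chunk reached the end
            break
    return chunks


def summarize_long(text, summarize_fn, **chunk_kwargs):
    """Summarize each chunk with summarize_fn (str -> str) and join the results."""
    return " ".join(summarize_fn(chunk) for chunk in chunk_words(text, **chunk_kwargs))


# Usage with a stand-in summarizer that keeps the first word of each chunk.
numbers = " ".join(str(i) for i in range(10))
print(chunk_words(numbers, max_words=4, overlap=1))
print(summarize_long(numbers, lambda chunk: chunk.split()[0], max_words=4, overlap=1))
```

With the pipeline from this tutorial, `summarize_fn` could be a small wrapper that calls `summarizer` and returns the `summary_text` field. Chunking can still lose information that spans chunk boundaries, so treat it as a mitigation rather than a fix.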

Research has proposed methods to address these issues. For example, Zhang et al. (2020) developed methods to measure and optimize factual correctness, making summaries more reliable, particularly in domains like radiology reports.<sup>7</sup>

Comparing generated outputs to a reference summary is essential for assessing quality, and automatic evaluation metrics make this comparison repeatable. Common metrics include ROUGE (which measures n-gram overlap between generated and reference summaries), BLEU (originally developed for machine translation), and METEOR (which considers synonymy and stemming). These metrics provide quantitative estimates of how well the generated summary preserves content and meaning.
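To show what n-gram overlap means in practice, here is a minimal sketch of a ROUGE-1 style score (unigram overlap). It assumes lowercased whitespace tokenization and no stemming, so its numbers will not exactly match established ROUGE implementations. `rouge_1` is an illustrative function name.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 between two summaries."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1(
    "the model creates new text",
    "the model generates new sentences",
)
print(scores)  # three of five words overlap, so precision, recall, and F1 are about 0.6
```

Overlap metrics reward shared wording, so a faithful paraphrase can score lower than a near copy. This is one reason they are usually combined with human review for abstractive summaries.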

## Conclusion

In this notebook, we explored abstractive text summarization and how modern transformer-based models can
generate concise, human-like summaries by understanding and rephrasing the original text. Using Hugging 
Face's `pipeline` API and a pretrained BART model, we were able to move from raw input text to a
meaningful summary with just a few lines of code.

Unlike extractive summarization, which selects and reuses existing sentences, abstractive summarization
creates new text that captures the core ideas of the source. This makes it more flexible and
natural-sounding, but also more computationally complex. By working through each step you've seen how 
these systems work in practice. 

If you're interested in exploring other approaches to text summarization, particularly extractive
methods, check out this [Python text summarization tutorial](https://www.ibm.com/think/tutorials/text-summarization-python) that covers classic techniques such as Luhn, LexRank, and Latent Semantic
Analysis (LSA). Comparing extractive and abstractive approaches side by side can help deepen your
understanding of when and why each method is used.

## Footnotes

[1] Angeli, Gabor, Melvin Jose Johnson Premkumar, and Christopher D. Manning. "Leveraging linguistic structure for open domain information extraction." In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 344-354. 2015. https://doi.org/10.3115/v1/P15-1034. 

[2] Nallapati, Ramesh, Feifei Zhai, and Bowen Zhou. "SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents." In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pp. 3075-3081. 2017. https://doi.org/10.1609/aaai.v31i1.10958.

[3] Jiang, Xinyu, Bowen Zhang, Yunming Ye, and Zhenhua Liu. "A hierarchical model with recurrent convolutional neural networks for sequential sentence classification." In CCF International conference on natural language processing and Chinese computing, pp. 78-89. Cham: Springer International Publishing, 2019.

[4] Venkataramana, Attada, K. Srividya, and R. Cristin. "Abstractive text summarization using bart." In 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon), pp. 1-6. IEEE, 2022. https://doi.org/10.1109/MysuruCon55714.2022.9972639. 

[5] Raffel, Colin, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. "Exploring the limits of transfer learning with a unified text-to-text transformer." Journal of machine learning research 21, no. 140 (2020): 1-67. https://arxiv.org/abs/1910.10683. 

[6] Zhang, Jingqing, Yao Zhao, Mohammad Saleh, and Peter Liu. "Pegasus: Pre-training with extracted gap-sentences for abstractive summarization." In International conference on machine learning, pp. 11328-11339. PMLR, 2020.

[7] Zhang, Yuhao, Derek Merck, Emily Tsai, Christopher D. Manning, and Curtis Langlotz. "Optimizing the factual correctness of a summary: A study of summarizing radiology reports." In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5108-5120. 2020. https://doi.org/10.18653/v1/2020.acl-main.458.
 
