<div class="alert alert-success"><h1>Text Summarization with Pretrained Models in Python</h1></div>

**Text summarization** is a natural language processing (NLP) task that condenses a long piece of text into a shorter version while preserving its key information. There are two main approaches. **Extractive summarization** selects and combines key sentences from the original text, whereas **abstractive summarization** generates new sentences that capture the main ideas in a more natural and coherent manner. In this tutorial, we focus on abstractive summarization using pretrained models from Hugging Face.
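
To make the extractive approach concrete, here is a toy sketch in plain Python. The `extractive_summary` helper and its word-frequency scoring are assumptions chosen for illustration, not a method used by the pretrained models in this tutorial. Note that it can only return sentences copied verbatim from the input, whereas an abstractive model is free to write new ones.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=1):
    """Toy extractive summarizer: keep the highest-scoring sentences verbatim."""
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    # Word frequencies over the whole text (lowercased).
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # A sentence's score is the summed frequency of its words.
        return sum(freqs[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    # Rank sentences by score, then restore the original order of the winners.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


doc = (
    "Cats sleep a lot. "
    "Cats also hunt mice, and cats groom themselves. "
    "Dogs bark."
)
print(extractive_summary(doc, num_sentences=1))
```

Here the second sentence scores highest because it repeats the most frequent word, so it is returned unchanged. Summing frequencies favors long sentences, which is one reason practical extractive methods use more careful scoring.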

## Learning Objectives
By the end of this tutorial, you will be able to:
+ **Implement a text summarization pipeline:** Build and run a summarization pipeline that generates concise summaries by rephrasing the input text.
+ **Analyze the summarization output:** Understand how to adjust summary length and interpret the generated summary.


## Prerequisites
Before we begin, please ensure that you have:
+ A working knowledge of Python, including variables, functions, and basic object-oriented programming.
+ Familiarity with deep learning model development in Python using Keras and TensorFlow.
+ A Python (version 3.x) environment with the `tensorflow`, `keras`, `ipywidgets`, and `transformers` packages installed.

Let's also reduce the log verbosity of the `transformers` package so that it reports only errors and suppresses warnings and informational messages.

In [None]:
from transformers import logging
logging.set_verbosity_error()

<hr>

## 1. Instantiate a Pipeline for Text Summarization
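First, we import the `pipeline` function from the Hugging Face `transformers` package. We then instantiate a pipeline object called `summarizer`, specifying `"summarization"` as the task. Because we do not name a model, the pipeline loads the task's default model, downloading its weights the first time it runs.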

In [None]:
from transformers import pipeline
summarizer = pipeline(task = "summarization")

## 2. Run Text Summarization on Sample Text
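Now we provide sample text to summarize: an extended passage (approximately 300 words) covering various aspects of artificial intelligence. We then generate a summary with our pipeline, setting several generation parameters. `max_length` and `min_length` bound the length of the summary, measured in tokens rather than words. Setting `do_sample` to `True` makes the model sample each next token instead of always taking the most likely one, `top_k` restricts sampling to the 50 most likely tokens, and `top_p` further restricts it to the smallest set of tokens whose cumulative probability reaches 0.95. Sampling makes the output more varied, so the summary can change from run to run.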

In [None]:
sample_text = """
Artificial intelligence and machine learning have become fundamental components of modern technology, 
revolutionizing the way industries operate and innovate. In recent years, these technologies have 
driven transformative changes in sectors ranging from healthcare and finance to transportation and education. 
For example, machine learning algorithms are now used to predict patient outcomes, optimize financial 
portfolios, and manage traffic flows in smart cities. As organizations increasingly adopt these 
technologies, they are able to leverage large amounts of data to derive insights and make informed decisions. 

Recent advancements in deep learning have further accelerated progress in areas such as computer vision, 
natural language processing, and autonomous vehicles. Deep learning models, which mimic the human brain's 
neural networks, can process complex patterns in data, enabling breakthroughs in image recognition, 
speech synthesis, and natural language understanding. These improvements have led to significant developments 
in self-driving cars, facial recognition systems, and virtual assistants, technologies that are rapidly 
becoming integral parts of everyday life.

Moreover, AI-powered automation is reshaping the workforce by streamlining operations and reducing the need 
for manual intervention. Companies are investing in intelligent systems that can learn, adapt, and improve over time, 
resulting in increased productivity and cost savings. However, the rapid adoption of artificial intelligence 
also brings challenges, including concerns about data privacy, ethical considerations, and potential 
job displacement. Policymakers, industry leaders, and researchers must collaborate to ensure that these 
technologies are developed and deployed responsibly.

Ongoing research in artificial intelligence promises even greater advancements. By embracing innovation 
and addressing ethical and societal concerns, we can harness the power of AI and machine learning to drive 
sustainable progress and improve the quality of life for people around the world. Furthermore, as advancements 
in technology continue to evolve, the integration of AI into everyday applications is expected to increase 
exponentially, creating opportunities for innovative solutions across diverse sectors.
"""

summary = summarizer(
    sample_text,
    max_length = 200,
    min_length = 50,
    do_sample = True,
    top_k = 50,
    top_p = 0.95
)
print("Summary:")
print(summary[0]['summary_text'])

The printed summary gives a condensed overview of the main ideas in the original text. Feel free to adjust the generation parameters to see how they affect the output.
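
To make the effect of `top_k` and `top_p` concrete, here is a simplified sketch in plain Python of how the two filters narrow the set of candidate tokens before one is sampled. The `filter_top_k_top_p` helper and the toy probabilities are hypothetical, made up for illustration; the library itself works on model logits over the full vocabulary and may differ in detail.

```python
import random


def filter_top_k_top_p(probs, top_k=50, top_p=0.95):
    """Return the renormalized distribution left after top-k, then top-p filtering.

    `probs` maps each candidate token to its probability (a toy stand-in for
    the distribution a real model produces over its whole vocabulary).
    """
    # Sort candidates from most to least likely and keep the top k.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # Renormalize so the top-k survivors sum to 1 before applying top-p.
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]
    # top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize once more so the final probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy_probs = {"learning": 0.5, "models": 0.3, "data": 0.15, "banana": 0.05}
filtered = filter_top_k_top_p(toy_probs, top_k=3, top_p=0.75)
print(filtered)

# Sample the next token from what survived, as sampling-based generation would.
random.seed(0)
next_token = random.choices(list(filtered), weights=list(filtered.values()), k=1)[0]
print(next_token)
```

With `top_k = 3` the unlikely token `banana` is dropped, and with `top_p = 0.75` only `learning` and `models` remain as candidates. Lower values make the output more conservative, while higher values allow more variety.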

## 3. Load a Specific Model from the Model Hub
The previous example used the task's default model. To use a model of our own choosing, we simply pass its name when instantiating the pipeline. This time, we'll use the `"facebook/bart-large-cnn"` model, a BART model fine-tuned on the CNN/DailyMail dataset of news articles paired with their summaries, which makes it a common choice for summarization. BART employs an encoder-decoder architecture: the encoder reads the source text, and the decoder generates the summary token by token, which lets it rephrase the source rather than simply copy from it.

In [None]:
model_name = "facebook/bart-large-cnn"
summarizer_ = pipeline(task = "summarization", model = model_name)

Let's pass the same sample text to this new pipeline and see how it summarizes it.

In [None]:
summary = summarizer_(
    sample_text,
    max_length = 200,
    min_length = 50,
    do_sample = True,
    top_k = 50,
    top_p = 0.95
)
print("Summary:")
print(summary[0]['summary_text'])

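The two summaries will most likely differ. Part of the difference comes from the models themselves and part from sampling, which introduces randomness on every run. It's good practice to try several models for a given task, so feel free to select another model from the Model Hub and compare the results.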
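
When comparing summaries, a few simple numbers can complement reading them side by side. The sketch below is one possible way to do this; the `summary_stats` helper and the short stand-in strings are hypothetical and not part of the `transformers` API. It reports word counts, the compression ratio, and the share of summary words that never appear in the source, a rough signal of how much the model rephrased rather than copied.

```python
import re


def summary_stats(source, summary):
    """Word counts, compression ratio, and share of words new to the source."""
    source_tokens = re.findall(r"[a-z0-9']+", source.lower())
    summary_tokens = re.findall(r"[a-z0-9']+", summary.lower())
    source_vocab = set(source_tokens)
    # Words in the summary that never occur in the source text.
    novel = [w for w in summary_tokens if w not in source_vocab]
    return {
        "source_words": len(source_tokens),
        "summary_words": len(summary_tokens),
        "compression": len(summary_tokens) / len(source_tokens) if source_tokens else 0.0,
        "novel_word_share": len(novel) / len(summary_tokens) if summary_tokens else 0.0,
    }


# Stand-in strings; in the notebook you would pass sample_text and the
# summary[0]['summary_text'] value produced by each pipeline instead.
source = "AI is changing healthcare, finance, and education. It also raises privacy concerns."
summary_a = "AI is changing healthcare, finance, and education."
summary_b = "AI transforms many industries but raises privacy concerns."

stats_a = summary_stats(source, summary_a)
stats_b = summary_stats(source, summary_b)
print("A:", stats_a)
print("B:", stats_b)
```

In this toy case, summary A copies a source sentence and so contains no new words, while half of the words in summary B are new. Keep in mind that these counts are in words, whereas `max_length` and `min_length` are measured in tokens.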

<div class="alert alert-info"><b>Note:</b> For guidance on how to choose the right pretrained model for a specific task from the Hugging Face Model Hub, watch the course video titled <b>"Choosing the right Model from the Hugging Face Hub"</b>.</div>