# Building abstractive text summaries

Text summarization falls into two primary categories: extractive and abstractive.

Extractive summarization selects passages from the text and joins them together to form a summary. It is commonly backed by graph algorithms such as TextRank, which find the sentences that have the most in common with the rest of the text. These summaries can be highly effective, but they cannot transform the text and have no contextual understanding.
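
To make the extractive approach concrete, the sketch below ranks sentences with a simplified, TextRank-style iteration. It is not part of `txtai` and not a reference implementation of TextRank: the helper names (`split_sentences`, `similarity`, `extractive_summary`), the naive sentence splitter, and the word-overlap similarity are all assumptions chosen for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word-overlap similarity between two sentences, normalized by their lengths."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        # Avoid dividing by log(1) + log(1) == 0 for one-word sentences
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def extractive_summary(text, top_n=2, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration and keep the top_n in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return " ".join(sentences)

    # Weighted graph: the edge weight is the similarity between two distinct sentences
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    totals = [sum(row) for row in weights]

    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / totals[j] * scores[j] for j in range(n) if totals[j] > 0
            )
            for i in range(n)
        ]

    # Select the highest scoring sentences, then restore their original order
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return " ".join(sentences[i] for i in sorted(ranked))


sample = (
    "Search engines index documents. "
    "Search engines rank documents by relevance. "
    "Bananas are yellow. "
    "Users query search engines to find documents."
)
print(extractive_summary(sample, top_n=2))
```

Every sentence in the result is copied verbatim from the input, which is exactly the limitation described above: the method can choose text but cannot rephrase it.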

Abstractive summarization uses Natural Language Processing (NLP) models to build transformative summaries of text. This is similar to having a person read an article and then asking what it was about. A person wouldn't just give a verbatim reading of the text. This notebook shows how blocks of text can be summarized using an abstractive summarization pipeline.

# Install dependencies

Install `txtai` and all dependencies. Since this notebook uses optional pipelines, we need to install the pipeline extras package.

In [2]:
%%capture
!pip install git+https://github.com/neuml/txtai#egg=txtai[pipeline]

# Create a Summary instance

The `Summary` instance is the main entry point for text summarization. It is a lightweight wrapper around the summarization pipeline in Hugging Face Transformers.

Beyond the default model, other summarization models can be found on the [Hugging Face model hub](https://huggingface.co/models?pipeline_tag=summarization).
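
As a minimal sketch of using one of those models, the helper below passes a model path to `Summary`. The helper name `create_summary` is hypothetical, and `sshleifer/distilbart-cnn-12-6` is only an illustrative choice of model; this assumes `Summary` accepts a hub model path as its first argument, so check the txtai documentation for your installed version.

```python
def create_summary(path="sshleifer/distilbart-cnn-12-6"):
    """Build a Summary pipeline backed by a specific hub model (illustrative default)."""
    # Deferred import: txtai is only required when the helper is actually called
    from txtai.pipeline import Summary

    # Assumption: the first argument is the model path on the Hugging Face hub
    return Summary(path)


# Example usage (downloads the model on first call):
# summary = create_summary()
# summary("Enter long, detailed text to summarize here", maxlength=10)
```

Different models trade off speed, size, and summary quality, so it can be worth trying more than one on a sample of your own text.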


In [3]:
%%capture

from txtai.pipeline import Summary

# Create summary model
summary = Summary()

# Summarize text

The example below shows how a large block of text can be distilled down into a smaller summary.

In [4]:
text = ("Search is the base of many applications. Once data starts to pile up, users want to be able to find it. It’s the foundation "
       "of the internet and an ever-growing challenge that is never solved or done. The field of Natural Language Processing (NLP) is "
       "rapidly evolving with a number of new developments. Large-scale general language models are an exciting new capability "
       "allowing us to add amazing functionality quickly with limited compute and people. Innovation continues with new models "
       "and advancements coming in at what seems a weekly basis. This article introduces txtai, an AI-powered search engine "
       "that enables Natural Language Understanding (NLU) based search in any application."
)

summary(text, maxlength=10)

'Search is the foundation of the internet'

The next example applies the same pipeline to a shortened passage from our own dataset.

In [14]:
text = ("""In recent years, people are seeking for a solution to improve text
summarization for Thai language. Although several solutions such
as PageRank, Graph Rank, Latent Semantic Analysis (LSA)
models, etc., have been proposed, research results in Thai text
summarization were restricted due to limited corpus in Thai
language with complex grammar. This paper applied a text
summarization system for Thai travel news based on keyword
scored in Thai language by extracting the most relevant sentences
from the original document. We compared LSA and Non-negative
Matrix Factorization (NMF) to find the algorithm that is suitable
with Thai travel news. The suitable compression rates for Generic
Sentence Relevance score (GRS) and K-means clustering were also
evaluated. From these experiments, we concluded that keyword
scored calculation by LSA with sentence selection by GRS is the
best algorithm for summarizing Thai Travel News, compared with
human with the best compression rate of 20%."""
)

summary(text, minlength=30, maxlength=50)

'A paper applied a text-summarization system for Thai travel news based on keywordscored in Thai language by extracting the most relevant sentences from the original document. We compared LSA and NMF to find the algorithm that is suitable'