#### Text summarization is the technique for generating a concise and precise summary of voluminous texts while focusing on the sections that convey useful information, and without losing the overall meaning.
There are two approaches to summarizing texts in NLP: extraction and abstraction.
Extraction-based summarization:
    In extraction-based summarization, a subset of words that represent the most important points is pulled from a piece of text and combined to make a summary. Think of it as a highlighter that selects the main information from a source text.
Abstraction-based summarization:
    In abstraction-based summarization, advanced deep learning techniques are applied to paraphrase and shorten the original document, just like humans do. Think of it as a pen that produces novel sentences that may not be part of the source document.

Here I take a short passage from a data science blog post on feature engineering and summarize it with the extraction-based approach. Let's get started:

In [7]:
## input text article
article_text="Feature engineering is a machine learning technique that leverages data to create new variables that aren't in the training set. It can produce new features for both supervised and unsupervised learning, with the goal of simplifying and speeding up data transformations while also enhancing model accuracy"

## Import Modules

In [8]:
import re
import nltk

## Data Preprocessing

In [9]:
article_text = article_text.lower()
article_text

"feature engineering is a machine learning technique that leverages data to create new variables that aren't in the training set. it can produce new features for both supervised and unsupervised learning, with the goal of simplifying and speeding up data transformations while also enhancing model accuracy"

In [10]:
# replace every character that is not a letter with a space, then collapse repeated whitespace
clean_text = re.sub(r'[^a-zA-Z]', ' ', article_text)
clean_text = re.sub(r'\s+', ' ', clean_text)
clean_text

'feature engineering is a machine learning technique that leverages data to create new variables that aren t in the training set it can produce new features for both supervised and unsupervised learning with the goal of simplifying and speeding up data transformations while also enhancing model accuracy'
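The cleaning step above can be wrapped in a small helper so it is easy to reuse on other articles. This is a minimal sketch, not part of the original notebook: the function name `clean_for_counting` and the sample string are assumptions, and it also strips leading and trailing spaces, which the cell above does not do.

```python
import re

def clean_for_counting(text: str) -> str:
    """Lowercase the text, replace non-letters with spaces, collapse whitespace."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z]", " ", lowered)
    # strip() is an addition for convenience; the notebook cell keeps edge spaces
    return re.sub(r"\s+", " ", letters_only).strip()

# hypothetical sample text, used only for illustration
sample = "Feature engineering isn't magic: it takes 2 steps!"
cleaned = clean_for_counting(sample)
print(cleaned)
```

Note that the apostrophe in "isn't" becomes a space, just as "aren't" turned into "aren t" in the output above. The leftover fragments are harmless here only if the stopword list filters them out later.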

In [11]:
# split into sentence list
sentence_list = nltk.sent_tokenize(article_text)
sentence_list

["feature engineering is a machine learning technique that leverages data to create new variables that aren't in the training set.",
 'it can produce new features for both supervised and unsupervised learning, with the goal of simplifying and speeding up data transformations while also enhancing model accuracy']

In [22]:
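## run this cell once to download the NLTK data used in this notebook
# import nltk
# nltk.download('stopwords')  # stopword list used for the word frequencies
# nltk.download('punkt')      # tokenizer models used by sent_tokenize / word_tokenize; newer NLTK releases may ask for 'punkt_tab' instead
# spaCy could be used instead of NLTK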

In [13]:
# install NLTK only if it is missing; it was already imported in an earlier cell
!pip install nltk
import nltk

## Word Frequencies

In [14]:
stopwords = nltk.corpus.stopwords.words('english')

word_frequencies = {}
for word in nltk.word_tokenize(clean_text):
    if word not in stopwords:
        if word not in word_frequencies:
            word_frequencies[word] = 1
        else:
            word_frequencies[word] += 1

In [15]:
maximum_frequency = max(word_frequencies.values())

for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / maximum_frequency
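The two cells above count each non-stopword and then divide by the largest count. The same idea can be sketched more compactly with `collections.Counter`. This is an illustrative sketch under assumptions: `normalized_frequencies`, `DEMO_STOPWORDS`, and the sample sentence are hypothetical names and data, and the tiny stopword set stands in for NLTK's English list.

```python
from collections import Counter

# Hypothetical, tiny stopword set for illustration only;
# the notebook uses nltk.corpus.stopwords.words('english').
DEMO_STOPWORDS = {"is", "a", "that", "to", "in", "the", "it", "and"}

def normalized_frequencies(words, stopwords):
    """Count non-stopwords and scale every count by the largest count."""
    counts = Counter(w for w in words if w not in stopwords)
    if not counts:
        return {}  # avoid max() on an empty sequence
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}

demo_words = "data is a tool and data feeds the model".split()
demo_freqs = normalized_frequencies(demo_words, DEMO_STOPWORDS)
print(demo_freqs)
```

Here "data" occurs twice and every other kept word once, so "data" gets 1.0 and the rest get 0.5, mirroring the 1.0 and 0.5 values that appear for the article.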

## Calculate Sentence Scores

In [16]:
sentence_scores = {}

for sentence in sentence_list:
    for word in nltk.word_tokenize(sentence):
        if word in word_frequencies and len(sentence.split(' ')) < 30:
            if sentence not in sentence_scores:
                sentence_scores[sentence] = word_frequencies[word]
            else:
                sentence_scores[sentence] += word_frequencies[word]
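The scoring rule above adds up the normalized frequency of every known word in a sentence and ignores sentences of 30 words or more. A rough standalone sketch of that rule follows. It is not the notebook's code: `score_sentences` and the toy data are assumptions, and a simple regular expression replaces `nltk.word_tokenize` so the example runs without NLTK data.

```python
import re

def score_sentences(sentences, word_frequencies, max_words=30):
    """Sum the frequency weights of the words in each short-enough sentence."""
    scores = {}
    for sentence in sentences:
        if len(sentence.split()) >= max_words:
            continue  # long sentences are skipped, as in the cell above
        # regex tokenizer used here as a stand-in for nltk.word_tokenize
        words = re.findall(r"[a-z]+", sentence.lower())
        score = sum(word_frequencies.get(w, 0.0) for w in words)
        if score > 0:
            scores[sentence] = score  # sentences with no known word get no entry
    return scores

# hypothetical toy data for illustration
toy_freqs = {"data": 1.0, "model": 0.5, "tool": 0.5}
toy_sentences = ["Data feeds the model.", "The tool is new.", "Nothing relevant here."]
toy_scores = score_sentences(toy_sentences, toy_freqs)
print(toy_scores)
```

Because the score is a plain sum, longer sentences tend to score higher. Dividing by the sentence length is one possible variation if that bias is a concern.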

In [19]:
word_frequencies

{'feature': 0.5,
 'engineering': 0.5,
 'machine': 0.5,
 'learning': 1.0,
 'technique': 0.5,
 'leverages': 0.5,
 'data': 1.0,
 'create': 0.5,
 'new': 1.0,
 'variables': 0.5,
 'training': 0.5,
 'set': 0.5,
 'produce': 0.5,
 'features': 0.5,
 'supervised': 0.5,
 'unsupervised': 0.5,
 'goal': 0.5,
 'simplifying': 0.5,
 'speeding': 0.5,
 'transformations': 0.5,
 'also': 0.5,
 'enhancing': 0.5,
 'model': 0.5,
 'accuracy': 0.5}

In [20]:
sentence_scores

{"feature engineering is a machine learning technique that leverages data to create new variables that aren't in the training set.": 7.5,
 'it can produce new features for both supervised and unsupervised learning, with the goal of simplifying and speeding up data transformations while also enhancing model accuracy': 9.0}

## Text Summarization

In [21]:
# take up to 5 top-scoring sentences; this article has only 2, so both are returned, highest score first
import heapq
summary = heapq.nlargest(5, sentence_scores, key=sentence_scores.get)

print(" ".join(summary))

it can produce new features for both supervised and unsupervised learning, with the goal of simplifying and speeding up data transformations while also enhancing model accuracy feature engineering is a machine learning technique that leverages data to create new variables that aren't in the training set.
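Joining the sentences in score order, as the cell above does, puts the second sentence of the article before the first. One possible variation keeps the top-scoring sentences but prints them in their original order. This is a sketch under assumptions: `summarize_in_order` and the toy data are hypothetical and not part of the original notebook.

```python
import heapq

def summarize_in_order(sentence_list, sentence_scores, n=5):
    """Pick the n highest-scoring sentences, then emit them in document order."""
    top = set(heapq.nlargest(n, sentence_scores, key=sentence_scores.get))
    return " ".join(s for s in sentence_list if s in top)

# hypothetical toy data for illustration
toy_order = ["First point.", "Second point.", "Third point."]
toy_weights = {"First point.": 1.0, "Second point.": 3.0, "Third point.": 2.0}
ordered_summary = summarize_in_order(toy_order, toy_weights, n=2)
print(ordered_summary)
```

With `n=2` the lowest-scoring sentence is dropped and the remaining two appear in the order they had in the source text.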
