# **Text Summarization using NLP**


**What is text summarization?**

Text summarization is the process of distilling the most important information from a source text.

**Why automatic text summarization?**



1.   Summaries reduce reading time.
2.   When researching documents, summaries make the selection process easier.
3.   Automatic summarization improves the effectiveness of indexing.
4.   Automatic summarization algorithms are less biased than human summarizers.
5.   Personalized summaries are useful in question-answering systems as they provide personalized information.
6.   Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of text documents they are able to process.





# **Types of summarization**




**How to do text summarization**


*   Text cleaning
*   Sentence tokenization
*   Word tokenization
*   Word-frequency table
*   Summarization 
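The five steps above can be sketched end to end with only the standard library. This is a simplified illustration, not the spaCy pipeline used in the rest of this notebook: the regex-based sentence and word splitting, the tiny `DEMO_STOPWORDS` set, and the `summarize` function name are all assumptions made for the example.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stop-word list (an assumption; spaCy's STOP_WORDS is far larger).
DEMO_STOPWORDS = {"a", "an", "and", "the", "of", "is", "in", "it", "to"}


def words(sentence):
    # 3. Word tokenization: lowercase alphabetic runs only (a crude stand-in for spaCy).
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(text, ratio=0.3):
    # 1. Text cleaning: collapse runs of whitespace into single spaces.
    cleaned = " ".join(text.split())
    # 2. Sentence tokenization: naive split after '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]
    # 4. Word-frequency table, normalized by the most frequent word.
    counts = Counter(
        w for s in sentences for w in words(s) if w not in DEMO_STOPWORDS
    )
    if not counts:
        return ""
    top = max(counts.values())
    weights = {w: c / top for w, c in counts.items()}
    # 5. Summarization: score each sentence and keep the highest-scoring ones.
    scores = {s: sum(weights.get(w, 0.0) for w in words(s)) for s in sentences}
    k = max(1, int(len(sentences) * ratio))
    return " ".join(nlargest(k, scores, key=scores.get))


demo = "Water is vital. Water covers most of the planet. Cats sleep a lot."
print(summarize(demo))
```

The naive sentence splitter needs whitespace after the period, so it is noticeably weaker than a trained sentence segmenter; it is only meant to make the order of the steps concrete.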
 
 

  **Text variable**








In [30]:
text = """
Water, a cornerstone of life, holds unparalleled significance in sustaining ecosystems and supporting human existence.
Constituting about 71% of the Earth's surface, water exists in oceans, rivers, lakes, and underground reservoirs. 
Despite its abundance, only a small fraction, roughly 2.5%, is freshwater, with the majority locked in glaciers or polar
ice caps.The water cycle, an intricate dance of evaporation, condensation, and precipitation, governs the global movement 
of water. However, human activities, industrialization, and climate change pose formidable challenges. Water scarcity, 
pollution, and uneven distribution threaten communities worldwide. Climate change exacerbates these challenges, contributing 
to altered precipitation patterns, droughts, and more intense storms. Regions already grappling with water scarcity face 
increased risks, underscoring the urgent need for sustainable water management. Efforts to address water-related issues 
include advancements in water purification technologies, sustainable agricultural practices, and conservation initiatives. 
Promoting responsible water usage at individual and societal levels is crucial. Additionally, fostering international 
collaboration is paramount to ensure fair access to this vital resource. Education and awareness play pivotal roles 
in shaping a water-responsible society. Initiatives promoting water conservation, pollution control, and equitable 
distribution are imperative. As we navigate the complex interplay of environmental, social, and economic factors, a 
collective commitment to safeguarding water resources emerges as a linchpin for a sustainable and resilient future.
"""



# Let's Get Started with spaCy

In [31]:
!pip install -U spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)

In [7]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [32]:
stopwords = list(STOP_WORDS)

In [33]:
nlp = spacy.load('en_core_web_sm')

In [34]:
doc = nlp(text)

In [35]:
tokens = [token.text for token in doc]
print(tokens)

['\n', 'Water', ',', 'a', 'cornerstone', 'of', 'life', ',', 'holds', 'unparalleled', 'significance', 'in', 'sustaining', 'ecosystems', 'and', 'supporting', 'human', 'existence', '.', '\n', 'Constituting', 'about', '71', '%', 'of', 'the', 'Earth', "'s", 'surface', ',', 'water', 'exists', 'in', 'oceans', ',', 'rivers', ',', 'lakes', ',', 'and', 'underground', 'reservoirs', '.', '\n', 'Despite', 'its', 'abundance', ',', 'only', 'a', 'small', 'fraction', ',', 'roughly', '2.5', '%', ',', 'is', 'freshwater', ',', 'with', 'the', 'majority', 'locked', 'in', 'glaciers', 'or', 'polar', '\n', 'ice', 'caps', '.', 'The', 'water', 'cycle', ',', 'an', 'intricate', 'dance', 'of', 'evaporation', ',', 'condensation', ',', 'and', 'precipitation', ',', 'governs', 'the', 'global', 'movement', '\n', 'of', 'water', '.', 'However', ',', 'human', 'activities', ',', 'industrialization', ',', 'and', 'climate', 'change', 'pose', 'formidable', 'challenges', '.', 'Water', 'scarcity', ',', '\n', 'pollution', ',', 'a

In [36]:
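# Treat newline tokens like punctuation so they are skipped when counting words.
# Re-running this cell appends another '\n' each time, which is why the output below shows two.
punctuation = punctuation + '\n'
punctuation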

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n\n'

In [37]:
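# Count how often each non-stop-word, non-punctuation token occurs.
word_frequencies = {}
for word in doc:
  # The stop-word and punctuation checks are case-insensitive...
  if word.text.lower() not in stopwords:
    if word.text.lower() not in punctuation:
      # ...but the keys keep the original casing, so 'Water' and 'water' are counted separately.
      if word.text not in word_frequencies.keys():
        word_frequencies[word.text] = 1
      else:
        word_frequencies[word.text] += 1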
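As a variation on the loop above, the counting could be done case-insensitively so that `'Water'` and `'water'` share one entry. The sketch below works on plain strings rather than spaCy tokens; the `count_words` helper and the sample token list are hypothetical and only illustrate the idea.

```python
from collections import Counter
from string import punctuation


def count_words(tokens, stopwords):
    """Count tokens case-insensitively, skipping stop words, punctuation and whitespace."""
    skip = set(punctuation)  # single punctuation characters
    counts = Counter()
    for token in tokens:
        word = token.lower()
        # Ignore stop words, lone punctuation marks, and whitespace-only tokens such as '\n'.
        if word in stopwords or word in skip or not word.strip():
            continue
        counts[word] += 1
    return counts


tokens = ["Water", ",", "a", "cornerstone", "of", "life", ".", "\n", "water", "exists"]
print(count_words(tokens, stopwords={"a", "of"}))
```

Lowercasing the keys also keeps the table consistent with any later lookup that lowercases tokens before searching it.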

In [38]:
print(word_frequencies)

{'Water': 2, 'cornerstone': 1, 'life': 1, 'holds': 1, 'unparalleled': 1, 'significance': 1, 'sustaining': 1, 'ecosystems': 1, 'supporting': 1, 'human': 2, 'existence': 1, 'Constituting': 1, '71': 1, 'Earth': 1, 'surface': 1, 'water': 11, 'exists': 1, 'oceans': 1, 'rivers': 1, 'lakes': 1, 'underground': 1, 'reservoirs': 1, 'Despite': 1, 'abundance': 1, 'small': 1, 'fraction': 1, 'roughly': 1, '2.5': 1, 'freshwater': 1, 'majority': 1, 'locked': 1, 'glaciers': 1, 'polar': 1, 'ice': 1, 'caps': 1, 'cycle': 1, 'intricate': 1, 'dance': 1, 'evaporation': 1, 'condensation': 1, 'precipitation': 2, 'governs': 1, 'global': 1, 'movement': 1, 'activities': 1, 'industrialization': 1, 'climate': 1, 'change': 2, 'pose': 1, 'formidable': 1, 'challenges': 2, 'scarcity': 2, 'pollution': 2, 'uneven': 1, 'distribution': 2, 'threaten': 1, 'communities': 1, 'worldwide': 1, 'Climate': 1, 'exacerbates': 1, 'contributing': 1, 'altered': 1, 'patterns': 1, 'droughts': 1, 'intense': 1, 'storms': 1, 'Regions': 1, 'g

In [39]:
max_frequency = max(word_frequencies.values())

In [40]:
max_frequency

11

In [41]:
for word in word_frequencies.keys():
  word_frequencies[word] = word_frequencies[word]/max_frequency
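To make the normalization concrete: with a maximum count of 11, a word seen twice gets a weight of 2/11 ≈ 0.18 and a word seen once gets 1/11 ≈ 0.09. The small sketch below does the same division on a hand-picked subset of the counts; the `normalize` helper name and the empty-input guard are additions for illustration.

```python
def normalize(frequencies):
    """Scale raw counts so that the most frequent word gets weight 1.0."""
    if not frequencies:
        return {}  # nothing to normalize; avoids max() on an empty sequence
    peak = max(frequencies.values())
    return {word: count / peak for word, count in frequencies.items()}


counts = {"water": 11, "human": 2, "life": 1}
weights = normalize(counts)
print(weights)
```

Building a new dictionary instead of overwriting values in place also means that re-running the cell cannot normalize the same table twice.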

In [42]:
print(word_frequencies)

{'Water': 0.18181818181818182, 'cornerstone': 0.09090909090909091, 'life': 0.09090909090909091, 'holds': 0.09090909090909091, 'unparalleled': 0.09090909090909091, 'significance': 0.09090909090909091, 'sustaining': 0.09090909090909091, 'ecosystems': 0.09090909090909091, 'supporting': 0.09090909090909091, 'human': 0.18181818181818182, 'existence': 0.09090909090909091, 'Constituting': 0.09090909090909091, '71': 0.09090909090909091, 'Earth': 0.09090909090909091, 'surface': 0.09090909090909091, 'water': 1.0, 'exists': 0.09090909090909091, 'oceans': 0.09090909090909091, 'rivers': 0.09090909090909091, 'lakes': 0.09090909090909091, 'underground': 0.09090909090909091, 'reservoirs': 0.09090909090909091, 'Despite': 0.09090909090909091, 'abundance': 0.09090909090909091, 'small': 0.09090909090909091, 'fraction': 0.09090909090909091, 'roughly': 0.09090909090909091, '2.5': 0.09090909090909091, 'freshwater': 0.09090909090909091, 'majority': 0.09090909090909091, 'locked': 0.09090909090909091, 'glaciers

In [43]:
sentence_tokens = [sent for sent in doc.sents]
print(sentence_tokens)

[
Water, a cornerstone of life, holds unparalleled significance in sustaining ecosystems and supporting human existence.
, Constituting about 71% of the Earth's surface, water exists in oceans, rivers, lakes, and underground reservoirs. 
, Despite its abundance, only a small fraction, roughly 2.5%, is freshwater, with the majority locked in glaciers or polar
ice caps., The water cycle, an intricate dance of evaporation, condensation, and precipitation, governs the global movement 
of water., However, human activities, industrialization, and climate change pose formidable challenges., Water scarcity, 
pollution, and uneven distribution threaten communities worldwide., Climate change exacerbates these challenges, contributing 
to altered precipitation patterns, droughts, and more intense storms., Regions already grappling with water scarcity face 
increased risks, underscoring the urgent need for sustainable water management., Efforts to address water-related issues 
include advancements

In [44]:
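# Score each sentence as the sum of the normalized frequencies of its words.
# Lookups use the lowercased token, while word_frequencies keys keep their original casing,
# so words that only occur capitalized (e.g. 'Constituting', 'Earth') contribute nothing.
sentence_scores = {}
for sent in sentence_tokens:
  for word in sent:
    if word.text.lower() in word_frequencies.keys():
      if sent not in sentence_scores.keys():
        sentence_scores[sent] = word_frequencies[word.text.lower()]
      else:
        sentence_scores[sent] += word_frequencies[word.text.lower()]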
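The scoring rule can be tried in isolation on plain strings. In this sketch the `score_sentences` helper, the sample weights, and the pre-tokenized sentences are all made up for illustration; it assumes the weight table uses lowercase keys.

```python
def score_sentences(sentences, weights):
    """Sum the weight of every known word in each sentence.

    `sentences` maps a sentence string to its list of word tokens.
    Words missing from `weights` contribute 0.0.
    """
    scores = {}
    for sentence, words in sentences.items():
        scores[sentence] = sum(weights.get(word.lower(), 0.0) for word in words)
    return scores


weights = {"water": 1.0, "cycle": 0.5, "storms": 0.5}
sentences = {
    "The water cycle moves water.": ["The", "water", "cycle", "moves", "water"],
    "Storms are intense.": ["Storms", "are", "intense"],
}
print(score_sentences(sentences, weights))
```

Because the score is a plain sum, longer sentences tend to score higher. Unlike the loop above, this version also gives a sentence with no known words an explicit score of 0.0 instead of leaving it out of the dictionary.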


In [45]:
sentence_scores

{
 Water, a cornerstone of life, holds unparalleled significance in sustaining ecosystems and supporting human existence.: 1.9999999999999993,
 Constituting about 71% of the Earth's surface, water exists in oceans, rivers, lakes, and underground reservoirs. : 1.7272727272727268,
 Despite its abundance, only a small fraction, roughly 2.5%, is freshwater, with the majority locked in glaciers or polar
 ice caps.: 1.090909090909091,
 The water cycle, an intricate dance of evaporation, condensation, and precipitation, governs the global movement 
 of water.: 2.9090909090909083,
 However, human activities, industrialization, and climate change pose formidable challenges.: 1.0000000000000002,
 Water scarcity, 
 pollution, and uneven distribution threaten communities worldwide.: 1.909090909090909,
 Climate change exacerbates these challenges, contributing 
 to altered precipitation patterns, droughts, and more intense storms.: 1.2727272727272725,
 Regions already grappling with water scarcity 

In [46]:
from heapq import nlargest

In [47]:
select_length = int(len(sentence_tokens)*0.3)
select_length

4
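Here 14 sentences times 0.3 gives 4.2, which `int()` truncates to 4. For very short texts the same formula truncates to 0, so a guard may be worth adding. The `summary_length` helper below is a hypothetical sketch of one way to do that, not part of the original notebook.

```python
def summary_length(num_sentences, ratio=0.3):
    """Number of sentences to keep: at least 1 when there is any text at all."""
    if num_sentences <= 0:
        return 0
    return max(1, int(num_sentences * ratio))


print(summary_length(14))  # same value as the cell above
print(summary_length(2))   # int(0.6) would be 0, so the guard returns 1
```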

In [48]:
summary = nlargest(select_length, sentence_scores, key = sentence_scores.get)
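`nlargest` returns the selected sentences ordered by score, not by their position in the text. If the summary should read in the original order, one option is to rank sentence indices and sort them afterwards. The sketch below uses placeholder sentences and scores; `top_sentences_in_order` is a hypothetical helper.

```python
from heapq import nlargest


def top_sentences_in_order(sentences, scores, k):
    """Pick the k best-scoring sentences, then restore their original order."""
    # Rank positions rather than sentences so the order can be recovered later.
    best = nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    return [sentences[i] for i in sorted(best)]


sentences = ["A", "B", "C", "D"]
scores = [0.5, 1.5, 0.1, 2.0]
print(top_sentences_in_order(sentences, scores, 2))
```

With these scores `nlargest` alone would put "D" before "B"; sorting the indices puts them back in document order.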

In [49]:
summary

[Efforts to address water-related issues 
 include advancements in water purification technologies, sustainable agricultural practices, and conservation initiatives. ,
 Regions already grappling with water scarcity face 
 increased risks, underscoring the urgent need for sustainable water management.,
 The water cycle, an intricate dance of evaporation, condensation, and precipitation, governs the global movement 
 of water.,
 As we navigate the complex interplay of environmental, social, and economic factors, a 
 collective commitment to safeguarding water resources emerges as a linchpin for a sustainable and resilient future.]

In [50]:
# Each item in summary is a sentence span, so take its text.
final_summary = [sent.text for sent in summary]

In [51]:
summary = ' '.join(final_summary)
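The selected sentences keep the line breaks of the source text, so the joined summary contains embedded newlines. A possible cleanup, sketched here on plain strings with a hypothetical `join_sentences` helper, is to collapse all whitespace inside each sentence before joining.

```python
def join_sentences(sentences):
    """Join sentences with single spaces, collapsing newlines and repeated spaces."""
    return " ".join(" ".join(sentence.split()) for sentence in sentences)


parts = [
    "The water cycle governs the global movement \nof water.",
    "Water is vital. ",
]
print(join_sentences(parts))
```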

In [52]:
print(text)


Water, a cornerstone of life, holds unparalleled significance in sustaining ecosystems and supporting human existence.
Constituting about 71% of the Earth's surface, water exists in oceans, rivers, lakes, and underground reservoirs. 
Despite its abundance, only a small fraction, roughly 2.5%, is freshwater, with the majority locked in glaciers or polar
ice caps.The water cycle, an intricate dance of evaporation, condensation, and precipitation, governs the global movement 
of water. However, human activities, industrialization, and climate change pose formidable challenges. Water scarcity, 
pollution, and uneven distribution threaten communities worldwide. Climate change exacerbates these challenges, contributing 
to altered precipitation patterns, droughts, and more intense storms. Regions already grappling with water scarcity face 
increased risks, underscoring the urgent need for sustainable water management. Efforts to address water-related issues 
include advancements in water pu

In [53]:
print(summary)

Efforts to address water-related issues 
include advancements in water purification technologies, sustainable agricultural practices, and conservation initiatives. 
 Regions already grappling with water scarcity face 
increased risks, underscoring the urgent need for sustainable water management. The water cycle, an intricate dance of evaporation, condensation, and precipitation, governs the global movement 
of water. As we navigate the complex interplay of environmental, social, and economic factors, a 
collective commitment to safeguarding water resources emerges as a linchpin for a sustainable and resilient future.

