Text Summarization

In [68]:

text = """The intricate dance of ecosystems is a testament to the delicate balance of nature. Within every forest, ocean, and desert, myriad species interact in ways that sustain and support each other, forming a complex web of life. Predators and prey, plants and pollinators, and countless other relationships create an environment where each organism has a role to play. This interdependence ensures the stability and resilience of ecosystems, allowing them to adapt to changes and recover from disturbances. However, human activities such as deforestation, pollution, and climate change threaten this balance, highlighting the urgent need for conservation efforts to protect the natural world and its intricate dynamics. Understanding and preserving these ecosystems is crucial not only for the survival of countless species but also for the well-being of humanity, which relies on the services they provide, from clean air and water to fertile soil and climate regulation."""

In [69]:
len(text)

967

Importing the libraries and loading the language model

In [70]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation


In [71]:
nlp = spacy.load("en_core_web_sm")

In [72]:
doc = nlp(text)

In [73]:
tokens = [token.text for token in doc]
tokens

['The',
 'intricate',
 'dance',
 'of',
 'ecosystems',
 'is',
 'a',
 'testament',
 'to',
 'the',
 'delicate',
 'balance',
 'of',
 'nature',
 '.',
 'Within',
 'every',
 'forest',
 ',',
 'ocean',
 ',',
 'and',
 'desert',
 ',',
 'myriad',
 'species',
 'interact',
 'in',
 'ways',
 'that',
 'sustain',
 'and',
 'support',
 'each',
 'other',
 ',',
 'forming',
 'a',
 'complex',
 'web',
 'of',
 'life',
 '.',
 'Predators',
 'and',
 'prey',
 ',',
 'plants',
 'and',
 'pollinators',
 ',',
 'and',
 'countless',
 'other',
 'relationships',
 'create',
 'an',
 'environment',
 'where',
 'each',
 'organism',
 'has',
 'a',
 'role',
 'to',
 'play',
 '.',
 'This',
 'interdependence',
 'ensures',
 'the',
 'stability',
 'and',
 'resilience',
 'of',
 'ecosystems',
 ',',
 'allowing',
 'them',
 'to',
 'adapt',
 'to',
 'changes',
 'and',
 'recover',
 'from',
 'disturbances',
 '.',
 'However',
 ',',
 'human',
 'activities',
 'such',
 'as',
 'deforestation',
 ',',
 'pollution',
 ',',
 'and',
 'climate',
 'change',
 

In [74]:
punctuation = punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

Text Cleaning

In [75]:
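word_freq = {}

stop_words = list(STOP_WORDS)

# Count every token that is neither a stop word nor punctuation.
# Keys keep the token's original casing, so 'Predators' and 'predators'
# would be counted as separate entries.
for word in doc:
    if word.text.lower() not in stop_words:
        if word.text.lower() not in punctuation:
            if word.text not in word_freq.keys():
                word_freq[word.text] = 1
            else:
                word_freq[word.text] += 1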
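The counting step can also be written so that the dictionary keys are lowercased, which keeps them consistent with lowercased lookups later on. The following is only a minimal sketch under stated assumptions: `count_content_words` and `DEMO_STOP_WORDS` are hypothetical names that do not appear in the notebook, the tiny stop-word set stands in for spaCy's `STOP_WORDS`, and the function takes plain strings (for example `[t.text for t in doc]`) so it runs without spaCy.

```python
from collections import Counter
from string import punctuation

# Hypothetical stand-in for spaCy's STOP_WORDS, kept tiny for illustration.
DEMO_STOP_WORDS = {"the", "of", "and", "a", "is", "to"}


def count_content_words(tokens, stop_words=DEMO_STOP_WORDS):
    """Count tokens case-insensitively, skipping stop words and punctuation."""
    punct_chars = set(punctuation) | {"\n"}
    counts = Counter()
    for tok in tokens:
        key = tok.lower()
        if key in stop_words:
            continue
        # Skip tokens made up only of punctuation or newline characters.
        if all(ch in punct_chars for ch in key):
            continue
        counts[key] += 1
    return counts


demo_tokens = ["Predators", "and", "prey", ",", "predators", "."]
demo_counts = count_content_words(demo_tokens)
print(demo_counts)  # Counter({'predators': 2, 'prey': 1})
```

Checking each character against a set, rather than testing whether the whole token is a substring of the punctuation string, also handles multi-character tokens such as "--" in a predictable way.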
            
          

In [76]:
word_freq

{'intricate': 2,
 'dance': 1,
 'ecosystems': 3,
 'testament': 1,
 'delicate': 1,
 'balance': 2,
 'nature': 1,
 'forest': 1,
 'ocean': 1,
 'desert': 1,
 'myriad': 1,
 'species': 2,
 'interact': 1,
 'ways': 1,
 'sustain': 1,
 'support': 1,
 'forming': 1,
 'complex': 1,
 'web': 1,
 'life': 1,
 'Predators': 1,
 'prey': 1,
 'plants': 1,
 'pollinators': 1,
 'countless': 2,
 'relationships': 1,
 'create': 1,
 'environment': 1,
 'organism': 1,
 'role': 1,
 'play': 1,
 'interdependence': 1,
 'ensures': 1,
 'stability': 1,
 'resilience': 1,
 'allowing': 1,
 'adapt': 1,
 'changes': 1,
 'recover': 1,
 'disturbances': 1,
 'human': 1,
 'activities': 1,
 'deforestation': 1,
 'pollution': 1,
 'climate': 2,
 'change': 1,
 'threaten': 1,
 'highlighting': 1,
 'urgent': 1,
 'need': 1,
 'conservation': 1,
 'efforts': 1,
 'protect': 1,
 'natural': 1,
 'world': 1,
 'dynamics': 1,
 'Understanding': 1,
 'preserving': 1,
 'crucial': 1,
 'survival': 1,
 'humanity': 1,
 'relies': 1,
 'services': 1,
 'provide': 1,

In [77]:
max_freq = max(word_freq.values())
max_freq

3

In [78]:
for word in word_freq.keys():
    word_freq[word] = word_freq[word]/max_freq
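The normalization above divides every count by the largest count, so the most frequent word gets a weight of 1.0 and all others fall between 0 and 1. As a small sketch, the same step can be written as a function that returns a new dictionary instead of modifying the original in place; `normalize_by_max` is a hypothetical helper name, and the three counts are taken from the output shown earlier.

```python
def normalize_by_max(counts):
    """Divide every count by the largest count so weights fall in (0, 1]."""
    if not counts:
        return {}
    top = max(counts.values())
    return {word: n / top for word, n in counts.items()}


demo_weights = normalize_by_max({"ecosystems": 3, "balance": 2, "dance": 1})
print(demo_weights)
```

Returning a new dictionary keeps the raw counts available, which helps when the cell is re-run: running the in-place loop a second time divides the already normalized values by the old `max_freq` again.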

In [79]:
print(word_freq)

{'intricate': 0.6666666666666666, 'dance': 0.3333333333333333, 'ecosystems': 1.0, 'testament': 0.3333333333333333, 'delicate': 0.3333333333333333, 'balance': 0.6666666666666666, 'nature': 0.3333333333333333, 'forest': 0.3333333333333333, 'ocean': 0.3333333333333333, 'desert': 0.3333333333333333, 'myriad': 0.3333333333333333, 'species': 0.6666666666666666, 'interact': 0.3333333333333333, 'ways': 0.3333333333333333, 'sustain': 0.3333333333333333, 'support': 0.3333333333333333, 'forming': 0.3333333333333333, 'complex': 0.3333333333333333, 'web': 0.3333333333333333, 'life': 0.3333333333333333, 'Predators': 0.3333333333333333, 'prey': 0.3333333333333333, 'plants': 0.3333333333333333, 'pollinators': 0.3333333333333333, 'countless': 0.6666666666666666, 'relationships': 0.3333333333333333, 'create': 0.3333333333333333, 'environment': 0.3333333333333333, 'organism': 0.3333333333333333, 'role': 0.3333333333333333, 'play': 0.3333333333333333, 'interdependence': 0.3333333333333333, 'ensures': 0.33

Sentence tokenization

In [80]:
sent_token = list(doc.sents)
print(sent_token)

[The intricate dance of ecosystems is a testament to the delicate balance of nature., Within every forest, ocean, and desert, myriad species interact in ways that sustain and support each other, forming a complex web of life., Predators and prey, plants and pollinators, and countless other relationships create an environment where each organism has a role to play., This interdependence ensures the stability and resilience of ecosystems, allowing them to adapt to changes and recover from disturbances., However, human activities such as deforestation, pollution, and climate change threaten this balance, highlighting the urgent need for conservation efforts to protect the natural world and its intricate dynamics., Understanding and preserving these ecosystems is crucial not only for the survival of countless species but also for the well-being of humanity, which relies on the services they provide, from clean air and water to fertile soil and climate regulation.]


In [81]:
sent_score = {}

In [82]:
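# Score each sentence as the sum of the normalized frequencies of its words.
# The lookup lowercases the token, so capitalized keys in word_freq
# (such as 'Predators') are not matched and add nothing to the score.
for sent in sent_token:
    for word in sent:
        if word.text.lower() in word_freq.keys():
            if sent not in sent_score.keys():
                sent_score[sent] = word_freq[word.text.lower()]
            else:
                sent_score[sent] += word_freq[word.text.lower()]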
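The scoring rule can be illustrated on plain lists of strings: each sentence's score is the sum of the weights of the words it contains, and unknown words contribute zero. This is a hedged sketch rather than the notebook's own code; `score_sentences` is a hypothetical name, the weights are assumed to be keyed by lowercase words, and scores are returned by sentence position instead of being keyed by spaCy spans.

```python
def score_sentences(sentences, weights):
    """Sum the weights of the known words in each sentence.

    `sentences` is a list of token lists; `weights` maps lowercase words
    to normalized frequencies. Returns one score per sentence, by index.
    """
    scores = []
    for tokens in sentences:
        scores.append(sum(weights.get(tok.lower(), 0.0) for tok in tokens))
    return scores


demo_weights = {"ecosystems": 1.0, "balance": 2 / 3, "dance": 1 / 3}
demo_sentences = [
    ["The", "dance", "of", "ecosystems"],
    ["A", "delicate", "balance"],
]
demo_scores = score_sentences(demo_sentences, demo_weights)
print(demo_scores)
```

Because the score is a plain sum, longer sentences tend to score higher simply by containing more words; dividing by sentence length is one possible adjustment, though the notebook does not do this.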

In [83]:
sent_score

{The intricate dance of ecosystems is a testament to the delicate balance of nature.: 3.666666666666667,
 Within every forest, ocean, and desert, myriad species interact in ways that sustain and support each other, forming a complex web of life.: 4.666666666666667,
 Predators and prey, plants and pollinators, and countless other relationships create an environment where each organism has a role to play.: 3.666666666666667,
 This interdependence ensures the stability and resilience of ecosystems, allowing them to adapt to changes and recover from disturbances.: 4.0,
 However, human activities such as deforestation, pollution, and climate change threaten this balance, highlighting the urgent need for conservation efforts to protect the natural world and its intricate dynamics.: 6.999999999999998,
 Understanding and preserving these ecosystems is crucial not only for the survival of countless species but also for the well-being of humanity, which relies on the services they provide, from 

Select the top 30% of sentences by score

In [84]:
from heapq import nlargest

In [85]:
summary_length = int(len(sent_token) * 0.3)

Getting the summary

In [86]:
summary_sentences = nlargest(summary_length, sent_score, key=sent_score.get)

In [87]:
summary = ' '.join([sent.text for sent in summary_sentences])
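With six sentences, `int(len(sent_token) * 0.3)` truncates to 1, which is why the summary consists of a single sentence. When more than one sentence is selected, `nlargest` returns them in descending score order, not in the order they appear in the text. The sketch below shows one way to pick the top sentences and then restore document order; `pick_summary` and the demo data are hypothetical, and the `max(1, ...)` guard is an added assumption so that very short inputs still yield one sentence.

```python
from heapq import nlargest


def pick_summary(sentences, scores, ratio=0.3):
    """Return the top-scoring sentences, in their original order."""
    # int() truncates, so keep at least one sentence for short inputs.
    k = max(1, int(len(sentences) * ratio))
    top_idx = nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    return [sentences[i] for i in sorted(top_idx)]


demo_sentences = ["First.", "Second.", "Third.", "Fourth."]
demo_scores = [2.0, 5.0, 1.0, 4.0]
demo_summary = " ".join(pick_summary(demo_sentences, demo_scores, ratio=0.5))
print(demo_summary)  # Second. Fourth.
```

Selecting by index and sorting the indices afterwards keeps the summary readable, since sentences that refer back to earlier ones stay in their original sequence.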


In [88]:

# Print the summary
print("Summary:\n", summary)

# Print the lengths of the original and summarized texts
print("Original text length:", len(text))
print("Summary text length:", len(summary))

Summary:
 Understanding and preserving these ecosystems is crucial not only for the survival of countless species but also for the well-being of humanity, which relies on the services they provide, from clean air and water to fertile soil and climate regulation.
Original text length: 967
Summary text length: 252
