Text Summarization

In [1]:

text = """The art of bonsai, a practice that dates back over a thousand years, is a unique form of horticulture that combines meticulous care with artistic expression. Originating in China and later refined in Japan, bonsai involves growing small trees in containers, carefully shaping them to mimic the appearance of full-sized trees in nature. This process requires not only regular pruning and wiring of the branches but also an understanding of the tree's growth patterns and needs. Each bonsai tree tells a story, reflecting the seasons and the passage of time, making it a living piece of art that evolves with its caretaker. The patience and dedication required to cultivate bonsai have made it a beloved and meditative hobby for enthusiasts around the world."""

In [2]:
len(text)

756

Importing the libraries and loading the model

In [3]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation


In [4]:
nlp = spacy.load("en_core_web_sm")

In [5]:
doc = nlp(text)

In [6]:
tokens = [token.text for token in doc]
tokens

['The',
 'art',
 'of',
 'bonsai',
 ',',
 'a',
 'practice',
 'that',
 'dates',
 'back',
 'over',
 'a',
 'thousand',
 'years',
 ',',
 'is',
 'a',
 'unique',
 'form',
 'of',
 'horticulture',
 'that',
 'combines',
 'meticulous',
 'care',
 'with',
 'artistic',
 'expression',
 '.',
 'Originating',
 'in',
 'China',
 'and',
 'later',
 'refined',
 'in',
 'Japan',
 ',',
 'bonsai',
 'involves',
 'growing',
 'small',
 'trees',
 'in',
 'containers',
 ',',
 'carefully',
 'shaping',
 'them',
 'to',
 'mimic',
 'the',
 'appearance',
 'of',
 'full',
 '-',
 'sized',
 'trees',
 'in',
 'nature',
 '.',
 'This',
 'process',
 'requires',
 'not',
 'only',
 'regular',
 'pruning',
 'and',
 'wiring',
 'of',
 'the',
 'branches',
 'but',
 'also',
 'an',
 'understanding',
 'of',
 'the',
 'tree',
 "'s",
 'growth',
 'patterns',
 'and',
 'needs',
 '.',
 'Each',
 'bonsai',
 'tree',
 'tells',
 'a',
 'story',
 ',',
 'reflecting',
 'the',
 'seasons',
 'and',
 'the',
 'passage',
 'of',
 'time',
 ',',
 'making',
 'it',
 'a',

In [7]:
punctuation = punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

Text cleaning and word frequencies

In [8]:
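word_freq = {}

stop_words = list(STOP_WORDS)

# Count each token that is neither a stop word nor punctuation.
# Note: keys keep the token's original casing (e.g. 'China'), while the
# sentence-scoring step looks words up in lowercase, so capitalized tokens
# do not contribute to sentence scores.
for word in doc:
    if word.text.lower() not in stop_words:
        if word.text.lower() not in punctuation:
            if word.text not in word_freq.keys():
                word_freq[word.text] = 1
            else:
                word_freq[word.text] += 1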

In [9]:
word_freq

{'art': 2,
 'bonsai': 4,
 'practice': 1,
 'dates': 1,
 'thousand': 1,
 'years': 1,
 'unique': 1,
 'form': 1,
 'horticulture': 1,
 'combines': 1,
 'meticulous': 1,
 'care': 1,
 'artistic': 1,
 'expression': 1,
 'Originating': 1,
 'China': 1,
 'later': 1,
 'refined': 1,
 'Japan': 1,
 'involves': 1,
 'growing': 1,
 'small': 1,
 'trees': 2,
 'containers': 1,
 'carefully': 1,
 'shaping': 1,
 'mimic': 1,
 'appearance': 1,
 'sized': 1,
 'nature': 1,
 'process': 1,
 'requires': 1,
 'regular': 1,
 'pruning': 1,
 'wiring': 1,
 'branches': 1,
 'understanding': 1,
 'tree': 2,
 'growth': 1,
 'patterns': 1,
 'needs': 1,
 'tells': 1,
 'story': 1,
 'reflecting': 1,
 'seasons': 1,
 'passage': 1,
 'time': 1,
 'making': 1,
 'living': 1,
 'piece': 1,
 'evolves': 1,
 'caretaker': 1,
 'patience': 1,
 'dedication': 1,
 'required': 1,
 'cultivate': 1,
 'beloved': 1,
 'meditative': 1,
 'hobby': 1,
 'enthusiasts': 1,
 'world': 1}
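
The same counting step can be written more compactly with collections.Counter. The sketch below is a spaCy-free illustration: the token list, the tiny stop-word set DEMO_STOP_WORDS, and the helper name count_words are assumptions made for the example, not part of the notebook. It also lowercases each key, which is one possible way to make capitalized tokens such as 'China' line up with a lowercase lookup later on.

```python
from collections import Counter
from string import punctuation

# Hypothetical stand-in for spaCy's STOP_WORDS (assumption for this sketch).
DEMO_STOP_WORDS = {"the", "of", "a", "in", "and"}


def count_words(tokens, stop_words=DEMO_STOP_WORDS):
    """Count lowercase tokens, skipping stop words and punctuation."""
    counts = Counter()
    for tok in tokens:
        key = tok.lower()
        # Single-character punctuation tokens are substrings of `punctuation`.
        if key in stop_words or key in punctuation:
            continue
        counts[key] += 1
    return counts


# Hypothetical token list, in the style of the `tokens` output above.
demo_tokens = ["The", "art", "of", "bonsai", ",", "Bonsai", "in", "China", "."]
demo_counts = count_words(demo_tokens)
print(demo_counts)
```

Lowercasing merges 'Bonsai' and 'bonsai' into one entry, so the counts here would differ slightly from a case-sensitive dictionary.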

In [10]:
max_freq = max(word_freq.values())
max_freq

4

In [11]:
for word in word_freq.keys():
    word_freq[word] = word_freq[word]/max_freq

In [12]:
print(word_freq)

{'art': 0.5, 'bonsai': 1.0, 'practice': 0.25, 'dates': 0.25, 'thousand': 0.25, 'years': 0.25, 'unique': 0.25, 'form': 0.25, 'horticulture': 0.25, 'combines': 0.25, 'meticulous': 0.25, 'care': 0.25, 'artistic': 0.25, 'expression': 0.25, 'Originating': 0.25, 'China': 0.25, 'later': 0.25, 'refined': 0.25, 'Japan': 0.25, 'involves': 0.25, 'growing': 0.25, 'small': 0.25, 'trees': 0.5, 'containers': 0.25, 'carefully': 0.25, 'shaping': 0.25, 'mimic': 0.25, 'appearance': 0.25, 'sized': 0.25, 'nature': 0.25, 'process': 0.25, 'requires': 0.25, 'regular': 0.25, 'pruning': 0.25, 'wiring': 0.25, 'branches': 0.25, 'understanding': 0.25, 'tree': 0.5, 'growth': 0.25, 'patterns': 0.25, 'needs': 0.25, 'tells': 0.25, 'story': 0.25, 'reflecting': 0.25, 'seasons': 0.25, 'passage': 0.25, 'time': 0.25, 'making': 0.25, 'living': 0.25, 'piece': 0.25, 'evolves': 0.25, 'caretaker': 0.25, 'patience': 0.25, 'dedication': 0.25, 'required': 0.25, 'cultivate': 0.25, 'beloved': 0.25, 'meditative': 0.25, 'hobby': 0.25, 'enthusiasts': 0.25, 'world': 0.25}
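
The normalization divides every count by the largest count, so the most frequent word scores 1.0 and the rest fall between 0 and 1. A minimal sketch of that step on a three-word sample follows; the helper name normalize is hypothetical, and it builds a new dictionary rather than updating the original in place.

```python
def normalize(freq):
    """Divide every count by the largest count so the top word scores 1.0."""
    if not freq:
        return {}
    max_freq = max(freq.values())
    return {word: count / max_freq for word, count in freq.items()}


# Sample counts taken from the word_freq output above.
demo = normalize({"bonsai": 4, "art": 2, "practice": 1})
print(demo)  # {'bonsai': 1.0, 'art': 0.5, 'practice': 0.25}
```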

Sentence tokenization

In [13]:
sent_token = [sent for sent in doc.sents]
print(sent_token)

[The art of bonsai, a practice that dates back over a thousand years, is a unique form of horticulture that combines meticulous care with artistic expression., Originating in China and later refined in Japan, bonsai involves growing small trees in containers, carefully shaping them to mimic the appearance of full-sized trees in nature., This process requires not only regular pruning and wiring of the branches but also an understanding of the tree's growth patterns and needs., Each bonsai tree tells a story, reflecting the seasons and the passage of time, making it a living piece of art that evolves with its caretaker., The patience and dedication required to cultivate bonsai have made it a beloved and meditative hobby for enthusiasts around the world.]


In [14]:
sent_score = {}

In [15]:
for sent in sent_token:
    for word in sent:
        if word.text.lower() in word_freq.keys():
            if sent not in sent_score.keys():
                sent_score[sent] = word_freq[word.text.lower()]
            else:
                sent_score[sent] += word_freq[word.text.lower()]

In [16]:
sent_score

{The art of bonsai, a practice that dates back over a thousand years, is a unique form of horticulture that combines meticulous care with artistic expression.: 4.5,
 Originating in China and later refined in Japan, bonsai involves growing small trees in containers, carefully shaping them to mimic the appearance of full-sized trees in nature.: 5.0,
 This process requires not only regular pruning and wiring of the branches but also an understanding of the tree's growth patterns and needs.: 3.0,
 Each bonsai tree tells a story, reflecting the seasons and the passage of time, making it a living piece of art that evolves with its caretaker.: 4.75,
 The patience and dedication required to cultivate bonsai have made it a beloved and meditative hobby for enthusiasts around the world.: 3.25}
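
Each sentence score is the sum of the normalized weights of the words it contains. The sketch below reproduces that idea without spaCy; the regex tokenizer, the sample weights, and the helper name score_sentence are assumptions for illustration only.

```python
import re

# Hypothetical normalized weights, in the style of word_freq above.
demo_weights = {"bonsai": 1.0, "trees": 0.5, "art": 0.5, "small": 0.25}


def score_sentence(sentence, weights):
    """Sum the weights of the words in a sentence; unknown words add 0."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(weights.get(word, 0.0) for word in words)


demo_sentences = [
    "Bonsai is the art of growing small trees.",
    "It takes patience.",
]
demo_scores = {s: score_sentence(s, demo_weights) for s in demo_sentences}
print(demo_scores)
```

One difference from the loop above: a sentence with no weighted words gets a score of 0 here, whereas in the notebook it would simply be absent from sent_score. Because scores are plain sums, longer sentences tend to score higher.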

Selecting the top 30% of sentences by score

In [17]:
from heapq import nlargest

In [18]:
summary_length = int(len(sent_token) * 0.3)
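
Because int() truncates, this expression gives 1 for the five sentences here, but it drops to 0 for texts of three sentences or fewer, which would leave the summary empty. A small guard, sketched below under the hypothetical name summary_size, keeps at least one sentence.

```python
def summary_size(n_sentences, ratio=0.3):
    """Number of sentences to keep: at least one, at most n_sentences."""
    if n_sentences <= 0:
        return 0
    return max(1, min(n_sentences, int(n_sentences * ratio)))


# Compare the plain truncation with the guarded version.
for n in (2, 3, 5, 10):
    print(n, int(n * 0.3), summary_size(n))
```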

Getting the summary

In [19]:
summary_sentences = nlargest(summary_length, sent_score, key=sent_score.get)

In [20]:
summary = ' '.join([sent.text for sent in summary_sentences])
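
nlargest returns the selected sentences ordered by score, highest first, not in the order they appear in the text. With a single sentence this makes no difference, but for longer summaries it can help to restore the original order before joining. The sketch below uses placeholder keys ("s1" to "s5") in place of spaCy spans, with the scores from sent_score above.

```python
from heapq import nlargest

# Placeholder keys stand in for the five sentence spans, in document order.
demo_order = ["s1", "s2", "s3", "s4", "s5"]
demo_scores = {"s1": 4.5, "s2": 5.0, "s3": 3.0, "s4": 4.75, "s5": 3.25}

# Highest score first.
top = nlargest(3, demo_scores, key=demo_scores.get)
print(top)  # ['s2', 's4', 's1']

# Re-sort the selected sentences by their position in the document.
in_order = sorted(top, key=demo_order.index)
print(in_order)  # ['s1', 's2', 's4']
```

With real spaCy spans, sorting on each span's start attribute should have the same effect.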


In [21]:

# Print the summary
print("Summary:\n", summary)

# Print the lengths of the original and summarized texts
print("Original text length:", len(text))
print("Summary text length:", len(summary))

Summary:
 Originating in China and later refined in Japan, bonsai involves growing small trees in containers, carefully shaping them to mimic the appearance of full-sized trees in nature.
Original text length: 756
Summary text length: 177
