In [None]:
text = """Symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular such as by writing grammars or devising heuristic rules for stemming.
Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach:
both statistical and neural networks methods can focus more on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for both rare cases and common ones equally.
language models, produced by either statistical or neural networks methods, are more robust to both unfamiliar (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) in comparison to the rule-based systems, which are also more costly to produce.
the larger such a (probabilistic) language model is, the more accurate it becomes, in contrast to rule-based systems that can gain accuracy only by increasing the amount and complexity of the rules leading to intractability problems.
Although rule-based systems for manipulating symbols were still in use in 2020, they have become mostly obsolete with the advance of LLMs in 2023.
Before that they were commonly used: when the amount of training data is insufficient to successfully apply machine learning methods, e.g., for the machine translation of low-resource languages such as provided by the Apertium system, for preprocessing in NLP pipelines, e.g., tokenization, or for postprocessing and transforming the output of NLP pipelines, e.g., for knowledge extraction from syntactic parses."""

In [None]:
# text = """Johannes Gutenberg (1398 – 1468) was a German goldsmith and publisher who introduced printing to Europe. His introduction of mechanical movable type printing to Europe started the Printing Revolution and is widely regarded as the most important event of the modern period. It played a key role in the scientific revolution and laid the basis for the modern knowledge-based economy and the spread of learning to the masses.“Alice’s Adventures in the Wonderland” is a very popular book for children that adults can also enjoy reading. The author is Charles Dodson, an English writer who published the book in 1865 under the pseudonym of Lewis Carroll.Through its fantasy story and main theme: “a growing girl exploring the wonders of world“, the book has been very influential over the years, both in popular culture and in literature. The book is an enigmatic work, and over the years, readers have been puzzled by the language and the logic of Wonderland.The protagonist of the book is Alice, a seven years old girl who must find her way in a strange world called “Wonderland”. During her magic journey through Wonderland, Alice encounters peculiar human-like creatures or talking animals: the White Rabbit, the Caterpillar, the Cheshire Cat, the Mad Hatter, and the Dormouse.The White Rabbit is Alice’s guide and he leads her on many places and adventures through the book. The always hurrying rabbit is a symbol of forever running timeThe smiling Cheshire Cat, who can disappear and reappear, is the only character in the entire novel who listens to Alice. The Cat is giving “advice” to Alice and teaches her the strange rules leading the world she is traveling through.The Cheshire-Cat’s smile is a metaphor of Wonderland’s magic and it is as famous and enigmatic as Mona Lisa’s smile.Each character teaches Alice something about life and growing up in a hazardous world. 
# Every object or setting in “Alice in the Wonderland” functions as a symbol and often the symbols work together to convey a particular meaning to a scene.Through an intricate symbolism, Lewis Carroll suggests the complexity of life.This could be the message learned by Alice in her magic, initiatory journey: don’t try to find meaning in all the situations you encounter in the “wonder” world of life, don’t give up and continue your way."""

In [None]:
len(text)

1777

In [None]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation as str_punc

In [None]:
stop_words = set(STOP_WORDS)  # a set keeps membership checks fast

In [None]:
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0-py3-none-any.whl (587.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 587.7/587.7 MB 1.8 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


In [None]:
nlp = spacy.load("en_core_web_lg")

In [None]:
doc = nlp(text)

In [None]:
tokens = [token.text for token in doc]

In [None]:
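# Count how often each token occurs, skipping stop words and punctuation.
# Note: keys keep the token's original casing, and whitespace tokens such as "\n" are counted too.
word_frequency = {}
for word in doc:
  lowered = word.text.lower()
  if lowered not in stop_words and lowered not in str_punc:
    word_frequency[word.text] = word_frequency.get(word.text, 0) + 1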


In [None]:
max_frequency = max(word_frequency.values())

In [None]:
for word in word_frequency.keys():
  word_frequency[word] = word_frequency[word]/max_frequency

In [None]:
print(word_frequency)

{'Symbolic': 0.16666666666666666, 'approach': 0.6666666666666666, 'i.e.': 0.16666666666666666, 'hand': 0.3333333333333333, 'coding': 0.16666666666666666, 'set': 0.16666666666666666, 'rules': 0.6666666666666666, 'manipulating': 0.3333333333333333, 'symbols': 0.3333333333333333, 'coupled': 0.16666666666666666, 'dictionary': 0.16666666666666666, 'lookup': 0.16666666666666666, 'historically': 0.16666666666666666, 'AI': 0.16666666666666666, 'general': 0.16666666666666666, 'NLP': 0.5, 'particular': 0.16666666666666666, 'writing': 0.16666666666666666, 'grammars': 0.16666666666666666, 'devising': 0.16666666666666666, 'heuristic': 0.16666666666666666, 'stemming': 0.16666666666666666, '\n': 1.0, 'Machine': 0.16666666666666666, 'learning': 0.3333333333333333, 'approaches': 0.16666666666666666, 'include': 0.16666666666666666, 'statistical': 0.5, 'neural': 0.5, 'networks': 0.5, 'advantages': 0.16666666666666666, 'symbolic': 0.16666666666666666, 'methods': 0.5, 'focus': 0.16666666666666666, 'common'
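The printed dictionary gives the newline token `'\n'` the top score of 1.0 and lists `'Symbolic'` and `'symbolic'` as separate keys, because keys keep each token's original casing. A minimal sketch of a case-folded count is shown here. It assumes a plain regex tokenizer and a tiny hypothetical stop-word set (`TOY_STOP_WORDS`) in place of spaCy, so it runs without the model and will not reproduce the numbers above.

```python
import re
from collections import Counter

# Hypothetical stand-in for spaCy's STOP_WORDS, kept tiny for illustration.
TOY_STOP_WORDS = {"the", "a", "of", "and", "for", "by", "in", "was", "to"}

def normalized_word_frequency(text, stop_words=TOY_STOP_WORDS):
    """Count lowercased word tokens and scale the counts by the maximum count."""
    # \w+ matches runs of word characters, so whitespace and punctuation never become keys.
    words = [w.lower() for w in re.findall(r"\w+", text)]
    counts = Counter(w for w in words if w not in stop_words)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {word: count / max_count for word, count in counts.items()}

sample = "Symbolic rules.\nThe symbolic approach uses rules for symbols."
freq = normalized_word_frequency(sample)
print(freq)
```

With lowercase keys, the later lookup by `word.text.lower()` would find every counted word, not only those that were already lowercase in the text.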

In [None]:
sentence_tokens = list(doc.sents)
print(sentence_tokens)

[Symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular such as by writing grammars or devising heuristic rules for stemming.
, Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach:
both statistical and neural networks methods can focus more on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for both rare cases and common ones equally.
language models, produced by either statistical or neural networks methods, are more robust to both unfamiliar (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) in comparison to the rule-based systems, which are also more costly to produce.
, the larger 

In [None]:
# Score each sentence by summing the normalized frequencies of the words it contains.
sentence_score = {}
for sentence in sentence_tokens:
  for word in sentence:
    lowered = word.text.lower()
    if lowered in word_frequency:
      sentence_score[sentence] = sentence_score.get(sentence, 0) + word_frequency[lowered]

In [None]:
print(sentence_score)

{Symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular such as by writing grammars or devising heuristic rules for stemming.
: 7.166666666666669, Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach:
both statistical and neural networks methods can focus more on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for both rare cases and common ones equally.
language models, produced by either statistical or neural networks methods, are more robust to both unfamiliar (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) in comparison to the rule-based systems, which are also more costly to pro
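The scoring loop looks words up by `word.text.lower()`, while `word_frequency` was filled with keys in their original casing, so a token such as `NLP` finds no entry and adds nothing to its sentence. The sketch below shows the same sum-of-weights idea on toy data where both sides use lowercase. The names `score_sentences`, `toy_frequency`, and `toy_sentences` are hypothetical, and sentences are plain strings split with a regex instead of spaCy spans.

```python
import re

def score_sentences(sentences, word_frequency):
    """Sum the frequency weight of every known word in each sentence."""
    scores = {}
    for sentence in sentences:
        total = 0.0
        for word in re.findall(r"\w+", sentence):
            # Lowercase at lookup time; word_frequency is assumed to use lowercase keys.
            total += word_frequency.get(word.lower(), 0.0)
        scores[sentence] = total
    return scores

toy_frequency = {"rules": 1.0, "symbolic": 0.5, "nlp": 0.5}
toy_sentences = [
    "Symbolic rules came first.",
    "NLP still uses rules.",
    "Nothing relevant here.",
]
toy_scores = score_sentences(toy_sentences, toy_frequency)
print(toy_scores)
```

Unlike the notebook's loop, this variant also records sentences with no matching words, giving them a score of 0.0.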

In [None]:
from heapq import nlargest                 # Heap queue algorithm

In [None]:
len(sentence_score) * 0.2  # 20 percent of the scored sentences will go into the summary

1.0

In [None]:
summary = nlargest(1, sentence_score, key=sentence_score.get)  # keep the single highest-scoring sentence
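The number of sentences is hard-coded here, although it follows from the 20 percent figure computed in the previous cell. One possible way to derive it from a ratio and return the selected sentences in document order is sketched below. The helper `top_sentences` and the toy scores are hypothetical, and the choice to round down while always keeping at least one sentence is an assumption, not something the notebook specifies.

```python
from heapq import nlargest

def top_sentences(sentence_score, ratio=0.2):
    """Return the highest-scoring sentences, in their original order."""
    # Round down, but always keep at least one sentence (an assumed policy).
    n = max(1, int(len(sentence_score) * ratio))
    best = set(nlargest(n, sentence_score, key=sentence_score.get))
    # Dicts preserve insertion order, so this restores document order.
    return [sentence for sentence in sentence_score if sentence in best]

toy_scores = {"s1": 2.0, "s2": 7.5, "s3": 1.0, "s4": 6.0, "s5": 3.0}
one = top_sentences(toy_scores)             # 20 percent of 5 sentences
two = top_sentences(toy_scores, ratio=0.4)  # 40 percent of 5 sentences
print(one, two)
```

Restoring document order matters once more than one sentence is selected, since `nlargest` returns items sorted by score.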


In [None]:
final_summary = [sentence.text for sentence in summary]

In [None]:
final_summary = " ".join(final_summary)

In [None]:
print(final_summary)

Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach:
both statistical and neural networks methods can focus more on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for both rare cases and common ones equally.
language models, produced by either statistical or neural networks methods, are more robust to both unfamiliar (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) in comparison to the rule-based systems, which are also more costly to produce.



In [None]:
len(final_summary)

707
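Selecting 20 percent of the sentences does not mean keeping 20 percent of the text, because sentences differ in length. A quick check using the two lengths reported by the `len(...)` cells (1777 and 707 characters) shows that the summary keeps roughly 40 percent of the original characters. The variable names are illustrative.

```python
# Lengths reported by the notebook's own len(...) cells.
text_length = 1777
summary_length = 707

# Fraction of the original characters that the summary retains.
retained = summary_length / text_length
print(f"{retained:.1%} of the original characters are kept")
```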