In [1]:
text = """
The Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places (NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space Exploration and Transportation and under Criterion C in the area of Engineering. Because it has achieved significance within the past fifty years, Criteria Consideration G applies. Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), the longest running American space program to date; she was the third of five orbiters built by NASA. Unlike the Mercury, Gemini, and Apollo programs, the SSP’s emphasis was on cost effectiveness and reusability, and eventually the construction of a space station. Including her maiden voyage (launched August 30, 1984), Discovery flew to space thirty-nine times, more than any of the other four orbiters; she was also the first orbiter to fly twenty missions. She had the honor of being chosen as the Return to Flight vehicle after both the Challenger and Columbia accidents. Discovery was the first shuttle to fly with the redesigned SRBs, a result of the Challenger accident, and the first shuttle to fly with the Phase II and Block I SSME. Discovery also carried the Hubble Space Telescope to orbit and performed two of the five servicing missions to the observatory. She flew the first and last dedicated Department of Defense (DoD) missions, as well as the first unclassified defense-related mission. In addition, Discovery was vital to the construction of the International Space Station (ISS); she flew thirteen of the thirty-seven total missions flown to the station by a U.S. Space Shuttle. She was the first orbiter to dock to the ISS, and the first to perform an exchange of a resident crew. Under Criterion C, Discovery is significant as a feat of engineering. 
According to Wayne Hale, a flight director from Johnson Space Center, the Space Shuttle orbiter represents a “huge technological leap from expendable rockets and capsules to a reusable, winged, hypersonic, cargo-carrying spacecraft.” Although her base structure followed a conventional aircraft design, she used advanced materials that both minimized her weight for cargo-carrying purposes and featured low thermal expansion ratios, which provided a stable base for her Thermal Protection System (TPS) materials. The Space Shuttle orbiter also featured the first reusable TPS; all previous spaceflight vehicles had a single-use, ablative heat shield. Other notable engineering achievements of the orbiter included the first reusable orbital propulsion system, and the first two-fault-tolerant Integrated Avionics System. As Hale stated, the Space Shuttle remains “the largest, fastest, winged hypersonic aircraft in history,” having regularly flown at twenty-five times the speed of sound.
"""

__spaCy__

In [2]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS as spacy_SW
from string import punctuation as punct
from heapq import nlargest

Load the pretrained spaCy model for English and process the given text with it:

In [3]:
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

Create the lists of stop words and punctuation characters:

In [4]:
sp_stopwords = list(spacy_SW)
punctuation = punct + '\n'
print(sp_stopwords)
print(punctuation)

['thereupon', 'toward', 'although', 'every', 'top', 'could', 'your', 'hundred', 'should', 'that', 'hereafter', 'more', 'much', 'whereupon', 'also', 'part', 'will', 'quite', 'even', 'until', 'on', 'fifty', 'some', 'become', 'for', 'there', 'upon', 'to', 'around', 'becomes', 'each', 'an', 'ever', 'third', 'side', 'doing', 'just', 'now', 'whereas', 'everywhere', 'somewhere', 'serious', 'why', 'anyway', 'within', '’ve', 'anyhow', 'becoming', 'never', 'afterwards', 'up', 'get', 'can', 'though', 'well', 'while', 'a', 'any', 'further', 'is', 'it', 'always', 'anyone', '‘s', 'i', 'anything', 'but', 'without', 'have', 'among', 'front', '’ll', 'anywhere', 'when', 'does', 'herself', 'n‘t', 'alone', 'beyond', 'hence', 'was', 'you', 'very', 'again', 'nobody', 'then', 'whoever', 'by', 'its', 'name', 'formerly', 'via', 'ten', "'ve", 'one', 'neither', 'during', 'how', 'yourself', 'seems', 'here', 'do', 'than', 'forty', 'our', 'we', 'where', 'namely', 'mostly', 'such', 'somehow', 'of', 'together', 'exce

Compute the word frequencies in the text and normalize them:

In [5]:
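word_frequencies = {}

for word in doc:
    # Filter on the lowercased token, skipping stop words and punctuation.
    # `punctuation` is a string, so `in` is a substring test here.
    if word.text.lower() not in sp_stopwords and word.text.lower() not in punctuation:
        # Keys keep the token's original casing, so "Space" and "space" are counted separately.
        if word.text not in word_frequencies:
            word_frequencies[word.text] = 1
        else:
            word_frequencies[word.text] += 1

# Scale every count by the largest one so that the weights fall in (0, 1].
max_frequency = max(word_frequencies.values())

for word in word_frequencies.keys():
    word_frequencies[word] = word_frequencies[word] / max_frequency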
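The loop above filters on the lowercased token but stores the key in its original casing. A case-folded variant could be sketched with `collections.Counter` as follows. This is only an illustration under stated assumptions: the regex tokenizer stands in for spaCy, and `SAMPLE_STOPWORDS`, `sample_text`, and `normalized_frequencies` are hypothetical names that do not appear in the notebook.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the notebook's stop-word list and input text.
SAMPLE_STOPWORDS = {"the", "to", "of", "a", "was", "and"}
sample_text = "Discovery flew to space. The orbiter Discovery was the third orbiter."


def normalized_frequencies(text, stopwords):
    """Count case-folded word tokens, skip stop words, scale so the maximum is 1.0."""
    # Assumption: a simple regex tokenizer replaces the spaCy pipeline here.
    tokens = re.findall(r"[A-Za-z0-9'-]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    if not counts:
        return {}
    top = max(counts.values())
    return {word: n / top for word, n in counts.items()}


freqs = normalized_frequencies(sample_text, SAMPLE_STOPWORDS)
print(freqs)
```

Because every key is lowercase, a later lookup by `token.lower()` would match each counted word regardless of how it is capitalized in the sentence.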

Split the text into sentences and compute an importance score for each sentence:

In [6]:
sentence_tokens = list(doc.sents)

sentence_scores = {}

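for sent in sentence_tokens:
    for word in sent:
        # The lookup uses the lowercased token, while the keys of word_frequencies
        # keep their original casing, so only lowercase keys can match here.
        if word.text.lower() in word_frequencies.keys():
            # A sentence's score is the sum of the weights of its matched words.
            if sent not in sentence_scores:
                sentence_scores[sent] = word_frequencies[word.text.lower()]
            else:
                sentence_scores[sent] += word_frequencies[word.text.lower()]
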
sentence_scores

{
 The Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places (NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space Exploration and Transportation and under Criterion C in the area of Engineering.: 2.5000000000000004,
 Because it has achieved significance within the past fifty years, Criteria Consideration G applies.: 0.5,
 Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), the longest running American space program to date; she was the third of five orbiters built by NASA.: 2.9000000000000004,
 Unlike the Mercury, Gemini, and Apollo programs, the SSP’s emphasis was on cost effectiveness and reusability, and eventually the construction of a space station.: 1.3,
 Including her maiden voyage (launched August 30, 1984), Discovery flew to space thirty-nine times, more than any of the other four 

Build the summary of the text by printing the sentences in descending order of importance:

In [7]:
select_length = len(sentence_tokens)

summary = nlargest(select_length, sentence_scores, key=sentence_scores.get)
for sentence in summary:
    print(sentence.text)

According to Wayne Hale, a flight director from Johnson Space Center, the Space Shuttle orbiter represents a “huge technological leap from expendable rockets and capsules to a reusable, winged, hypersonic, cargo-carrying spacecraft.”
Although her base structure followed a conventional aircraft design, she used advanced materials that both minimized her weight for cargo-carrying purposes and featured low thermal expansion ratios, which provided a stable base for her Thermal Protection System (TPS) materials.
In addition, Discovery was vital to the construction of the International Space Station (ISS); she flew thirteen of the thirty-seven total missions flown to the station by a U.S. Space Shuttle.
Including her maiden voyage (launched August 30, 1984), Discovery flew to space thirty-nine times, more than any of the other four orbiters; she was also the first orbiter to fly twenty missions.
Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles co
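The cell above sets `select_length` to the total number of sentences, so it ranks every sentence rather than shortening the text. A shorter summary that keeps the sentences in their original order could be sketched like this. The helper `top_k_in_order` and the demo data are hypothetical and not part of the notebook.

```python
from heapq import nlargest


def top_k_in_order(sentences, scores, k):
    """Pick the k best-scoring sentences, then restore document order."""
    # Rank sentence indices by score, keep the best k.
    best = nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    # Sorting the indices puts the chosen sentences back in reading order.
    return [sentences[i] for i in sorted(best)]


# Hypothetical demo data: four sentences with made-up scores.
demo_sentences = ["First.", "Second.", "Third.", "Fourth."]
demo_scores = [2.5, 0.5, 2.9, 1.3]

short_summary = top_k_in_order(demo_sentences, demo_scores, k=2)
print(" ".join(short_summary))
```

Ranking indices instead of the sentences themselves is one way to keep the positional information needed for the final reordering.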

__NLTK__

In [8]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from heapq import nlargest

Create the set of stop words:

In [9]:
nltk.download('stopwords')
nltk.download('punkt')
nltk_stop_words = set(stopwords.words('english'))
print(nltk_stop_words)

{'before', 'during', "doesn't", "wouldn't", 'he', 'themselves', 'how', 'himself', 'wasn', 'wouldn', "needn't", 'as', 'yourself', 'your', "shouldn't", "hasn't", 'her', 'here', 'do', 'than', 'our', 'should', 'shan', 'that', 'we', 'where', 'm', 'their', 'down', 'more', 'the', 'those', 'such', 'nor', 'will', 'after', 'weren', 'of', 'until', 'ain', 'on', "won't", 'd', 'my', "haven't", 'yours', 'no', 'these', 'some', 'did', 'too', "it's", 'for', 'ma', 'theirs', "that'll", 'hadn', 'there', 'shouldn', 'to', 'out', 'each', 'this', "aren't", "she's", 'an', 'over', 'from', 'doing', "don't", "isn't", 'mightn', 'just', 'him', 'now', 'why', 'so', 'below', 'them', 'above', 'were', "hadn't", "didn't", 'doesn', 'against', 'other', 'only', 'they', 'whom', 'his', "weren't", 've', "you're", 'and', 'what', 'who', 'up', 'not', 'll', 'can', 'yourselves', 'through', 'with', 'while', "you'll", 'ourselves', 'a', "you've", 'any', 'is', 'further', 'it', 'off', "couldn't", 'haven', 'i', 'am', 'isn', 'but', 'once',

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Денис\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Денис\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Tokenize the text into words and sentences:

In [10]:
nltk_tokens = word_tokenize(text)
sentences_list = sent_tokenize(text)
print(f'Number of words in text: {len(nltk_tokens)}')
print(f'Number of sentences in text: {len(sentences_list)}')

Number of words in text: 519
Number of sentences in text: 16


Compute the word frequencies in the text and normalize them:

In [11]:
nltk_word_frequencies = {}

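for word in nltk_tokens:
    # `punctuation` is the string built in cell In [4] (string.punctuation plus '\n').
    if word.lower() not in nltk_stop_words and word.lower() not in punctuation:
        # Keys keep the token's original casing.
        if word not in nltk_word_frequencies:
            nltk_word_frequencies[word] = 1
        else:
            nltk_word_frequencies[word] += 1

# Scale every count by the largest one so that the weights fall in (0, 1].
nltk_max_frequency = max(nltk_word_frequencies.values())

for word in nltk_word_frequencies.keys():
    nltk_word_frequencies[word] = nltk_word_frequencies[word] / nltk_max_frequency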

Compute the sentence importance scores:

In [12]:
nltk_sentence_score = {}

for sent in sentences_list:
    # The sentence is lowercased before tokenizing, so `word` is already lowercase
    # and only lowercase keys of nltk_word_frequencies can match.
    for word in word_tokenize(sent.lower()):
        if word in nltk_word_frequencies.keys():
            if sent not in nltk_sentence_score:
                nltk_sentence_score[sent] = nltk_word_frequencies[word]
            else:
                nltk_sentence_score[sent] += nltk_word_frequencies[word]

nltk_sentence_score

{'\nThe Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places (NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space Exploration and Transportation and under Criterion C in the area of Engineering.': 2.4000000000000004,
 'Because it has achieved significance within the past fifty years, Criteria Consideration G applies.': 0.7,
 'Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), the longest running American space program to date; she was the third of five orbiters built by NASA.': 3.3000000000000007,
 'Unlike the Mercury, Gemini, and Apollo programs, the SSP’s emphasis was on cost effectiveness and reusability, and eventually the construction of a space station.': 1.4,
 'Including her maiden voyage (launched August 30, 1984), Discovery flew to space thirty-nine times, more than any of the ot
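Each score above is a sum of non-negative word weights, so longer sentences tend to score higher simply because they contain more words. One possible adjustment, shown here only as a sketch, is to average the weights over the matched words. The function `length_normalized_score` and the demo values are hypothetical and are not used anywhere in the notebook.

```python
def length_normalized_score(sentence_words, frequencies):
    """Average frequency weight per matched word instead of the raw sum."""
    weights = [frequencies[w] for w in sentence_words if w in frequencies]
    return sum(weights) / len(weights) if weights else 0.0


# Hypothetical demo data: a small weight table and two tokenized sentences.
demo_freqs = {"orbiter": 1.0, "space": 0.5, "first": 0.5}
long_sentence = ["orbiter", "space", "first", "space", "first"]
short_sentence = ["orbiter", "space"]

raw_long = sum(demo_freqs[w] for w in long_sentence)
raw_short = sum(demo_freqs[w] for w in short_sentence)
norm_long = length_normalized_score(long_sentence, demo_freqs)
norm_short = length_normalized_score(short_sentence, demo_freqs)

print(raw_long, raw_short)    # the raw sum favors the longer sentence
print(norm_long, norm_short)  # the average favors the denser, shorter one
```

Whether averaging is preferable depends on the goal: it rewards dense sentences but can also promote very short ones, so it is a design choice rather than a correction.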

Build the summary of the text by printing the sentences in descending order of importance:

In [13]:
nltk_select_length = len(sentences_list)

nltk_summary = nlargest(nltk_select_length, nltk_sentence_score, key=nltk_sentence_score.get)

for sentence in nltk_summary:
    print(sentence)

According to Wayne Hale, a flight director from Johnson Space Center, the Space Shuttle orbiter represents a “huge technological leap from expendable rockets and capsules to a reusable, winged, hypersonic, cargo-carrying spacecraft.” Although her base structure followed a conventional aircraft design, she used advanced materials that both minimized her weight for cargo-carrying purposes and featured low thermal expansion ratios, which provided a stable base for her Thermal Protection System (TPS) materials.
Including her maiden voyage (launched August 30, 1984), Discovery flew to space thirty-nine times, more than any of the other four orbiters; she was also the first orbiter to fly twenty missions.
Other notable engineering achievements of the orbiter included the first reusable orbital propulsion system, and the first two-fault-tolerant Integrated Avionics System.
The Space Shuttle orbiter also featured the first reusable TPS; all previous spaceflight vehicles had a single-use, ablat