In [39]:
import spacy
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from collections import Counter

#from string import punctuation

import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)

# Load SpaCy's English model
nlp = spacy.load('en_core_web_sm')

# # Download NLTK's stopword list
# nltk.download('punkt')
# nltk.download('stopwords')
# nltk.download('popular')

In [20]:
text = """
The Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places (NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space Exploration and Transportation and under Criterion C in the area of Engineering. Because it has achieved significance within the past fifty years, Criteria Consideration G applies. Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), the longest running American space program to date; she was the third of five orbiters built by NASA. Unlike the Mercury, Gemini, and Apollo programs, the SSP’s emphasis was on cost effectiveness and reusability, and eventually the construction of a space station. Including her maiden voyage (launched August 30, 1984), Discovery flew to space thirty-nine times, more than any of the other four orbiters; she was also the first orbiter to fly twenty missions. She had the honor of being chosen as the Return to Flight vehicle after both the Challenger and Columbia accidents. Discovery was the first shuttle to fly with the redesigned SRBs, a result of the Challenger accident, and the first shuttle to fly with the Phase II and Block I SSME. Discovery also carried the Hubble Space Telescope to orbit and performed two of the five servicing missions to the observatory. She flew the first and last dedicated Department of Defense (DoD) missions, as well as the first unclassified defense-related mission. In addition, Discovery was vital to the construction of the International Space Station (ISS); she flew thirteen of the thirty-seven total missions flown to the station by a U.S. Space Shuttle. She was the first orbiter to dock to the ISS, and the first to perform an exchange of a resident crew. Under Criterion C, Discovery is significant as a feat of engineering. According to Wayne Hale, a flight director from Johnson Space Center, the Space Shuttle orbiter represents a “huge technological leap from expendable rockets and capsules to a reusable, winged, hypersonic, cargo-carrying spacecraft.” Although her base structure followed a conventional aircraft design, she used advanced materials that both minimized her weight for cargo-carrying purposes and featured low thermal expansion ratios, which provided a stable base for her Thermal Protection System (TPS) materials. The Space Shuttle orbiter also featured the first reusable TPS; all previous spaceflight vehicles had a single-use, ablative heat shield. Other notable engineering achievements of the orbiter included the first reusable orbital propulsion system, and the first two-fault-tolerant Integrated Avionics System. As Hale stated, the Space Shuttle remains “the largest, fastest, winged hypersonic aircraft in history,” having regularly flown at twenty-five times the speed of sound.
"""

In [40]:
# Tokenize the text into sentences
sentences = sent_tokenize(text)

# Tokenize the text into words and remove stopwords
stop_words = set(stopwords.words('english'))
words = [word_tokenize(sent) for sent in sentences]
punctuation += '\n'

filtered_words = []
for word_list in words:
    filtered_word_list = []
    for word in word_list:
        # Конвертуємо слово до нижнього регістру і перевіряємо чи це не стопслово 
        if word.lower() not in stop_words:
            filtered_word_list.append(word)
    # Додаємо відфільтрований список слів кожного речення до основного списку
    filtered_words.append(filtered_word_list)

#filtered_words = [[word for word in word_list if word.lower() not in stop_words] for word_list in words]

In [22]:
# Flatten the list of words and count the frequency
all_words = [word for sublist in filtered_words for word in sublist]
word_frequencies = Counter(all_words)

# Consider the most common 5 words as key words
key_words = [word for word, freq in word_frequencies.most_common(5)]

In [23]:
sentence_scores = {}
for sent in sentences:
    for word in word_tokenize(sent.lower()):
        if word not in punctuation:
            if word in key_words:
                if sent not in sentence_scores:
                    sentence_scores[sent] = 1
                else:
                    sentence_scores[sent] += 1

In [30]:
# Select top 5 sentences for summary
summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:5]
summary = ' '.join(summary_sentences)
print(summary)



The Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places (NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space Exploration and Transportation and under Criterion C in the area of Engineering. Although her base structure followed a conventional aircraft design, she used advanced materials that both minimized her weight for cargo-carrying purposes and featured low thermal expansion ratios, which provided a stable base for her Thermal Protection System (TPS) materials. According to Wayne Hale, a flight director from Johnson Space Center, the Space Shuttle orbiter represents a “huge technological leap from expendable rockets and capsules to a reusable, winged, hypersonic, cargo-carrying spacecraft.” Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), the longest running American space program 

In [31]:
"""
Process: This method sorts all the sentences based on their scores in descending order and then slices the list to get the top 5 sentences.
Efficiency: For a small number of sentences, this method is sufficiently efficient. However, its efficiency decreases as the size of the data increases because it sorts the entire list before slicing.
Use Case: Best for smaller datasets where sorting the entire list doesn't have a significant performance impact.
"""

"\nProcess: This method sorts all the sentences based on their scores in descending order and then slices the list to get the top 5 sentences.\nEfficiency: For a small number of sentences, this method is sufficiently efficient. However, its efficiency decreases as the size of the data increases because it sorts the entire list before slicing.\nUse Case: Best for smaller datasets where sorting the entire list doesn't have a significant performance impact.\n"


  " Process Text with SpaCy"   
  

In [41]:
from heapq import nlargest

# Токенізація
doc = nlp(text)

punctuation += '\n'

filtered_sentences = []
for sent in doc.sents:
    filtered_words = [token.text for token in sent if token.text not in punctuation and not token.is_stop]
    filtered_sentences.append(filtered_words)

print(filtered_words)

['Hale', 'stated', 'Space', 'Shuttle', 'remains', '“', 'largest', 'fastest', 'winged', 'hypersonic', 'aircraft', 'history', '”', 'having', 'regularly', 'flown', 'times', 'speed', 'sound']


In [42]:
sentence_scores = {}
for sent, filtered_words in zip(doc.sents, filtered_sentences):
    # Scoring: Count of filtered words in the sentence
    score = len(filtered_words)
    sentence_scores[sent.text] = score

In [43]:
N = 5  # Кількість речень у summary
top_sentences = nlargest(N, sentence_scores, key=sentence_scores.get)

summary = ' '.join(top_sentences)
print("Summary:\n", summary)


Summary:
 
The Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places (NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space Exploration and Transportation and under Criterion C in the area of Engineering. According to Wayne Hale, a flight director from Johnson Space Center, the Space Shuttle orbiter represents a “huge technological leap from expendable rockets and capsules to a reusable, winged, hypersonic, cargo-carrying spacecraft.” Although her base structure followed a conventional aircraft design, she used advanced materials that both minimized her weight for cargo-carrying purposes and featured low thermal expansion ratios, which provided a stable base for her Thermal Protection System (TPS) materials. Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), the longest running American spac

In [44]:
"""
Process: This method directly extracts the 'n' largest elements based on their scores without sorting the entire list.
Efficiency: nlargest is more efficient than sorting the entire list, especially when the list is large, but you need only a few elements from it. This is because nlargest uses a heap data structure internally, which is optimized for such operations.
Use Case: Best for larger datasets where extracting a few elements from a large list is required.
"""

"\nProcess: This method directly extracts the 'n' largest elements based on their scores without sorting the entire list.\nEfficiency: nlargest is more efficient than sorting the entire list, especially when the list is large, but you need only a few elements from it. This is because nlargest uses a heap data structure internally, which is optimized for such operations.\nUse Case: Best for larger datasets where extracting a few elements from a large list is required.\n"