
In [3]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize
from heapq import nlargest
import spacy

# Download the English stop words and the Punkt tokenizer data from NLTK
nltk.download('stopwords')
nltk.download('punkt')

# Define the text to summarize
text = ('The Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places '
        '(NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space '
        'Exploration and Transportation and under Criterion C in the area of Engineering. Because it has achieved '
        'significance within the past fifty years, Criteria Consideration G applies. Under Criterion A, Discovery is '
        'significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), '
        'the longest running American space program to date; she was the third of five orbiters built by NASA. Unlike '
        'the Mercury, Gemini, and Apollo programs, the SSP’s emphasis was on cost effectiveness and reusability, and '
        'eventually the construction of a space station. Including her maiden voyage (launched August 30, 1984), Discovery '
        'flew to space thirty-nine times, more than any of the other four orbiters; she was also the first orbiter to fly '
        'twenty missions. She had the honor of being chosen as the Return to Flight vehicle after both the Challenger and '
        'Columbia accidents. Discovery was the first shuttle to fly with the redesigned SRBs, a result of the Challenger '
        'accident, and the first shuttle to fly with the Phase II and Block I SSME. Discovery also carried the Hubble '
        'Space Telescope to orbit and performed two of the five servicing missions to the observatory. She flew the first '
        'and last missions for the SSP. After her retirement, she was transferred to the Steven F. Udvar-Hazy Center in '
        'Virginia where she remains on display. In total, Discovery spent 365 days in space, traveling 148,221,675 miles, '
        'and completed 5,830 orbits around Earth.')

# Split the text into sentences
sentences = sent_tokenize(text)  # list of sentence strings

# Remove stop words and build the word-frequency table
stop_words = set(stopwords.words('english'))  # English stop words from NLTK

word_frequencies = {}  # maps each lowercased token to its count
for word in nltk.word_tokenize(text):  # split the text into individual tokens
    if word.lower() not in stop_words:  # skip stop words
        if word.lower() not in word_frequencies:
            word_frequencies[word.lower()] = 1  # first occurrence of this token
        else:
            word_frequencies[word.lower()] += 1  # increment the token's count

# Normalize the word frequencies
maximum_frequency = max(word_frequencies.values())  # highest raw count in the table

for word in word_frequencies.keys():
    word_frequencies[word] = (word_frequencies[word] / maximum_frequency)  # scale each count into (0, 1]

# Score the sentences
sentence_scores = {}  # maps each sentence to the sum of its word frequencies
for sent in sentences:
    for word in nltk.word_tokenize(sent.lower()):  # split the sentence into tokens
        if word in word_frequencies.keys():  # only tokens present in the frequency table count
            if sent not in sentence_scores:
                sentence_scores[sent] = word_frequencies[word]  # start the sentence's score
            else:
                sentence_scores[sent] += word_frequencies[word]  # add to the sentence's score

# Build the summary
summary_sentences = nlargest(3, sentence_scores, key=sentence_scores.get)  # three highest-scoring sentences
summary = ' '.join(summary_sentences)  # join the selected sentences into the summary text



# Build a summary with SpaCy
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)

# Split the text into sentences
sentences_spacy = list(doc.sents)  # list of sentence spans produced by SpaCy

# Score the sentences with SpaCy
sentence_scores_spacy = {sent: len(sent) for sent in sentences_spacy}  # score = sentence length in tokens

# Build the summary with SpaCy
summary_sentences_spacy = nlargest(3, sentence_scores_spacy, key=sentence_scores_spacy.get)  # three longest sentences
summary_spacy = ' '.join([sent.text for sent in summary_sentences_spacy])  # join the selected sentences into the summary text


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
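
The frequency loop in the cell above counts every token that is not a stop word, so punctuation tokens such as `,` also enter the table and can end up setting the maximum frequency. The following is a minimal sketch of one way to build a similar normalized table from alphabetic tokens only, using `collections.Counter`. The regex tokenizer, the small `DEMO_STOP_WORDS` set, and the function name are assumptions made to keep the example self-contained; in the notebook these roles are played by `nltk.word_tokenize` and `stopwords.words('english')`.

```python
import re
from collections import Counter

# Assumption: a small illustrative stop-word set; the notebook uses NLTK's English list.
DEMO_STOP_WORDS = {"the", "of", "to", "and", "a", "in", "is", "was", "she", "her"}

def word_frequencies_normalized(text, stop_words=DEMO_STOP_WORDS):
    """Return {word: count / max_count} for alphabetic, non-stop-word tokens."""
    # Assumption: a simple regex tokenizer stands in for nltk.word_tokenize.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {word: count / max_count for word, count in counts.items()}

demo = "Discovery flew to space. Discovery carried the Hubble Space Telescope to orbit."
freqs = word_frequencies_normalized(demo)
print(freqs["discovery"], freqs["hubble"])  # 1.0 0.5
```

Because this changes which tokens are counted, sentence scores computed from such a table may differ from the ones that produced the recorded output.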


In [4]:
print("Summary (NLTK):")
print(summary)

Summary (NLTK):
Including her maiden voyage (launched August 30, 1984), Discovery flew to space thirty-nine times, more than any of the other four orbiters; she was also the first orbiter to fly twenty missions. The Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places (NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space Exploration and Transportation and under Criterion C in the area of Engineering. Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), the longest running American space program to date; she was the third of five orbiters built by NASA.
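
The NLTK summary above is printed in score order, so the sentence about the maiden voyage appears before the sentence that opens the source text. If document order is preferred, one option is to rank sentences by index and sort the selected indices before joining. This is a sketch under assumptions: `top_sentences_in_order` is a hypothetical helper, and the demo sentences and scores are made up for illustration.

```python
from heapq import nlargest

def top_sentences_in_order(sentences, scores, n=3):
    """Pick the n highest-scoring sentences, then return them in document order."""
    # Rank by index so that identical sentence strings cannot collide.
    top_indices = nlargest(n, range(len(sentences)), key=lambda i: scores[i])
    return [sentences[i] for i in sorted(top_indices)]

# Hypothetical sentences and scores, for illustration only.
demo_sentences = ["First.", "Second.", "Third.", "Fourth."]
demo_scores = [2.0, 0.5, 3.0, 1.0]
print(" ".join(top_sentences_in_order(demo_sentences, demo_scores, n=2)))  # First. Third.
```

In the notebook, the scores list could be built as `[sentence_scores.get(s, 0) for s in sentences]`.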


In [5]:
print("Summary (SpaCy):")
print(summary_spacy)

Summary (SpaCy):
The Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places (NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space Exploration and Transportation and under Criterion C in the area of Engineering. Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), the longest running American space program to date; she was the third of five orbiters built by NASA. Including her maiden voyage (launched August 30, 1984), Discovery flew to space thirty-nine times, more than any of the other four orbiters; she was also the first orbiter to fly twenty missions.
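
Both summaries contain the same three sentences, only in a different order. The SpaCy variant scores by token count alone, and summing word frequencies also tends to favor long sentences. A common way to reduce that bias is to divide each sentence's score by its word count. The sketch below assumes a regex tokenizer and a hypothetical `length_normalized_scores` helper; the frequency table and sentences are illustrative, not taken from the notebook's run.

```python
import re

def length_normalized_scores(sentences, word_freqs):
    """Score each sentence by the mean frequency of its words."""
    scores = []
    for sent in sentences:
        # Assumption: a simple regex tokenizer instead of NLTK or SpaCy.
        words = re.findall(r"[a-z]+", sent.lower())
        total = sum(word_freqs.get(w, 0.0) for w in words)
        scores.append(total / len(words) if words else 0.0)
    return scores

# Hypothetical frequency table and sentences, for illustration only.
demo_freqs = {"discovery": 1.0, "space": 0.5}
demo_sentences = [
    "Discovery flew.",
    "Discovery spent many long days traveling in space.",
]
print(length_normalized_scores(demo_sentences, demo_freqs))  # [0.5, 0.1875]
```

With plain summation the second sentence would score 1.5 against 1.0 for the first; after normalization the shorter, denser sentence ranks higher.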
