<a href="https://colab.research.google.com/github/data-tamer2410/ds-text-summarization/blob/main/text_summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task.

Summarize the given text using the NLP library NLTK.

# Solution.

In [79]:
import nltk
import string
from heapq import nlargest
from langdetect import detect
from collections import Counter
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize,sent_tokenize

## Downloading the necessary resources and text.

In [80]:
nltk.download('punkt_tab')  # Tokenizer data used by word_tokenize and sent_tokenize.
nltk.download('stopwords')  # Stop-word lists.

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [81]:
# Loading text.
with open('/content/text.txt','r',encoding='utf-8') as f:
    text = f.read()

## Text analysis.

In [82]:
# Detect the language of the text.
print(detect(text))

en
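
`detect` returns a short language code such as `en`, while `stopwords.words` expects a language name such as `'english'`. The notebook hard-codes `'english'`; a minimal sketch of a lookup between the two is shown below. The `ISO_TO_NLTK` table and the `stopword_language` helper are hypothetical names introduced here for illustration and cover only a few languages.

```python
# Hypothetical lookup table (not part of the notebook): language codes as
# returned by langdetect, mapped to names used by NLTK's stopwords corpus.
ISO_TO_NLTK = {
    "en": "english",
    "de": "german",
    "fr": "french",
    "es": "spanish",
    "it": "italian",
}


def stopword_language(iso_code, default="english"):
    """Return the stop-word list name for a language code, or a default."""
    return ISO_TO_NLTK.get(iso_code.lower(), default)


print(stopword_language("en"))  # english
```

With such a helper, the stop-word set could be built as `set(stopwords.words(stopword_language(detect(text))))` instead of assuming English.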


In [83]:
# Tokenize the text into words and sentences, and prepare the punctuation characters and English stop words.
tokens = word_tokenize(text)
sentence_tokens = sent_tokenize(text)
punctuation = string.punctuation + "‘’“”„\n"
stop_words = set(stopwords.words('english'))

In [84]:
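# Count how often each word occurs, skipping stop words and punctuation.
# The original casing is kept, so 'Space' and 'space' are counted separately.
clear_words = []
for word in tokens:
    lowered = word.lower()
    # `punctuation` is a string, so the second test is a substring check.
    if lowered not in stop_words and lowered not in punctuation:
        clear_words.append(word)

word_frequencies = Counter(clear_words)
word_frequencies.most_common()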

[('Space', 10),
 ('first', 10),
 ('Discovery', 7),
 ('Shuttle', 6),
 ('orbiter', 6),
 ('Criterion', 4),
 ('missions', 4),
 ('space', 3),
 ('flew', 3),
 ('also', 3),
 ('fly', 3),
 ('reusable', 3),
 ('U.S.', 2),
 ('Program', 2),
 ('C', 2),
 ('significant', 2),
 ('vehicles', 2),
 ('SSP', 2),
 ('five', 2),
 ('orbiters', 2),
 ('construction', 2),
 ('station', 2),
 ('times', 2),
 ('Challenger', 2),
 ('shuttle', 2),
 ('ISS', 2),
 ('flown', 2),
 ('engineering', 2),
 ('Hale', 2),
 ('winged', 2),
 ('hypersonic', 2),
 ('cargo-carrying', 2),
 ('base', 2),
 ('aircraft', 2),
 ('materials', 2),
 ('featured', 2),
 ('System', 2),
 ('TPS', 2),
 ('Orbiter', 1),
 ('OV-103', 1),
 ('considered', 1),
 ('eligible', 1),
 ('listing', 1),
 ('National', 1),
 ('Register', 1),
 ('Historic', 1),
 ('Places', 1),
 ('NRHP', 1),
 ('context', 1),
 ('1969-2011', 1),
 ('areas', 1),
 ('Exploration', 1),
 ('Transportation', 1),
 ('area', 1),
 ('Engineering', 1),
 ('achieved', 1),
 ('significance', 1),
 ('within', 1),
 ('past
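
The counts above are case-sensitive: `Space` (10) and `space` (3) appear as separate entries. One possible variant, sketched here with a toy token list rather than the notebook's data, lowercases each token before counting. The `count_words` helper is a hypothetical name and not part of the notebook.

```python
import string
from collections import Counter


def count_words(tokens, stop_words):
    """Count tokens case-insensitively, skipping stop words and punctuation."""
    punctuation = set(string.punctuation)
    counts = Counter()
    for token in tokens:
        word = token.lower()
        if word in stop_words:
            continue
        # Skip tokens made up only of punctuation characters, e.g. "," or "``".
        if all(ch in punctuation for ch in word):
            continue
        counts[word] += 1
    return counts


# Toy input standing in for the output of word_tokenize.
tokens = ["The", "Space", "Shuttle", "flew", "in", "space", "."]
freq = count_words(tokens, stop_words={"the", "in"})
print(freq.most_common(1))  # [('space', 2)]
```

If the frequencies are built this way, the sentence-scoring step would also need to lowercase each word before looking it up.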

In [85]:
# Score each sentence by summing the text-wide frequencies of its words.
sentence_scores = {}
for sentence in sentence_tokens:
    score = 0
    for word in word_tokenize(sentence):
        score += word_frequencies.get(word, 0)
    sentence_scores[sentence] = score
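
Because the score is a plain sum, longer sentences tend to score higher simply because they contain more words. The sketch below illustrates this on toy data and shows an optional length-normalized variant. It is an illustration only: `score_sentence` is a hypothetical helper, and the regular-expression tokenizer is a simplification standing in for `word_tokenize`.

```python
import re
from collections import Counter


def score_sentence(sentence, word_frequencies, normalize=False):
    """Sum word frequencies for a sentence, optionally divided by its length."""
    # Simplified tokenizer (an assumption): runs of letters, digits, ' and -.
    words = re.findall(r"[A-Za-z0-9'-]+", sentence)
    total = sum(word_frequencies.get(w, 0) for w in words)
    if normalize and words:
        return total / len(words)
    return total


# Toy frequencies, not taken from the notebook's text.
freq = Counter({"Space": 3, "Shuttle": 2, "flew": 1})
short = "Space Shuttle flew."
long_ = "The Space Shuttle flew and the Space program grew over the years."

print(score_sentence(short, freq), score_sentence(long_, freq))
print(score_sentence(short, freq, normalize=True),
      score_sentence(long_, freq, normalize=True))
```

In this toy case the raw sum ranks the long sentence first, while the normalized score ranks the short, denser sentence first. Which behavior is preferable depends on the text being summarized.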

In [89]:
# Build the summary from the 5 highest-scoring sentences (printed in descending score order).
summary = nlargest(5, sentence_scores, key=sentence_scores.get)
for sent in summary:
    print(sent)

According to Wayne Hale, a flight director from Johnson Space Center, the Space Shuttle orbiter represents a “huge technological leap from expendable rockets and capsules to a reusable, winged, hypersonic, cargo-carrying spacecraft.” Although her base structure followed a conventional aircraft design, she used advanced materials that both minimized her weight for cargo-carrying purposes and featured low thermal expansion ratios, which provided a stable base for her Thermal Protection System (TPS) materials.
The Orbiter Discovery, OV-103, is considered eligible for listing in the National Register of Historic Places (NRHP) in the context of the U.S. Space Shuttle Program (1969-2011) under Criterion A in the areas of Space Exploration and Transportation and under Criterion C in the area of Engineering.
Under Criterion A, Discovery is significant as the oldest of the three extant orbiter vehicles constructed for the Space Shuttle Program (SSP), the longest running American space program t
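
`nlargest` returns the selected sentences ordered by score, not by their position in the text. If the summary should read in the original order, one option is to select the top sentences first and then filter the sentence list. A minimal sketch on toy data follows; `summarize_in_order` is a hypothetical helper, not part of the notebook.

```python
from heapq import nlargest


def summarize_in_order(sentences, scores, n=5):
    """Pick the n highest-scoring sentences, then restore document order."""
    top = set(nlargest(n, scores, key=scores.get))
    return [s for s in sentences if s in top]


# Toy sentences and scores, for illustration only.
sentences = ["A first.", "B second.", "C third."]
scores = {"A first.": 1, "B second.": 3, "C third.": 5}

print(nlargest(2, scores, key=scores.get))        # ['C third.', 'B second.']
print(summarize_in_order(sentences, scores, n=2))  # ['B second.', 'C third.']
```

In the notebook this would correspond to calling `summarize_in_order(sentence_tokens, sentence_scores)`.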

## Conclusion.

In this project, I used NLTK, a natural language processing (NLP) library, to summarize a given text. I tokenized the text into words and sentences, filtered out stop words and punctuation, and counted word frequencies. Each sentence was then scored by summing the frequencies of the words it contains, and the five highest-scoring sentences were selected as the summary.