# Text Summarizer Project Using Natural Language Processing

This project is a simple extractive text summarizer written in Python with the Natural Language Toolkit (NLTK). It takes an input text, cleans and tokenizes it, and counts how often each significant word occurs, ignoring common stopwords. Each sentence is scored by summing the normalized frequencies of its words, and the highest-scoring sentences are selected to form a brief summary. This condenses a long article or passage into a short version that keeps the salient details, so the main ideas can be grasped more quickly. The project demonstrates basic NLP methods (tokenization, stopword removal, sentence scoring, and ranking) and sets the stage for more sophisticated summarization models.

In [None]:
!pip install nltk



In [None]:
import heapq
import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

# Tokenizer models and the stopword list used in the cells below
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [None]:
text = """Interstellar is a Christopher Nolan movie exploring the vast galaxy through a time and space paradox. The movie primarily focuses on life. Earth is coming to an end and a few astronauts are traveling to find another home at the end of the galaxy.

Han Zimmer has created an excellent background score to cover up the visual effect through the eyes of the camera. The movie has an excellent cast, which includes Matthew McConaughey, Anne Hathaway, and Michael Caine. Well, the movie starts with a family where a Nasa(Matthew McConaughey) astronaut lives with his children and his father-in-law. In the beginning, it will be noticeable that the corn is getting destroyed. Murph, the younger daughter, is always afraid of a ghost living in the house where the ghost always messages Murph with some symbols.

It is likely that there will be a dust storm someday when the ghost will try to communicate with Murph and her family again. Being a NASA astronaut, Cooper will understand the message and he can predict what the ghost is trying to say. Analyzing the pattern of the symbol reveals that it is the location of an unknown place, which was later known to be NASA.
At NASA, the scientist, Dr. Brand (Michel Caine), comes to tell us that the world is going to end and they are looking for some astronaut who will help them to find another world somewhere in the galaxy or another galaxy for the sustainability of the future. Dr. Brand has a team of scientists and a robot (TARS), but they require a pilot with actual practical experience. For the future of Murph and Timothy, Cooper will accept this mission, and he will go to find another world or another home for the future generation. This is the beginning of the movie Interstellar.  As you watch the movie, there are many aspects like the space-time paradox, a parallel world to the theory of relativity. For instance, they will find a planet similar to Earth while travelling on that planet for inspection. When Cooper returns to the spaceship, they will notice that a half hour on the planet is 25 years for Earth, equivalent to most of the life span of children.
At the end of the movie, when Cooper and Amelia Brand (Anne Hathaway) find out that everything was planned by Dr. Brand, and he was trying to save a couple of scientists and astronauts, including his daughter. Hearing this, Cooper wanted to return home to connect with his daughter and son, which was already too late for the timeline. While going through the wormhole, Cooper entered a portal and got connected to the bookshelf in Murph's room, where she used to live, so he tried to connect with Murph through the dust. Later in the movie, Murph, in his older age, realises that the ghost was none other than his father, Cooper.

Interestingly, Nolan has studied with other scientists the design of the black hole. It's not randomly designed. After analysing this with the other scientists, they created this graphical blackhole, which is somewhat similar to a real blackhole when studied in the future. Nolan has used some panoramic camera shots for massive coverage of space, which justifies and suits the character of the movie.

It's a movie that blends science and drama, with good visual effects, a background score, and an excellent cast. Also, every dramatic scene Zimmer made was so good to watch that it sometimes felt like a classic Nolan movie. With an excellent background score, the audience was always reminded that they were getting a vibe of movies like Batman, The Prestige, etc. The movie has won many awards, starting with the Academy Award for best visual effects, the Empire Award for best director and best film. It was truly a mind-boggling movie experience."""


Preprocessing the text input

In [None]:
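# Drop bracketed numeric markers such as [1], then collapse runs of whitespace
clean_text = re.sub(r'\[[0-9]*\]', ' ', text)
clean_text = re.sub(r'\s+', ' ', clean_text)
# Keep letters only (no digits or punctuation) for word counting
formatted_text = re.sub(r'[^a-zA-Z]', ' ', clean_text)
formatted_text = re.sub(r'\s+', ' ', formatted_text)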
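
To see what these substitutions do, here is a minimal sketch on a short made-up string. The sample text and the `sample`, `clean_sample`, and `formatted_sample` names are assumptions for illustration and are not part of the project data.

```python
import re

# Hypothetical sample with a citation marker, extra spaces, digits, and punctuation
sample = "Nolan[1] directed   Interstellar in 2014.\nIt won awards!"

# Same two steps as the notebook: drop [n] markers, then collapse whitespace
clean_sample = re.sub(r'\[[0-9]*\]', ' ', sample)
clean_sample = re.sub(r'\s+', ' ', clean_sample)

# Letters only, with whitespace collapsed again
formatted_sample = re.sub(r'[^a-zA-Z]', ' ', clean_sample)
formatted_sample = re.sub(r'\s+', ' ', formatted_sample)

print(repr(clean_sample))
print(repr(formatted_sample))
```

The first result keeps sentence punctuation, while the second drops the year and all punctuation, which is why only the second is suitable for counting words.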

Tokenization and word frequencies

In [None]:
stop_words = set(stopwords.words('english'))
word_frequencies = {}
# Count every non-stopword token in the lowercased, letters-only text
for word in word_tokenize(formatted_text.lower()):
    if word not in stop_words:
        word_frequencies[word] = word_frequencies.get(word, 0) + 1
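
The same kind of count can also be written with `collections.Counter`. The sketch below is a simplified stand-in rather than the notebook's method: it assumes `str.split` in place of `word_tokenize` and a tiny hand-written stopword set in place of NLTK's English list, so that it runs without downloading NLTK data. All `demo_` names are hypothetical.

```python
from collections import Counter

# Hypothetical mini stopword set; the notebook uses stopwords.words('english')
demo_stop_words = {"the", "is", "a", "of", "and"}

demo_text = "the movie is a story of space and the movie is about time"

# str.split stands in for word_tokenize on already-cleaned, lowercase text
demo_frequencies = Counter(
    word for word in demo_text.split() if word not in demo_stop_words
)

print(demo_frequencies.most_common(1))  # [('movie', 2)]
```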


Normalizing word frequencies

In [None]:
# Divide by the largest count so the most frequent word has weight 1.0
max_freq = max(word_frequencies.values())
for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / max_freq
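
As a small worked example of this normalization, the sketch below uses made-up counts (the `demo_counts` values are assumptions, not figures from the review text) and builds a new dictionary instead of updating in place.

```python
# Hypothetical raw counts, shaped like the output of the counting step
demo_counts = {"movie": 4, "cooper": 2, "ghost": 1}

max_count = max(demo_counts.values())
# Each weight is the word's count divided by the largest count
normalized = {word: count / max_count for word, count in demo_counts.items()}

print(normalized)  # {'movie': 1.0, 'cooper': 0.5, 'ghost': 0.25}
```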

Scoring sentences

In [None]:
sentence_scores = {}
sentences = sent_tokenize(text)
for sentence in sentences:
    # Skip long sentences (30 or more space-separated words)
    if len(sentence.split(' ')) < 30:
        for word in word_tokenize(sentence.lower()):
            # A sentence's score is the sum of its words' normalized frequencies
            if word in word_frequencies:
                sentence_scores[sentence] = sentence_scores.get(sentence, 0) + word_frequencies[word]
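
Scoring, followed by the selection of the highest-ranked sentences described in the introduction, can be sketched on toy data. This is an illustrative sketch under stated assumptions: the weights and sentences are made up, a simple regex stands in for `word_tokenize`, and `heapq.nlargest` is assumed as the selection step because `heapq` is imported at the top of the notebook.

```python
import heapq
import re

# Hypothetical normalized weights and sentences, for illustration only
demo_weights = {"movie": 1.0, "cooper": 0.5, "ghost": 0.25}
demo_sentences = [
    "The movie follows Cooper.",
    "A ghost appears.",
    "The movie has a ghost and the ghost is Cooper.",
]

demo_scores = {}
for sentence in demo_sentences:
    # A simple regex stands in for word_tokenize here
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in demo_weights:
            demo_scores[sentence] = demo_scores.get(sentence, 0) + demo_weights[word]

# Keep the two highest-scoring sentences, best first
top_two = heapq.nlargest(2, demo_scores, key=demo_scores.get)
print(top_two)
```

Because a word adds its weight every time it occurs, the third sentence (which mentions "ghost" twice) outranks the first.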



Output

In [None]:
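import textwrap

# Select the top-scoring sentences (assumed: 5, matching the five sentences in the printed summary)
summary_sentences = heapq.nlargest(5, sentence_scores, key=sentence_scores.get)
summary = ' '.join(summary_sentences)
wrapped_summary = textwrap.fill(summary, width=60)

print("Summary:\n", wrapped_summary)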


Summary:
 For the future of Murph and Timothy, Cooper will accept this
mission, and he will go to find another world or another
home for the future generation. At the end of the movie,
when Cooper and Amelia Brand (Anne Hathaway) find out that
everything was planned by Dr. Later in the movie, Murph, in
his older age, realises that the ghost was none other than
his father, Cooper. Murph, the younger daughter, is always
afraid of a ghost living in the house where the ghost always
messages Murph with some symbols. The movie has won many
awards, starting with the Academy Award for best visual
effects, the Empire Award for best director and best film.
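
The summary above lists sentences from highest to lowest score, not in the order they appear in the review. If reading order matters, one option is to re-sort the selected sentences by their position in the source. The sketch below uses made-up sentences and scores, and every name in it is hypothetical.

```python
import heapq

# Hypothetical sentences and scores, for illustration only
demo_sentences = ["First point.", "Minor aside.", "Second point.", "Key finding."]
demo_scores = {
    "First point.": 1.2,
    "Minor aside.": 0.3,
    "Second point.": 0.9,
    "Key finding.": 2.0,
}

# nlargest returns sentences from highest to lowest score
by_score = heapq.nlargest(3, demo_scores, key=demo_scores.get)

# Re-sort the selected sentences by their position in the source text
in_document_order = sorted(by_score, key=demo_sentences.index)

print(by_score)
print(in_document_order)
```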
