## Name / Ahmed Samy Mohamed Eldawody

# **Text Summarization using NLP**


**What is text summarization?**

Text summarization is the process of distilling the most important information from a source text.

**Why automatic text summarization?**



1.   Summaries reduce reading time.
2.   When researching documents, summaries make the selection process easier.
3.   Automatic summarization improves the effectiveness of indexing.
4.   Automatic summarization algorithms are less biased than human summarizers.
5.   Personalized summaries are useful in question-answering systems as they provide personalized information.
6.   Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of text documents they are able to process.





# **Types of summarization**

![alt text](https://drive.google.com/uc?id=1AqwSGEpi3vzAOLVt_5XXRXokZHvcn43B)



**How to do text summarization**


*   Text cleaning
*   Sentence tokenization
*   Word tokenization
*   Word-frequency table
*   Summarization 
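
The five steps above can be sketched end to end in plain Python. This is only an illustration under simplifying assumptions: it uses regular expressions instead of spaCy, a tiny made-up stop-word list (`STOP`), and a hypothetical function name `summarize`; none of these come from the notebook itself.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stop-word list (an assumption; spaCy's STOP_WORDS is far larger).
STOP = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it", "on"}


def summarize(text, ratio=0.3):
    """Return the highest-scoring sentences of `text`, in their original order."""
    # 1. Text cleaning: collapse runs of whitespace.
    clean = re.sub(r"\s+", " ", text).strip()
    # 2. Sentence tokenization: naive split after '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", clean)

    # 3. Word tokenization: lowercase words, stop words removed.
    def words(s):
        return [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP]

    # 4. Word-frequency table.
    freq = Counter(words(clean))
    if not freq:
        return ""
    top = max(freq.values())
    # 5. Summarization: a sentence's score is the sum of its normalized word weights.
    scores = {i: sum(freq[w] / top for w in words(s)) for i, s in enumerate(sentences)}
    keep = nlargest(max(1, int(len(sentences) * ratio)), scores, key=scores.get)
    return " ".join(sentences[i] for i in sorted(keep))
```

The naive sentence split is the weakest part of this sketch; a real tokenizer such as spaCy's handles quotes and abbreviations far better.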
 
 

  **Text variable**








In [1]:
# This is the text we want to summarize
text = """
 Maria Sharapova has basically no friends as tennis players on the WTA Tour. The Russian player has no problems in openly speaking about it and in a recent interview she said: 'I don't really hide any feelings too much. 
 I think everyone knows this is my job here. When I'm on the courts or when I'm on the court playing, I'm a competitor and I want to beat every single person whether they're in the locker room or across the net.
 So I'm not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match. 
 I'm a pretty competitive girl. I say my hellos, but I'm not sending any players flowers as well. Uhm, I'm not really friendly or close to many players.
 I have not a lot of friends away from the courts.' When she said she is not really close to a lot of players, is that something strategic that she is doing? Is it different on the men's tour than the women's tour? 'No, not at all.
 I think just because you're in the same sport doesn't mean that you have to be friends with everyone just because you're categorized, you're a tennis player, so you're going to get along with tennis players. 
 I think every person has different interests. I have friends that have completely different jobs and interests, and I've met them in very different parts of my life.
 I think everyone just thinks because we're tennis players we should be the greatest of friends. But ultimately tennis is just a very small part of what we do. 
 There are so many other things that we're interested in, that we do.'
 """



# Let's Get Started with SpaCy

In [2]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
import warnings
warnings.filterwarnings('ignore')

In [3]:
#put stop words in a list and show them
stopwords = list(STOP_WORDS)

In [4]:
print(stopwords)

['whatever', 'call', 'whereas', 'latter', 'everything', 'been', 'what', 'thence', 'twelve', 'per', 'because', '’d', 'alone', 'beside', 'i', 'by', 'enough', 'everyone', 'why', 'me', 'yet', '‘ve', 'hereafter', 'through', 'bottom', 'being', 'becomes', '’m', 'further', 'meanwhile', 'formerly', "'ll", 'third', 'three', 'both', 'sixty', 'be', 'regarding', 'towards', 'those', 'up', 'of', 'below', 'too', 'from', 'six', 'mine', 'either', 'on', 'but', 'cannot', 'amongst', 'thereby', 'behind', 'upon', 'indeed', 'nor', 'becoming', 'have', 'again', '’re', 'noone', 'all', 'him', 'was', 'former', 'them', 'already', '’ve', 'although', 'since', 'least', 'these', 'whole', 'please', 'every', 'there', 'empty', 'herself', 'move', 'across', 'show', 'nevertheless', 'to', 'anywhere', 'rather', 'above', 'yourselves', 'whether', 'ever', 'seeming', 'whereafter', 'how', 'themselves', 'give', 'seemed', 'due', 'throughout', '‘m', 'namely', 'make', 'does', 'your', 'otherwise', 'therefore', '‘d', 'none', 'until', 'so

In [5]:
# Load spaCy's small English pipeline
nlp = spacy.load('en_core_web_sm')

In [6]:
doc = nlp(text)

In [7]:
# Split the text into tokens
tokens = [token.text for token in doc]
print(tokens)

['\n ', 'Maria', 'Sharapova', 'has', 'basically', 'no', 'friends', 'as', 'tennis', 'players', 'on', 'the', 'WTA', 'Tour', '.', 'The', 'Russian', 'player', 'has', 'no', 'problems', 'in', 'openly', 'speaking', 'about', 'it', 'and', 'in', 'a', 'recent', 'interview', 'she', 'said', ':', "'", 'I', 'do', "n't", 'really', 'hide', 'any', 'feelings', 'too', 'much', '.', '\n ', 'I', 'think', 'everyone', 'knows', 'this', 'is', 'my', 'job', 'here', '.', 'When', 'I', "'m", 'on', 'the', 'courts', 'or', 'when', 'I', "'m", 'on', 'the', 'court', 'playing', ',', 'I', "'m", 'a', 'competitor', 'and', 'I', 'want', 'to', 'beat', 'every', 'single', 'person', 'whether', 'they', "'re", 'in', 'the', 'locker', 'room', 'or', 'across', 'the', 'net', '.', '\n ', 'So', 'I', "'m", 'not', 'the', 'one', 'to', 'strike', 'up', 'a', 'conversation', 'about', 'the', 'weather', 'and', 'know', 'that', 'in', 'the', 'next', 'few', 'minutes', 'I', 'have', 'to', 'go', 'and', 'try', 'to', 'win', 'a', 'tennis', 'match', '.', '\n ',

In [8]:
# Add the newline character to the punctuation string and display it
punctuation = punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'
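
Because `punctuation` is a string, `token in punctuation` is a substring test, so multi-character tokens can slip through; the `'\n '` entry in the `word_frequencies` output of this notebook is one such case. A minimal sketch of a per-character check follows, assuming a hypothetical helper name `is_punct_or_space` that is not part of the notebook:

```python
from string import punctuation

# Assumption: a token counts as punctuation when every character in it
# is a punctuation mark, a newline, or a space.
PUNCT_CHARS = set(punctuation) | {"\n", " "}


def is_punct_or_space(token):
    """True if the token consists only of punctuation/whitespace characters."""
    return all(ch in PUNCT_CHARS for ch in token)


# Substring test, as used in the notebook: misses the ellipsis and '\n '.
print("..." in punctuation + "\n", "\n " in punctuation + "\n")
# Per-character test: catches both.
print(is_punct_or_space("..."), is_punct_or_space("\n "))
```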

In [9]:
# Count how often each word occurs, skipping stop words and punctuation.
# Whitespace tokens such as '\n ' are not filtered out by these checks.
word_frequencies = {}
for word in doc:
  if word.text.lower() not in stopwords:
    if word.text.lower() not in punctuation:
      if word.text not in word_frequencies:
        word_frequencies[word.text] = 1
      else:
        word_frequencies[word.text] += 1
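
The loop above stores keys with their original casing and lets whitespace tokens through. A possible variant that lowercases keys and drops whitespace is sketched below. It assumes the input is a plain list of token strings (for example the `tokens` list built earlier) and uses a hypothetical function name `count_words`; applying it would change the numbers shown in the outputs of this notebook.

```python
from collections import Counter
from string import punctuation


def count_words(tokens, stopwords):
    """Count lowercased tokens, skipping stop words, punctuation and whitespace."""
    skip = set(punctuation)
    counts = Counter()
    for tok in tokens:
        key = tok.strip().lower()
        # Skip empty/whitespace tokens, stop words, and pure-punctuation tokens.
        if not key or key in stopwords or all(ch in skip for ch in key):
            continue
        counts[key] += 1
    return counts


# 'Tour' and 'tour' now share one key, and '\n ' is ignored.
print(count_words(["\n ", "Tour", "tour", ".", "the", "Tennis"], {"the"}))
```

Lowercasing the keys also makes them line up with a lowercase lookup when sentences are scored later.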

In [10]:
print(word_frequencies)

{'\n ': 10, 'Maria': 1, 'Sharapova': 1, 'basically': 1, 'friends': 5, 'tennis': 6, 'players': 6, 'WTA': 1, 'Tour': 1, 'Russian': 1, 'player': 2, 'problems': 1, 'openly': 1, 'speaking': 1, 'recent': 1, 'interview': 1, 'said': 2, 'hide': 1, 'feelings': 1, 'think': 4, 'knows': 1, 'job': 1, 'courts': 2, 'court': 1, 'playing': 1, 'competitor': 1, 'want': 1, 'beat': 1, 'single': 1, 'person': 2, 'locker': 1, 'room': 1, 'net': 1, 'strike': 1, 'conversation': 1, 'weather': 1, 'know': 1, 'minutes': 1, 'try': 1, 'win': 1, 'match': 1, 'pretty': 1, 'competitive': 1, 'girl': 1, 'hellos': 1, 'sending': 1, 'flowers': 1, 'Uhm': 1, 'friendly': 1, 'close': 2, 'lot': 2, 'away': 1, 'strategic': 1, 'different': 4, 'men': 1, 'tour': 2, 'women': 1, 'sport': 1, 'mean': 1, 'categorized': 1, 'going': 1, 'interests': 2, 'completely': 1, 'jobs': 1, 'met': 1, 'parts': 1, 'life': 1, 'thinks': 1, 'greatest': 1, 'ultimately': 1, 'small': 1, 'things': 1, 'interested': 1}


In [11]:
# Find the highest word count
max_frequency = max(word_frequencies.values())

In [12]:
max_frequency

10

In [13]:
# Divide each count by the maximum to get a relative weight for each word
for word in word_frequencies.keys():
  word_frequencies[word] = word_frequencies[word]/max_frequency
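
The normalization in the cell above can be written as a formula. Here $f(t)$ is the raw count of token $t$ in `word_frequencies` and $w(t)$ is its weight (both symbols are introduced here only for illustration):

$$
w(t) = \frac{f(t)}{\max_{u} f(u)}
$$

With the counts printed earlier, the maximum is 10, so `'tennis'` gets $6/10 = 0.6$ and `'friends'` gets $5/10 = 0.5$. Note that the maximum of 10 belongs to the whitespace token `'\n '` rather than to an actual word, which scales every real word's weight down.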

In [14]:
print(word_frequencies)

{'\n ': 1.0, 'Maria': 0.1, 'Sharapova': 0.1, 'basically': 0.1, 'friends': 0.5, 'tennis': 0.6, 'players': 0.6, 'WTA': 0.1, 'Tour': 0.1, 'Russian': 0.1, 'player': 0.2, 'problems': 0.1, 'openly': 0.1, 'speaking': 0.1, 'recent': 0.1, 'interview': 0.1, 'said': 0.2, 'hide': 0.1, 'feelings': 0.1, 'think': 0.4, 'knows': 0.1, 'job': 0.1, 'courts': 0.2, 'court': 0.1, 'playing': 0.1, 'competitor': 0.1, 'want': 0.1, 'beat': 0.1, 'single': 0.1, 'person': 0.2, 'locker': 0.1, 'room': 0.1, 'net': 0.1, 'strike': 0.1, 'conversation': 0.1, 'weather': 0.1, 'know': 0.1, 'minutes': 0.1, 'try': 0.1, 'win': 0.1, 'match': 0.1, 'pretty': 0.1, 'competitive': 0.1, 'girl': 0.1, 'hellos': 0.1, 'sending': 0.1, 'flowers': 0.1, 'Uhm': 0.1, 'friendly': 0.1, 'close': 0.2, 'lot': 0.2, 'away': 0.1, 'strategic': 0.1, 'different': 0.4, 'men': 0.1, 'tour': 0.2, 'women': 0.1, 'sport': 0.1, 'mean': 0.1, 'categorized': 0.1, 'going': 0.1, 'interests': 0.2, 'completely': 0.1, 'jobs': 0.1, 'met': 0.1, 'parts': 0.1, 'life': 0.1, 't

In [15]:
# Split the text into sentences
sentence_tokens = [sent for sent in doc.sents]
print(sentence_tokens)

[
 Maria Sharapova has basically no friends as tennis players on the WTA Tour., The Russian player has no problems in openly speaking about it and in a recent interview she said: 'I don't really hide any feelings too much. 
 , I think everyone knows this is my job here., When I'm on the courts or when I'm on the court playing, I'm a competitor and I want to beat every single person whether they're in the locker room or across the net.
 , So I'm not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match. 
 , I'm a pretty competitive girl., I say my hellos, but I'm not sending any players flowers as well., Uhm, I'm not really friendly or close to many players.
 , I have not a lot of friends away from the courts.', When she said she is not really close to a lot of players, is that something strategic that she is doing?, Is it different on the men's tour than the women's tour?, ', No, not at all.
 , I think jus

In [16]:
# Score each sentence by summing the weights of the words it contains.
# The lookup uses the lowercased token, while the keys of word_frequencies
# keep their original casing, so capitalized-only words are not matched.
sentence_scores = {}
for sent in sentence_tokens:
  for word in sent:
    if word.text.lower() in word_frequencies:
      if sent not in sentence_scores:
        sentence_scores[sent] = word_frequencies[word.text.lower()]
      else:
        sentence_scores[sent] += word_frequencies[word.text.lower()]
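
The scoring step can be expressed more compactly with `dict.get`. The sketch below is a simplified stand-in, not the notebook's code: it assumes each sentence is a list of token strings, that `weights` uses lowercase keys, and it uses a hypothetical function name `score_sentences`.

```python
def score_sentences(sentences, weights):
    """Map each sentence index to the sum of its tokens' weights.

    `sentences` is a list of token lists; `weights` maps lowercase words
    to their normalized frequency.
    """
    scores = {}
    for i, tokens in enumerate(sentences):
        total = sum(weights.get(tok.lower(), 0.0) for tok in tokens)
        # Like the loop above, leave out sentences with no weighted words.
        if total > 0:
            scores[i] = total
    return scores


weights = {"tennis": 1.0, "friends": 0.5}
print(score_sentences([["Tennis", "friends"], ["the", "weather"], ["friends"]], weights))
```

Because scores are plain sums, longer sentences tend to score higher; dividing by sentence length is one common variation, which this notebook does not use.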


In [17]:
sentence_scores

{
  Maria Sharapova has basically no friends as tennis players on the WTA Tour.: 3.0000000000000004,
 The Russian player has no problems in openly speaking about it and in a recent interview she said: 'I don't really hide any feelings too much. 
  : 2.0999999999999996,
 I think everyone knows this is my job here.: 0.6,
 When I'm on the courts or when I'm on the court playing, I'm a competitor and I want to beat every single person whether they're in the locker room or across the net.
  : 2.3000000000000003,
 So I'm not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match. 
  : 2.4,
 I'm a pretty competitive girl.: 0.30000000000000004,
 I say my hellos, but I'm not sending any players flowers as well.: 0.9,
 Uhm, I'm not really friendly or close to many players.
  : 1.9,
 I have not a lot of friends away from the courts.': 1.0,
 When she said she is not really close to a lot of players, is that something s

In [18]:
from heapq import nlargest

In [19]:
# Choose the fraction of sentences to keep in the summary (30% here)
select_length = int(len(sentence_tokens)*0.3)
select_length

6

In [20]:
summary = nlargest(select_length, sentence_scores, key = sentence_scores.get)
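
`nlargest` returns the selected sentences ordered by score, which is why the summary printed in this notebook does not follow the order of the original text. If document order is preferred, one option is to select sentence indices and sort them afterwards. This is a sketch with a hypothetical helper `pick_top`, assuming `scores` maps sentence indices to scores:

```python
from heapq import nlargest


def pick_top(sentences, scores, ratio=0.3):
    """Return the top-scoring sentences, restored to document order."""
    # Keep at least one sentence even for very short inputs.
    n = max(1, int(len(sentences) * ratio))
    top = nlargest(n, range(len(sentences)), key=lambda i: scores.get(i, 0.0))
    return [sentences[i] for i in sorted(top)]


sentences = ["a", "b", "c", "d"]
scores = {0: 1.0, 1: 3.0, 2: 0.5, 3: 2.0}
print(pick_top(sentences, scores, ratio=0.5))
```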

In [21]:
summary

[I think just because you're in the same sport doesn't mean that you have to be friends with everyone just because you're categorized, you're a tennis player, so you're going to get along with tennis players. 
  ,
 
  Maria Sharapova has basically no friends as tennis players on the WTA Tour.,
 I have friends that have completely different jobs and interests, and I've met them in very different parts of my life.
  ,
 So I'm not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match. 
  ,
 When I'm on the courts or when I'm on the court playing, I'm a competitor and I want to beat every single person whether they're in the locker room or across the net.
  ,
 I think everyone just thinks because we're tennis players we should be the greatest of friends.]

In [22]:
# Convert each selected sentence to plain text
final_summary = [sent.text for sent in summary]

In [23]:
summary = ' '.join(final_summary)

In [24]:
print(text)


 Maria Sharapova has basically no friends as tennis players on the WTA Tour. The Russian player has no problems in openly speaking about it and in a recent interview she said: 'I don't really hide any feelings too much. 
 I think everyone knows this is my job here. When I'm on the courts or when I'm on the court playing, I'm a competitor and I want to beat every single person whether they're in the locker room or across the net.
 So I'm not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match. 
 I'm a pretty competitive girl. I say my hellos, but I'm not sending any players flowers as well. Uhm, I'm not really friendly or close to many players.
 I have not a lot of friends away from the courts.' When she said she is not really close to a lot of players, is that something strategic that she is doing? Is it different on the men's tour than the women's tour? 'No, not at all.
 I think just because you're in 

In [25]:
print(summary)

I think just because you're in the same sport doesn't mean that you have to be friends with everyone just because you're categorized, you're a tennis player, so you're going to get along with tennis players. 
  
 Maria Sharapova has basically no friends as tennis players on the WTA Tour. I have friends that have completely different jobs and interests, and I've met them in very different parts of my life.
  So I'm not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match. 
  When I'm on the courts or when I'm on the court playing, I'm a competitor and I want to beat every single person whether they're in the locker room or across the net.
  I think everyone just thinks because we're tennis players we should be the greatest of friends.


In [26]:
# I have now finished my project, following all your directions

In [27]:
# I found this library through my own research, and I believe it will help me build a more advanced model in the future

In [28]:
from spacy import displacy
# Re-parse the summary text and highlight its named entities
summary = nlp(summary)
displacy.render(summary, style='ent', jupyter=True, options={'distance': 80})

In [29]:
# Render the dependency parse of each sentence in the original text
sentence_spans = list(doc.sents)
displacy.render(sentence_spans, style="dep", options={'distance': 50})