# TEXT SUMMARIZATION

**Process**
- Text Cleaning
- Sentence Tokenization
- Word Tokenization
- Frequency of words
- Summarization
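
The five steps above can be sketched end to end with only the standard library. This is a rough illustration, not the notebook's spaCy pipeline: the `summarize` function, the tiny `STOPWORDS` set, and the regex-based sentence and word splitting are all assumptions made for the example.

```python
import re
from collections import Counter
from heapq import nlargest

# Hypothetical, deliberately tiny stop-word set; the notebook uses spaCy's STOP_WORDS.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on", "it"}

def summarize(text, ratio=0.3):
    # 1. Text cleaning: collapse runs of whitespace (including newlines).
    cleaned = re.sub(r"\s+", " ", text).strip()
    # 2. Sentence tokenization: naive split after '.', '!' or '?' (an assumption,
    #    far simpler than spaCy's sentence segmentation).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]

    # 3. Word tokenization, lowercased so counting and lookup agree on case.
    def words(sentence):
        found = re.findall(r"[a-z0-9]+", sentence.lower())
        return [w for w in found if w not in STOPWORDS]

    # 4. Word frequencies, normalized by the most frequent word.
    counts = Counter(w for s in sentences for w in words(s))
    if not counts:
        return ""
    top = max(counts.values())
    weights = {w: c / top for w, c in counts.items()}

    # 5. Summarization: score sentences, keep the best ones, restore document order.
    scores = {i: sum(weights[w] for w in words(s)) for i, s in enumerate(sentences)}
    k = max(1, int(len(sentences) * ratio))
    keep = sorted(nlargest(k, scores, key=scores.get))
    return " ".join(sentences[i] for i in keep)

sample = "Cats chase mice. Cats sleep all day. Dogs bark."
print(summarize(sample, ratio=0.4))
```

Like the cells below, this scores a sentence by summing the normalized frequencies of its words, so longer sentences tend to score higher.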

In [38]:
Text="""
(IAFS-III) brought the leaders and representatives of all 54 countries of the dynamic African countries to New Delhi for first time in a landmark summit meeting that is set to upscale and transform the multi-faceted India-Africa partnership. This was by far the biggest gathering of African leaders on the Indian soil and showcased multiple dimensions of the India-Africa relationship that is pivoted around trade, training, technology, capacity building and development partnership. In a clear reaffirmation of their unstinting commitment to building lasting partnerships with India, 41 countries were represented at the level of heads of state/government.

The architecture of the India-Africa engagement is evolving, with the two sides relating at three levels, namely bilateral, Regional Economic Communities (RECs) and the African Union.

The IAFS-III brought out vividly an increasing convergence of interests, values and a burgeoning web of win-win partnership between the two growth poles of the world. Afro-optimism is for real, and is attested by latest trends, with more than thirty African countries becoming functioning democracies.And it’s not just resource-rich countries that are doing well, but also those countries, which are driven by enterprise and innovation of their people.

The resurgence of Africa has coincided with the rise of India as a global player, investor and provider of developmental assistance. The two narratives are now getting intertwined: be it trade, technology, training or reform of global governance, the multi-faceted ties between India and Africa are blossoming and finding new areas of convergence. The third India-Africa Forum Summit has opened new avenues for upscaling the India-Africa partnership across the spectrum.
    
"""


In [7]:
!pip install -U spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl (13.7 MB)
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [39]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [40]:
stopwords= list(STOP_WORDS)
stopwords

['somewhere',
 'per',
 'twelve',
 'ten',
 'beside',
 'off',
 'besides',
 'hers',
 'both',
 'at',
 'done',
 'never',
 'themselves',
 'amongst',
 'each',
 'am',
 'due',
 'hence',
 'them',
 'has',
 'anywhere',
 'move',
 'anyone',
 'itself',
 'latterly',
 'our',
 'still',
 'herein',
 'moreover',
 'can',
 'bottom',
 'ca',
 'though',
 'which',
 'a',
 'others',
 'now',
 'as',
 'this',
 'whenever',
 'against',
 'even',
 'along',
 '‘re',
 'however',
 'on',
 'twenty',
 "'ll",
 'neither',
 'these',
 'became',
 'any',
 'further',
 'name',
 'without',
 'than',
 'five',
 'among',
 'six',
 'yet',
 'such',
 'otherwise',
 'ours',
 'everyone',
 'another',
 'when',
 'then',
 'sixty',
 'seemed',
 'seems',
 '‘d',
 '’ve',
 'across',
 'once',
 'me',
 'become',
 'anything',
 'other',
 'either',
 'is',
 'the',
 'yourselves',
 'how',
 'whence',
 'using',
 'before',
 'did',
 'perhaps',
 'least',
 'up',
 'front',
 'namely',
 'eleven',
 'my',
 'same',
 '’re',
 'nine',
 'so',
 'fifteen',
 'mostly',
 'if',
 'whereup

In [41]:
nlp=spacy.load("en_core_web_sm")

In [42]:
doc=nlp(Text) # run the spaCy pipeline; doc holds the tokens and sentence boundaries

In [43]:
tokens = [token.text for token in doc]
print(tokens)

['\n', '(', 'IAFS', '-', 'III', ')', 'brought', 'the', 'leaders', 'and', 'representatives', 'of', 'all', '54', 'countries', 'of', 'the', 'dynamic', 'African', 'countries', 'to', 'New', 'Delhi', 'for', 'first', 'time', 'in', 'a', 'landmark', 'summit', 'meeting', 'that', 'is', 'set', 'to', 'upscale', 'and', 'transform', 'the', 'multi', '-', 'faceted', 'India', '-', 'Africa', 'partnership', '.', 'This', 'was', 'by', 'far', 'the', 'biggest', 'gathering', 'of', 'African', 'leaders', 'on', 'the', 'Indian', 'soil', 'and', 'showcased', 'multiple', 'dimensions', 'of', 'the', 'India', '-', 'Africa', 'relationship', 'that', 'is', 'pivoted', 'around', 'trade', ',', 'training', ',', 'technology', ',', 'capacity', 'building', 'and', 'development', 'partnership', '.', 'In', 'a', 'clear', 'reaffirmation', 'of', 'their', 'unstinting', 'commitment', 'to', 'building', 'lasting', 'partnerships', 'with', 'India', ',', '41', 'countries', 'were', 'represented', 'at', 'the', 'level', 'of', 'heads', 'of', 'sta

In [44]:
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [47]:
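# '\n' is the newline character
# Add '\n' to the standard punctuation string from the library so single-newline tokens are filtered too
# (the '/n' in the output below is likely left over from an earlier run of this cell)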
punctuation=punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~/n\n'

In [48]:
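# count how often each word occurs, skipping stop words and punctuation
# note: keys keep the original case of word.text

word_frequencies = {}
for word in doc:
    if word.text.lower() not in stopwords:
        # punctuation is a string, so this is a substring test; tokens such as '\n\n' pass through
        if word.text.lower() not in punctuation:
            if word.text not in word_frequencies.keys():
                word_frequencies[word.text] = 1
            else:
                word_frequencies[word.text] += 1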

                
word_frequencies                

{'IAFS': 2,
 'III': 2,
 'brought': 2,
 'leaders': 2,
 'representatives': 1,
 '54': 1,
 'countries': 6,
 'dynamic': 1,
 'African': 4,
 'New': 1,
 'Delhi': 1,
 'time': 1,
 'landmark': 1,
 'summit': 1,
 'meeting': 1,
 'set': 1,
 'upscale': 1,
 'transform': 1,
 'multi': 2,
 'faceted': 2,
 'India': 8,
 'Africa': 7,
 'partnership': 4,
 'far': 1,
 'biggest': 1,
 'gathering': 1,
 'Indian': 1,
 'soil': 1,
 'showcased': 1,
 'multiple': 1,
 'dimensions': 1,
 'relationship': 1,
 'pivoted': 1,
 'trade': 2,
 'training': 2,
 'technology': 2,
 'capacity': 1,
 'building': 2,
 'development': 1,
 'clear': 1,
 'reaffirmation': 1,
 'unstinting': 1,
 'commitment': 1,
 'lasting': 1,
 'partnerships': 1,
 '41': 1,
 'represented': 1,
 'level': 1,
 'heads': 1,
 'state': 1,
 'government': 1,
 '\n\n': 3,
 'architecture': 1,
 'engagement': 1,
 'evolving': 1,
 'sides': 1,
 'relating': 1,
 'levels': 1,
 'bilateral': 1,
 'Regional': 1,
 'Economic': 1,
 'Communities': 1,
 'RECs': 1,
 'Union': 1,
 'vividly': 1,
 'increa
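
The `'\n\n': 3` entry in the output above appears because `punctuation` is a string, so `not in punctuation` is a substring test rather than a per-character check. A minimal sketch of the effect follows; `is_noise` is a hypothetical helper, not part of the notebook. With spaCy tokens, the `token.is_punct` and `token.is_space` attributes would be another way to filter these.

```python
from string import punctuation

# Mirrors the notebook: punctuation is a str, so `in` is a substring test.
punct_str = punctuation + "\n"
print("\n" in punct_str)     # a single newline is a substring
print("\n\n" in punct_str)   # a double newline is not
print("()" in punct_str)     # adjacent punctuation characters match as a substring

def is_noise(token_text):
    """Hypothetical helper: True if every character is punctuation or whitespace."""
    return all(ch in punctuation or ch.isspace() for ch in token_text)

print(is_noise("\n\n"), is_noise("-"), is_noise("India"))
```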

In [49]:
max_frequency=max(word_frequencies.values())

In [50]:
max_frequency

8

In [51]:
# divide every count by the maximum frequency so the values are normalized to the range (0, 1]

for word in word_frequencies.keys():
    word_frequencies[word] = word_frequencies[word]/max_frequency
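
Because the loop above overwrites `word_frequencies` in place, running the cell a second time would divide the already-normalized values by 8 again. One possible alternative, sketched here with a hypothetical `normalize` function and a few counts taken from the output above, returns a new dictionary and leaves the raw counts untouched.

```python
def normalize(counts):
    """Return a new dict with every count divided by the largest count."""
    if not counts:
        return {}
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}

# A few raw counts taken from the word_frequencies output above.
counts = {"India": 8, "Africa": 7, "countries": 6, "summit": 1}
weights = normalize(counts)
print(weights)
print(counts)  # the raw counts are unchanged
```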


In [52]:
# the normalized word frequencies
word_frequencies

{'IAFS': 0.25,
 'III': 0.25,
 'brought': 0.25,
 'leaders': 0.25,
 'representatives': 0.125,
 '54': 0.125,
 'countries': 0.75,
 'dynamic': 0.125,
 'African': 0.5,
 'New': 0.125,
 'Delhi': 0.125,
 'time': 0.125,
 'landmark': 0.125,
 'summit': 0.125,
 'meeting': 0.125,
 'set': 0.125,
 'upscale': 0.125,
 'transform': 0.125,
 'multi': 0.25,
 'faceted': 0.25,
 'India': 1.0,
 'Africa': 0.875,
 'partnership': 0.5,
 'far': 0.125,
 'biggest': 0.125,
 'gathering': 0.125,
 'Indian': 0.125,
 'soil': 0.125,
 'showcased': 0.125,
 'multiple': 0.125,
 'dimensions': 0.125,
 'relationship': 0.125,
 'pivoted': 0.125,
 'trade': 0.25,
 'training': 0.25,
 'technology': 0.25,
 'capacity': 0.125,
 'building': 0.25,
 'development': 0.125,
 'clear': 0.125,
 'reaffirmation': 0.125,
 'unstinting': 0.125,
 'commitment': 0.125,
 'lasting': 0.125,
 'partnerships': 0.125,
 '41': 0.125,
 'represented': 0.125,
 'level': 0.125,
 'heads': 0.125,
 'state': 0.125,
 'government': 0.125,
 '\n\n': 0.375,
 'architecture': 0.125

In [53]:
sentence_tokens=[sent for sent in doc.sents]
print(sentence_tokens)

[
(IAFS-III) brought the leaders and representatives of all 54 countries of the dynamic African countries to New Delhi for first time in a landmark summit meeting that is set to upscale and transform the multi-faceted India-Africa partnership., This was by far the biggest gathering of African leaders on the Indian soil and showcased multiple dimensions of the India-Africa relationship that is pivoted around trade, training, technology, capacity building and development partnership., In a clear reaffirmation of their unstinting commitment to building lasting partnerships with India, 41 countries were represented at the level of heads of state/government., 

The architecture of the India-Africa engagement is evolving, with the two sides relating at three levels, namely bilateral, Regional Economic Communities (RECs) and the African Union., 

The IAFS-III brought out vividly an increasing convergence of interests, values and a burgeoning web of win-win partnership between the two growth p

In [57]:
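# score each sentence as the sum of the normalized frequencies of its words
# note: word_frequencies keys keep their original case, so this lowercase lookup
# skips capitalized words such as 'India' and 'Africa'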

sentence_scores = {}
for sent in sentence_tokens:
    for word in sent:
        if word.text.lower() in word_frequencies.keys():
            if sent not in sentence_scores.keys():
                sentence_scores[sent] = word_frequencies[word.text.lower()]
            else:
                sentence_scores[sent] += word_frequencies[word.text.lower()]
                    
sentence_scores  

{
 (IAFS-III) brought the leaders and representatives of all 54 countries of the dynamic African countries to New Delhi for first time in a landmark summit meeting that is set to upscale and transform the multi-faceted India-Africa partnership.: 4.5,
 This was by far the biggest gathering of African leaders on the Indian soil and showcased multiple dimensions of the India-Africa relationship that is pivoted around trade, training, technology, capacity building and development partnership.: 3.125,
 In a clear reaffirmation of their unstinting commitment to building lasting partnerships with India, 41 countries were represented at the level of heads of state/government.: 2.5,
 
 
 The architecture of the India-Africa engagement is evolving, with the two sides relating at three levels, namely bilateral, Regional Economic Communities (RECs) and the African Union.: 1.25,
 
 
 The IAFS-III brought out vividly an increasing convergence of interests, values and a burgeoning web of win-win part
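
The scoring cell looks words up with `word.text.lower()`, while the frequency dictionary was built with `word.text` as the key, so capitalized words never match. The sketch below shows the effect on hypothetical pre-tokenized sentences; `build_weights`, `score`, and the one-word stop-word set are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical pre-tokenized sentences standing in for spaCy sentence spans.
sentences = [
    ["India", "hosted", "the", "summit"],
    ["The", "summit", "ended"],
]
stopwords = {"the"}  # tiny illustrative stop-word set

def build_weights(key):
    """Normalized frequencies, with `key` deciding how each word is stored."""
    counts = Counter(
        key(w) for s in sentences for w in s if w.lower() not in stopwords
    )
    top = max(counts.values())
    return {w: c / top for w, c in counts.items()}

def score(weights):
    """Score each sentence using a lowercase lookup, as the notebook does."""
    return [sum(weights.get(w.lower(), 0.0) for w in s) for s in sentences]

print(score(build_weights(str)))        # keys keep original case: 'India' is missed
print(score(build_weights(str.lower)))  # lowercase keys: every counted word matches
```

Storing lowercase keys when counting is one way to make the two steps agree; it also merges variants such as `New` and `new` into a single entry.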

#### Select the top 30% of sentences by score

In [58]:
from heapq import nlargest

In [59]:
# keep 30% of the sentences (0.3 = 30%); int() truncates toward zero
select_length = int(len(sentence_tokens)*0.3)
select_length

3
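
Since `int()` truncates, a text with three sentences or fewer would give a `select_length` of 0 and an empty summary. A possible guard, sketched as a hypothetical `select_length` function, keeps at least one sentence and never asks for more than exist.

```python
def select_length(n_sentences, ratio=0.3):
    """Number of sentences to keep: at least 1, never more than n_sentences."""
    if n_sentences <= 0:
        return 0
    return min(n_sentences, max(1, int(n_sentences * ratio)))

# Compare the plain truncation with the guarded version.
for n in (2, 3, 10, 12):
    print(n, int(n * 0.3), select_length(n))
```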

#### Select the 3 highest-scoring sentences, i.e. 30% of the sentences in the text

In [61]:
summary = nlargest(select_length,sentence_scores,key = sentence_scores.get)

In [62]:
# the selected sentences, ordered from highest score to lowest
summary

[
 (IAFS-III) brought the leaders and representatives of all 54 countries of the dynamic African countries to New Delhi for first time in a landmark summit meeting that is set to upscale and transform the multi-faceted India-Africa partnership.,
 This was by far the biggest gathering of African leaders on the Indian soil and showcased multiple dimensions of the India-Africa relationship that is pivoted around trade, training, technology, capacity building and development partnership.,
 The two narratives are now getting intertwined: be it trade, technology, training or reform of global governance, the multi-faceted ties between India and Africa are blossoming and finding new areas of convergence.]
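
`nlargest` iterates over the dictionary's keys and ranks them with `key=sentence_scores.get`, returning them from highest score to lowest rather than in document order. The sketch below uses hypothetical scores keyed by sentence position to show the ranking and one way to restore the original order.

```python
from heapq import nlargest

# Hypothetical scores keyed by sentence position in the document.
scores = {0: 4.5, 1: 3.125, 2: 2.5, 3: 1.25, 4: 3.5}

# Iterating a dict yields its keys; key=scores.get ranks them by score.
best = nlargest(3, scores, key=scores.get)
print(best)  # positions ordered by descending score

# Sorting the chosen positions restores the order in which the sentences appeared.
in_document_order = sorted(best)
print(in_document_order)
```

With spaCy spans as keys, sorting the selected spans by their `start` attribute would play the same role as sorting positions here.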

In [63]:
print(Text)


(IAFS-III) brought the leaders and representatives of all 54 countries of the dynamic African countries to New Delhi for first time in a landmark summit meeting that is set to upscale and transform the multi-faceted India-Africa partnership. This was by far the biggest gathering of African leaders on the Indian soil and showcased multiple dimensions of the India-Africa relationship that is pivoted around trade, training, technology, capacity building and development partnership. In a clear reaffirmation of their unstinting commitment to building lasting partnerships with India, 41 countries were represented at the level of heads of state/government.

The architecture of the India-Africa engagement is evolving, with the two sides relating at three levels, namely bilateral, Regional Economic Communities (RECs) and the African Union.

The IAFS-III brought out vividly an increasing convergence of interests, values and a burgeoning web of win-win partnership between the two growth poles of

In [64]:
# summary is a list of spaCy spans; take each sentence's text so the sentences can be joined into one string
final_summary= [word.text for word in summary]

In [65]:
summary=''.join(final_summary)

In [66]:
print(summary)


(IAFS-III) brought the leaders and representatives of all 54 countries of the dynamic African countries to New Delhi for first time in a landmark summit meeting that is set to upscale and transform the multi-faceted India-Africa partnership.This was by far the biggest gathering of African leaders on the Indian soil and showcased multiple dimensions of the India-Africa relationship that is pivoted around trade, training, technology, capacity building and development partnership.The two narratives are now getting intertwined: be it trade, technology, training or reform of global governance, the multi-faceted ties between India and Africa are blossoming and finding new areas of convergence.
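
In the printed summary the sentences run together ("partnership.This") because `''.join` inserts nothing between them, and the first sentence still carries its leading newline. A small sketch of an alternative, using hypothetical strings in place of the span texts:

```python
# Hypothetical sentence strings standing in for the .text of each selected span.
parts = ["\nFirst sentence.", "Second sentence.", "Third sentence."]

glued = "".join(parts)                       # what the notebook does
spaced = " ".join(p.strip() for p in parts)  # strip stray whitespace, join with spaces
print(repr(glued))
print(repr(spaced))
```

Joining with a space changes the character count slightly, so the `len(summary)` reported below applies to the unspaced version.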


In [67]:
# Comparing the original text with the summary shows how much shorter the summary is

In [68]:
len(Text)

1775

In [69]:
len(summary)

697
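
The two lengths above can be turned into a ratio directly. This sketch uses a hypothetical `compression_ratio` helper with the character counts reported by `len(Text)` and `len(summary)`.

```python
def compression_ratio(original_len, summary_len):
    """Summary length as a fraction of the original length."""
    if original_len <= 0:
        raise ValueError("original_len must be positive")
    return summary_len / original_len

# Character counts reported by the notebook: len(Text) and len(summary).
ratio = compression_ratio(1775, 697)
print(f"{ratio:.1%}")
```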

In [70]:
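# The summary is 697/1775, about 39% of the original length in characters.
# The 30% ratio applies to the number of sentences selected, not to the character count.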