## Simple Text Summary Model Using NLP
##### Ashraf Rahman

What is NLP?
- NLP, short for '***Natural Language Processing***', is a field of machine learning used to ***understand, analyse and generate*** natural human text.

What is Text Summary?
- Text Summary extracts the key insights and information from a document to generate a shortened summary.
- There are two types of summarisation in particular:
	- **Extractive**: Ranks the sentences by importance and picks the highest-ranked ones, unchanged, for the summary. (Used in this example)
	- **Abstractive**: Instead of copying sentences, it paraphrases the content to achieve a more human-like summary.

Use Cases for NLP? (To name a few)
- Text Summary.
- Chatbots for first-line support.
- Open/Closed domain Q&A.
- Automating and making sense of unstructured data.
- Speech-to-text for writing notes, leaving people free to focus on the meeting.

How is it done? <br />
1) Tokenisation. <br />
&nbsp;&nbsp; a) Word Tokenisation. <br />
&nbsp;&nbsp; b) Sentence Tokenisation. <br />
2) TF-IDF (Term Frequency-Inverse Document Frequency): Identify the importance of words and sentences. This example builds only the term-frequency table. <br />
3) Normalisation. <br />
4) Summarisation. <br />

> **Note**: 
Tokenisation: Breaking the raw text into smaller chunks called tokens, such as words or sentences. Tokens are the units the model works with, and analysing their sequence helps it interpret the meaning of the text.
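
To make the idea concrete, here is a minimal sketch of word and sentence tokenisation using only regular expressions. It is not how spaCy tokenises text; the helper names `word_tokenize` and `sentence_tokenize` and the sample string are assumptions made for illustration.

```python
import re

def word_tokenize(text):
    # Runs of letters, digits or apostrophes become word tokens;
    # any other non-space character becomes its own token.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)

def sentence_tokenize(text):
    # Split on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

sample = "AI is everywhere. Do you use it?"
words = word_tokenize(sample)
sentences = sentence_tokenize(sample)
print(words)
print(sentences)
```

A rule this simple breaks on abbreviations and URLs, which is one reason the notebook relies on a trained spaCy pipeline instead.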

In [1]:
# Article I will be using for testing

text = """
What is Artifical intelligence? (https://www.bbc.co.uk/newsround/49274918)
Artificial intelligence - or AI for short - is technology that enables a computer to think or act in a more 'human' way. It does this by taking in information from its surroundings, and deciding its response based on what it learns or senses.
It affects the the way we live, work and have fun in our spare time - and sometimes without us even realising.
AI is becoming a bigger part of our lives, as the technology behind it becomes more and more advanced. Machines are improving their ability to 'learn' from mistakes and change how they approach a task the next time they try it.
Some researchers are even trying to teach robots about feelings and emotions.
You might not realise some of the devices and daily activities which rely on AI technology - phones, video games and going shopping, for example.
More technology
Why did this photo make history 60 years ago?
How robots and drones are changing deliveries
Flyboard inventor crosses English Channel
Why Instagram is going to hide your 'likes'
Some people think that the technology is a really good idea, while others aren't so sure.
Just this month, it was announced that the NHS in England is setting up a special AI laboratory to boost the role of AI within the health service.
Announcing that the government will spend £250 million on this, Health Secretary Matt Hancock said the technology had "enormous power" to improve care, save lives and ensure doctors had more time to spend with patients.
Read on to find out more about AI and let us know what you think about it in the comments below.
What does AI do?
AI can be used for many different tasks and activities.
Personal electronic devices or accounts (like our phones or social media) use AI to learn more about us and the things that we like. One example of this is entertainment services like Netflix which use the technology to understand what we like to watch and recommend other shows based on what they learn.
It can make video games more challenging by studying how a player behaves, while home assistants like Alexa and Siri also rely on it.
It has been announced that NHS England will spend millions on AI in order to improve patient care and research
AI can be used in healthcare, not only for research purposes, but also to take better care of patients through improved diagnosis and monitoring.
It also has uses within transport too. For example, driverless cars are an example of AI tech in action, while it is used extensively in the aviation industry (for example, in flight simulators).
Farmers can use AI to monitor crops and conditions, and to make predictions, which will help them to be more efficient.
You only have to look at what some of these AI robots can do to see just how advanced the technology is and imagine many other jobs for which it could be used.
Where did AI come from?
The term 'artificial intelligence' was first used in 1956.
In the 1960s, scientists were teaching computers how to mimic - or copy - human decision-making.
This developed into research around 'machine learning', in which robots were taught to learn for themselves and remember their mistakes, instead of simply copying. Algorithms play a big part in machine learning as they help computers and robots to know what to do.
What is an algorithm?
An algorithm is basically a set of rules or instructions which a computer can use to help solve a problem or come to a decision about what to do next.
From here, the research has continued to develop, with scientists now exploring 'machine perception'. This involves giving machines and robots special sensors to help them to see, hear, feel and taste things like human do - and adjust how they behave as a result of what they sense.
The idea is that the more this technology develops, the more robots will be able to 'understand' and read situations, and determine their response as a result of the information that they pick up.
Why are people worried about AI?
Many people have concerns about AI technology and teaching robots too much.
Famous scientist Sir Stephen Hawking spoke out about it in the past. He said that although the AI we've made so far has been very useful and helpful, he worried that if we teach robots too much, they could become smarter than humans and potentially cause problems.
Sir Stephen Hawking spoke out about AI and said that he had concerns that the technology could cause problems in the future
People have expressed concerns about privacy too. For example, critics think that it could become a problem if AI learns too much about what we like to look at online and encourages us to spend too much time on electronic devices.
Another concern about AI is that if robots and computers become very intelligent, they could learn to do jobs which people would usually have to do, which could leave some people unemployed.
Other people disagree, saying that the technology will never be as advanced as human thoughts and actions, so there is not a danger of robots 'taking over' in the way that some critics have described.
What do you think about AI? Do you think that it is a good thing or a bad thing? Let us know in the comments below.
"""

In [2]:
import spacy
from spacy.lang.en import English
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

>Note:
Stop words: very common words (such as 'the', 'is' and 'and') that carry little meaning on their own.

We remove stop words and punctuation because they appear in almost every sentence and so do not help to identify the important ones.
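
As a small illustration of the filtering step, the sketch below drops stop words and punctuation from a list of tokens. The names `demo_stop_words` and `demo_tokens` are hypothetical, and the stop-word set is a tiny stand-in for spaCy's much larger `STOP_WORDS`.

```python
from string import punctuation

# Hypothetical, tiny stop-word list for illustration only.
demo_stop_words = {"is", "a", "the", "of", "and", "to"}

demo_tokens = ["AI", "is", "a", "growing", "field", "of", "research", "."]

# Keep a token only if it is neither a stop word (case-insensitive)
# nor a punctuation character.
kept = [
    t for t in demo_tokens
    if t.lower() not in demo_stop_words and t not in punctuation
]
print(kept)
```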

In [3]:
stopwords = list(STOP_WORDS)
print(stopwords)

['’s', 'nor', 'whenever', 'much', 'least', 'about', 'before', 'throughout', 'may', 'beside', 'had', 'perhaps', 'four', 'through', 'yourselves', 'everything', '‘ll', 'rather', 'those', 'into', 'bottom', 'take', 'sometime', 'herein', 'amount', 'n‘t', 'cannot', 'be', 'hence', 'indeed', 'twelve', 'whose', 'do', 'among', 'hereupon', 'mostly', 'against', 'two', 'various', 'around', 'thereafter', 'whole', 'still', 'either', 'no', 'besides', 'become', 'somehow', 'ca', 'us', 'due', 'often', 'mine', 'will', 'used', 'did', 'first', 'made', 'never', 'on', 'because', 'except', 'just', 'themselves', 'anyway', 'n’t', 'herself', 're', 'me', 'thereupon', 'how', 'from', 'part', 'please', 'put', 'always', 'became', 'across', 'wherever', 'whether', '‘s', 'my', 'show', 'something', 'also', 'at', 'one', 'you', 'both', 'these', '‘d', 'could', 'it', 'meanwhile', 'say', 'under', 'using', 'that', 'if', 'would', '‘re', 'off', "'m", 'onto', 'call', 'until', 'his', 'once', 'whither', 'and', 'thru', 'hundred', 'abo

In [4]:
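# Newline-only tokens ("\n", "\n\n") should be filtered out as well.
# `in` on a string is a substring check, so appending newlines covers them.
punctuation += "\n"
punctuation += "\n\n"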

punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n\n\n'

#### Loading the model

In [5]:
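# The 'en' shortcut was removed in spaCy v3; load the model by its package name.
# Install it first with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')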

#### Tokenisation

In [6]:
doc = nlp(text) # Loading the text into the nlp model 
tokens = [token.text for token in doc] # Extracting the text to a list
print(tokens)

['\n', 'What', 'is', 'Artifical', 'intelligence', '?', '(', 'https://www.bbc.co.uk/newsround/49274918', ')', '\n', 'Artificial', 'intelligence', '-', 'or', 'AI', 'for', 'short', '-', 'is', 'technology', 'that', 'enables', 'a', 'computer', 'to', 'think', 'or', 'act', 'in', 'a', 'more', "'", 'human', "'", 'way', '.', 'It', 'does', 'this', 'by', 'taking', 'in', 'information', 'from', 'its', 'surroundings', ',', 'and', 'deciding', 'its', 'response', 'based', 'on', 'what', 'it', 'learns', 'or', 'senses', '.', '\n', 'It', 'affects', 'the', 'the', 'way', 'we', 'live', ',', 'work', 'and', 'have', 'fun', 'in', 'our', 'spare', 'time', '-', 'and', 'sometimes', 'without', 'us', 'even', 'realising', '.', '\n', 'AI', 'is', 'becoming', 'a', 'bigger', 'part', 'of', 'our', 'lives', ',', 'as', 'the', 'technology', 'behind', 'it', 'becomes', 'more', 'and', 'more', 'advanced', '.', 'Machines', 'are', 'improving', 'their', 'ability', 'to', "'", 'learn', "'", 'from', 'mistakes', 'and', 'change', 'how', 'the

#### TF-IDF (Generating a frequency table)

In [7]:
word_frequencies = {}
for word in doc:
    # Skip stop words and punctuation (the check is case-insensitive)
    if word.text.lower() not in stopwords and word.text.lower() not in punctuation:
        # Count each remaining word; keys keep their original case
        word_frequencies[word.text] = word_frequencies.get(word.text, 0) + 1

word_frequencies

{'Artifical': 1,
 'intelligence': 3,
 'https://www.bbc.co.uk/newsround/49274918': 1,
 'Artificial': 1,
 'AI': 22,
 'short': 1,
 'technology': 12,
 'enables': 1,
 'computer': 2,
 'think': 6,
 'act': 1,
 'human': 4,
 'way': 3,
 'taking': 2,
 'information': 2,
 'surroundings': 1,
 'deciding': 1,
 'response': 2,
 'based': 2,
 'learns': 2,
 'senses': 1,
 'affects': 1,
 'live': 1,
 'work': 1,
 'fun': 1,
 'spare': 1,
 'time': 4,
 'realising': 1,
 'bigger': 1,
 'lives': 2,
 'advanced': 3,
 'Machines': 1,
 'improving': 1,
 'ability': 1,
 'learn': 5,
 'mistakes': 2,
 'change': 1,
 'approach': 1,
 'task': 1,
 'try': 1,
 'researchers': 1,
 'trying': 1,
 'teach': 2,
 'robots': 11,
 'feelings': 1,
 'emotions': 1,
 'realise': 1,
 'devices': 3,
 'daily': 1,
 'activities': 2,
 'rely': 2,
 'phones': 2,
 'video': 2,
 'games': 2,
 'going': 2,
 'shopping': 1,
 'example': 6,
 'photo': 1,
 'history': 1,
 '60': 1,
 'years': 1,
 'ago': 1,
 'drones': 1,
 'changing': 1,
 'deliveries': 1,
 'Flyboard': 1,
 'invent
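
The table above holds raw term counts only. For comparison, here is a minimal sketch of a full TF-IDF weight, assuming each sentence is treated as its own "document" and using `log(N / df)` as the inverse document frequency, which is one common variant among several. The toy corpus `docs` and the helper `tf_idf` are hypothetical and not part of the notebook.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each sentence is treated as one "document".
docs = [
    ["ai", "helps", "doctors"],
    ["ai", "helps", "farmers"],
    ["robots", "learn"],
]

n_docs = len(docs)
# Document frequency: the number of documents each term appears in.
df = Counter(term for doc in docs for term in set(doc))

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)      # term frequency within this document
    idf = math.log(n_docs / df[term])    # rarer terms get a larger weight
    return tf * idf

print(tf_idf("ai", docs[0]), tf_idf("doctors", docs[0]))
```

Here "doctors" outweighs "ai" in the first sentence because it appears in fewer documents, which is the effect a plain frequency count cannot capture.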

In [8]:
max_frequency = max(word_frequencies.values())
max_frequency

22

#### We normalise the counts by dividing each one by the highest frequency, so every value falls in the range 0 to 1. (Although it is not required in this example, normalisation is good practice for more advanced models, e.g. Abstractive Text Summary.)

In [9]:
for word in word_frequencies.keys():
    word_frequencies[word] /= max_frequency
    
word_frequencies

{'Artifical': 0.045454545454545456,
 'intelligence': 0.13636363636363635,
 'https://www.bbc.co.uk/newsround/49274918': 0.045454545454545456,
 'Artificial': 0.045454545454545456,
 'AI': 1.0,
 'short': 0.045454545454545456,
 'technology': 0.5454545454545454,
 'enables': 0.045454545454545456,
 'computer': 0.09090909090909091,
 'think': 0.2727272727272727,
 'act': 0.045454545454545456,
 'human': 0.18181818181818182,
 'way': 0.13636363636363635,
 'taking': 0.09090909090909091,
 'information': 0.09090909090909091,
 'surroundings': 0.045454545454545456,
 'deciding': 0.045454545454545456,
 'response': 0.09090909090909091,
 'based': 0.09090909090909091,
 'learns': 0.09090909090909091,
 'senses': 0.045454545454545456,
 'affects': 0.045454545454545456,
 'live': 0.045454545454545456,
 'work': 0.045454545454545456,
 'fun': 0.045454545454545456,
 'spare': 0.045454545454545456,
 'time': 0.18181818181818182,
 'realising': 0.045454545454545456,
 'bigger': 0.045454545454545456,
 'lives': 0.0909090909090
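
The normalisation is a single division by the maximum count. The sketch below repeats it on four counts taken from the frequency table above; the names `demo_counts`, `peak` and `normalised` are introduced here for illustration.

```python
# Four counts copied from the frequency table above.
demo_counts = {"AI": 22, "technology": 12, "robots": 11, "short": 1}

peak = max(demo_counts.values())  # 22, the count of the most frequent word
normalised = {word: count / peak for word, count in demo_counts.items()}
print(normalised)
```

Building a new dictionary, rather than updating the existing one in place, keeps the raw counts available for later inspection.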

#### Sentence Tokenisation

In [10]:
sentence_token = [token for token in doc.sents]
sentence_token

[
 What is Artifical intelligence?,
 (https://www.bbc.co.uk/newsround/49274918),
 Artificial intelligence - or AI for short - is technology that enables a computer to think or act in a more 'human' way.,
 It does this by taking in information from its surroundings, and deciding its response based on what it learns or senses.,
 It affects the the way we live, work and have fun in our spare time - and sometimes without us even realising.,
 AI is becoming a bigger part of our lives, as the technology behind it becomes more and more advanced.,
 Machines are improving their ability to 'learn' from mistakes and change how they approach a task the next time they try it.,
 Some researchers are even trying to teach robots about feelings and emotions.,
 You might not realise some of the devices and daily activities which rely on AI technology - phones, video games and going shopping, for example.,
 More technology,
 Why did this photo make history 60 years ago?,
 How robots and drones are changi

#### Next, we score each sentence by summing the normalised frequencies of its words, which indicates the importance of each sentence.

In [11]:
sentence_score = {}
for sent in sentence_token:
    for word in sent:
        # Note: the lookup is lowercased, but word_frequencies keeps the original
        # case, so capitalised keys such as 'AI' never match here.
        if word.text.lower() in word_frequencies:
            # Add this word's weight to the running total for the sentence
            sentence_score[sent] = sentence_score.get(sent, 0) + word_frequencies[word.text.lower()]

sentence_score

{
 What is Artifical intelligence?: 0.13636363636363635,
 (https://www.bbc.co.uk/newsround/49274918): 0.045454545454545456,
 Artificial intelligence - or AI for short - is technology that enables a computer to think or act in a more 'human' way.: 1.5454545454545454,
 It does this by taking in information from its surroundings, and deciding its response based on what it learns or senses.: 0.5909090909090909,
 It affects the the way we live, work and have fun in our spare time - and sometimes without us even realising.: 0.5909090909090909,
 AI is becoming a bigger part of our lives, as the technology behind it becomes more and more advanced.: 0.8181818181818181,
 Machines are improving their ability to 'learn' from mistakes and change how they approach a task the next time they try it.: 0.8181818181818181,
 Some researchers are even trying to teach robots about feelings and emotions.: 0.7727272727272727,
 You might not realise some of the devices and daily activities which rely on AI tec
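
The scoring rule can be sketched on plain strings as follows. This version assumes the weights are keyed in lowercase so that the lowercased lookup matches every word regardless of capitalisation, which differs from the notebook's table. The names `demo_weights`, `demo_sentences` and `demo_scores` and all values are hypothetical.

```python
# Hypothetical normalised weights, keyed in lowercase.
demo_weights = {"ai": 1.0, "technology": 0.5, "robots": 0.5}

demo_sentences = [
    "AI is a technology.",
    "Robots use AI technology to learn.",
    "More news below.",
]

demo_scores = {}
for sentence in demo_sentences:
    # Naive tokenisation: drop the final full stop and split on spaces.
    for token in sentence.rstrip(".").split():
        weight = demo_weights.get(token.lower())
        if weight is not None:
            # Sum the weights of all known words in the sentence.
            demo_scores[sentence] = demo_scores.get(sentence, 0.0) + weight
print(demo_scores)
```

Because the score is a sum, a sentence with no weighted words never enters the dictionary, and longer sentences tend to score higher.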

#### The next goal is to reduce the text from 100% of its sentences down to the 20% with the highest scores

In [12]:
from heapq import nlargest

In [13]:
select_length = int(len(sentence_token) * 0.2)
print("Maximum number of sentences needed to achieve an 80% reduction:", select_length)

Maximum number of sentences needed to achieve an 80% reduction: 10


In [14]:
summary = nlargest(select_length, sentence_score, key=sentence_score.get)
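
For readers unfamiliar with `heapq.nlargest`, this small sketch shows what the call does when given a dictionary and `key=dict.get`: it iterates over the keys and ranks them by their values. The dictionary `demo_sentence_scores` and its labels are hypothetical.

```python
from heapq import nlargest

# Hypothetical scores for five sentences, labelled s1 to s5.
demo_sentence_scores = {"s1": 0.4, "s2": 1.5, "s3": 0.9, "s4": 0.1, "s5": 1.1}

keep = int(len(demo_sentence_scores) * 0.4)  # keep 40% of 5 sentences
top = nlargest(keep, demo_sentence_scores, key=demo_sentence_scores.get)
print(top)
```

The result is ordered by score, highest first, not by position in the original text, so the summary sentences may appear out of their original order.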

#### Finally, we have the list of important sentences, so all that remains is to join them together

In [15]:
summary

[One example of this is entertainment services like Netflix which use the technology to understand what we like to watch and recommend other shows based on what they learn.,
 Other people disagree, saying that the technology will never be as advanced as human thoughts and actions, so there is not a danger of robots 'taking over' in the way that some critics have described.,
 Announcing that the government will spend £250 million on this, Health Secretary Matt Hancock said the technology had "enormous power" to improve care, save lives and ensure doctors had more time to spend with patients.,
 For example, critics think that it could become a problem if AI learns too much about what we like to look at online and encourages us to spend too much time on electronic devices.,
 This involves giving machines and robots special sensors to help them to see, hear, feel and taste things like human do - and adjust how they behave as a result of what they sense.,
 The idea is that the more this tec

In [16]:
final_summary = [sent.text for sent in summary]  # each item in summary is a sentence
summary = "".join(final_summary)

In [17]:
print(summary)

One example of this is entertainment services like Netflix which use the technology to understand what we like to watch and recommend other shows based on what they learn.
Other people disagree, saying that the technology will never be as advanced as human thoughts and actions, so there is not a danger of robots 'taking over' in the way that some critics have described.
Announcing that the government will spend £250 million on this, Health Secretary Matt Hancock said the technology had "enormous power" to improve care, save lives and ensure doctors had more time to spend with patients.
For example, critics think that it could become a problem if AI learns too much about what we like to look at online and encourages us to spend too much time on electronic devices.
This involves giving machines and robots special sensors to help them to see, hear, feel and taste things like human do - and adjust how they behave as a result of what they sense.
The idea is that the more this technology dev

In [18]:
print("Length of original text:", len(text))
print("Length of summary:", len(summary))

Length of original text: 5210
Length of summary: 1741
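
The two lengths above can be turned into a compression ratio. The sketch below only reuses the printed numbers; the variable names are introduced here for illustration.

```python
original_length = 5210  # characters in the original text (printed above)
summary_length = 1741   # characters in the summary (printed above)

ratio = summary_length / original_length
print(f"The summary is {ratio:.1%} of the original length")
```

Keeping 20% of the sentences leaves roughly a third of the characters. A likely reason is that the score is a sum over words, so longer sentences tend to rank higher.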
