# Text summariser
This notebook uses natural language processing (NLP) techniques to create extractive summaries of input texts.

In [1]:
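# Requires the NLTK 'stopwords' corpus and the Punkt tokenizer data
# ('punkt', or 'punkt_tab' on newer NLTK releases), fetched once with nltk.download(...)
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize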

* A corpus is a collection of texts.
* Tokenization is the process of splitting a section of text into tokens, such as words (`word_tokenize`) or sentences (`sent_tokenize`).
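
To make the idea of tokens concrete, here is a minimal sketch of word and sentence tokenization using only regular expressions. The helper names `simple_sent_tokenize` and `simple_word_tokenize` are hypothetical stand-ins, not part of NLTK, and they are much cruder than `sent_tokenize` and `word_tokenize` (for example, they do not handle abbreviations).

```python
import re

def simple_sent_tokenize(text):
    """Split after '.', '!' or '?' followed by whitespace (rough stand-in for sent_tokenize)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def simple_word_tokenize(text):
    """Return runs of letters, digits and apostrophes (rough stand-in for word_tokenize)."""
    return re.findall(r"[A-Za-z0-9']+", text)

sample = "Up above the world so high. Like a diamond in the sky!"
print(simple_sent_tokenize(sample))  # two sentence tokens
print(simple_word_tokenize(sample))  # twelve word tokens, punctuation dropped
```

Unlike this sketch, NLTK's `word_tokenize` keeps punctuation marks as separate tokens, which matters when building a frequency table from its output.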

## Procedure:
* Remove stop words (unnecessary words that do not add meaning to a text)
* Create a frequency table of the remaining words
* Assign a score to each sentence based on the frequency table generated in the previous step. Words in the frequency table can be given different weightings, but this is not required for this task.
* Compare the scores of sentences to a threshold to determine whether they are important to the summary (the mean score is used here). Rank the eligible sentences above the threshold by score.
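
The four steps above can be sketched in a self-contained way, without NLTK. This is an illustration under stated assumptions rather than the notebook's implementation: `STOP_WORDS` is a deliberately tiny hypothetical list, `tokenize` and `summarise` are hypothetical helper names, sentences are split with a simple regular expression, and the threshold is the untruncated mean score.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (illustration only)
STOP_WORDS = {"the", "a", "is", "in", "on", "and", "it"}

def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def summarise(text, imp_factor=1.0):
    """Return sentences scoring above imp_factor * mean score, highest first."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Steps 1 and 2: frequency table of the words that are not stop words
    freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    # Step 3: a sentence's score is the sum of the frequencies of its words
    scores = {s: sum(freq[w] for w in tokenize(s)) for s in sentences}
    if not scores:
        return []
    # Step 4: keep sentences above the (scaled) mean score, ranked by score
    threshold = sum(scores.values()) / len(scores)
    eligible = [s for s in scores if scores[s] > imp_factor * threshold]
    return sorted(eligible, key=scores.get, reverse=True)

sample = "The cat sat on the mat. The cat is black. Dogs bark."
print(summarise(sample))                  # ['The cat sat on the mat.']
print(summarise(sample, imp_factor=0.9))  # ['The cat sat on the mat.', 'The cat is black.']
```

In the sample, the sentence scores are 4, 3 and 2, so the mean is 3. Lowering `imp_factor` below 1 admits the second sentence as well.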

In [3]:
# Accept the text to summarise
text = input("Type text here: ")
# Load the set of English stop words
stopWords = set(stopwords.words('english'))
# Split the text into word tokens and sentences
words = word_tokenize(text)
sentences = sent_tokenize(text)
# Frequency of each non-stop word, and the score (weighting) of each sentence
freq = dict()
weightings = dict()
# Tally up the frequencies of words, skipping stop words
for word in words:
    word = word.lower()
    if word in stopWords:
        continue
    freq[word] = freq.get(word, 0) + 1
# Score each sentence as the sum of the frequencies of the table words it contains.
# Note: `in` is a substring test on the lowercased sentence, so short tokens
# (including punctuation tokens) can also match inside longer words.
for sentence in sentences:
    for word, f in freq.items():
        if word in sentence.lower():
            weightings[sentence] = weightings.get(sentence, 0) + f
# Threshold: the mean sentence score, truncated to an integer (0 if there are no scores)
threshold = int(sum(weightings.values()) / len(weightings)) if weightings else 0  # HYPERPARAMETER
# Build the summary from sentences scoring above imp_factor * threshold, in document order
imp_factor = 1.2  # HYPERPARAMETER
summary = ' '.join(
    sentence for sentence in sentences
    if sentence in weightings and weightings[sentence] > imp_factor * threshold
)
print(summary)

Type text here: Twinkle, twinkle, little star, How I wonder what you are! Up above the world so high, Like a diamond in the sky.  When the blazing sun is gone, When he nothing shines upon, Then you show your little light, Twinkle, twinkle, all the night.  Then the trav'ller in the dark, Thanks you for your tiny spark, He could not see which way to go, If you did not twinkle so.  In the dark blue sky you keep, And often thro' my curtains peep, For you never shut your eye, Till the sun is in the sky.  'Tis your bright and tiny spark, Lights the trav'ller in the dark, Tho' I know not what you are, Twinkle, twinkle, little star.
'Tis your bright and tiny spark, Lights the trav'ller in the dark, Tho' I know not what you are, Twinkle, twinkle, little star.
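
The scoring loop in the cell above checks `word in sentence.lower()`, which is a substring test, so a table word such as "tar" would also count towards a sentence that only contains "star". A minimal sketch of the difference follows, assuming a regex tokenizer in place of `word_tokenize`. The function names and the small `freq` table are hypothetical and chosen only to show the effect.

```python
import re

def substring_score(sentence, freq):
    """Score as in the cell above: a table word counts if it occurs anywhere in the sentence."""
    return sum(f for word, f in freq.items() if word in sentence.lower())

def token_score(sentence, freq):
    """Alternative: a table word counts only if it occurs as a whole token."""
    tokens = set(re.findall(r"[a-z0-9']+", sentence.lower()))
    return sum(f for word, f in freq.items() if word in tokens)

# Hypothetical frequency table for illustration
freq = {"star": 2, "tar": 1, "sky": 1}
sentence = "Twinkle, twinkle, little star."

print(substring_score(sentence, freq))  # 3: "tar" matches inside "star"
print(token_score(sentence, freq))      # 2: only the whole token "star" matches
```

Whole-token matching changes the sentence scores, so swapping it into the cell above may also change which sentences pass the threshold.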


## Limitations
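* No contextual summarisation: sentences are scored purely on word frequency, so the context and meaning of the text are not taken into account.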
* Sentiment analysis cannot be performed on individual sentences, so sentences of symbolic importance may be left out even when they are essential to the summary.
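* Upper limit on space complexity: the frequency table and sentence weightings are held in memory and grow with the length of the input text.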