**Importing required libraries**

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from spacy.lang.en import English
import numpy as np

**Load spacy model for sentence tokenization**

In [None]:
nlp = English()
# spaCy v2 API; on spaCy v3 the equivalent call is nlp.add_pipe('sentencizer')
nlp.add_pipe(nlp.create_pipe('sentencizer'))

In [None]:
text=""" To set yourself up for success, try to keep things simple. Eating a healthier diet doesn’t have to be complicated. Instead of being overly concerned with counting calories, for example, think of your diet in terms of color, variety, and freshness. Focus on avoiding packaged and processed foods and opting for more fresh ingredients whenever possible.
Prepare more of your own meals. Cooking more meals at home can help you take charge of what you’re eating and better monitor exactly what goes into your food. You’ll eat fewer calories and avoid the chemical additives, added sugar, and unhealthy fats of packaged and takeout foods that can leave you feeling tired, bloated, and irritable, and exacerbate symptoms of depression, stress, and anxiety.
Make the right changes. When cutting back on unhealthy foods in your diet, it’s important to replace them with healthy alternatives. Replacing dangerous trans fats with healthy fats (such as switching fried chicken for grilled salmon) will make a positive difference to your health. Switching animal fats for refined carbohydrates, though (such as switching your breakfast bacon for a donut), won’t lower your risk for heart disease or improve your mood.
Read the labels. It’s important to be aware of what’s in your food as manufacturers often hide large amounts of sugar or unhealthy fats in packaged food, even food claiming to be healthy.Focus on how you feel after eating. This will help foster healthy new habits and tastes. The healthier the food you eat, the better you’ll feel after a meal. The more junk food you eat, the more likely you are to feel uncomfortable, nauseous, or drained of energy """

**Create spacy document for further sentence level tokenization**

In [None]:
doc = nlp(text.replace("\n", ""))
# Span.text works on both spaCy v2 and v3 (Span.string was removed in v3)
sentences = [sent.text.strip() for sent in doc.sents]

**Peeking into our tokenized sentences**

In [None]:
print("Sentences are: \n", sentences)

Sentences are: 
 ['To set yourself up for success, try to keep things simple.', 'Eating a healthier diet doesn’t have to be complicated.', 'Instead of being overly concerned with counting calories, for example, think of your diet in terms of color, variety, and freshness.', 'Focus on avoiding packaged and processed foods and opting for more fresh ingredients whenever possible.', 'Prepare more of your own meals.', 'Cooking more meals at home can help you take charge of what you’re eating and better monitor exactly what goes into your food.', 'You’ll eat fewer calories and avoid the chemical additives, added sugar, and unhealthy fats of packaged and takeout foods that can leave you feeling tired, bloated, and irritable, and exacerbate symptoms of depression, stress, and anxiety.', 'Make the right changes.', 'When cutting back on unhealthy foods in your diet, it’s important to replace them with healthy alternatives.', 'Replacing dangerous trans fats with healthy fats (such as switching fr

**Creating sentence organizer**

In [None]:
# Map each sentence to its position so the top-scoring sentences can later be
# put back in their original order
sentence_organizer = {k:v for v,k in enumerate(sentences)}

**Peeking into our sentence organizer**

In [None]:
print("Our sentence organizer: \n", sentence_organizer)

Our sentence organizer: 
 {'To set yourself up for success, try to keep things simple.': 0, 'Eating a healthier diet doesn’t have to be complicated.': 1, 'Instead of being overly concerned with counting calories, for example, think of your diet in terms of color, variety, and freshness.': 2, 'Focus on avoiding packaged and processed foods and opting for more fresh ingredients whenever possible.': 3, 'Prepare more of your own meals.': 4, 'Cooking more meals at home can help you take charge of what you’re eating and better monitor exactly what goes into your food.': 5, 'You’ll eat fewer calories and avoid the chemical additives, added sugar, and unhealthy fats of packaged and takeout foods that can leave you feeling tired, bloated, and irritable, and exacerbate symptoms of depression, stress, and anxiety.': 6, 'Make the right changes.': 7, 'When cutting back on unhealthy foods in your diet, it’s important to replace them with healthy alternatives.': 8, 'Replacing dangerous trans fats wit
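One thing to keep in mind: `sentence_organizer` is keyed by the sentence text, so if the same sentence appeared twice in the input, both occurrences would share one entry and only the last position would survive. A minimal sketch with made-up sentences (`toy_sentences`, `toy_organizer` and `indexed_sentences` are illustrative names, not part of the notebook):

```python
# Hypothetical sentences, with one repeated, to show how a text-keyed dict behaves.
toy_sentences = ["Eat well.", "Sleep well.", "Eat well."]

# Same construction as sentence_organizer: a later duplicate overwrites the earlier index.
toy_organizer = {k: v for v, k in enumerate(toy_sentences)}
print(toy_organizer)  # {'Eat well.': 2, 'Sleep well.': 1}

# Keeping (index, sentence) pairs preserves every position, duplicates included.
indexed_sentences = list(enumerate(toy_sentences))
print(indexed_sentences)  # [(0, 'Eat well.'), (1, 'Sleep well.'), (2, 'Eat well.')]
```

For the sample text used here every sentence is unique, so the dictionary works as intended.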

**Creating TF-IDF model**

In [None]:
# Let's now create a TF-IDF (Term Frequency-Inverse Document Frequency) model
tf_idf_vectorizer = TfidfVectorizer(min_df=2, max_features=None,
                                    strip_accents='unicode',
                                    analyzer='word',
                                    token_pattern=r'\w{1,}',
                                    ngram_range=(1, 3),
                                    # Booleans, because recent scikit-learn rejects 1/0 here
                                    use_idf=True, smooth_idf=True,
                                    sublinear_tf=True,
                                    stop_words='english')

In [None]:
# Passing our sentences to the TF-IDF vectorizer, treating each sentence as one document
tf_idf_vectorizer.fit(sentences)

TfidfVectorizer(min_df=2, ngram_range=(1, 3), stop_words='english',
                strip_accents='unicode', sublinear_tf=True,
                token_pattern='\\w{1,}')
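To make the settings above more concrete, here is a rough pure-Python sketch of the weighting that `sublinear_tf`, `smooth_idf` and the default `norm='l2'` correspond to. It is an illustration only: the toy documents and the helpers `document_frequency` and `tf_idf_row` are hypothetical, and the sketch leaves out `min_df`, stop-word removal and n-grams.

```python
import math

# Toy "documents" (one per sentence), already tokenized; purely illustrative.
docs = [
    ["healthy", "food", "food"],
    ["healthy", "fats"],
    ["fresh", "food"],
]
n_docs = len(docs)

def document_frequency(term):
    # Number of documents that contain the term at least once.
    return sum(term in doc for doc in docs)

def tf_idf_row(doc):
    weights = {}
    for term in set(doc):
        # sublinear_tf=True: raw count replaced by 1 + ln(count)
        tf = 1.0 + math.log(doc.count(term))
        # smooth_idf=True: ln((1 + n) / (1 + df)) + 1
        idf = math.log((1 + n_docs) / (1 + document_frequency(term))) + 1.0
        weights[term] = tf * idf
    # norm='l2' (the default): scale the row to unit Euclidean length
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

for doc in docs:
    print(tf_idf_row(doc))
```

Terms that occur in fewer documents, such as `fats` here, end up with a larger weight than terms shared across documents, such as `healthy`.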

In [None]:
# Transforming our sentences to TF-IDF vectors
sentence_vectors = tf_idf_vectorizer.transform(sentences)

**Performing sentence scoring**

In [None]:
# Score each sentence by summing the TF-IDF weights in its row
sentence_scores = np.array(sentence_vectors.sum(axis=1)).ravel()

# Sanity check: one score per sentence
print(len(sentences) == len(sentence_scores))

True
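Because each TF-IDF row is L2-normalized by default, summing a row gives a value between 1 and the square root of the number of non-zero terms, so sentences that keep more vocabulary terms tend to score higher. A small sketch with hypothetical rows (`l2_normalize`, `short_row` and `long_row` are made up for illustration):

```python
import math

def l2_normalize(weights):
    # Scale a row so that its Euclidean length is 1, as norm='l2' does.
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

# Hypothetical rows: equal raw weights, 2 kept terms versus 8 kept terms.
short_row = l2_normalize([1.0, 1.0])
long_row = l2_normalize([1.0] * 8)

short_score = sum(short_row)  # equals sqrt(2)
long_score = sum(long_row)    # equals sqrt(8)
print(short_score, long_score)

# One possible variant (not used in this notebook): average over the kept terms.
print(short_score / len(short_row), long_score / len(long_row))
```

This is one reason the highest-scoring sentences in an approach like this are often the longer ones.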


In [None]:
# Getting top-n sentences
N = 3
top_n_sentences = [sentences[ind] for ind in np.argsort(sentence_scores, axis=0)[::-1][:N]]
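The selection step can also be carried out on indices alone, which makes it easy to see what "top-n" and "original order" mean. A minimal sketch with hypothetical scores (`toy_scores` and `top_indices` are illustrative names):

```python
# Hypothetical scores, one per sentence position.
toy_scores = [0.4, 1.7, 0.9, 2.1, 1.2]
n = 3

# Indices of the n highest scores, best first (ties keep the earlier sentence first).
top_indices = sorted(range(len(toy_scores)), key=lambda i: toy_scores[i], reverse=True)[:n]
print(top_indices)  # [3, 1, 4]

# Sorting the indices puts the chosen sentences back in document order.
print(sorted(top_indices))  # [1, 3, 4]
```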

**Performing final summarization**

In [None]:
# Let's now do the sentence ordering using our prebaked sentence_organizer
# Let's map the scored sentences with their indexes
mapped_top_n_sentences = [(sentence,sentence_organizer[sentence]) for sentence in top_n_sentences]
print("Our top_n_sentence with their index: \n")
for element in mapped_top_n_sentences:
    print(element)

# Ordering our top-n sentences in their original ordering
mapped_top_n_sentences = sorted(mapped_top_n_sentences, key = lambda x: x[1])
ordered_scored_sentences = [element[0] for element in mapped_top_n_sentences]

# Our final summary
summary = " ".join(ordered_scored_sentences)

Our top_n_sentence with their index: 

('It’s important to be aware of what’s in your food as manufacturers often hide large amounts of sugar or unhealthy fats in packaged food, even food claiming to be healthy.', 12)
('You’ll eat fewer calories and avoid the chemical additives, added sugar, and unhealthy fats of packaged and takeout foods that can leave you feeling tired, bloated, and irritable, and exacerbate symptoms of depression, stress, and anxiety.', 6)
('When cutting back on unhealthy foods in your diet, it’s important to replace them with healthy alternatives.', 8)


**Result / Summary**

In [None]:
print("Summary: \n", summary)

Summary: 
 You’ll eat fewer calories and avoid the chemical additives, added sugar, and unhealthy fats of packaged and takeout foods that can leave you feeling tired, bloated, and irritable, and exacerbate symptoms of depression, stress, and anxiety. When cutting back on unhealthy foods in your diet, it’s important to replace them with healthy alternatives. It’s important to be aware of what’s in your food as manufacturers often hide large amounts of sugar or unhealthy fats in packaged food, even food claiming to be healthy.


**Reading text from a file**

In [None]:
# A context manager closes the file even if reading fails
with open("test.txt", "r") as f:
    read = f.read()
print(read)

Dhoby Ghaut MRT station is an underground Mass Rapid Transit (MRT) interchange station on the North South, North East and Circle lines in Singapore. Located beneath the eastern end of Orchard Road shopping belt in Dhoby Ghaut, Museum Planning Area, the station is integrated with the commercial development The Atrium@Orchard. The station is near landmarks such as The Istana, the MacDonald House, Plaza Singapura and Dhoby Ghaut Green.
Dhoby Ghaut station was part of the early plans for the original MRT network since 1982. It was constructed as part of Phase I of the MRT network which was completed in 1987. Following the network's operational split, the station has been served by the North South line since 1989. To construct the North East line platforms, which were completed in 2003, the Stamford Canal had to be diverted while excavating through part of Mount Sophia. The Circle line platforms opened in 2010 along with Stages 1 and 2 of the line.
The only triple-line MRT interchange stati

**Creating a function using above steps**

In [None]:
def summarizer(text, tokenizer, max_sent_in_summary):
    # Create a spaCy document for sentence-level tokenization, using the pipeline passed in
    # (the original body ignored `tokenizer` and relied on the global `nlp`)
    doc = tokenizer(text.replace("\n", ""))
    sentences = [sent.text.strip() for sent in doc.sents]
    # Map each sentence to its position so the top-scoring sentences can later be
    # put back in their original order
    sentence_organizer = {k: v for v, k in enumerate(sentences)}
    # Let's now create a TF-IDF (Term Frequency-Inverse Document Frequency) model
    tf_idf_vectorizer = TfidfVectorizer(min_df=2, max_features=None,
                                        strip_accents='unicode',
                                        analyzer='word',
                                        token_pattern=r'\w{1,}',
                                        ngram_range=(1, 3),
                                        # Booleans, because recent scikit-learn rejects 1/0 here
                                        use_idf=True, smooth_idf=True,
                                        sublinear_tf=True,
                                        stop_words='english')
    # Passing our sentences to the TF-IDF vectorizer, treating each sentence as one document
    tf_idf_vectorizer.fit(sentences)
    # Transforming our sentences to TF-IDF vectors
    sentence_vectors = tf_idf_vectorizer.transform(sentences)
    # Score each sentence by summing the TF-IDF weights in its row
    sentence_scores = np.array(sentence_vectors.sum(axis=1)).ravel()
    # Getting top-n sentences
    N = max_sent_in_summary
    top_n_sentences = [sentences[ind] for ind in np.argsort(sentence_scores, axis=0)[::-1][:N]]
    # Map the selected sentences to their original indexes using sentence_organizer
    mapped_top_n_sentences = [(sentence, sentence_organizer[sentence]) for sentence in top_n_sentences]
    # Ordering our top-n sentences in their original ordering
    mapped_top_n_sentences = sorted(mapped_top_n_sentences, key=lambda x: x[1])
    ordered_scored_sentences = [element[0] for element in mapped_top_n_sentences]
    # Our final summary
    summary = " ".join(ordered_scored_sentences)
    return summary

In [None]:
print("Result: \n", summarizer(text=read,tokenizer=nlp,max_sent_in_summary=3))

Result: 
 The station features many forms of artworks, three of them under the Art-in-Transit scheme in the North East line and Circle line stations, a pair of Art Seats at the Circle line platforms and an art piece above theDhoby Ghaut station was included in the early plans of the MRT network in May 1982.[8] It was to be constructed as part of the Phase I MRT segment from the Novena to Outram Park station.[9] This segment was targeted to be completed by December 1987.[10] Phase I, which would be part of the North South line (NSL), was given priority as it passes through areas having a higher demand for public transport, such as the densely populated housing estates of Toa Payoh and Ang Mo Kio and the Central Area. The line was aimed to relieve the traffic congestion on the Thomson–Sembawang road corridor.[11][12]Before construction began, tenants of Amber Mansions[note 1] were compelled to relocate; the land had already been marked for acquisition in 1978.[14] Contract 106 for the de
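The result above still contains citation markers such as `[8]` and fused words such as `theDhoby`, since `text.replace("\n", "")` joins lines without a space and the markers sit between sentences. A minimal preprocessing sketch that could be applied to the raw text before summarizing; `clean_text` and the sample string are hypothetical, and the regular expression only covers the marker shapes visible in this output:

```python
import re

def clean_text(raw):
    # Drop bracketed citation markers such as [8] or [note 1].
    without_citations = re.sub(r"\[(?:note )?\d+\]", "", raw)
    # Turn newlines and runs of whitespace into single spaces so paragraphs do not fuse.
    return re.sub(r"\s+", " ", without_citations).strip()

# Made-up sample, not taken from test.txt.
sample = "The station opened in 1987.[10]\nIt has three lines.[note 1] It is busy."
print(clean_text(sample))  # The station opened in 1987. It has three lines. It is busy.
```

Whether this is enough depends on the input; text copied from other sources may need different cleanup rules.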