## Text summarization

In [1]:
text = """ Maria Sharapova has basically no friends as tennis players on the WTA Tour. The Russian player has no problems in openly speaking about it and in a recent interview she said: ‘I don’t really hide any feelings too much.
I think everyone knows this is my job here. When I’m on the courts or when I’m on the court playing, I’m a competitor and I want to beat every single person whether they’re in the locker room or across the net.
So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.
I’m a pretty competitive girl. I say my hellos, but I’m not sending any players flowers as well. Uhm, I’m not really friendly or close to many players.
I have not a lot of friends away from the courts.’ When she said she is not really close to a lot of players, is that something strategic that she is doing? Is it different on the men’s tour than the women’s tour? ‘No, not at all.
I think just because you’re in the same sport doesn’t mean that you have to be friends with everyone just because you’re categorized, you’re a tennis player, so you’re going to get along with tennis players.
I think every person has different interests. I have friends that have completely different jobs and interests, and I’ve met them in very different parts of my life.
I think everyone just thinks because we’re tennis players we should be the greatest of friends. But ultimately tennis is just a very small part of what we do.
There are so many other things that we’re interested in, that we do. """

In [2]:
len(text)

1563

# 1) Importing the libraries and Dataset

In [3]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [4]:
nlp = spacy.load("en_core_web_sm")

In [5]:
doc = nlp(text)

In [6]:
tokens = [token.text for token in doc]
print(tokens)

[' ', 'Maria', 'Sharapova', 'has', 'basically', 'no', 'friends', 'as', 'tennis', 'players', 'on', 'the', 'WTA', 'Tour', '.', 'The', 'Russian', 'player', 'has', 'no', 'problems', 'in', 'openly', 'speaking', 'about', 'it', 'and', 'in', 'a', 'recent', 'interview', 'she', 'said', ':', '‘', 'I', 'do', 'n’t', 'really', 'hide', 'any', 'feelings', 'too', 'much', '.', '\n', 'I', 'think', 'everyone', 'knows', 'this', 'is', 'my', 'job', 'here', '.', 'When', 'I', '’m', 'on', 'the', 'courts', 'or', 'when', 'I', '’m', 'on', 'the', 'court', 'playing', ',', 'I', '’m', 'a', 'competitor', 'and', 'I', 'want', 'to', 'beat', 'every', 'single', 'person', 'whether', 'they', '’re', 'in', 'the', 'locker', 'room', 'or', 'across', 'the', 'net', '.', '\n', 'So', 'I', '’m', 'not', 'the', 'one', 'to', 'strike', 'up', 'a', 'conversation', 'about', 'the', 'weather', 'and', 'know', 'that', 'in', 'the', 'next', 'few', 'minutes', 'I', 'have', 'to', 'go', 'and', 'try', 'to', 'win', 'a', 'tennis', 'match', '.', '\n', 'I',

In [7]:
# Treat newline tokens as punctuation so they are left out of the word counts
punctuation = punctuation + '\n'

In [8]:
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

# 2) Text Cleaning

In [9]:
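word_freq = {}

stop_words = list(STOP_WORDS)

# Count every token that is neither a stop word nor punctuation.
# `punctuation` is a string, so `not in punctuation` is a substring test.
# Keys keep the token's original casing, so 'Tour' and 'tour' are counted separately.
for word in doc:
  if word.text.lower() not in stop_words:
    if word.text.lower() not in punctuation:
      if word.text not in word_freq.keys():
        word_freq[word.text] = 1
      else:
        word_freq[word.text] += 1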


In [10]:
print(word_freq)

{' ': 1, 'Maria': 1, 'Sharapova': 1, 'basically': 1, 'friends': 5, 'tennis': 6, 'players': 6, 'WTA': 1, 'Tour': 1, 'Russian': 1, 'player': 2, 'problems': 1, 'openly': 1, 'speaking': 1, 'recent': 1, 'interview': 1, 'said': 2, '‘': 2, 'hide': 1, 'feelings': 1, 'think': 4, 'knows': 1, 'job': 1, 'courts': 2, 'court': 1, 'playing': 1, 'competitor': 1, 'want': 1, 'beat': 1, 'single': 1, 'person': 2, 'locker': 1, 'room': 1, 'net': 1, 'strike': 1, 'conversation': 1, 'weather': 1, 'know': 1, 'minutes': 1, 'try': 1, 'win': 1, 'match': 1, 'pretty': 1, 'competitive': 1, 'girl': 1, 'hellos': 1, 'sending': 1, 'flowers': 1, 'Uhm': 1, 'friendly': 1, 'close': 2, 'lot': 2, 'away': 1, '’': 1, 'strategic': 1, 'different': 4, 'men': 1, 'tour': 2, 'women': 1, 'sport': 1, 'mean': 1, 'categorized': 1, 'going': 1, 'interests': 2, 'completely': 1, 'jobs': 1, 'met': 1, 'parts': 1, 'life': 1, 'thinks': 1, 'greatest': 1, 'ultimately': 1, 'small': 1, 'things': 1, 'interested': 1}


In [11]:
max_freq = max(word_freq.values())

In [12]:
# Normalize each count by the highest count so that every score falls in (0, 1]
for word in word_freq.keys():
  word_freq[word] = word_freq[word] / max_freq

In [13]:
print(word_freq)

{' ': 0.16666666666666666, 'Maria': 0.16666666666666666, 'Sharapova': 0.16666666666666666, 'basically': 0.16666666666666666, 'friends': 0.8333333333333334, 'tennis': 1.0, 'players': 1.0, 'WTA': 0.16666666666666666, 'Tour': 0.16666666666666666, 'Russian': 0.16666666666666666, 'player': 0.3333333333333333, 'problems': 0.16666666666666666, 'openly': 0.16666666666666666, 'speaking': 0.16666666666666666, 'recent': 0.16666666666666666, 'interview': 0.16666666666666666, 'said': 0.3333333333333333, '‘': 0.3333333333333333, 'hide': 0.16666666666666666, 'feelings': 0.16666666666666666, 'think': 0.6666666666666666, 'knows': 0.16666666666666666, 'job': 0.16666666666666666, 'courts': 0.3333333333333333, 'court': 0.16666666666666666, 'playing': 0.16666666666666666, 'competitor': 0.16666666666666666, 'want': 0.16666666666666666, 'beat': 0.16666666666666666, 'single': 0.16666666666666666, 'person': 0.3333333333333333, 'locker': 0.16666666666666666, 'room': 0.16666666666666666, 'net': 0.166666666666666
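
The counting and normalization steps above can also be sketched with `collections.Counter`. This is a minimal, spaCy-free illustration: the whitespace tokenization, the tiny `DEMO_STOP_WORDS` set, and the function name `normalized_freq` are assumptions made for the example and are not part of the notebook. Unlike the cells above, this sketch lowercases the keys, so differently cased forms of a word share one count.

```python
from collections import Counter
from string import punctuation

# Hypothetical, tiny stop-word set for illustration only;
# the notebook itself uses spaCy's STOP_WORDS.
DEMO_STOP_WORDS = {"the", "a", "is", "on", "and", "to", "of"}
PUNCT = set(punctuation)

def normalized_freq(words):
    """Count non-stop, non-punctuation words and divide by the largest count."""
    counts = Counter(
        w.lower() for w in words
        if w.lower() not in DEMO_STOP_WORDS and w not in PUNCT
    )
    if not counts:
        return {}
    max_count = max(counts.values())
    return {w: c / max_count for w, c in counts.items()}

demo = "tennis is the job and tennis players play tennis on the court".split()
freqs = normalized_freq(demo)
print(freqs)
```

The most frequent word always gets a score of 1.0, and every other word is expressed as a fraction of it.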

# 3) Sentence tokenization

In [14]:
sent_tokens = list(doc.sents)
print(sent_tokens)

[ Maria Sharapova has basically no friends as tennis players on the WTA Tour., The Russian player has no problems in openly speaking about it and in a recent interview she said: ‘I don’t really hide any feelings too much.
, I think everyone knows this is my job here., When I’m on the courts or when I’m on the court playing, I’m a competitor and I want to beat every single person whether they’re in the locker room or across the net.
, So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.
, I’m a pretty competitive girl., I say my hellos, but I’m not sending any players flowers as well., Uhm, I’m not really friendly or close to many players.
, I have not a lot of friends away from the courts.’, When she said she is not really close to a lot of players, is that something strategic that she is doing?, Is it different on the men’s tour than the women’s tour?, ‘No, not at all.
, I think just because 

In [15]:
sent_score = {}

In [16]:
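# Score each sentence as the sum of the normalized frequencies of its words.
# The lookup is lowercased while the word_freq keys keep their original casing,
# so capitalized tokens such as 'Maria' or 'WTA' contribute nothing here.
for sent in sent_tokens:
  for word in sent:
    if word.text.lower() in word_freq.keys():
      if sent not in sent_score.keys():
        sent_score[sent] = word_freq[word.text.lower()]
      else:
        sent_score[sent] += word_freq[word.text.lower()]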


In [17]:
print(sent_score)

{ Maria Sharapova has basically no friends as tennis players on the WTA Tour.: 3.5000000000000004, The Russian player has no problems in openly speaking about it and in a recent interview she said: ‘I don’t really hide any feelings too much.
: 2.1666666666666665, I think everyone knows this is my job here.: 0.9999999999999999, When I’m on the courts or when I’m on the court playing, I’m a competitor and I want to beat every single person whether they’re in the locker room or across the net.
: 2.1666666666666665, So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.
: 2.333333333333333, I’m a pretty competitive girl.: 0.5, I say my hellos, but I’m not sending any players flowers as well.: 1.5, Uhm, I’m not really friendly or close to many players.
: 1.5, I have not a lot of friends away from the courts.’: 1.8333333333333335, When she said she is not really close to a lot of players, is that some
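
The scoring loop above can be summarized as "add up the normalized frequency of every known word in the sentence". Below is a minimal sketch of that idea on plain strings. The function name `score_sentences`, the demo frequencies, and the crude `split()` plus `strip()` tokenization are assumptions for illustration; the notebook iterates over spaCy tokens instead.

```python
def score_sentences(sentences, word_freq):
    """Sum the normalized frequency of every known word in each sentence."""
    scores = {}
    for sent in sentences:
        total = sum(
            word_freq.get(w.lower().strip(".,?!"), 0.0) for w in sent.split()
        )
        # Like the notebook loop, skip sentences with no scored word.
        if total > 0:
            scores[sent] = total
    return scores

# Hypothetical frequencies and sentences, for illustration only.
demo_freq = {"tennis": 1.0, "friends": 0.5, "weather": 0.25}
demo_sents = [
    "Tennis players are not always friends.",
    "The weather was fine.",
    "Nothing relevant here.",
]
demo_scores = score_sentences(demo_sents, demo_freq)
print(demo_scores)
```

Because scores are plain sums, longer sentences tend to score higher; dividing by the sentence length is one possible variation if that bias is unwanted.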

# 4) Select the 30% of sentences with the highest scores

In [18]:
from heapq import nlargest

In [19]:
len(sent_score) * 0.3

5.3999999999999995

In [20]:
8

8
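
The notebook types the sentence count in by hand. As an alternative, here is a minimal sketch that derives the count from a ratio and selects the sentences in one step. The function name `top_sentences`, the demo scores, and the choice to round up with `ceil` are assumptions for illustration, not the notebook's method.

```python
from heapq import nlargest
from math import ceil

def top_sentences(scores, ratio=0.3):
    """Return the highest-scoring keys, keeping roughly `ratio` of them."""
    # Round up and keep at least one sentence (assumed rounding policy).
    n = max(1, ceil(len(scores) * ratio))
    return nlargest(n, scores, key=scores.get)

# Hypothetical scores keyed by sentence label, for illustration only.
demo_scores = {"a": 3.5, "b": 1.0, "c": 2.2, "d": 0.5, "e": 2.9}
picked = top_sentences(demo_scores, ratio=0.3)
print(picked)
```

Computing `n` from the ratio keeps the summary length proportional to the input when the text changes.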

# 5) Getting the Summary

In [21]:
# n=8 is hard-coded here; 30% of the 18 scored sentences would be about 5
summary = nlargest(n=8, iterable=sent_score, key=sent_score.get)

In [22]:
print(summary)

[I think just because you’re in the same sport doesn’t mean that you have to be friends with everyone just because you’re categorized, you’re a tennis player, so you’re going to get along with tennis players.
, I think everyone just thinks because we’re tennis players we should be the greatest of friends.,  Maria Sharapova has basically no friends as tennis players on the WTA Tour., I have friends that have completely different jobs and interests, and I’ve met them in very different parts of my life.
, So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.
, The Russian player has no problems in openly speaking about it and in a recent interview she said: ‘I don’t really hide any feelings too much.
, When I’m on the courts or when I’m on the court playing, I’m a competitor and I want to beat every single person whether they’re in the locker room or across the net.
, When she said she is not real

In [23]:
# Each element of `summary` is a sentence span, not a single word
final_summary = [sent.text for sent in summary]

In [24]:
print(final_summary)

['I think just because you’re in the same sport doesn’t mean that you have to be friends with everyone just because you’re categorized, you’re a tennis player, so you’re going to get along with tennis players.\n', 'I think everyone just thinks because we’re tennis players we should be the greatest of friends.', ' Maria Sharapova has basically no friends as tennis players on the WTA Tour.', 'I have friends that have completely different jobs and interests, and I’ve met them in very different parts of my life.\n', 'So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.\n', 'The Russian player has no problems in openly speaking about it and in a recent interview she said: ‘I don’t really hide any feelings too much.\n', 'When I’m on the courts or when I’m on the court playing, I’m a competitor and I want to beat every single person whether they’re in the locker room or across the net.\n', 'When she 

In [25]:
summary = " ".join(final_summary)

In [26]:
print(summary)

I think just because you’re in the same sport doesn’t mean that you have to be friends with everyone just because you’re categorized, you’re a tennis player, so you’re going to get along with tennis players.
 I think everyone just thinks because we’re tennis players we should be the greatest of friends.  Maria Sharapova has basically no friends as tennis players on the WTA Tour. I have friends that have completely different jobs and interests, and I’ve met them in very different parts of my life.
 So I’m not the one to strike up a conversation about the weather and know that in the next few minutes I have to go and try to win a tennis match.
 The Russian player has no problems in openly speaking about it and in a recent interview she said: ‘I don’t really hide any feelings too much.
 When I’m on the courts or when I’m on the court playing, I’m a competitor and I want to beat every single person whether they’re in the locker room or across the net.
 When she said she is not really close
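
The summary printed above lists sentences in descending score order, which is why the article's opening sentence appears third. A minimal sketch of restoring document order after selection follows; the function name `summarize_in_order` and the demo data are assumptions for illustration.

```python
from heapq import nlargest

def summarize_in_order(sentences, scores, n):
    """Pick the n best-scoring sentences, then emit them in document order."""
    # Select by index so the original position of each sentence is kept.
    best = set(nlargest(n, range(len(sentences)), key=lambda i: scores[i]))
    return " ".join(sentences[i] for i in range(len(sentences)) if i in best)

# Hypothetical sentences and scores, for illustration only.
demo_sents = ["First.", "Second.", "Third.", "Fourth."]
demo_scores = [1.0, 3.0, 0.5, 2.0]
ordered = summarize_in_order(demo_sents, demo_scores, n=2)
print(ordered)
```

With spaCy spans, sorting the selected sentences by their `start` attribute should have the same effect.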

In [27]:
len(summary)

1068

In [28]:
len(summary)/ len(text)

0.6833013435700576

In [30]:
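from transformers import pipeline
from IPython.display import Markdown, display

def summarize_text(text, max_length=150, min_length=50):
    # Note: pipeline() reloads the model on every call; build it once outside
    # the function if you summarize more than one text.
    summarizer = pipeline("summarization")
    # max_length and min_length are measured in tokens, not characters.
    summary = summarizer(text, max_length=max_length, min_length=min_length, return_text=True)
    # The result is a list of dicts, each with a 'summary_text' key.
    return summary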

# Example: Get user input for text
user_input = input("Enter the text you want to summarize: ")

# Summarize the text
summarized_text = summarize_text(user_input)

# Display the original and summarized text in a formatted way
display(Markdown(f"**Original Text:**\n\n{user_input}\n\n**Summarized Text:**\n\n{summarized_text[0]['summary_text']}"))


Enter the text you want to summarize: A chronicle of what’s been shattered yields a litany of horribles: educational losses, ruined businesses, rampant mental illness, medical injury, homelessness, job upheaval and loss, depleted arts, wrecked families and communities, inflation, ruined national accounts, a generation of students traumatized, bitter political divisions, and a widespread lack of hope in the future.   That list is only a fraction of the cost. And the words above are anodyne to the real experiences of people. Whenever the subject comes up in private conversation, the result is a jaw-dropping accounting of personal despair and tragedy, often followed by tears under some circumstances. Constitutional government was shot and most of what we believed was and was not possible in public life was torched by the sheer ferocity of tyranny pushed by mostly unelected bureaucrats. 


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


**Original Text:**

A chronicle of what’s been shattered yields a litany of horribles: educational losses, ruined businesses, rampant mental illness, medical injury, homelessness, job upheaval and loss, depleted arts, wrecked families and communities, inflation, ruined national accounts, a generation of students traumatized, bitter political divisions, and a widespread lack of hope in the future.   That list is only a fraction of the cost. And the words above are anodyne to the real experiences of people. Whenever the subject comes up in private conversation, the result is a jaw-dropping accounting of personal despair and tragedy, often followed by tears under some circumstances. Constitutional government was shot and most of what we believed was and was not possible in public life was torched by the sheer ferocity of tyranny pushed by mostly unelected bureaucrats. 

**Summarized Text:**

 A chronicle of what’s been shattered yields a litany of horribles: educational losses, ruined businesses, rampant mental illness, medical injury, homelessness, job upheaval and loss . Constitutional government was shot and most of what we believed was and was not possible in public life was torched .