<a href="https://colab.research.google.com/github/anishchapagain/OpenLLM/blob/main/Text_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Summarization**

In [1]:
from transformers import pipeline

# Without document_loaders (text pasted inline)

In [2]:
text="""
Nato members have pledged their support for an "irreversible path" to future membership for Ukraine, as well as more aid.

While a formal timeline for it to join the military alliance was not agreed at a summit in Washington DC, the military alliance's 32 members said they had "unwavering" support for Ukraine's war effort.
Nato has also announced further integration with Ukraine's military and members have committed €40bn ($43.3bn, £33.7bn) in aid in the next year, including F-16 fighter jets and air defence support.
The bloc's Secretary-General Jens Stoltenberg said: "Support to Ukraine is not charity - it is in our own security interest."
"""

In [4]:
# https://huggingface.co/facebook/bart-large-cnn: BART model pre-trained on English and fine-tuned on CNN/Daily Mail.

summarize = pipeline("summarization", model="facebook/bart-large-cnn")

In [5]:
len(text)

650
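`len(text)` counts characters (650 here). The summarization warning further down reports the input length in model tokens (156) instead. The sketch below contrasts a character count with a whitespace word count. Both are only rough proxies, because BART's tokenizer splits text into subword pieces. The exact figure would come from the pipeline's own tokenizer, e.g. `len(summarize.tokenizer(text)["input_ids"])`. The helper name `length_report` is hypothetical.

```python
def length_report(s: str) -> dict:
    """Rough size measures for a piece of text (not model tokens)."""
    return {
        "characters": len(s),     # what len(text) reports
        "words": len(s.split()),  # whitespace-separated words
    }

sample = 'Nato members have pledged their support for an "irreversible path" to future membership for Ukraine.'
print(length_report(sample))
```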

In [7]:
summarize(text, max_length=200)


Your max_length is set to 200, but your input_length is only 156. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=78)


[{'summary_text': 'Nato members have pledged their support for an "irreversible path" to future membership for Ukraine. The military alliance\'s 32 members said they had "unwavering" support for Ukraine\'s war effort. Members have committed €40bn ($43.3bn, £33.7bn) in aid in the next year.'}]
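The value suggested by the warning (78) is exactly half of the 156-token input. Here is a minimal sketch of deriving `max_length` the same way. `suggest_max_length`, its `ratio` of 0.5, and the `floor` of 10 are hypothetical choices that match this one warning. They are not a documented rule of the library.

```python
def suggest_max_length(input_tokens: int, ratio: float = 0.5, floor: int = 10) -> int:
    """Hypothetical helper: cap the summary at a fraction of the input length."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    # Never go below `floor`, so very short inputs still get a usable budget
    return max(floor, int(input_tokens * ratio))

print(suggest_max_length(156))  # 78, the value the warning suggests
```

In the notebook this would be used as `summarize(text, max_length=suggest_max_length(156))`. Any `min_length` passed alongside it should stay below the returned value.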

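Compare the cell above with the one below. The only change is the length arguments (`min_length=5, max_length=140` instead of `max_length=200`). The second summary is shorter and leaves out the sentence about the alliance's 32 members.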


In [6]:

output_text = summarize(text, min_length=5, max_length=140)[0]['summary_text']
output_text

'Nato members have pledged their support for an "irreversible path" to future membership for Ukraine. Members have committed €40bn ($43.3bn, £33.7bn) in aid in the next year.'
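To make the comparison between the `max_length=200` summary and the `min_length=5, max_length=140` summary explicit, the snippet below splits both outputs (copied verbatim from the cells above) into sentences. It then lists the sentences that only the longer summary contains. The regex split is a simple assumption that works for these two strings. It is not a general-purpose sentence tokenizer.

```python
import re

def sentences(summary: str) -> list:
    # Split after a full stop followed by whitespace, so "$43.3bn" stays intact
    return [s for s in re.split(r"(?<=\.)\s+", summary.strip()) if s]

long_summary = (
    'Nato members have pledged their support for an "irreversible path" to future membership for Ukraine. '
    'The military alliance\'s 32 members said they had "unwavering" support for Ukraine\'s war effort. '
    'Members have committed €40bn ($43.3bn, £33.7bn) in aid in the next year.'
)
short_summary = (
    'Nato members have pledged their support for an "irreversible path" to future membership for Ukraine. '
    'Members have committed €40bn ($43.3bn, £33.7bn) in aid in the next year.'
)

# Sentences present in the longer summary but missing from the shorter one
dropped = [s for s in sentences(long_summary) if s not in sentences(short_summary)]
print(dropped)
```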

# Let's check the sentiment of the returned summary.

In [8]:
classifier = pipeline(task="sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

In [9]:
classifier(output_text)

[{'label': 'POSITIVE', 'score': 0.8508466482162476}]

A **multi-label** emotion classifier makes more sense here, because a single POSITIVE/NEGATIVE label is a coarse fit for a factual news summary.
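The practical difference usually lies in how raw logits become scores. A single-label head typically applies softmax, so the labels compete and their scores sum to 1. A multi-label head typically applies a sigmoid to each label independently, so several labels can score high at once. Which function the Hugging Face pipeline applies depends on the model's configuration. The logits below are made-up numbers that illustrate the two functions. They are not outputs of either model.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the results sum to 1
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_each(logits):
    # Independent probability per label; the results need not sum to 1
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

logits = [2.0, 1.5, -3.0]  # hypothetical scores for three labels
print([round(p, 3) for p in softmax(logits)])
print([round(p, 3) for p in sigmoid_each(logits)])
```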

In [10]:
multilabel_classifier = pipeline(task="text-classification", model="SamLowe/roberta-base-go_emotions")

multilabel_classifier(output_text) # binary sentiment-analysis gave: 'label': 'POSITIVE', 'score': 0.8508466482162476

[{'label': 'neutral', 'score': 0.6524327397346497}]

Passing **top_k=None** returns a score for every label instead of only the top one.

In [11]:
multilabel_classifier = pipeline(task="text-classification", model="SamLowe/roberta-base-go_emotions", top_k=None)

multilabel_classifier(output_text) # binary sentiment-analysis gave: 'label': 'POSITIVE', 'score': 0.8508466482162476

[[{'label': 'neutral', 'score': 0.6524327397346497},
  {'label': 'approval', 'score': 0.3031771779060364},
  {'label': 'optimism', 'score': 0.058593738824129105},
  {'label': 'desire', 'score': 0.022525442764163017},
  {'label': 'realization', 'score': 0.012457828968763351},
  {'label': 'caring', 'score': 0.009569250978529453},
  {'label': 'admiration', 'score': 0.00652693398296833},
  {'label': 'love', 'score': 0.004447152838110924},
  {'label': 'disapproval', 'score': 0.003290079068392515},
  {'label': 'annoyance', 'score': 0.002489931182935834},
  {'label': 'joy', 'score': 0.002421693643555045},
  {'label': 'excitement', 'score': 0.0023391384165734053},
  {'label': 'gratitude', 'score': 0.0018707935232669115},
  {'label': 'confusion', 'score': 0.0016802680911496282},
  {'label': 'disappointment', 'score': 0.0016386595088988543},
  {'label': 'curiosity', 'score': 0.001561611657962203},
  {'label': 'relief', 'score': 0.0012624586233869195},
  {'label': 'sadness', 'score': 0.0009419094

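The summary scores as mostly neutral (0.65). Approval (0.30) is the only other sizeable label, and optimism is small (0.06). This fits the POSITIVE result from the binary sentiment model.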
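With `top_k=None` the pipeline returns one list of `{'label', 'score'}` dicts per input. That is why the output above is wrapped in an extra pair of brackets. Below is a minimal sketch of turning those scores into a multi-label decision by keeping every label above a cut-off. The helper `labels_above` and the 0.1 threshold are assumptions for illustration. They do not come from the model card. The data is the first five entries of the output above.

```python
def labels_above(scores, threshold=0.1):
    """Return (label, score) pairs at or above threshold, highest first."""
    kept = [(d["label"], d["score"]) for d in scores if d["score"] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# First entries of the top_k=None output above (one inner list per input text)
result = [[
    {"label": "neutral", "score": 0.6524327397346497},
    {"label": "approval", "score": 0.3031771779060364},
    {"label": "optimism", "score": 0.058593738824129105},
    {"label": "desire", "score": 0.022525442764163017},
    {"label": "realization", "score": 0.012457828968763351},
]]

for label, score in labels_above(result[0]):
    print(f"{label}: {score:.2f}")
```

Lowering the threshold to 0.05 would also admit `optimism`. The cut-off is therefore a tuning choice rather than a property of the model.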