# Text Summarizer Example

This notebook shows a simple example of text summarization. The model is [DistilBART](https://huggingface.co/sshleifer/distilbart-cnn-6-6), a smaller, distilled version of [BART](https://arxiv.org/abs/1910.13461).

In [1]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/ae/05/c8c55b600308dc04e95100dc8ad8a244dd800fe75dfafcf1d6348c6f6209/transformers-3.1.0-py3-none-any.whl (884kB)
Collecting sentencepiece!=0.1.92
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
Collecting tokenizers==0.8.1.rc2
  Downloading https://files.pythonhosted.org/packages/80/83/8b9fccb9e48eeb575ee19179e2bdde0ee9a1904f97de5f02d19016b8804f/tokenizers-0.8.1rc2-cp36-cp36m-manylinux1_x86_64.whl (3.0MB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)

We're going to use an example news article about an owl that got stuck down a well.

https://www.bbc.com/news/world-europe-53554755

In [26]:
!git clone https://github.com/AMontgomerie/text_summarizer
%cd text_summarizer/colab

Cloning into 'text_summarizer'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 14 (delta 3), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (14/14), done.
/content/text_summarizer/colab/text_summarizer/colab/text_summarizer/colab/text_summarizer/colab/text_summarizer/colab


In [27]:
with open('owl_rescue.txt') as file:
    text = file.read()
print(text)

A rescuer in northern Germany brought a trapped owl out of a 40m-deep (131ft) well at a ruined castle, after descending on ropes.

A local had heard the distressed eagle owl hooting from the well on Saturday and alerted the police.

The Bad Segeberg fire service pumped oxygen into the shaft and set up an abseiling rig, after failing to lure the owl into a sack with bait.

The young bird is now safely in the hands of a local bat sanctuary.

Bad Segeberg is a town just north of Lübeck and is famous for the Kalkberg, a massive gypsum rock which is topped by the ruined medieval castle.

The eagle owl rescue involved a team of 12 firefighters, plus a six-member volunteer technical team and two staff from the nearby bat centre, the fire service reported (in German).

They got to the bird just in time, as a probe lowered into the well indicated that there was little air inside. They used a powerful light and telescope to locate the owl, before sending a rescuer down with breathing apparatus.


Now we can initialize the tokenizer and the model. Inputs are truncated to 512 tokens, so only short articles can be processed in full. Summaries are generated with beam search using 4 beams and are limited to 64 tokens.
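To make the beam-search setting more concrete, here is a toy sketch in plain Python. It is not how `transformers` implements generation: the probability table `NEXT_TOKEN_PROBS` and the function `beam_search` are invented for illustration, and a real model computes next-token probabilities from the whole sequence and also applies extras such as length penalties. The sketch only shows the core idea of keeping the best `num_beams` partial sequences at each step.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
# These numbers are made up for illustration; a real model computes them.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.6, "an": 0.4},
    "the": {"owl": 0.5, "well": 0.5},
    "an": {"owl": 0.9, "well": 0.1},
    "owl": {"</s>": 1.0},
    "well": {"</s>": 1.0},
}


def beam_search(num_beams, max_length):
    """Return the highest-scoring token sequence found with `num_beams` beams."""
    beams = [(0.0, ["<s>"])]  # each entry is (log-probability, tokens so far)
    finished = []
    for _ in range(max_length):
        # Extend every live beam by every possible next token.
        candidates = []
        for score, tokens in beams:
            for token, prob in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the best `num_beams` candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for candidate in candidates[:num_beams]:
            if candidate[1][-1] == "</s>":
                finished.append(candidate)  # sequence is complete
            else:
                beams.append(candidate)  # sequence is still growing
        if not beams:
            break
    # Fall back to unfinished beams if max_length was reached first.
    best = max(finished + beams, key=lambda c: c[0])
    return best[1]


greedy = beam_search(num_beams=1, max_length=3)
wider = beam_search(num_beams=2, max_length=3)
print(greedy)
print(wider)
```

With a single beam the search commits to the locally most likely first token, while with two beams it can keep a less likely first token that leads to a more probable sequence overall. That is the motivation for using several beams when generating summaries.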

In [21]:
from transformers import BartTokenizer, BartForConditionalGeneration

MODEL = 'sshleifer/distilbart-cnn-6-6'
INPUT_MAX_LEN = 512
OUTPUT_MAX_LEN = 64
BEAMS = 4

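class TextSummarizer:
    """Wraps the pretrained model and tokenizer in a single callable."""

    def __init__(self):
        # Download (or load from cache) the pretrained weights and vocabulary.
        self.model = BartForConditionalGeneration.from_pretrained(MODEL)
        self.tokenizer = BartTokenizer.from_pretrained(MODEL)

    def __call__(self, text):
        # Tokenize the text, truncating anything beyond INPUT_MAX_LEN tokens.
        inputs = self.tokenizer.encode_plus(
            text,
            max_length=INPUT_MAX_LEN,
            truncation=True,
            return_tensors='pt'
        )

        # Generate summary token ids with beam search.
        summary_ids = self.model.generate(
            inputs['input_ids'],
            num_beams=BEAMS,
            max_length=OUTPUT_MAX_LEN,
            early_stopping=True
        )

        # Convert the ids back to text, dropping special tokens such as <s> and </s>.
        summary = self.tokenizer.decode(summary_ids[0], skip_special_tokens=True)

        return summary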


summarizer = TextSummarizer()


In [22]:
summarizer(text)

'A local had heard the distressed eagle owl hooting from the well on Saturday and alerted the police. The Bad Segeberg fire service pumped oxygen into the shaft and set up an abseiling rig. The young bird is now safely in the hands of a local bat sanctuary.'
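
Because inputs are truncated at 512 tokens, a longer article would lose everything past that point. One possible workaround, sketched below, is to split the text into chunks of whole paragraphs and summarize each chunk separately. This is only a sketch under stated assumptions: the helpers `split_into_chunks` and `summarize_long_text` are hypothetical names introduced here, and the word count is a crude stand-in for the real token count, so the default of 300 words is an arbitrary guess rather than a guaranteed fit.

```python
def split_into_chunks(text, max_words):
    """Group paragraphs into chunks of at most `max_words` words each.

    A single paragraph longer than `max_words` becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        n_words = len(paragraph.split())
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def summarize_long_text(text, summarize, max_words=300):
    """Summarize each chunk with `summarize` and join the partial summaries."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(summarize(chunk) for chunk in chunks)
```

In this notebook the call would look like `summarize_long_text(text, summarizer)`, since `summarizer` is a callable that takes a string and returns a string. Summarizing chunks independently may repeat or miss information that spans chunk boundaries, so the joined result should be treated as a rough summary.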