In [50]:
from transformers import pipeline, PegasusForConditionalGeneration, PegasusTokenizer, DistilBertTokenizer, DistilBertModel
import torch

# Pipeline summarization

In [159]:
summarizer = pipeline('summarization')

In [101]:
inp = """
Our colleagues at the service desk are responsible for the ordering process. But are generally just very nice people to have coffee with.
Eventually, an order is placed by a specific customer. 
This customer will have a first name, last name, address, and birth date.
The order consists of multiple line items. 
Each order has an order number, an entry date, a delivery status, and a description.
A line item specifies a particular product, and defines the quantity that is ordered.

A product is characterized by a name, a description, a product number, a price, a location.
Restricted products and flammable products are types of products.

Each order is shipped by a delivery company. The delivery company has a name and an address. 
MyCorp has an incredible relationship with our delivery company, ensuring the best delivery possible.

There are two types of customer, namely gold customers, and regular customers.
We are very thankful for our generous Gold customers. 
Gold customers have a discount level.
Each customer has an account. The account has a balance, and a status.

Each order has an invoice that is sent to the customer. 
An invoice has a date, status, amount payable, and invoice address.
It can be an initial invoice, or a late payment invoice.
A payment is for a particular invoice and has an amount, date, and  bank account. 
It can either be a full payment, or a partial payment.

A product has a stock level, and a reorder level.
When the stock level drops below the reorder level, the product needs to be reordered from a supplier.

As a result, a supply request is sent to a supplier. This request is for a certain product, and has an amount and a delivery date.
Finally, the supplier submits an invoice for the product.
"""

In [162]:
summarizer(inp, min_length=300, max_length=370, num_beams=10, early_stopping=True)

[{'summary_text': ' An order is placed by a specific customer, with a first name, last name, address, and birth date . The order consists of multiple line items . Each order has an order number, an entry date, a delivery status, and a description . The delivery company has a name and an address . The supplier submits an invoice for the product . Gold customers have a discount level, and regular customers have discount levels . The product is characterized by a name, a description, a product number, a price, a location and a stock level . When the stock level drops below the reorder level, the product needs to be reordered from a supplier . The invoice has a date, status, amount payable, and invoice address. It can be an initial invoice, or a late payment invoice. A payment is for a particular invoice and has an amount, date, and  bank account. The customer has a balance and a delivery date. The delivery companies have an address and an order is shipped by a delivery company. The delive

In [167]:
summarizer(inp, min_length=400, max_length=470, num_beams=10, early_stopping=True)

Your max_length is set to 470, but you input_length is only 358. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)


KeyboardInterrupt: 
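The call above asks for at least 400 tokens from an input that the warning reports as only 358 tokens long, so the summary can never be shorter than the text it summarizes. A minimal sketch of one way to derive consistent bounds from the input length follows. `summary_length_bounds` and its default ratios are hypothetical helpers introduced here for illustration, not part of `transformers`.

```python
def summary_length_bounds(input_tokens, min_ratio=0.3, max_ratio=0.6):
    """Return (min_length, max_length) with 1 <= min_length < max_length <= input_tokens.

    The ratios are illustrative assumptions, not recommended values.
    """
    if input_tokens < 2:
        raise ValueError("need at least 2 input tokens")
    if not 0 < min_ratio < max_ratio <= 1:
        raise ValueError("require 0 < min_ratio < max_ratio <= 1")
    # Cap the summary at a fraction of the input length.
    max_length = max(2, int(input_tokens * max_ratio))
    # Keep min_length strictly below max_length.
    min_length = max(1, min(int(input_tokens * min_ratio), max_length - 1))
    return min_length, max_length


# 358 is the input_length reported in the warning above.
min_len, max_len = summary_length_bounds(358)
print(min_len, max_len)
# Possible use with the pipeline from In [159]:
# summarizer(inp, min_length=min_len, max_length=max_len, num_beams=10, early_stopping=True)
```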

# Pegasus summarization

In [123]:
model = PegasusForConditionalGeneration.from_pretrained('google/pegasus-xsum')
tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-xsum')

In [127]:
inputs = tokenizer([inp], max_length=512, return_tensors='pt', truncation=True)

# Generate Summary
summary_ids = model.generate(inputs['input_ids'], min_length=200, max_length=324)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])

['In order to place an order on MyCorp, a customer will first have to go to the service desk, where they will be asked a series of questions about MyCorp, and then they will be given a list of products that they can place an order for. Each order has an order number, an entry date, a delivery status, and a description. An invoice has a date, status, amount payable, and invoice address. When the stock level drops below the reorder level, the product needs to be reordered from a supplier. this request is for a certain product, and has an amount and a delivery date. this request is for a certain product, and has an amount and a delivery date. this request is for a certain product, and has an amount and a delivery date. this request is for a certain product, and has an amount and a delivery date. this request is for a certain product, and has an amount and a delivery date. this request is for a certain product, and has an amount and a delivery date. this request is for a certain product, a
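The output above loops on one sentence once the model has to fill `min_length=200`. `generate` accepts `no_repeat_ngram_size` and `repetition_penalty`, which may reduce this kind of looping. Independently of the model, repeated sentences can also be filtered after decoding. The sketch below shows that post-processing step. `drop_repeated_sentences` is a hypothetical helper, and the regex sentence splitter is a simplifying assumption.

```python
import re


def drop_repeated_sentences(text):
    """Keep the first occurrence of each sentence, compared case-insensitively."""
    # Naive split on whitespace that follows ., ! or ? (an assumption, not a full tokenizer).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    seen = set()
    kept = []
    for sentence in sentences:
        key = sentence.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence.strip())
    return " ".join(kept)


sample = (
    "When the stock level drops below the reorder level, the product needs "
    "to be reordered from a supplier. "
    "this request is for a certain product, and has an amount and a delivery date. "
    "this request is for a certain product, and has an amount and a delivery date."
)
print(drop_repeated_sentences(sample))

# Generation-side alternative (the value 3 is an illustrative assumption):
# summary_ids = model.generate(inputs['input_ids'], min_length=200, max_length=324,
#                              no_repeat_ngram_size=3)
```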

In [149]:
model = PegasusForConditionalGeneration.from_pretrained('google/pegasus-xsum')
tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-xsum')

In [150]:
summarizer = pipeline('summarization', model=model, tokenizer=tokenizer)

In [151]:
summarizer(inp, min_length=250, do_sample=False)

[{'summary_text': 'In order to place an order on MyCorp, a customer will first have to go to the service desk, where they will be asked a series of questions about MyCorp, and then they will be given a list of products they want to order. MyCorp has an incredible relationship with our delivery company, ensuring'}]

# DistilBERT summarization

In [76]:
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')

inputs = tokenizer([inp], max_length=324, return_tensors='pt', truncation=True)

In [81]:
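# Generate Summary
# Note: DistilBertModel is a bare encoder with no language-modeling head, so this call
# cannot write an abstractive summary; the decoded output below only echoes the lowercased input.
summary_ids = model.generate(inputs['input_ids'], min_length=200, max_length=324)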

Input length of input_ids is 324, but ``max_length`` is set to 324.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


In [73]:
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])

['our colleagues at the service desk are responsible for the ordering process . but are generally just very nice people to have coffee with . eventually , an order is placed by a specific customer . this customer will have a first name , last name , address , and birth date . the order consists of multiple line items . each order has an order number , an entry date , a delivery status , and a description . a line item specifies a particular product , and defines the quantity that is ordered . a product is characterized by a name , a description , a product number , a price , a location . restricted products and flammable products are types of products . each order is shipped by a delivery company . the delivery company has a name and an address . mycorp has an incredible relationship with our delivery company , ensuring the best delivery possible . there are two types of customer , namely gold customers , and regular customers . we are very thankful for our generous gold customers . go

In [119]:
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')

In [121]:
inputs = tokenizer([inp], max_length=1024, return_tensors='pt', truncation=True)

# Generate Summary
summary_ids = model.generate(inputs['input_ids'], min_length=200, max_length=324)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])

Input length of input_ids is 381, but ``max_length`` is set to 324.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


['our colleagues at the service desk are responsible for the ordering process . but are generally just very nice people to have coffee with . eventually , an order is placed by a specific customer . this customer will have a first name , last name , address , and birth date . the order consists of multiple line items . each order has an order number , an entry date , a delivery status , and a description . a line item specifies a particular product , and defines the quantity that is ordered . a product is characterized by a name , a description , a product number , a price , a location . restricted products and flammable products are types of products . each order is shipped by a delivery company . the delivery company has a name and an address . mycorp has an incredible relationship with our delivery company , ensuring the best delivery possible . there are two types of customer , namely gold customers , and regular customers . we are very thankful for our generous gold customers . go
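Both DistilBERT attempts above return the input text rather than a summary, which is consistent with `DistilBertModel` being an encoder without a generation head. Encoder models are more commonly used for extractive summarization, where existing sentences are scored and selected. As a model-free stand-in for that idea (a swapped-in technique, not what this notebook runs), the sketch below ranks sentences by average word frequency. The function name, the stopword list, and the scoring rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, far from complete).
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "is", "are", "has", "have",
    "for", "by", "it", "that", "this", "be", "or", "in", "we", "our",
}


def _content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, n_sentences=3):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(_content_words(text))

    def score(sentence):
        tokens = _content_words(sentence)
        # Average frequency, so long sentences are not favored for length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)


demo = (
    "Each order has an invoice that is sent to the customer. "
    "An invoice has a date, status, amount payable, and invoice address. "
    "We are very thankful for our generous Gold customers. "
    "A payment is for a particular invoice and has an amount, date, and bank account."
)
print(extractive_summary(demo, n_sentences=2))
```

A DistilBERT-based variant would replace the frequency score with a similarity score over sentence embeddings; that step is left out here because it needs the model weights.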

In [168]:
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')

In [169]:
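# Note: `summarizer` was not rebuilt after reloading DistilBERT (the In [None] cell below never ran),
# so this call still uses the pipeline created in In [163]; the output matches In [166].
summarizer(inp, min_length=280, max_length=340, do_sample=False)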

[{'summary_text': 'The ordering process begins when a customer places an order. Each order has an order number, an entry date, a delivery status, and a description. The delivery company has a name, an address, an account, a discount level, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the 

In [None]:
summarizer = pipeline('summarization', model=model, tokenizer=tokenizer)

In [163]:
summarizer = pipeline('summarization', model='human-centered-summarization/financial-summarization-pegasus')

In [164]:
summarizer(inp, min_length=300, max_length=370, num_beams=5, early_stopping=True)

Your max_length is set to 370, but you input_length is only 358. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)


[{'summary_text': 'The ordering process begins when a customer places an order. Each order has an order number, an entry date, a delivery status, and a description. The delivery company has a name, an address, a discount level, and a special delivery service for our loyal Gold customers. The delivery company has a stock level, a reorder level, and a special delivery service for our regular customers. The delivery company has a stock level, a reorder level, and a special delivery service for our regular customers. The delivery company has a stock level, a reorder level, and a special delivery service for our regular customers. The delivery company has a stock level, a reorder level, and a special delivery service for our regular customers. The delivery company has a stock level, a reorder level, and a special delivery service for our regular customers. The delivery company has a stock level, a reorder level, and a special delivery service for our regular customers. The delivery company 

In [166]:
summarizer(inp, min_length=280, max_length=340, do_sample=False)

[{'summary_text': 'The ordering process begins when a customer places an order. Each order has an order number, an entry date, a delivery status, and a description. The delivery company has a name, an address, an account, a discount level, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the items in its inventory The delivery company has an account, and a list of all the 