## Transformers Pipelines

> The pipeline() is the easiest and fastest way to use a pretrained model for inference. 
>
> You can use the pipeline() out-of-the-box for many tasks across different modalities.

### Example Pipeline - "Summarization"

> Summarize news articles and other documents.

For example: "Summarize the following news article: ' ... '"


In [1]:
# import pipeline() convenience function 

from transformers import pipeline




In [2]:
my_summarizer = pipeline("summarization")

{"model": my_summarizer.model.name_or_path, "tokenizer": my_summarizer.tokenizer.name_or_path}

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'model': 'sshleifer/distilbart-cnn-12-6',
 'tokenizer': 'sshleifer/distilbart-cnn-12-6'}
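
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the checkpoint id and revision reported in that warning. The helper name `build_summarizer` is hypothetical and exists only for illustration.

```python
# Checkpoint id and revision copied from the warning printed by pipeline("summarization").
MODEL_ID = "sshleifer/distilbart-cnn-12-6"
REVISION = "a4f8f3e"


def build_summarizer(model_id=MODEL_ID, revision=REVISION):
    """Create a summarization pipeline pinned to one checkpoint and revision."""
    # Imported here so that defining the helper does not load the library yet.
    from transformers import pipeline

    return pipeline("summarization", model=model_id, revision=revision)
```

Calling `my_summarizer = build_summarizer()` should then load the same checkpoint as the default, without relying on whatever default a future library version ships. The first call downloads the weights.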

In [3]:
bart_model_overview = """
The Bart model was proposed in BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.

According to the abstract,

Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme, where spans of text are replaced with a single mask token.
BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE.
Tips:

BART is a model with absolute position embeddings so it’s usually advised to pad the inputs on the right rather than the left.

Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). A composition of the following transformations are applied on the pretraining tasks for the encoder:

mask random tokens (like in BERT)
delete random tokens
mask a span of k tokens with a single mask token (a span of 0 tokens is an insertion of a mask token)
permute sentences
rotate the document to make it start at a specific token
"""

In [4]:
summary = my_summarizer(bart_model_overview)

from pprint import pprint
pprint(summary)

[{'summary_text': ' The Bart model was proposed in BART: Denoising '
                  'Sequence-to-Sequence Pre-training for Natural Language '
                  'Generation, Translation, and Comprehension . It uses a '
                  'standard seq2seq/machine translation architecture with a '
                  'bidirectional encoder (like BERT) The pretraining task '
                  'involves randomly shuffling the order of the original '
                  'sentences and a novel in-filling scheme .'}]
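
The summarization pipeline forwards extra keyword arguments to the model's `generate` method, so the length of a summary can be bounded per call. A small sketch, assuming illustrative limits of 20 and 60 tokens (these numbers are not from the notebook) and a hypothetical helper name `short_summary`:

```python
def short_summary(summarizer, text, min_length=20, max_length=60):
    """Summarize `text` with explicit length bounds, counted in tokens."""
    result = summarizer(
        text,
        min_length=min_length,  # lower bound on the generated length
        max_length=max_length,  # upper bound on the generated length
        do_sample=False,        # no sampling, so repeated calls give the same text
        truncation=True,        # cut inputs that exceed the model's input limit
    )
    # The pipeline returns a list with one dict per input document.
    return result[0]["summary_text"].strip()
```

With the pipeline created earlier, this would be called as `short_summary(my_summarizer, bart_model_overview)`.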


## Let's do it _without_ the pipeline factory

### Auto-Tokenizer

Prepares the input for the model.

> Nearly every NLP task begins with a tokenizer. A tokenizer converts your input into a format that can be processed by the model.
>
> https://huggingface.co/docs/transformers/v4.33.2/en/autoclass_tutorial#autotokenizer

### Model: BartForConditionalGeneration

https://huggingface.co/docs/transformers/v4.33.2/en/model_doc/bart#transformers.BartForConditionalGeneration

Well, perhaps you understood the previous summary... 🙃


In [5]:
from transformers import AutoTokenizer, BartForConditionalGeneration

# create the model and tokenizer from the same model id
#
# model class: https://huggingface.co/docs/transformers/v4.33.2/en/model_doc/bart#transformers.BartForConditionalGeneration
# auto-tokenizer: https://huggingface.co/docs/transformers/v4.33.2/en/autoclass_tutorial#autotokenizer

model = BartForConditionalGeneration.from_pretrained("sshleifer/distilbart-cnn-12-6")
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")

{"model": model.name_or_path, "tokenizer": tokenizer.name_or_path}

{'model': 'sshleifer/distilbart-cnn-12-6',
 'tokenizer': 'sshleifer/distilbart-cnn-12-6'}
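
To make "converts your input into a format that can be processed by the model" concrete, it can help to look at the token ids and subword strings for a short sentence before tokenizing the whole overview. A minimal sketch; the helper name `peek_tokens` is hypothetical:

```python
def peek_tokens(tokenizer, text):
    """Return (ids, tokens) for one string, to see what the model receives."""
    # Without return_tensors, input_ids is a plain Python list of ints.
    ids = tokenizer(text)["input_ids"]
    # Map each id back to the subword string it stands for.
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return ids, tokens
```

For example, `peek_tokens(tokenizer, "BART is a seq2seq model.")` returns the ids together with their subword strings. With a BART tokenizer, the sequence would normally be wrapped in the special tokens `<s>` and `</s>`.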

In [6]:
# create PyTorch tensors from the tokenized input
inputs = tokenizer([bart_model_overview], return_tensors="pt")

from pprint import pprint
pprint(inputs)

{'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         

In [8]:
# Generate Summary
summary_ids = model.generate(inputs["input_ids"])

summary_ids

tensor([[    2,     0,    20,  8811,  1421,    21,  1850,    11, 30634,    35,
          6743,   139,  3009, 47134,    12,   560,    12, 48245,  4086,  5048,
            12, 32530,    13,  7278, 22205, 17362,     6, 41737,     6,     8,
         10081, 43341,  1499,   479,    85,  2856,     5,   819,     9,  3830,
         11126, 38495,    19, 10451,  1058,  1915,    15, 12209,  9162,     8,
           208, 12444,  2606,   479,    85, 35499,    92,   194,    12,  1116,
            12,   627,    12,  2013,   775,    15,    10,  1186,     9, 20372,
          2088,  6054,     6,   864, 15635,     6,     8, 39186,  1938,  8558,
             6,    19,  3077,     9,    62,     7,   231,   248,  5061,  8800,
           479,     2]])
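
The call above passes only `input_ids`. For a single unpadded input that is fine, since the attention mask is all ones. Once several texts are padded into one batch, the mask is what tells the model which positions to ignore. A hedged sketch that forwards both tensors and any optional generation settings; the helper name `generate_summary_ids` is hypothetical:

```python
def generate_summary_ids(model, inputs, **generate_kwargs):
    """Call model.generate with the token ids and their attention mask."""
    return model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # marks real tokens vs. padding
        **generate_kwargs,                        # e.g. num_beams, max_new_tokens
    )
```

A possible call is `summary_ids = generate_summary_ids(model, inputs, num_beams=4, max_new_tokens=80)`. The values 4 and 80 are illustrative assumptions, not settings taken from the notebook.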

In [9]:
# decode and print summary

summary = tokenizer.batch_decode(summary_ids)

from pprint import pprint
pprint(summary)


['</s><s> The Bart model was proposed in BART: Denoising Sequence-to-Sequence '
 'Pre-training for Natural Language Generation, Translation, and '
 'Comprehension. It matches the performance of RoBERTa with comparable '
 'training resources on GLUE and SQuAD. It achieves new state-of-the-art '
 'results on a range of abstractive dialogue, question answering, and '
 'summarization tasks, with gains of up to 6 ROUGE.</s>']
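
The decoded text above still contains the special tokens `</s>` and `<s>`. `batch_decode` accepts `skip_special_tokens=True` to drop them. A minimal sketch, with a hypothetical helper name `decode_summaries`:

```python
def decode_summaries(tokenizer, summary_ids):
    """Decode generated ids into clean strings without special tokens."""
    texts = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
    # Strip the leading/trailing whitespace left after removing the special tokens.
    return [text.strip() for text in texts]
```

Called as `decode_summaries(tokenizer, summary_ids)`, this should return the same summary text as above, minus the `</s><s>` prefix and the `</s>` suffix.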
