## What happens inside the pipeline function?


Install the transformers, datasets, and evaluate libraries to run this notebook.

In [7]:
!pip install datasets evaluate transformers[sentencepiece]



In [8]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier(["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"])

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

The pipeline groups three steps: <b>preprocessing</b>, <b>passing the inputs through the model</b>, and <b>postprocessing</b>.<br>
Raw text ----> Tokenizer ----> Model ----> Postprocessing ----> Predictions<br>
<b>Tokenization:</b><br>
Step 1: Convert the raw text inputs to numbers that the model can process.<br>
      <li>Split the input into words, subwords, or symbols, called <b>tokens</b></li>
      <li>Map each token to an integer</li>
      <li>Add additional inputs that may be useful to the model</li>
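
The three tokenization steps can be sketched in plain Python. This is only a toy illustration: the vocabulary `TOY_VOCAB`, its ids, and the helper names `toy_tokenize`, `toy_encode`, and `toy_pad` are hypothetical, and the real tokenizer for this checkpoint splits text into subwords rather than whole words.

```python
# Toy illustration of the tokenization steps; this is NOT the real tokenizer.
# TOY_VOCAB and its ids are hypothetical values chosen for this example.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "i": 4, "hate": 5, "this": 6, "so": 7, "much": 8, "!": 9}


def toy_tokenize(text):
    """Step 1: split the text into lowercase word and punctuation tokens."""
    tokens, word = [], ""
    for char in text.lower():
        if char.isalnum():
            word += char
        else:
            if word:
                tokens.append(word)
                word = ""
            if not char.isspace():
                tokens.append(char)
    if word:
        tokens.append(word)
    return tokens


def toy_encode(text):
    """Steps 2 and 3: map tokens to ids, then add special start/end tokens."""
    ids = [TOY_VOCAB.get(tok, TOY_VOCAB["[UNK]"]) for tok in toy_tokenize(text)]
    return [TOY_VOCAB["[CLS]"]] + ids + [TOY_VOCAB["[SEP]"]]


def toy_pad(batch_ids):
    """Pad to the longest sequence and build the matching attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    pad_id = TOY_VOCAB["[PAD]"]
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = toy_pad([toy_encode("I hate this so much!"), toy_encode("I hate this")])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The shorter sequence is filled with the padding id, and its attention mask marks those positions with 0 so that a model can ignore them.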

In [9]:
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [10]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}


#### Going through the model
We can download the pretrained model with the <i>AutoModel</i> class, which provides a <i>from_pretrained()</i> method.

In [11]:
from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

Given some inputs, this Transformer architecture outputs <i>hidden states</i>, also known as <i>features</i>.<br>
For each model input, we get a high-dimensional vector representing the Transformer model's contextual understanding of that input.<br>
These hidden states are useful on their own. They are also the inputs to the <b>head</b> of the model.<br>
Different tasks use different heads.

### A high-dimensional vector
<li><b>Batch size</b>: The number of sequences processed at a time.</li>
<li><b>Sequence length</b>: The length of the numerical representation of the sequence.</li>
<li><b>Hidden size</b>: The vector dimension of each model input. It is 768 for smaller models and can reach 3072 or more for larger models.</li>

In [12]:
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

torch.Size([2, 16, 768])


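The model heads take the high-dimensional vector of hidden states as input and project it onto a different dimension. They are usually composed of one or a few linear layers. For our example we need a model with a sequence classification head, so we use <i>AutoModelForSequenceClassification</i> instead of <i>AutoModel</i>.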
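
As a rough sketch of what "projecting onto a different dimension" means, the toy head below applies a single linear layer to the first token's hidden vector of each sequence. All numbers, the tiny hidden size of 4, and the helper name `toy_head` are made up for illustration, and using only the first token is an assumption; the actual head of this checkpoint may contain more layers.

```python
# Toy classification head in plain Python; all numbers are made up.
# hidden_states has shape [batch size = 2][sequence length = 3][hidden size = 4].
hidden_states = [
    [[0.5, -1.0, 0.25, 2.0], [0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0]],
    [[-1.5, 0.5, 1.0, -0.5], [0.4, 0.3, 0.2, 0.1], [1.0, 1.0, 1.0, 1.0]],
]

# One linear layer: a [num_labels = 2][hidden size = 4] weight matrix plus a bias.
weights = [[1.0, 0.0, -1.0, 0.5], [-1.0, 0.5, 1.0, 0.0]]
bias = [0.0, 0.25]


def toy_head(sequence):
    """Project the first token's hidden vector onto num_labels logits."""
    first_token = sequence[0]  # assumption: only the first token is used
    return [
        sum(w * x for w, x in zip(row, first_token)) + b
        for row, b in zip(weights, bias)
    ]


# One row of logits per sequence: shape [batch size = 2][num_labels = 2].
logits = [toy_head(sequence) for sequence in hidden_states]
print(logits)
```

The input has three dimensions (batch, sequence, hidden), while the output has only two (batch, labels), which mirrors the change from `[2, 16, 768]` to `[2, 2]` in this notebook.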

In [13]:
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)

In [14]:
print(outputs)

SequenceClassifierOutput(loss=None, logits=tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)


In [15]:
print(outputs.logits.shape)

torch.Size([2, 2])


In [16]:
print(outputs.logits)

tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>)

These values are not probabilities but <i>logits</i>, the raw, unnormalized scores output by the last layer of the model. To convert them to probabilities, we pass them through a softmax layer.


In [17]:
import torch

# Softmax over the last dimension turns each row of logits into probabilities that sum to 1
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)


In [18]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}
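
The postprocessing step can be sketched without any library: apply softmax to each row of logits, pick the most probable index, and look up its label. The logits and the label mapping below are copied from the outputs printed earlier in this notebook; the helper name `softmax` and the `results` structure are illustrative choices, not the pipeline's actual implementation.

```python
import math

# Logits (rounded) and label mapping copied from the notebook outputs.
logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(row):
    """Turn a row of logits into probabilities that sum to 1."""
    highest = max(row)
    # Subtracting the maximum keeps the exponentials numerically stable.
    exponentials = [math.exp(value - highest) for value in row]
    total = sum(exponentials)
    return [value / total for value in exponentials]


results = []
for row in logits:
    probabilities = softmax(row)
    # Index of the largest probability is the predicted class id.
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    results.append({"label": id2label[best], "score": probabilities[best]})

print(results)
```

Because the logits here are rounded to four decimals, the scores should agree with the pipeline output at the top of the notebook only approximately.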