# Inside the `pipeline` function

The Hugging Face `pipeline` function is very simple to use. Now let's look at what's inside it and how it works.

In [1]:
!pip install transformers



In [2]:
from transformers import pipeline
classi = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [3]:
classi(
    [
        "I'm a good person",
        "I think that image is NSFW"
    ]
)

[{'label': 'POSITIVE', 'score': 0.9998700618743896},
 {'label': 'NEGATIVE', 'score': 0.9950696229934692}]

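Under the hood, the `pipeline` function chains three steps. A tokenizer converts the raw text into input IDs, the model turns those IDs into logits, and a post-processing step turns the logits into predictions.
![Full NLP pipeline: tokenizer, model, post-processing](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/full_nlp_pipeline.svg)

**We are going to dig into each of these steps in this section.**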

## Model Checkpoint

First, we need a model checkpoint. Every step in the process depends on it. The name of the default checkpoint appears in the output of the code cell above.

In [4]:
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

## Tokenizer

Let's load the tokenizer that matches this checkpoint.

In [5]:
from transformers import AutoTokenizer
tokz = AutoTokenizer.from_pretrained(checkpoint)

In [6]:
input = [
    "I'm a good person",
    "I think that image is NSFW",
    "Wow. I love it"
]

In [7]:

tokens = tokz(
    input,
    padding=True, truncation=True, return_tensors="pt"
)
tokens

{'input_ids': tensor([[  101,  1045,  1005,  1049,  1037,  2204,  2711,   102,     0,     0],
        [  101,  1045,  2228,  2008,  3746,  2003, 24978,  2546,  2860,   102],
        [  101, 10166,  1012,  1045,  2293,  2009,   102,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]])}

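Notice the `101` and `102` at the start and end of each sentence. These are the special `[CLS]` and `[SEP]` tokens. Shorter sentences are padded with `0` (the `[PAD]` token ID) up to the length of the longest one. `attention_mask` marks real tokens with `1` and padding positions with `0`, so the model knows which positions to ignore.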
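To make the padding step concrete, here is a plain-Python sketch of what `padding=True` produces for this batch. It assumes the pad token ID is `0` and right-side padding, both as shown in the output above. The unpadded IDs are copied from that output. `pad_batch` is a hypothetical helper written for illustration. It is not part of the `transformers` API.

```python
def pad_batch(batch_ids, pad_id=0):
    """Pad every sequence to the longest one and build the matching attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)  # right-side padding
        # 1 marks a real token, 0 marks a padding position
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Unpadded IDs for the three sentences ([CLS] = 101 ... [SEP] = 102)
unpadded = [
    [101, 1045, 1005, 1049, 1037, 2204, 2711, 102],
    [101, 1045, 2228, 2008, 3746, 2003, 24978, 2546, 2860, 102],
    [101, 10166, 1012, 1045, 2293, 2009, 102],
]
batch = pad_batch(unpadded)
print(batch["attention_mask"][2])  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

The result matches the `input_ids` and `attention_mask` tensors printed above, except that it uses plain lists instead of PyTorch tensors.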

## Using the Model

In [8]:
from transformers import AutoModel
model = AutoModel.from_pretrained(checkpoint)

Some weights of the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing DistilBertModel: ['classifier.weight', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [9]:
model_output = model(**tokens)
model_output

BaseModelOutput(last_hidden_state=tensor([[[ 0.7946,  0.1293,  0.0419,  ...,  0.6159,  0.9946, -0.3216],
         [ 0.9947,  0.1156, -0.0252,  ...,  0.5007,  1.1025, -0.2222],
         [ 1.3280, -0.0156,  0.3877,  ...,  0.5459,  0.9096, -0.6583],
         ...,
         [ 1.2875,  0.0307,  0.6650,  ...,  0.6481,  0.6350, -0.7387],
         [ 0.7634,  0.0239, -0.1784,  ...,  0.7843,  1.0633, -0.2916],
         [ 0.8091,  0.0793, -0.1979,  ...,  0.7488,  1.0255, -0.2681]],

        [[-0.0835,  0.2276, -0.3575,  ..., -0.1147,  0.1530,  0.2999],
         [-0.2791,  0.3077, -0.4815,  ..., -0.0912,  0.0457,  0.1611],
         [-0.1804,  0.4280, -0.3141,  ..., -0.2418, -0.0639,  0.2643],
         ...,
         [-0.0749,  0.4424, -0.1150,  ..., -0.4032, -0.3183,  0.3057],
         [-0.1020,  0.2020, -0.3881,  ..., -0.0353, -0.2167,  0.0915],
         [ 0.3149,  0.3561, -0.2138,  ..., -0.2912, -0.3920, -0.1619]],

        [[ 0.5813,  0.2317,  0.0877,  ...,  0.3930,  0.9671, -0.5827],
         ...

In [10]:
model_output.last_hidden_state.shape

torch.Size([3, 10, 768])

**This is not the result we want**

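`AutoModel` loads only the base DistilBERT transformer without the classification head. This is why the `classifier.*` and `pre_classifier.*` weights were reported as unused above. The output contains one hidden-state vector per token: a tensor of shape `(batch_size, sequence_length, hidden_size)`. Here, that is 3 sentences, 10 tokens each, and 768 dimensions per token. For our classification task, we need a head that maps these hidden states to one score per label.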
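As a rough illustration of what such a head does, here is a plain-Python sketch modeled on DistilBERT's sequence-classification head. It takes the hidden state of the first (`[CLS]`) token, applies the `pre_classifier` linear layer and a ReLU, and then applies the `classifier` linear layer to get one logit per label. Dropout is left out because it has no effect at inference time. The tiny 2-dimensional weights are made-up toy values, not the checkpoint's real 768-dimensional ones. The function names are hypothetical.

```python
def linear(vec, weight, bias):
    """Compute W @ vec + b, with `weight` given as one list per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def classification_head(last_hidden_state, pre_w, pre_b, cls_w, cls_b):
    """Map hidden states shaped (batch, seq_len, hidden) to logits shaped (batch, num_labels)."""
    logits = []
    for sequence in last_hidden_state:
        cls_vector = sequence[0]  # hidden state of the first ([CLS]) token
        pooled = relu(linear(cls_vector, pre_w, pre_b))  # toy pre_classifier + ReLU
        logits.append(linear(pooled, cls_w, cls_b))      # toy classifier
    return logits


# Toy batch: 2 sequences x 3 tokens x hidden size 2 (the real model uses 768)
hidden = [
    [[1.0, -1.0], [0.5, 0.5], [0.0, 0.0]],
    [[-2.0, 1.0], [0.3, 0.1], [0.0, 0.0]],
]
pre_w, pre_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]   # toy pre_classifier (identity)
cls_w, cls_b = [[-1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]  # toy classifier, 2 labels
logits = classification_head(hidden, pre_w, pre_b, cls_w, cls_b)
print(logits)  # [[-1.0, 1.0], [1.0, -1.0]]
```

Only the first token of each sequence reaches the head. A `(3, 10, 768)` hidden-state tensor therefore collapses to `(3, 768)` and then to `(3, 2)` logits.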

## AutoModel for Sequence Classification

In [11]:
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

In [12]:
output = model(**tokens)
output

SequenceClassifierOutput(loss=None, logits=tensor([[-4.2700,  4.6784],
        [ 2.8783, -2.4291],
        [-4.3149,  4.6853]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [13]:
output.logits.shape

torch.Size([3, 2])

This is what we need: one row per sentence and one column per label. These values are logits, meaning raw, unnormalized scores. Let's turn them into predictions.

## Calculating the Predictions

In [14]:
import torch

In [15]:
preds = torch.nn.functional.softmax(output.logits, dim=1)
preds

tensor([[1.2992e-04, 9.9987e-01],
        [9.9507e-01, 4.9304e-03],
        [1.2337e-04, 9.9988e-01]], grad_fn=<SoftmaxBackward0>)
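`softmax` exponentiates each logit and divides by the row's sum, so every row becomes a probability distribution over the two labels. The following plain-Python sketch applies the same computation to the logits printed earlier, which are rounded to 4 decimals. It should reproduce these probabilities up to rounding. Subtracting the row maximum first is a standard trick to avoid overflow and does not change the result.

```python
import math


def softmax(row):
    """Turn one row of logits into probabilities that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied (rounded to 4 decimals) from the model output above
logits = [[-4.2700, 4.6784], [2.8783, -2.4291], [-4.3149, 4.6853]]
probs = [softmax(row) for row in logits]
for row in probs:
    print([round(p, 5) for p in row])
```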

In [16]:
result = torch.argmax(preds, dim=1)
result

tensor([1, 0, 1])

In [17]:
# Mapping from class index to label name
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}

**Here comes the result**

In [18]:
input

["I'm a good person", 'I think that image is NSFW', 'Wow. I love it']

In [19]:
[model.config.id2label[i.item()] for i in result]

['POSITIVE', 'NEGATIVE', 'POSITIVE']
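Putting the argmax and label lookup together, the post-processing stage can be sketched in plain Python as shown below. This is an approximation for illustration, not the library's implementation. `to_pipeline_output` is a hypothetical helper, and the probabilities are the rounded values from the softmax output above. For each sentence, it returns the highest-scoring label and its probability, which is the same format as the `pipeline` output at the top of this notebook.

```python
def to_pipeline_output(probs, id2label):
    """Pick the most likely class per row and report it as {'label', 'score'}."""
    results = []
    for row in probs:
        best = max(range(len(row)), key=lambda i: row[i])  # plain-Python argmax
        results.append({"label": id2label[best], "score": row[best]})
    return results


id2label = {0: "NEGATIVE", 1: "POSITIVE"}
# Probabilities copied (rounded) from the softmax output above
probs = [[1.2992e-04, 9.9987e-01], [9.9507e-01, 4.9304e-03], [1.2337e-04, 9.9988e-01]]
for result in to_pipeline_output(probs, id2label):
    print(result)
```

The labels agree with the list above, and the scores for the first two sentences agree with the `pipeline` output at the start of the notebook.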