## Intro
In this notebook, we go through each of the steps that `pipeline` performs behind the scenes, one at a time, and then compare the results to confirm that each step was carried out correctly.

## Tokenizer

In [8]:
from transformers import AutoTokenizer

# Load the tokenizer
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Raw input
raw_inputs = [
    "These days the weather has been a little too hot.",
    "I really want this for you, girl!",
    "You'd better not bring her into this!"
]

# Tokenize
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
inputs

{'input_ids': tensor([[ 101, 2122, 2420, 1996, 4633, 2038, 2042, 1037, 2210, 2205, 2980, 1012,
          102],
        [ 101, 1045, 2428, 2215, 2023, 2005, 2017, 1010, 2611,  999,  102,    0,
            0],
        [ 101, 2017, 1005, 1040, 2488, 2025, 3288, 2014, 2046, 2023,  999,  102,
            0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]])}
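
To see how `padding=True` shapes the output above, here is a minimal sketch in plain Python that rebuilds the attention mask from the token ids. The ids are copied from the cell output; `PAD_ID` and `real_token_counts` are names introduced only for this illustration, and the padding id of 0 is read off the output rather than queried from the tokenizer.

```python
# Token ids copied from the tokenizer output above.
input_ids = [
    [101, 2122, 2420, 1996, 4633, 2038, 2042, 1037, 2210, 2205, 2980, 1012, 102],
    [101, 1045, 2428, 2215, 2023, 2005, 2017, 1010, 2611, 999, 102, 0, 0],
    [101, 2017, 1005, 1040, 2488, 2025, 3288, 2014, 2046, 2023, 999, 102, 0],
]

# Assumption: 0 is the padding id, as suggested by the trailing zeros above.
PAD_ID = 0

# A position gets mask 1 if it holds a real token and 0 if it is padding.
attention_mask = [[int(token != PAD_ID) for token in row] for row in input_ids]

# Number of real (non-padding) tokens per sentence.
real_token_counts = [sum(row) for row in attention_mask]

for row in attention_mask:
    print(row)
print(real_token_counts)
```

Every row is padded to the length of the longest sentence (13 tokens), and the mask tells the model which positions to ignore.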

## Model


In [11]:
from transformers import AutoModelForSequenceClassification

# Load model from checkpoint
classifier = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Estimate logits
outputs = classifier(**inputs)
outputs

SequenceClassifierOutput(loss=None, logits=tensor([[ 4.0402, -3.3439],
        [-4.0005,  4.2854],
        [ 3.7932, -3.1592]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

## Post Processing

In [36]:
from torch.nn.functional import softmax

# Convert logits to probabilities
predictions = softmax(outputs.logits, dim=-1)

# Find the labels
print(classifier.config.id2label)

# Print the class probabilities for each input
for row in predictions:
    print(row.tolist())

# Map the highest-probability class index to its label
for i in predictions.argmax(dim=-1):
    print(classifier.config.id2label[i.item()])

{0: 'NEGATIVE', 1: 'POSITIVE'}
[0.9993792772293091, 0.0006206354591995478]
[0.0002519783447496593, 0.9997480511665344]
[0.9990445971488953, 0.0009554324205964804]
NEGATIVE
POSITIVE
NEGATIVE
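
As a sanity check on what `softmax` does here, the following sketch recomputes the probabilities by hand in plain Python. The logits are copied from the model output above, rounded to four decimals as printed, so the results should agree with the printed probabilities only to several decimal places. `softmax_row` is a helper written for this illustration, not part of any library.

```python
import math

# Logits copied from the model output above (rounded as printed).
logits = [
    [4.0402, -3.3439],
    [-4.0005, 4.2854],
    [3.7932, -3.1592],
]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax_row(row):
    """Softmax over one row, shifted by the max for numerical stability."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = [softmax_row(row) for row in logits]

# The predicted label is the class with the highest probability.
labels = [id2label[max(range(len(p)), key=p.__getitem__)] for p in probabilities]

for p, label in zip(probabilities, labels):
    print(label, p)
```

Each row sums to 1, and the larger logit in each pair ends up with nearly all of the probability mass because the two logits differ by about 7 or 8.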


## Compare with Pipeline

In [22]:
from transformers import pipeline

# Build the pipeline (named so it does not shadow the imported `pipeline` function)
sentiment_pipeline = pipeline("sentiment-analysis", model=checkpoint)

# Predict
sentiment_pipeline(raw_inputs)


Device set to use cuda:0


[{'label': 'NEGATIVE', 'score': 0.9993792772293091},
 {'label': 'POSITIVE', 'score': 0.9997480511665344},
 {'label': 'NEGATIVE', 'score': 0.9990445971488953}]

Comparing the values, we can see that the labels and scores from our step-by-step implementation match the output of the high-level `pipeline` call exactly.
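
The comparison can also be done programmatically rather than by eye. The sketch below uses the values printed in the cells above, copied in as literals so it runs without loading the model; `manual_labels`, `manual_scores`, `pipeline_results`, and `all_match` are names chosen for this illustration.

```python
import math

# Top-class label and probability per input, from the manual post-processing output.
manual_labels = ["NEGATIVE", "POSITIVE", "NEGATIVE"]
manual_scores = [0.9993792772293091, 0.9997480511665344, 0.9990445971488953]

# Results printed by the high-level pipeline call.
pipeline_results = [
    {"label": "NEGATIVE", "score": 0.9993792772293091},
    {"label": "POSITIVE", "score": 0.9997480511665344},
    {"label": "NEGATIVE", "score": 0.9990445971488953},
]

# Labels must match exactly; scores are compared with a small tolerance,
# since floating-point results can differ slightly across devices.
all_match = all(
    label == result["label"]
    and math.isclose(score, result["score"], rel_tol=1e-6)
    for label, score, result in zip(manual_labels, manual_scores, pipeline_results)
)
print(all_match)
```

In a live notebook, the same check could be run against the actual `predictions` tensor and the pipeline's return value instead of copied literals.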