# Behind the pipeline: a replication
> Reproducing each step of the pipeline without looking at the original notebook

## The pipeline

In [1]:
from transformers import pipeline


In [19]:
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
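The warning above asks for an explicit model name and revision. A minimal sketch of how that could look, using the name and revision printed in the log; the helper name build_classifier is hypothetical and only wraps the call so that nothing is downloaded until it is invoked:

```python
# Both values are copied from the log message above.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def build_classifier():
    """Create a sentiment-analysis pipeline pinned to a fixed checkpoint."""
    # Imported lazily: defining this helper does not load the library,
    # and calling it downloads the model on first use.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=REVISION)
```

Calling classifier = build_classifier() should then behave like the cell above, but without the warning.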


In [20]:
sentence1 = '''
I cannot understand how people are rating this highly. 
    Unlike what I’ve come to expect from Larson, the narrative is bland, attempts at “suspense” fall flat, 
    and every person is vapid and one-dimensional. 
    This provides no insight whatsoever into events and people that have been well-chronicled elsewhere. 
    This book has neither a point nor a point of view'''

In [21]:
sentence2 = '''
This is one of those books that is hard to put down. 
    Easy to read (even though I don't know anything about the Navy or submarines) while it is PACKED with so much leadership truth. 
    My favorite part? It's not a "this is the way things must be done" kind of book. 
    He raises serious problems and tells you how he attacked them. 
    He shows his strengths and weaknesses in leadership. 
    I greatly appreciated this read. 
'''

In [22]:
sentence3 = '''
Sometimes insightful, sometimes preachy, mostly making sense. 
    Some learnings seem retrospectively added as Naval amassed massive wealth.
    A decent read on the principles and philosophy of one of the most revered investors of our times
'''

In [23]:
classifier([
    sentence1,
    sentence2,
    sentence3
])

[{'label': 'NEGATIVE', 'score': 0.9997822642326355},
 {'label': 'POSITIVE', 'score': 0.998445451259613},
 {'label': 'POSITIVE', 'score': 0.9997103810310364}]

## Behind the pipeline

### Tokenizer

In [24]:
from transformers import AutoTokenizer

In [25]:
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')

In [35]:
inputs = tokenizer([sentence1, sentence2, sentence3], return_tensors="pt", padding=True, truncation=True)

In [36]:
inputs

{'input_ids': tensor([[  101,  1045,  3685,  3305,  2129,  2111,  2024,  5790,  2023,  3811,
          1012,  4406,  2054,  1045,  1521,  2310,  2272,  2000,  5987,  2013,
         21213,  1010,  1996,  7984,  2003, 20857,  1010,  4740,  2012,  1523,
         23873,  1524,  2991,  4257,  1010,  1998,  2296,  2711,  2003, 12436,
         23267,  1998,  2028,  1011,  8789,  1012,  2023,  3640,  2053, 12369,
         18971,  2046,  2824,  1998,  2111,  2008,  2031,  2042,  2092,  1011,
          9519,  2094,  6974,  1012,  2023,  2338,  2038,  4445,  1037,  2391,
          4496,  1037,  2391,  1997,  3193,   102,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0],
        [  101,  2023,  2003,  2028,  1997,  2216,  2808,  2008,  2003,  2524,
          2000,  2404,  2091,  1012,  3733,  2000,  3191,  1006,  2130,  2295,
          1045,  2123,  1005,  1056,  2113,  2505,  2055,  1996,  3212,  2030,
       

In [37]:
inputs.input_ids.shape

torch.Size([3, 93])

In [38]:
inputs.attention_mask.shape

torch.Size([3, 93])
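Both tensors share the shape [3, 93] because padding=True pads every review to the longest one in the batch, and the attention mask marks which positions are real tokens. A plain-Python sketch of that idea, not the tokenizer's actual implementation; pad_batch is a hypothetical helper and the short id lists are toy data, with 101, 102 and the padding id 0 taken from the output above:

```python
PAD_ID = 0  # padding id seen at the end of the first row above


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token id lists to the longest length and build the mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Toy batch: 101 and 102 are the start and end ids seen in the output above.
batch = [[101, 2023, 2003, 102], [101, 2023, 102]]
ids, mask = pad_batch(batch)
print(ids)   # [[101, 2023, 2003, 102], [101, 2023, 102, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```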

## The Model without Head

In [39]:
from transformers import AutoModel

In [40]:
model = AutoModel.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')

In [44]:
outputs = model(**inputs); outputs.last_hidden_state.shape

torch.Size([3, 93, 768])

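The shape of last_hidden_state reads as (batch size, sequence length, hidden size): <br/>
Batch size is 3, one row per review <br/>
Sequence length is 93, the padded length of the longest review <br/>
Hidden size is 768, the length of the vector produced for each token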

## The Model with Head

In [45]:
from transformers import AutoModelForSequenceClassification

In [46]:
model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')

In [47]:
outputs = model(**inputs); outputs.logits

tensor([[ 4.6795, -3.7526],
        [-3.1962,  3.2688],
        [-3.9479,  4.1987]], grad_fn=<AddmmBackward0>)
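The head turns one hidden vector per token into one row of two logits per review. A toy sketch of that reduction in plain Python, assuming the head reads only the hidden state at the first position and applies a single linear layer; the real head in this checkpoint has additional layers, and classification_head, the weights and the tiny sizes here are all made up for illustration:

```python
def classification_head(hidden_states, weight, bias):
    """Map [batch, seq_len, hidden] nested lists to [batch, num_labels] logits."""
    logits = []
    for sequence in hidden_states:
        first_token = sequence[0]  # hidden vector at the first position
        row = [
            sum(w * h for w, h in zip(w_row, first_token)) + b
            for w_row, b in zip(weight, bias)
        ]
        logits.append(row)
    return logits


# Toy sizes: batch 2, sequence length 3, hidden size 4, 2 labels.
hidden = [
    [[1.0, 0.0, 2.0, 0.0], [9.0, 9.0, 9.0, 9.0], [9.0, 9.0, 9.0, 9.0]],
    [[0.0, 1.0, 0.0, 3.0], [9.0, 9.0, 9.0, 9.0], [9.0, 9.0, 9.0, 9.0]],
]
weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # [num_labels, hidden]
bias = [0.5, -0.5]

logits = classification_head(hidden, weight, bias)
print(logits)  # [[3.5, -0.5], [0.5, 3.5]]
```

The other positions (filled with 9.0) do not affect the result, which is why the sequence dimension disappears from the output shape.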

## Softmax

In [53]:
import torch
from torch.nn.functional import softmax

In [55]:
y = softmax(outputs.logits, dim=-1); y

tensor([[9.9978e-01, 2.1771e-04],
        [1.5545e-03, 9.9845e-01],
        [2.8963e-04, 9.9971e-01]], grad_fn=<SoftmaxBackward0>)
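Softmax exponentiates each logit and divides by the row sum, so every row becomes a probability distribution. A plain-Python sketch that recomputes the values above from the logits printed earlier; softmax_row is a hypothetical helper, not the torch function:

```python
import math


def softmax_row(row):
    """Softmax over one list of logits, shifted by the max for stability."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the model output above.
logits = [[4.6795, -3.7526], [-3.1962, 3.2688], [-3.9479, 4.1987]]
probs = [softmax_row(row) for row in logits]

for row in probs:
    print([round(p, 5) for p in row])
```

Up to rounding, the larger value in each row is the score the pipeline reported.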

In [69]:
z = torch.argmax(y, dim=-1); z

tensor([0, 1, 1])

In [61]:
model.config.id2label

{0: 'NEGATIVE', 1: 'POSITIVE'}

In [73]:
[model.config.id2label[i] for i in z.tolist()]

['NEGATIVE', 'POSITIVE', 'POSITIVE']

These labels match the pipeline's output, and the larger softmax probability in each row matches the score the pipeline reported.
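The comparison can be made explicit by converting the probabilities into the list-of-dicts format the pipeline returns. A minimal sketch, assuming the rounded probabilities and the id2label mapping shown above; to_pipeline_format is a hypothetical helper, not part of the library:

```python
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}  # copied from model.config.id2label


def to_pipeline_format(probabilities, id2label=ID2LABEL):
    """Pick the most probable class per row and report its label and score."""
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over the row
        results.append({"label": id2label[best], "score": row[best]})
    return results


# Rounded probabilities copied from the softmax output above.
probs = [
    [9.9978e-01, 2.1771e-04],
    [1.5545e-03, 9.9845e-01],
    [2.8963e-04, 9.9971e-01],
]
results = to_pipeline_format(probs)
print([r["label"] for r in results])  # ['NEGATIVE', 'POSITIVE', 'POSITIVE']
```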