# Outline

In [67]:
import transformers

classifier = transformers.pipeline("sentiment-analysis")
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598050713539124},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

# Behind the Pipeline API

4-stage pipeline
1. Raw text (dataset)
2. Tokenizer
3. Pretrained model
4. Model output
![](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/full_nlp_pipeline.svg)

## 1. initialize tokenizer & model

Stages of learning a new language
1. Learn vocabulary
2. Learn meaning of words
3. Learn syntax & grammar rules
4. Speak the language

Similarly, in NLP
1. Build a vocabulary with the `tokenizer` and map each token to a number
2. Replace each number with a vector that carries semantic meaning (an embedding) via the `model`
3. Learn syntax & grammar rules via the `model`
4. Run inference

In [None]:
import torch

demo_sentence                   = "three words sentence"    # One sentence containing 3 words
word_to_number_mapping          = [4, 6, 3]                 # 3 words, each mapped to one number (token id)
one_number_to_meaning_mapping   = torch.zeros(512)          # Meaning of one word: a vector of 512 floats
demo_sentence_with_meaning      = torch.zeros(3, 512)       # 3 words, each word a vector of 512 floats
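
The two mappings sketched above (word to number, number to vector) can be written out in plain Python. This is a toy illustration only: the `vocab` dictionary, the whitespace `tokenize` helper, and the zero-filled `embedding_table` are hypothetical stand-ins, not how a real tokenizer or model stores them.

```python
# Toy illustration: a hand-made vocabulary and a zero-filled embedding table.
# All names here are hypothetical; real tokenizers split into subwords and
# real models learn the embedding values during training.
vocab = {"three": 4, "words": 6, "sentence": 3}   # word -> id
EMBEDDING_SIZE = 512


def tokenize(sentence):
    """Split on whitespace and map each word to its id."""
    return [vocab[word] for word in sentence.split()]


# One vector per id.
embedding_table = {token_id: [0.0] * EMBEDDING_SIZE for token_id in vocab.values()}


def embed(token_ids):
    """Replace each id with its vector."""
    return [embedding_table[token_id] for token_id in token_ids]


token_ids = tokenize("three words sentence")
vectors = embed(token_ids)

print(token_ids)                        # [4, 6, 3]
print(len(vectors), len(vectors[0]))    # 3 512
```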

### PyTorch vs TensorFlow code

```python
import transformers

checkpoint  = "distilbert-base-uncased-finetuned-sst-2-english"

# TensorFlow: model classes carry the TF prefix
tokenizer   = transformers. AutoTokenizer.                        from_pretrained(checkpoint)
model       = transformers. TFAutoModel.                          from_pretrained(checkpoint)
model       = transformers. TFAutoModelForSequenceClassification. from_pretrained(checkpoint)

# PyTorch: same tokenizer, model classes without the prefix
tokenizer   = transformers. AutoTokenizer.                        from_pretrained(checkpoint)
model       = transformers. AutoModel.                            from_pretrained(checkpoint)
model       = transformers. AutoModelForSequenceClassification.   from_pretrained(checkpoint)
```

In [37]:
import transformers

checkpoint          = "bert-base-cased"

tokenizer           = transformers. AutoTokenizer.                        from_pretrained(checkpoint)

model_auto          = transformers. AutoModel.                            from_pretrained(checkpoint)
model_auto_sequence = transformers. AutoModelForSequenceClassification.   from_pretrained(checkpoint)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## 2. tokenizer & input to model

In [60]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

numeric_ids = tokenizer(raw_inputs, padding=True, return_tensors="pt")    # Numeric ids => as PYTORCH TENSORS

numeric_ids

{'input_ids': tensor([[  101,   146,   112,  1396,  1151,  2613,  1111,   170, 20164, 10932,
          2271,  7954,  1736,  1139,  2006,  1297,   119,   102],
        [  101,   146,  4819,  1142,  1177,  1277,   106,   102,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])}

In [61]:
print(f'tokenizer returns THREE things => {numeric_ids.keys()}')

# Walk over the batch row by row
for row, (input_ids, attention_mask) in enumerate(zip(numeric_ids['input_ids'], numeric_ids['attention_mask']), start=1):
    print(f"Row Number => {row}, \n\t input_ids => {input_ids}, \n\t attention_mask => {attention_mask}")

tokenizer returns THREE things => dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
Row Number => 1, 
	 input_ids => tensor([  101,   146,   112,  1396,  1151,  2613,  1111,   170, 20164, 10932,
         2271,  7954,  1736,  1139,  2006,  1297,   119,   102]), 
	 attention_mask => tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Row Number => 2, 
	 input_ids => tensor([ 101,  146, 4819, 1142, 1177, 1277,  106,  102,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0]), 
	 attention_mask => tensor([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
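
The output above shows what `padding=True` does: the shorter row is filled with zeros up to the length of the longest row, and `attention_mask` marks real tokens with 1 and padding with 0. A minimal plain-Python sketch of that idea follows. The helper `pad_batch` and the constant `PAD_ID` are hypothetical names, and using 0 as the pad id is an assumption taken from the printed ids, not from the tokenizer's code.

```python
PAD_ID = 0  # assumption: the pad id is 0, as the printed input_ids suggest


def pad_batch(batch, pad_id=PAD_ID):
    """Right-pad every row to the longest row and build the matching mask."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return input_ids, attention_mask


# Unpadded ids copied from the tokenizer output above
batch = [
    [101, 146, 112, 1396, 1151, 2613, 1111, 170, 20164, 10932,
     2271, 7954, 1736, 1139, 2006, 1297, 119, 102],
    [101, 146, 4819, 1142, 1177, 1277, 106, 102],
]

input_ids, attention_mask = pad_batch(batch)
print(input_ids[1])        # 8 real ids followed by 10 zeros
print(attention_mask[1])   # 8 ones followed by 10 zeros
```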


## 3. model output

### Auto model

In [62]:
outputs_auto = model_auto(**numeric_ids)

outputs_auto

BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[ 0.4222,  0.4443, -0.0659,  ..., -0.1958,  0.3611,  0.1284],
         [ 0.5728, -0.1593,  0.6014,  ..., -0.1134,  0.1791,  0.1787],
         [ 0.4699,  0.4214,  0.1695,  ...,  0.2386,  0.9851, -0.1236],
         ...,
         [ 0.5847,  0.2552,  0.0266,  ...,  0.7203,  0.0650,  0.4277],
         [ 0.5573,  0.4506,  0.0353,  ..., -0.0607,  0.4209, -0.2525],
         [ 0.7136,  1.2932, -0.2937,  ...,  0.2917,  0.4270, -0.3874]],

        [[ 0.4829,  0.4291,  0.0264,  ..., -0.1489,  0.2953, -0.3113],
         [ 0.2735,  0.4520,  0.2760,  ...,  0.2572,  0.2059,  0.4097],
         [ 0.1391,  0.4234, -0.3385,  ...,  0.5858, -0.0834,  0.4344],
         ...,
         [ 0.0709,  0.4650, -0.1060,  ...,  0.2954,  0.1990,  0.1774],
         [ 0.1649,  0.4855, -0.0801,  ...,  0.3485,  0.1970,  0.1701],
         [ 0.2887,  0.4945,  0.0196,  ...,  0.2895,  0.2056,  0.0399]]],
       grad_fn=<NativeLayerNormBackward0>), pooler_ou

In [68]:
all_extracted_features_map  = outputs_auto['last_hidden_state']
compressed_features         = outputs_auto['pooler_output']

print(all_extracted_features_map.shape, compressed_features.shape)

torch.Size([2, 18, 768]) torch.Size([2, 768])
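
The two shapes differ only by the token dimension: `last_hidden_state` keeps one vector per token, while `pooler_output` keeps one vector per sentence. In BERT the pooled vector is derived from the first token's hidden state, passed through a dense layer and a tanh. The sketch below shows only the shape reduction with nested lists and leaves out the dense layer and tanh; the `shape` helper is a hypothetical utility written for this example.

```python
def shape(nested):
    """Return the dimensions of a regular nested list, e.g. [2, 18, 768]."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


BATCH, TOKENS, HIDDEN = 2, 18, 768

# Stand-in for last_hidden_state: one HIDDEN-sized vector per token
last_hidden_state = [[[0.0] * HIDDEN for _ in range(TOKENS)] for _ in range(BATCH)]

# Keep only the first token's vector of each sentence.
# Shape reduction only: the real pooler also applies a dense layer and tanh.
first_token_states = [sentence[0] for sentence in last_hidden_state]

print(shape(last_hidden_state))    # [2, 18, 768]
print(shape(first_token_states))   # [2, 768]
```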


In [63]:
print(f'MODEL_AUTO OUTPUT FORMAT: {vars(outputs_auto).keys()} ' )

MODEL_AUTO OUTPUT FORMAT: dict_keys(['last_hidden_state', 'pooler_output', 'hidden_states', 'past_key_values', 'attentions', 'cross_attentions']) 


In [66]:
BATCH_SIZE, SENTENCE_LENGTH, WORD_EMBEDDING_SIZE = outputs_auto['last_hidden_state'].shape
print(outputs_auto['last_hidden_state'].shape)
print(f'Number of Sentences = {BATCH_SIZE}, Number of Tokens in Each Sentence = {SENTENCE_LENGTH}, Embedding Length of Every Token = {WORD_EMBEDDING_SIZE}')

# The last dimension is the EMBEDDING SIZE, also called the HIDDEN SIZE

torch.Size([2, 18, 768])
Number of Sentences = 2, Number of Tokens in Each Sentence = 18, Embedding Length of Every Token = 768


### Classification model

In [30]:
outputs_auto_sequence = model_auto_sequence(**numeric_ids)
print(f'MODEL_AUTO_SEQUENCE OUTPUT FORMAT: {vars(outputs_auto_sequence)} ' )

MODEL_AUTO_SEQUENCE OUTPUT FORMAT: {'loss': None, 'logits': tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>), 'hidden_states': None, 'attentions': None} 
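
The logits printed above are raw scores, not probabilities. Applying softmax to each row turns them into probabilities, and the results match the scores the pipeline printed at the top of this notebook. The sketch below does this in plain Python with the logits copied from the output above. The `id2label` mapping is an assumption (0 for NEGATIVE, 1 for POSITIVE); on a loaded model it can be read from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Logits copied from the model output above
logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]

# Assumed label mapping for the sentiment checkpoint
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

results = []
for row in logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)   # index of the top score
    results.append((id2label[best], probs[best]))
    print(id2label[best], round(probs[best], 4))
# POSITIVE 0.9598
# NEGATIVE 0.9995
```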


In [74]:
import torch

from transformers import AutoModelForSequenceClassification

checkpoint = "bert-base-cased"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**numeric_ids)

print(outputs.logits)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

print(model.config.id2label)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


tensor([[0.1532, 0.4257],
        [0.2897, 0.3980]], grad_fn=<AddmmBackward0>)
tensor([[0.4323, 0.5677],
        [0.4730, 0.5270]], grad_fn=<SoftmaxBackward0>)
{0: 'LABEL_0', 1: 'LABEL_1'}


### Model Architectures

In [31]:
model_auto

DistilBertModel(
  (embeddings): Embeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(
    (layer): ModuleList(
      (0-5): 6 x TransformerBlock(
        (attention): MultiHeadSelfAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (q_lin): Linear(in_features=768, out_features=768, bias=True)
          (k_lin): Linear(in_features=768, out_features=768, bias=True)
          (v_lin): Linear(in_features=768, out_features=768, bias=True)
          (out_lin): Linear(in_features=768, out_features=768, bias=True)
        )
        (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (ffn): FFN(
          (dropout): Dropout(p=0.1, inplace=False)
          (lin1): Linear(in_features=768, out_features=3072, bias=True)
          (lin2): Li

In [32]:
model_auto_sequence

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

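Terms

1. Head: the task-specific layer(s) on top of the base model; it turns hidden states into task outputs such as classification logits
2. Hidden State: the vector the base model produces for each input token (size 768 here)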