The next step in a `pipeline` is the model itself, which we load with `AutoModel`.

It works similarly to `AutoTokenizer`: given a checkpoint name, it automatically selects the matching model architecture and loads its pretrained weights.

In [None]:
from transformers import AutoModel, AutoTokenizer

# First tokenize
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
rawtext = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
tokens = tokenizer(rawtext, padding=True, truncation=True, return_tensors="pt")
print("Tokens: \n", tokens)

# Now the model
model = AutoModel.from_pretrained("bert-base-cased")
outputs = model(**tokens)
print("Outputs: \n", outputs)
print("Hidden State Shape: ", outputs.last_hidden_state.shape)


Tokens: 
 {'input_ids': tensor([[  101,   146,   112,  1396,  1151,  2613,  1111,   170, 20164, 10932,
          2271,  7954,  1736,  1139,  2006,  1297,   119,   102],
        [  101,   146,  4819,  1142,  1177,  1277,   106,   102,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])}
Outputs: 
 BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[ 0.4222,  0.4443, -0.0659,  ..., -0.1958,  0.3611,  0.1284],
         [ 0.5728, -0.1593,  0.6014,  ..., -0.1134,  0.1791,  0.1787],
         [ 0.4699,  0.4214,  0.1695,  ...,  0.2386,  0.9851, -0.1236],
         ...,
         [ 0.5847,  0.2552,  0.0266,  ...,  0.7203,  0.0650,  0.4277],
         [ 0.557

Here the hidden state has shape $[2, 18, 768]$: 2 is the batch size (the number of examples), 18 is the sequence length (the token count of the longest example, to which the shorter one is padded), and 768 is the hidden size, i.e. the dimension of each token's contextualized embedding.
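To see where the first two dimensions come from, here is a minimal pure-Python sketch that redoes the padding by hand. The token IDs are copied from the tokenizer output above. `pad_batch` is a hypothetical helper written for this illustration, not part of `transformers`, and the hidden size of 768 is simply taken from the printed shape rather than read from the model.

```python
# Token IDs copied from the tokenizer output above, with the padding removed.
sequences = [
    [101, 146, 112, 1396, 1151, 2613, 1111, 170, 20164, 10932,
     2271, 7954, 1736, 1139, 2006, 1297, 119, 102],
    [101, 146, 4819, 1142, 1177, 1277, 106, 102],
]

PAD_ID = 0         # padding ID seen in the printed input_ids
HIDDEN_SIZE = 768  # assumption: taken from the printed hidden state shape


def pad_batch(seqs, pad_id=PAD_ID):
    """Hypothetical helper: right-pad every sequence to the longest one."""
    max_len = max(len(s) for s in seqs)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return input_ids, attention_mask


input_ids, attention_mask = pad_batch(sequences)

# (batch size, padded sequence length, hidden size)
expected_shape = (len(input_ids), len(input_ids[0]), HIDDEN_SIZE)
print("Expected hidden state shape:", expected_shape)
```

The second example has only 8 real tokens, so its last 10 positions are padding. The model still produces a 768-dimensional vector for each of those positions, which is why the attention mask is needed to tell real tokens from padding.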

Now let us convert the outputs back into a human-interpretable form: