A pipeline groups together three steps: preprocessing, passing the inputs through the model, and postprocessing.

### Preprocessing with a tokenizer

* Transformer models can't process raw text directly
* Text inputs must first be converted into numbers that the model can make sense of
* The tokenizer does the following:
    * Splits the input into words, subwords, or symbols (like punctuation), called tokens
    * Maps each token to an integer
    * Adds additional inputs that may be useful to the model
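The three steps above can be sketched with a toy tokenizer. This is a simplified illustration, not how the real tokenizer works: the vocabulary and the `toy_tokenize`/`toy_encode` helpers are hypothetical, and the input is split only into words and punctuation, with no subword splitting.

```python
import re

# Hypothetical toy vocabulary for illustration only; a real checkpoint
# ships its own, much larger vocabulary.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "i": 4, "hate": 5, "this": 6, "so": 7, "much": 8, "!": 9}


def toy_tokenize(text):
    """Step 1: lowercase, then split into words and punctuation (the tokens)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def toy_encode(text, vocab=TOY_VOCAB):
    tokens = toy_tokenize(text)
    # Step 2: map each token to an integer ID, falling back to [UNK].
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    # Step 3: add extra inputs: special tokens around the sequence and an
    # attention mask marking every real token with 1.
    input_ids = [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]
    return {"input_ids": input_ids, "attention_mask": [1] * len(input_ids)}


print(toy_tokenize("I hate this so much!"))  # ['i', 'hate', 'this', 'so', 'much', '!']
print(toy_encode("I hate this so much!"))
```

Words missing from the vocabulary (such as "love") become `[UNK]`; real subword tokenizers reduce this problem by splitting rare words into smaller known pieces.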

All of this preprocessing must be done in exactly the same way as when the model was pretrained, which is why the tokenizer is loaded from the same checkpoint as the model.
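To see why the match matters, here is a small sketch with two hypothetical vocabularies that contain the same words but assign them different IDs. The model's embedding layer learned what each ID means during pretraining, so IDs produced by a different vocabulary are read as different words.

```python
# Two hypothetical vocabularies: the one used during pretraining and a
# mismatched one that numbers the same words differently.
pretraining_vocab = {"i": 4, "hate": 5, "this": 6}
other_vocab = {"i": 6, "hate": 4, "this": 5}

tokens = ["i", "hate", "this"]
ids_expected = [pretraining_vocab[t] for t in tokens]
ids_mismatched = [other_vocab[t] for t in tokens]

print(ids_expected)    # [4, 5, 6]  -> what the model learned to interpret
print(ids_mismatched)  # [6, 4, 5]  -> same text, different IDs

# Decode the mismatched IDs with the pretraining vocabulary to see what the
# model would effectively "read".
id_to_token = {i: t for t, i in pretraining_vocab.items()}
print([id_to_token[i] for i in ids_mismatched])  # ['this', 'i', 'hate']
```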

In [None]:
from transformers import AutoTokenizer

# Load the tokenizer associated with our model's checkpoint
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [None]:
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

# Specify the type of tensors to return with return_tensors ("pt" = PyTorch)
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)

With PyTorch tensors, the output is:

```py
{
    'input_ids': tensor([
        [  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172, 2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,     0,     0,     0,     0,     0,     0]
    ]), 
    'attention_mask': tensor([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ])
}

```
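The padding and attention mask in this output can be reproduced by hand. The sketch below uses a hypothetical `pad_batch` helper, assuming padding to the longest sequence (as `padding=True` does) and a pad ID of 0 (the `[PAD]` token in this vocabulary, matching the zeros above). It does not model truncation, which only applies to sequences longer than the model can handle.

```python
PAD_ID = 0  # ID of [PAD] in this uncased vocabulary (the zeros in the output above)


def pad_batch(batch_ids, pad_id=PAD_ID):
    """Pad every sequence to the longest one and build the attention mask."""
    longest = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token the model should attend to, 0 marks padding.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Token IDs of the two sentences before padding (101 = [CLS], 102 = [SEP]).
batch = pad_batch([
    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102],
    [101, 1045, 5223, 2023, 2061, 2172, 999, 102],
])
print(batch["input_ids"][1])
print(batch["attention_mask"][1])
```

The second sentence has 8 real tokens, so it receives 8 padding IDs and 8 zeros in its attention mask, matching the tokenizer output above.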

### Going through the model

In [None]:
from transformers import AutoModel

# Use the same checkpoint as the tokenizer
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

This architecture contains only the base Transformer module:
* Given some inputs, it outputs hidden states, also known as features
* For each model input, we retrieve a high-dimensional vector representing the Transformer model's contextual understanding of that input

The hidden-states tensor has shape (batch size, sequence length, hidden size).

In [None]:
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

The output is:
```py
torch.Size([2, 16, 768])
```
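The printed shape can be traced back to the tokenizer output: 2 is the number of sentences in the batch, 16 is the padded sequence length, and 768 is the model's hidden size. The pure-Python sketch below uses a hypothetical `nested_shape` helper and zero-filled placeholder vectors instead of real features, to show how the model adds one dimension by replacing each token ID with a vector.

```python
def nested_shape(x):
    """Return the shape of a rectangular nested list, like tensor.shape does."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


# input_ids from the tokenizer output above: 2 sentences padded to 16 tokens.
input_ids = [
    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102],
    [101, 1045, 5223, 2023, 2061, 2172, 999, 102, 0, 0, 0, 0, 0, 0, 0, 0],
]
HIDDEN_SIZE = 768  # hidden size taken from the printed shape above

# Placeholder hidden states: one HIDDEN_SIZE-dimensional vector per token.
hidden_states = [[[0.0] * HIDDEN_SIZE for _ in sentence] for sentence in input_ids]

print(nested_shape(input_ids))      # (2, 16)
print(nested_shape(hidden_states))  # (2, 16, 768)
```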

In [None]:
# For classification tasks, use AutoModelForSequenceClassification
# Its outputs.logits tensor has shape (batch_size, num_labels): one score per class
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([2, 2]): 2 sentences, 2 labels
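Logits are raw, unnormalized scores; postprocessing turns them into probabilities with a softmax and maps the highest-scoring index to a label. The sketch below is a pure-Python illustration with made-up logits. The `id2label` mapping is an assumption about this checkpoint, and `softmax`/`predict` are hypothetical helpers, not library functions.

```python
import math

# Made-up logits for illustration only (not real model output):
# one row per sentence, one raw score per class.
logits = [[-1.5, 1.6], [4.2, -3.3]]
# Assumed label order for this SST-2 checkpoint; check model.config.id2label.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(row):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(row)  # subtract the max first for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict(row):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(row)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[best], probs[best]


for row in logits:
    label, prob = predict(row)
    print(f"{label}: {prob:.4f}")
```

With the real model, the PyTorch equivalent is `torch.nn.functional.softmax(outputs.logits, dim=-1)`, and the label names come from `model.config.id2label`.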