# **Introduction**

As you saw in Chapter 1, Transformer models are usually very large. With millions to tens of *billions* of parameters, training and deploying these models is a complicated undertaking. Furthermore, with new models being released on a near-daily basis and each having its own implementation, trying them all out is no easy task.

The 🤗 Transformers library was created to solve this problem. Its goal is to provide a single API through which any Transformer model can be loaded, trained, and saved. The library's main features are:

* **Ease of use**: Downloading, loading, and using a state-of-the-art NLP model for inference can be done in just two lines of code.
* **Flexibility**: At their core, all models are simple PyTorch `nn.Module` or TensorFlow `tf.keras.Model` classes and can be handled like any other models in their respective machine learning (ML) frameworks.
* **Simplicity**: Hardly any abstractions are made across the library. The "all in one file" approach is a core concept: a model's forward pass is entirely defined in a single file, so that the code itself is understandable and hackable.

This last feature makes 🤗 Transformers quite different from other ML libraries. The models are not built on modules that are shared across files; instead, each model has its own layers. In addition to making the models more approachable and understandable, this allows you to easily experiment on one model without affecting others.

This chapter will begin with an end-to-end example where we use a model and a tokenizer together to replicate the `pipeline()` function introduced in `Chapter 1`. Next, we’ll discuss the model API: we’ll dive into the model and configuration classes, and show you how to load a model and how it processes numerical inputs to output predictions.

Then we’ll look at the tokenizer API, which is the other main component of the `pipeline()` function. Tokenizers take care of the first and last processing steps, handling the conversion from text to numerical inputs for the neural network, and the conversion back to text when it is needed. Finally, we’ll show you how to handle sending multiple sentences through a model in a prepared batch, then wrap it all up with a closer look at the high-level `tokenizer()` function.

# **Behind the pipeline**

Let's start with a complete example, taking a look at what happened behind the scenes when we executed the following code in *Chapter 1*:

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier(
    [
        "I've been waiting for you.",
        "I hate this so much!"
    ]
)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'label': 'POSITIVE', 'score': 0.9972347617149353},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

As we saw in *Chapter 1*, this pipeline groups together three steps: preprocessing, passing the inputs through the model, and postprocessing:

![](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/full_nlp_pipeline.svg)

Let's quickly go over each of these.

## **Preprocessing with a tokenizer**

Like other neural networks, Transformer models can't process raw text directly, so the first step of our pipeline is to convert the text inputs into numbers that the model can make sense of. To do this we use a *tokenizer*, which will be responsible for:

* Splitting the input into words, subwords, or symbols (like punctuation) that are called *tokens*.
* Mapping each token to an integer.
* Adding additional inputs that may be useful to the model.
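To make the three responsibilities above concrete, here is a toy sketch in plain Python. It is not how the library's tokenizers work internally: the vocabulary `TOY_VOCAB`, the helper names `toy_tokenize` and `toy_encode`, and the simple regex split are all assumptions made for illustration, whereas a real checkpoint ships its own vocabulary and splitting rules.

```python
import re

# Hypothetical toy vocabulary; a real checkpoint ships its own, much larger one.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "i": 4, "hate": 5, "this": 6, "so": 7, "much": 8, "!": 9,
}


def toy_tokenize(text):
    """Step 1: lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def toy_encode(text):
    """Steps 2 and 3: add start/end marker tokens, then map each token to an integer."""
    tokens = ["[CLS]"] + toy_tokenize(text) + ["[SEP]"]
    # Tokens missing from the vocabulary fall back to the "unknown" ID.
    return [TOY_VOCAB.get(token, TOY_VOCAB["[UNK]"]) for token in tokens]


tokens = toy_tokenize("I hate this so much!")
ids = toy_encode("I hate this so much!")
print(tokens)  # ['i', 'hate', 'this', 'so', 'much', '!']
print(ids)     # [2, 4, 5, 6, 7, 8, 9, 3]
```

The marker tokens added at both ends stand in for the "additional inputs" of the third step; which extra inputs a real tokenizer adds depends on the model it was built for.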

All this preprocessing needs to be done in exactly the same way as when the model was pretrained, so we first need to download that information from the [Model Hub](https://huggingface.co/models). To do this, we use the *`AutoTokenizer`* class and its *`from_pretrained()`* method. Using the checkpoint name of our model, it will automatically fetch the data associated with the model's tokenizer and cache it (so it's only downloaded the first time you run the code below).

Since the default checkpoint of the *`sentiment-analysis`* pipeline is *`distilbert-base-uncased-finetuned-sst-2-english`* (you can see its model card [here](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)), we run the following:

In [3]:
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Once we have the tokenizer, we can directly pass our sentences to it and we'll get back a dictionary that's ready to feed to our model! The only thing left to do is to convert the list of input IDs to tensors.

You can use 🤗 Transformers without having to worry about which ML framework is used as a backend; it might be PyTorch or TensorFlow, or Flax for some models. However, Transformer models only accept *tensors* as input. If this is your first time hearing about tensors, you can think of them as NumPy arrays instead. A NumPy array can be a scalar (0D), a vector (1D), a matrix (2D), or have more dimensions. It's effectively a tensor; other ML frameworks' tensors behave similarly, and are usually as simple to instantiate as NumPy arrays.
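As a rough illustration of what "0D, 1D, 2D" means, the sketch below uses plain nested Python lists as a stand-in for arrays or tensors. The helper `shape` is hypothetical and assumes the nesting is rectangular (every row has the same length); real tensor libraries expose this information directly.

```python
def shape(value):
    """Return the dimensions of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        # Descend into the first element; stop on an empty list.
        value = value[0] if value else None
    return tuple(dims)


scalar = 7                                      # 0D: a single number
vector = [101, 1045, 102]                       # 1D: a list of numbers
matrix = [[101, 1045, 102], [101, 5223, 102]]   # 2D: a list of rows

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    dims = shape(value)
    print(f"{name}: shape={dims}, {len(dims)}D")
```

A batch of tokenized sentences is naturally a 2D structure: one row per sentence, one column per token position.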

To specify the type of tensors we want to get back (PyTorch, TensorFlow, or plain NumPy), we use the `return_tensors` argument:

``` python
raw_input = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

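# Named `inputs` (not `input`) to avoid shadowing the Python builtin;
# the model calls later in this chapter reuse this variable.
inputs = tokenizer(raw_input, padding=True, truncation=True, return_tensors='pt')
print(inputs)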
```
Don’t worry about padding and truncation just yet; we’ll explain those later. The main things to remember here are that you can pass one sentence or a list of sentences, as well as specifying the type of tensors you want to get back (if no type is passed, you will get a list of lists as a result).

``` python
output:> {'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172, 2607,  2026,  2878,  2166,  1012,   102],
                               [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,    0,     0,     0,     0,     0,     0]]), 
        'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                                  [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
```

The output itself is a dictionary containing two keys, `input_ids` and `attention_mask`. `input_ids` contains two rows of integers (one for each sentence) that are the unique identifiers of the tokens in each sentence.

In [9]:
raw_input = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

inputs = tokenizer(raw_input, padding=True, truncation=True, return_tensors='pt')
print(inputs)

{'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
             0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
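The relationship between the two keys in the output above can be reproduced with a small sketch. This is only an illustration under stated assumptions, not the tokenizer's actual implementation: the helper `pad_batch` is hypothetical, and `PAD_ID = 0` is inferred from the trailing zeros in the printed `input_ids`.

```python
PAD_ID = 0  # assumption: 0 is the padding ID, as the printed tensors suggest


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the longest length and build matching masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks a padding position.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Token IDs copied from the output above, with the trailing padding removed.
first = [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037,
         17662, 12172, 2607, 2026, 2878, 2166, 1012, 102]
second = [101, 1045, 5223, 2023, 2061, 2172, 999, 102]

batch = pad_batch([first, second])
print(batch["input_ids"][1])
print(batch["attention_mask"][1])
```

The shorter sentence is extended to 16 positions so that both rows have the same length, and its mask contains eight 1s followed by eight 0s, matching the second row of `attention_mask` printed above.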


## **Going through the model**

We can download our pretrained model the same way we did with our tokenizer. 🤗 Transformers provides an `AutoModel` class which also has a `from_pretrained()` method:

```python
from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)
```

In this code snippet, we have downloaded the same checkpoint we used in our pipeline before (it should actually have been cached already) and instantiated a model with it. 

This architecture contains only the base Transformer module: given some inputs, it outputs what we'll call *hidden states*, also known as *features*. For each model input, we'll retrieve a high-dimensional vector representing the **contextual understanding of that input by the Transformer model**.

If this doesn't make sense, don't worry about it.

While these hidden states can be useful on their own, they're usually inputs to another part of the model, known as the *head*. Different tasks can be performed with the same architecture, but each of these tasks will have a different head associated with it.

In [10]:
from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

### **A high-dimensional vector?**

The vector output by the Transformer module is usually large. It generally has three dimensions:

* **Batch size**: The number of sequences processed at a time (2 in our example).
* **Sequence length**: The length of the numerical representation of the sequence (16 in our example).
* **Hidden size**: The vector dimension of each model input.

It is said to be "high-dimensional" because of the last value. The hidden size can be very large (768 is common for smaller models, and in larger models this can reach 3072 or more).

We can see this if we feed the inputs we preprocessed to our model: 
```python 
outputs = model(**inputs)
print(outputs.last_hidden_state.shape) 
```

Note that the outputs of 🤗 Transformers models behave like `namedtuple`s or dictionaries. You can access the elements by attribute (like we did) or by key (`outputs["last_hidden_state"]`), or even by index if you know exactly where the thing you are looking for is (`outputs[0]`).

In [11]:
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

torch.Size([2, 16, 768])
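The three access styles mentioned above (attribute, key, and index) can be imitated with a few lines of plain Python. The class `ToyOutput` below is a hypothetical stand-in written for this illustration; it is not the class the library uses, and it only shows why all three styles can reach the same object.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyOutput:
    """Hypothetical stand-in for a model output; not the library's class."""
    last_hidden_state: list

    def __getitem__(self, key):
        # String keys behave like dictionary lookups.
        if isinstance(key, str):
            return getattr(self, key)
        # Integer keys index the fields in declaration order, like a tuple.
        return getattr(self, fields(self)[key].name)


toy_outputs = ToyOutput(last_hidden_state=[[0.1, 0.2], [0.3, 0.4]])

by_attribute = toy_outputs.last_hidden_state
by_key = toy_outputs["last_hidden_state"]
by_index = toy_outputs[0]
print(by_attribute is by_key is by_index)  # True: all three reach the same object
```

Access by attribute or key is usually the more readable choice, since access by index depends on knowing the order of the fields.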


### **Model heads: Making sense out of numbers**

The model heads take the high-dimensional vector of hidden states as input and project them onto a different dimension. They are usually composed of one or a few linear layers:

![](https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/transformer_and_head.svg)

The output of the Transformer model is sent directly to the model head to be processed. 

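In this diagram, the model is represented by its embeddings layer and the subsequent layers. The embeddings layer converts each input ID in the tokenized input into a vector that represents the associated token. The subsequent layers manipulate those vectors using the attention mechanism to produce the final representation of the sentences.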
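To give a feel for what "projecting onto a different dimension" means, here is a minimal sketch of a single linear layer applied to one hidden vector, in plain Python. Everything in it is an assumption made for illustration: the helper `linear_head` is hypothetical, the numbers are made up, and the hidden size is 4 rather than 768. How a real head chooses which hidden states to use is not covered here.

```python
def linear_head(hidden_vector, weights, bias):
    """Project one hidden vector onto len(weights) output values."""
    return [
        sum(w * h for w, h in zip(row, hidden_vector)) + b
        for row, b in zip(weights, bias)
    ]


# Made-up numbers: hidden size 4 instead of 768, and 2 output labels.
hidden_vector = [1.0, 2.0, 0.0, -1.0]
weights = [
    [0.5, 0.0, 1.0, 0.0],  # weights for label 0
    [0.0, 1.0, 0.0, 2.0],  # weights for label 1
]
bias = [0.0, 1.0]

logits = linear_head(hidden_vector, weights, bias)
print(logits)  # [0.5, 1.0]
```

The input has four values and the output has two, one per label: the head reduces a large representation to just the numbers the task needs.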

There are many different architectures available in 🤗 Transformers, with each one designed around tackling a specific task. Here is a non-exhaustive list:

* Model (retrieve the hidden states)
* ForCausalLM
* ForMaskedLM
* ForMultipleChoice
* ForQuestionAnswering
* ForSequenceClassification
* ForTokenClassification
* and others 🤗

For our example, we will need a model with a sequence classification head (to be able to classify the sentences as positive or negative). So, we won’t actually use the `AutoModel` class, but `AutoModelForSequenceClassification`:

In [14]:
from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
print(outputs.logits.shape)

torch.Size([2, 2])
