#### Handling multiple sequences of different length ####

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

In [None]:
# Models expect a batch of inputs

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence = "I've been waiting for a HuggingFace course my whole life."

tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.tensor(ids)
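# This line will fail: the model expects a batch, i.e. a 2D tensor of shape
# (batch_size, sequence_length), but input_ids is a 1D tensor.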
model(input_ids)

The problem is that we sent a single sequence to the model, whereas 🤗 Transformers models expect a batch of sequences by default.
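A minimal sketch of the shape problem in plain PyTorch, using arbitrary placeholder IDs rather than real tokenizer output: a single sequence is a 1D tensor, and the batch dimension has to be added in front of it. Wrapping the list in another list and calling `unsqueeze(0)` on the 1D tensor give the same result.

```python
import torch

# Arbitrary placeholder IDs standing in for tokenizer.convert_tokens_to_ids(tokens)
ids = [101, 200, 300, 400, 102]

single = torch.tensor(ids)          # shape (5,): no batch dimension
batched = torch.tensor([ids])       # shape (1, 5): a batch containing one sequence
also_batched = single.unsqueeze(0)  # same thing, adding the dimension afterwards

print(single.shape)                        # torch.Size([5])
print(batched.shape)                       # torch.Size([1, 5])
print(torch.equal(batched, also_batched))  # True
```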

In [None]:
# Try again, this time adding a batch dimension

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence = "I've been waiting for a HuggingFace course my whole life."

tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)

input_ids = torch.tensor([ids])
print("Input IDs:", input_ids)

output = model(input_ids)
print("Logits:", output.logits)

__Batching is the act of sending multiple sentences through the model all at once. If you only have one sentence, you can just build a batch with a single sequence.__
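As a rough illustration with placeholder IDs, a batch is simply several sequences stacked along a new leading dimension, giving a tensor of shape (batch_size, sequence_length). Stacking only works when every sequence has the same length, which is exactly the problem the padding section below deals with.

```python
import torch

# Arbitrary placeholder IDs; all three sequences have the same length
seqs = [[101, 200, 300, 102], [101, 400, 500, 102], [101, 600, 700, 102]]

# Either build the batch directly from nested lists...
batch = torch.tensor(seqs)
# ...or stack 1D tensors along a new leading (batch) dimension
stacked = torch.stack([torch.tensor(s) for s in seqs])

print(batch.shape)                   # torch.Size([3, 4]) -> (batch_size, sequence_length)
print(torch.equal(batch, stacked))   # True

# Sequences of different lengths cannot be stacked into one tensor
stack_failed = False
try:
    torch.stack([torch.tensor([1, 2, 3]), torch.tensor([1, 2])])
except RuntimeError as err:
    stack_failed = True
    print("Cannot stack:", err)
```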

In [None]:
# Batch of two identical sequences

batched_ids = [ids, ids]
input_ids = torch.tensor(batched_ids)

print("Input IDs:", input_ids)

output = model(input_ids)
print("Logits:", output.logits)

#### Padding the inputs ####

The following list of lists cannot be converted to a tensor because the inner lists have different lengths:

batched_ids = [
    [200, 200, 200],
    [200, 200]
]

In order to work around this, we'll use padding to make our tensors have a rectangular shape.

* Padding makes sure all our sentences have the same length by adding a special word called the padding token to the sentences with fewer tokens.
* For example, if you have 10 sentences with 10 words and 1 sentence with 20 words, padding will ensure all the sentences have 20 words.
* The padding token ID can be found in tokenizer.pad_token_id.
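A minimal sketch of right-padding a batch by hand. `pad_batch` is a hypothetical helper written for illustration, and `pad_id=0` is only a placeholder for `tokenizer.pad_token_id`; in practice the tokenizer does this for you when called with `padding=True`.

```python
import torch

def pad_batch(sequences, pad_id):
    """Right-pad every sequence with pad_id up to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

# pad_id=0 is a placeholder for tokenizer.pad_token_id
padded = pad_batch([[200, 200, 200], [200, 200]], pad_id=0)
print(padded)                      # [[200, 200, 200], [200, 200, 0]]
print(torch.tensor(padded).shape)  # torch.Size([2, 3]): now a valid rectangular tensor
```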

In [None]:
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
# Without padding
sequence1_ids = [[200, 200, 200]]
sequence2_ids = [[200, 200]]
batched_ids = [
    [200, 200, 200],
    [200, 200],
]

print(model(torch.tensor(sequence1_ids)).logits)
print(model(torch.tensor(sequence2_ids)).logits)
# This line fails: the two rows have different lengths, so they cannot form a rectangular tensor
print(model(torch.tensor(batched_ids)).logits)

In our example, the resulting tensor looks like this after padding:

In [None]:
# With padding
sequence1_ids = [[200, 200, 200]]
sequence2_ids = [[200, 200]]
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]
print(model(torch.tensor(sequence1_ids)).logits)
print(model(torch.tensor(sequence2_ids)).logits)
print(model(torch.tensor(batched_ids)).logits)

There's something wrong with the logits in our batched predictions: the second row should be the same as the logits for the second sentence, but we've got completely different values!

This is because the key feature of Transformer models is attention layers that contextualize each token. These will take into account the padding tokens since they attend to all of the tokens of a sequence. To get the same result when passing individual sentences of different lengths through the model or when passing a batch with the same sentences and padding applied, we need to tell those attention layers to ignore the padding tokens. This is done by using an __attention mask__.
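To see why padding changes the result, here is a toy, single-head dot-product attention with random vectors. It is not DistilBERT's actual layer, only a sketch of the mechanism: without a mask the padding token takes a share of the softmax weight, so the real tokens' weights shift; setting the masked scores to -inf gives the padding position exactly zero weight and restores the unpadded result. (Library implementations may use a large negative number instead of -inf.)

```python
import torch

torch.manual_seed(0)
d = 4  # toy hidden size (arbitrary)

q = torch.randn(d)                         # query for one token
keys = torch.randn(2, d)                   # keys for a 2-token sentence
pad_key = torch.randn(1, d)                # stand-in for the padding token's key
padded_keys = torch.cat([keys, pad_key])   # 3 keys: 2 real + 1 padding

def attention_weights(q, k, mask=None):
    scores = k @ q / d ** 0.5  # one scaled score per key
    if mask is not None:
        # Masked positions get -inf, so softmax assigns them zero weight
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1)

w_short = attention_weights(q, keys)                                   # no padding
w_padded = attention_weights(q, padded_keys)                           # padding attended to
w_masked = attention_weights(q, padded_keys, torch.tensor([1, 1, 0]))  # padding masked

print(w_short)
print(w_padded)  # the pad token takes some weight, so the real tokens' weights change
print(w_masked)  # real-token weights match w_short; the pad position gets 0
```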

#### Attention masks ####

Attention masks are tensors with the exact same shape as the input IDs tensor, filled with 0s and 1s: 1s indicate the corresponding tokens should be attended to, and 0s indicate the corresponding tokens should not be attended to (i.e., they should be ignored by the attention layers of the model).
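When the tokenizer pads a batch it returns the attention mask for you, but as a sketch of how the mask relates to the original lengths: position i of a sequence is a real token (1) if i is smaller than that sequence's length, and padding (0) otherwise. Broadcasting a row of positions against a column of lengths builds the whole mask at once.

```python
import torch

# Original (unpadded) lengths of each sequence in the batch
lengths = torch.tensor([3, 2])
max_len = int(lengths.max())

# positions < length -> real token (1); the remaining positions are padding (0)
positions = torch.arange(max_len)  # shape (max_len,)
attention_mask = (positions[None, :] < lengths[:, None]).long()

print(attention_mask)
# tensor([[1, 1, 1],
#         [1, 1, 0]])
```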



In [None]:
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]

attention_mask = [
    [1, 1, 1],
    [1, 1, 0],
]

outputs = model(torch.tensor(batched_ids), attention_mask=torch.tensor(attention_mask))
print(outputs.logits)

#### Longer sequences ####

With Transformer models, there is a limit to the lengths of the sequences we can pass to the models. Most models handle sequences of up to 512 or 1024 tokens, and will crash when asked to process longer sequences. There are two solutions to this problem:

   * Use a model with a longer supported sequence length.
   * Truncate your sequences.
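With a real tokenizer, truncation is requested with `tokenizer(sequence, truncation=True, max_length=512)`. The sketch below only illustrates the idea on plain lists: `truncate_ids` is a hypothetical helper, and it assumes the sequence starts and ends with special tokens (101 and 102 here are placeholders). Naive slicing would drop the closing special token, so the helper keeps it.

```python
def truncate_ids(ids, max_length):
    """Keep at most max_length IDs, preserving the final special token.

    Assumes ids ends with a [SEP]-style token and max_length >= 2.
    """
    if len(ids) <= max_length:
        return ids
    return ids[: max_length - 1] + [ids[-1]]

# 101 and 102 are placeholders for the start/end special tokens
ids = [101] + list(range(1000, 1010)) + [102]  # 12 IDs
short = truncate_ids(ids, max_length=6)
print(short)  # [101, 1000, 1001, 1002, 1003, 102]
```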

In [None]:
xx = tokenizer('The model should not be used to intentionally create hostile or alienating \
environments for people. In addition, the model was not trained to be factual or true representations of people or events, \
and therefore using the model to generate such content is out-of-scope for the abilities of this model. For instance, for sentences like This film was \
filmed in COUNTRY, this binary classification model will give radically different probabilities for the positive label depending on the \
country (0.89 if the country is France, but 0.08 if the country is Afghanistan) when nothing in the input indicates such a strong semantic shift. \
In this colab, Aurélien Géron made an interesting map plotting these probabilities for each country. can refer to Hydrofluoric acid (a dangerously \
corrosive chemical, the aqueous solution of hydrogen fluoride gas) and High Frequency (a range of radio frequencies known for long-distance communication, \
like with amateur radio). It can also be an abbreviation for the international charity Humanity First or a variety of other organizations and technical \
terms depending on context. A 5% to 9% hydrofluoric acid gel is also commonly used to etch all ceramic dental restorations to improve bonding.[6] For similar reasons, \
dilute hydrofluoric acid is a component of household rust stain remover, in car washes in "wheel cleaner" compounds, in ceramic and fabric rust inhibitors, \
and in water spot removers.[5][7] Because of its ability to dissolve iron oxides as well as silica-based contaminants, \
hydrofluoric acid is used in pre-commissioning boilers that produce high-pressure steam. Hydrofluoric acid is also useful \
for dissolving rock samples (usually powdered) prior to analysis. In similar manner, this acid is used in acid macerations \
to extract organic fossils from silicate rocks. Fossiliferous rock may be immersed directly into the acid, or a cellulose \
nitrate film may be applied (dissolved in amyl acetate), which adheres to the organic component and allows the rock to be dissolved around it. [6] For similar reasons, \
dilute hydrofluoric acid is a component of household rust stain remover, in car washes in "wheel cleaner" compounds, in ceramic and fabric rust inhibitors, \
and in water spot removers.[5][7] Because of its ability to dissolve iron oxides as well as silica-based contaminants, \
hydrofluoric acid is used in pre-commissioning boilers that produce high-pressure steam. Hydrofluoric acid is also useful \
for dissolving rock samples (usually powdered) prior to analysis. In similar manner, this acid is used in acid macerations \
to extract organic fossils from silicate rocks. Fossiliferous rock may be immersed directly into the acid, or a cellulose \
nitrate film may be applied (dissolved in amyl acetate), which adheres to the organic component and allows the rock to be dissolved around it.')

In [None]:
print(f"Length of input_ids: {len(xx['input_ids'])}")

In [None]:
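# Wrap the IDs in a list to add the batch dimension. If the sequence is longer than
# the model's maximum length (512 tokens for DistilBERT), this call will fail.
xx['input_ids'] = torch.tensor([xx['input_ids']])
model(xx['input_ids'])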