## Install transformers

In [None]:
!pip install transformers

## How to handle batch inputs

In [2]:
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence = "I've been waiting for a HuggingFace course my whole life."

tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = tf.constant(ids)
# This line will fail: the model expects a batch dimension, but input_ids is 1-D.
model(input_ids)

All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


InvalidArgumentError: ignored

#### Fix: add the batch dimension the model expects

In [6]:
print(ids)
input_ids = tf.constant([ids])  # wrap ids in a list to add the batch dimension, then convert to tf.Tensor
print("Input IDs:", input_ids)
output = model(input_ids)
print("Logits:", output.logits)

[1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012]
Input IDs: tf.Tensor(
[[ 1045  1005  2310  2042  3403  2005  1037 17662 12172  2607  2026  2878
   2166  1012]], shape=(1, 14), dtype=int32)
Logits: tf.Tensor([[-2.7276192  2.8789363]], shape=(1, 2), dtype=float32)


In [7]:
# Alternative: call the tokenizer directly with return_tensors="tf"; it also adds the special tokens and the batch dimension
tokenized_inputs = tokenizer(sequence, return_tensors="tf")
print(tokenized_inputs["input_ids"])

tf.Tensor(
[[  101  1045  1005  2310  2042  3403  2005  1037 17662 12172  2607  2026
   2878  2166  1012   102]], shape=(1, 16), dtype=int32)
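
The two outputs above differ in length (14 versus 16 IDs) because calling the tokenizer directly also adds special tokens. As a small check, using only the IDs printed above and assuming that 101 and 102 are the `[CLS]` and `[SEP]` IDs of the BERT uncased vocabulary, the tokenizer output can be rebuilt from the manual one:

```python
# IDs from tokenizer.tokenize + convert_tokens_to_ids (copied from the output above)
ids = [1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012]

# IDs from calling the tokenizer directly (copied from the output above)
tokenizer_ids = [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172,
                 2607, 2026, 2878, 2166, 1012, 102]

# Assumption: 101 is [CLS] and 102 is [SEP] in this vocabulary
CLS_ID, SEP_ID = 101, 102

with_special_tokens = [CLS_ID] + ids + [SEP_ID]
print(with_special_tokens == tokenizer_ids)  # True
print(len(ids), len(tokenizer_ids))          # 14 16
```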


In [13]:
print(ids)
batched_ids = [ids, ids]  # a batch of two identical sequences; the model always expects batched input
batched_ids = tf.constant(batched_ids)
output = model(batched_ids)
print(output.logits)

[1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012]
tf.Tensor(
[[-2.7276204  2.878937 ]
 [-2.72762    2.8789363]], shape=(2, 2), dtype=float32)


## Necessity of Attention mask

_Attention masks are tensors with the exact same shape as the input IDs tensor, filled with 0s and 1s: 1s indicate the corresponding tokens should be attended to, and 0s indicate the corresponding tokens should not be attended to (i.e., they should be ignored by the attention layers of the model)._

In [8]:
batched_ids = [
  [200, 200, 200],
  [200, 200]
]

In [9]:
padding_id = 100

batched_ids = [
  [200, 200, 200],
  [200, 200, padding_id]
]
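
The manual padding above can be generalized to any number of sequences. The following is a minimal sketch in plain Python; `pad_batch` is a hypothetical helper name, not part of the `transformers` API, and `padding_id = 100` is the same illustrative value used in the cell above.

```python
def pad_batch(sequences, pad_id):
    """Right-pad every sequence with pad_id up to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

padding_id = 100  # illustrative value, as in the cell above

batched_ids = pad_batch([[200, 200, 200], [200, 200]], padding_id)
print(batched_ids)  # [[200, 200, 200], [200, 200, 100]]
```

After padding, every row has the same length, so the batch can be converted into a rectangular tensor.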

In [14]:
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence1_ids = [[200, 200, 200]]
sequence2_ids = [[200, 200]]
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id]
]

print(model(tf.constant(sequence1_ids)).logits)
print(model(tf.constant(sequence2_ids)).logits)
print(model(tf.constant(batched_ids)).logits)

Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized: ['dropout_39']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


tf.Tensor([[ 1.5693673 -1.3894573]], shape=(1, 2), dtype=float32)
tf.Tensor([[ 0.5803025 -0.4125263]], shape=(1, 2), dtype=float32)
tf.Tensor(
[[ 1.5693673 -1.3894577]
 [ 1.3373486 -1.2163193]], shape=(2, 2), dtype=float32)


Note that the logits for the second sequence differ between the single-sequence call and the padded batch. No attention mask was passed, so the attention layers treat the padding token as real input: on its own, the second sequence attends only to [[200, 200]], while in batched_ids it attends to [200, 200, tokenizer.pad_token_id] as a whole. The first sequence needs no padding, so its logits match in both cases.
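
To see why a padding token changes the result, here is a deliberately simplified sketch of the softmax step inside an attention layer. It is not DistilBERT's actual implementation: the `softmax` helper and the score values are hypothetical, chosen only to show how an attended padding position takes weight away from the real tokens, and how masking it out would restore the original weights.

```python
import math

def softmax(scores, mask=None):
    """Softmax over attention scores; positions with mask == 0 get zero weight."""
    if mask is not None:
        scores = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical attention scores for the positions [200, 200, <pad>]
scores = [1.0, 1.0, 1.0]

unmasked = softmax(scores)                  # padding is attended like a real token
masked = softmax(scores, mask=[1, 1, 0])    # padding is ignored

print(unmasked)  # three equal weights of about 0.333
print(masked)    # [0.5, 0.5, 0.0]
```

With the mask applied, the two real tokens receive the same weights they would get if the padding token were not there at all. This sketch assumes at least one position is unmasked.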

In [16]:
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id]
]

# Define the attention mask: 1s mark tokens to attend to, 0s mark tokens to ignore
attention_mask = [
  [1, 1, 1],
  [1, 1, 0]
]

print('Output_seq_1 :',model(tf.constant(sequence1_ids)).logits)
print('Output_seq_2 :',model(tf.constant(sequence2_ids)).logits)
outputs = model(tf.constant(batched_ids), attention_mask=tf.constant(attention_mask))
print('After defining attention mask : \n',outputs.logits)

Output_seq_1 : tf.Tensor([[ 1.5693673 -1.3894573]], shape=(1, 2), dtype=float32)
Output_seq_2 : tf.Tensor([[ 0.5803025 -0.4125263]], shape=(1, 2), dtype=float32)
After defining attention mask : 
 tf.Tensor(
[[ 1.5693673  -1.3894577 ]
 [ 0.5803034  -0.41252705]], shape=(2, 2), dtype=float32)
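
Writing the mask by hand does not scale to larger batches. A minimal sketch of deriving it from an already padded batch follows; `mask_from_padded` is a hypothetical helper name, and `PAD_ID = 0` is an assumption about the value of `tokenizer.pad_token_id` for this checkpoint. The approach also assumes the padding ID never occurs as a real token inside a sequence.

```python
def mask_from_padded(batch, pad_id):
    """Return a mask with the same shape as batch: 1 for real tokens, 0 for padding."""
    return [[0 if token_id == pad_id else 1 for token_id in row] for row in batch]

PAD_ID = 0  # assumed value of tokenizer.pad_token_id for this checkpoint

batched_ids = [
    [200, 200, 200],
    [200, 200, PAD_ID],
]

attention_mask = mask_from_padded(batched_ids, PAD_ID)
print(attention_mask)  # [[1, 1, 1], [1, 1, 0]]
```

In practice, calling the tokenizer on a list of sentences with `padding=True` returns both `input_ids` and a matching `attention_mask`, so a helper like this is mainly useful for understanding what the mask contains.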


## Longer Sequences

With Transformer models, there is a limit to the length of the sequences we can pass to the models. Most models handle sequences of up to 512 or 1024 tokens, and will crash when asked to process longer sequences. There are two solutions to this problem:

1. Use a model with a longer supported sequence length.

  Models have different supported sequence lengths, and some specialize in handling very long sequences. [Longformer](https://huggingface.co/transformers/model_doc/longformer.html) is one example, and another is [LED](https://huggingface.co/transformers/model_doc/led.html). If we’re working on a task that requires very long sequences, we may use any of the models designed to handle longer sequences.

2. Truncate your sequences.

  Otherwise, we may truncate our sequences by specifying the max_sequence_length parameter:

In [None]:
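max_sequence_length = 512  # example value; use the limit of the model in question
sequence = sequence[:max_sequence_length]  # here `sequence` stands for a list of token IDs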
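
A plain slice also cuts off the closing special token if the IDs already include one. The sketch below truncates a list of token IDs while keeping room for the special tokens; `truncate_ids` is a hypothetical helper, the IDs 101 and 102 for `[CLS]` and `[SEP]` are assumed from the tokenizer output shown earlier, and the input IDs are made-up placeholders.

```python
def truncate_ids(ids, max_length, cls_id=101, sep_id=102):
    """Truncate ids so the result, including [CLS] and [SEP], has at most max_length items."""
    if max_length < 2:
        raise ValueError("max_length must leave room for the two special tokens")
    body = ids[: max_length - 2]  # reserve two positions for [CLS] and [SEP]
    return [cls_id] + body + [sep_id]

# Placeholder IDs standing in for a long tokenized sequence (no special tokens yet)
long_ids = list(range(1000, 1020))

truncated = truncate_ids(long_ids, max_length=8)
print(truncated)       # [101, 1000, 1001, 1002, 1003, 1004, 1005, 102]
print(len(truncated))  # 8
```

When working with the tokenizer directly, passing `truncation=True` together with `max_length` should achieve the same effect without a custom helper.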