# Transformers 101

Welcome to "Transformers 101," an introductory practical on transformer-based models. In this session we cover tokenization, model loading, inference, and building small custom applications on top of transformer architectures.

Objectives:

Tokenization and Data Loading:
Learn how a transformer tokenizer converts raw text into the numerical IDs that a model consumes.

Pre-trained Transformer Models:
Explore pre-trained transformer models, which are trained on large text corpora and can be reused for a variety of natural language processing tasks.

Inference with Transformer Models:
Understand how to run inference with a transformer model: pass in tokenized data and obtain predictions or hidden representations.

Building a BERT Classifier:
Build a binary classifier on top of BERT, a popular transformer model, using its output representations as input features.

Flan-T5 for Seq2Seq Modeling:
Use Flan-T5, an instruction-fine-tuned variant of the T5 model, for sequence-to-sequence (Seq2Seq) tasks such as translation, summarization, and text generation.

### Step 1: Load the Tokenizer and Tokenize an Example Sentence

In [1]:
from transformers import BertTokenizer

# Load the tokenizer for `bert-base-uncased`
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [2]:
# Example sentence
example_sentence = "Transformers are amazing!"

In [4]:
# Apply the tokenizer on the example and look at the output
tokenized_input = tokenizer(example_sentence, return_tensors='pt')

In [5]:
tokenized_input

{'input_ids': tensor([[  101, 19081,  2024,  6429,   999,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}
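The six ids above are the four tokens of the sentence wrapped in BERT's special tokens: `101` is `[CLS]` and `102` is `[SEP]` in `bert-base-uncased`. The following is a minimal pure-Python sketch of that wrapping step, not the library's implementation. The tiny vocabulary is an assumption built from the ids printed above, and real WordPiece tokenization additionally splits unknown words into subword pieces, which this sketch skips.

```python
import re

# Toy vocabulary: the ids mirror the `input_ids` printed above, but this tiny
# dict is an illustrative assumption, not the real 30,522-entry BERT vocab.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
    "transformers": 19081, "are": 2024, "amazing": 6429, "!": 999,
}

def toy_encode(text, vocab=TOY_VOCAB):
    """Lowercase, split words from punctuation, and wrap in [CLS] ... [SEP]."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    ids.append(vocab["[SEP]"])
    return {
        "input_ids": ids,
        "token_type_ids": [0] * len(ids),  # single sentence -> segment 0
        "attention_mask": [1] * len(ids),  # no padding -> attend everywhere
    }

encoded = toy_encode("Transformers are amazing!")
print(encoded["input_ids"])
```

Words missing from the toy vocabulary fall back to `[UNK]` here, whereas the real tokenizer would first try to break them into known subword pieces.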

### Step 2: Load the Model and Apply it on the Tokenized Input

In [6]:
from transformers import BertModel

# Load BERT model
bert_model = BertModel.from_pretrained('bert-base-uncased')

In [7]:
# Display the architecture
bert_model

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
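
The printout above is cut off, but the visible layers are enough to see where the parameters come from. As a rough sketch, the arithmetic below counts the weights of the embedding block and of one attention block, using only the sizes shown in the printout. The helper names are illustrative and not part of `transformers`.

```python
def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

def layernorm_params(size):
    """One scale and one shift value per feature (elementwise_affine=True)."""
    return 2 * size

HIDDEN = 768

# Embedding tables are (rows x HIDDEN) lookup matrices.
embedding_params = (
    30522 * HIDDEN      # word_embeddings
    + 512 * HIDDEN      # position_embeddings
    + 2 * HIDDEN        # token_type_embeddings
    + layernorm_params(HIDDEN)
)

# One attention block: query, key, value, and the output dense layer, plus LayerNorm.
attention_params = 4 * linear_params(HIDDEN, HIDDEN) + layernorm_params(HIDDEN)

print(f"embeddings: {embedding_params:,}")
print(f"attention block (per layer): {attention_params:,}")
```

The feed-forward sublayers and the pooler are not visible in the truncated printout, so they are left out of this count.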
  

In [9]:
# Run inference with the BERT model
output = bert_model(**tokenized_input)

In [12]:
# Display the BERT output, extract the last_hidden_state and display the shape
output.last_hidden_state

tensor([[[ 0.1220,  0.0883,  0.1402,  ..., -0.2699,  0.3487,  0.0836],
         [ 1.5120,  0.0669, -0.2271,  ..., -0.6870,  0.7271,  0.1486],
         [ 0.5288,  0.2723,  0.1805,  ..., -0.4350,  0.5483, -0.0052],
         [ 0.7353,  0.3513, -0.4631,  ..., -0.7266,  0.3076, -0.6550],
         [-0.0383, -0.6582, -0.5449,  ...,  0.9175,  0.5790, -0.4320],
         [ 0.8342, -0.0045, -0.0415,  ...,  0.3027, -0.5258, -0.2700]]],
       grad_fn=<NativeLayerNormBackward0>)

In [13]:
output.last_hidden_state.shape

torch.Size([1, 6, 768])

### Step 3: Build a small classifier using BERT

Use the first vector of `last_hidden_state` (the one at the `[CLS]` position) as the feature vector for a linear classifier.

In [17]:
# Build the classifier

import torch.nn as nn

class BertClassifier(nn.Module):
    def __init__(self, bert_model):
        super().__init__()

        self.bert = bert_model
        # Reuse BERT's own dropout rate and hidden size from its config
        self.dropout = nn.Dropout(self.bert.config.hidden_dropout_prob)
        # Two output units: one logit per class
        self.classifier = nn.Linear(self.bert.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        last_hidden_state = outputs.last_hidden_state  # (batch, seq_len, hidden)

        # Keep only the vector at position 0, the `[CLS]` token
        first_vector = last_hidden_state[:, 0, :]
        first_vector = self.dropout(first_vector)
        logits = self.classifier(first_vector)
        return logits

In [18]:
classifier = BertClassifier(bert_model)

In [19]:
input_text = "This is a positive sentence."
input_ids = tokenizer.encode(input_text, return_tensors='pt')
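`tokenizer.encode` returns only the ids, so the attention mask has to be rebuilt from them, which the next cell does by comparing against 0. A minimal pure-Python sketch of that idea for a padded batch follows. It assumes a pad id of 0, which matches `padding_idx=0` in the architecture printout. The second sequence in the batch is made up for illustration.

```python
PAD_ID = 0  # id of `[PAD]` in `bert-base-uncased` (matches padding_idx=0)

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad id lists to a common length and build the matching masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 = real token, 0 = padding; the same rule as `input_ids.ne(0)`
    attention_mask = [[int(tok != pad_id) for tok in row] for row in input_ids]
    return input_ids, attention_mask

# Two id sequences of different lengths; the first reuses the ids from Step 1,
# the second is a hypothetical shorter input.
batch = [[101, 19081, 2024, 6429, 999, 102], [101, 2024, 102]]
ids, mask = pad_batch(batch)
print(ids[1])
print(mask[1])
```

For a single unpadded sentence, as in this step, the mask is all ones. In practice, calling the tokenizer directly (as in Step 1) returns `attention_mask` alongside `input_ids`.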

In [20]:
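# Build the attention mask by marking every non-padding position
# (`[PAD]` has id 0 in `bert-base-uncased`); for one unpadded sentence it is all ones
outputs = classifier(input_ids=input_ids, attention_mask=input_ids.ne(0))
print(outputs)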

tensor([[-0.0803,  0.3038]], grad_fn=<AddmmBackward0>)
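The two numbers above are raw logits, one per class. As a small sketch, the snippet below turns them into probabilities with a softmax and picks the larger one. The logits are copied from the output above. Because the linear layer is freshly initialized and has not been fine-tuned, the resulting prediction carries no meaning yet.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the printed classifier output.
logits = [-0.0803, 0.3038]
probs = softmax(logits)
predicted_class = probs.index(max(probs))

print([round(p, 3) for p in probs], predicted_class)
```

Which index means "positive" is a labeling convention fixed during training, not something the model defines on its own.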


### Step 4: Use Flan-T5 for Sequence-to-Sequence Generation

In [22]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load `google/flan-t5-small` tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

In [23]:
# Load `google/flan-t5-small` model
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

In [34]:
s = "A step by step tutorial to learn computer science:"

In [35]:
# Tokenize the previous example
inputs = tokenizer(s, return_tensors="pt")

In [36]:
# Run inference: generate output token IDs from the model
outputs = model.generate(**inputs)

In [37]:
# Use `batch_decode` to decode the output
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

['Computer science is a science that is taught in a computer lab.']
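When no sampling or beam-search options are passed, `generate` typically decodes greedily: at each step it appends the single highest-scoring next token and stops at the end-of-sequence token or a length limit. The sketch below illustrates that loop in pure Python. The score table is a hand-written, hypothetical stand-in for the decoder, and `greedy_decode` is an illustrative helper, not a `transformers` function.

```python
# Hypothetical next-token scores: a hand-written bigram table standing in for
# the decoder. A real model computes these scores from the encoder output
# and all previously generated tokens.
TOY_SCORES = {
    "<pad>": {"computer": 2.0, "science": 1.0},
    "computer": {"science": 3.0, "lab": 1.5},
    "science": {"</s>": 2.5, "lab": 0.5},
    "lab": {"</s>": 1.0},
}

def greedy_decode(start="<pad>", eos="</s>", max_new_tokens=10):
    """Repeatedly pick the highest-scoring next token until EOS or the length cap."""
    generated = []
    current = start  # T5 starts decoding from its pad token
    for _ in range(max_new_tokens):
        candidates = TOY_SCORES.get(current, {eos: 0.0})
        current = max(candidates, key=candidates.get)
        if current == eos:
            break
        generated.append(current)
    return generated

print(" ".join(greedy_decode()))
```

Greedy decoding is deterministic, which is why rerunning the cell above with the same input gives the same sentence. Sampling or beam search trades that simplicity for more varied or higher-scoring outputs.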
