# Comparing Transformers and Adapters

## Environment loading

In [None]:
# !pip install transformers 

In [59]:
!pip install evaluate datasets wandb adapter-transformers



In [1]:
import pandas as pd
import numpy as np
import random

from transformers import pipeline
import datasets
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments
from transformers import Trainer
import evaluate

import wandb
# wandb.login()
from IPython.display import display, HTML

In [None]:
%env WANDB_PROJECT = SentAnalysis_BERT_vs_Adapter

env: WANDB_PROJECT=SentAnalysis_BERT_vs_Adapter


## Loading the dataset

In [2]:
dataset = load_dataset("ag_news")
dataset["train"][100]




{'text': 'Comets, Asteroids and Planets around a Nearby Star (SPACE.com) SPACE.com - A nearby star thought to harbor comets and asteroids now appears to be home to planets, too. The presumed worlds are smaller than Jupiter and could be as tiny as Pluto, new observations suggest.',
 'label': 3}

In [3]:
def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [4]:
show_random_elements(dataset["train"])

index,text,label
0,"Amazon to Buy Chinese Retailer Joyo.com Internet retailer Amazon.com Inc. said on Thursday that it will buy Joyo.com Ltd., which\runs some of China's biggest retail Web sites, for about \$75 million to gain entry into China's fast-growing market.",Sci/Tech
1,"Spain: Madrid and Valencia both win and consolidate their &lt;b&gt;...&lt;/b&gt; MADRID, Nov 28 (SW) - A few magic minutes early in the second half were enough for Real Madrid to decide their home match against Levante.",Sports
2,"Malaysia's Anwar Seeks to Clear Name, Remove Ban PUTRAJAYA, Malaysia (Reuters) - Malaysian rebel politician Anwar Ibrahim, freed from jail last week, sought on Monday to clear the way for his return to active politics when his lawyers moved to have his criminal record wiped clean.",World
3,"Russians eyed in abductions ARGUN, Russia -- Just before sunrise one morning this month, a dozen armed men in camouflage uniforms and black masks burst into the house of Zalpa Mintayeva, shouting, ''Do you have a man at home? quot;",World
4,"Floods kill more than 600 people in Haiti, hundreds left homeless Rescuers dug through mud and ruined homes for bodies Tuesday, expecting the death toll of more than 600 from tropical storm Jeanne to rise even further as flood waters receded in this crowded northern city after devastating winds",World
5,"Egyptian Animals Were Mummified Same Way as Humans A new study suggests the ancient Egyptians put as much care into mummifying some cats, birds, and other animals as they did into preserving human corpses.",Sci/Tech
6,"Late Kalou goal seals Feyenoord win A late strike by Salomon Kalou sealed a 2-1 win for Feyenoord over NEC Nijmegen, while second placed AZ Alkmaar defeated ADO Den Haag 2-0 in the Dutch first division on Sunday.",Sports
7,"Time will pay \$510m in AOL settlement TIME WARNER has agreed to pay \$510 million (262 million) to settle claims that America Online (AOL), its internet division, overstated its earnings capacity.",Business
8,Prso the Rangers saviour RANGERS SURVIVED AN intense inspection of their title credentials at Easter Road yesterday before emerging with the points thanks to a second-half penalty from Dado Prso.,Sports
9,Juve Lead Is Cut AC Milan closed the gap on leaders Juventus to four points after receiving a helping hand from city-rivals Inter. Having already won their game at Chievo earlier in the day thanks to on-loan Chelsea striker,Sports


## Preprocessing the data

Before we can feed those texts to our model, we need to preprocess them. This is done by a Transformers **Tokenizer**, which (as the name indicates) tokenizes the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary), puts them in the format the model expects, and generates the other inputs that the model requires.

To do all of this, we instantiate our tokenizer with the **AutoTokenizer.from_pretrained** method, which will ensure:

- we get a tokenizer that corresponds to the model architecture we want to use,
- we download the vocabulary used when pretraining this specific checkpoint.

That vocabulary will be cached, so it's not downloaded again the next time we run the cell.

In [5]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

**AutoTokenizer.from_pretrained** uses **use_fast=True** by default, so the call above returns one of the fast tokenizers (backed by Rust) from the Tokenizers library. Those fast tokenizers are available for almost all models, but if you got an error with the previous call, pass use_fast=False instead.

In [None]:
tokenizer("Hello, this one sentence!", "And this sentence goes with it.")

{'input_ids': [101, 7592, 1010, 2023, 2028, 6251, 999, 102, 1998, 2023, 6251, 3632, 2007, 2009, 1012, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

Depending on the model you selected, you will see different keys in the dictionary returned by the cell above. They don't matter much for what we're doing here (just know they are required by the model we will instantiate later). You can learn more about them in the Transformers preprocessing tutorial if you're interested.
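To make the three keys in the output above concrete, here is a toy sketch that reproduces that exact dictionary without the library. It is not how the real tokenizer works internally: the vocabulary `TOY_VOCAB` is hypothetical and contains only the IDs visible in the output above, and the splitting rule ignores WordPiece sub-word handling. It only illustrates the layout `[CLS] first [SEP] second [SEP]` and how token_type_ids and attention_mask line up with it.

```python
import re

# Toy vocabulary (an assumption for illustration): only the token IDs that
# appear in the output above. The real vocabulary is far larger.
TOY_VOCAB = {
    "[CLS]": 101, "[SEP]": 102,
    "hello": 7592, ",": 1010, "this": 2023, "one": 2028,
    "sentence": 6251, "!": 999, "and": 1998, "goes": 3632,
    "with": 2007, "it": 2009, ".": 1012,
}


def toy_tokenize(text):
    """Lowercase, then split into words and punctuation (no sub-word splitting)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def toy_encode(first, second):
    """Encode a sentence pair as [CLS] first [SEP] second [SEP]."""
    segment_a = ["[CLS]"] + toy_tokenize(first) + ["[SEP]"]
    segment_b = toy_tokenize(second) + ["[SEP]"]
    return {
        # One vocabulary ID per token
        "input_ids": [TOY_VOCAB[token] for token in segment_a + segment_b],
        # 0 marks the first sentence, 1 marks the second
        "token_type_ids": [0] * len(segment_a) + [1] * len(segment_b),
        # 1 marks a real token; padding positions would be 0
        "attention_mask": [1] * (len(segment_a) + len(segment_b)),
    }


encoded = toy_encode("Hello, this one sentence!", "And this sentence goes with it.")
print(encoded)
```

The attention mask is all ones here because nothing was padded; once padding is applied, the padded positions get 0 so the model can ignore them.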

To preprocess our dataset, we need the name of the column containing the text. In ag_news that column is text, so the following function tokenizes it, padding every example to the model's maximum length and truncating longer ones:

In [6]:
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)



The results are automatically cached by the Datasets library to avoid spending time on this step the next time you run your notebook. The Datasets library is normally smart enough to detect when the function you pass to map has changed (and thus that the cached data must not be used). For instance, it will detect if you edit tokenize_function and rerun the notebook. Datasets warns you when it uses cached files; you can pass load_from_cache_file=False in the call to map to ignore the cached files and force the preprocessing to be applied again.

Note that we passed batched=True to encode the texts in batches. This leverages the full benefit of the fast tokenizer we loaded earlier, which uses multi-threading to process the texts in a batch concurrently.
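The sketch below illustrates what batched mapping means for the function you write: instead of one example, the function receives a dictionary of columns, each holding a list of values, which is why tokenize_function can pass examples["text"] straight to the tokenizer. The helper `map_batched` and the stand-in `toy_tokenize_function` are hypothetical names written for this illustration; they mimic the calling convention only and are not the Datasets implementation.

```python
def map_batched(rows, function, batch_size=1000):
    """Apply `function` to column-wise batches and merge the new columns into each row."""
    mapped = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # A batch is a dict of columns, e.g. {"text": [...], "label": [...]}
        batch = {key: [row[key] for row in chunk] for key in chunk[0]}
        new_columns = function(batch)
        for i, row in enumerate(chunk):
            merged = dict(row)
            merged.update({key: values[i] for key, values in new_columns.items()})
            mapped.append(merged)
    return mapped


def toy_tokenize_function(examples):
    # Hypothetical stand-in for the real tokenizer: count whitespace-separated words.
    return {"num_words": [len(text.split()) for text in examples["text"]]}


rows = [
    {"text": "Late Kalou goal seals Feyenoord win", "label": 1},
    {"text": "Juve Lead Is Cut", "label": 1},
    {"text": "Time will pay in AOL settlement", "label": 2},
]

mapped = map_batched(rows, toy_tokenize_function, batch_size=2)
for row in mapped:
    print(row)
```

Because the function sees many texts at once, a fast tokenizer can process them together rather than one call per example.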

In [7]:
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))



## Transformers model

### Fine-tuning the model

Now that our data is ready, we can download the pretrained model and fine-tune it. Since our task is text classification, we use the AutoModelForSequenceClassification class. Like with the tokenizer, the from_pretrained method will download and cache the model for us. The only thing we have to specify is the number of labels for our problem, which is 4 for ag_news (World, Sports, Business, Sci/Tech).

In [None]:
model_bert = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

The warning is telling us we are throwing away some weights (the cls.predictions and cls.seq_relationship layers) and randomly initializing some others (the classifier layer). This is absolutely normal in this case, because we are removing the heads used to pretrain the model on the masked language modeling and next sentence prediction objectives and replacing them with a new head for which we don't have pretrained weights. The library therefore warns us that we should fine-tune this model before using it for inference, which is exactly what we are going to do.

To instantiate a Trainer, we will need to define two more things. The most important is the TrainingArguments, which is a class that contains all the attributes to customize the training. It requires one folder name, which will be used to save the checkpoints of the model, and all other arguments are optional:

In [None]:
batch_size = 16

In [None]:
training_args = TrainingArguments(
    output_dir="test_trainer",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    optim="adamw_torch",
    # learning_rate=2e-5,
    # weight_decay=0.01,
    # adam_beta1=0.9,
    # adam_beta2=0.999,
    # adam_epsilon=1e-08,
    # per_device_train_batch_size=batch_size,
    # per_device_eval_batch_size=batch_size,
    num_train_epochs=1,
    load_best_model_at_end=True,
    report_to="wandb",
)



Here we set evaluation and checkpoint saving to happen at the end of each epoch, select the PyTorch AdamW optimizer, and train for a single epoch. The learning rate, weight decay, Adam hyperparameters, and batch sizes are left at their defaults; the commented-out lines show where to override them (for example with the batch_size defined above). Since the best model might not be the one at the end of training, we ask the Trainer to load the best model it saved at the end of training.
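As a rough check of how these settings translate into training length, the sketch below computes the number of optimizer steps. It assumes the per-device batch size of 8, the single device, and the gradient accumulation of 1 that the training logs in this notebook report; the function name `total_optimization_steps` is made up for this illustration and is not a Trainer API.

```python
import math


def total_optimization_steps(num_examples, per_device_batch_size, num_epochs,
                             num_devices=1, gradient_accumulation_steps=1):
    """Estimate how many optimizer updates a training run performs."""
    effective_batch_size = per_device_batch_size * num_devices
    # The last batch of an epoch may be smaller, hence the ceiling.
    batches_per_epoch = math.ceil(num_examples / effective_batch_size)
    # One optimizer update happens every `gradient_accumulation_steps` batches.
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    return updates_per_epoch * num_epochs


bert_steps = total_optimization_steps(1000, 8, num_epochs=1)
adapter_steps = total_optimization_steps(1000, 8, num_epochs=2)
print(bert_steps)     # 125
print(adapter_steps)  # 250
```

Uncommenting per_device_train_batch_size=batch_size with batch_size = 16 would roughly halve these counts for the same number of epochs.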


The last thing to define for our Trainer is how to compute the metrics from the predictions. We need to define a function for this, which uses metrics loaded with evaluate.load. The only preprocessing we have to do is to take the argmax of our predicted logits:

In [8]:
accuracy_metric = evaluate.load("accuracy")
recall_metric = evaluate.load('recall')
precision_metric = evaluate.load("precision")
f1_metric = evaluate.load("f1")

In [9]:
def compute_metrics(eval_pred):
    # eval_pred unpacks into the model logits and the reference labels
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit
    predictions = np.argmax(logits, axis=-1)
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)['accuracy']
    # Note: with average='micro' on single-label data, recall, precision and F1 all equal accuracy
    recall = recall_metric.compute(predictions=predictions, references=labels, average='micro')['recall']
    precision = precision_metric.compute(predictions=predictions, references=labels, average='micro')['precision']
    f1 = f1_metric.compute(predictions=predictions, references=labels, average='micro')['f1']

    return {'accuracy': accuracy, 'recall': recall, 'precision': precision, 'f1': f1}
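One consequence of average='micro' is worth seeing in numbers: for single-label multiclass data, micro-averaged precision, recall, and F1 all coincide with accuracy, so the four reported values carry the same information. The sketch below shows this with a plain-Python reimplementation on made-up logits for 4 classes. The helper names and the data are hypothetical, and the code is an illustration rather than the evaluate library's implementation.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)


def per_class_counts(predictions, labels, num_classes):
    """True positives, false positives and false negatives for each class."""
    tp, fp, fn = [0] * num_classes, [0] * num_classes, [0] * num_classes
    for predicted, actual in zip(predictions, labels):
        if predicted == actual:
            tp[actual] += 1
        else:
            fp[predicted] += 1  # wrongly predicted this class
            fn[actual] += 1     # missed the true class
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Made-up logits for six examples and four classes
logits = [
    [2.0, 0.1, 0.3, 0.2],
    [0.1, 1.5, 0.2, 0.3],
    [0.2, 0.1, 0.4, 1.9],
    [1.2, 0.3, 0.9, 0.1],
    [0.1, 0.2, 0.3, 2.2],
    [0.3, 0.2, 1.1, 0.4],
]
labels = [0, 1, 3, 2, 3, 3]

predictions = [argmax(row) for row in logits]
tp, fp, fn = per_class_counts(predictions, labels, num_classes=4)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
# Micro: pool the counts over all classes, then compute one score
micro_f1 = f1_from_counts(sum(tp), sum(fp), sum(fn))
# Macro: compute one score per class, then average them
macro_f1 = sum(f1_from_counts(*counts) for counts in zip(tp, fp, fn)) / 4

print(accuracy, micro_f1, macro_f1)
```

If per-class behaviour matters, average='macro' gives every class equal weight and can differ noticeably from accuracy, as it does in this toy example.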

The Trainer also accepts a tokenizer argument, which may look redundant now that the data is preprocessed. When a tokenizer is passed, the Trainer uses it one last time to pad the samples in each batch to the same length, which requires knowing the model's preferences regarding padding (to the left or right? with which token?). The tokenizer has a pad method that handles this, and the Trainer will use it. Because every example here was already padded to the maximum length during preprocessing, the Trainer below is created without a tokenizer. You can customize this part by defining and passing your own data_collator, which receives samples like the dictionaries seen above and must return a dictionary of tensors.
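To illustrate the padding step, here is a minimal sketch of batch padding in plain Python. The function `pad_batch` is a hypothetical helper, and the defaults (pad token ID 0, padding on the right) are assumptions chosen to match BERT-style tokenizers; the real tokenizer's pad method also handles other keys and returns tensors.

```python
def pad_batch(batch_input_ids, pad_token_id=0, padding_side="right"):
    """Pad every sequence to the longest one in the batch and build attention masks."""
    max_length = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask = [], []
    for ids in batch_input_ids:
        num_padding = max_length - len(ids)
        padding = [pad_token_id] * num_padding
        real = [1] * len(ids)          # 1 marks a real token
        ignored = [0] * num_padding    # 0 marks a padding position
        if padding_side == "right":
            input_ids.append(list(ids) + padding)
            attention_mask.append(real + ignored)
        else:
            input_ids.append(padding + list(ids))
            attention_mask.append(ignored + real)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two already-tokenized samples of different lengths
batch = pad_batch([[101, 7592, 102], [101, 2023, 2028, 6251, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding per batch like this keeps sequences only as long as the longest sample in each batch, whereas padding="max_length" at preprocessing time pads everything to the model's maximum length up front.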

We can now create the Trainer and fine-tune our model by calling its train method:

In [None]:
trainer_bert = Trainer(
    model=model_bert,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

In [None]:
trainer_bert.train()

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 1000
  Num Epochs = 1
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 125
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: nedolivko. Use `wandb login --relogin` to force relogin


Epoch,Training Loss,Validation Loss


ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-18-632a3dae6568>", line 1, in <module>
    trainer_bert.train()
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1498, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1740, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 2470, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 2502, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/lo

KeyboardInterrupt: ignored

In [None]:
trainer_bert.evaluate()

## Adapters model

In [10]:
# Only the adapter-specific classes are new here; the other imports were loaded in the first cell.
from transformers import AutoAdapterModel
from transformers import AdapterTrainer

In [11]:
model_adapt = AutoAdapterModel.from_pretrained("bert-base-uncased")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertAdapterModel: ['cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertAdapterModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertAdapterModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [12]:
adapter_name = model_adapt.load_adapter("AdapterHub/bert-base-uncased-pf-sst2", source="hf")


In [13]:
model_adapt.set_active_adapters(adapter_name)

In [14]:
# ag_news has 4 classes; the head name is only a label, chosen here to match the dataset
model_adapt.add_classification_head("ag_news", num_labels=4)

In [15]:
model_adapt.train_adapter(adapter_name)

In [16]:
training_args = TrainingArguments(
    output_dir="test_trainer",
    learning_rate=1e-4,
    num_train_epochs=2,
)

trainer = AdapterTrainer(
    model=model_adapt,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)



In [None]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `BertAdapterModel.forward` and have been ignored: text. If text are not expected by `BertAdapterModel.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 1000
  Num Epochs = 2
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 250
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: nedolivko. Use `wandb login --relogin` to force relogin


Step,Training Loss
