# 1️⃣ Training an Adapter for a Transformer model

In this notebook, we train an adapter for a **RoBERTa** model for sequence classification on a **sentiment analysis** task using [adapter-transformers](https://github.com/Adapter-Hub/adapter-transformers), the _AdapterHub_ adaptation of HuggingFace's _transformers_ library.

We train a **Task Adapter** for a pre-trained model here. Most of the code is identical to a full finetuning setup using HuggingFace's transformers. For comparison, have a look at the [same guide using full finetuning](https://colab.research.google.com/drive/1brXJg5Mokm8h3shxqPRnoIsRwHQoncus?usp=sharing).

For training, we use the [movie review dataset by Pang and Lee (2005)](http://www.cs.cornell.edu/people/pabo/movie-review-data/). It contains movie reviews from Rotten Tomatoes, each labeled as either positive or negative. We download the dataset via HuggingFace's [datasets](https://github.com/huggingface/datasets) library.

## Installation

First, let's install the required libraries:

In [None]:
!pip install git+https://github.com/Adapter-Hub/adapter-transformers.git
!pip install datasets

## Dataset Preprocessing

Before we start to train our adapter, we first prepare the training data. Our training dataset can be loaded via HuggingFace `datasets` using one line of code:

In [None]:
from datasets import load_dataset

dataset = load_dataset("rotten_tomatoes")
dataset.num_rows


Using custom data configuration default



Downloading and preparing dataset rotten_tomatoes_movie_review/default (download: 476.34 KiB, generated: 1.28 MiB, post-processed: Unknown size, total: 1.75 MiB) to /root/.cache/huggingface/datasets/rotten_tomatoes_movie_review/default/1.0.0/9198dbc50858df8bdb0d5f18ccaf33125800af96ad8434bc8b829918c987ee8a...



Dataset rotten_tomatoes_movie_review downloaded and prepared to /root/.cache/huggingface/datasets/rotten_tomatoes_movie_review/default/1.0.0/9198dbc50858df8bdb0d5f18ccaf33125800af96ad8434bc8b829918c987ee8a. Subsequent calls will reuse this data.


{'test': 1066, 'train': 8530, 'validation': 1066}

Every dataset sample has an input text and a binary label:

In [None]:
dataset['train'][0]

{'label': 1,
 'text': 'the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .'}

Now we need to encode all dataset samples into valid inputs for our Transformer model. Since we want to train on `roberta-base`, we load the corresponding `RobertaTokenizer`. Using `dataset.map()`, we can pass the full dataset through the tokenizer in batches:

In [None]:
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

def encode_batch(batch):
  """Encodes a batch of input data using the model tokenizer."""
  return tokenizer(batch["text"], max_length=80, truncation=True, padding="max_length")

# Encode the input data
dataset = dataset.map(encode_batch, batched=True)
# The transformers model expects the target class column to be named "labels"
dataset = dataset.rename_column("label", "labels")
# Transform to pytorch tensors and only output the required columns
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
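
To make the effect of `max_length`, `truncation=True` and `padding="max_length"` more concrete, here is a small dependency-free sketch. It is only an illustration: `toy_encode`, `PAD_ID` and the whitespace "tokenizer" with made-up ids are hypothetical stand-ins, not RoBERTa's actual byte-level BPE vocabulary.

```python
# Toy illustration only: a whitespace "tokenizer" with made-up ids,
# not the real RoBERTa vocabulary.
PAD_ID = 0  # hypothetical padding id used only in this sketch


def toy_encode(text, max_length):
    """Truncate or pad a list of token ids to exactly max_length entries."""
    # Hypothetical id assignment: 1 + token length stands in for a vocabulary lookup.
    ids = [1 + len(tok) for tok in text.split()]
    ids = ids[:max_length]                       # corresponds to truncation=True
    n_real = len(ids)
    n_pad = max_length - n_real
    return {
        "input_ids": ids + [PAD_ID] * n_pad,     # corresponds to padding="max_length"
        "attention_mask": [1] * n_real + [0] * n_pad,
    }


short = toy_encode("a fun movie", max_length=5)
long = toy_encode("one of the best films of the year", max_length=5)
print(short)
print(long)
```

Every encoded sample ends up with the same length, and the attention mask marks which positions hold real tokens (1) and which are padding (0). This is why the columns can later be stacked into fixed-size PyTorch tensors.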





Now we're ready to train our model...

## Training

We use a pre-trained RoBERTa model from HuggingFace. We use `RobertaModelWithHeads`, a class unique to `adapter-transformers`, which allows us to add and configure prediction heads in a more flexible way.

In [None]:
from transformers import RobertaConfig, RobertaModelWithHeads

config = RobertaConfig.from_pretrained(
    "roberta-base",
    num_labels=2,
    id2label={ 0: "👎", 1: "👍"},
)
model = RobertaModelWithHeads.from_pretrained(
    "roberta-base",
    config=config,
)


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModelWithHeads: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModelWithHeads from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing RobertaModelWithHeads from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModelWithHeads were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and infere

**Here comes the important part!**

We add a new adapter to our model by calling `add_adapter()`. We pass a name (`"rotten_tomatoes"`) and [the type of adapter](https://docs.adapterhub.ml/adapters.html#adapter-types) (task adapter). Next, we add a binary classification head. It's convenient to give the prediction head the same name as the adapter. This allows us to activate both together in the next step. The `train_adapter()` method does two things:

1. It freezes all weights of the pre-trained model so only the adapter weights are updated during training.
2. It activates the adapter and the prediction head such that both are used in every forward pass.

In [None]:
from transformers import AdapterType

# Add a new adapter
model.add_adapter("rotten_tomatoes", AdapterType.text_task)
# Add a matching classification head
model.add_classification_head("rotten_tomatoes", num_labels=2)
# Activate the adapter
model.train_adapter("rotten_tomatoes")
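
The freezing step described above can be pictured with a toy sketch. This is not the library's implementation: the parameter names and the `train_only` helper are hypothetical, and the sketch assumes that the prediction head sharing the adapter's name stays trainable alongside the adapter.

```python
# Toy model: parameter name -> trainable flag. All names are hypothetical.
params = {
    "roberta.encoder.layer.0.attention.weight": True,
    "roberta.encoder.layer.0.adapters.rotten_tomatoes.down.weight": True,
    "roberta.encoder.layer.0.adapters.rotten_tomatoes.up.weight": True,
    "heads.rotten_tomatoes.weight": True,
}


def train_only(params, name):
    """Return a copy in which only parameters belonging to `name` stay trainable."""
    return {p: (f".{name}." in p) for p in params}


frozen = train_only(params, "rotten_tomatoes")
trainable = [p for p, flag in frozen.items() if flag]
print(trainable)
```

In this picture, the pre-trained attention weights are switched off for gradient updates, while everything registered under the name `rotten_tomatoes` remains trainable.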

For training, we make use of the `Trainer` class built into `transformers`. We configure the training process using a `TrainingArguments` object and define a function that calculates the evaluation accuracy at the end. We pass both, together with the training and validation splits of our dataset, to the trainer instance.

**Note the differences in hyperparameters compared to full finetuning.** Adapter training usually requires a few more training epochs than full finetuning.

In [None]:
import numpy as np
from transformers import TrainingArguments, Trainer, EvalPrediction

training_args = TrainingArguments(
    learning_rate=1e-4,
    num_train_epochs=6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_steps=200,
    output_dir="./training_output",
    overwrite_output_dir=True,
)

def compute_accuracy(p: EvalPrediction):
  preds = np.argmax(p.predictions, axis=1)
  return {"acc": (preds == p.label_ids).mean()}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_accuracy,
)
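
To see what `compute_accuracy` calculates, here is a dependency-free sketch of the same argmax-then-compare logic on a handful of made-up logits. The `argmax` and `accuracy` helpers and the toy values are hypothetical and only mirror the NumPy version above.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the gold label."""
    preds = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Four made-up examples with two class scores each (index 0 = negative, 1 = positive)
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
labels = [0, 1, 0, 0]
acc = accuracy(logits, labels)
print(acc)
```

Three of the four toy predictions match their labels, so the sketch reports an accuracy of 0.75.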

Start the training 🚀

In [None]:
trainer.train()



{'loss': 0.4757337951660156, 'learning_rate': 8.75156054931336e-05, 'epoch': 0.7490636704119851, 'total_flos': 390226384896000, 'step': 200}




{'loss': 0.3162423706054687, 'learning_rate': 7.503121098626716e-05, 'epoch': 1.4981273408239701, 'total_flos': 779599149575040, 'step': 400}




{'loss': 0.2839324951171875, 'learning_rate': 6.254681647940075e-05, 'epoch': 2.247191011235955, 'total_flos': 1168971914254080, 'step': 600}
{'loss': 0.27122039794921876, 'learning_rate': 5.006242197253434e-05, 'epoch': 2.9962546816479403, 'total_flos': 1559198299150080, 'step': 800}




{'loss': 0.24722808837890625, 'learning_rate': 3.7578027465667915e-05, 'epoch': 3.7453183520599254, 'total_flos': 1948571063829120, 'step': 1000}




{'loss': 0.235828857421875, 'learning_rate': 2.50936329588015e-05, 'epoch': 4.49438202247191, 'total_flos': 2337943828508160, 'step': 1200}




{'loss': 0.22592971801757813, 'learning_rate': 1.2609238451935081e-05, 'epoch': 5.2434456928838955, 'total_flos': 2727316593187200, 'step': 1400}
{'loss': 0.21846298217773438, 'learning_rate': 1.2484394506866418e-07, 'epoch': 5.992509363295881, 'total_flos': 3117542978083200, 'step': 1600}




TrainOutput(global_step=1602, training_loss=0.2841355875041452)
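
The step count and the logged learning rates can be reproduced with a little arithmetic. The sketch below assumes a single device, no gradient accumulation and a linear decay to zero without warmup; under those assumptions it matches the values in the log. The `linear_lr` helper is hypothetical.

```python
import math

train_examples = 8530   # size of the train split reported by dataset.num_rows
batch_size = 32         # per_device_train_batch_size
epochs = 6              # num_train_epochs
base_lr = 1e-4          # learning_rate

# The last batch of each epoch is smaller, so we round up.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs


def linear_lr(step):
    """Learning rate after `step` updates under linear decay to zero."""
    return base_lr * (total_steps - step) / total_steps


for step in (200, 400, 1600):
    print(step, step / steps_per_epoch, linear_lr(step))
```

This gives 267 steps per epoch and 1602 steps in total, in line with `global_step=1602`.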

Looks good! Let's evaluate our adapter on the validation split of the dataset to see how well it learned:

In [None]:
trainer.evaluate()



{'eval_loss': 0.29718441593043127, 'eval_acc': 0.8855534709193246, 'epoch': 6.0, 'total_flos': 3120591621715200, 'step': 1602}


{'epoch': 6.0,
 'eval_acc': 0.8855534709193246,
 'eval_loss': 0.29718441593043127,
 'total_flos': 3120591621715200}
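
As a quick sanity check, the reported accuracy can be translated back into a number of correctly classified reviews. The sketch only uses figures that appear in this notebook: the validation split size, the evaluation batch size and `eval_acc`.

```python
import math

val_examples = 1066              # size of the validation split
eval_batch_size = 32             # per_device_eval_batch_size
eval_acc = 0.8855534709193246    # value returned by trainer.evaluate()

correct = round(eval_acc * val_examples)
wrong = val_examples - correct
eval_batches = math.ceil(val_examples / eval_batch_size)

print(correct, wrong, eval_batches)
```

So the adapter classifies 944 of the 1066 validation reviews correctly.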

We can put our trained model into a `transformers` pipeline to be able to make new predictions conveniently:

In [None]:
from transformers import TextClassificationPipeline

classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, device=training_args.device.index)

classifier("This is awesome!")

[{'label': '👍', 'score': 0.9945221543312073}]
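
The `label` and `score` fields can be understood as the most probable class and its probability. The following sketch assumes the pipeline applies a softmax over the two logits of the classification head. The logits are made up, and `softmax` and `to_prediction` are hypothetical helpers, not part of `transformers`.

```python
import math

# Same mapping we passed to RobertaConfig above
id2label = {0: "👎", 1: "👍"}


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Pick the most probable class and report its label and probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


pred = to_prediction([-2.6, 2.6])  # hypothetical logits for a clearly positive input
print(pred)
```

A large gap between the two logits yields a score close to 1, which is the kind of confident prediction we see for "This is awesome!".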

Finally, we can extract the adapter from our model and save it separately for later reuse. Note the size difference compared to a full model!

In [None]:
model.save_adapter("./final_adapter", "rotten_tomatoes")

!ls -lh final_adapter

total 5.8M
-rw-r--r-- 1 root root  631 Nov 16 10:53 adapter_config.json
-rw-r--r-- 1 root root  344 Nov 16 10:53 head_config.json
-rw-r--r-- 1 root root 3.5M Nov 16 10:53 pytorch_adapter.bin
-rw-r--r-- 1 root root 2.3M Nov 16 10:53 pytorch_model_head.bin
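
To put the size difference into numbers, we can compare the saved files with the `roberta-base` checkpoint, whose download was 501,200,538 bytes in this run. This is a rough estimate only: the file sizes are the rounded values printed by `ls -lh`, and the two small JSON config files are ignored.

```python
MIB = 1024 * 1024

full_checkpoint_bytes = 501200538   # roberta-base checkpoint download size in this run
adapter_mib = 3.5                   # pytorch_adapter.bin, rounded by ls -lh
head_mib = 2.3                      # pytorch_model_head.bin, rounded by ls -lh

saved_bytes = (adapter_mib + head_mib) * MIB
ratio = saved_bytes / full_checkpoint_bytes

print(f"full checkpoint: {full_checkpoint_bytes / MIB:.0f} MiB")
print(f"adapter + head:  {adapter_mib + head_mib:.1f} MiB")
print(f"share of full checkpoint: {ratio:.1%}")
```

The adapter and its head together take up only about one percent of the full checkpoint, which is what makes sharing many task adapters for one base model practical.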


**Share your work!**

The next step after training is to share our adapter with the world via _AdapterHub_. [Read our guide](https://docs.adapterhub.ml/contributing.html) on how to prepare the adapter module we just saved and contribute it to the Hub!

➡️ Also continue with [the next Colab notebook](https://colab.research.google.com/drive/1ovA1_ENGU1TT4T6nz2bW2bzq8-Lg8mMW?usp=sharing) to learn how to use adapters from the Hub.