This notebook is taken from: https://adapterhub.ml

For this guide, we select **CommitmentBank** ([De Marneffe et al., 2019](https://github.com/mcdm/CommitmentBank)), a three-class textual entailment dataset, as our target task. We will fuse [adapters from AdapterHub](https://adapterhub.ml/explore/) that were pre-trained on different tasks. During training, their representations are kept fixed while a newly introduced fusion layer is trained. As our base model, we will use BERT (`bert-base-uncased`).

## Installation

Again, we install `adapter-transformers` and HuggingFace's `datasets` library first:

In [12]:
!pip install -U adapter-transformers
!pip install datasets



In [13]:
import torch
torch.cuda.is_available()

True

In [14]:
from google.colab import drive
drive.mount("/content/gdrive/", force_remount=True)

Mounted at /content/gdrive/


In [15]:
import sys
sys.path.append('/content/gdrive/MyDrive/master_hpi/NLP_Project/code/')

In [16]:
path = "/content/gdrive/MyDrive/master_hpi/NLP_Project/code/"


## Dataset Preprocessing

Before setting up training, we first prepare the training data. CommitmentBank is part of the SuperGLUE benchmark and can be loaded via HuggingFace `datasets` using one line of code:

In [17]:
from datasets import load_dataset

dataset = load_dataset("super_glue", "cb")
dataset.num_rows

Reusing dataset super_glue (/root/.cache/huggingface/datasets/super_glue/cb/1.0.2/2fb163bca9085c1deb906aff20f00c242227ff704a4e8c9cfdfe820be3abfc83)


{'test': 250, 'train': 250, 'validation': 56}

Every dataset sample has a premise, a hypothesis and a three-class label:

In [18]:
dataset['train'].features

{'hypothesis': Value(dtype='string', id=None),
 'idx': Value(dtype='int32', id=None),
 'label': ClassLabel(num_classes=3, names=['entailment', 'contradiction', 'neutral'], names_file=None, id=None),
 'premise': Value(dtype='string', id=None)}

Now, we need to encode all dataset samples to valid inputs for our `bert-base-uncased` model. Using `dataset.map()`, we can pass the full dataset through the tokenizer in batches:

In [19]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode_batch(batch):
  """Encodes a batch of input data using the model tokenizer."""
  return tokenizer(
      batch["premise"],
      batch["hypothesis"],
      max_length=180,
      truncation=True,
      padding="max_length"
  )

# Encode the input data
dataset = dataset.map(encode_batch, batched=True)
# The transformers model expects the target class column to be named "labels".
# rename_column() returns a new dataset (the in-place rename_column_() is deprecated).
dataset = dataset.rename_column("label", "labels")
# Transform to pytorch tensors and only output the required columns
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc7043
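
To make the `max_length`, `truncation` and `padding` arguments of `encode_batch` more concrete, here is a minimal sketch of what they do to a premise/hypothesis pair. It is a toy illustration only: `encode_pair` is a hypothetical helper that splits on whitespace instead of using BERT's WordPiece vocabulary, and its truncation rule (drop tokens from the end of the longer segment) only approximates the tokenizer's behavior.

```python
def encode_pair(premise, hypothesis, max_length):
    """Toy stand-in for the tokenizer call: whitespace tokens instead of WordPiece."""
    if max_length < 3:
        raise ValueError("max_length must leave room for [CLS] and two [SEP] tokens")
    first, second = premise.split(), hypothesis.split()
    budget = max_length - 3  # three slots are reserved for the special tokens

    # Truncation: drop tokens from the end of the longer segment until the pair fits
    while len(first) + len(second) > budget:
        longer = first if len(first) >= len(second) else second
        longer.pop()

    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    attention_mask = [1] * len(tokens)

    # Padding: fill up to max_length; padded positions get attention mask 0
    num_pad = max_length - len(tokens)
    tokens += ["[PAD]"] * num_pad
    attention_mask += [0] * num_pad
    return {"tokens": tokens, "attention_mask": attention_mask}


# A short pair gets padded, a long pair gets truncated; both end up with 12 positions
short = encode_pair("Karen went home happy", "the work was difficult", max_length=12)
long = encode_pair(
    "She did not think that the work was difficult",
    "the work was difficult",
    max_length=12,
)
print(short["tokens"])
print(long["tokens"])
```

Because every encoded sample has the same length, the samples can be stacked into fixed-size tensors, and the attention mask tells the model which positions are padding.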

Now we're ready to set up AdapterFusion...

## Fusion Training

We use a pre-trained BERT model from HuggingFace and instantiate our model using `BertModelWithHeads`.

In [20]:
from transformers import BertConfig, BertModelWithHeads

id2label = {id: label for (id, label) in enumerate(dataset["train"].features["labels"].names)}

config = BertConfig.from_pretrained(
    "bert-base-uncased",
    id2label=id2label,
)
model = BertModelWithHeads.from_pretrained(
    "bert-base-uncased",
    config=config,
)

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "adapters": {
    "adapters": {},
    "config_map": {}
  },
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "entailment",
    "1": "contradiction",
    "2": "neutral"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.8.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_

Now we have everything set up to load our _AdapterFusion_ setup. First, we load three adapters pre-trained on different tasks from the Hub: MultiNLI, QQP and QNLI. As we don't need their prediction heads, we pass `with_head=False` to the loading method. Next, we add a new fusion layer that combines all the adapters we've just loaded. Finally, we add a new classification head for our target task on top.

In [21]:
from transformers.adapters.composition import Fuse

# Load the pre-trained adapters we want to fuse
model.load_adapter("nli/multinli@ukp", load_as="multinli", with_head=False)
model.load_adapter("sts/qqp@ukp", with_head=False)
model.load_adapter("nli/qnli@ukp", with_head=False)
# Add a fusion layer for all loaded adapters
model.add_adapter_fusion(Fuse("multinli", "qqp", "qnli"))
model.set_active_adapters(Fuse("multinli", "qqp", "qnli"))

# Add a classification head for our target task
model.add_classification_head("cb", num_labels=len(id2label))

No exactly matching adapter config found for this specifier, falling back to default.
Resolved adapter files at https://public.ukp.informatik.tu-darmstadt.de/AdapterHub/text_task/mnli/bert-base-uncased/pfeiffer/bert-base-uncased_nli_multinli_pfeiffer.zip.
Loading module configuration from ~/.cache/torch/adapters/2e2b596bb3b1b6db529d746b87272bda4c8892b0a26f6a960553852cc4378654-5ad6785a3c6c5d82b0a96c3612e27fccc2f710cd379f14326924e06e815c48eb-extracted/adapter_config.json
Adding adapter 'multinli'.
Loading module weights from ~/.cache/torch/adapters/2e2b596bb3b1b6db529d746b87272bda4c8892b0a26f6a960553852cc4378654-5ad6785a3c6c5d82b0a96c3612e27fccc2f710cd379f14326924e06e815c48eb-extracted/pytorch_adapter.bin
Some module weights could not be found in loaded weights file: invertible_adapters.multinli.F.0.weight, invertible_adapters.multinli.F.0.bias, invertible_adapters.multinli.F.2.weight, invertible_adapters.multinli.F.2.bias, invertible_adapters.multinli.G.0.weight, invertible_adapters.mul
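
As a rough intuition for what the fusion layer added above computes, the following sketch applies attention over the outputs of several adapters for a single token. It is a simplified illustration under stated assumptions, not the library's implementation: the function names, the two-dimensional vectors and the identity projection matrices are all made up for the example, and details of the real layer (per-layer parameters, residual connections, initialization) are left out.

```python
import math


def matvec(matrix, vector):
    """Multiplies a matrix (list of rows) with a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(hidden, adapter_outputs, w_query, w_key, w_value):
    """Combines adapter outputs with attention weights derived from the hidden state."""
    query = matvec(w_query, hidden)
    keys = [matvec(w_key, z) for z in adapter_outputs]
    values = [matvec(w_value, z) for z in adapter_outputs]
    # One score per adapter: dot product between the query and that adapter's key
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Weighted sum of the adapters' value vectors
    fused = [
        sum(w * value[d] for w, value in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return fused, weights


# Hypothetical toy inputs: one hidden state and the outputs of three adapters
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = [1.0, 0.0]
adapter_outputs = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

fused, weights = fuse(hidden, adapter_outputs, identity, identity, identity)
print(weights)
print(fused)
```

In this toy setting the first adapter receives the largest weight because its output is most aligned with the hidden state. During fusion training, only projections of this kind (and the classification head) are learned, while the adapters that produce the outputs stay fixed.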

The last preparation step is to define and activate our adapter setup. Similar to `train_adapter()`, `train_adapter_fusion()` does two things: it freezes all weights of the model (including the adapters!) except for the fusion layer and the classification head, and it activates the given adapter setup so that it is used in every forward pass.

The syntax for the adapter setup (which also applies to other methods such as `set_active_adapters()`) works as follows:

- a single string is interpreted as a single adapter
- a list of strings is interpreted as a __stack__ of adapters
- a _nested_ list of strings is interpreted as a __fusion__ of adapters

In [22]:
# Unfreeze and activate fusion setup
adapter_setup = Fuse("multinli", "qqp", "qnli")
model.train_adapter_fusion(adapter_setup)
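
One way to check the effect of the freezing is to count trainable versus frozen parameters. The sketch below assumes only that each parameter exposes `requires_grad` and `numel()`, as PyTorch parameters do. `count_parameters`, `FakeParam` and the toy parameter names are hypothetical and exist just to keep the example self-contained.

```python
class FakeParam:
    """Stand-in for a torch parameter; only mimics `requires_grad` and `numel()`."""

    def __init__(self, size, requires_grad):
        self._size = size
        self.requires_grad = requires_grad

    def numel(self):
        return self._size


def count_parameters(named_parameters):
    """Returns (trainable, frozen) parameter counts for (name, parameter) pairs."""
    trainable, frozen = 0, 0
    for _name, param in named_parameters:
        if param.requires_grad:
            trainable += param.numel()
        else:
            frozen += param.numel()
    return trainable, frozen


# Hypothetical toy model: frozen base weights and adapter, trainable fusion and head
toy_parameters = [
    ("encoder.weight", FakeParam(1000, requires_grad=False)),
    ("adapter.multinli.weight", FakeParam(50, requires_grad=False)),
    ("fusion.query.weight", FakeParam(20, requires_grad=True)),
    ("head.cb.weight", FakeParam(6, requires_grad=True)),
]
trainable, frozen = count_parameters(toy_parameters)
print(trainable, frozen)
```

On the real model, the same helper could be called as `count_parameters(model.named_parameters())`.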

For training, we make use of the `Trainer` class built into `transformers`. We configure the training process using a `TrainingArguments` object and define a method that will calculate the evaluation accuracy in the end. We pass both, together with the training and validation split of our dataset, to the trainer instance.

In [23]:
import numpy as np
from transformers import TrainingArguments, Trainer, EvalPrediction

training_args = TrainingArguments(
    learning_rate=5e-5,
    num_train_epochs=5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_steps=200,
    output_dir="./training_output",
    overwrite_output_dir=True,
    # The next line is important to ensure the dataset labels are properly passed to the model
    remove_unused_columns=False,
)

def compute_accuracy(p: EvalPrediction):
  preds = np.argmax(p.predictions, axis=1)
  return {"acc": (preds == p.label_ids).mean()}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_accuracy,
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


Start the training 🚀 (this will take a while)

In [24]:
trainer.train()

***** Running training *****
  Num examples = 250
  Num Epochs = 5
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 40


Step,Training Loss




Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=40, training_loss=0.7984646320343017, metrics={'train_runtime': 52.8561, 'train_samples_per_second': 23.649, 'train_steps_per_second': 0.757, 'total_flos': 180914605650000.0, 'train_loss': 0.7984646320343017, 'epoch': 5.0})
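
The step count in the log can be reproduced with a little arithmetic. This sketch assumes a single device and no gradient accumulation, which is what the log reports, and that the last, smaller batch of each epoch is kept.

```python
import math

num_examples = 250   # size of the CB training split
batch_size = 32      # per_device_train_batch_size
num_epochs = 5       # num_train_epochs
logging_steps = 200  # logging_steps

# 250 / 32 = 7.8125, so each epoch needs 8 batches (the last one is smaller)
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

# The training loss is only logged every `logging_steps` optimization steps
num_loss_logs = total_steps // logging_steps

print(steps_per_epoch, total_steps, num_loss_logs)
```

With only 40 optimization steps and `logging_steps=200`, no intermediate training loss is logged during this run; a smaller `logging_steps` value would be needed to see the loss curve.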

After training has completed, let's check how well our setup performs on the validation set of our target dataset:

In [25]:
trainer.evaluate()

***** Running Evaluation *****
  Num examples = 56
  Batch size = 32


{'epoch': 5.0,
 'eval_acc': 0.7321428571428571,
 'eval_loss': 0.7299303412437439,
 'eval_runtime': 1.0846,
 'eval_samples_per_second': 51.631,
 'eval_steps_per_second': 1.844}
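
To read the accuracy value, it helps to spell out what `compute_accuracy` does: take the highest-scoring class per example and compare it with the gold label. The sketch below mirrors that logic in plain Python; the helper names and the toy logits and labels are hypothetical and not taken from the dataset.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the gold label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical scores for four examples over the three CB classes
toy_logits = [
    [2.0, 0.1, -1.0],  # predicted class 0
    [0.3, 1.5, 0.2],   # predicted class 1
    [0.0, 0.2, 0.1],   # predicted class 1, gold label is 2
    [1.0, 0.9, 0.8],   # predicted class 0
]
toy_labels = [0, 1, 2, 0]
toy_acc = accuracy(toy_logits, toy_labels)
print(toy_acc)

# The reported eval_acc corresponds to 41 correct predictions out of 56
print(41 / 56)
```

Since the validation split has only 56 examples, a single prediction changes the accuracy by almost two percentage points, so small differences between runs should be interpreted with care.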

We can also use our setup to make some predictions (the example is from the test set of CB):

In [26]:
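import torch

def predict(premise, hypothesis):
  """Predicts the CB label for a single premise/hypothesis pair."""
  encoded = tokenizer(premise, hypothesis, return_tensors="pt")
  # Move the inputs to the device the model lives on (the GPU after training on Colab)
  encoded = encoded.to(model.device)
  # No gradients are needed for inference
  with torch.no_grad():
    logits = model(**encoded)[0]
  pred_class = torch.argmax(logits).item()
  return id2label[pred_class]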

predict("""
``It doesn't happen very often.'' Karen went home
happy at the end of the day. She didn't think that
the work was difficult.
""",
"the work was difficult"
)

'contradiction'

Finally, we can extract and save our fusion layer as well as all the adapters we used for training. Both can later be reloaded into the pre-trained model again.

In [27]:
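# Save the fusion layer; the second argument names the fused adapters
model.save_adapter_fusion(path + "models/fusion/", "multinli,qqp,qnli")
# Save every loaded adapter into its own subdirectory of the same folder
model.save_all_adapters(path + "models/fusion/")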

print(model.active_adapters)


!ls -l {path}models/fusion/

Configuration saved in /content/gdrive/MyDrive/master_hpi/NLP_Project/code/models/fusion/adapter_fusion_config.json
Module weights saved in /content/gdrive/MyDrive/master_hpi/NLP_Project/code/models/fusion/pytorch_model_adapter_fusion.bin
Configuration saved in /content/gdrive/MyDrive/master_hpi/NLP_Project/code/models/fusion/multinli/adapter_config.json
Module weights saved in /content/gdrive/MyDrive/master_hpi/NLP_Project/code/models/fusion/multinli/pytorch_adapter.bin
Configuration saved in /content/gdrive/MyDrive/master_hpi/NLP_Project/code/models/fusion/qqp/adapter_config.json
Module weights saved in /content/gdrive/MyDrive/master_hpi/NLP_Project/code/models/fusion/qqp/pytorch_adapter.bin
Configuration saved in /content/gdrive/MyDrive/master_hpi/NLP_Project/code/models/fusion/qnli/adapter_config.json
Module weights saved in /content/gdrive/MyDrive/master_hpi/NLP_Project/code/models/fusion/qnli/pytorch_adapter.bin


Fuse[multinli, qqp, qnli]
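
Reloading could look roughly like the sketch below. It relies on the directory layout visible in the save log (one subdirectory per adapter, fusion files at the top level) and on the `load_adapter()` and `load_adapter_fusion()` methods of `adapter-transformers`. `reload_fusion_setup` itself is a hypothetical helper, and whether further steps are needed (for example re-adding the classification head) is not covered here.

```python
import os


def reload_fusion_setup(model, base_dir, adapter_names):
    """Loads saved adapters and their fusion layer back into `model`.

    Assumes each adapter was saved to `base_dir/<adapter name>` and the
    fusion layer directly to `base_dir`, as in the save log above.
    """
    # The adapters have to be present before the fusion layer that combines them
    for name in adapter_names:
        model.load_adapter(os.path.join(base_dir, name))
    model.load_adapter_fusion(base_dir)
    return model


# Usage on a freshly instantiated model (not executed here):
# reload_fusion_setup(model, path + "models/fusion/", ["multinli", "qqp", "qnli"])
```

After reloading, the setup still has to be activated, for instance with `model.set_active_adapters(Fuse("multinli", "qqp", "qnli"))` as done earlier in this notebook.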


That's it. Do check out [the paper on AdapterFusion](https://arxiv.org/pdf/2005.00247.pdf) for a more theoretical view on what we've just seen.

➡️ `adapter-transformers` also enables other composition methods beyond AdapterFusion. For example, check out [the next notebook in this series](https://colab.research.google.com/github/Adapter-Hub/adapter-transformers/blob/master/notebooks/04_Cross_Lingual_Transfer.ipynb) on cross-lingual transfer.