## Adapter Training with Embeddings

The `adapters` library also lets you train embeddings together with your adapter, including embeddings for a completely different tokenizer. This can help, for example, when the model's original tokenizer is poorly suited to the language you are working with.

This notebook shows how to train embeddings for a new tokenizer using an example case. (Note that this is only an illustrative example that trains for a small number of steps, so the performance difference between the original and the new embeddings is very small.)

In [None]:
! pip install -U adapters
! pip install -q datasets
! pip install -q accelerate

Adding embeddings follows the same structure as adding adapters. Simply call `add_embeddings` and provide a name for the new embeddings and the tokenizer they should work with.

To copy the embeddings of tokens that are shared with another tokenizer, pass the name of the existing embeddings as `reference_embedding` (use `default` for the original embeddings of the loaded model) and pass the tokenizer those embeddings belong to as `reference_tokenizer`.

In [None]:
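from adapters import AutoAdapterModel
from transformers import AutoTokenizer

model_name = "roberta-base"

# Tokenizer for the new embeddings (Chinese vocabulary)
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-chinese")

# Tokenizer that matches the model's original ("default") embeddings
reference_tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoAdapterModel.from_pretrained(model_name)
model.add_adapter("a")
# Create embeddings "a" for the Chinese tokenizer; tokens shared with the
# original tokenizer are initialized from the "default" embeddings
model.add_embeddings("a", tokenizer, reference_embedding="default", reference_tokenizer=reference_tokenizer)
model.add_classification_head("a", num_labels=2)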
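
To make the idea of copying shared-token embeddings concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the library's actual implementation: the helper name `init_embeddings_from_reference`, the random initialization, and the toy vocabularies are all hypothetical. With real tokenizers, the token-to-index mappings would come from `tokenizer.get_vocab()`.

```python
import random


def init_embeddings_from_reference(new_vocab, ref_vocab, ref_matrix, seed=0):
    """Build an embedding table for new_vocab, copying rows of shared tokens.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    dim = len(ref_matrix[0])
    # Start with small random vectors for every token in the new vocabulary.
    new_matrix = [
        [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(len(new_vocab))
    ]
    # Overwrite the rows of tokens that also exist in the reference vocabulary.
    shared = set(new_vocab) & set(ref_vocab)
    for token in shared:
        new_matrix[new_vocab[token]] = list(ref_matrix[ref_vocab[token]])
    return new_matrix, shared


# Toy vocabularies (token -> row index) and a toy reference embedding table.
ref_vocab = {"[PAD]": 0, "hello": 1, "world": 2}
ref_matrix = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
new_vocab = {"[PAD]": 0, "你好": 1, "world": 2, "hello": 3}

new_matrix, shared = init_embeddings_from_reference(new_vocab, ref_vocab, ref_matrix)
print(sorted(shared))
print(new_matrix[new_vocab["hello"]])
```

Tokens that appear only in the new vocabulary (here `你好`) keep their random initialization and have to be learned during training.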

To activate a set of embeddings, call `set_active_embeddings` with the name of the embeddings you want to use.

In [3]:
model.set_active_embeddings("a")

To train the embeddings, pass `train_embeddings=True` to the `train_adapter` method. This activates the passed adapter setup and freezes all weights except the adapter weights and the embedding weights (make sure the correct embeddings are activated with `set_active_embeddings`).

In [4]:
model.train_adapter("a", train_embeddings=True)
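
One way to check which weights remain trainable after this call is to walk over the named parameters and count those with `requires_grad` set. The sketch below is an assumption-laden illustration: `summarize_trainable` is a hypothetical helper, and the stand-in parameters (with made-up names) only mimic the `numel()` and `requires_grad` interface of PyTorch parameters so that the example runs without a model.

```python
from types import SimpleNamespace


def summarize_trainable(named_parameters):
    """Return (trainable, total) parameter counts and the trainable tensor names."""
    trainable, total, names = 0, 0, []
    for name, param in named_parameters:
        count = param.numel()
        total += count
        if param.requires_grad:
            trainable += count
            names.append(name)
    return trainable, total, names


def fake_param(count, requires_grad):
    """Stand-in for a parameter; exposes only numel() and requires_grad."""
    return SimpleNamespace(numel=lambda: count, requires_grad=requires_grad)


# Made-up parameter names, for illustration only.
params = [
    ("encoder.layer.0.attention.weight", fake_param(100, False)),
    ("encoder.layer.0.adapters.a.weight", fake_param(10, True)),
    ("embeddings.a.weight", fake_param(40, True)),
]

trainable, total, names = summarize_trainable(params)
print(f"{trainable}/{total} parameters trainable: {names}")
```

With the real model, the same helper could be called as `summarize_trainable(model.named_parameters())`.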

Next, we load and preprocess the dataset.

In [6]:
from datasets import load_dataset

dataset = load_dataset("shibing624/nli_zh", "ATEC")
dataset

DatasetDict({
    train: Dataset({
        features: ['sentence1', 'sentence2', 'label'],
        num_rows: 62477
    })
    validation: Dataset({
        features: ['sentence1', 'sentence2', 'label'],
        num_rows: 20000
    })
    test: Dataset({
        features: ['sentence1', 'sentence2', 'label'],
        num_rows: 20000
    })
})

In [7]:
def encode_batch(batch):
    """Encodes a batch of sentence pairs with `tokenizer` (the Chinese tokenizer loaded above)."""
    return tokenizer(batch["sentence1"], batch["sentence2"], max_length=80, truncation=True, padding="max_length")

# Encode the input data
dataset = dataset.map(encode_batch, batched=True)
# The transformers model expects the target class column to be named "labels"
dataset = dataset.rename_column(original_column_name="label", new_column_name="labels")
# Transform to PyTorch tensors and only output the required columns
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
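
The effect of `max_length=80, truncation=True, padding="max_length"` is that every encoded example has exactly the same length, with an attention mask marking which positions are real tokens. The following simplified sketch shows the idea on plain lists. The helper `pad_or_truncate` and the padding ID of 0 are assumptions for illustration; a real tokenizer also takes special tokens and sentence pairs into account when truncating.

```python
def pad_or_truncate(input_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(input_ids[:max_length])  # drop tokens beyond max_length
    mask = [1] * len(ids)  # 1 marks a real token
    padding = max_length - len(ids)
    # Fill the remaining positions with padding; 0 in the mask marks padding.
    return ids + [pad_id] * padding, mask + [0] * padding


short_ids, short_mask = pad_or_truncate([101, 7, 8, 102], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(10)), max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```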

The training setup is the same as when training only the adapter.

In [None]:
import numpy as np
from transformers import TrainingArguments, EvalPrediction
from adapters import AdapterTrainer

training_args = TrainingArguments(
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_steps=200,
    output_dir="./training_output",
    overwrite_output_dir=True,
    remove_unused_columns=False,
    # A real training run would probably need more steps; this value is
    # kept small for illustration and so that the notebook runs in Colab
    max_steps=5000,
)

def compute_accuracy(p: EvalPrediction):
  preds = np.argmax(p.predictions, axis=1)
  return {"acc": (preds == p.label_ids).mean()}

trainer = AdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    compute_metrics=compute_accuracy,
)


In [None]:
trainer.train()

In [10]:
trainer.evaluate()

{'eval_loss': 0.4556490182876587,
 'eval_acc': 0.81615,
 'eval_runtime': 89.8272,
 'eval_samples_per_second': 222.65,
 'eval_steps_per_second': 6.958,
 'epoch': 2.56}

You can dynamically change the embeddings. For instance, to evaluate with the original embedding you can simply do the following:

In [11]:
model.set_active_embeddings("default")

In [12]:
trainer.evaluate()

{'eval_loss': 0.5493476390838623,
 'eval_acc': 0.81535,
 'eval_runtime': 88.7409,
 'eval_samples_per_second': 225.375,
 'eval_steps_per_second': 7.043,
 'epoch': 2.56}
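
To put the two evaluation results side by side, the short calculation below uses only the numbers reported above and the validation split size of 20,000 examples. The variable names are made up for this illustration.

```python
# Metrics copied from the two evaluation outputs above.
new_emb = {"eval_loss": 0.4556490182876587, "eval_acc": 0.81615}
default_emb = {"eval_loss": 0.5493476390838623, "eval_acc": 0.81535}
num_eval_examples = 20000  # size of the validation split

acc_gap = new_emb["eval_acc"] - default_emb["eval_acc"]
loss_gap = default_emb["eval_loss"] - new_emb["eval_loss"]
# Convert the accuracy gap into a number of validation examples.
extra_correct = round(acc_gap * num_eval_examples)

print(f"accuracy gap: {acc_gap * 100:.2f} percentage points")
print(f"additional correct predictions: {extra_correct}")
print(f"loss gap: {loss_gap:.4f}")
```

The accuracy gap corresponds to only 16 of the 20,000 validation examples, which matches the note at the top that the difference is very small after such a short training run.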

This notebook provides a toy example of how to add, train, and switch embeddings. For more information, see our [documentation](https://docs.adapterhub.ml/embeddings.html) and the [EmbeddingAdaptersMixin](https://docs.adapterhub.ml/classes/model_mixins.html#embeddingadaptersmixin).