# 6️⃣ Task Arithmetics
This notebook explains how Task Arithmetics works in the _Adapters_ library. Task Arithmetics was first proposed by [Ilharco et al. (2022)](https://openreview.net/forum?id=6t0Kwf8-jrj) in the paper "Editing models with task arithmetic". It merges multiple models into a new model by adding task vectors to, or subtracting them from, the base model, where a task vector is the difference between the weights of a fine-tuned model and the weights of the base model. For example, if you have an adapter trained to produce _less_ toxic results and use it with a negative weight during merging, the merged model becomes _more_ toxic.

Task Arithmetics works not only with full fine-tuning but also with adapters. We will reproduce Table 5 of the NeurIPS paper "Composing Parameter-Efficient Modules with Arithmetic Operation" by [Zhang et al. (2023)](https://proceedings.neurips.cc/paper_files/paper/2023/hash/299a08ee712d4752c890938da99a77c6-Abstract-Conference.html). There, the authors produced a classification adapter for Yelp by taking an Amazon classification adapter, subtracting an Amazon language adapter, and adding a Yelp language adapter. This means that the weights of the new adapter are calculated as follows:

$\varTheta^{\text{yelp\_cls}} = \varTheta^{\text{amazon\_cls}} - \varTheta^{\text{amazon\_lm}} + \varTheta^{\text{yelp\_lm}}$
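
To make the formula concrete, here is a minimal NumPy sketch that treats each adapter as a dictionary of parameter arrays and combines them with the weights 1, -1 and 1. The helper `combine_task_vectors` and the toy parameter name `layer0.delta` are hypothetical and only illustrate the arithmetic; they are not part of the _Adapters_ API.

```python
import numpy as np


def combine_task_vectors(adapters, weights):
    """Return the weighted sum of several parameter dictionaries.

    `adapters` is a list of dicts mapping parameter names to arrays of
    identical shapes; `weights` holds one scalar per adapter.
    """
    if len(adapters) != len(weights):
        raise ValueError("need exactly one weight per adapter")
    merged = {}
    for name in adapters[0]:
        merged[name] = sum(w * a[name] for a, w in zip(adapters, weights))
    return merged


# Toy stand-ins for the three adapters (random values, hypothetical names).
rng = np.random.default_rng(0)
shape = (4, 4)
amazon_cls = {"layer0.delta": rng.normal(size=shape)}
amazon_lm = {"layer0.delta": rng.normal(size=shape)}
yelp_lm = {"layer0.delta": rng.normal(size=shape)}

# theta_yelp_cls = theta_amazon_cls - theta_amazon_lm + theta_yelp_lm
yelp_cls = combine_task_vectors([amazon_cls, amazon_lm, yelp_lm], [1, -1, 1])
```

A weight of -1 subtracts an adapter and a weight of 1 adds it, which is the convention used by the `weights` argument later in this notebook.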

We will introduce 3 different methods for task arithmetics with LoRA and evaluate them on this task. So, let's start with setting everything up:

## Installation & Import
Install and import the required libraries.

In [None]:
!pip install transformers adapters datasets evaluate pandas scikit-learn

In [None]:
from transformers import AutoTokenizer
from datasets import load_dataset
from adapters import AutoAdapterModel
from evaluate import evaluator
import pandas as pd

## Load Model and Adapters

In [None]:
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
model = AutoAdapterModel.from_pretrained("google-t5/t5-base")

In [32]:
# Load all the adapters. Each adapter must have been trained for the same base model as the one loaded above.
model.load_adapter("AdapterHub/xlm-roberta-base-lora-lm-amazon-polarity", load_as="amazon_lm")
model.load_adapter("AdapterHub/xlm-roberta-base-lora-lm-yelp-polarity", load_as="yelp_lm")
model.load_adapter("AdapterHub/xlm-roberta-base-lora-cls-amazon-polarity", load_as="amazon_cls", id2label={0: "LABEL_0", 1: "LABEL_1"})
model.load_adapter("AdapterHub/xlm-roberta-base-lora-cls-yelp-polarity", load_as="yelp_cls", id2label={0: "LABEL_0", 1: "LABEL_1"})


'yelp_cls'

## Load Datasets
To reproduce the results of Table 5 of the NeurIPS paper "Composing Parameter-Efficient Modules with Arithmetic Operation" by [Zhang et al. (2023)](https://proceedings.neurips.cc/paper_files/paper/2023/hash/299a08ee712d4752c890938da99a77c6-Abstract-Conference.html), we evaluate Task Arithmetics on the Amazon Polarity and Yelp Polarity datasets. We therefore load the test split of both datasets first.

In [40]:
amazon_dataset = load_dataset("amazon_polarity", split="test")
yelp_dataset = load_dataset("yelp_polarity", split="test", trust_remote_code=True)

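To speed up evaluation, we only evaluate on 1000 random samples from each dataset. Since `shuffle()` is called without a seed, the subset (and therefore the exact scores) changes between runs; pass a `seed` to `shuffle()` if you need reproducible numbers. Remove the following cell to evaluate on the full datasets: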

In [41]:
amazon_dataset = amazon_dataset.shuffle().select(range(1000))
yelp_dataset = yelp_dataset.shuffle().select(range(1000))

## Evaluating Task Transfer

We will evaluate several merging methods using the text classification evaluator from the Hugging Face `evaluate` library. Since we need to call the evaluator for each setup, we define a small helper function that wraps the call:

In [37]:
task_evaluator = evaluator("text-classification")

def evaluate_adapter(model, adapter_to_use, active_head, dataset, tokenizer, input_column="content"):
    model.active_adapters = adapter_to_use
    model.active_head = active_head
    results = task_evaluator.compute(
        model_or_pipeline=model,
        tokenizer=tokenizer,
        data=dataset,
        label_column="label",
        input_column=input_column,
        label_mapping={"LABEL_0": 0, "LABEL_1": 1},
        metric="accuracy",
    )
    return results

### Evaluation on Yelp

As described at the beginning of this notebook, our aim is to create a classifier for the Yelp Polarity dataset like this:
$\varTheta^{\text{yelp\_cls}} = \varTheta^{\text{amazon\_cls}} - \varTheta^{\text{amazon\_lm}} + \varTheta^{\text{yelp\_lm}}$  
For a detailed explanation of the combination strategies you can also reference our [task arithmetics documentation](https://docs.adapterhub.ml/adapter_composition.html#merging-adapters).

In this notebook we use LoRA adapters. [LoRA](https://arxiv.org/abs/2106.09685) introduces $A$ and $B$ matrices with $\Delta W = BA$. The Task Arithmetics calculation of our new adapter with $A_{new}$ and $B_{new}$ matrices can be done in several different ways:
1. **Linear Combination (linear):** This straightforward method linearly combines the weights of multiple adapters. This "linear" task arithmetics variant for LoRA has been investigated in the Google paper "Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization" by [Chronopoulou et al. (2023)](https://arxiv.org/abs/2311.09344). For each adapter's weight matrix, we calculate the weighted sum like this:
   - $A_{new} = \sum_{i=0}^N \lambda_i \cdot A_i$ (The $\lambda_i$ are the weights of the individual adapters. In Task Arithmetics these weights are usually either 1 or -1.)
   - $B_{new} = \sum_{i=0}^N \lambda_i \cdot B_i$.
2. **Linear Combination, only B negative (lora_linear_only_negate_b):** LoRA is special since it does not train $\Delta W$ directly but splits it into the *A* and *B* matrices with $\Delta W = BA$. This, however, introduces strong dependencies between the weights in *A* and *B*. In the paper "Composing Parameter-Efficient Modules with Arithmetic Operation" by [Zhang et al. (2023)](https://proceedings.neurips.cc/paper_files/paper/2023/hash/299a08ee712d4752c890938da99a77c6-Abstract-Conference.html) the authors find that a negative weight cannot naively be applied to both matrices, because the two minus signs cancel out in the product: $(-B)(-A) = BA$. They propose to apply the negative weight only to the *B* matrix, i.e.:
   - $A_{new} = \sum_{i=0}^N |\lambda_i| \cdot A_i$
   - $B_{new} = \sum_{i=0}^N \lambda_i \cdot B_i$
3. **Merge delta W matrices with SVD (lora_delta_w_svd):** We also implemented a version that merges adapters by combining their $\Delta W$ matrices and then performing singular value decomposition (SVD) to split the combined matrix into new *A* and *B* matrices. The process is as follows:
   1. For every adapter *i* we calculate: $\Delta W_i = B_i \cdot A_i$
   2. $\Delta W_{new} = \sum_{i=0}^N \lambda_i \cdot \Delta W_i$
   3. Perform SVD on $\Delta W_{new}$ to obtain $A_{new}$ and $B_{new}$
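
The three strategies can be sketched in a few lines of NumPy. This is only an illustration of the formulas above, not the code used inside _Adapters_: the function names, the toy shapes, and the choice to absorb the singular values into $B_{new}$ are assumptions made for this example.

```python
import numpy as np


def merge_linear(As, Bs, weights):
    """'linear': apply each weight to both A and B."""
    A_new = sum(w * A for w, A in zip(weights, As))
    B_new = sum(w * B for w, B in zip(weights, Bs))
    return A_new, B_new


def merge_only_negate_b(As, Bs, weights):
    """'lora_linear_only_negate_b': |weight| for A, signed weight for B."""
    A_new = sum(abs(w) * A for w, A in zip(weights, As))
    B_new = sum(w * B for w, B in zip(weights, Bs))
    return A_new, B_new


def merge_delta_w_svd(As, Bs, weights, svd_rank):
    """'lora_delta_w_svd': sum the weighted Delta W, then refactor via truncated SVD."""
    delta_w = sum(w * (B @ A) for w, A, B in zip(weights, As, Bs))
    U, S, Vh = np.linalg.svd(delta_w, full_matrices=False)
    B_new = U[:, :svd_rank] * S[:svd_rank]  # singular values absorbed into B (one possible choice)
    A_new = Vh[:svd_rank, :]
    return A_new, B_new


# Three toy LoRA adapters: A has shape (rank, d_in), B has shape (d_out, rank).
rng = np.random.default_rng(0)
rank, d_in, d_out = 2, 8, 10
As = [rng.normal(size=(rank, d_in)) for _ in range(3)]
Bs = [rng.normal(size=(d_out, rank)) for _ in range(3)]

# Negating a single adapter: with "linear" the two minus signs cancel.
original = Bs[0] @ As[0]
A_lin, B_lin = merge_linear(As[:1], Bs[:1], [-1])
A_neg, B_neg = merge_only_negate_b(As[:1], Bs[:1], [-1])
print(np.allclose(B_lin @ A_lin, original))   # True: the sign is lost
print(np.allclose(B_neg @ A_neg, -original))  # True: the sign is preserved

# With svd_rank equal to the sum of the individual ranks, the SVD variant
# reproduces the weighted sum of the Delta W matrices.
weights = [1, -1, 1]
target = sum(w * (B @ A) for w, A, B in zip(weights, As, Bs))
A_svd, B_svd = merge_delta_w_svd(As, Bs, weights, svd_rank=3 * rank)
print(np.allclose(B_svd @ A_svd, target))     # True
```

Note that the two linear variants sum *A* and *B* separately, so their product generally differs from the weighted sum of the $\Delta W_i$ as soon as more than one adapter is involved. The SVD variant works on $\Delta W$ directly and only loses information when `svd_rank` is smaller than the rank of the merged matrix.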

While all adapters support the "linear" combination, "lora_linear_only_negate_b" and "lora_delta_w_svd" are only applicable to LoRA, hence the names. We will evaluate all three of these methods, so let's add an adapter for each of these LoRA task arithmetics versions:

In [34]:
# Linear
model.average_adapter(
    adapter_name="yelp_merge_linear",
    adapter_list=["amazon_cls", "amazon_lm", "yelp_lm"],
    weights=[1, -1, 1],
    combine_strategy="linear",
)

# Linear following Zhang et al. (2023)
model.average_adapter(
    adapter_name="yelp_merge_lora_linear_only_negate_b",
    adapter_list=["amazon_cls", "amazon_lm", "yelp_lm"],
    weights=[1, -1, 1],
    combine_strategy="lora_linear_only_negate_b",
)

# Merging the Delta W of the LoRA adapters and then applying SVD to split into A and B matrices.
model.average_adapter(
    adapter_name="yelp_merge_lora_delta_w_svd",
    adapter_list=["amazon_cls", "amazon_lm", "yelp_lm"],
    weights=[1, -1, 1],
    combine_strategy="lora_delta_w_svd",
    svd_rank=8,  # Note: lora_delta_w_svd requires an svd_rank. This is the LoRA rank of the merged adapter.
)

Now let's evaluate the different adapters!

In [42]:
yelp_results = pd.DataFrame(
    {
        "source": evaluate_adapter(model=model, adapter_to_use="amazon_cls", active_head="amazon_cls", dataset=yelp_dataset, tokenizer=tokenizer, input_column="text"),
        "target": evaluate_adapter(model=model, adapter_to_use="yelp_cls", active_head="yelp_cls", dataset=yelp_dataset, tokenizer=tokenizer, input_column="text"),
        "linear": evaluate_adapter(model=model, adapter_to_use="yelp_merge_linear", active_head="amazon_cls", dataset=yelp_dataset, tokenizer=tokenizer, input_column="text"),
        "lora_linear_only_negate_b": evaluate_adapter(model=model, adapter_to_use="yelp_merge_lora_linear_only_negate_b", active_head="amazon_cls", dataset=yelp_dataset, tokenizer=tokenizer, input_column="text"),
        "lora_delta_w_svd": evaluate_adapter(model=model, adapter_to_use="yelp_merge_lora_delta_w_svd", active_head="amazon_cls", dataset=yelp_dataset, tokenizer=tokenizer, input_column="text"),
    }
)

The model 'T5AdapterModel' is not supported for text-classification. Supported models are ['AlbertForSequenceClassification', 'BartForSequenceClassification', 'BertForSequenceClassification', 'BigBirdForSequenceClassification', 'BigBirdPegasusForSequenceClassification', 'BioGptForSequenceClassification', 'BloomForSequenceClassification', 'CamembertForSequenceClassification', 'CanineForSequenceClassification', 'LlamaForSequenceClassification', 'ConvBertForSequenceClassification', 'CTRLForSequenceClassification', 'Data2VecTextForSequenceClassification', 'DebertaForSequenceClassification', 'DebertaV2ForSequenceClassification', 'DistilBertForSequenceClassification', 'ElectraForSequenceClassification', 'ErnieForSequenceClassification', 'ErnieMForSequenceClassification', 'EsmForSequenceClassification', 'FalconForSequenceClassification', 'FlaubertForSequenceClassification', 'FNetForSequenceClassification', 'FunnelForSequenceClassification', 'GemmaForSequenceClassification', 'GPT2ForSequenceCl

As you can see in the output, Transformers logs an error claiming that the model is not supported for text-classification. This message is a false alarm; you can safely ignore it.

In [43]:
yelp_results.loc[["accuracy"]]

| | source | target | linear | lora_linear_only_negate_b | lora_delta_w_svd |
|---|---|---|---|---|---|
| accuracy | 0.964 | 0.977 | 0.957 | 0.91 | 0.968 |


As we can see, only `lora_delta_w_svd` beats the source adapter (0.968 vs. 0.964). This may differ if we evaluate on the full dataset instead of the 1000-example subset. Our results are close to the scores in Table 5 of the paper by [Zhang et al. (2023)](https://proceedings.neurips.cc/paper_files/paper/2023/hash/299a08ee712d4752c890938da99a77c6-Abstract-Conference.html).

### Evaluation on Amazon

We now repeat the experiment in the opposite direction and create a classifier for the Amazon Polarity dataset from the Yelp classification adapter:
$\varTheta^{\text{amazon\_cls}} = \varTheta^{\text{yelp\_cls}} - \varTheta^{\text{yelp\_lm}} + \varTheta^{\text{amazon\_lm}}$

In [44]:
# Linear
model.average_adapter(
    adapter_name="amazon_merge_linear",
    adapter_list=["yelp_cls", "yelp_lm", "amazon_lm"],
    weights=[1, -1, 1],
    combine_strategy="linear",
)

# Linear following Zhang et al. (2023)
model.average_adapter(
    adapter_name="amazon_merge_lora_linear_only_negate_b",
    adapter_list=["yelp_cls", "yelp_lm", "amazon_lm"],
    weights=[1, -1, 1],
    combine_strategy="lora_linear_only_negate_b",
)

# Merging the Delta W of the LoRA adapters and then applying SVD to split into A and B matrices.
model.average_adapter(
    adapter_name="amazon_merge_lora_delta_w_svd",
    adapter_list=["yelp_cls", "yelp_lm", "amazon_lm"],
    weights=[1, -1, 1],
    combine_strategy="lora_delta_w_svd",
    svd_rank=8,  # Note: lora_delta_w_svd requires an svd_rank. This is the LoRA rank of the merged adapter.
)

In [47]:
amazon_results = pd.DataFrame(
    {
        "source": evaluate_adapter(model=model, adapter_to_use="yelp_cls", active_head="yelp_cls", dataset=amazon_dataset, tokenizer=tokenizer, input_column="content"),
        "target": evaluate_adapter(model=model, adapter_to_use="amazon_cls", active_head="amazon_cls", dataset=amazon_dataset, tokenizer=tokenizer, input_column="content"),
        "linear": evaluate_adapter(model=model, adapter_to_use="amazon_merge_linear", active_head="yelp_cls", dataset=amazon_dataset, tokenizer=tokenizer, input_column="content"),
        "lora_linear_only_negate_b": evaluate_adapter(model=model, adapter_to_use="amazon_merge_lora_linear_only_negate_b", active_head="yelp_cls", dataset=amazon_dataset, tokenizer=tokenizer, input_column="content"),
        "lora_delta_w_svd": evaluate_adapter(model=model, adapter_to_use="amazon_merge_lora_delta_w_svd", active_head="yelp_cls", dataset=amazon_dataset, tokenizer=tokenizer, input_column="content"),
    }
)


In [48]:
amazon_results.loc[["accuracy"]]

| | source | target | linear | lora_linear_only_negate_b | lora_delta_w_svd |
|---|---|---|---|---|---|
| accuracy | 0.952 | 0.96 | 0.951 | 0.927 | 0.953 |
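
To compare both directions at a glance, the short snippet below copies the accuracies from the two result tables into a dictionary and lists the merge strategies that exceed the respective source adapter. The numbers are transcribed from this particular run on 1000 random samples, so they will vary with a different subset.

```python
# Accuracies transcribed from the two result tables above (1000-sample subsets).
accuracy = {
    "yelp": {"source": 0.964, "target": 0.977, "linear": 0.957,
             "lora_linear_only_negate_b": 0.91, "lora_delta_w_svd": 0.968},
    "amazon": {"source": 0.952, "target": 0.96, "linear": 0.951,
               "lora_linear_only_negate_b": 0.927, "lora_delta_w_svd": 0.953},
}
merged_strategies = ["linear", "lora_linear_only_negate_b", "lora_delta_w_svd"]

# Keep only the strategies whose accuracy is strictly above the source adapter.
better_than_source = {
    dataset: [s for s in merged_strategies if scores[s] > scores["source"]]
    for dataset, scores in accuracy.items()
}
print(better_than_source)
# {'yelp': ['lora_delta_w_svd'], 'amazon': ['lora_delta_w_svd']}
```

On Amazon the margin is 0.953 vs. 0.952, which corresponds to a single example out of 1000, so it is worth confirming the ranking on the full test set before drawing conclusions.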


## Averaging Heads
In the examples so far, we always combined three adapters: one classification adapter (with head) and two language adapters (without head).
However, for other tasks we may have multiple adapters with heads. This is currently only supported by _Adapters_, since Hugging Face PEFT does not allow you to **1\)** exchange heads or **2\)** merge heads. We can easily merge the heads of different adapters like this:

In [49]:
model.average_head(
    head_name="merged_head",
    head_list=["amazon_cls", "yelp_cls"],
    weights=[0.7, 0.3],
    set_active=True, # If you don't set the head active here, you can do it later with model.active_head = "<HEAD_NAME>"
)

This merges the weights of the heads linearly. As you can see, we now have a new active head:

In [50]:
model.active_head

'merged_head'
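
As a rough intuition for what linear head merging does, the NumPy sketch below merges two toy heads with the same weights (0.7 and 0.3). It assumes each head is a single linear layer, which is a simplification: real prediction heads may contain several layers and non-linearities, in which case the equivalence shown at the end does not hold exactly. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, num_labels = 8, 2

# Two toy single-layer heads (hypothetical stand-ins for the real heads).
W_amazon, b_amazon = rng.normal(size=(num_labels, hidden)), rng.normal(size=num_labels)
W_yelp, b_yelp = rng.normal(size=(num_labels, hidden)), rng.normal(size=num_labels)

# Linear merge with the same weights as in the cell above.
w_amazon, w_yelp = 0.7, 0.3
W_merged = w_amazon * W_amazon + w_yelp * W_yelp
b_merged = w_amazon * b_amazon + w_yelp * b_yelp

# For a purely linear head, merging the parameters equals mixing the logits.
x = rng.normal(size=hidden)
logits_merged = W_merged @ x + b_merged
logits_mixed = w_amazon * (W_amazon @ x + b_amazon) + w_yelp * (W_yelp @ x + b_yelp)
print(np.allclose(logits_merged, logits_mixed))  # True
```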