
Add support for Task Arithmetics #698

Open · wants to merge 13 commits into main
Conversation

@lenglaender (Member) commented on May 8, 2024

This PR adds support for various task arithmetic options for LoRA. Until now, our library supported averaging only by linearly combining different adapters. This may be insufficient, especially for LoRA — hence, several publications have proposed other ways to perform task arithmetic.

This PR:

  • makes it easier to implement different weighting methods
  • adds two additional merging methods for LoRA following these papers (see the usage sketch below)
  • adds a method for merging prediction heads
  • adds documentation & a notebook
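A rough usage sketch of the merging API described here (a sketch only, based on the diff below; the exact argument names such as `adapter_list`, `combine_strategy` and `svd_rank` may differ in the final version):

```python
import adapters
from transformers import AutoModel

# Load a base model and enable the adapters library on it.
model = AutoModel.from_pretrained("roberta-base")
adapters.init(model)

# Two LoRA adapters; in practice these would be trained on different tasks.
model.add_adapter("task_a", config="lora")
model.add_adapter("task_b", config="lora")

# Merge them into a new adapter using one of the strategies added in this PR.
# combine_strategy and svd_rank are taken from this PR's diff (assumed API).
model.average_adapter(
    adapter_name="merged",
    adapter_list=["task_a", "task_b"],
    weights=[0.7, 0.3],
    combine_strategy="lora_delta_w_svd",
    svd_rank=8,
)
model.set_active_adapters("merged")
```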

@lenglaender changed the title from "WIP: Add support for Task Arithmetics" to "Add support for Task Arithmetics" on Jul 4, 2024
@lenglaender marked this pull request as ready for review on July 4, 2024
@lenglaender requested review from calpt, TimoImhof and hSterz and removed the review request for calpt and TimoImhof on July 10, 2024
@calpt (Member) left a comment


This looks very good overall!

Looked over everything except for the notebook and left some comments.

@@ -57,6 +57,9 @@ cd adapters
pip install .
```

> **Note**: The _Adapters_ library has replaced the [`adapter-transformers`](https://github.com/adapter-hub/adapter-transformers-legacy) package. All previously trained adapters are compatible with the new library. For transitioning, please read: https://docs.adapterhub.ml/transitioning.html.

Why was this moved down? I'd prefer to keep it close to the top, since there might still be adapter-transformers users who are redirected to this page.

@@ -36,9 +36,9 @@ A Unified Library for Parameter-Efficient and Modular Transfer Learning
[![GitHub](https://img.shields.io/github/license/adapter-hub/adapters.svg?color=blue)](https://github.com/adapter-hub/adapters/blob/main/LICENSE)
[![PyPI](https://img.shields.io/pypi/v/adapters)](https://pypi.org/project/adapters/)

_Adapters_ is an add-on library to [HuggingFace's Transformers](https://github.com/huggingface/transformers), integrating [various adapter methods](https://docs.adapterhub.ml/overview.html) into [state-of-the-art pre-trained language models](https://docs.adapterhub.ml/model_overview.html) with minimal coding overhead for training and inference.
_Adapters_ is an add-on library to [HuggingFace's Transformers](https://github.com/huggingface/transformers), integrating [10+ adapter methods](https://docs.adapterhub.ml/overview.html) into [20+ state-of-the-art Transformer models](https://docs.adapterhub.ml/model_overview.html) with minimal coding overhead for training and inference. _Adapters_ provides a unified interface for efficient fine-tuning and modular transfer learning, supporting a myriad of features like full-precision or quantized training (e.g. [Q-LoRA, Q-Bottleneck Adapters, or Q-PrefixTuning](https://github.com/Adapter-Hub/adapters/blob/main/notebooks/QLoRA_Llama_Finetuning.ipynb)), [adapter merging via task arithmetics](https://docs.adapterhub.ml/adapter_composition.html#merging-adapters) or the composition of multiple adapters via [composition blocks](https://docs.adapterhub.ml/adapter_composition.html), allowing advanced research in parameter-efficient transfer learning for NLP tasks.

Suggested change
_Adapters_ is an add-on library to [HuggingFace's Transformers](https://github.com/huggingface/transformers), integrating [10+ adapter methods](https://docs.adapterhub.ml/overview.html) into [20+ state-of-the-art Transformer models](https://docs.adapterhub.ml/model_overview.html) with minimal coding overhead for training and inference. _Adapters_ provides a unified interface for efficient fine-tuning and modular transfer learning, supporting a myriad of features like full-precision or quantized training (e.g. [Q-LoRA, Q-Bottleneck Adapters, or Q-PrefixTuning](https://github.com/Adapter-Hub/adapters/blob/main/notebooks/QLoRA_Llama_Finetuning.ipynb)), [adapter merging via task arithmetics](https://docs.adapterhub.ml/adapter_composition.html#merging-adapters) or the composition of multiple adapters via [composition blocks](https://docs.adapterhub.ml/adapter_composition.html), allowing advanced research in parameter-efficient transfer learning for NLP tasks.
_Adapters_ is an add-on library to [HuggingFace's Transformers](https://github.com/huggingface/transformers), integrating [10+ adapter methods](https://docs.adapterhub.ml/overview.html) into [20+ state-of-the-art Transformer models](https://docs.adapterhub.ml/model_overview.html) with minimal coding overhead for training and inference.
_Adapters_ provides a unified interface for efficient fine-tuning and modular transfer learning, supporting a myriad of features like full-precision or quantized training (e.g. [Q-LoRA, Q-Bottleneck Adapters, or Q-PrefixTuning](https://github.com/Adapter-Hub/adapters/blob/main/notebooks/QLoRA_Llama_Finetuning.ipynb)), [adapter merging via task arithmetics](https://docs.adapterhub.ml/adapter_composition.html#merging-adapters) or the composition of multiple adapters via [composition blocks](https://docs.adapterhub.ml/adapter_composition.html), allowing advanced research in parameter-efficient transfer learning for NLP tasks.

@@ -156,7 +159,7 @@ Currently, adapters integrates all architectures and methods listed below:
| UniPELT | [Mao et al. (2022)](https://arxiv.org/pdf/2110.07577.pdf) | [Docs](https://docs.adapterhub.ml/method_combinations.html#unipelt) |
| Prompt Tuning | [Lester et al. (2021)](https://aclanthology.org/2021.emnlp-main.243/) | [Docs](https://docs.adapterhub.ml/methods.html#prompt-tuning) |
| QLoRA | [Dettmers et al. (2023)](https://arxiv.org/pdf/2305.14314.pdf) | [Notebook](https://colab.research.google.com/github/Adapter-Hub/adapters/blob/main/notebooks/QLoRA_Llama_Finetuning.ipynb) |
| ReFT | [Wu et al. (2024)](https://arxiv.org/pdf/2404.03592) | [Docs](https://docs.adapterhub.ml/methods.html#reft) |
| ReFT | [Wu et al. (2024)](https://arxiv.org/pdf/2404.03592) | [Docs](https://docs.adapterhub.ml/methods.html#reft) | |

Add task arithmetics paper here?

As this process is typically not done dynamically at runtime, `adapters` provides `average_adapter()` as a dedicated method for parameter averaging.
In the example below, the parameters of the adapters `m`, `n` and `o` are averaged (with weights `0.1` `0.6` and `0.3`, respectively) to create a new adapter `avg`.
Note that for this to succeed, all averaged adapters must use the same adapter configuration.
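The example referenced above is not part of this diff hunk; a minimal sketch of what the call looks like (assuming the existing `average_adapter()` signature with an explicit `weights` list):

```python
# Create a new adapter "avg" as the weighted average of "m", "n" and "o".
model.average_adapter(
    adapter_name="avg",
    adapter_list=["m", "n", "o"],
    weights=[0.1, 0.6, 0.3],
)
```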
### Merging Adapters

Maybe we can think about adding a separate doc page for merging to make it more discoverable?

            else:
                avg_state_dict[k] = zhang_weight * v

        elif combine_strategy == "lora_delta_w_svd":

Can we move this strategy to a helper method since the implementation is slightly lengthy?

def _average_shared_parameters(self, adapter_name: str, input_adapters: Dict[str, float], combine_strategy: str):
    if combine_strategy != "linear":
        raise ValueError(
            f"Combine strategy {combine_strategy} not supported for Compacter. Only 'linear' is supported."

Suggested change
f"Combine strategy {combine_strategy} not supported for Compacter. Only 'linear' is supported."
f"Combine strategy {combine_strategy} not supported for shared parameters. Only 'linear' is supported."

Comment on lines +1394 to +1395
def _get_head_config_hash(config):
    return get_adapter_config_hash({k: v for k, v in config.items() if k not in keys_to_ignore})

Couldn't this directly call the hash method with the new ignore param?

        except ValueError as ex:
            self.delete_adapter(adapter_name)
            raise ex
        if set_active:
            self.set_active_adapters(adapter_name)

    def average_head(

Should this method live in the flexible heads mixin since it only applies to those model classes?

self.assertTrue(torch.allclose(expected_A, merged_lora.lora_A, atol=1e-5))
self.assertTrue(torch.allclose(expected_B, merged_lora.lora_B, atol=1e-5))

# 2. if we merge multiple adapters with weight 0 except one adapter with weight 1, the resulting adapter should be the same as the adapter with weight 1

I feel like cases 1 and 2 should live in separate test classes here.

Comment on lines +218 to +236
if isinstance(module, LoRALayer):
    if f"{name}_0" in module.loras and f"{combine_strategy}_case1" in module.loras:
        original_lora = module.loras[f"{name}_0"]
        merged_lora = module.loras[f"{combine_strategy}_case1"]

        # Compute SVD of the original delta_w
        u, s, v = torch.svd(original_lora.delta_w)
        u = u[:, :svd_rank]
        s = s[:svd_rank]
        v = v[:, :svd_rank]

        # Reconstruct A and B matrices
        expected_A = v.t()
        expected_B = u @ torch.diag(s)

        # Compare with merged adapter
        self.assertTrue(torch.allclose(expected_A, merged_lora.lora_A, atol=1e-5))
        self.assertTrue(torch.allclose(expected_B, merged_lora.lora_B, atol=1e-5))


Maybe move the SVD part to a helper method, since it is repeated multiple times in this test class?
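One possible shape for such a helper (a sketch only, not part of this PR; the method name is hypothetical, the body mirrors the existing test code):

```python
def assert_lora_svd_merge(self, original_lora, merged_lora, svd_rank, atol=1e-5):
    # Truncated SVD of the original adapter's delta_w.
    u, s, v = torch.svd(original_lora.delta_w)
    u, s, v = u[:, :svd_rank], s[:svd_rank], v[:, :svd_rank]

    # Expected LoRA factors reconstructed from the truncated SVD.
    expected_A = v.t()
    expected_B = u @ torch.diag(s)

    self.assertTrue(torch.allclose(expected_A, merged_lora.lora_A, atol=atol))
    self.assertTrue(torch.allclose(expected_B, merged_lora.lora_B, atol=atol))
```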

@hSterz (Member) left a comment

Looks good to me. I just have some small questions.

@@ -94,28 +94,6 @@ def add_adapter(self, adapter_name: str, layer_idx: int) -> bool:

        return False

    def average_adapter(self, adapter_name: str, input_adapters: Dict[str, float]) -> bool:

Where did this method go?


class DebertaModelAdaptersMixin(BertModelAdaptersMixin):
    # Same as BERT, except that Deberta does not support the "lora_delta_w_svd" combine_strategy
    support_lora_delta_w_svd = False

Why does DeBERTa not support lora_delta_w_svd?

\Phi_{merged} = \sum_{i=0}^{N} \lambda_i \Phi_i
$$

2. `combine_strategy = "lora_linear_only_negate_b"`: Following [Zhang et al. (2023)](https://proceedings.neurips.cc/paper_files/paper/2023/hash/299a08ee712d4752c890938da99a77c6-Abstract-Conference.html), this method applies negative weights only to the B matrix; the A matrix is always weighted with the absolute value of the weight:
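In formula form, this corresponds to (a sketch based on the description above; $A_i$, $B_i$ are the LoRA factors of adapter $i$ and $\lambda_i$ its weight):

$$
A_{merged} = \sum_{i=0}^{N} |\lambda_i| A_i, \qquad B_{merged} = \sum_{i=0}^{N} \lambda_i B_i
$$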

The "only" in the name is redundant. I would remove it to make the name shorter.

In the example below, the parameters of the adapters `m`, `n` and `o` are averaged (with weights `0.1` `0.6` and `0.3`, respectively) to create a new adapter `avg`.
Note that for this to succeed, all averaged adapters must use the same adapter configuration.
### Merging Adapters
We can create new adapters by combining the parameters of multiple trained adapters, i.e. merging multiple existing adapters into a new one. The `average_adapter()` method provides this functionality:
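The concrete call is not shown in this hunk; a minimal sketch, including how a negative weight can be used to subtract a skill (the adapter names are hypothetical; the strategy name comes from this PR's docs):

```python
# Keep the "polite_style" skill and subtract part of the "toxic_style" skill.
model.average_adapter(
    adapter_name="detoxified",
    adapter_list=["polite_style", "toxic_style"],
    weights=[1.0, -0.5],
    combine_strategy="lora_linear_only_negate_b",
    normalize_weights=False,
)
```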

Maybe we can briefly say why this is a cool feature (averaging adapters trained on the same task can increase performance, and combining adapters for different tasks can strengthen or reduce specific skills/features).

                expected_new_head_weights[base_k] += weight * v

        # Average the heads
        model.average_head(

Where is this `average_head` method implemented?
