Unify & simplify model adapter integration #263
Conversation
Force-pushed from 6245d31 to d9c06ec.
Force-pushed from 60c6e48 to e062a16.
Force-pushed from 059e9ad to 560fc79.
These look like some great changes that will make things easier. I just have a few small questions :)
```python
        else:
            hidden_states = hidden_states + input_tensor

        return hidden_states

    def forward(self, hidden_states, input_tensor, layer_norm):
```
What is the advantage of passing it with every forward call instead of keeping the layer norm as an attribute of the class?
The layer norm is usually a child module of the Transformer block module (or part of it, as e.g. for BERT). As the AdapterLayer is usually also a child module of the same block module, we would have to keep a reference to an attribute of the parent module. As this makes the implementation less clean and poses some issues (cf. #228), I decided to pass the layer norm in the forward pass. I might have missed a better solution though, please let me know if there is one :)
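For illustration, here is a minimal sketch of that design. This is not the actual adapter-transformers code; all class names are made up for the example. The point is that the layer norm stays a child of the block, and the adapter layer receives it per call instead of holding a reference into its parent module:

```python
import torch
import torch.nn as nn

class AdapterLayerSketch(nn.Module):
    """Illustrative stand-in for an adapter layer module."""

    def forward(self, hidden_states, input_tensor, layer_norm):
        # ... adapter bottleneck computation would happen here ...
        hidden_states = hidden_states + input_tensor  # residual connection
        if layer_norm is not None:
            hidden_states = layer_norm(hidden_states)
        return hidden_states

class TransformerBlockSketch(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hidden_size)  # owned by the block
        self.adapter_layer = AdapterLayerSketch()    # sibling child module

    def forward(self, hidden_states, input_tensor):
        # No back-reference from the adapter layer to the block is needed:
        # the block simply hands its own norm over with each call.
        return self.adapter_layer(hidden_states, input_tensor, self.layer_norm)

block = TransformerBlockSketch(hidden_size=16)
x = torch.randn(2, 4, 16)  # (batch, seq, hidden)
out = block(x, x)
```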
```diff
 from ..model_mixin import InvertibleAdaptersMixin, ModelAdaptersMixin


 logger = logging.getLogger(__name__)


-class BertSelfOutputAdaptersMixin(AdapterLayerBaseMixin):
+# For backwards compatibility, BertSelfOutput inherits directly from AdapterLayer
+class BertSelfOutputAdaptersMixin(AdapterLayer):
```
Why do some models still have adapter mixins while others don't? E.g., in the BART implementation, the BartSelfAttentionAdaptersModule is completely removed.
Mainly for compatibility reasons. I didn't want to change the weight names and write yet another weight renaming method, so mixins are kept wherever necessary.
However, we could go even further and also remove mixins such as BartEncoderLayerAdaptersMixin and just add the code to the main modeling classes, as they mainly consist of creating child modules 🤔
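To make the suggestion concrete, here is a hedged sketch (illustrative names, not the library's actual API) of what such a layer mixin typically boils down to. Since it only instantiates child modules, the same few lines could live directly in the model's layer class:

```python
import torch.nn as nn

class AdapterLayerStub(nn.Module):
    """Stand-in for the real AdapterLayer module."""

    def __init__(self, location_key: str):
        super().__init__()
        self.location_key = location_key

class EncoderLayerAdaptersMixinSketch:
    """Mixed into an encoder layer; it merely creates child modules."""

    def _init_adapter_modules(self):
        # the two typical adapter locations within a Transformer layer
        self.attention_adapters = AdapterLayerStub("mh_adapter")
        self.output_adapters = AdapterLayerStub("output_adapter")
```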
* `AdapterLayer` module
* Simplify adapter model mixin implementations
* Remove unused imports
* Fix AdapterSetup context test
Note: Depends on #257
Background

Currently, we have a lot of redundant code in the model mixins of each model class integration (`src/transformers/adapters/models`).

Solution

This PR does a range of refactoring with the goal to unify and simplify the adapter integrations into the currently supported models:
- `AdapterLayerBaseMixin` has been converted to a torch module, `AdapterLayer`, with a proper `forward()` method. This makes the integration into most models cleaner (except for (Ro)BERT(a)/T5) as they don't require specific module mixins anymore.
- The layer norm is now passed to the `forward()` call of `AdapterLayer` instead of being a class attribute. This removes the need for a couple of model integration workarounds.
- All common adapter management functionality (`train_adapter()`, `train_adapter_fusion()`, `add_adapter()`, `get_adapter()` etc.) has been pulled up to `ModelAdaptersMixin`. All of these methods are now model-agnostic and have been removed from each model-specific mixin.
- A new `iter_layers()` method has been introduced in `ModelAdaptersMixin`. In the best case (e.g. DistilBERT), this is the only method that has to be implemented by each model mixin (a sketch of this pattern follows at the end of this description).
- A new helper for applying a function to each `AdapterLayer` has been added (`ModelAdaptersMixin.apply_to_adapter_layers()`). This means a base model mixin can directly call `AdapterLayer` methods. All intermediate `add_adapter()`, `enable_adapters()` etc. methods in each model layer are not necessary anymore.
- All intermediate mixins between `AdapterLayer` and the base model mixin have been removed (e.g. `BertEncoderAdaptersMixin`).
- Tensor adjustments for the Parallel composition block are now handled by `adjust_tensors_for_parallel()` in `composition.py`.

Breaking changes

- The output of `get_adapter()` has changed.
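As referenced in the list above, here is a hedged sketch of how `iter_layers()` and `apply_to_adapter_layers()` could fit together. The class and method names follow this description, but every body is an illustrative assumption, not the actual implementation:

```python
from typing import Callable, Iterable, Tuple
import torch.nn as nn

class AdapterLayerStub(nn.Module):
    """Stand-in for the real AdapterLayer module."""

    def add_adapter(self, adapter_name: str, layer_idx: int):
        print(f"adding adapter '{adapter_name}' to layer {layer_idx}")

class ModelAdaptersMixinSketch:
    def iter_layers(self) -> Iterable[Tuple[int, nn.Module]]:
        """The one model-specific hook: yield (index, layer) pairs."""
        raise NotImplementedError

    def apply_to_adapter_layers(self, fn: Callable):
        # Walk every layer and call fn on each adapter layer found in it,
        # replacing the per-layer add_adapter()/enable_adapters() plumbing.
        for i, layer in self.iter_layers():
            for module in layer.modules():
                if isinstance(module, AdapterLayerStub):
                    fn(i, module)

    def add_adapter(self, adapter_name: str):
        # Model-agnostic: delegates straight to the adapter layers.
        self.apply_to_adapter_layers(lambda i, m: m.add_adapter(adapter_name, i))

# A DistilBERT-like integration would then only implement iter_layers():
class DistilBertLikeMixinSketch(ModelAdaptersMixinSketch):
    def __init__(self, transformer_layers: nn.ModuleList):
        self.transformer_layers = transformer_layers

    def iter_layers(self):
        return enumerate(self.transformer_layers)

model = DistilBertLikeMixinSketch(nn.ModuleList([AdapterLayerStub() for _ in range(2)]))
model.add_adapter("my_adapter")
```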