# Adalayers

Adaptive layer selection on frozen transformer models for efficient downstream task solving.


## Best results

Note: results depend heavily on the base model's domain. A task-specific model could certainly have achieved even higher scores, but such a comparison would not be fair.

As the table below shows, a single frozen base model combined with the proposed algorithm already achieves strong results:

| Dataset | Base model    | Score                   |
|---------|---------------|-------------------------|
| IMDB    | RoBERTa-large | 96.1% acc (SOTA level)  |
| CoLA    | RoBERTa-large | 83.6% acc (SOTA level)  |
| CoNLL   | RoBERTa-large | 89.4% F1                |

## Problem

Consider the case where you already have a functioning SOTA-level large transformer model and need to solve a different task on the same data: e.g., speech recognition + emotion recognition, text summarization + NER, etc.

One possible solution is to use a second model. However, deploying a second transformer model requires significant resources, and training a single model on both tasks jointly is not always feasible and may degrade the main task's metrics.

## Solution

Let's reuse the transformer's hidden states! It is well known that different layers of a transformer extract features at different levels of abstraction. Therefore, by combining the hidden features effectively, we can achieve good results.

Moreover, in this setup the base model stays intact, and since its computations are reused, the proposed algorithm is highly computationally efficient.
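
To make this concrete, here is a minimal sketch (not the repository's code) of extracting every layer's hidden states from a frozen Hugging Face model; `roberta-large` matches the base model from the results table:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the base model once; it stays frozen for all downstream tasks
model = AutoModel.from_pretrained("roberta-large")
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
for param in model.parameters():
    param.requires_grad = False  # the base model is never fine-tuned

inputs = tokenizer("A film worth watching.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Tuple of (num_layers + 1) tensors of shape (batch, seq_len, hidden_dim):
# the embedding output plus the output of every transformer layer
hidden_states = outputs.hidden_states
```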

The general idea is presented in the image below:

[figure: algo]

The code for F can be found in `adalayers/models/ada_layers_base.py`. All models are implemented with Hugging Face interfaces.
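
For illustration, here is a minimal sketch of one plausible form of F: a learned softmax-weighted combination of the per-layer hidden states. This is an assumption about the general shape of the approach, not the repository's exact implementation; see `adalayers/models/ada_layers_base.py` for the real one.

```python
import torch
import torch.nn as nn

class AdaptiveLayerCombination(nn.Module):
    """Sketch of F: a learned convex combination of layer outputs."""

    def __init__(self, num_layers: int):
        super().__init__()
        # One trainable logit per layer; softmax turns them into weights,
        # so the model learns which layers matter for the downstream task
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states: tuple[torch.Tensor, ...]) -> torch.Tensor:
        # hidden_states: one (batch, seq_len, dim) tensor per layer
        stacked = torch.stack(hidden_states, dim=0)        # (layers, batch, seq, dim)
        weights = torch.softmax(self.layer_logits, dim=0)  # (layers,)
        # Weighted sum over the layer axis -> (batch, seq, dim)
        return torch.einsum("l,lbsd->bsd", weights, stacked)
```

Only these combination weights (plus a small task-specific head) need training; the frozen base model's forward pass is reused as-is.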

## Launch

Training is configurable via omegaconf + hydra. Configs can be found in `configs`.

The environment is Poetry-controlled. You can set it up by calling `poetry install`.
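
Assuming Poetry is already installed, setup and an in-environment launch look like this (standard Poetry commands, nothing project-specific assumed):

```bash
poetry install   # install all dependencies into a Poetry-managed virtualenv
poetry run python adalayers/training/run.py --config-name adalayers_imdb.yaml
```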

You can launch a simple training run by calling `adalayers/training/run.py`:

```bash
python adalayers/training/run.py --config-name adalayers_imdb.yaml
```
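
Since training is configured with Hydra, individual config values can also be overridden directly on the command line. The override key below is hypothetical; the real keys are defined by the files in `configs`:

```bash
# Hydra-style override appended to the launch command (key name is illustrative)
python adalayers/training/run.py --config-name adalayers_imdb.yaml optimizer.lr=1e-4
```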
