Instead of taking the softmax over the last dimension of the attention scores, which corresponds to the keys, we take it over the penultimate dimension, which corresponds to the queries. This is a pitfall that leaks information: it allows the transformer to receive information from future tokens in the masked sequence. For more information, take a look at this blog post, where I explain in detail why this happens.
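To make the pitfall concrete, here is a minimal, self-contained sketch contrasting the two normalisations. It is illustrative only, not this repo's actual implementation; the tensor names and shapes are assumptions.

import torch
import torch.nn.functional as F

T, d = 4, 8                        # sequence length, head dimension
q = torch.randn(T, d)              # queries
k = torch.randn(T, d)              # keys
v = torch.randn(T, d)              # values

scores = q @ k.T / d ** 0.5        # attention scores, shape (queries, keys)
causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal, float("-inf"))

# Correct: each query's row is normalised over the keys it may attend to.
attn_ok = F.softmax(scores, dim=-1)

# Degenerate: each key's column is normalised over the queries. The
# denominator now sums over future query positions as well, so every
# attention weight, and hence every output, depends on tokens that the
# causal mask was supposed to hide.
attn_leaky = F.softmax(scores, dim=-2)

out_ok = attn_ok @ v
out_leaky = attn_leaky @ v

Note that the causal mask still zeroes out the upper triangle of the weights in both cases; the leak comes purely from sharing the softmax denominator along the query axis.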
You can install our flavour of torch with degenerate attention either by installing from source or by making a local installation and then porting over the degenerate attention code files. For the second option to work, we need to enforce a consistent version of torch, so we pin it to version 2.0.1.
# from source
. setup_from_src.sh
# local installation
. setup_from_cpy.sh
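If you take the local installation route, a quick check along these lines confirms that the pinned torch version is the one actually in your environment (a hedged sketch; the setup script above may already handle this):

import torch
# The ported degenerate attention files assume torch 2.0.1 (see above).
assert torch.__version__.startswith("2.0.1"), torch.__version__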
We will set up multiple tasks on which to train different transformers with degenerate attention. The first task is language modelling on a variety of natural language datasets.
To run a language model, there are a variety of hyperparameters that you can set; please refer to the original README for more information. Here is an example where we train a transformer on the wikitext2 dataset, first with degenerate attention and then with vanilla attention.
cd nlp
# degenerate attention
python main.py --cuda --epochs 6 --model Transformer --lr 5 --wandb --degenerate
# vanilla attention
python main.py --cuda --epochs 6 --model Transformer --lr 5 --wandb