# awesome-fast-attention

A curated list of efficient attention modules (last update: Thu, 27 Aug 2020 12:59:43 +0000)

## Table of Contents

* Efficient Attention
* Articles

## Efficient Attention

Each entry lists the paper (with its citation count in parentheses), the linked implementation where one exists, a ✔️ if the method supports autoregressive (causal) decoding, and a one-line summary of its main idea.
**Generating Wikipedia by Summarizing Long Sequences** (210) · `memory-compressed-attention` · compresses keys and values + blocked attention.

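The key/value compression above can be sketched as a strided 1-D convolution over the sequence axis. This is only a minimal illustration of the idea under my own choice of module name, shapes, and compression factor, not the linked implementation.

```python
import torch
import torch.nn as nn

class CompressedKV(nn.Module):
    """Toy sketch: shorten K and V along the sequence axis with strided 1-D convolutions."""
    def __init__(self, dim, factor=3):
        super().__init__()
        self.k_conv = nn.Conv1d(dim, dim, kernel_size=factor, stride=factor)
        self.v_conv = nn.Conv1d(dim, dim, kernel_size=factor, stride=factor)

    def forward(self, k, v):  # k, v: (batch, seq, dim)
        k = self.k_conv(k.transpose(1, 2)).transpose(1, 2)  # (batch, seq // factor, dim)
        v = self.v_conv(v.transpose(1, 2)).transpose(1, 2)
        return k, v

q = torch.randn(2, 96, 64)
k = v = torch.randn(2, 96, 64)
k_c, v_c = CompressedKV(dim=64)(k, v)
# the score matrix is now (96 x 32) instead of (96 x 96)
out = torch.softmax(q @ k_c.transpose(1, 2) / 64 ** 0.5, dim=-1) @ v_c
```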
**CBAM: Convolutional Block Attention Module** (714) · `attention-module` · combines SE attention with a per-pixel (local) weight.

**CCNet: Criss-Cross Attention for Semantic Segmentation** (160) · `CCNet` · each pixel attends to its row and column simultaneously.

**Efficient Attention: Attention with Linear Complexities** (2) · `efficient-attention` · computes `Softmax(Q) * (Softmax(Kᵀ) * V)` instead of `Softmax(Q * Kᵀ) * V`.

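A minimal sketch of the reordering this formula refers to: applying the softmaxes first lets `Kᵀ·V` (a small d×d matrix) be computed before `Q` is involved, so no n×n attention matrix is ever materialized. The softmax axes below are my reading of the paper and should be treated as an assumption.

```python
import torch

def efficient_attention(q, k, v):
    """Sketch of Softmax(Q) @ (Softmax(K)^T @ V), linear in sequence length n."""
    q = torch.softmax(q, dim=-1)      # (batch, n, d): normalize each query over features
    k = torch.softmax(k, dim=1)       # (batch, n, d): normalize each key feature over positions
    context = k.transpose(1, 2) @ v   # (batch, d, d_v), computed without any n x n matrix
    return q @ context                # (batch, n, d_v)

q = k = v = torch.randn(2, 1024, 64)
out = efficient_attention(q, k, v)    # (2, 1024, 64)
```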
Star-Transformer (26) fastNLP formula
EXPAND

uses a relay(global) node and attends to/from that node

**GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond** (108) · `GCNet` · squeeze-and-excitation with attention pooling (instead of global average pooling).

**Generating Long Sequences with Sparse Transformers** (149) · `torch-blocksparse` · ✔️ · sparse block-based attention.

**SCRAM: Spatially Coherent Randomized Attention Maps** (1) · ✔️ · uses PatchMatch to find close keys.

**Interlaced Sparse Self-Attention for Semantic Segmentation** (15) · `IN_PAPER` · ✔️ · combines short-range attention with long-range (dilated) attention.

**Permutohedral Attention Module for Efficient Non-Local Neural Networks** (2) · `Permutohedral_attention_module` · uses the permutohedral-lattice approximation algorithm to approximate the attention output.

**Large Memory Layers with Product Keys** (30) · `XLM` · ✔️ · searches for nearest-neighbor keys.

**Expectation-Maximization Attention Networks for Semantic Segmentation** (42) · `EMANet` · applies expectation-maximization to cluster keys into k clusters.

**Compressive Transformers for Long-Range Sequence Modelling** (21) · `compressive-transformer-pytorch` · ✔️ · compresses distant tokens instead of just applying `stop_grad()` to them; a more efficient version of Transformer-XL.

**BP-Transformer: Modelling Long-Range Context via Binary Partitioning** (9) · `BPT` · ✔️ · attends to distant tokens coarsely and to close tokens in a more fine-grained manner.

**Axial Attention in Multidimensional Transformers** (5) · `axial-attention` · ✔️ · applies attention along each axis separately.

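A rough sketch of the per-axis idea for a 2-D feature map, assuming single-head attention with no projections (`attend` is a stand-in, not the linked implementation): each row attends only within its row, then each column within its column, so the cost drops from (H·W)² to roughly H·W·(H+W).

```python
import torch

def attend(x):
    """Plain softmax self-attention with Q = K = V = x; x: (batch, seq, channels)."""
    scale = x.shape[-1] ** 0.5
    return torch.softmax(x @ x.transpose(1, 2) / scale, dim=-1) @ x

def axial_attention(x):
    """Attend along the width axis, then along the height axis; x: (batch, H, W, C)."""
    b, h, w, c = x.shape
    x = attend(x.reshape(b * h, w, c)).reshape(b, h, w, c)  # within each row
    x = x.permute(0, 2, 1, 3).reshape(b * w, h, c)          # within each column
    x = attend(x).reshape(b, w, h, c).permute(0, 2, 1, 3)
    return x

out = axial_attention(torch.randn(2, 16, 16, 32))  # (2, 16, 16, 32)
```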
**Reformer: The Efficient Transformer** (76) · `trax` · ✔️ · uses LSH to find close keys.

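A sketch of the random-rotation LSH that, as I understand it, Reformer uses to bucket similar queries/keys; the multi-round hashing, bucket sorting, and chunked attention within buckets are omitted.

```python
import torch

def lsh_buckets(x, n_buckets, seed=0):
    """Angular LSH via random rotations: nearby vectors tend to get the same bucket id.

    x: (batch, seq, dim); returns integer bucket ids of shape (batch, seq).
    """
    g = torch.Generator().manual_seed(seed)
    rotations = torch.randn(x.shape[-1], n_buckets // 2, generator=g)
    rotated = x @ rotations                                  # (batch, seq, n_buckets // 2)
    return torch.cat([rotated, -rotated], dim=-1).argmax(dim=-1)

x = torch.randn(2, 1024, 64)
buckets = lsh_buckets(x, n_buckets=32)   # tokens sharing a bucket attend to each other
```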
**Transformer on a Diet** (2) · `transformer-on-diet` · ✔️ · dilated transformer, similar to WaveNet.

**Sparse Sinkhorn Attention** (4) · `sinkhorn-transformer` · ✔️ · uses a cost matrix to limit attention between buckets.

**SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection** (1) · ✔️ · learns the query-key connections, i.e. dynamically creates a sparse attention matrix.

**Efficient Content-Based Sparse Attention with Routing Transformers** (12) · `routing-transformer` · ✔️ · computes attention only among same-cluster tokens (clusters found by online k-means).

**Longformer: The Long-Document Transformer** (19) · `longformer` · ✔️ · global + blocked (local windowed) attention.

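The global + local pattern can be pictured as a boolean mask; a dense mask is built here purely for illustration, while the point of the real implementation is to never materialize anything n×n. `window` and `global_idx` are illustrative parameter names of my own.

```python
import torch

def longformer_style_mask(seq_len, window, global_idx):
    """Boolean attention mask: sliding window plus a few global positions (True = may attend)."""
    i = torch.arange(seq_len)
    mask = (i[:, None] - i[None, :]).abs() <= window   # local sliding window
    mask[global_idx, :] = True                         # global tokens attend everywhere
    mask[:, global_idx] = True                         # and everyone attends to them
    return mask

mask = longformer_style_mask(seq_len=512, window=64, global_idx=[0])  # (512, 512) bool
```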
**ETC: Encoding Long and Structured Data in Transformers** (2) · combines global attention (Star-Transformer with multiple global tokens) with local attention.

**Neural Architecture Search for Lightweight Non-Local Networks** (4) · `AutoNL` · computes Q(KV) and also downsamples q, k, v in both the spatial and channel dimensions.

**Multi-scale Transformer Language Models** (1) · `IN_PAPER` · ✔️ · UNet-like structure + retina attention, which is close to BP-Transformer.

**Jukebox: A Generative Model for Music** (11) · `jukebox` · ✔️ · better attention patterns, building on Sparse Transformer.

**Synthesizer: Rethinking Self-Attention in Transformer Models** (8) · ✔️ · does not compute pairwise interactions.

**GMAT: Global Memory Augmentation for Transformers** (0) · `gmat` · adds global tokens.

**Hand-crafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer** (0) · ✔️ · does not compute pairwise interactions and uses fixed mask patterns.

**Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers** (1) · `google-research` · ✔️ · calculates an unbiased stochastic approximation of the attention matrix.

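A sketch of the positive random-feature idea behind this entry: with features `phi(x) = exp(w·x - |x|² / 2)`, the product `phi(q) · phi(k)` is an unbiased estimate of `exp(q · k)`, so attention can be computed in the linear `phi(Q) (phi(K)ᵀ V)` order. This uses plain IID Gaussian features; the paper's orthogonal features and periodic redrawing are omitted, and the function and parameter names are mine.

```python
import torch

def positive_features(x, proj):
    """phi(x) such that phi(q) @ phi(k).T approximates exp(q @ k.T) in expectation."""
    m = proj.shape[1]
    return torch.exp(x @ proj - (x ** 2).sum(-1, keepdim=True) / 2) / m ** 0.5

def random_feature_attention(q, k, v, n_features=256, seed=0):
    """Linear-time attention via a random-feature approximation of the softmax kernel."""
    g = torch.Generator().manual_seed(seed)
    d = q.shape[-1]
    proj = torch.randn(d, n_features, generator=g)
    q, k = q / d ** 0.25, k / d ** 0.25                      # absorb the usual 1/sqrt(d) scaling
    qp, kp = positive_features(q, proj), positive_features(k, proj)
    num = qp @ (kp.transpose(1, 2) @ v)                      # (batch, n, d_v), linear in n
    den = qp @ kp.sum(dim=1, keepdim=True).transpose(1, 2)   # (batch, n, 1), row normalizer
    return num / den

q = k = v = torch.randn(2, 1024, 64)
out = random_feature_attention(q, k, v)  # (2, 1024, 64)
```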
**Linformer: Self-Attention with Linear Complexity** (3) · `linformer-pytorch` · projects keys and values from n×d down to k×d.

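A minimal sketch of the projection: two learned matrices act on the sequence axis so that K and V keep their feature size but shrink from n to k positions, making the score matrix n×k. Class and parameter names are my own.

```python
import torch
import torch.nn as nn

class LinformerKV(nn.Module):
    """Toy sketch: learned projections shrink K and V from n tokens to k 'landmark' rows."""
    def __init__(self, seq_len, k):
        super().__init__()
        self.proj_k = nn.Linear(seq_len, k, bias=False)  # acts on the sequence axis
        self.proj_v = nn.Linear(seq_len, k, bias=False)

    def forward(self, key, value):  # (batch, n, d)
        key = self.proj_k(key.transpose(1, 2)).transpose(1, 2)      # (batch, k, d)
        value = self.proj_v(value.transpose(1, 2)).transpose(1, 2)  # (batch, k, d)
        return key, value

q = torch.randn(2, 1024, 64)
k = v = torch.randn(2, 1024, 64)
k_p, v_p = LinformerKV(seq_len=1024, k=128)(k, v)
out = torch.softmax(q @ k_p.transpose(1, 2) / 64 ** 0.5, dim=-1) @ v_p  # scores are n x k
```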
**Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention** (3) · `fast-transformers` · ✔️ · uses `phi(q) * (phi(k)ᵀ * v)` and also improves the sequential sampling step.

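A sketch of the `phi(q) * (phi(k)ᵀ * v)` computation with the feature map `phi(x) = elu(x) + 1` used in the paper; the causal, RNN-style variant that gives the fast sequential sampling step is not shown.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention: phi(Q) @ (phi(K)^T @ V) with a row-wise normalizer."""
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(1, 2) @ v                                  # (batch, d, d_v)
    z = q @ k.sum(dim=1, keepdim=True).transpose(1, 2) + eps    # (batch, n, 1)
    return (q @ kv) / z

q = k = v = torch.randn(2, 2048, 64)
out = linear_attention(q, k, v)  # (2, 2048, 64), no 2048 x 2048 matrix involved
```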
**Real-time Semantic Segmentation with Fast Attention** (0) · computes `l2_norm(q) * (l2_norm(k)ᵀ * v)`.

**Fast Transformers with Clustered Attention** (0) · `fast-transformers` · groups queries together with LSH.

**Kronecker Attention Networks** (0) · `kronecker-attention-pytorch` · uses horizontal and lateral average matrices.

**Big Bird: Transformers for Longer Sequences** (1) · ETC with random connections.

**Tensor Low-Rank Reconstruction for Semantic Segmentation** (1) · decomposes the full attention tensor into rank-one tensors (CP decomposition).

## Articles
