Gated Linear Attention Layer

A standalone module of the Gated Linear Attention (GLA) layer from the paper Gated Linear Attention Transformers with Hardware-Efficient Training.

pip install -U git+https://github.com/sustcsonglin/flash-linear-attention

Warning: fused_chunk mode requires Triton 2.2 and CUDA 12 (see the referenced issue). You can run the test to quickly check whether fused_chunk mode works in your environment; if it does not, follow the linked instructions and use chunk mode instead.
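As a quick check (a minimal sketch, not the repository's test script), you can instantiate the GLA layer directly in both modes and see which one runs. This assumes the installed fla package exposes fla.layers.GatedLinearAttention with hidden_size, num_heads and mode arguments; check the signature of the version you installed.

import torch
from fla.layers import GatedLinearAttention

# Small random input of shape (batch, seq_len, hidden_size) in bf16 on GPU,
# since the Triton kernels expect CUDA tensors.
x = torch.randn(2, 128, 1024, dtype=torch.bfloat16, device="cuda")

for mode in ("fused_chunk", "chunk"):
    try:
        layer = GatedLinearAttention(mode=mode, hidden_size=1024, num_heads=4)
        layer = layer.to(device="cuda", dtype=torch.bfloat16)
        layer(x)  # output is ignored; we only care that the kernel runs
        print(f"{mode}: OK")
    except Exception as err:  # e.g. Triton/CUDA version mismatch
        print(f"{mode}: failed ({err})")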

Usage

Load the checkpoint from the Hugging Face Hub.

import torch
from gla_model import GLAForCausalLM

# Load the pretrained GLA checkpoint from the Hugging Face Hub.
model = GLAForCausalLM.from_pretrained("bailin28/gla-1B-100B")
vocab_size = model.config.vocab_size

# Run a forward pass on a batch of random token ids.
bsz, seq_len = 32, 2048
x = torch.randint(high=vocab_size, size=(bsz, seq_len))
model_output = model(x)
loss = model_output.loss
logits = model_output.logits
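If the forward pass returns a loss as in the snippet above, it can drive an ordinary PyTorch training step. A minimal sketch follows; the optimizer choice and learning rate are illustrative, not taken from the repository.

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

optimizer.zero_grad()
model(x).loss.backward()  # backprop through the language-modelling loss
optimizer.step()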
