Gated Linear Attention Layer

A standalone module of the Gated Linear Attention (GLA) layer from the paper Gated Linear Attention Transformers with Hardware-Efficient Training.

pip install -U git+https://github.com/sustcsonglin/flash-linear-attention

Warning: fused_chunk mode requires Triton 2.2 and CUDA 12 (see the referenced issue). You can run the test to quickly check whether fused_chunk mode works in your environment; if it does not, follow the linked instructions and use chunk mode instead.
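As a quick check (a minimal sketch, not the repository's test script), you can instantiate the GLA layer directly in both modes and see which one runs. This assumes the installed fla package exposes fla.layers.GatedLinearAttention with hidden_size, num_heads and mode arguments; check the signature of the version you installed.

import torch
from fla.layers import GatedLinearAttention

# Small random input of shape (batch, seq_len, hidden_size) in bf16 on GPU,
# since the Triton kernels expect CUDA tensors.
x = torch.randn(2, 128, 1024, dtype=torch.bfloat16, device="cuda")

for mode in ("fused_chunk", "chunk"):
    try:
        layer = GatedLinearAttention(mode=mode, hidden_size=1024, num_heads=4)
        layer = layer.to(device="cuda", dtype=torch.bfloat16)
        layer(x)  # output is ignored; we only care that the kernel runs
        print(f"{mode}: OK")
    except Exception as err:  # e.g. Triton/CUDA version mismatch
        print(f"{mode}: failed ({err})")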

Usage

Load the checkpoint from the Hugging Face Hub.

import torch
from gla_model import GLAForCausalLM

# Load the pretrained GLA checkpoint from the Hugging Face Hub.
model = GLAForCausalLM.from_pretrained("bailin28/gla-1B-100B")
vocab_size = model.config.vocab_size

# Run a forward pass on a batch of random token ids.
bsz, seq_len = 32, 2048
x = torch.randint(high=vocab_size, size=(bsz, seq_len))
model_output = model(x)
loss = model_output.loss
logits = model_output.logits
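If the forward pass returns a loss as in the snippet above, it can drive an ordinary PyTorch training step. A minimal sketch follows; the optimizer choice and learning rate are illustrative, not taken from the repository.

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

optimizer.zero_grad()
model(x).loss.backward()  # backprop through the language-modelling loss
optimizer.step()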
