Pinned Loading

  1. flash-linear-attention Public

    🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton

    Python 2.4k 164

  2. flame Public

    🔥 A minimal training framework for scaling FLA models

    Python 130 18

  3. native-sparse-attention Public

    🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

    Python 657 30
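The common idea behind the linear attention models these repositories implement is replacing softmax(QK^T)V with φ(Q)(φ(K)^T V), so a causal pass can carry a running d×d state instead of attending over all past tokens. The sketch below is a minimal, dependency-free illustration of that recurrence; the `exp` feature map and function name are illustrative choices, not flash-linear-attention's actual API.

```python
import math

def linear_attention(q, k, v):
    """Causal linear attention sketch: each step updates a running
    state S = sum_t phi(k_t) v_t^T and normalizer z = sum_t phi(k_t),
    then reads out o_t = phi(q_t) S / (phi(q_t) . z).  Inputs are
    lists of length-d vectors, one per time step."""
    d = len(q[0])
    phi = lambda x: [math.exp(xi) for xi in x]   # simple positive feature map (illustrative)
    S = [[0.0] * d for _ in range(d)]            # running sum of phi(k_t) v_t^T
    z = [0.0] * d                                # running sum of phi(k_t)
    out = []
    for qt, kt, vt in zip(q, k, v):
        fq, fk = phi(qt), phi(kt)
        for i in range(d):
            z[i] += fk[i]
            for j in range(d):
                S[i][j] += fk[i] * vt[j]
        denom = sum(fqi * zi for fqi, zi in zip(fq, z))
        out.append([sum(fq[i] * S[i][j] for i in range(d)) / denom
                    for j in range(d)])
    return out
```

Because the state has fixed size, each decoding step costs O(d^2) regardless of sequence length, which is the property the Triton kernels in these repositories exploit.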

Repositories

Showing 10 of 10 repositories
  • flame Public

    🔥 A minimal training framework for scaling FLA models

    Python 130 MIT 18 5 0 Updated May 13, 2025
  • flash-linear-attention Public

    🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton

    Python 2,376 MIT 164 35 6 Updated May 13, 2025
  • fla-zoo Public

    Flash-Linear-Attention models beyond language

    Python 13 1 0 0 Updated May 13, 2025
  • fla-synthetic-kit Public

    Painless Evaluation of Flash Linear Attention models on Synthetic Tasks

    3 0 0 0 Updated May 13, 2025
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 0 Apache-2.0 7,475 0 0 Updated May 9, 2025
  • fla-rl Public

A minimal RL framework for scaling FLA models on long-horizon reasoning and agentic scenarios.

    4 MIT 0 0 0 Updated Apr 1, 2025
  • ThunderKittens Public Forked from HazyResearch/ThunderKittens

    Tile primitives for speedy kernels

    Cuda 2 MIT 142 0 0 Updated Mar 27, 2025
  • native-sparse-attention Public

    🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

    Python 657 MIT 30 8 0 Updated Mar 19, 2025
  • flash-bidirectional-linear-attention Public

Triton implementation of bidirectional (non-causal) linear attention

    Python 47 MIT 1 1 0 Updated Feb 4, 2025
