Stars
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper
Official implementation of Diffusion Models Without Classifier-free Guidance
Implementation of Autoregressive Diffusion in PyTorch
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
A PyTorch-native library for large model training
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
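For the Vision Transformer implementation in the last item, here is a minimal usage sketch built around the `vit_pytorch` package's `ViT` class; the constructor arguments shown (`image_size`, `patch_size`, `num_classes`, `dim`, `depth`, `heads`, `mlp_dim`) follow its commonly documented interface and should be read as assumptions rather than a guaranteed API.

```python
# Minimal sketch: classify one dummy image with vit-pytorch's ViT
# (hyperparameters below are illustrative assumptions, not recommendations).
import torch
from vit_pytorch import ViT

model = ViT(
    image_size = 256,    # input images are 256 x 256
    patch_size = 32,     # split into 8 x 8 = 64 patches of 32 x 32 pixels
    num_classes = 1000,  # size of the classification head
    dim = 1024,          # transformer embedding dimension
    depth = 6,           # number of transformer encoder layers
    heads = 16,          # attention heads per layer
    mlp_dim = 2048,      # hidden size of the feed-forward blocks
)

img = torch.randn(1, 3, 256, 256)  # dummy batch of one RGB image
logits = model(img)                # shape: (1, 1000)
```

The single-encoder design referenced in the description shows up here directly: one stack of `depth` transformer layers over the patch embeddings, with a classification head on top.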