ccc0168/FlashMLA-PyTorch

FlashMLA PyTorch

PyTorch implementation of FlashMLA.

FlashMLA is an efficient MLA (Multi-head Latent Attention) decoding kernel for Hopper GPUs, optimized for serving variable-length sequences. Currently released: BF16 and a paged KV cache with a block size of 64.
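
The sketch below shows, in pure Python, how a paged KV cache with 64-token blocks typically addresses tokens of variable-length sequences. It assumes a vLLM-style per-sequence block table that maps logical block indices to physical cache blocks. The helper names (`blocks_needed`, `build_block_tables`, `token_slot`) and the simple contiguous allocator are hypothetical illustrations, not this repository's API.

```python
BLOCK_SIZE = 64  # block size of the released paged KV cache


def blocks_needed(seq_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of cache blocks a sequence of seq_len tokens occupies (ceiling division)."""
    return (seq_len + block_size - 1) // block_size


def build_block_tables(seq_lens, block_size: int = BLOCK_SIZE):
    """Hypothetical allocator: hand out physical blocks in order from a free counter.

    Each sequence gets only as many blocks as it needs, so short and long
    sequences can share one cache pool without padding to the batch maximum.
    """
    tables = []
    next_free = 0
    for n in seq_lens:
        k = blocks_needed(n, block_size)
        tables.append(list(range(next_free, next_free + k)))
        next_free += k
    return tables


def token_slot(block_table, pos: int, block_size: int = BLOCK_SIZE):
    """Map a logical token position to (physical_block, offset_within_block)."""
    logical_block, offset = divmod(pos, block_size)
    if logical_block >= len(block_table):
        raise IndexError(f"position {pos} is beyond the allocated blocks")
    return block_table[logical_block], offset


if __name__ == "__main__":
    tables = build_block_tables([100, 64, 1])
    print(tables)                     # [[0, 1], [2], [3]]
    print(token_slot(tables[0], 70))  # (1, 6): second block of sequence 0, offset 6
```

In a real kernel the block table would be a batched integer tensor and the cache would store latent key/value vectors per slot; the sketch keeps only the index arithmetic that makes variable-length serving possible.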

Languages

  • C++ 55.2%
  • Python 44.8%