LLM, CUDA/System, Distributed training
-
KAUST (King Abdullah University of Science and Technology)
Pinned
-
TheCoreTeam/core_scheduler
CoreScheduler: A High-Performance Scheduler for Large Model Training
-
Tiny-DeepSpeed
A minimalistic re-implementation of the DeepSpeed library
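The core idea behind DeepSpeed's ZeRO stage 1, which a minimal re-implementation would cover, is sharding optimizer state across data-parallel ranks so each rank updates only its slice of the parameters before an all-gather. The sketch below simulates this on a single process with NumPy; the function name `zero1_sgd_step` and the in-process "ranks" are illustrative assumptions, not the Tiny-DeepSpeed API.

```python
import numpy as np

def zero1_sgd_step(params, grads, world_size, lr=0.1):
    """Simulated ZeRO stage-1 SGD step (illustrative, single-process).

    Each simulated rank owns a 1/world_size shard of the flattened
    parameters and applies the update only to its shard; afterwards the
    shards are "all-gathered" so every rank holds the full parameters.
    Gradients are assumed already averaged (as after an all-reduce).
    """
    flat_p = np.concatenate([p.ravel() for p in params])
    flat_g = np.concatenate([g.ravel() for g in grads])
    shards = np.array_split(np.arange(flat_p.size), world_size)
    # Each rank updates only the shard it owns.
    for rank in range(world_size):
        idx = shards[rank]
        flat_p[idx] -= lr * flat_g[idx]
    # "All-gather": reshape the flat vector back into per-tensor views.
    out, offset = [], 0
    for p in params:
        out.append(flat_p[offset:offset + p.size].reshape(p.shape))
        offset += p.size
    return out
```

In a real distributed run the per-shard update happens on separate devices and the reassembly is a collective all-gather; the memory win is that each rank stores optimizer state only for its shard.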
-
Flash-Attention-Implementation
Implementation of Flash-Attention (both forward and backward) with PyTorch, CUDA, and Triton
Python
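The forward pass of Flash-Attention rests on online softmax: K/V are processed in blocks with running row maxima and denominators, so the full N x N score matrix is never materialized. A minimal NumPy sketch of that tiling idea (not the repo's CUDA/Triton code; `flash_attention_forward` is a hypothetical name):

```python
import numpy as np

def flash_attention_forward(Q, K, V, block_size=2):
    """Tiled attention forward pass using online softmax.

    Numerically equivalent to softmax(Q @ K.T / sqrt(d)) @ V, but K and V
    are consumed block by block, keeping only O(N * d) state.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((N, d))          # unnormalized output accumulator
    m = np.full(N, -np.inf)       # running row maxima of the scores
    l = np.zeros(N)               # running softmax denominators
    for start in range(0, N, block_size):
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]
        S = (Q @ Kb.T) * scale                  # scores for this block
        m_new = np.maximum(m, S.max(axis=1))
        P = np.exp(S - m_new[:, None])          # block numerators
        correction = np.exp(m - m_new)          # rescale earlier partials
        l = l * correction + P.sum(axis=1)
        O = O * correction[:, None] + P @ Vb
        m = m_new
    return O / l[:, None]
```

The backward pass in the actual kernels recomputes the block scores instead of storing them, trading FLOPs for memory; the rescaling trick above is what keeps the streaming softmax numerically stable.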