ParCIS Lab, BUPT
Repositories
- DNN-cpp-proxies Public
C++/MPI proxies for distributed training of deep neural networks. Written in C++.
- FlashSparse Public
FlashSparse significantly reduces computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swap-and-Transpose mapping strategy; a minimal sketch of the mapping idea follows this list. FlashSparse was accepted at PPoPP 2025.
- Ok-Topk Public
Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proven both theoretically and empirically; a baseline top-k sparsification sketch follows this list.
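
The swap-and-transpose idea behind FlashSparse can be illustrated with a minimal NumPy sketch. Tensor Core MMA tiles are asymmetric (e.g. `mma.m16n8k16`: one operand side is 16 wide, the other 8), so which side the sparse operand lands on determines how much zero padding is wasted. The tile widths, shapes, and variable names below are illustrative assumptions, not the repository's CUDA kernels:

```python
# Hedged NumPy-only sketch (not FlashSparse's actual kernels).
# Mapping sparse rows to the 16-wide MMA side forces 16-row vector
# granularity and padding; the swap-and-transpose identity
#     A @ B == (B.T @ A.T).T
# lets the sparse operand sit on the 8-wide side instead, halving
# the zero padding. Shapes are simplified for illustration.
import numpy as np

K, N = 16, 16              # reduction dim and dense feature dim (assumed)
nnz_rows = 8               # the sparse block has only 8 nonzero rows

A = np.zeros((nnz_rows, K))
A[:, :4] = np.random.rand(nnz_rows, 4)   # a few nonzeros per row
B = np.random.rand(K, N)

# Direct mapping: pad sparse rows up to the 16-wide MMA side.
A_pad16 = np.vstack([A, np.zeros((16 - nnz_rows, K))])
C_direct = A_pad16 @ B                   # rows 8..15 of the result are wasted

# Swapped mapping: the sparse operand occupies the 8-wide side, no padding.
C_swap = (B.T @ A.T).T

assert np.allclose(C_direct[:nnz_rows], C_swap)
```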
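
For contrast with Ok-Topk, here is a hedged baseline sketch of plain magnitude top-k gradient sparsification with a naive allgather-based sparse allreduce. This is not the Ok-Topk algorithm, and the helper names (`topk_sparsify`, `naive_sparse_allreduce`) are hypothetical; the point is that this baseline's per-worker volume grows as O(P·k) with P workers, which is exactly the gap Ok-Topk's sparse allreduce (less than 6k volume, independent of P) closes:

```python
# Hedged baseline sketch (not the Ok-Topk algorithm itself).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def naive_sparse_allreduce(grad, k):
    """Sum everyone's top-k contributions via allgather (baseline only)."""
    idx, vals = topk_sparsify(grad, k)
    out = np.zeros_like(grad)
    for ridx, rvals in comm.allgather((idx, vals)):  # O(P*k) volume
        np.add.at(out, ridx, rvals)                  # handles repeated indices
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(comm.rank)
    grad = rng.standard_normal(1_000_000).astype(np.float32)
    reduced = naive_sparse_allreduce(grad, k=1000)
    if comm.rank == 0:
        print("nonzeros in reduced gradient:", np.count_nonzero(reduced))
```

Run with e.g. `mpirun -np 4 python sketch.py`. A production scheme would also carry error feedback for the dropped entries and avoid the O(P·k) allgather; see the Ok-Topk repository and paper for the actual algorithm.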