A cross-platform Sparse Matrix-Vector Multiplication (SpMV) framework for many-core architectures (GPUs and Xeon Phi).
The work is published in TACO'16 and extends yaSpMV. To exploit the performance of the Intel Xeon Phi, we extend the BCCOO format with an inner-block transpose and propose a new segmented sum/scan that makes better use of the 512-bit SIMD instructions. We also add double-precision support for SpMV on NVIDIA and AMD GPUs. See the paper for details. For further questions, contact Shigang Li (shigangli.cs@gmail.com) or Shengen Yan (yanshengen@gmail.com).
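For readers unfamiliar with the segmented sum/scan mentioned above, the following is a minimal sketch of the general idea on a plain COO matrix. It is not the BCCOO format or the SIMD implementation from the paper; the function name `spmv_segmented_sum` and the head-flag representation are assumptions made only for illustration.

```python
def spmv_segmented_sum(n_rows, rows, cols, vals, x):
    """Compute y = A @ x for a COO matrix whose entries are sorted by row.

    Illustrative only: a sequential stand-in for a parallel segmented scan.
    """
    # Step 1: one product per nonzero (the fully data-parallel part).
    products = [v * x[c] for v, c in zip(vals, cols)]

    # Step 2: head flags mark the first nonzero of each row segment.
    flags = [i == 0 or rows[i] != rows[i - 1] for i in range(len(rows))]

    # Step 3: segmented inclusive scan; the running sum restarts at each flag.
    scanned = []
    running = 0.0
    for p, f in zip(products, flags):
        running = p if f else running + p
        scanned.append(running)

    # Step 4: the last element of each segment holds that row's result.
    y = [0.0] * n_rows
    for i, r in enumerate(rows):
        if i + 1 == len(rows) or rows[i + 1] != r:
            y[r] = scanned[i]
    return y


# Hypothetical 3x3 matrix [[1, 0, 2], [0, 0, 0], [0, 3, 4]] in COO form.
rows = [0, 0, 2, 2]
cols = [0, 2, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
y = spmv_segmented_sum(3, rows, cols, vals, x)
print(y)
```

Because every nonzero contributes exactly one product regardless of which row it belongs to, the work is balanced across nonzeros rather than across rows, which is why a segmented sum/scan suits wide SIMD units and GPUs.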
To cite our work:
@article{zhang2016cross,
title={A cross-platform SpMV framework on many-core architectures},
author={Zhang, Yunquan and Li, Shigang and Yan, Shengen and Zhou, Huiyang},
journal={ACM Transactions on Architecture and Code Optimization (TACO)},
volume={13},
number={4},
pages={1--25},
year={2016},
publisher={ACM New York, NY, USA}
}
See LICENSE.