Preview version 0.9.0
- Updated to version 6.0 of the CLCudaAPI C++11 OpenCL header
- Significantly improved performance of rotated GEMV computations
- Improved performance on unseen/un-tuned devices through better default tuning parameter selection
- Fixed the MSVC dllimport and dllexport declarations
- Fixed memory leaks related to events not being released
- Fixed a bug with a `size_t` and `cl_ulong` mismatch on 32-bit systems
- Fixed a bug related to the cache and retrieval of programs based on the OpenCL context
- Fixed a performance issue (caused by fp16 support) by optimizing alpha/beta parameter passing to kernels
- Fixed a bug in the OpenCL kernels: now placing `__kernel` before `__attribute__`
- Fixed a bug in level-3 routines when beta is zero and matrix C contains NaNs
- Added an option (`-warm_up`) to do a warm-up run before timing in the performance clients
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see README)