Release v0.3.0
Release: v0.3.0 [skip ci]
Minor
Patch
- Fix: fmt::print thousand separator (6250d0f)
- Fix: Detecting NUMA (8da88f6)
- Improve: Uniform benchmark naming (567d2e8)
- Improve: Scaling CUDA kernels (e57eff0)
- Make: Find libnuma (d0f74e4)
- Improve: Huge Pages and code style (aeabdce)
- Improve: setzero over set1 intrinsics (1fbcd94)
- Fix: Handling SSE tail (2731049)
- Make: Load STL symbols in GDB (f91941c)
- Fix: Calling unrolled AVX-512 variant (b5d4070)
- Fix: Misaligned non-temporal loads to ZMM (eed4f57)