
Release v0.3.0


ashvardanian released this on 19 Jan at 19:23 · 82 commits to main since this release

Release: v0.3.0 [skip ci]

Minor

  • Add: BLAS with zero-stride (c39768e)
  • Add: Latency Hiding in AVX-512 (9a5a61a)

Patch

  • Fix: fmt::print thousands separator (6250d0f)
  • Fix: Detecting NUMA (8da88f6)
  • Improve: Uniform benchmark naming (567d2e8)
  • Improve: Scaling CUDA kernels (e57eff0)
  • Make: Find libnuma (d0f74e4)
  • Improve: Huge Pages and code style (aeabdce)
  • Improve: setzero over set1 intrinsics (1fbcd94)
  • Fix: Handling SSE tail (2731049)
  • Make: Load STL symbols in GDB (f91941c)
  • Fix: Calling unrolled AVX-512 variant (b5d4070)
  • Fix: Misaligned non-temporal loads to ZMM (eed4f57)
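The "setzero" and "SSE tail" entries touch on a standard pattern for SIMD reductions. The following is a hedged, x86-only sketch of that general pattern, not the commit's code, and `sum_sse` is a hypothetical name. It sums `float`s four lanes per SSE step, starts the accumulator from `_mm_setzero_ps()`, and finishes the leftover `n % 4` elements in a scalar loop.

```cpp
#include <cstddef>
#include <xmmintrin.h> // SSE: __m128, _mm_setzero_ps, _mm_loadu_ps, _mm_add_ps, _mm_store_ps

// Sums n floats: 4 lanes at a time with SSE, then a scalar loop
// for the n % 4 leftover elements (the "tail").
float sum_sse(float const *data, std::size_t n) {
    __m128 acc = _mm_setzero_ps(); // typically compiled to a zeroing idiom such as xorps
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i)); // unaligned load, no alignment requirement
    // Horizontal sum of the 4 accumulator lanes.
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc); // aligned store is safe thanks to alignas(16)
    float result = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    // Tail: without this loop, the last n % 4 elements would be silently dropped.
    for (; i < n; ++i)
        result += data[i];
    return result;
}
```

Omitting the final scalar loop is a typical tail bug. The result then silently ignores up to three trailing elements whenever `n` is not a multiple of 4.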