v0.4: Arm NEON, SVE, and OpenMP-like Pools

@ashvardanian ashvardanian released this 03 May 20:30

This minor release implements the missing NEON and SVE kernel variants. On a dual-socket Graviton 4, they yield no noticeable improvement over GCC 12 auto-vectorization for single-precision float inputs. The release also adds a minimalistic thread-pool implementation via fork_union, which performs worse than OpenMP on small inputs, highlighting the need for more work.

Minor

  • Add: fork_union parallel version (c40b7f3)
  • Add: Generic OpenMP pool (cce8be4)
  • Add: NEON and SVE kernels (32d7d3e)

Patch

  • Docs: Remove namespace nesting (5e73225)
  • Make: Formatting CMake (82119ea)
  • Docs: Ordering header notes (a7891c7)