Matmul optim #616

Merged
FrancescAlted merged 9 commits into main from matmul-optim on Apr 12, 2026

FrancescAlted commented Apr 12, 2026

This PR detects either the specialized BLAS library used by NumPy (Linux/Windows) or macOS Accelerate, and uses its sgemm/dgemm routines in combination with Blosc2 prefilters. The result is a large speedup in matmul, as the plot below shows.

[Plot: matmul-float32-m4pro]
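One way to picture the combination with prefilters: a Blosc2 prefilter fills each block's data just before it is compressed, and any block of C = A·B can be computed as a sum of GEMM calls on the matching row-block of A and column-block of B. The pure-Python sketch below shows only that decomposition. `naive_gemm` is a hypothetical stand-in with the same C ← αAB + βC contract as `cblas_sgemm`/`cblas_dgemm`, and `blocked_matmul`, `submatrix` and the block size are illustrative names and values, not blosc2 internals.

```python
from typing import Callable, List

Matrix = List[List[float]]
Gemm = Callable[[Matrix, Matrix, Matrix, float, float], None]


def naive_gemm(a: Matrix, b: Matrix, c: Matrix, alpha: float = 1.0, beta: float = 1.0) -> None:
    """Hypothetical stand-in for cblas_?gemm: C <- alpha*A@B + beta*C, updated in place."""
    inner, cols = len(b), len(b[0])
    for i, a_row in enumerate(a):
        for j in range(cols):
            acc = sum(a_row[p] * b[p][j] for p in range(inner))
            c[i][j] = alpha * acc + beta * c[i][j]


def submatrix(x: Matrix, r0: int, r1: int, c0: int, c1: int) -> Matrix:
    """Copy rows r0:r1 and columns c0:c1 of x."""
    return [row[c0:c1] for row in x[r0:r1]]


def blocked_matmul(a: Matrix, b: Matrix, block: int = 2, gemm: Gemm = naive_gemm) -> Matrix:
    """Compute A @ B one output block at a time, as a per-block prefilter could."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        i1 = min(i0 + block, m)
        for j0 in range(0, n, block):
            j1 = min(j0 + block, n)
            # Everything needed for output block (i0:i1, j0:j1) is computed here.
            c_blk = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for p0 in range(0, k, block):
                p1 = min(p0 + block, k)
                # beta=1.0 accumulates each partial product into c_blk.
                gemm(submatrix(a, i0, i1, p0, p1), submatrix(b, p0, p1, j0, j1), c_blk, 1.0, 1.0)
            for di, row in enumerate(c_blk):
                out[i0 + di][j0:j1] = row
    return out
```

Swapping `gemm` for a call into a real BLAS is what turns this decomposition into a fast path; the blocking itself stays the same.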

FrancescAlted and others added 9 commits March 23, 2026 13:34
…tics

  Implement a runtime-discovered CBLAS backend for the matmul fast path on
  Linux/Windows, alongside the existing Accelerate/macOS path and naive fallback.
  Probe BLAS candidates from the active NumPy/conda environment, load providers
  exporting cblas_sgemm/cblas_dgemm, and fall back cleanly to the naive path when none fits.
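The probing step can be sketched with the standard `ctypes` module: try to load each candidate shared library and accept the first one that exports both GEMM symbols. This is a minimal sketch under assumptions: the candidate list (built here with `ctypes.util.find_library` rather than by inspecting the NumPy/conda environment) and the names `probe_cblas` and `REQUIRED_SYMBOLS` are hypothetical, and it is not necessarily how blosc2 implements the probe.

```python
import ctypes
import ctypes.util
from typing import Callable, Iterable, Optional, Tuple

# Both precisions must be present for a provider to be usable.
REQUIRED_SYMBOLS = ("cblas_sgemm", "cblas_dgemm")


def probe_cblas(
    candidates: Iterable[str],
    loader: Callable[[str], object] = ctypes.CDLL,
) -> Tuple[str, Optional[str]]:
    """Return ("cblas", path) for the first usable candidate, else ("naive", None)."""
    for path in candidates:
        try:
            lib = loader(path)
        except OSError:
            continue  # missing file, wrong architecture, unresolved dependencies...
        # ctypes raises AttributeError for unknown symbols, so hasattr() checks the exports.
        if all(hasattr(lib, sym) for sym in REQUIRED_SYMBOLS):
            return "cblas", path
    return "naive", None  # clean fallback to the portable kernel


if __name__ == "__main__":
    # Hypothetical candidate names; find_library returns None when a name is not found.
    names = ("openblas", "cblas", "mkl_rt", "blis")
    found = [p for p in (ctypes.util.find_library(n) for n in names) if p]
    print(probe_cblas(found))
```

The injectable `loader` keeps the selection logic testable without any BLAS installed.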

  Control nested BLAS threading from Python with threadpoolctl around fast-path
  blosc2.matmul calls, but only for the Linux CBLAS backend and only for small blocks.
  Use a benchmark-derived threshold of 192x192 to keep BLAS single-threaded for
  small GEMMs while avoiding regressions on larger ones; never apply this on macOS.
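The thread-limiting rule can be expressed as a small guard around each fast-path call. In this sketch, `should_limit_blas_threads` and `blas_thread_guard` are hypothetical names, the backend labels are assumed strings, and treating a block as small when both output dimensions are at most 192 is an assumption about how the 192x192 threshold is applied. `threadpoolctl.threadpool_limits(limits=1, user_api="blas")` is the real threadpoolctl API; the guard degrades to a no-op when threadpoolctl is not installed, as on wasm.

```python
import contextlib
import platform
from typing import ContextManager, Optional

# Threshold from the PR text; applying it to both block dimensions is an assumption.
SMALL_GEMM_THRESHOLD = 192


def should_limit_blas_threads(
    backend: str, block_m: int, block_n: int, system: Optional[str] = None
) -> bool:
    """True only for the Linux CBLAS backend with small blocks (never on macOS)."""
    system = platform.system() if system is None else system
    if system != "Linux" or backend != "cblas":
        return False
    return block_m <= SMALL_GEMM_THRESHOLD and block_n <= SMALL_GEMM_THRESHOLD


def blas_thread_guard(
    backend: str, block_m: int, block_n: int, system: Optional[str] = None
) -> ContextManager:
    """Context manager that keeps BLAS single-threaded when the rule above applies."""
    if not should_limit_blas_threads(backend, block_m, block_n, system):
        return contextlib.nullcontext()
    try:
        from threadpoolctl import threadpool_limits
    except ImportError:  # e.g. wasm builds, where threadpoolctl is not a dependency
        return contextlib.nullcontext()
    # threadpool_limits applies the limit on construction and restores it on exit.
    return threadpool_limits(limits=1, user_api="blas")
```

Because `threadpool_limits` takes effect as soon as it is constructed, the guard is best created directly in the `with` statement, e.g. `with blas_thread_guard(backend, bm, bn): ...` around the matmul call.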

  Expose backend introspection via blosc2.get_matmul_library(), returning the
  loaded CBLAS library path or Accelerate.framework when available.
  Add BLOSC_TRACE diagnostics for CBLAS candidate probing, rejection, selection,
  and backend fallback decisions.
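BLOSC_TRACE diagnostics are opt-in: messages appear only when the environment variable is set. A minimal Python sketch of that gating follows, assuming a hypothetical `trace` helper and message prefix (blosc2's actual trace output may differ) and treating an empty value as unset.

```python
import os
import sys
from typing import Callable, Optional


def trace(msg: str, sink: Optional[Callable[[str], object]] = None) -> bool:
    """Write a diagnostic only if BLOSC_TRACE is set to a non-empty value.

    Returns True when the message was emitted. The prefix is hypothetical.
    """
    if not os.environ.get("BLOSC_TRACE"):
        return False
    write = sink if sink is not None else sys.stderr.write
    write(f"[blosc2 matmul] {msg}\n")
    return True


if __name__ == "__main__":
    # Illustrative messages for the decision points listed above.
    trace("probing candidate libopenblas.so")
    trace("rejected libfoo.so: missing cblas_dgemm")
    trace("no usable CBLAS provider, falling back to naive")
```

Routing output through a `sink` callable keeps the helper testable; by default it writes to stderr so traces do not mix with program output.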

  Extend matmul benchmarks to report the active matmul library, compare against
  plain NumPy matmul, support warmup iterations, and use larger default problem
  sizes for steadier out-of-the-box results.
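The warmup-then-measure pattern used for steadier numbers can be sketched as a small timing helper. `bench` and its defaults are hypothetical, not the options of the blosc2 benchmark scripts; warmup runs are executed but not timed, and the best and median of the timed runs are reported.

```python
import statistics
import time
from typing import Callable, Dict, List


def bench(fn: Callable[[], object], warmup: int = 1, repeat: int = 5) -> Dict[str, float]:
    """Run fn `warmup` times untimed, then `repeat` times timed; report best and median seconds."""
    if warmup < 0 or repeat < 1:
        raise ValueError("need warmup >= 0 and repeat >= 1")
    for _ in range(warmup):
        fn()  # untimed: lets library loading, caches and thread pools settle
    times: List[float] = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return {"best": min(times), "median": statistics.median(times)}


if __name__ == "__main__":
    a = [[float(i + j) for j in range(32)] for i in range(32)]

    def work() -> List[List[float]]:
        # Pure-Python matmul as a placeholder workload.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*a)] for row in a]

    print(bench(work, warmup=2, repeat=5))
```

Timing, for example, `lambda: np.matmul(x, y)` and `lambda: blosc2.matmul(x2, y2)` on equivalent operands (built elsewhere) with the same helper gives the kind of side-by-side comparison the benchmarks report.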

  Add tests covering backend selection, threadpoolctl usage/skips, threshold
  behavior, Darwin scoping, and matmul-library introspection; add threadpoolctl
  as a regular non-wasm dependency.
FrancescAlted merged commit 8845a05 into main on Apr 12, 2026
16 of 17 checks passed
FrancescAlted deleted the matmul-optim branch on April 12, 2026 05:19