Merged
…tics

Implement a runtime-discovered CBLAS backend for the matmul fast path on Linux/Windows, alongside the existing Accelerate/macOS path and naive fallback. Probe BLAS candidates from the active NumPy/conda environment, load providers exporting cblas_sgemm/cblas_dgemm, and fall back cleanly to naive when none fit.

Control nested BLAS threading from Python with threadpoolctl around fast blosc2.matmul calls, but only for Linux CBLAS and only for small blocks. Use a benchmark-derived threshold of 192x192 to keep BLAS single-threaded for small GEMMs while avoiding regressions on larger ones; never apply this on macOS.

Expose backend introspection via blosc2.get_matmul_library(), returning the loaded CBLAS library path or Accelerate.framework when available.

Add BLOSC_TRACE diagnostics for CBLAS candidate probing, rejection, selection, and backend fallback decisions.

Extend matmul benchmarks to report the active matmul library, compare against plain NumPy matmul, support warmup iterations, and use larger default problem sizes for steadier out-of-the-box results.

Add tests covering backend selection, threadpoolctl usage/skips, threshold behavior, Darwin scoping, and matmul-library introspection; add threadpoolctl as a regular non-wasm dependency.
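The threading rule described above can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `should_limit_blas_threads` and `blas_thread_guard`, the backend strings, and the choice to treat the 192x192 threshold as inclusive on both block dimensions are all assumptions made for illustration. Only `threadpool_limits(limits=1, user_api="blas")` is the actual threadpoolctl API.

```python
import contextlib
import sys

try:
    from threadpoolctl import threadpool_limits  # real threadpoolctl API
except ImportError:
    # Keep the sketch runnable when threadpoolctl is not installed.
    threadpool_limits = None

# Threshold taken from the commit message (192x192); treating it as
# inclusive and per-dimension is an assumption of this sketch.
SMALL_BLOCK_THRESHOLD = 192


def should_limit_blas_threads(block_shape, platform=sys.platform, backend="cblas"):
    """Hypothetical helper: True only for Linux + CBLAS + small blocks."""
    if not platform.startswith("linux") or backend != "cblas":
        return False  # never applied on macOS or with the naive fallback
    rows, cols = block_shape
    return rows <= SMALL_BLOCK_THRESHOLD and cols <= SMALL_BLOCK_THRESHOLD


def blas_thread_guard(block_shape, platform=sys.platform, backend="cblas"):
    """Hypothetical helper: context manager to wrap one fast matmul call."""
    if threadpool_limits is not None and should_limit_blas_threads(
        block_shape, platform, backend
    ):
        # Pin BLAS to a single thread for the duration of the block.
        return threadpool_limits(limits=1, user_api="blas")
    return contextlib.nullcontext()


# Usage: the body of the with-block is where the matmul call would go.
with blas_thread_guard((128, 128)):
    pass
```

Keeping the decision in a pure function separate from the context manager makes the platform and threshold behavior easy to test without loading any BLAS library.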
This detects either a specialized BLAS library used by NumPy (Linux/Windows) or macOS Accelerate, and uses its dgemm/sgemm routines in combination with Blosc2 prefilters. The result is a substantial speedup in matmul, as can be seen in the plot below.
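As a rough illustration of the detection step, the following sketch probes a list of shared libraries with `ctypes` and accepts the first one that exports both `cblas_sgemm` and `cblas_dgemm`. It is a sketch under stated assumptions, not the implementation in this PR: the function name `probe_cblas` and the candidate list built with `ctypes.util.find_library` are hypothetical, and the PR itself takes its candidates from the active NumPy/conda environment.

```python
import ctypes
import ctypes.util

# Symbols a provider must export, as named in the PR.
REQUIRED_SYMBOLS = ("cblas_sgemm", "cblas_dgemm")


def probe_cblas(candidates):
    """Hypothetical helper: return (path, handle) for the first usable
    candidate, or None so the caller can fall back to the naive path."""
    for path in candidates:
        if not path:
            continue  # find_library returns None when nothing matches
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            continue  # rejected: the library could not be loaded
        # Attribute lookup on a CDLL raises AttributeError for missing
        # symbols, so hasattr() works as an export check.
        if all(hasattr(lib, name) for name in REQUIRED_SYMBOLS):
            return path, lib
    return None


# Illustrative candidate names only; an assumption of this sketch.
candidates = [ctypes.util.find_library(n) for n in ("openblas", "cblas", "blas")]
selected = probe_cblas(candidates)
print(selected[0] if selected else "no CBLAS provider found, using naive fallback")
```

With the PR applied, `blosc2.get_matmul_library()` reports which backend was actually selected, so a probe like this is not needed in user code.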