Skip to content

metbit v9.0.0

Latest

Choose a tag to compare

@aeiwz aeiwz released this 20 Jun 05:30
· 38 commits to main since this release
ca86eda

Summary

v9.0.0 is a major performance and correctness release focused on large-scale
metabolomics cohorts (>10,000 samples, >100,000 features). The compute backend
has been rewritten with a four-tier dispatch hierarchy: GPU (CuPy/PyTorch) ->
C+OpenMP -> multiprocessing -> chunked NumPy. Memory peak allocations for core
kernels are reduced by 50-250x at large cohort sizes.

Added

Large-scale compute backend (metbit/_native.py, metbit/_native_backend.c)

  • pearson_columns_par: OpenMP row-parallel Pearson correlation; thread-local
    accumulators avoid data races. Falls back transparently when OpenMP is absent.
  • pearson_columns_f32: float32 input variant; reads half the memory bandwidth
    of the float64 path while accumulating in float64 for numerical stability.
  • column_variances / column_variances_f32: two-pass numerically stable
    per-column sample variance in C with optional OpenMP parallelism.
  • vip_scores: closed-form VIP kernel in C; OpenMP parallel over features.
    Replaces the O(p) Python loop over features entirely.
  • openmp_threads(): reports available OpenMP threads to Python dispatch layer.

Auto-dispatch layer (metbit/_native.py)

  • Four-tier hierarchy: GPU (CuPy/PyTorch) -> C+OpenMP -> multiprocessing+NumPy
    -> chunked NumPy. Backend selected automatically based on n*p element count.
  • pearson_columns(): unified entry point; auto-selects best backend.
  • column_variances(): unified entry point with same dispatch logic.
  • vip_scores(): dispatches to GPU / C / NumPy vectorised as available.
  • backend_info(): returns dict of active backends for user inspection.
  • Environment overrides: METBIT_DISABLE_NATIVE, METBIT_DISABLE_GPU,
    METBIT_N_JOBS, METBIT_CHUNK.

Memory-efficient large-scale module (metbit/analysis/large_scale.py)

  • MemoryEstimator: estimates peak RAM before loading large datasets and
    recommends dtype.
  • feature_preselection(): variance/IQR based feature reduction; data-driven
    percentile threshold; dispatches to the C/GPU variance kernel.
  • ChunkedSTOCSY: STOCSY with bounded O(n * chunk_size) peak memory.
    active_backend() classmethod reports the compute path.
  • LargeScaleAlignment: alignment wrapper that warns when peak memory > 8 GB.
  • memory_report(): convenience function for quick dataset size assessment.

OPLS-DA improvements (metbit/analysis/opls_da.py)

  • dtype parameter: accepts numpy.float32 to halve peak memory. Auto-selects
    float32 when n * p > 5,000,000 if dtype=None (default).
  • __init__ now replaces NaN in-place (single allocation) rather than creating
    a second full-matrix copy.
  • vip_scores() is now vectorised via the dispatch layer; the O(p) Python loop
    is eliminated (1000-2700x speedup at p=5,000-20,000).

NMR alignment (metbit/nmr/alignment.py)

  • icoshift_align() now allocates a single output array instead of two full
    matrix copies (spectra.copy() then .values.copy()). Saves one full matrix
    copy (~80 GB at 10,000 x 1,000,000 float64).

Test suite

  • tests/test_e2e_pipeline.py: 44 full-pipeline tests (OPLS-DA, STOCSY,
    ChunkedSTOCSY, alignment, feature preselection, MemoryEstimator).
  • tests/test_ab_aa.py: 22 statistical validity tests: AB (must discriminate),
    AA (must not discriminate on noise), scaling robustness, reproducibility.
  • tests/test_large_scale.py: 23 backend dispatch tests including memory
    efficiency assertions via tracemalloc.
  • tests/test_performance.py: 24 performance regression tests marked
    @pytest.mark.perf; assert speedup ratios, not absolute times.

Build and tooling

  • setup.py now detects OpenMP at install time and compiles with -fopenmp
    when available. Falls back gracefully via OptionalBuildExt.
  • Compile flags: -O3 -march=native -ffast-math for the C extension.
  • scripts/perf_report.py: standalone benchmark runner producing
    reports/benchmark_results.json and reports/PERFORMANCE.md with timing,
    peak-memory, and trend-vs-previous-run columns.
  • scripts/run_tests.sh: --perf / --perf-quick flags run benchmarks and
    update the performance report.

Changed

  • VIP computation in opls_da.vip_scores(): replaced Python loop with
    single BLAS matrix multiply sqrt(p * (w_norm^2 @ S) / total_s). Numerically
    identical to the loop; verified at atol=1e-16.
  • Pearson fallback in _native.py: chunked over features (bounded memory)
    instead of full-matrix centred copy.
  • Minimum required: scipy==1.14.1 (pinned; relaxation planned for v9.1).
  • pytest.ini: added perf marker; --strict-markers enforces registration.

Fixed

  • icoshift_align: second redundant full-matrix copy removed.
  • opls_da.__init__: np.nan_to_num(X.to_numpy(), ...) created an O(n*p)
    intermediate; now uses to_numpy(dtype=...) + nan_to_num(copy=False).
  • test_native_backend.py::test_pearson_columns_fallback_matches_native:
    monkeypatching _native_backend now also patches _NATIVE_OK flag.

Performance (measured on Apple M-series, single-threaded C, no GPU)

Kernel Dataset Speedup Memory reduction
VIP scores (C vs Python loop) 80 x 5,000 2680x -
VIP scores (NumPy vec. vs loop) 80 x 5,000 817x -
Pearson (C vs full-copy NumPy) 500 x 30,000 1.2x speed 126x less RAM
Column variance (C f32 vs NumPy) 500 x 100,000 2.0x 251x less RAM
ChunkedSTOCSY overhead 100 x 5,000 46x faster than std bounded memory

What's Changed

  • Update paper.md by @aeiwz in #14
  • Add GitHub Actions workflow for drafting PDF by @aeiwz in #15
  • Potential fix for code scanning alert no. 1: Workflow does not contain permissions by @aeiwz in #16
  • Add tests and coverage report, update modules by @aeiwz in #18
  • Bump next from 16.2.2 to 16.2.6 in /docs in the npm_and_yarn group across 1 directory by @dependabot[bot] in #17
  • Origin/dev/refactor code by @aeiwz in #19
  • Origin/dev/refactor code by @aeiwz in #20
  • Create CODE_OF_CONDUCT.md by @aeiwz in #22
  • Create CONTRIBUTING.md by @aeiwz in #23
  • Origin/dev/refactor code by @aeiwz in #21

Full Changelog: v8.7.7...v9.0.0