This PR ports high-performance computing (HPC) features from the rustynum library into ndarray, adding comprehensive linear algebra, statistical operations, hyperdimensional computing (HDC), and signal processing capabilities. The implementation uses a pluggable backend architecture with runtime CPU detection (AVX-512 → AVX2 → scalar) for optimal performance across different hardware. #3

Merged
AdaWorldAPI merged 2 commits into master from claude/setup-adaworld-ndarray-5IxqY on Mar 15, 2026

@AdaWorldAPI
Owner

Summary

This PR ports high-performance computing (HPC) features from the rustynum library into ndarray, adding linear algebra, statistical operations, hyperdimensional computing (HDC), and signal processing capabilities. The implementation uses a pluggable backend architecture with runtime CPU detection (AVX-512 → AVX2 → scalar), so each call runs on the widest SIMD tier the host CPU supports.

Key Changes

  • Backend Architecture (src/backend/): Implements a pluggable SIMD dispatch system that uses LazyLock for one-time CPU tier detection. Provides AVX-512 and AVX2+FMA kernels plus a scalar fallback, with tier-specific GEMM register block sizes.

  • BLAS Operations (src/hpc/blas_level*.rs):

    • Level 1: Vector operations (dot, axpy, scal, nrm2, asum, iamax, copy, swap)
    • Level 2: Matrix-vector operations (gemv, ger, symv, trmv, trsv)
    • Level 3: Matrix-matrix operations (gemm, syrk, trsm, symm)
  • Quantized GEMM (src/hpc/quantized.rs): BF16 (bfloat16) type with f32 conversions and quantized matrix multiplication supporting multiple dequantization modes (symmetric, asymmetric, per-channel).

  • Statistical Operations (src/hpc/statistics.rs): Median, variance, standard deviation, percentiles, and axis-wise reductions extending ndarray's existing mean/sum.

  • LAPACK Factorizations (src/hpc/lapack.rs): LU, Cholesky, and QR decompositions with pure Rust implementations and optional MKL FFI backend.

  • Hyperdimensional Computing (src/hpc/hdc.rs, src/hpc/graph.rs): Binary hypervector operations (bind/XOR, permute/rotate, bundle/majority), VerbCodebook for knowledge graphs, and causality checking.

  • Signal Processing (src/hpc/fft.rs): Cooley-Tukey radix-2 FFT with optional MKL acceleration.

  • Specialized Containers (src/hpc/cogrecord.rs): 16KB cognitive units (4 × 4096-byte containers) queryable via Hamming distance or int8 dot product.

  • Utility Operations (src/hpc/bitwise.rs, src/hpc/vml.rs, src/hpc/activations.rs, src/hpc/projection.rs): Bitwise operations, vectorized math library, neural network activations, and SimHash projection.

  • Feature Gates (Cargo.toml): Added mutually exclusive backend selection: native (default, pure Rust SIMD), intel-mkl, and openblas.

  • Test Suite (bf16_test_src/): Comprehensive BF16 quantization test with 219 qualia items, demonstrating 16-dim + 1-bit intensity encoding with nibble-based codebook and distance metrics.

  • Documentation: Added agent profiles, strategic planning documents, and comparison analysis (rustynum vs ndarray).
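
The BF16 item above mentions f32 conversions, and the commit message names both truncation and round-to-nearest. As a rough illustration only, the sketch below shows how such conversions are commonly written. The `Bf16` struct, its method names, and `bf16_dot_f32` are hypothetical and are not taken from src/hpc/quantized.rs; the tie-breaking rule (round to nearest even) is also an assumption, since the PR does not say which variant it uses.

```rust
/// Hypothetical bfloat16 wrapper: the upper 16 bits of an IEEE 754 f32.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Bf16(pub u16);

impl Bf16 {
    /// Truncation: drop the low 16 mantissa bits.
    pub fn from_f32_truncate(x: f32) -> Self {
        Bf16((x.to_bits() >> 16) as u16)
    }

    /// Round to nearest, ties to even (assumed tie rule).
    pub fn from_f32_round(x: f32) -> Self {
        let bits = x.to_bits();
        if x.is_nan() {
            // Force a mantissa bit so the result stays NaN after narrowing.
            return Bf16(((bits >> 16) as u16) | 0x0040);
        }
        // Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
        let lsb = (bits >> 16) & 1;
        let rounded = bits.wrapping_add(0x7FFF + lsb);
        Bf16((rounded >> 16) as u16)
    }

    /// Widening is exact: place the 16 bits back in the upper half.
    pub fn to_f32(self) -> f32 {
        f32::from_bits((self.0 as u32) << 16)
    }
}

/// Dot product over BF16 inputs that accumulates in f32.
pub fn bf16_dot_f32(a: &[Bf16], b: &[Bf16]) -> f32 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter().zip(b).map(|(x, y)| x.to_f32() * y.to_f32()).sum()
}
```

Widening each product to f32 before summing mirrors the "f32 accumulation" point in the PR description; the real kernels may tile or vectorize this differently.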

Implementation Details

  • All BLAS operations use extension traits on ndarray's ArrayBase<S, D> for zero-cost abstraction
  • Runtime CPU detection happens once via LazyLock<Tier> at first function call
  • Quantized operations maintain f32 accumulation for numerical stability
  • Pure Rust implementations provide baseline; feature gates enable external BLAS libraries (MKL, OpenBLAS)
  • HDC operations leverage bitwise XOR for efficient hypervector binding
  • CogRecord provides 16KB cognitive units (4 × 4096-byte containers) with SIMD-accelerated Hamming distance queries
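
To make the HDC and Hamming-distance points above concrete, here is a minimal scalar sketch of XOR binding, majority bundling, and Hamming distance over bit-packed hypervectors. The function names echo the operations listed in the PR, but the signatures, the `u64` word layout, and the tie rule in `bundle` (ties resolve to 0) are assumptions made for illustration; the PR's versions are described as SIMD-accelerated and may differ.

```rust
/// XOR-bind two equal-length binary hypervectors packed into 64-bit words.
pub fn bind(a: &[u64], b: &[u64]) -> Vec<u64> {
    assert_eq!(a.len(), b.len(), "hypervectors must have equal length");
    a.iter().zip(b).map(|(x, y)| x ^ y).collect()
}

/// Number of differing bits: XOR the words, then count the set bits.
pub fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "hypervectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Bitwise majority vote across the inputs.
/// Assumption: a bit is set only on a strict majority, so ties give 0.
pub fn bundle(vectors: &[&[u64]]) -> Vec<u64> {
    let words = vectors.first().map_or(0, |v| v.len());
    assert!(
        vectors.iter().all(|v| v.len() == words),
        "hypervectors must have equal length"
    );
    let mut out = vec![0u64; words];
    for (w, slot) in out.iter_mut().enumerate() {
        for bit in 0..64 {
            let ones = vectors.iter().filter(|v| (v[w] >> bit) & 1 == 1).count();
            if 2 * ones > vectors.len() {
                *slot |= 1u64 << bit;
            }
        }
    }
    out
}
```

Because XOR is its own inverse, `bind(&bind(a, b), b)` recovers `a`, which is what makes binding usable for encoding and later decoding associations.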

https://claude.ai/code/session_01QnZH2adQ6oXzTYPkLTH6G7

claude added 2 commits March 15, 2026 18:57
Implements the full rustynum → ndarray porting pipeline:

Backend:
- LinalgBackend trait with NativeBackend (SIMD-dispatched)
- AVX2+FMA dot product and axpy kernels with scalar fallback
- Tiled GEMM (64×64 tiles for L1 cache), GEMV
- Feature gates: native, intel-mkl, openblas (mutually exclusive)

BLAS Level 1-3 (src/hpc/blas_level*.rs):
- L1: dot, axpy, scal, nrm2, asum, iamax, copy, swap
- L1 SIMD: scalar/vec add/sub/mul/div for f32/f64
- L2: gemv, ger, symv, trmv, trsv (with Uplo/Diag enums)
- L3: gemm, syrk, symm, trsm (with Side enum)

Quantized GEMM (src/hpc/quantized.rs):
- BF16 type with f32 conversions (truncation + round-to-nearest)
- bf16_gemm_f32, mixed_precision_gemm
- Int8: quantize_f32_to_u8/i8/i4, int8_gemm_i32/f32, per-channel

LAPACK (src/hpc/lapack.rs):
- LU factorization (getrf) + solve (getrs)
- Cholesky factorization (potrf) + solve (potrs)
- QR factorization (geqrf)

FFT (src/hpc/fft.rs):
- Cooley-Tukey radix-2: fft/ifft for f32/f64
- Real-to-complex: rfft_f32

VML (src/hpc/vml.rs):
- vsexp/vdexp, vsln/vdln, vssqrt/vdsqrt, vsabs/vdabs
- vsadd, vsmul, vsdiv, vssin, vscos, vspow

Statistics (src/hpc/statistics.rs):
- median, variance, std_dev, percentile (with axis variants)
- argmin, argmax, top_k, cumsum, cosine_similarity, norm(p)

Activations (src/hpc/activations.rs):
- sigmoid, softmax, log_softmax (numerically stable)

HDC (src/hpc/hdc.rs):
- bind (XOR), permute (rotate), bundle (majority vote)
- bundle_byte_slices, dot_i8

Bitwise (src/hpc/bitwise.rs):
- hamming_distance (AVX2-accelerated), popcount
- hamming_distance_batch, hamming_top_k

Projection (src/hpc/projection.rs):
- simhash_project, simhash_batch_project, simhash_int8_project

CogRecord (src/hpc/cogrecord.rs):
- 4×4096-byte cognitive unit with sweep_adaptive, hdr_sweep
- to_bytes/from_bytes, sweep_cogrecords batch

Graph (src/hpc/graph.rs):
- VerbCodebook with encode_edge, decode_target (permute-based causality)
- causality_asymmetry, causality_check, find_non_causal_edges, infer_verb

All 209 tests pass. Clippy clean with -D warnings.

https://claude.ai/code/session_01QnZH2adQ6oXzTYPkLTH6G7

Use LazyLock<Tier> + dispatch! macro for SIMD dispatch instead of
LinalgBackend trait indirection. One detection at first call, cached
forever. SGEMM_NR now runtime-selected by tier (16/8/4 for
AVX-512/AVX2/Scalar). BlasFloat wired directly to dispatch functions.

https://claude.ai/code/session_01QnZH2adQ6oXzTYPkLTH6G7
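
The commit above describes one-time tier detection cached in a LazyLock<Tier>, a dispatch! macro, and a tier-dependent SGEMM_NR (16/8/4). The following is a minimal sketch of that pattern under stated assumptions: the `Tier` variant names, `detect_tier`, `sgemm_nr`, the shape of the `dispatch!` macro, and `dot_scalar` are all hypothetical and are not copied from src/backend/. Every arm calls the scalar kernel here, because real AVX kernels would need `unsafe` intrinsics that are out of scope for a sketch.

```rust
use std::sync::LazyLock;

/// SIMD capability tiers, widest first (hypothetical variant names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tier {
    Avx512,
    Avx2Fma,
    Scalar,
}

/// Probe the CPU. On non-x86 targets this always reports `Scalar`.
fn detect_tier() -> Tier {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx512f") {
            return Tier::Avx512;
        }
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return Tier::Avx2Fma;
        }
    }
    Tier::Scalar
}

/// Detected on first access, then cached for the life of the process.
pub static TIER: LazyLock<Tier> = LazyLock::new(detect_tier);

/// GEMM register block width per tier, using the 16/8/4 values from the commit.
pub fn sgemm_nr() -> usize {
    match *TIER {
        Tier::Avx512 => 16,
        Tier::Avx2Fma => 8,
        Tier::Scalar => 4,
    }
}

/// Hypothetical dispatch macro: pick one expression per tier.
macro_rules! dispatch {
    ($avx512:expr, $avx2:expr, $scalar:expr) => {
        match *TIER {
            Tier::Avx512 => $avx512,
            Tier::Avx2Fma => $avx2,
            Tier::Scalar => $scalar,
        }
    };
}

/// Portable reference kernel.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Public entry point. All arms fall back to the scalar kernel in this sketch.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    dispatch!(dot_scalar(a, b), dot_scalar(a, b), dot_scalar(a, b))
}
```

Compared with a trait object, a `match` on a cached enum keeps each call site statically resolvable per arm, which is one plausible reason for dropping the LinalgBackend indirection.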
@AdaWorldAPI AdaWorldAPI merged commit 5d0653e into master Mar 15, 2026
AdaWorldAPI pushed a commit that referenced this pull request Mar 30, 2026
Tactics 1-12 from the 34-tactic integration plan, adapted to ndarray:
  styles::rte  — #1  Recursive Thought Expansion (Hofstadter)
  styles::htd  — #2  Hierarchical Thought Decomposition (CLAM)
  styles::smad — #3  Structured Multi-Agent Debate (NARS revision)
  styles::tcp  — #5  Thought Chain Pruning (Berry-Esseen)
  styles::irs  — #9  Iterative Roleplay Synthesis (XOR binding)
  styles::mcp  — #10 Meta-Cognition (Brier score calibration)
  styles::tca  — #12 Temporal Context (Reichenbach tense)

Plus additions to existing modules:
  causal_diff.rs — #4  reverse_trace() (Pearl Rung 3)
  bgz17_bridge.rs — #6  inject_noise() (simulated annealing)
  nars.rs — #7  adversarial_critique(), #11 detect_contradiction()
  cascade.rs — #8  adaptive_resolution()

Every tactic is fn(Base17, NarsTruth) → result. No LLM prompting.
16 tests passing. API: crate::hpc::styles::rte::expand() etc.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK