This PR ports high-performance computing (HPC) features from the rustynum library into ndarray, adding comprehensive linear algebra, statistical operations, hyperdimensional computing (HDC), and signal processing capabilities. The implementation uses a pluggable backend architecture with runtime CPU detection (AVX-512 → AVX2 → scalar) for optimal performance across different hardware. #3
Merged
Implements the full rustynum → ndarray porting pipeline:

Backend:
- LinalgBackend trait with NativeBackend (SIMD-dispatched)
- AVX2+FMA dot product and axpy kernels with scalar fallback
- Tiled GEMM (64×64 tiles for L1 cache), GEMV
- Feature gates: native, intel-mkl, openblas (mutually exclusive)

BLAS Level 1-3 (src/hpc/blas_level*.rs):
- L1: dot, axpy, scal, nrm2, asum, iamax, copy, swap
- L1 SIMD: scalar/vec add/sub/mul/div for f32/f64
- L2: gemv, ger, symv, trmv, trsv (with Uplo/Diag enums)
- L3: gemm, syrk, symm, trsm (with Side enum)

Quantized GEMM (src/hpc/quantized.rs):
- BF16 type with f32 conversions (truncation + round-to-nearest)
- bf16_gemm_f32, mixed_precision_gemm
- Int8: quantize_f32_to_u8/i8/i4, int8_gemm_i32/f32, per-channel

LAPACK (src/hpc/lapack.rs):
- LU factorization (getrf) + solve (getrs)
- Cholesky factorization (potrf) + solve (potrs)
- QR factorization (geqrf)

FFT (src/hpc/fft.rs):
- Cooley-Tukey radix-2: fft/ifft for f32/f64
- Real-to-complex: rfft_f32

VML (src/hpc/vml.rs):
- vsexp/vdexp, vsln/vdln, vssqrt/vdsqrt, vsabs/vdabs
- vsadd, vsmul, vsdiv, vssin, vscos, vspow

Statistics (src/hpc/statistics.rs):
- median, variance, std_dev, percentile (with axis variants)
- argmin, argmax, top_k, cumsum, cosine_similarity, norm(p)

Activations (src/hpc/activations.rs):
- sigmoid, softmax, log_softmax (numerically stable)

HDC (src/hpc/hdc.rs):
- bind (XOR), permute (rotate), bundle (majority vote)
- bundle_byte_slices, dot_i8

Bitwise (src/hpc/bitwise.rs):
- hamming_distance (AVX2-accelerated), popcount
- hamming_distance_batch, hamming_top_k

Projection (src/hpc/projection.rs):
- simhash_project, simhash_batch_project, simhash_int8_project

CogRecord (src/hpc/cogrecord.rs):
- 4×4096-byte cognitive unit with sweep_adaptive, hdr_sweep
- to_bytes/from_bytes, sweep_cogrecords batch

Graph (src/hpc/graph.rs):
- VerbCodebook with encode_edge, decode_target (permute-based causality)
- causality_asymmetry, causality_check, find_non_causal_edges, infer_verb

All 209 tests pass. Clippy clean with -D warnings.

https://claude.ai/code/session_01QnZH2adQ6oXzTYPkLTH6G7
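The commit message lists a BF16 type with two f32 conversions, truncation and round-to-nearest. As a rough illustration of how those two modes differ, here is a minimal sketch. The type name `Bf16` and its method names are hypothetical and are not taken from `src/hpc/quantized.rs`; the rounding variant assumes ties-to-even, which the commit message does not specify.

```rust
/// Hypothetical stand-in for a BF16 type: the upper 16 bits of an IEEE-754 f32.
/// Names here are illustrative assumptions, not the PR's actual API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Bf16(pub u16);

impl Bf16 {
    /// Truncation: drop the low 16 mantissa bits of the f32.
    pub fn from_f32_truncate(x: f32) -> Self {
        Bf16((x.to_bits() >> 16) as u16)
    }

    /// Round-to-nearest, ties to even (assumed tie-breaking rule).
    pub fn from_f32_round(x: f32) -> Self {
        let bits = x.to_bits();
        if x.is_nan() {
            // Set a mantissa bit in the kept half so the result stays a NaN.
            return Bf16(((bits >> 16) as u16) | 0x0040);
        }
        // Add 0x7FFF plus the lowest kept bit, then truncate.
        let lsb = (bits >> 16) & 1;
        let rounded = bits.wrapping_add(0x7FFF + lsb);
        Bf16((rounded >> 16) as u16)
    }

    /// Widening back to f32 is exact: pad the low 16 bits with zeros.
    pub fn to_f32(self) -> f32 {
        f32::from_bits((self.0 as u32) << 16)
    }
}
```

Truncation is cheaper but always rounds toward zero in magnitude; the rounding variant halves the worst-case conversion error at the cost of one extra add.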
Use LazyLock<Tier> + dispatch! macro for SIMD dispatch instead of LinalgBackend trait indirection. One detection at first call, cached forever. SGEMM_NR now runtime-selected by tier (16/8/4 for AVX-512/AVX2/Scalar). BlasFloat wired directly to dispatch functions. https://claude.ai/code/session_01QnZH2adQ6oXzTYPkLTH6G7
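The one-time detection described in this commit can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: the `Tier` variant names, `detect_tier`, `sgemm_nr`, and `dot` are hypothetical, the `dispatch!` macro itself is not reproduced, and every tier falls back to the same scalar kernel where a real implementation would call `#[target_feature]` kernels.

```rust
use std::sync::LazyLock;

/// CPU capability tiers, from widest to narrowest (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tier {
    Avx512,
    Avx2,
    Scalar,
}

/// Probe the CPU once. On non-x86 targets the cfg block is compiled out
/// and the function always reports the scalar tier.
fn detect_tier() -> Tier {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx512f") {
            return Tier::Avx512;
        }
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return Tier::Avx2;
        }
    }
    Tier::Scalar
}

/// Initialized on first dereference, then cached for the process lifetime.
pub static TIER: LazyLock<Tier> = LazyLock::new(detect_tier);

/// Register-block width chosen by tier (16/8/4, as stated in the commit).
pub fn sgemm_nr() -> usize {
    match *TIER {
        Tier::Avx512 => 16,
        Tier::Avx2 => 8,
        Tier::Scalar => 4,
    }
}

fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Dispatch point: a real kernel table would branch to SIMD variants here.
/// This sketch routes every tier to the scalar kernel.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    match *TIER {
        Tier::Avx512 | Tier::Avx2 | Tier::Scalar => dot_scalar(a, b),
    }
}
```

Compared with a trait object, a cached enum plus a `match` keeps the call monomorphic and lets the compiler inline the scalar path; the cost is that new tiers must be added at every dispatch site.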
AdaWorldAPI pushed a commit that referenced this pull request on Mar 30, 2026
Tactics 1-12 from the 34-tactic integration plan, adapted to ndarray:

styles::rte — #1 Recursive Thought Expansion (Hofstadter)
styles::htd — #2 Hierarchical Thought Decomposition (CLAM)
styles::smad — #3 Structured Multi-Agent Debate (NARS revision)
styles::tcp — #5 Thought Chain Pruning (Berry-Esseen)
styles::irs — #9 Iterative Roleplay Synthesis (XOR binding)
styles::mcp — #10 Meta-Cognition (Brier score calibration)
styles::tca — #12 Temporal Context (Reichenbach tense)

Plus additions to existing modules:

causal_diff.rs — #4 reverse_trace() (Pearl Rung 3)
bgz17_bridge.rs — #6 inject_noise() (simulated annealing)
nars.rs — #7 adversarial_critique(), #11 detect_contradiction()
cascade.rs — #8 adaptive_resolution()

Every tactic is fn(Base17, NarsTruth) → result. No LLM prompting.
16 tests passing. API: crate::hpc::styles::rte::expand() etc.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
Summary
This PR ports high-performance computing (HPC) features from the rustynum library into ndarray, adding comprehensive linear algebra, statistical operations, hyperdimensional computing (HDC), and signal processing capabilities. The implementation uses a pluggable backend architecture with runtime CPU detection (AVX-512 → AVX2 → scalar) for optimal performance across different hardware.
Key Changes
- Backend Architecture (src/backend/): Implemented pluggable SIMD dispatch system using LazyLock for one-time CPU tier detection. Supports AVX-512, AVX2+FMA, and scalar fallbacks with tier-specific GEMM register block sizes.
- BLAS Operations (src/hpc/blas_level*.rs):
- Quantized GEMM (src/hpc/quantized.rs): BF16 (bfloat16) type with f32 conversions and quantized matrix multiplication supporting multiple dequantization modes (symmetric, asymmetric, per-channel).
- Statistical Operations (src/hpc/statistics.rs): Median, variance, standard deviation, percentiles, and axis-wise reductions extending ndarray's existing mean/sum.
- LAPACK Factorizations (src/hpc/lapack.rs): LU, Cholesky, and QR decompositions with pure Rust implementations and optional MKL FFI backend.
- Hyperdimensional Computing (src/hpc/hdc.rs, src/hpc/graph.rs): Binary hypervector operations (bind/XOR, permute/rotate, bundle/majority), VerbCodebook for knowledge graphs, and causality checking.
- Signal Processing (src/hpc/fft.rs): Cooley-Tukey radix-2 FFT with optional MKL acceleration.
- Specialized Containers (src/hpc/cogrecord.rs): 16KB cognitive units (4 × 4096-byte containers) queryable via Hamming distance or int8 dot product.
- Utility Operations (src/hpc/bitwise.rs, src/hpc/vml.rs, src/hpc/activations.rs, src/hpc/projection.rs): Bitwise operations, vectorized math library, neural network activations, and SimHash projection.
- Feature Gates (Cargo.toml): Added mutually exclusive backend selection: native (default, pure Rust SIMD), intel-mkl, and openblas.
- Test Suite (bf16_test_src/): Comprehensive BF16 quantization test with 219 qualia items, demonstrating 16-dim + 1-bit intensity encoding with nibble-based codebook and distance metrics.
- Documentation: Added agent profiles, strategic planning documents, and comparison analysis (rustynum vs ndarray).
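The binary hypervector operations named in the list above (bind/XOR, permute/rotate, bundle/majority) and the Hamming-distance query can be sketched on plain byte slices. The signatures below are assumptions for illustration and may not match `src/hpc/hdc.rs` or `src/hpc/bitwise.rs`; in particular, this `permute` rotates at byte granularity, and the sketch is scalar-only with no AVX2 path.

```rust
/// bind: element-wise XOR. Self-inverse, so binding twice with the same
/// key recovers the original vector.
pub fn bind(a: &[u8], b: &[u8]) -> Vec<u8> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x ^ y).collect()
}

/// permute: cyclic left rotation by `k` bytes (byte granularity is an
/// assumption of this sketch).
pub fn permute(v: &[u8], k: usize) -> Vec<u8> {
    let mut out = v.to_vec();
    if !out.is_empty() {
        let k = k % out.len();
        out.rotate_left(k);
    }
    out
}

/// bundle: per-bit majority vote across the input vectors.
/// A bit is set only when strictly more than half of the inputs set it.
pub fn bundle(vs: &[&[u8]]) -> Vec<u8> {
    let n = vs.first().map_or(0, |v| v.len());
    assert!(vs.iter().all(|v| v.len() == n));
    let mut out = vec![0u8; n];
    for (i, byte) in out.iter_mut().enumerate() {
        for bit in 0..8u32 {
            let ones = vs.iter().filter(|v| (v[i] >> bit) & 1 == 1).count();
            if 2 * ones > vs.len() {
                *byte |= 1u8 << bit;
            }
        }
    }
    out
}

/// Hamming distance: number of differing bits between two vectors.
pub fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Because XOR is its own inverse, `bind(&bind(a, b), b)` returns `a`, which is the property that lets a bound pair be unbound with either operand.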
Implementation Details
- ArrayBase<S, D> for zero-cost abstraction
- LazyLock<Tier> at first function call

https://claude.ai/code/session_01QnZH2adQ6oXzTYPkLTH6G7