feat: AVX-512 kernels, cache-derived GEMM tiles, CogRecord 64KB #4
Conversation
- Add 32 AVX-512 kernel functions in kernels_avx512.rs using raw core::arch::x86_64::* intrinsics (no wrapper types). Safe fn with #[target_feature]; unsafe only around load/store (Rust 1.94).
  - BLAS L1: dot/axpy/scal/asum/nrm2/iamax for f32 and f64
  - Element-wise: add/sub/mul/div for scalar and vec, f32 and f64
  - Binary/HDC: hamming_distance (VPOPCNTDQ), popcount, dot_i8, hamming_batch
- Wire AVX-512 kernels into the dispatch! macro for all BLAS-1 ops
- Add sgemm_mr()/dgemm_mr() for microkernel height by tier
- GEMM: derive tile sizes from the 64KB L1 cache constraint
  - AVX-512 (MR=6, NR=16): KC=740, fills 99.4% of L1
  - AVX2 (MR=6, NR=8): KC=1163
  - Scalar (MR=4, NR=4): KC=2036
- GEMM inner loop uses dispatched axpy for SIMD acceleration
- Fix CogRecord: CONTAINER_BYTES=16384 (was 4096), COGRECORD_BYTES=65536
- 209 tests pass, clippy clean

https://claude.ai/code/session_01QnZH2adQ6oXzTYPkLTH6G7
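The quoted L1-fill figures follow from the constraint that the A panel (MR×KC) and B panel (NR×KC) of f32 values must together fit in a 64KB L1. A minimal sketch reproducing the numbers above (the `l1_fill` helper is illustrative, not a function from the crate):

```rust
// Sketch: check the L1-fill percentages quoted in the commit message.
// Assumes the A panel (MR*KC floats) plus the B panel (NR*KC floats)
// is the footprint measured against a 64KB (65536-byte) L1 cache.
fn l1_fill(mr: usize, nr: usize, kc: usize) -> f64 {
    let bytes = (mr + nr) * kc * core::mem::size_of::<f32>();
    bytes as f64 / 65536.0 * 100.0
}

fn main() {
    println!("AVX-512: {:.1}% of L1", l1_fill(6, 16, 740)); // 99.4
    println!("AVX2:    {:.1}% of L1", l1_fill(6, 8, 1163)); // 99.4
    println!("Scalar:  {:.1}% of L1", l1_fill(4, 4, 2036)); // 99.4
}
```

All three tiers land at roughly 99.4% of the 64KB budget, which is why KC shrinks as the microkernel footprint (MR+NR) grows.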
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d1baa6acbd
```rust
#[target_feature(enable = "avx512f")]
pub fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
    let n = x.len().min(y.len());
    let mut acc0 = _mm512_setzero_ps();
```
Preserve MSRV by gating new AVX-512 code
The new AVX-512 module introduces #[target_feature(enable = "avx512f")] functions and direct AVX-512 intrinsics in the default build path, but this crate declares MSRV 1.64 and runs MSRV CI (Cargo.toml and .github/workflows/ci.yaml). On those toolchains, AVX-512 target-feature support/intrinsics are not available in stable form, so x86_64 builds fail before tests run; this effectively raises the minimum compiler version and breaks the project’s published compatibility contract.
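One way to resolve this (a sketch only, not the project's actual fix; the `simd-avx512` feature name and helper are hypothetical) is to compile the new module behind an opt-in cargo feature plus a target-arch check, so an MSRV 1.64 toolchain never parses the intrinsics:

```rust
// lib.rs sketch: gate the AVX-512 module behind an opt-in cargo feature
// ("simd-avx512" is a hypothetical name). When the cfg is false, the
// kernels_avx512.rs file is not even parsed, so MSRV CI still builds.
#[cfg(all(feature = "simd-avx512", target_arch = "x86_64"))]
pub mod kernels_avx512;

// Dispatch can consult this and fall back to the AVX2/scalar paths
// whenever the feature is off or the arch is not x86_64.
pub fn have_avx512_kernels() -> bool {
    cfg!(all(feature = "simd-avx512", target_arch = "x86_64"))
}

fn main() {
    println!("avx512 kernels compiled in: {}", have_avx512_kernels());
}
```

MSRV CI would then build with the feature disabled, while an additional nightly-or-recent-stable job exercises the AVX-512 path.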
Tactics 1-12 from the 34-tactic integration plan, adapted to ndarray:

- styles::rte — #1 Recursive Thought Expansion (Hofstadter)
- styles::htd — #2 Hierarchical Thought Decomposition (CLAM)
- styles::smad — #3 Structured Multi-Agent Debate (NARS revision)
- styles::tcp — #5 Thought Chain Pruning (Berry-Esseen)
- styles::irs — #9 Iterative Roleplay Synthesis (XOR binding)
- styles::mcp — #10 Meta-Cognition (Brier score calibration)
- styles::tca — #12 Temporal Context (Reichenbach tense)

Plus additions to existing modules:

- causal_diff.rs — #4 reverse_trace() (Pearl Rung 3)
- bgz17_bridge.rs — #6 inject_noise() (simulated annealing)
- nars.rs — #7 adversarial_critique(), #11 detect_contradiction()
- cascade.rs — #8 adaptive_resolution()

Every tactic is fn(Base17, NarsTruth) → result. No LLM prompting. 16 tests passing. API: crate::hpc::styles::rte::expand() etc.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK
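The uniform fn(Base17, NarsTruth) → result shape can be sketched as a common function-pointer type. The Base17 and NarsTruth layouts below are placeholders for illustration, not the crate's real definitions, and `rte_expand` is a hypothetical stand-in for crate::hpc::styles::rte::expand():

```rust
// Placeholder shapes -- the real Base17 and NarsTruth types live in the crate.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Base17(pub u64);

#[derive(Clone, Copy, Debug)]
pub struct NarsTruth {
    pub frequency: f64,  // NARS truth value: observed frequency
    pub confidence: f64, // NARS truth value: evidential confidence
}

// Every tactic exposes the same signature, so all 12 are interchangeable
// behind one alias and can be table-dispatched without LLM prompting.
pub type Tactic<T> = fn(Base17, NarsTruth) -> Result<T, String>;

// Hypothetical stand-in for crate::hpc::styles::rte::expand().
fn rte_expand(base: Base17, truth: NarsTruth) -> Result<u64, String> {
    if truth.confidence <= 0.0 {
        return Err("no evidence".into());
    }
    Ok(base.0.wrapping_mul(17)) // placeholder "expansion" step
}

fn main() {
    let tactic: Tactic<u64> = rte_expand;
    let out = tactic(Base17(3), NarsTruth { frequency: 0.9, confidence: 0.5 });
    println!("{:?}", out); // Ok(51)
}
```

The single alias is what lets modules like nars.rs and cascade.rs register their tactics in one dispatch table.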