feat: AVX-512 kernels, cache-derived GEMM tiles, CogRecord 64KB by AdaWorldAPI · Pull Request #4 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-03-15T21:46:08Z

- Add 32 AVX-512 kernel functions in kernels_avx512.rs using raw core::arch::x86_64::* intrinsics (no wrapper types). Safe fn with #[target_feature], unsafe only around load/store (Rust 1.94).
- BLAS L1: dot/axpy/scal/asum/nrm2/iamax for f32 and f64
- Element-wise: add/sub/mul/div for scalar and vec, f32 and f64
- Binary/HDC: hamming_distance (VPOPCNTDQ), popcount, dot_i8, hamming_batch
- Wire AVX-512 kernels into dispatch! macro for all BLAS-1 ops
- Add sgemm_mr()/dgemm_mr() for microkernel height by tier
- GEMM: derive tile sizes from 64KB L1 cache constraint
  AVX-512 (MR=6, NR=16): KC=740, fills 99.4% of L1
  AVX2 (MR=6, NR=8): KC=1163
  Scalar (MR=4, NR=4): KC=2036
- GEMM inner loop uses dispatched axpy for SIMD acceleration
- Fix CogRecord: CONTAINER_BYTES=16384 (was 4096), COGRECORD_BYTES=65536
- 209 tests pass, clippy clean
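
The KC values listed for the GEMM tiles can be reproduced with a small calculation. The sketch below is an assumption, not the formula confirmed by the PR: it reserves 384 bytes of the 64KB budget (the size of one 6x16 f32 accumulator tile) and divides the rest between an MR x KC panel of A and a KC x NR panel of B, both f32. The names `kc_for`, `l1_fill_percent`, `L1_BYTES`, and `RESERVED_BYTES` are hypothetical and do not come from the crate.

```rust
/// L1 budget in bytes: the 64KB figure quoted in the PR description.
const L1_BYTES: usize = 64 * 1024;

/// Hypothetical reserve (an assumption, not taken from the crate):
/// one 6x16 f32 accumulator tile, 6 * 16 * 4 = 384 bytes.
const RESERVED_BYTES: usize = 6 * 16 * core::mem::size_of::<f32>();

/// Largest KC such that an MR x KC panel of A plus a KC x NR panel of B
/// (both f32) fit in the L1 budget after the reserve is subtracted.
fn kc_for(mr: usize, nr: usize) -> usize {
    let bytes_per_k = (mr + nr) * core::mem::size_of::<f32>();
    (L1_BYTES - RESERVED_BYTES) / bytes_per_k
}

/// Share of L1 occupied by the two packed panels, in percent.
fn l1_fill_percent(mr: usize, nr: usize, kc: usize) -> f64 {
    let panel_bytes = (mr + nr) * kc * core::mem::size_of::<f32>();
    panel_bytes as f64 * 100.0 / L1_BYTES as f64
}

fn main() {
    for (name, mr, nr) in [("AVX-512", 6, 16), ("AVX2", 6, 8), ("Scalar", 4, 4)] {
        let kc = kc_for(mr, nr);
        println!(
            "{name}: MR={mr} NR={nr} KC={kc} fill={:.1}%",
            l1_fill_percent(mr, nr, kc)
        );
    }
}
```

Under these assumptions the three tiers come out as KC=740, KC=1163, and KC=2036, matching the figures in the list, and the AVX-512 panels occupy about 99.4% of the 64KB budget.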

https://claude.ai/code/session_01QnZH2adQ6oXzTYPkLTH6G7

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d1baa6acbd

chatgpt-codex-connector · 2026-03-15T21:51:43Z

+#[target_feature(enable = "avx512f")]
+pub fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
+    let n = x.len().min(y.len());
+    let mut acc0 = _mm512_setzero_ps();


Preserve MSRV by gating new AVX-512 code

The new AVX-512 module introduces #[target_feature(enable = "avx512f")] functions and direct AVX-512 intrinsics in the default build path, but this crate declares MSRV 1.64 and runs MSRV CI (Cargo.toml and .github/workflows/ci.yaml). On those toolchains, AVX-512 target-feature support/intrinsics are not available in stable form, so x86_64 builds fail before tests run; this effectively raises the minimum compiler version and breaks the project’s published compatibility contract.
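
One way to act on this suggestion is to keep the AVX-512 path out of the default build and select it at runtime only when it was compiled in. The following is a minimal sketch, not the crate's actual dispatch code: the cargo feature name `avx512`, the `avx512` module, and `dot_f32_scalar` are hypothetical, and the kernel is declared `unsafe fn` here for simplicity rather than as the safe `#[target_feature]` function used in the PR.

```rust
/// Portable fallback that builds on any toolchain, including the MSRV.
fn dot_f32_scalar(x: &[f32], y: &[f32]) -> f32 {
    // zip stops at the shorter slice, mirroring x.len().min(y.len()).
    x.iter().zip(y.iter()).map(|(a, b)| a * b).sum()
}

// Compiled only on x86_64 with the hypothetical `avx512` cargo feature
// enabled, so default builds never see AVX-512 intrinsics.
#[cfg(all(target_arch = "x86_64", feature = "avx512"))]
mod avx512 {
    use core::arch::x86_64::*;

    #[target_feature(enable = "avx512f")]
    pub unsafe fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
        let n = x.len().min(y.len());
        let mut acc = _mm512_setzero_ps();
        let mut i = 0;
        // Process 16 f32 lanes per iteration with fused multiply-add.
        while i + 16 <= n {
            let a = _mm512_loadu_ps(x.as_ptr().add(i));
            let b = _mm512_loadu_ps(y.as_ptr().add(i));
            acc = _mm512_fmadd_ps(a, b, acc);
            i += 16;
        }
        let mut sum = _mm512_reduce_add_ps(acc);
        // Scalar tail for the remaining elements.
        while i < n {
            sum += x[i] * y[i];
            i += 1;
        }
        sum
    }
}

/// Picks the AVX-512 kernel when it was compiled in and the CPU supports it.
pub fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
    #[cfg(all(target_arch = "x86_64", feature = "avx512"))]
    {
        if is_x86_feature_detected!("avx512f") {
            // SAFETY: the runtime check above confirmed AVX-512F support.
            return unsafe { avx512::dot_f32(x, y) };
        }
    }
    dot_f32_scalar(x, y)
}

fn main() {
    let x = [1.0f32, 2.0, 3.0];
    let y = [4.0f32, 5.0, 6.0];
    println!("{}", dot_f32(&x, &y));
}
```

With the feature disabled, only the scalar path is compiled, so an older toolchain never has to parse the AVX-512 module. Whether this fits the crate depends on how its dispatch! macro is structured, which the PR text does not show.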

Tactics 1-12 from the 34-tactic integration plan, adapted to ndarray:
- styles::rte - #1 Recursive Thought Expansion (Hofstadter)
- styles::htd - #2 Hierarchical Thought Decomposition (CLAM)
- styles::smad - #3 Structured Multi-Agent Debate (NARS revision)
- styles::tcp - #5 Thought Chain Pruning (Berry-Esseen)
- styles::irs - #9 Iterative Roleplay Synthesis (XOR binding)
- styles::mcp - #10 Meta-Cognition (Brier score calibration)
- styles::tca - #12 Temporal Context (Reichenbach tense)

Plus additions to existing modules:
- causal_diff.rs - #4 reverse_trace() (Pearl Rung 3)
- bgz17_bridge.rs - #6 inject_noise() (simulated annealing)
- nars.rs - #7 adversarial_critique(), #11 detect_contradiction()
- cascade.rs - #8 adaptive_resolution()

Every tactic is fn(Base17, NarsTruth) → result. No LLM prompting. 16 tests passing.
API: crate::hpc::styles::rte::expand() etc.

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK

chatgpt-codex-connector Bot reviewed Mar 15, 2026

AdaWorldAPI merged commit c7f4c40 into master Mar 15, 2026

AdaWorldAPI merged 1 commit into master from claude/setup-adaworld-ndarray-5IxqY
