feat: AVX-512 kernels, cache-derived GEMM tiles, CogRecord 64KB #4

Merged: AdaWorldAPI merged 1 commit into master from claude/setup-adaworld-ndarray-5IxqY on Mar 15, 2026

AdaWorldAPI (Owner)

  • Add 32 AVX-512 kernel functions in kernels_avx512.rs using raw core::arch::x86_64::* intrinsics (no wrapper types). Each kernel is a safe fn marked #[target_feature]; unsafe blocks appear only around loads and stores (Rust 1.94).
  • BLAS L1: dot/axpy/scal/asum/nrm2/iamax for f32 and f64
  • Element-wise: add/sub/mul/div for scalar and vec, f32 and f64
  • Binary/HDC: hamming_distance (VPOPCNTDQ), popcount, dot_i8, hamming_batch
  • Wire AVX-512 kernels into dispatch! macro for all BLAS-1 ops
  • Add sgemm_mr()/dgemm_mr() to select the microkernel height (MR) for each SIMD tier
  • GEMM: derive tile sizes from the 64KB L1 cache constraint:
      ◦ AVX-512 (MR=6, NR=16): KC=740, fills 99.4% of L1
      ◦ AVX2 (MR=6, NR=8): KC=1163
      ◦ Scalar (MR=4, NR=4): KC=2036
  • GEMM inner loop uses dispatched axpy for SIMD acceleration
  • Fix CogRecord: CONTAINER_BYTES=16384 (was 4096), COGRECORD_BYTES=65536
  • 209 tests pass, clippy clean
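The PR does not spell out how the KC values were derived. One formula that reproduces all three listed values is: take 64KB of L1, subtract a fixed 384-byte reserve (the size of one 6×16 f32 accumulator tile), and divide by the bytes per k-step of a packed MR×KC panel of A plus a KC×NR panel of B, i.e. (MR + NR) × 4 bytes for f32. The sketch below is a reconstruction under that assumption; the reserve and the helper names `derive_kc` and `panel_fill` are hypothetical, not taken from the crate.

```rust
/// L1 data-cache budget stated in the PR description (64KB).
const L1_BYTES: usize = 64 * 1024;
/// Assumed fixed reserve: 384 B is one 6x16 f32 C tile. Inferred so that
/// all three KC values in the PR are reproduced; not confirmed by the PR.
const RESERVE_BYTES: usize = 384;

/// Hypothetical helper: largest KC such that an MR x KC panel of A and a
/// KC x NR panel of B (both f32) fit in L1 after the reserve.
fn derive_kc(mr: usize, nr: usize) -> usize {
    let elem = core::mem::size_of::<f32>();
    (L1_BYTES - RESERVE_BYTES) / ((mr + nr) * elem)
}

/// Fraction of L1 occupied by the packed A and B panels alone.
fn panel_fill(mr: usize, nr: usize, kc: usize) -> f64 {
    ((mr + nr) * kc * core::mem::size_of::<f32>()) as f64 / L1_BYTES as f64
}

fn main() {
    for (tier, mr, nr) in [("AVX-512", 6, 16), ("AVX2", 6, 8), ("Scalar", 4, 4)] {
        let kc = derive_kc(mr, nr);
        println!(
            "{tier:<8} MR={mr} NR={nr} KC={kc} fill={:.1}%",
            100.0 * panel_fill(mr, nr, kc)
        );
    }
}
```

This yields KC=740, 1163 and 2036. For AVX-512 the A and B panels then take 65,120 of 65,536 bytes, which matches the 99.4% quoted above.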

https://claude.ai/code/session_01QnZH2adQ6oXzTYPkLTH6G7
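To illustrate what "GEMM inner loop uses dispatched axpy" can look like, here is a minimal scalar sketch in which each row of C is accumulated as a series of axpy updates, C[i, :] += A[i, p] * B[p, :]. The names `axpy` and `gemm_axpy` and the unpacked row-major layout are assumptions. In the PR the axpy call would route through the dispatch! macro to an AVX-512 or AVX2 kernel, and the real GEMM also packs and tiles by MR/NR/KC, which this sketch omits.

```rust
/// Scalar stand-in for a dispatched axpy: y += alpha * x.
fn axpy(alpha: f32, x: &[f32], y: &mut [f32]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}

/// Hypothetical row-major GEMM: C (m x n) += A (m x k) * B (k x n).
/// The innermost loop over n is one axpy on contiguous rows of B and C.
fn gemm_axpy(m: usize, n: usize, k: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    assert!(a.len() >= m * k && b.len() >= k * n && c.len() >= m * n);
    for i in 0..m {
        let c_row = &mut c[i * n..(i + 1) * n];
        for p in 0..k {
            // C[i, :] += A[i, p] * B[p, :]
            axpy(a[i * k + p], &b[p * n..(p + 1) * n], c_row);
        }
    }
}

fn main() {
    // A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], C starts at zero.
    let a = [1.0f32, 2.0, 3.0, 4.0];
    let b = [5.0f32, 6.0, 7.0, 8.0];
    let mut c = [0.0f32; 4];
    gemm_axpy(2, 2, 2, &a, &b, &mut c);
    println!("{c:?}");
}
```

Because every update touches a contiguous row, the axpy call is the natural place to plug in the SIMD kernel.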

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d1baa6acbd

Comment on lines +23 to +26
#[target_feature(enable = "avx512f")]
pub fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
    let n = x.len().min(y.len());
    let mut acc0 = _mm512_setzero_ps();

P1: Preserve MSRV by gating new AVX-512 code

The new AVX-512 module introduces #[target_feature(enable = "avx512f")] functions and direct AVX-512 intrinsics in the default build path, but this crate declares MSRV 1.64 and runs MSRV CI (Cargo.toml and .github/workflows/ci.yaml). On those toolchains, AVX-512 target features and intrinsics are not available on stable (they were stabilized in Rust 1.89), so x86_64 builds fail before tests run. This effectively raises the minimum compiler version and breaks the project’s published compatibility contract.
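A minimal sketch of the gating this review asks for, under assumptions: the AVX-512 kernel is compiled only when an opt-in Cargo feature (hypothetically named `avx512`) is enabled, so MSRV builds never type-check it, and it is selected at runtime with `is_x86_feature_detected!`, falling back to scalar code otherwise. The module layout and feature name are illustrative, not the crate's actual API. The kernel follows the PR's stated style (safe `#[target_feature]` fn, `unsafe` only around loads and stores) and needs a toolchain with stable AVX-512 intrinsics when the feature is on.

```rust
// Compiled only with the hypothetical `avx512` Cargo feature, so older
// toolchains never type-check the AVX-512 intrinsics below.
#[cfg(all(target_arch = "x86_64", feature = "avx512"))]
mod avx512 {
    use core::arch::x86_64::*;

    #[target_feature(enable = "avx512f")]
    pub fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
        let n = x.len().min(y.len());
        let mut acc = _mm512_setzero_ps();
        let mut i = 0;
        while i + 16 <= n {
            // SAFETY: i + 16 <= n, so both 16-lane unaligned loads stay in bounds.
            let (a, b) = unsafe {
                (_mm512_loadu_ps(x.as_ptr().add(i)), _mm512_loadu_ps(y.as_ptr().add(i)))
            };
            acc = _mm512_fmadd_ps(a, b, acc);
            i += 16;
        }
        // Spill the 16 accumulator lanes, sum them, then add the scalar tail.
        let mut lanes = [0.0f32; 16];
        // SAFETY: `lanes` holds exactly 16 f32 values.
        unsafe { _mm512_storeu_ps(lanes.as_mut_ptr(), acc) };
        let mut sum: f32 = lanes.iter().sum();
        for j in i..n {
            sum += x[j] * y[j];
        }
        sum
    }
}

/// Portable fallback used on MSRV builds and CPUs without AVX-512F.
fn dot_f32_scalar(x: &[f32], y: &[f32]) -> f32 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Runtime dispatch: AVX-512 when compiled in and supported, else scalar.
pub fn dot_f32(x: &[f32], y: &[f32]) -> f32 {
    #[cfg(all(target_arch = "x86_64", feature = "avx512"))]
    {
        if is_x86_feature_detected!("avx512f") {
            // SAFETY: avx512f support was verified just above.
            return unsafe { avx512::dot_f32(x, y) };
        }
    }
    dot_f32_scalar(x, y)
}

fn main() {
    let x: Vec<f32> = (0..37).map(|i| i as f32).collect();
    let y = vec![1.0f32; 37];
    println!("{}", dot_f32(&x, &y));
}
```

An alternative to a Cargo feature is detecting the compiler version in a build script and emitting a custom cfg; either way the default build stays compilable on the declared MSRV.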

AdaWorldAPI merged commit c7f4c40 into master on Mar 15, 2026
AdaWorldAPI pushed a commit that referenced this pull request on Mar 30, 2026
Tactics 1-12 from the 34-tactic integration plan, adapted to ndarray:
  styles::rte  — #1  Recursive Thought Expansion (Hofstadter)
  styles::htd  — #2  Hierarchical Thought Decomposition (CLAM)
  styles::smad — #3  Structured Multi-Agent Debate (NARS revision)
  styles::tcp  — #5  Thought Chain Pruning (Berry-Esseen)
  styles::irs  — #9  Iterative Roleplay Synthesis (XOR binding)
  styles::mcp  — #10 Meta-Cognition (Brier score calibration)
  styles::tca  — #12 Temporal Context (Reichenbach tense)
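Tactic #9 relies on XOR binding, and the hamming_distance/popcount kernels from this PR measure similarity between such binary vectors. A portable sketch over u64 words: binding is word-wise XOR, XOR binding is its own inverse, and Hamming distance is the popcount of the XOR. The function names echo the PR's kernels, but these signatures and the word-slice representation are assumptions; the crate's Base17 type is not modeled here.

```rust
/// Bind two binary hypervectors word by word with XOR.
fn bind(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(x, y)| x ^ y).collect()
}

/// Hamming distance: number of differing bits (popcount of the XOR).
fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Distances from one query to many candidates, like a batched kernel.
fn hamming_batch(query: &[u64], candidates: &[Vec<u64>]) -> Vec<u32> {
    candidates.iter().map(|c| hamming_distance(query, c)).collect()
}

fn main() {
    let role = [0xDEAD_BEEF_u64, 0x1234_5678, u64::MAX, 0];
    let filler = [0x0F0F_0F0F_u64, 0, 0xFFFF, 1];
    let bound = bind(&role, &filler);
    // XOR binding is self-inverse: binding with the role again recovers the filler.
    assert_eq!(bind(&bound, &role), filler);
    let dists = hamming_batch(&filler, &[filler.to_vec(), role.to_vec()]);
    println!("{dists:?}");
}
```

On AVX-512 hardware the `count_ones` step is what VPOPCNTDQ vectorizes, 64 bits per lane.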

Plus additions to existing modules:
  causal_diff.rs — #4  reverse_trace() (Pearl Rung 3)
  bgz17_bridge.rs — #6  inject_noise() (simulated annealing)
  nars.rs — #7  adversarial_critique(), #11 detect_contradiction()
  cascade.rs — #8  adaptive_resolution()

Every tactic is fn(Base17, NarsTruth) → result. No LLM prompting.
16 tests passing. API: crate::hpc::styles::rte::expand() etc.
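For context on the NarsTruth half of that signature and on #3's "NARS revision": a NARS truth value is a (frequency, confidence) pair, and the standard NAL revision rule pools two independent judgments by converting confidence to evidence weight w = k·c / (1 - c), adding the weights, and mapping back with c = w / (w + k). The `NarsTruth` struct below is a hypothetical stand-in with k = 1; the crate's actual type, field names, and horizon may differ.

```rust
/// Hypothetical NARS truth value: frequency f in [0, 1], confidence c in (0, 1).
#[derive(Debug, Clone, Copy)]
struct NarsTruth {
    f: f64,
    c: f64,
}

/// Evidential horizon k; 1.0 is the usual NAL default.
const K: f64 = 1.0;

impl NarsTruth {
    /// Evidence weight implied by the confidence: w = k * c / (1 - c).
    fn weight(self) -> f64 {
        K * self.c / (1.0 - self.c)
    }

    /// NAL revision: pool evidence from two independent sources.
    /// Assumes at least one input has c > 0, so the total weight is nonzero.
    fn revise(self, other: NarsTruth) -> NarsTruth {
        let (w1, w2) = (self.weight(), other.weight());
        let w = w1 + w2;
        NarsTruth {
            f: (w1 * self.f + w2 * other.f) / w,
            c: w / (w + K),
        }
    }
}

fn main() {
    let yes = NarsTruth { f: 1.0, c: 0.5 };
    let no = NarsTruth { f: 0.0, c: 0.5 };
    println!("{:?}", yes.revise(no));
}
```

Two equally confident but opposite judgments revise to f = 0.5 with a higher confidence of 2/3: the disagreement shows up in the frequency, while the accumulated evidence raises the confidence.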

https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK