feat(examples): NdarrayGraphPlugin — Bevy SIMD-accelerated graph rendering via ndarray polyfill #1

Merged: AdaWorldAPI merged 4 commits into main from claude/ndarray-simd-review-S0zXK on May 13, 2026

Conversation

@AdaWorldAPI
Owner

Summary

Real Bevy plugin demonstrating SIMD-accelerated graph (nodes+edges) rendering using AdaWorldAPI/ndarray's crate::simd::F32x16 polyfill and the Pumpkin/Mindcraft-derived palette framebuffer. Produced by a 12-agent CCA2A fleet (full breakdown in ndarray's AGENT_LOG.md).

What ships

  • examples/ndarray_graph_plugin.rs (~270 lines) — NdarrayGraphPlugin with GraphRenderer Resource, startup seeder (64 nodes in a circle layout + 80 edges), tick_renderer + render_to_framebuffer Update systems. Uses crate::simd::F32x16::mul_add via Renderer::tick → integrate_simd, and compose_neo4j (Pumpkin rasterizer) into a long-lived 512×512 Framebuffer → palette-expanded RGBA8 → Bevy Image displayed as a Sprite.
  • examples/ndarray_graph_palette.rs — shared PALETTE_LUT (16 × RGBA8) + blit_u8_palette_to_rgba helper.
  • examples/ndarray_graph_plugin_tests.rs — 5 headless integration tests (all pass on Sapphire Rapids).
  • examples/README_NDARRAY_PLUGIN.md — usage doc + architecture diagram.
  • .github/workflows/ndarray-smoke.yml — CI workflow targeting x86-64-v3 (CI runners don't have AVX-512).
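
As a rough illustration of the palette expansion step listed above, the sketch below turns one palette index per pixel into four RGBA8 bytes. The helper name matches the one in this PR, but its signature here is an assumption, and `demo_palette` is a hypothetical gray ramp rather than the PR's actual PALETTE_LUT colors.

```rust
/// Hypothetical 16-entry palette (index -> RGBA8): a gray ramp with a
/// transparent background at index 0. Not the PR's actual colors.
fn demo_palette() -> [[u8; 4]; 16] {
    std::array::from_fn(|i| {
        if i == 0 {
            [0, 0, 0, 0]
        } else {
            let v = (i * 17) as u8; // 17, 34, ..., 255
            [v, v, v, 255]
        }
    })
}

/// Expand one palette index per pixel into four RGBA8 bytes per pixel.
/// The signature is assumed for illustration.
fn blit_u8_palette_to_rgba(indices: &[u8], lut: &[[u8; 4]; 16], rgba: &mut [u8]) {
    assert_eq!(rgba.len(), indices.len() * 4, "rgba must hold 4 bytes per pixel");
    for (px, &idx) in rgba.chunks_exact_mut(4).zip(indices) {
        // Mask to 4 bits so an out-of-range index cannot panic.
        px.copy_from_slice(&lut[usize::from(idx & 0x0F)]);
    }
}
```

For a 512×512 framebuffer this would read 262,144 index bytes and write 1,048,576 RGBA bytes per frame, which is the buffer a Bevy Image of that size expects.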

Test results

$ cargo run --release --example ndarray_graph_plugin_tests
[test 1] PASS: GraphRenderer resource present, tick_count=0
[test 2] PASS: front.len=2 edges.len=1
[test 3] PASS: position[0] 10.0 → 10.016666 (= 1.0 * DT_60 + 10.0)
             confirms F32x16::mul_add polyfill ran inside Bevy
[test 4] PASS: compose_neo4j emitted 106 non-zero pixels
[test 5] simd_caps: avx512f=true avx2=true fma=true; lanes=8
[test 5] PASS: x86_64 has avx512f or avx2
=== ALL TESTS PASSED ===
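
The arithmetic behind test 3 can be sketched without ndarray. The function below is an assumption about the shape of the integration step, not the actual integrate_simd: it walks the buffers in 16-element chunks (the width of F32x16) and uses scalar `f32::mul_add` as a stand-in for the vector call. DT_60 is assumed to be 1/60 s.

```rust
/// Lane width of the F32x16 type named in this PR.
const LANES: usize = 16;

/// Advance `pos[i] = vel[i] * dt + pos[i]` for every element.
/// Scalar `f32::mul_add` stands in for the 16-lane vector call.
fn integrate(pos: &mut [f32], vel: &[f32], dt: f32) {
    assert_eq!(pos.len(), vel.len());
    let mut p_chunks = pos.chunks_exact_mut(LANES);
    let mut v_chunks = vel.chunks_exact(LANES);
    // Full 16-wide chunks: one fused multiply-add per lane.
    for (p, v) in p_chunks.by_ref().zip(v_chunks.by_ref()) {
        for i in 0..LANES {
            p[i] = v[i].mul_add(dt, p[i]);
        }
    }
    // Scalar tail for lengths that are not a multiple of 16.
    for (p, &v) in p_chunks.into_remainder().iter_mut().zip(v_chunks.remainder()) {
        *p = v.mul_add(dt, *p);
    }
}
```

With a position of 10.0, a velocity of 1.0, and dt = 1/60, one call moves the position to roughly 10.0167, which matches the value reported by test 3.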

Architecture

Bevy App
   ↓
NdarrayGraphPlugin (Plugin trait)
   ↓ Startup
seed_graph (64 nodes in circle + 80 edges)
   ↓ Update (every frame)
tick_renderer → ndarray::hpc::renderer::Renderer::tick
              → integrate_simd → F32x16::mul_add (polyfill)
   ↓
render_to_framebuffer → ndarray::hpc::framebuffer::compose_neo4j
                      → Framebuffer (512×512 u8 palette)
                      → blit_u8_palette_to_rgba → Bevy Image
   ↓
Sprite → wgpu (2D upload + present)
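
The seed_graph step in the diagram above could look roughly like the sketch below. The node and edge counts (64 and 80) come from this PR; the center, radius, and the exact cross-edge pattern (16 chords with an offset of 21) are assumptions chosen only so that the totals match.

```rust
use std::f32::consts::TAU;

const NODE_COUNT: usize = 64;

/// Place `n` nodes evenly on a circle around (cx, cy).
fn circle_layout(n: usize, cx: f32, cy: f32, radius: f32) -> Vec<[f32; 2]> {
    (0..n)
        .map(|i| {
            let angle = TAU * i as f32 / n as f32;
            [cx + radius * angle.cos(), cy + radius * angle.sin()]
        })
        .collect()
}

/// 64 ring edges plus 16 cross edges = 80 edges.
/// The cross-edge pattern is hypothetical.
fn ring_and_cross_edges() -> Vec<(u32, u32)> {
    let n = NODE_COUNT as u32;
    // Ring: each node connects to its clockwise neighbor.
    let mut edges: Vec<(u32, u32)> = (0..n).map(|i| (i, (i + 1) % n)).collect();
    // Cross: every fourth node gets a chord to the node 21 steps ahead.
    edges.extend((0..16u32).map(|k| (4 * k, (4 * k + 21) % n)));
    edges
}
```

For a 512×512 framebuffer, a call such as `circle_layout(NODE_COUNT, 256.0, 256.0, 200.0)` keeps every node inside the image.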

GPU-less hosts (Railway / HuggingFace / Cloudflare / serverless)

This plugin is a pure CPU path by design. The Pumpkin/Mindcraft framebuffer was built for the no-GPU case: the CPU writes palette indices, Framebuffer::pack() produces the 4 bpp wire format, and the client paints. The fleet's audit confirmed that Bevy's bevy_pbr / atmosphere / skinning paths are GPU-offloaded on hosts with GPUs (so SIMD wins there are unreachable), but this plugin works identically on GPU-less hosts because nothing in its hot path touches wgpu's compute layer.
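
To make the 4 bpp wire format concrete, here is a minimal sketch of packing two 4-bit palette indices into each byte and unpacking them again. These free functions are hypothetical and are not the Framebuffer::pack() implementation; in particular, the nibble order (first pixel in the low nibble) is an assumption.

```rust
/// Pack two 4-bit palette indices per byte.
/// Assumption: the first pixel of each pair goes in the low nibble.
fn pack_4bpp(indices: &[u8]) -> Vec<u8> {
    indices
        .chunks(2)
        .map(|pair| {
            let lo = pair[0] & 0x0F;
            // An odd pixel count leaves the final high nibble as zero.
            let hi = pair.get(1).copied().unwrap_or(0) & 0x0F;
            lo | (hi << 4)
        })
        .collect()
}

/// Inverse of `pack_4bpp`; `pixel_count` trims the padding nibble.
fn unpack_4bpp(packed: &[u8], pixel_count: usize) -> Vec<u8> {
    packed
        .iter()
        .flat_map(|&b| [b & 0x0F, b >> 4])
        .take(pixel_count)
        .collect()
}
```

Under this layout a 512×512 frame of palette indices packs into 131,072 bytes, a quarter of a megabyte less than the unpacked 262,144-byte index buffer and one eighth of the RGBA8 size.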

CI

.github/workflows/ndarray-smoke.yml targets stock ubuntu-latest runners (x86-64-v3, AVX2 baseline). It does NOT use cargo build-avx512, which would SIGILL on CI runners that lack AVX-512. Local AVX-512 builds use the alias from .cargo/config_ndarray_simd.toml (already on this branch in commit 67182a9).

Companion PR

ndarray-side SimdCaps extensions (AMX/VNNI/BF16 fields) ship in AdaWorldAPI/ndarray claude/simd-caps-amx-round2. Not a hard dependency — this plugin works against ndarray master as-is.

Audit deferrals (fleet output)

The 6 audit agents inventoried Bevy upstream SIMD opportunities. Top findings:

  • Frustum cull: still running at fleet wrap; expected to be the highest-ROI rewrite target (per-entity CPU work even with GPU)
  • Skinning: NOT-WORTH (GPU-side WGSL; CPU SIMD saves 14µs vs 0.5-2ms GPU floor)
  • Mesh vertex loops: setup-once (asset-import only, not per-frame)
  • Atmosphere/SSAO color: 0/10 candidates (GPU-only formats)
  • ndarray hpc/ cosmetic-SIMD sweep: blocked on completing the U8x32 polyfill in simd_avx2.rs (currently absent; keystone work)
  • AMX detection: 7/8 sites foldable into SimdCaps; 1 (Linux prctl) is per-thread and will SIGILL on rayon workers if AMX paths get rayon-parallelized later

Generated by Claude Code

claude added 3 commits May 13, 2026 11:21
End-to-end smoke test verifying the AdaWorldAPI/ndarray SIMD polyfill is
reachable from a Bevy downstream crate.

Asserts:
  1. simd_caps() LazyLock reports the live CPU tier
  2. F32x16::mul_add is bit-exact against scalar f32::mul_add
  3. integrate_simd advances positions by exactly v * dt
  4. integrate_simd_par (rayon × SIMD) is bit-identical to sequential
  5. compose_neo4j emits both node and edge palette pixels

What it proves: target-cpu propagation, runtime↔compile-time tier
agreement, the Pumpkin-derived rasterizer is library-callable, and
rayon par_chunks_mut composes cleanly with F32x16::mul_add.
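
Why the parallel path can be bit-identical to the sequential one: each position is updated by a single independent fused multiply-add, with no cross-element reduction, so the order in which chunks run cannot change any result. The sketch below illustrates this with std scoped threads swapped in for rayon's par_chunks_mut and scalar `f32::mul_add` swapped in for F32x16; both function names are hypothetical.

```rust
use std::thread;

/// Sequential reference: pos[i] = vel[i] * dt + pos[i].
fn integrate_seq(pos: &mut [f32], vel: &[f32], dt: f32) {
    for (p, &v) in pos.iter_mut().zip(vel) {
        *p = v.mul_add(dt, *p);
    }
}

/// Chunk-parallel version. Each chunk is a disjoint mutable slice, so the
/// threads never touch the same element. One thread per chunk keeps the
/// sketch short; a real pool (such as rayon) would reuse workers.
fn integrate_par(pos: &mut [f32], vel: &[f32], dt: f32, chunk: usize) {
    assert_eq!(pos.len(), vel.len());
    assert!(chunk > 0);
    thread::scope(|s| {
        for (p, v) in pos.chunks_mut(chunk).zip(vel.chunks(chunk)) {
            s.spawn(move || integrate_seq(p, v, dt));
        }
    });
}
```

Comparing the two outputs with `f32::to_bits` rather than `==` is the stricter check, since it also distinguishes signed zeros and NaN payloads.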

Headless: links the full bevy crate and runs MinimalPlugins for one
Update tick before exiting via MessageWriter<AppExit>. Verifies the
link, no window.

GitHub Actions runners support x86-64-v3 (AVX2) but NOT x86-64-v4
(AVX-512). Unconditionally setting target-cpu=x86-64-v4 would break CI;
unconditionally leaving the default would mean the ndarray polyfill
never picks its AVX-512 type path even on capable hardware (the
ndarray_simd_smoke example proved this is observable: avx512f=true at
runtime but PREFERRED_F32_LANES=8 at compile time).
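
The mismatch described above comes from two different questions: what the compiler was allowed to emit (fixed by target-cpu at build time) and what the CPU can execute (detected at run time). A minimal sketch using only std follows; the function names are hypothetical, and the fallback lane count of 4 is an assumption rather than ndarray's actual PREFERRED_F32_LANES logic.

```rust
/// Lane count chosen from compile-time target features only.
/// Hypothetical analog of a constant like PREFERRED_F32_LANES.
fn compile_time_f32_lanes() -> usize {
    if cfg!(target_feature = "avx512f") {
        16
    } else if cfg!(target_feature = "avx2") {
        8
    } else {
        4 // assumed fallback width
    }
}

/// Runtime CPUID check on x86 targets.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn runtime_has_avx512f() -> bool {
    std::arch::is_x86_feature_detected!("avx512f")
}

/// Other architectures never report AVX-512.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn runtime_has_avx512f() -> bool {
    false
}
```

Built with the x86-64-v3 baseline on an AVX-512 machine, the first function would return 8 while the second returns true, which is the same shape as the avx512f=true, lanes=8 output reported by the smoke example.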

This template provides both profiles, opt-in:

  cargo build            → x86-64-v3 (AVX2 baseline, CI-safe)
  cargo build-avx512     → x86-64-v4 (AVX-512, 16-lane F32x16)
  cargo run-avx512       → ditto
  cargo test-avx512      → ditto
  cargo check-avx512     → ditto

Follows the existing Bevy convention of providing .cargo/config_*.toml
template files that users copy into the gitignored .cargo/config.toml.

Companion to AdaWorldAPI/ndarray PR bevyengine#142 (VBMI gate + Inf clamp + NaN
preservation in simd_exp_f32).
… SIMD

Produced by the 12-agent CCA2A round-2 fleet (see ndarray's
.claude/board/AGENT_LOG.md for full agent breakdown). Delivers the
"Bevy works on SIMD" goal: a real Bevy plugin that uses ndarray's
crate::simd polyfill end-to-end for graph rendering, plus a CI workflow,
headless integration tests, a shared palette LUT, and usage docs.

Files added:
- examples/ndarray_graph_plugin.rs (~270 lines) — NdarrayGraphPlugin
  with GraphRenderer Resource, startup seeder (64 nodes in circle layout,
  80 ring + cross edges), tick_renderer + render_to_framebuffer Update
  systems. Uses crate::simd::F32x16::mul_add via Renderer::tick →
  integrate_simd, and compose_neo4j (Pumpkin-derived rasterizer) into a
  long-lived 512x512 Framebuffer that gets palette-expanded to RGBA8
  and blitted into a Bevy Image displayed as a Sprite.
- examples/ndarray_graph_palette.rs — shared PALETTE_LUT [16 x RGBA8]
  + blit_u8_palette_to_rgba helper, both imported by the plugin via
  #[path = "ndarray_graph_palette.rs"] mod palette.
- examples/ndarray_graph_plugin_tests.rs — 5 headless integration tests
  (resource init, startup seed, F32x16::mul_add position advance,
  compose_neo4j pixel emission, simd_caps runtime detect). Runs as
  cargo run --example ndarray_graph_plugin_tests; all pass.
- examples/README_NDARRAY_PLUGIN.md — usage doc (build, run, what it
  shows, architecture ASCII diagram, compile-time vs runtime tier
  explanation, companion files).
- .github/workflows/ndarray-smoke.yml — GitHub Actions x86-64-v3
  baseline build (CI runners don't have AVX-512); installs Bevy system
  deps (libwayland-dev / libasound2-dev / libudev-dev); clones sibling
  ndarray via the same branch name with master fallback; cargo check
  on ndarray_simd_smoke + ndarray_graph_plugin.

Cargo.toml: two [[example]] entries (ndarray_graph_plugin,
ndarray_graph_plugin_tests).

Verified (Sapphire Rapids, x86-64-v3 build):
  cargo check --example ndarray_graph_plugin: clean
  cargo check --example ndarray_graph_plugin_tests: clean
  cargo check --example ndarray_simd_smoke: clean (regression-safe)
  cargo run --release --example ndarray_graph_plugin_tests:
    [test 1] PASS: GraphRenderer resource present, tick_count=0
    [test 2] PASS: front.len=2 edges.len=1
    [test 3] PASS: position[0] 10.0 -> 10.016666 (= 1.0 * DT_60 + 10.0,
                   confirms F32x16::mul_add polyfill ran inside Bevy)
    [test 4] PASS: compose_neo4j emitted 106 non-zero pixels
    [test 5] simd_caps: avx512f=true avx2=true fma=true; lanes=8
    [test 5] PASS: x86_64 has avx512f or avx2

Notable: the [test 5] line surfaces the compile-time vs runtime mismatch
(lanes=8 because CI-baseline cargo build, but CPU has avx512f=true).
cargo run-avx512 from .cargo/config_ndarray_simd.toml (already on this
branch) lifts that to lanes=16.

Architecture note for GPU-less hosts (Railway / HuggingFace Spaces /
Cloudflare / serverless): this plugin is a CPU-only path. The
Pumpkin-derived framebuffer was designed for the no-GPU case — palette
indices on CPU, 4 bpp wire format via Framebuffer::pack(). The audit
sub-fleet confirmed bevy_pbr / atmosphere / skinning paths are
GPU-offloaded on hosts with GPUs, but this plugin remains entirely
SIMD-CPU and works identically without a GPU.
@github-actions

Welcome, new contributor!

Please make sure you've read our contributing guide, as well as our policy regarding AI usage, and we look forward to reviewing your pull request shortly ✨

Comment thread .github/workflows/ndarray-smoke.yml Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca4a973aea

Comment thread Cargo.toml Outdated
# checkout for the ndarray_simd_smoke example (proves crate::simd::F32x16
# routes correctly through to AVX-512/AMX/AVX2/NEON from a downstream crate
# and that integrate_simd_par composes rayon × SIMD bit-identically).
ndarray = { path = "../ndarray", features = ["rayon"] }

P1: Avoid requiring a sibling ndarray checkout

When the repository is checked out normally without a manually cloned ../ndarray, this path dev-dependency prevents Cargo from loading the workspace at all, not just these new examples. I verified this in the repo with cargo metadata --no-deps --format-version 1, which fails while reading /workspace/ndarray/Cargo.toml; that means ordinary commands for unrelated packages/examples are blocked unless every developer and CI job creates the sibling checkout first. Please avoid a mandatory parent-directory path dependency in the root manifest, or gate/patch it so the default workspace remains loadable.

@github-actions

You added a new example but didn't add metadata for it. Please update the root Cargo.toml file.

… PR #1

Three classes of CI failure caught by the PR #1 review pass:

1. CODEX P1 — path = "../ndarray" broke cargo metadata workspace-wide on
   any host without the sibling checkout. Every cargo command on every
   package failed. Switched to a git dev-dep on
   https://github.com/AdaWorldAPI/ndarray.git branch master.

2. UPSTREAM CI MATRIX FAILURE — adding the ndarray dev-dep made upstream
   bevy CI try to fetch + build ndarray on macOS / Windows runners, where
   ndarray's AMX inline asm + Linux prctl path do not yet compile. Two
   gates layered:
   a. Gated the ndarray dev-dep behind a target cfg,
      `cfg(all(target_os = "linux", target_arch = "x86_64"))`, so
      unsupported platforms never try to resolve the dep.
   b. Added `ndarray-examples = []` feature + `required-features =
      ["ndarray-examples"]` on all three [[example]] entries so
      `cargo build --examples` without the feature does not pick them
      up at all. Upstream CI does not enable this feature; our
      .github/workflows/ndarray-smoke.yml does.

3. ZIZMOR security findings on the workflow:
   - "Workflow does not contain permissions" → added explicit
     `permissions: contents: read` at workflow level.
   - "code injection via template expansion" → the
     `${{ github.head_ref || github.ref_name }}` in the run: block was
     a code-injection surface (a maliciously-named branch could inject
     shell). Removed entirely: with the git dev-dep change above, the
     workflow no longer needs to clone ../ndarray, so the template
     expansion site is gone.
   - "unpinned action reference" → pinned actions/checkout@v4 to commit
     SHA 692973e3d937129bcbf40652eb9f2f61becf3332 (v4.1.7) and
     dtolnay/rust-toolchain@1.95.0 to its commit SHA
     f04cf2e09f5b6448b46c0aa9893a76ee36ed64c2.

Verified:
  cargo check --example ndarray_graph_plugin --features ndarray-examples
  → clean (git dep resolves, plugin compiles, ndarray builds via git)
  cargo check --examples (no feature)
  → does NOT touch the ndarray_* examples (required-features works)
@AdaWorldAPI AdaWorldAPI merged commit e55cf4d into main May 13, 2026
27 of 40 checks passed
# The action treats "1.95.0" as a toolchain version, but the action ref
# itself must be a commit SHA. Commit f04cf2e09f5b6448b46c0aa9893a76ee36ed64c2
# corresponds to the stable tag.
- uses: dtolnay/rust-toolchain@f04cf2e09f5b6448b46c0aa9893a76ee36ed64c2
steps:
# Pinned to commit SHA per zizmor unpinned-action rule on PR #1.
# v4.1.7 corresponds to commit 692973e3d937129bcbf40652eb9f2f61becf3332.
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332
@github-actions

You added a new feature but didn't update the readme. Please run cargo run -p build-templated-pages -- update features to update it, and commit the file change.
