feat(net): PhyriadNet/1 pillar — Phases 0-6 complete#8

Merged
Swately merged 7 commits into main from claude/nostalgic-knuth-d05369
May 19, 2026

@Swately Swately commented May 19, 2026

Summary

New framework pillar (phyriad_net) exposing the Phyriad pool runtime over
UDP via a custom 16-byte-header protocol (PhyriadNet/1). Plus a thin C++17
client (phyriad_netclient) for resource-constrained edges.
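
To make the "16-byte header" concrete, here is a minimal sketch of how a fixed-size header can be laid out and serialized byte by byte. The field names and their order are assumptions for illustration only; this description does not list the real `PN1Header` fields, and `DemoHeader`, `encode_header` and `decode_header` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 16-byte layout. The actual PN1Header fields are not given in
// this PR description, so every field here is an assumption.
struct DemoHeader {
    std::uint8_t  version;      // protocol version
    std::uint8_t  type;         // frame type
    std::uint16_t payload_len;  // bytes of payload following the header
    std::uint32_t session_id;
    std::uint32_t seq;
    std::uint32_t checksum;
};
static_assert(sizeof(DemoHeader) == 16, "header must stay 16 bytes");

// Write the low n bytes of v in little-endian order.
inline void put_le(std::uint8_t* p, std::uint32_t v, int n) {
    for (int i = 0; i < n; ++i) p[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Read n little-endian bytes back into an integer.
inline std::uint32_t get_le(const std::uint8_t* p, int n) {
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i) v |= static_cast<std::uint32_t>(p[i]) << (8 * i);
    return v;
}

inline std::array<std::uint8_t, 16> encode_header(const DemoHeader& h) {
    std::array<std::uint8_t, 16> out{};
    out[0] = h.version;
    out[1] = h.type;
    put_le(&out[2], h.payload_len, 2);
    put_le(&out[4], h.session_id, 4);
    put_le(&out[8], h.seq, 4);
    put_le(&out[12], h.checksum, 4);
    return out;
}

inline DemoHeader decode_header(const std::array<std::uint8_t, 16>& in) {
    DemoHeader h{};
    h.version     = in[0];
    h.type        = in[1];
    h.payload_len = static_cast<std::uint16_t>(get_le(&in[2], 2));
    h.session_id  = get_le(&in[4], 4);
    h.seq         = get_le(&in[8], 4);
    h.checksum    = get_le(&in[12], 4);
    return h;
}
```

Serializing field by field, rather than copying the struct with `memcpy`, keeps the wire format independent of host endianness and padding.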

Six implementation phases landed in one PR:

  • Phase 0 — Protocol foundation: PN1Frame, PN1Codec, runtime xxHash32 (520 checks)
  • Phase 1 — Reliability: RetransmitQueue + PN1Session state machine + sliding-window dedup (103 checks)
  • Phase 2 — I/O backend: POSIX sockets + WinSock2 unified, with multicast support (43 checks)
  • Phase 3 — NetGateway: session table, peer validation, anti-spoofing
  • Phase 4 — NetPheromone<T,N>: latest-wins stigmergy sync over UDP multicast (24 checks)
  • Phase 5 — Client library: framework-side C++23 NetClient (33 checks) + thin C++17 single-header in framework/netclient/ (25 checks)
  • Phase 6 — Hardening: codec/RTT/throughput benches + head-to-head comparisons vs raw UDP / raw TCP / gRPC

Total: 77/77 project tests green; 748 new test checks across 6 net-pillar test binaries.
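
The Phase 1 "sliding-window dedup" can be pictured with the classic bitmap approach used by anti-replay windows. This is a sketch under assumptions, not the `PN1Session` implementation: the 64-entry window, the `DedupWindow` name and the wrap-around handling are all illustrative choices.

```cpp
#include <cstdint>

// Tracks the highest sequence number seen plus a 64-bit bitmap of the
// 64 most recent sequence numbers (bit 0 = highest_, bit 1 = highest_-1, ...).
class DedupWindow {
public:
    // Returns true if seq is new (and records it); false if it is a
    // duplicate or older than the window can remember.
    bool accept(std::uint32_t seq) {
        if (!started_) {
            started_ = true;
            highest_ = seq;
            bits_ = 1;
            return true;
        }
        const std::uint32_t ahead = seq - highest_;  // modular distance
        if (ahead != 0 && ahead < 0x80000000u) {
            // seq is newer: slide the window forward.
            bits_ = (ahead >= 64) ? 0 : (bits_ << ahead);
            bits_ |= 1;
            highest_ = seq;
            return true;
        }
        const std::uint32_t behind = highest_ - seq;
        if (behind >= 64) return false;  // too old to judge: drop
        const std::uint64_t mask = std::uint64_t{1} << behind;
        if (bits_ & mask) return false;  // already seen
        bits_ |= mask;
        return true;
    }

private:
    bool started_ = false;
    std::uint32_t highest_ = 0;
    std::uint64_t bits_ = 0;
};
```

Retransmitted frames arrive with a sequence number whose bit is already set, so they are rejected without keeping a per-frame history.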

Honest benchmark results

Codec micro-bench (gcc 15.2 + Release+LTO, 7950X3D — reproducible, CPU-bound):

| operation | throughput |
| --- | --- |
| xxHash32 over 4 KiB | 10.4 GB/s |
| PN1Codec::encode (4 KiB frame) | 2.27 GB/s |
| PN1Codec::decode (4 KiB + checksum verify) | 2.28 GB/s |

Head-to-head vs alternatives (32-byte payload, 5 000 iters, loopback):

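| transport | min RTT | p50 RTT | p99 RTT | throughput |
| --- | --- | --- | --- | --- |
| PhyriadNet/1 | 15.4 µs | 16.3 µs | 33.5 µs | 60 k ops/s |
| Raw UDP echo | 13.1 µs | 16.0 µs | 38.4 µs | 60 k ops/s |
| Raw TCP echo | 27.2 µs | 29.4 µs | 52.6 µs | 33 k ops/s |
| gRPC | gated on libs | | | |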

Windows numbers are clamped by the 15.6 ms multimedia timer — both raw UDP and PN1 hit the same clamp. The meaningful comparison is Δ vs raw UDP at p50 ≈ +200 ns (codec + framing + checksum overhead) and 98.7 % of raw UDP throughput. Linux CI numbers will be uniformly tighter.
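
As a reading aid for the min / p50 / p99 columns, here is one common way to reduce a vector of RTT samples to percentiles (nearest-rank method). How the benches in this PR actually compute their percentiles is not stated, so treat the method and the `percentile_nearest_rank` name as assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least pct percent
// of all samples are less than or equal to it. pct is an integer in [0, 100].
inline double percentile_nearest_rank(std::vector<double> samples, unsigned pct) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    // rank = ceil(pct * n / 100), computed in integers to avoid rounding drift.
    std::size_t rank = (static_cast<std::size_t>(pct) * n + 99) / 100;
    rank = std::max<std::size_t>(1, std::min(rank, n));
    return samples[rank - 1];
}
```

With this definition the overhead figure is simply the difference between the two transports' p50 values computed over the same number of iterations.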

DX

  • ~12 LOC server, ~9 LOC client (see examples/quickstart_net/)
  • Zero external dependencies (no protoc, no gRPC, no Boost)
  • Endpoint::any(9742) / Endpoint::localhost(9742) / Endpoint::ipv4(192,168,1,5, 9742) — no htons boilerplate
  • gw.run(std::stop_token) — drive the gateway with one line
  • Side-by-side comparison vs raw UDP / TCP / gRPC: docs/internal/NET_PILLAR_DX_COMPARISON.md (internal)
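
The "no htons boilerplate" point can be illustrated with a small stand-in type that accepts a host-order port and keeps the network-order bytes internally. `DemoEndpoint` is hypothetical and only mirrors the helper names listed above; the storage layout of the real `Endpoint` is not described in this PR.

```cpp
#include <array>
#include <cstdint>

// Stand-in for the Endpoint helpers: callers pass a plain port number and
// never see network byte order.
struct DemoEndpoint {
    std::array<std::uint8_t, 4> addr{};      // a.b.c.d, network order
    std::array<std::uint8_t, 2> port_net{};  // big-endian port bytes

    static DemoEndpoint ipv4(std::uint8_t a, std::uint8_t b, std::uint8_t c,
                             std::uint8_t d, std::uint16_t port) {
        DemoEndpoint e;
        e.addr = {a, b, c, d};
        e.port_net = {static_cast<std::uint8_t>(port >> 8),
                      static_cast<std::uint8_t>(port & 0xFF)};
        return e;
    }
    static DemoEndpoint any(std::uint16_t port) { return ipv4(0, 0, 0, 0, port); }
    static DemoEndpoint localhost(std::uint16_t port) { return ipv4(127, 0, 0, 1, port); }

    // Host-byte-order readback.
    std::uint16_t port() const {
        return static_cast<std::uint16_t>((port_net[0] << 8) | port_net[1]);
    }
};
```

Building the bytes with shifts instead of calling `htons` gives the same wire representation on little- and big-endian hosts and needs no platform socket headers.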

Documentation

  • README.md — new "Optional — Network dispatch" section with honest benchmarks
  • docs/QUICKSTART_NET.md — 5-minute runnable walkthrough
  • docs/LLM_INTEGRATION_GUIDE.md §5.5 — AI-agent-friendly quickstart + gotchas

Test plan

  • 77/77 project tests pass (ctest --exclude bench from build/)
  • Portability self-check: clone to C:/temp and build from scratch passes
  • Quickstart server+client exchange 3 RPCs successfully on loopback
  • WIP-pillar foreach gate picks up both net and netclient
  • CI matrix green (Linux gcc/clang, Windows MinGW/MSVC, ASan/UBSan/TSan, bench-regression)

🤖 Generated with Claude Code

Swately and others added 6 commits May 19, 2026 05:28
The `last_block` local in hash_long() is reserved for the final
block-stripe finalize step that will land with full SecretGen support.
Currently unused — benign until you combine -Werror with a TU that
instantiates schema_hash<T>() (e.g. via PodMessage.hpp's static_assert
on SampleTick).

The net pillar (next commit) is the first TU to hit that combination.
Mark the local [[maybe_unused]] to preserve the documentation intent
without tripping -Werror.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New framework pillar exposing the Phyriad pool runtime over UDP via a
custom 16-byte-header protocol (PhyriadNet/1). Six implementation phases
landed in this commit:

  Phase 0 — Protocol foundation
    framework/net/include/phyriad/net/PN1Frame.hpp
    framework/net/include/phyriad/net/PN1Codec.hpp
    framework/net/include/phyriad/net/Xxhash32.hpp
    framework/net/src/Net.cpp           (out-of-line compute_checksum)
    framework/net/tests/test_pn1_codec.cpp  (520 checks)

  Phase 1 — Reliability layer
    framework/net/include/phyriad/net/PendingAck.hpp
    framework/net/include/phyriad/net/RetransmitQueue.hpp
    framework/net/include/phyriad/net/PN1Session.hpp
    framework/net/tests/test_pn1_session.cpp  (103 checks)
    Backoff schedule: 50/100/200/400/800 ms (conservative for Windows
    multimedia-timer-clamped runners; LAN deployments can override).

  Phase 2 — I/O backend
    framework/net/include/phyriad/net/IoBackend.hpp
    framework/net/src/IoBackend_posix.cpp  (POSIX sockets + WinSock2)
    framework/net/tests/test_io_backend.cpp  (43 checks)
    Includes multicast support (IP_ADD_MEMBERSHIP / IP_MULTICAST_TTL /
    IP_MULTICAST_LOOP) for the Phase 4 pheromone path.

  Phase 3 — NetGateway (server side)
    framework/net/include/phyriad/net/NetGateway.hpp
    Session table is heap-allocated transparently at construction so
    declaring a NetGateway on the stack doesn't blow Windows' default
    1 MB stack (default config: 64 sessions × 16 frame caches × 4 KiB).
    Peer-validates every non-SESSION_INIT frame against recorded src
    endpoint to drop spoofed / stale packets cleanly.

  Phase 4 — NetPheromone (multicast stigmergy sync)
    framework/net/include/phyriad/net/NetPheromone.hpp
    framework/net/tests/test_net_pheromone.cpp  (24 checks)
    Latest-wins T-up-to-8-bytes slot array, UDP-multicast-synced.

  Phase 5 — Client library (two variants)
    framework/net/include/phyriad/net/NetClient.hpp        (C++23, full features)
    framework/netclient/include/phyriad/netclient/NetClient.hpp
                                                          (C++17, zero
                                                           framework deps,
                                                           single header)
    framework/net/tests/test_net_e2e.cpp           (33 checks)
    framework/netclient/tests/test_netclient_thin.cpp (25 checks)

  Phase 6 — Hardening + benchmarks
    bench/bench_pn1_codec.cpp        — xxHash32 10.4 GB/s, encode/decode 2.27 GB/s
    bench/bench_net_rtt.cpp          — loopback RTT distribution
    bench/bench_net_throughput.cpp   — sustained tasks/s
    bench/comparisons/bench_pn1_vs_raw_udp.cpp   — 98.7% of raw-UDP throughput
    bench/comparisons/bench_pn1_vs_tcp_echo.cpp  — 1.84x TCP throughput
    bench/comparisons/bench_pn1_vs_grpc.cpp      — gated on libgrpc++-dev
    docs/framework/PERF_BASELINE.json updated with net_metrics_reference
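
A note on the Phase 3 stack remark: 64 sessions × 16 cached frames × 4 KiB is 4 MiB, four times the 1 MB default stack mentioned there. The sketch below shows the general pattern of keeping the owning object small by placing the table behind a `std::unique_ptr`. `DemoSessionTable` and `DemoGateway` are hypothetical; the real NetGateway layout is not shown in this commit message.

```cpp
#include <array>
#include <cstddef>
#include <memory>

constexpr std::size_t kSessions = 64;
constexpr std::size_t kFramesPerSession = 16;
constexpr std::size_t kFrameBytes = 4096;

// 64 * 16 * 4096 bytes = 4 MiB of frame cache.
struct DemoSessionTable {
    std::array<std::array<std::array<unsigned char, kFrameBytes>,
                          kFramesPerSession>,
               kSessions>
        frames;
};
static_assert(sizeof(DemoSessionTable) == 4u * 1024u * 1024u,
              "table is 4 MiB, too large for a 1 MB stack");

class DemoGateway {
public:
    // The table is allocated on the heap once, at construction.
    DemoGateway() : table_(std::make_unique<DemoSessionTable>()) {}

    unsigned char* frame(std::size_t session, std::size_t slot) {
        return table_->frames[session][slot].data();
    }

private:
    std::unique_ptr<DemoSessionTable> table_;
};
static_assert(sizeof(DemoGateway) <= 64,
              "the gateway object itself stays small enough for the stack");
```

Callers can then declare the gateway as a local variable without knowing about the allocation.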

Ergonomic helpers (DX-driven, see docs/internal/NET_PILLAR_DX_COMPARISON.md):

  Endpoint::any(port_le)
  Endpoint::localhost(port_le)
  Endpoint::ipv4(a,b,c,d, port_le)
  Endpoint::port()  // host-byte-order readback

  NetGateway::run(std::stop_token)
  NetGateway::run_blocking()

WIP gate: root CMakeLists.txt foreach already includes `net` and now
also `netclient` — the gates skip cleanly when the directories aren't
present (verified via portability self-check by cloning to C:/temp).

Documentation:
  README.md            — new "Optional — Network dispatch" section with
                         honest benchmark results (codec micro-bench
                         numbers are CPU-bound + stable; loopback RTT
                         on Windows is timer-clamped; Δ vs alternatives
                         is the meaningful comparison)
  docs/QUICKSTART_NET.md       — 5-minute runnable walkthrough
  docs/LLM_INTEGRATION_GUIDE.md — §5.5 NetGateway+NetClient quickstart
                                   for AI agents; net-pillar gotchas in §6
  examples/quickstart_net/{server,client}.cpp  — 18+9 LOC runnable demo

Total: 77/77 project tests green; 748 new test checks across 6 net-pillar
test binaries; full bench suite compiles + runs on Windows MinGW.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous commit landed 22 raw std::memory_order_* uses across the
net pillar + bench + examples. The official scripts/lint_hal.sh enforces
that these be replaced with the hal:: wrappers everywhere outside
framework/hal/. The lint was silently passing on my local box due to
a grep -P locale bug (Git-Bash without UTF-8 locale returns exit 2,
suppressed by `2>/dev/null`); on Linux CI it would have failed loud.

Categorised fixes:
  • Thread stop flags in tests, benches, and examples → ctrl_*_acquire /
    ctrl_*_release (signal semantics — driver-thread shutdown handshake).
  • NetPheromone slot store/load → stat_*_relaxed (slot values do NOT
    synchronise with any sibling payload — same intentional relaxed
    semantics as the in-process Pheromone<T,N>).
  • IoBackend_posix WSAStartup refcount → kept raw acq_rel with the
    documented `// HAL: acq_rel refcount …` justification on the same
    line (per the documented escape in MemoryOrder.hpp; the refcount
    needs acq_rel to publish WSAStartup completion to peers and to
    observe all prior socket use before WSACleanup).
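
For readers without the HAL headers at hand, the first two categories roughly correspond to the following raw `std::atomic` usage. The `hal::` wrappers themselves are not reproduced here; `DemoStopFlag` and `DemoSlot` are hypothetical types showing only the memory orders involved.

```cpp
#include <atomic>
#include <cstdint>

// Control flag (ctrl_* category): the writer uses release and the reader
// uses acquire, so work done before request_stop() is visible to a thread
// that observes the flag as true.
struct DemoStopFlag {
    std::atomic<bool> stop{false};
    void request_stop() { stop.store(true, std::memory_order_release); }
    bool stop_requested() const { return stop.load(std::memory_order_acquire); }
};

// Statistic-style slot (stat_* category): the value stands alone and does not
// guard any other data, so relaxed ordering is sufficient.
struct DemoSlot {
    std::atomic<std::uint64_t> value{0};
    void publish(std::uint64_t v) { value.store(v, std::memory_order_relaxed); }
    std::uint64_t read() const { return value.load(std::memory_order_relaxed); }
};
```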

Verification:
  LC_ALL=C.UTF-8 bash scripts/lint_hal.sh
  → 432 files inspected, 0 violations

77/77 project tests still green after the wrapper rewrite — the HAL
wrappers are zero-overhead inline calls so behaviour is unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… maybe-uninit

Three failures detected on PR #8 CI matrix; all are warning-as-error
escalations that only fire under the specific toolchains they fail on:

  • TSan (gcc 13, -fsanitize=thread):
    test_io_backend.cpp:202-203 — `pn::PN1Header hdr; const uint8_t* pp;`
    uninitialized when the lambda body is inlined into the std::function
    invoker. gcc -O1+ + TSan's interceptor instrumentation makes the
    no-throw decode() path that DOES initialize them invisible to the
    flow-sensitive maybe-uninit pass. Fixed with `{}` and `= nullptr`
    explicit initializers (zero-cost — decode() overwrites them).

  • clang-18 + libc++ (-Werror -Wunused-result):
    test_io_backend.cpp:144,175,186 — IoBackend::open() is [[nodiscard]]
    by design (callers should branch on the status), but these test setup
    sites called it for side effects only. Fixed with `(void)` casts.

  • MSVC /WX (C4127 conditional expression is constant):
    test_pn1_codec.cpp + every other net test — the standard
    `do { … } while (false)` test-macro idiom trips MSVC's strict
    constant-conditional warning. Standard portable fix is the
    GoogleTest-style `while ((void)0,0)` — the comma-expression with a
    void cast suppresses constant-folding while preserving the do-once
    semantics that statement-expansion macros need.
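
A minimal sketch of the macro shape and the `(void)` discard described above. `DEMO_EXPECT_TRUE`, `demo_failures` and `demo_open` are hypothetical stand-ins for the project's own test helpers.

```cpp
#include <cstdio>

static int demo_failures = 0;

// Statement-like check macro. The do/while wrapper makes the expansion a
// single statement, so it composes with if/else; the ((void)0, 0) condition
// is the idiom the commit uses to avoid MSVC's C4127 on a literal `false`.
#define DEMO_EXPECT_TRUE(cond)                                         \
    do {                                                               \
        if (!(cond)) {                                                 \
            ++demo_failures;                                           \
            std::fprintf(stderr, "FAIL %s:%d: %s\n", __FILE__,         \
                         __LINE__, #cond);                             \
        }                                                              \
    } while ((void)0, 0)

// A [[nodiscard]] function called only for its side effect needs an explicit
// discard to stay clean under -Werror -Wunused-result.
[[nodiscard]] inline bool demo_open() { return true; }

inline void demo_setup() {
    (void)demo_open();  // result intentionally ignored in test setup
}
```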

Local verification:
  net pillar:  6/6 tests pass
  project:     77/77 tests pass

These are the precise three checks that were red on PR #8; expecting all
four checks (incl. lint-hal which already turned green on the previous
commit) green after this push.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Round 2 of the CI portability fixes (round 1 was 5190f8f). Two checks
still failed because the earlier fixes covered only the obvious sites:

  • TSan (gcc 13): same maybe-uninitialized pattern as before but in
    test_pn1_codec.cpp instead of test_io_backend.cpp. Six sites of
    `pn::PN1Header hdr; const uint8_t* pp; size_t fsize;` declared
    without explicit init — value-initialize all of them so the
    flow-sensitive pass sees a definite assignment before the lambda
    invocation, no matter what gcc -O1 + TSan instrumentation does to
    the std::function dispatch chain.

  • MSVC /WX (C4127 conditional expression is constant): the
    `while ((void)0,0)` fix from 5190f8f handled the while-side of the
    macro, but MSVC also fires C4127 on the `if (_aa == _bb)` inside
    EXPECT_EQ when BOTH operands are compile-time constants
    (`EXPECT_EQ(sizeof(pn::PN1Header), 16u)` and friends). The standard
    portable fix is a file-level `#pragma warning(disable: 4127)` —
    GoogleTest and Catch2 do the exact same thing for the same reason.
    Applied to all six net+netclient test files.

Local verification: 6/6 net pillar tests pass post-fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI on PR #8 surfaced two issues round-2 didn't catch:

1. TSan data race (gcc 13 -fsanitize=thread):
   ServerSession::active was a plain bool, written by the driver thread
   (handle_session_close / handle_session_init / abort in tick()) and
   read by the user thread (active_session_count() called from main in
   the integration tests). This is a real race, not a false positive —
   the test_disconnect path explicitly reads the flag from main right
   after the driver finishes processing SESSION_CLOSE.

   Fixed by making `active` a std::atomic<bool> with ctrl_store_release
   / ctrl_load_acquire wrappers on every site (9 read/write spots in
   NetGateway.hpp). The store-release on `active = true` also publishes
   the freshly-initialized peer / sess / retx_cache fields to any
   future reader that observes active via ctrl_load_acquire — fixes a
   second latent visibility race in allocate_session().

2. MSVC C4267 (size_t -> uint32_t in test_pn1_codec.cpp:373):
   the misaligned-input test loop used `offset` (size_t) directly as
   the session_id parameter of encode(). Explicit static_cast<uint32_t>.

Local verification: 6/6 net pillar tests pass after the change.
Single-threaded local Windows can't reproduce the TSan race; the fix
is structural and matches the existing HAL discipline for
publisher/subscriber atomic flags (same pattern phyriad_pool uses for
worker.active state).
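
The publication pattern behind fix 1 can be sketched with plain `std::atomic` calls. `DemoSession`, `activate` and `read_peer` are hypothetical; the real ServerSession fields and the `ctrl_*` wrappers are not reproduced here.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

struct DemoSession {
    std::uint32_t peer_addr = 0;  // plain fields, written before activation
    std::uint16_t peer_port = 0;
    std::atomic<bool> active{false};
};

// Driver thread: fill in the fields, then publish them with a release store.
inline void activate(DemoSession& s, std::uint32_t addr, std::uint16_t port) {
    s.peer_addr = addr;
    s.peer_port = port;
    s.active.store(true, std::memory_order_release);
}

// Any other thread: an acquire load that observes `true` also observes the
// field writes that preceded the release store.
inline bool read_peer(const DemoSession& s, std::uint32_t& addr,
                      std::uint16_t& port) {
    if (!s.active.load(std::memory_order_acquire)) return false;
    addr = s.peer_addr;
    port = s.peer_port;
    return true;
}

// Counterpart of an active_session_count() style query.
inline std::size_t active_count(const DemoSession* table, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (table[i].active.load(std::memory_order_acquire)) ++count;
    return count;
}
```

The sketch covers first-time publication only; reusing a slot while another thread may still be reading it would need additional coordination.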

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bench_pn1_codec.cpp:34 used the GCC/Clang idiom
    asm volatile("" :: "r"(x) : "memory")
inside an `[[gnu::noinline]]` template. MSVC has neither feature, so
the windows-msvc-Release CI job failed with C2760 (unexpected
'volatile') + C3878.

Switched to the standard portable trick: write the value to a local
`volatile T sink`. A volatile write is an observable side effect, so the
compiler must materialize the value (same observable effect as the asm
clobber, just without the asm). Verified bench results unchanged:

  xxhash32(4 KiB)             10.37 GB/s   (was 10.41 GB/s)
  encode(4 KiB) full path     2.27 GB/s    (same)
  decode(4 KiB) full path     2.30 GB/s    (same)
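
The volatile-sink trick looks roughly like this. `keep_alive` is a hypothetical name, and the restriction to arithmetic types is a simplification made for this sketch rather than something the commit states.

```cpp
#include <cstdint>
#include <type_traits>

// Forces the compiler to compute `value`: the store to a volatile object is
// an observable side effect and cannot be removed as dead code.
template <class T>
inline void keep_alive(const T& value) {
    static_assert(std::is_arithmetic<T>::value,
                  "sketch only handles arithmetic types");
    volatile T sink = value;
    (void)sink;
}

// Example workload whose result would otherwise be unused in a benchmark loop.
inline std::uint32_t sum_to(std::uint32_t n) {
    std::uint32_t total = 0;
    for (std::uint32_t i = 1; i <= n; ++i) total += i;
    return total;
}
```

In a benchmark loop the call would be `keep_alive(sum_to(n));`, which uses only standard C++ and so compiles on MSVC as well as GCC and Clang.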

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Swately Swately merged commit ecdee8d into main May 19, 2026
8 checks passed