
Capability Handshake & Proper Ping #38

Merged
maxholman merged 6 commits into block65:main from maxholman:feat/capability-handshake
Feb 28, 2026


@maxholman (Contributor)

Capability Handshake & Proper Ping

Scope

Wire protocol foundation: replace one-way ExitNodeHello with bidirectional
Capability exchange, wire up proper ping/pong with latency tracking, remove
dead capability code, rename bridge.rs.

Touches: crates/wire/proto/ (data, control, management protos),
crates/core/src/transport/protocol.rs (renamed from bridge.rs),
crates/core/src/control/peers.rs, crates/core/src/node_api.rs,
crates/daemon/src/mode/{entry,exit,relay}.rs,
crates/core/src/client/{quic,ws}/mod.rs,
crates/core/src/server/{quic,ws}/mod.rs.

Out of scope

  • Auto-negotiation / role derivation from capabilities (Phase 13c)
  • Indeterminate mode (Phase 13b)
  • Route announcement and TUN routing table updates (Phase 13e)
  • Hints (--prefer, --exclude-role, --fixed-role) (Phase 13d)
  • Mode transitions / RoleTransition messages (Phase 13g)
  • Security posture bundling (--psk suppressing auto-negotiate) (Phase 13f)
  • routes and hint fields are defined in the proto but not acted on

Why

ExitNodeHello is one-directional (connector → acceptor only), leaks the PSK
in plaintext inside the TLS tunnel, and carries no capability information. Every
subsequent auto-negotiation phase depends on both sides having a full picture of
each other's capabilities before tunnel traffic flows. This phase establishes
that foundation and also fixes the PSK authentication to use HMAC channel
binding instead of plaintext.

Notes

  • Design spec: docs/tasks/13a-capability-handshake.md — items 1-6 are the
    implementation checklist. Tests section is mandatory.
  • PSK proof: HMAC-SHA256 over TLS export_keying_material() channel binding.
    Requires threading TLS session handle from quinn/rustls into handshake path.
    After validation, zeroize the plaintext PSK from memory.
  • Concurrent exchange: Both sides send Capability immediately after
    transport connects and wait to receive. No ordering — bidi control stream
    supports this already.
  • Server accept paths: QUIC and WS servers currently read the first
    Capability directly in accept() before spawning run_control_loop,
    bypassing run_control_stream_acceptor. The new bidirectional exchange needs
    to unify this — both sides should go through the same exchange path.
  • Ping latency: Periodic ping timer already fires in run_control_loop.
    update_latency() and PeerInfo.latency_ms exist in the registry. Missing
    link: pong receipt → latency calculation → registry.update_latency().
  • Rename: bridge.rs → protocol.rs — done.
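
The concurrent exchange note above can be sketched with plain threads and channels. This is a minimal illustration, not the PR's code: the `Handshake` struct is a hypothetical stand-in for the proto message, `exchange` and `run_pair` are made-up names, and std channels stand in for the bidi control stream. The 10 s timeout mirrors the server-side read timeout mentioned in the progress log.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the proto-generated handshake message.
#[derive(Clone, Debug, PartialEq)]
pub struct Handshake {
    pub name: String,
    pub tun_capable: bool,
}

/// Send our handshake first, then wait for the peer's.
/// Neither side waits before sending, so two peers running this at the
/// same time cannot deadlock on each other.
pub fn exchange(
    local: Handshake,
    tx: &Sender<Handshake>,
    rx: &Receiver<Handshake>,
    timeout: Duration,
) -> Option<Handshake> {
    tx.send(local).ok()?;
    rx.recv_timeout(timeout).ok()
}

/// Run both ends on separate threads over a pair of in-memory channels
/// and return what each side learned about the other.
pub fn run_pair(a: Handshake, b: Handshake) -> (Option<Handshake>, Option<Handshake>) {
    let (a_tx, b_rx) = channel();
    let (b_tx, a_rx) = channel();
    let timeout = Duration::from_secs(10);
    let side_a = thread::spawn(move || exchange(a, &a_tx, &a_rx, timeout));
    let side_b = thread::spawn(move || exchange(b, &b_tx, &b_rx, timeout));
    (side_a.join().unwrap(), side_b.join().unwrap())
}
```

Calling `run_pair` with two different handshakes should give each side the other's message regardless of which thread is scheduled first, which is the property the "no ordering" note relies on.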

Progress (impl agent)

Branch: feat/capability-handshake (from main at c712ddc)

Build status: just check passes — fmt, clippy (slim + default), tests
(172 pass), musl cross-build, VM smoke/resilience tests all green. No warnings.

Completed (prior sessions)

  1. Proto changes (items 1-2) — ExitNodeHello → Capability in all protos.
  2. Rename bridge.rs → protocol.rs (item 6) — done.
  3. Dead code removal (item 4) — NodeCapability, set_relay_capability(),
    capability_to_proto() all removed. Registry uses tun_capable/listening/
    connecting fields directly.
  4. Client Capability sending — QUIC and WS clients send Capability via
    control stream immediately after connecting.
  5. Server Capability reading — QUIC and WS servers read Capability as
    first control message with 10s timeout. AcceptResult carries
    peer_capability.
  6. Daemon mode updates — entry.rs validates PSK (temporary raw bytes),
    exit.rs reads peer_capability().
  7. Bidirectional capability exchange (item 3) — DONE
    • Added local_capability: Option<Capability> to ServerOptions.
    • QUIC and WS server accept() now writes server's Capability to control
      stream after reading client's Capability (before spawning control loop).
    • All daemon modes (entry, exit listen, exit relay, relay) populate
      local_capability in their ServerOptions construction.
    • run_control_stream_initiator now accepts capability_tx: Option<oneshot::Sender<Capability>> and passes it to run_control_loop.
    • QUIC and WS clients create a oneshot and pass the sender to the control
      loop; the receiver is stored in ConnectResult.
    • ConnectResult gained peer_capability_rx: Option<oneshot::Receiver<Capability>> with take_peer_capability_rx().
    • exit.rs: renamed _node_name → node_name (now used for
      local_capability).
  8. Ping/pong latency tracking (item 5) — DONE
    • run_control_loop signature changed: pong_tx: Option<&mpsc::Sender<Pong>> →
      latency_tx: Option<&mpsc::Sender<f64>>.
    • Pong handler now computes RTT as now_ms - pong.timestamp_ms and sends the
      f64 via latency_tx (instead of forwarding raw Pong).
    • QUIC and WS server accept() creates latency_tx/latency_rx mpsc channel;
      passes latency_tx to control loop, latency_rx via AcceptResult.
    • entry.rs handle_connection takes latency_rx parameter and uses it in
      the select loop: periodic latency updates registry, one-shot REPL ping
      injects Ping via control_tx and stashes oneshot in pending_ping.

Completed (previous session)

  1. Split hmac_proof.rs → hmac.rs + psk.rs — DONE
  2. Rename relay_capability functions — DONE
  3. Wire PSK proof into capability construction (items 2-3) — DONE
  4. Clippy suppressions justified — DONE (now replaced by real fixes)
  5. just check passes — DONE
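
The shape of the PSK proof (item 3) can be sketched without committing to a crypto library. Everything here is an assumption for illustration: the `Handshake` struct is a stand-in, the length-prefixed byte layout replaces the protobuf encoding, the order "channel binding, then handshake bytes" is a guess, and the MAC is passed in as a function so that a real HMAC-SHA256 can be supplied by the caller. The sketch only shows two properties: the proof field is cleared before the message is serialized, and verification compares without an early exit.

```rust
/// Hypothetical stand-in for the proto-generated Handshake message.
#[derive(Clone, Debug, PartialEq)]
pub struct Handshake {
    pub name: String,
    pub psk_proof: Vec<u8>,
}

impl Handshake {
    /// Bytes covered by the proof. The proof field is cleared first so the
    /// proof never depends on itself. (Illustrative layout, not protobuf.)
    pub fn serialize_for_proof(&self) -> Vec<u8> {
        let mut unsigned = self.clone();
        unsigned.psk_proof.clear();
        let mut out = Vec::new();
        out.extend_from_slice(&(unsigned.name.len() as u32).to_be_bytes());
        out.extend_from_slice(unsigned.name.as_bytes());
        out.extend_from_slice(&(unsigned.psk_proof.len() as u32).to_be_bytes());
        out.extend_from_slice(&unsigned.psk_proof);
        out
    }
}

/// proof = MAC(psk, channel_binding || handshake_without_proof).
/// `mac` is supplied by the caller; it must be a real keyed MAC in practice.
pub fn compute_proof<F>(mac: F, psk: &[u8], channel_binding: &[u8], hs: &Handshake) -> Vec<u8>
where
    F: Fn(&[u8], &[u8]) -> Vec<u8>,
{
    let mut message = channel_binding.to_vec();
    message.extend_from_slice(&hs.serialize_for_proof());
    mac(psk, &message)
}

/// Recompute the proof and compare it with the one carried in the message.
pub fn verify_proof<F>(mac: F, psk: &[u8], channel_binding: &[u8], hs: &Handshake) -> bool
where
    F: Fn(&[u8], &[u8]) -> Vec<u8>,
{
    let expected = compute_proof(mac, psk, channel_binding, hs);
    if expected.len() != hs.psk_proof.len() {
        return false;
    }
    // Accumulate differences instead of returning at the first mismatch.
    let diff = expected
        .iter()
        .zip(&hs.psk_proof)
        .fold(0u8, |acc, (a, b)| acc | (a ^ b));
    diff == 0
}
```

Because the channel binding comes from the TLS session, a proof captured on one connection should fail verification on any other, even with the same PSK and handshake contents.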

Completed (this session — review fixes)

  1. Capability → Handshake rename — DONE

    • Proto: message Capability → message Handshake in data.proto.
    • Proto: capability = 1 → handshake = 1 in control.proto ControlMessage.
    • All Rust code: types, field names, function names, doc comments, log messages.
    • peer_capability → peer_handshake, local_capability → local_handshake,
      with_capability → with_handshake, update_capability → update_handshake,
      serialize_capability_fields → serialize_handshake_fields, etc.
  2. RoleHint proto shape fixed — DONE

    • Flat enum RoleHint → message RoleHint { HintLevel level = 1; DataNodeRole target = 2; }
    • Added enum HintLevel (PREFER/EXCLUDE/FIXED) and enum DataNodeRole in data.proto.
    • Updated psk.rs serialization and test to handle RoleHint message fields.
  3. Channel proliferation resolved — DONE

    • Created ControlChannels struct in protocol.rs grouping outgoing_rx,
      handshake_tx, latency_tx, control_response_tx.
    • run_control_loop, run_control_stream_initiator, run_control_stream_acceptor
      signatures simplified from 7 args to 4.
    • All callers (QUIC/WS server and client) updated.
  4. handle_connection refactored — DONE

    • ConnectionParams<T> struct groups 8 arguments.
    • validate_handshake() extracted: PSK proof + identity validation.
    • spawn_data_tasks() extracted: incoming/outgoing data task spawning.
    • run_connection_loop() extracted: manager + ping/latency select loop.
    • #[allow(clippy::too_many_arguments, clippy::too_many_lines)] removed.
  5. PSK zeroize — DONE

    • Added explicit zeroize = "1" dep to wallhack-core, wallhackd, wallhack-cli.
    • Option<String> → Option<Zeroizing<String>> across entire PSK pipeline:
      GlobalConfig, SecurityParams, ServerConfig, ClientConfig,
      QuicClient, QuicServer, WsServer, ConnectionParams.
    • PSK bytes are zeroized on drop (when the owning struct goes out of scope).
  6. Mandatory tests — DONE

    • test_handshake_exchange: concurrent bidirectional handshake via MockBiStream.
    • test_malformed_handshake: non-Handshake first message → handshake_tx unfulfilled.
    • test_ping_latency: Ping auto-reply verified; Pong with past timestamp →
      latency computed and forwarded via latency_tx.
    • test_periodic_ping: timer fires at configured interval (start_paused).
  7. just check passes — All quality gates green (fmt, clippy slim+default,
    tests, musl cross-build, VM smoke/resilience, website build).
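
The channel grouping (item 3) and the behaviour that test_malformed_handshake checks (item 6) can be sketched together. This is a hedged sketch: only two of the four `ControlChannels` fields are shown, their payload types are guesses, std channels stand in for the oneshot and mpsc channels, and `ControlMessage` and `handle_first_message` are hypothetical names. Dropping the sender on a malformed first message is this sketch's way of leaving `handshake_tx` unfulfilled.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical stand-in for the proto-generated Handshake message.
#[derive(Clone, Debug, PartialEq)]
pub struct Handshake {
    pub name: String,
}

/// Hypothetical subset of the control messages.
pub enum ControlMessage {
    Handshake(Handshake),
    Ping { timestamp_ms: u64 },
}

/// Channels grouped into one struct instead of separate arguments.
/// outgoing_rx and control_response_tx are omitted from this sketch.
pub struct ControlChannels {
    /// Used at most once: taken out of the Option when the handshake arrives.
    pub handshake_tx: Option<Sender<Handshake>>,
    pub latency_tx: Option<Sender<f64>>,
}

/// Handle the first control message on a stream.
/// Returns true when it was a Handshake and was forwarded.
pub fn handle_first_message(channels: &mut ControlChannels, first: ControlMessage) -> bool {
    match first {
        ControlMessage::Handshake(handshake) => {
            if let Some(tx) = channels.handshake_tx.take() {
                // Ignore a receiver that has already gone away.
                let _ = tx.send(handshake);
            }
            true
        }
        _ => {
            // Drop the sender without sending: the waiting side sees an
            // error instead of a handshake.
            channels.handshake_tx = None;
            false
        }
    }
}
```

With a struct like this, a function that previously took each channel as its own argument takes a single `ControlChannels` value, which is what brings the signatures down from 7 arguments to 4.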

Completed (cleanup session)

  1. Renamed cap_ variables to handshake — DONE

    • cap_tx/cap_rx → handshake_tx/handshake_rx in QUIC and WS clients.
    • cap_result → handshake_result in QUIC and WS servers.
    • local_cap / let mut cap → local / let mut handshake in server accept paths.
    • Local cap variables building Handshake structs → handshake in both clients.
  2. Capability vs handshake naming audit — DONE

    • update_handshake() → update_capabilities() (peers.rs + entry.rs call site).
    • cap parameter in serialize_handshake_fields, compute_proof, verify_proof
      → handshake.
    • All test variables cap/cap1/cap2 → handshake/handshake1/handshake2.
    • Test different_capabilities_produce_different_proofs →
      different_handshakes_produce_different_proofs.
  3. DataNodeRole dedup — DONE

    • Moved NodeRole from control.proto to data.proto. Removed DataNodeRole.
    • Renamed ROLE_UNKNOWN → ROLE_INDETERMINATE (Phase 13b terminology).
    • RoleHint.target now uses NodeRole directly (no duplicate enum).
    • control.proto references wallhack.data.NodeRole via existing import.
    • Rust: types.rs and handler.rs now import from wallhack_wire::data::NodeRole.
  4. Capabilities struct grouping — DONE

    • Added message Capabilities { tun_capable, listening, connecting } to data.proto.
    • Handshake now nests Capabilities capabilities = 1 (field numbers renumbered).
    • Proto comments fixed: "Can accept incoming connections" / "Can initiate
      outgoing connections" (not "Started with --listen/--connect").
    • Internal PeerInfo (peers.rs) and node_api::PeerInfo/NodeStatus use
      Capabilities instead of 3 separate bools.
    • update_capabilities() takes &Capabilities instead of 3 bools.
    • serialize_handshake_fields() replaced with encode_to_vec() — uses protobuf's
      deterministic encoding instead of manual byte packing. psk_proof zeroed before
      encoding. Removed ~40 lines of manual serialization.
    • Management proto (management.proto) kept flat — it's the CLI-facing IPC
      boundary, flat fields are appropriate there.
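
The Capabilities grouping (item 4) can be illustrated with a small registry. This is a minimal sketch, not the crate's code: the `Registry` type, its `HashMap` storage and the `get` accessor are assumptions, and the structs only mirror the field names given above.

```rust
use std::collections::HashMap;

/// Mirrors the three flags of the Capabilities message.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Capabilities {
    pub tun_capable: bool,
    /// Can accept incoming connections.
    pub listening: bool,
    /// Can initiate outgoing connections.
    pub connecting: bool,
}

/// Registry entry: one nested struct instead of three separate bools.
#[derive(Debug, Default)]
pub struct PeerInfo {
    pub name: String,
    pub capabilities: Capabilities,
    pub latency_ms: Option<f64>,
}

/// Hypothetical peer registry keyed by peer name.
#[derive(Default)]
pub struct Registry {
    peers: HashMap<String, PeerInfo>,
}

impl Registry {
    /// Takes &Capabilities rather than three positional bools, so call
    /// sites cannot swap `listening` and `connecting` by accident.
    pub fn update_capabilities(&mut self, name: &str, capabilities: &Capabilities) {
        let peer = self.peers.entry(name.to_string()).or_insert_with(|| PeerInfo {
            name: name.to_string(),
            ..PeerInfo::default()
        });
        peer.capabilities = *capabilities;
    }

    pub fn get(&self, name: &str) -> Option<&PeerInfo> {
        self.peers.get(name)
    }
}
```

A call such as `registry.update_capabilities("exit-1", &handshake_capabilities)` then passes the peer's advertised flags through unchanged from the handshake to the registry.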

Remaining work

None — all review items addressed. Ready for re-review.


Review Notes

Addressed

  • run_quic_relay_capability — renamed to run_quic_exit_both (and
    siblings). Uses ConnectivitySpec::Both terminology.
  • "Dual mode" — replaced with "both" to match ConnectivitySpec::Both.
  • binding in hmac tests — renamed to context throughout hmac.rs
    (generic module uses generic names). Added known-output test that verifies
    actual HMAC-SHA256 bytes.
  • clippy suppressions — all now have justification comments.

Outstanding (resolved)

  • Capability → Handshake rename — the message carries identity,
    capabilities, authentication, and topology metadata. "Capability" described
    one of eight fields. Renamed to Handshake in proto, spec docs (13a-13g),
    and parent design doc. Impl agent must rename in Rust code to match.
  • Channel proliferation — addressed by review finding #4 (group into
    struct).
  • channel_binding terminology — standard RFC 9266 term. No change.
  • "Exit (both):" — correct for this phase. Parent spec says
    ConnectivitySpec::Both = relay, but role derivation is Phase 13c scope.
    Leave as-is for now.

Review (Gemini) — All Fixed

  1. Mandatory tests — ✅ Added: test_handshake_exchange,
    test_malformed_handshake, test_ping_latency, test_periodic_ping.
  2. PSK zeroize — ✅ Zeroizing<String> across entire config pipeline.
  3. RoleHint proto shape — ✅ message RoleHint { HintLevel level; NodeRole target; }.
  4. Channel proliferation — ✅ ControlChannels struct, 7 args → 4.
  5. Clippy suppressions — ✅ handle_connection refactored into sub-functions.
  6. Capability → Handshake rename — ✅ Proto + all Rust code.

Questions for review agent (all resolved)

  • local_handshake: Option<Handshake> naming — correct. Stores the full
    Handshake message (name, version, psk_proof, routes, hint, plus capability
    flags). "Handshake" accurately describes the message type.
  • cap named variables — all renamed to handshake (full name, no abbreviations).
  • Capability vs handshake naming — update_handshake() → update_capabilities().
    serialize_handshake_fields stays (serializes the full Handshake, not just capabilities).
  • DataNodeRole duplication — removed. NodeRole moved to data.proto,
    ROLE_UNKNOWN → ROLE_INDETERMINATE, control.proto uses via import.

maxholman and others added 6 commits February 28, 2026 15:39
Phase 13 design documents covering capability handshake (13a),
indeterminate mode (13b), auto-negotiation (13c), hints (13d),
route announcement (13e), security posture (13f), and mode
transitions (13g).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ExitNodeHello → Handshake message with nested Capabilities struct
- Add Ping/Pong messages for latency measurement
- Add RoleHint and HintLevel for future role negotiation
- Move NodeRole to data.proto (canonical location), remove DataNodeRole
- Rename ROLE_UNKNOWN → ROLE_INDETERMINATE (phase 13b terminology)
- control.proto references wallhack.data.NodeRole via import
- management.proto: add tun_capable/listening/connecting to PeerInfo
  and StatusResponse, reserve removed capability field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- hmac.rs: generic HMAC-SHA256 compute/verify using ring
- psk.rs: PSK proof via TLS channel binding (RFC 9266 tls-exporter),
  serialization uses protobuf encode_to_vec for determinism
- Rename bridge.rs → protocol.rs, add ControlChannels struct,
  bidirectional handshake support in control loop, ping/pong
  latency tracking, mandatory protocol tests
- types.rs: NodeRole import moved from control to data module,
  ROLE_UNKNOWN → ROLE_INDETERMINATE

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- QUIC and WS clients send Handshake with PSK proof on connect
- QUIC and WS servers read peer Handshake, send own back
- AcceptResult carries peer_handshake, latency_rx, channel_binding
- ConnectResult carries peer handshake via oneshot
- PeerInfo and NodeStatus use Capabilities struct instead of 3 bools
- update_capabilities() takes &Capabilities
- Handler and IPC layer updated for Capabilities grouping
- Add zeroize dep for PSK memory safety

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- All daemon modes (entry, exit, relay) populate local_handshake
  with Capabilities struct in ServerOptions
- Entry mode validates PSK proof and updates peer capabilities
- handle_connection refactored: ConnectionParams struct,
  validate_handshake(), spawn_data_tasks(), run_connection_loop()
- Zeroizing<String> for PSK across daemon config pipeline
- CLI and API handlers updated for Capabilities field access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move `serialize_handshake_fields` free function into
  `Handshake::serialize_for_proof()` method (better API locality).
- Bump binary size thresholds to ~1% above current measured sizes
  after handshake/PSK/proto additions (+59KB).
- Fix transport-modes copy (WebSocket RTT description).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@maxholman maxholman merged commit 9db613a into block65:main Feb 28, 2026
4 checks passed