Add WsClient: resilient WebSocket subscriptions with auto-reconnect#51

Merged
koko1123 merged 3 commits into main from feat/ws-client on May 1, 2026



koko1123 commented May 1, 2026

Summary

Adds ws_client.WsClient: a resilient WebSocket client that wraps the existing ws_transport.WsTransport with the three features production bots need (issue #35):

  • Transparent reconnect with exponential backoff + jitter. When the socket drops, next() and request() rebuild the transport and re-issue every active eth_subscribe before returning. Subscription handles stay valid across reconnects -- the underlying server-side sub-id is swapped automatically.
  • Multiplexed subscriptions. Multiple Subscription handles share one connection; notifications are dispatched to the right handle. Fixes a latent bug where subscription.Subscription.next() silently dropped notifications destined for other subs on the same transport.
  • Application-layer keepalive. A ping is sent if no frames arrive for ping_interval_ms. If no pong arrives within pong_timeout_ms, the connection is treated as dead and reconnect is triggered.
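
A minimal sketch of the reconnect delay schedule described in the first bullet. The PR exports computeBackoffMs and applyJitter, but their signatures are not shown here, so the names `backoffMs` and `withJitter`, the parameters, and the choice of spreading the delay over the upper half of its range are all assumptions made for illustration.

```zig
const std = @import("std");

/// Delay before reconnect attempt `attempt` (0-based): base_ms * 2^attempt,
/// capped at max_ms. Hypothetical helper, not the signature used in ws_client.zig.
pub fn backoffMs(base_ms: u64, max_ms: u64, attempt: u32) u64 {
    var delay: u64 = base_ms;
    var i: u32 = 0;
    while (i < attempt) : (i += 1) {
        // Doubling would exceed the cap, so stop here (this also avoids overflow).
        if (delay > max_ms / 2) return max_ms;
        delay *= 2;
    }
    return @min(delay, max_ms);
}

/// Spread `delay_ms` over [delay_ms / 2, delay_ms] using caller-supplied
/// entropy, so many clients do not reconnect in lockstep. The range is an
/// assumption; the modulo makes the distribution only roughly uniform.
pub fn withJitter(delay_ms: u64, random: u64) u64 {
    const half = delay_ms / 2;
    const span = delay_ms - half;
    return half + random % (span + 1);
}

test "backoff doubles up to the cap and jitter stays in range" {
    try std.testing.expectEqual(@as(u64, 100), backoffMs(100, 5000, 0));
    try std.testing.expectEqual(@as(u64, 800), backoffMs(100, 5000, 3));
    try std.testing.expectEqual(@as(u64, 5000), backoffMs(100, 5000, 10));
    try std.testing.expectEqual(@as(u64, 500), withJitter(1000, 0));
    try std.testing.expectEqual(@as(u64, 1000), withJitter(1000, 500));
}
```

Passing the random value in as a parameter keeps both functions pure, which makes them easy to unit-test without a PRNG.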

Single-threaded by design -- all I/O happens synchronously inside next() / request() / subscribe() / unsubscribe(). No background threads, locks, or callbacks. Matches the existing I/O model in eth.zig and avoids the Zig 0.16 mutex-copy pitfalls flagged in CLAUDE.md.
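
Because there are no background threads, the keepalive decision has to be made synchronously from timestamps each time the client is about to wait for a frame. The sketch below shows one way to express that decision as a pure function. `KeepaliveState`, `KeepaliveAction`, and `keepaliveAction` are hypothetical names; only ping_interval_ms and pong_timeout_ms come from this PR. It assumes the caller clears `ping_sent_ms` whenever any frame arrives.

```zig
const std = @import("std");

pub const KeepaliveAction = enum { wait, send_ping, reconnect };

/// Hypothetical bookkeeping; the real WsClient fields may differ.
pub const KeepaliveState = struct {
    /// Time the last frame (data or control) was received.
    last_frame_ms: u64,
    /// Time of the outstanding ping, or null if none is in flight.
    ping_sent_ms: ?u64 = null,
};

pub fn keepaliveAction(
    state: KeepaliveState,
    now_ms: u64,
    ping_interval_ms: u64,
    pong_timeout_ms: u64,
) KeepaliveAction {
    if (state.ping_sent_ms) |sent| {
        // A ping is outstanding: give up once the pong deadline has passed.
        if (now_ms -| sent >= pong_timeout_ms) return .reconnect;
        return .wait;
    }
    // No ping in flight: probe once the connection has been idle long enough.
    if (now_ms -| state.last_frame_ms >= ping_interval_ms) return .send_ping;
    return .wait;
}

test "idle connection is pinged, silent connection is dropped" {
    const idle = KeepaliveState{ .last_frame_ms = 1000 };
    try std.testing.expectEqual(KeepaliveAction.wait, keepaliveAction(idle, 1500, 1000, 1000));
    try std.testing.expectEqual(KeepaliveAction.send_ping, keepaliveAction(idle, 2000, 1000, 1000));

    const pinged = KeepaliveState{ .last_frame_ms = 1000, .ping_sent_ms = 2000 };
    try std.testing.expectEqual(KeepaliveAction.wait, keepaliveAction(pinged, 2500, 1000, 1000));
    try std.testing.expectEqual(KeepaliveAction.reconnect, keepaliveAction(pinged, 3000, 1000, 1000));
}
```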

The lower-level WsTransport and connectWithReconnect APIs stay unchanged.

Notable changes

  • NEW src/ws_client.zig -- WsClient, handle-style Subscription, Opts/Event/Error, reconnect+resubscribe state machine, pending-notification queue.
  • src/ws_transport.zig -- readFrameDeadline + pollReadable (poll-based deadline reads), sendPing, and a frames_received counter so WsClient can detect liveness via control frames.
  • src/subscription.zig -- promoted extractResultString, isSubscriptionNotification, getNotificationResult to pub. Lifted nextBlock/nextLog/nextTxHash bodies into free parseBlockFromNotification / parseLogFromNotification / parseTxHashFromNotification so WsClient reuses them without needing a Subscription instance. Added getSubscriptionId and extractResponseId helpers.
  • README.md -- new "Resilient WebSocket subscriptions" Quick Start example.
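
To illustrate how a handle can stay valid while its server-side sub-id changes, here is a rough sketch of a registry that maps server ids to stable client-side handles. `Registry`, `Slot`, the `u32` handle, and the fixed capacity of 8 are assumptions for the example; the real WsClient uses *Subscription pointers and its own storage.

```zig
const std = @import("std");

pub const Slot = struct {
    /// Stable id the application holds on to (stand-in for *Subscription).
    handle: u32,
    /// Id assigned by the server; replaced after every resubscribe.
    server_id: []const u8,
};

pub const Registry = struct {
    slots: [8]Slot = undefined,
    len: usize = 0,

    pub fn add(self: *Registry, handle: u32, server_id: []const u8) !void {
        if (self.len == self.slots.len) return error.RegistryFull;
        self.slots[self.len] = .{ .handle = handle, .server_id = server_id };
        self.len += 1;
    }

    /// Called after a reconnect once eth_subscribe returns a fresh id.
    pub fn remap(self: *Registry, handle: u32, new_server_id: []const u8) bool {
        for (self.slots[0..self.len]) |*slot| {
            if (slot.handle == handle) {
                slot.server_id = new_server_id;
                return true;
            }
        }
        return false;
    }

    /// Route an incoming notification to the handle that owns it.
    pub fn dispatch(self: *const Registry, server_id: []const u8) ?u32 {
        for (self.slots[0..self.len]) |slot| {
            if (std.mem.eql(u8, slot.server_id, server_id)) return slot.handle;
        }
        return null;
    }
};

test "handle survives a server-id swap" {
    var reg = Registry{};
    try reg.add(1, "0xaaa");
    try reg.add(2, "0xbbb");
    try std.testing.expectEqual(@as(?u32, 2), reg.dispatch("0xbbb"));

    try std.testing.expect(reg.remap(1, "0xccc"));
    try std.testing.expectEqual(@as(?u32, null), reg.dispatch("0xaaa"));
    try std.testing.expectEqual(@as(?u32, 1), reg.dispatch("0xccc"));
}
```

Notifications for an id that is no longer registered return null here; what the real client does with them is not specified in this PR.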

Drive-by

tests/integration_tests.zig had 3 pre-existing compile errors on origin/main (callers of parseEther -- which returns ?u256 -- forgot to unwrap). The file was uncompilable. Fixed inline so the new integration tests can run.

Test plan

  • make ci passes (build + zig fmt --check src/ tests/ + 1300+ unit tests)
  • zig build integration-test against a local Anvil: 18/18 tests pass, including:
    • WsClient subscribe newHeads receives a fresh block
    • WsClient multiplexes two subscriptions on one connection
    • WsClient unsubscribe frees handle and removes from registry
  • coderabbit review --prompt-only --type committed --base origin/main: no findings
  • No std.debug.print in library code, no emojis

Out of scope

Closes #35.

Summary by CodeRabbit

  • New Features

    • Resilient WebSocket client with automatic reconnect, backoff + jitter, resubscribe on reconnect, multiplexed subscriptions, queued notifications, ping/pong keepalive, and public client APIs (connect/subscribe/unsubscribe/next/request).
    • New public helpers for parsing and handling subscription notifications.
  • Documentation

    • README updated with examples and guidance for the new WebSocket client and transport.
  • Tests

    • Added WebSocket integration tests and unit tests covering helpers, backoff/jitter, notification dispatch, and defaults.

WsClient wraps WsTransport with three features production bots need:
transparent reconnect with exponential backoff and jitter, multiplexed
subscriptions on a single socket, and application-layer ping keepalive.
Subscription handles stay valid across reconnects -- the underlying
server-side sub-id is swapped automatically on each resubscribe.

- src/ws_client.zig: new WsClient + Subscription (handle-style),
  Opts/Event/Error, single-threaded reconnect+resubscribe state machine,
  pending-notification queue for request/notify multiplexing.
- src/ws_transport.zig: add readFrameDeadline + pollReadable for
  deadline-aware reads, plus sendPing and a frames_received counter so
  WsClient can detect liveness via control frames.
- src/subscription.zig: promote extractResultString,
  isSubscriptionNotification, getNotificationResult to pub; lift
  nextBlock/nextLog/nextTxHash bodies into free
  parseBlockFromNotification / parseLogFromNotification /
  parseTxHashFromNotification so WsClient (and others) can reuse them
  without holding a Subscription. Add getSubscriptionId and
  extractResponseId helpers.
- tests: 8 new unit tests in ws_client.zig (backoff, jitter, params
  clone, dispatch, queue ordering, server_id remap), 5 new tests in
  subscription.zig for the new helpers, 3 new integration tests
  against Anvil (newHeads, multiplexed subs, unsubscribe).
- Drive-by: fix three pre-existing parseEther usages in
  tests/integration_tests.zig that no longer compiled (parseEther
  returns ?u256, callers had unwrapped it inconsistently). The
  integration test file was uncompilable on origin/main.

Closes #35.

vercel Bot commented May 1, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| eth-zig | Ready | Preview, Comment | May 1, 2026 7:15pm |


coderabbitai Bot commented May 1, 2026

Walkthrough

A new resilient WebSocket client (ws_client.zig) with automatic exponential-backoff reconnect, subscription multiplexing and remapping, event queuing, and keepalive (ping/pong) is added. Supporting changes include deadline-aware reads and ping support in ws_transport.zig, subscription-notification parsing helpers in subscription.zig, README updates, root export, and integration tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| WebSocket Client Core (src/ws_client.zig) | New production-ready WsClient implementing connect/deinit, subscribe/unsubscribe, request, next (event queue), automatic reconnect with exponential backoff + jitter, on_reconnect callback, ping/pong keepalive, resubscribe/remap server IDs, and exported helpers computeBackoffMs/applyJitter. |
| WebSocket Transport Enhancements (src/ws_transport.zig) | Adds frames_received, deadline-aware reads (readMessageDeadline, readFrameDeadline, fillReadBufDeadline), pollReadable, sendPing, and a Timeout variant; refactors read helpers to use deadline-aware paths. |
| Subscription Parsing Utilities (src/subscription.zig) | Refactors per-subscription parsing to exported helpers (parseBlockFromNotification, parseLogFromNotification, parseTxHashFromNotification), exposes JSON helpers (getNotificationResult, extractResultString, isSubscriptionNotification), and adds getSubscriptionId, extractResponseId; adds tests for whitespace/string-scanner edge cases. |
| API Export & Docs (README.md, src/root.zig) | Documents ws_client.WsClient with a Zig example and mentions lower-level ws_transport primitives; updates Modules/Features tables and publicly re-exports ws_client from src/root.zig and adds it to test compilation. |
| Integration Tests (tests/integration_tests.zig) | Updates tests to treat parseEther as fallible and adds WebSocket integration tests (connect to ws://127.0.0.1:8545, subscribe to new_heads, verify routing, multiplexing, and unsubscribe cleanup) plus an evm_mine helper. |

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Client as WsClient
    participant Transport as WsTransport
    participant Server as WebSocket Server

    Note over App,Server: Connect & Subscribe
    App->>Client: connect(url, opts)
    Client->>Transport: create/open connection
    Transport->>Server: TCP + WS upgrade
    Server-->>Transport: upgrade response

    App->>Client: subscribe(params)
    Client->>Transport: sendFrame(eth_subscribe)
    Transport->>Server: send JSON-RPC
    Server-->>Transport: response (subscription_id)
    Client->>Client: store handle & server_id

    Note over App,Server: Event Dispatch
    Server-->>Transport: subscription notification
    Transport->>Client: deliver frame
    Client->>Client: map server_id -> Subscription
    Client-->>App: Event{sub, payload} via next()

    Note over Client,Transport: Keepalive / Reconnect
    Client->>Transport: sendPing() if idle
    Transport->>Server: ping
    alt Pong received
        Server-->>Transport: pong
        Client->>Client: reset timers
    else Pong timeout
        Client->>Client: computeBackoffMs + applyJitter
        Client->>Client: deinit & reconnect loop
        Client->>Transport: new connection
        loop for each active subscription
            Client->>Transport: sendFrame(eth_subscribe original params)
            Server-->>Transport: new subscription_id
            Client->>Client: update server_id mapping
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add WsClient: resilient WebSocket subscriptions with auto-reconnect' directly and clearly describes the main change: a new WsClient module providing resilient WebSocket functionality with auto-reconnect. |
| Linked Issues check | ✅ Passed | The pull request comprehensively implements all required behaviors from issue #35: auto-reconnect with exponential backoff and configurable delays, re-subscription on reconnect with server-id remapping, connection health monitoring via ping/pong, and execution on the caller's thread. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #35 requirements. ws_transport.zig additions (deadline-aware reads, sendPing, frames_received) are foundational for WsClient. subscription.zig refactoring extracts reusable parsing helpers. README and integration tests document and validate the new functionality. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/subscription.zig (1)

344-405: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the fast-path JSON scanners whitespace-tolerant.

extractResultString(), getSubscriptionId(), and extractResponseId() only recognize minified payloads like "result":"...", "subscription":"...", and "id":42. Valid JSON-RPC responses may include spaces around the colon, and then subscribe(), resubscribeAll(), request matching, and notification dispatch all start failing even though the payload is valid. This is a protocol-compatibility break at the root helper layer.
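
A rough sketch of what a whitespace-tolerant fast-path scanner could look like. `valueStart` and `stringValue` are hypothetical names, not the helpers in src/subscription.zig. The sketch skips optional whitespace on both sides of the colon and keeps searching when the key text is not followed by a colon. It does not check whether a match sits inside a string value.

```zig
const std = @import("std");

fn isWs(c: u8) bool {
    return c == ' ' or c == '\t' or c == '\n' or c == '\r';
}

/// Index of the first value byte after `quoted_key`, tolerating whitespace
/// around the colon. `quoted_key` includes its quotes, e.g. "\"result\"".
pub fn valueStart(json: []const u8, quoted_key: []const u8) ?usize {
    if (quoted_key.len == 0) return null;
    var i: usize = 0;
    while (i + quoted_key.len <= json.len) : (i += 1) {
        if (!std.mem.eql(u8, json[i .. i + quoted_key.len], quoted_key)) continue;
        var j = i + quoted_key.len;
        while (j < json.len and isWs(json[j])) : (j += 1) {}
        // Key text not followed by ':' is a false match; keep searching.
        if (j >= json.len or json[j] != ':') continue;
        j += 1;
        while (j < json.len and isWs(json[j])) : (j += 1) {}
        if (j < json.len) return j;
    }
    return null;
}

/// Contents of the string value for `quoted_key`, without the quotes.
pub fn stringValue(json: []const u8, quoted_key: []const u8) ?[]const u8 {
    const start = valueStart(json, quoted_key) orelse return null;
    if (json[start] != '"') return null;
    var end = start + 1;
    while (end < json.len) : (end += 1) {
        if (json[end] == '\\') {
            end += 1; // skip the escaped character
            continue;
        }
        if (json[end] == '"') return json[start + 1 .. end];
    }
    return null;
}

test "minified and spaced payloads both parse" {
    const tight = "{\"id\":1,\"result\":\"0xabc\"}";
    const spaced = "{\"id\": 1, \"result\" : \"0xabc\"}";
    try std.testing.expectEqualStrings("0xabc", stringValue(tight, "\"result\"").?);
    try std.testing.expectEqualStrings("0xabc", stringValue(spaced, "\"result\"").?);
}
```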

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ws_client.zig`:
- Around line 256-281: The request() function currently resends the exact same
JSON-RPC payload after beginReconnect(), which can double-execute non-idempotent
methods; modify WsClient.request to avoid transparent replay by default: add a
per-request option (e.g., allow_replay: bool) or check the method name against a
whitelist of safe/idempotent methods before resending, and if replay is not
permitted return a new error (e.g., error.RetryNotAllowed) after
beginReconnect() instead of calling sendOrReconnect(req); update call sites to
opt-in when safe (or pass allow_replay=true for eth_subscribe/read-only RPCs)
and reference WsClient.request, sendOrReconnect, beginReconnect,
readFrameWithKeepalive, and next_id when making these changes.
- Around line 199-224: The unsubscribe function currently frees a Subscription
without removing queued Event entries that contain the raw *Subscription
pointer, causing dangling pointers; update pub fn unsubscribe(self: *WsClient,
sub: *Subscription) to scan and purge any queued events in self.pending (or the
pending queue structure) whose Event.sub == sub before calling
self.freeSubscription(sub), ensuring you remove or drop those Event values
safely (free any owned memory) and only then proceed with orderedRemove on
self.subs and the eth_unsubscribe request so next() cannot return a dangling sub
pointer.

---

Outside diff comments:
In `@src/subscription.zig`:
- Around line 344-405: The three fast-path scanners (extractResultString,
getSubscriptionId, extractResponseId) fail on JSON with spaces around the colon;
update each to locate the key (e.g., "\"result\"", "\"subscription\"",
"\"id\""), advance past the key, then skip optional whitespace, require and skip
a ':' character, skip optional whitespace again, then proceed to parse the value
(for extractResultString and getSubscriptionId expect a starting '"' and find
the matching closing '"' while handling bounds, and for extractResponseId parse
consecutive digits starting at the first digit); keep existing bounds checks and
error/null returns, and reference the same function names (extractResultString,
getSubscriptionId, extractResponseId) when making the changes.

📥 Commits

Reviewing files that changed from the base of the PR and between d52c9f5 and 71a7137.

📒 Files selected for processing (6)
  • README.md
  • src/root.zig
  • src/subscription.zig
  • src/ws_client.zig
  • src/ws_transport.zig
  • tests/integration_tests.zig

koko1123 added 2 commits May 1, 2026 15:05
Three independent fixes:

1. Whitespace-tolerant JSON scanners. extractResultString,
   getSubscriptionId, and extractResponseId previously matched only
   minified payloads (e.g. "result":"..."). JSON-RPC permits whitespace
   around the colon, and some proxies and test harnesses produce it.
   Factor a shared findKeyValueStart helper that walks past optional
   ws + ':' + optional ws after locating the key, then resumes value
   parsing. Also covers the case where the key text appears inside a
   value -- the helper now keeps searching past false matches.

2. Stop auto-replaying WsClient.request on disconnect. The previous
   code re-sent the same JSON-RPC payload after a mid-flight reconnect,
   which could double-execute non-idempotent methods (e.g.
   eth_sendRawTransaction). Public request() now returns
   error.RequestInterrupted on post-send disconnect; an internal
   requestReplay() retains the old replay behavior and is used only by
   subscribe()/unsubscribe(), both idempotent at the protocol level.
   Pre-send retries (the request never reached the wire) remain
   transparent in both paths.

3. Purge pending queue on unsubscribe. Previously, unsubscribe() freed
   the Subscription while leaving Events in the pending FIFO that
   referenced it -- a subsequent next() would return a dangling
   pointer. Add dropPending(*Subscription) to remove and free those
   queued events before tearing down the handle.
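
The purge in item 3 can be sketched as an in-place, order-preserving filter over the pending queue. This is an illustration under assumptions: `Event` here carries a `u32` stand-in for the *Subscription pointer, the queue is a plain slice, and the function signature is hypothetical. The real dropPending also frees memory owned by the dropped events.

```zig
const std = @import("std");

pub const Event = struct {
    /// Stand-in for the *Subscription pointer stored in the real Event.
    sub: u32,
    payload: []const u8,
};

/// Remove every queued event that belongs to `sub`, keeping the remaining
/// events in FIFO order. Returns the new queue length.
pub fn dropPending(queue: []Event, len: usize, sub: u32) usize {
    var kept: usize = 0;
    var i: usize = 0;
    while (i < len) : (i += 1) {
        const ev = queue[i];
        // The real code would free any memory owned by the event here.
        if (ev.sub == sub) continue;
        queue[kept] = ev;
        kept += 1;
    }
    return kept;
}

test "events for the unsubscribed handle are purged" {
    var q = [_]Event{
        .{ .sub = 1, .payload = "a" },
        .{ .sub = 2, .payload = "b" },
        .{ .sub = 1, .payload = "c" },
        .{ .sub = 3, .payload = "d" },
    };
    const n = dropPending(&q, q.len, 1);
    try std.testing.expectEqual(@as(usize, 2), n);
    try std.testing.expectEqualStrings("b", q[0].payload);
    try std.testing.expectEqualStrings("d", q[1].payload);
}
```

Running the purge before the handle is freed is what keeps a later next() from returning an event that points at released memory.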

Tests:
- 4 new tests in subscription.zig (whitespace cases for all three
  scanners + a key-found-inside-a-value case).
- 1 new test in ws_client.zig for dropPending.
- All 1300+ unit tests pass.
- 18/18 integration tests against Anvil pass.

CodeRabbit's second-pass review flagged that findKeyValueStart could be
fooled by an error message whose value literally contains text like
\"result\":\"fake\". The previous fix already handled the case where
the scanner found a misplaced key followed by something other than a
colon, but it could still be fooled by a fake key-value pair embedded
inside an open string.

Add isInsideString(json, idx): walks json char-by-char, tracking string
state with escape handling, and returns true if idx is inside an open
string. findKeyValueStart now consults it on each candidate match and
skips any candidate whose opening quote is inside a string.

Two new tests:
- extractResultString skips a fake \"result\":\"fake\" embedded in an
  error message and finds the real result key after it.
- isInsideString basic correctness with plain strings, escaped quotes,
  and stray backslashes.

The walk is O(idx) per candidate match, which is fine for our typical
< 4KB JSON-RPC payloads -- and only runs when there is an actual false
match earlier in the document.
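
A minimal sketch of the string-state walk described above. It is written from this commit message rather than copied from src/subscription.zig, so the body is an assumption about how such a check can work: toggle on unescaped quotes and skip the character after a backslash while inside a string.

```zig
const std = @import("std");

/// True if byte offset `idx` lies inside an open JSON string, judged by
/// walking the document from the start. Cost is O(idx).
pub fn isInsideString(json: []const u8, idx: usize) bool {
    var in_string = false;
    var i: usize = 0;
    while (i < idx and i < json.len) : (i += 1) {
        const c = json[i];
        if (in_string and c == '\\') {
            i += 1; // the next byte is escaped and cannot close the string
            continue;
        }
        if (c == '"') in_string = !in_string;
    }
    return in_string;
}

test "plain strings and escaped quotes" {
    // Bytes: "ab" cd
    const plain = "\"ab\" cd";
    try std.testing.expect(isInsideString(plain, 2));
    try std.testing.expect(!isInsideString(plain, 5));

    // Bytes: "a\"b" c  (the escaped quote does not end the string)
    const escaped = "\"a\\\"b\" c";
    try std.testing.expect(isInsideString(escaped, 4));
    try std.testing.expect(!isInsideString(escaped, 7));
}
```

A key scanner would call this with the index of each candidate's opening quote and skip the candidate when it returns true.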

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
src/subscription.zig (1)

182-199: 💤 Low value

Double-parsing overhead in parseBlockFromNotification is acceptable but worth noting.

The current flow (parse JSON → stringify result → wrap in {"result":...} → parse again in parseBlockHeader) allocates and parses the result twice. This reuses existing infrastructure cleanly, so it's a reasonable tradeoff for now. If block notification throughput becomes a bottleneck, consider extending parseBlockHeader to accept a std.json.Value directly.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/subscription.zig`:
- Around line 182-199: parseBlockFromNotification currently parses JSON to
std.json.Value then stringifies/wraps and reparses via
provider_mod.parseBlockHeader, causing double parsing and allocations; to fix,
add an overload or new function in provider_mod named parseBlockHeaderFromValue
(or modify parseBlockHeader to accept a std.json.Value) that takes allocator and
the parsed std.json.Value (the result_val) and parses the block header directly,
then change parseBlockFromNotification to call that new function with result_val
and remove the stringifyAlloc/allocPrint/allocator.free steps and
wrapped/result_json variables; ensure the new provider_mod function uses the
same ownership contract for allocated fields (extra_data) as the existing
parseBlockHeader.


📥 Commits

Reviewing files that changed from the base of the PR and between 71a7137 and 67c03a1.

📒 Files selected for processing (2)
  • src/subscription.zig
  • src/ws_client.zig
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/ws_client.zig

@koko1123 koko1123 merged commit 6cf28b5 into main May 1, 2026
14 checks passed

Development

Successfully merging this pull request may close these issues.

WebSocket auto-reconnect and resilience for production bots
