Skip to content

feat: typed subscriptions (#43), retry middleware (#41), Makefile, security fixes#45

Merged
koko1123 merged 3 commits intomainfrom
koko/medium-fixes
Mar 12, 2026
Merged

feat: typed subscriptions (#43), retry middleware (#41), Makefile, security fixes#45
koko1123 merged 3 commits intomainfrom
koko/medium-fixes

Conversation

@koko1123
Copy link
Contributor

@koko1123 koko1123 commented Mar 12, 2026

Summary

  • Typed subscription event parsing (nextBlock, nextLog) #43 — Typed subscription helpers: nextBlock(), nextLog(), nextTxHash() on Subscription eliminate raw JSON parsing boilerplate from bot code. Each method reads the next eth_subscription notification and returns a fully typed BlockHeader, Log, or [32]u8 hash.
  • Provider middleware: retry with exponential backoff #41RetryingProvider middleware: wraps any Provider with configurable exponential backoff, jitter, max attempts, and retryable error policy (connection errors vs. all RPC errors). Forwards all 14 Provider methods. Exported from root as eth.RetryingProvider / eth.RetryOpts.
  • Makefile: make test, make fmt, make lint, make ci, make integration-test, make bench — mirrors CI jobs so contributors can verify locally before pushing.
  • Security — multicall.zig: replace plain addition with std.math.add checked arithmetic in all 7 overflow-prone boundary checks in decodeAggregate3Results. Prevents wraparound bypass on malicious input in ReleaseFast builds.
  • Security — abi_encode.zig: replace std.debug.assert with error.TooManyValues propagation through encodeValuesIntoNoAllocencodeDynamicValueInto recursive chain. Nested structures > 32 elements now return a proper error in all build modes.
  • Bug — subscription.zig: fix allocator mismatch in nextBlock and nextLograw was freed with the caller's allocator instead of self.allocator (UB if they differ).
  • .gitignore: exclude docs/plans/ (local planning files).

Test plan

  • make ci passes (build + fmt + unit tests)
  • make integration-test passes against a local Anvil node
  • nextBlock / nextLog / nextTxHash covered by unit tests in src/subscription.zig
  • RetryingProvider covered by unit tests in src/retry_provider.zig (retry exhaustion, non-retryable fast-fail, exponential backoff verification)
  • multicall.zig overflow paths return error.InvalidAbiData rather than wrapping
  • abi_encode.zig nested tuple > 32 elements returns error.TooManyValues

Summary by CodeRabbit

  • New Features

    • Automatic retry capability with configurable exponential backoff and jitter for provider operations
    • Subscription helpers to receive next block, next log, and next pending-transaction hash
    • Richer public parsing APIs for logs and block headers
  • Bug Fixes

    • Improved error propagation to avoid panics during encoding/decoding
    • Strengthened overflow-safe boundary checks for decoding
  • Chores

    • Added build orchestration and updated ignore rules
  • Tests

    • Added unit tests and a small test assertion to ensure retry middleware is loaded

@vercel
Copy link

vercel bot commented Mar 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
eth-zig Ready Ready Preview, Comment Mar 12, 2026 0:26am

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 12, 2026

📝 Walkthrough

Walkthrough

Adds a RetryingProvider middleware with configurable retries/backoff, exposes provider parsing helpers, adds typed Subscription helpers, tightens ABI encoding error propagation, hardens multicall bounds checks, adds a Makefile, and ignores docs/plans/ in .gitignore.

Changes

Cohort / File(s) Summary
Build & Ignore
\.gitignore, Makefile
Adds docs/plans/ to .gitignore and a new Makefile with build/test/fmt/lint/ci/bench/clean targets and a configurable ZIG variable.
Retrying Provider
src/retry_provider.zig, src/root.zig
Adds RetryingProvider, RetryOpts, RetryableErrors, RetryState, isRetryable, and per-call wrappers that implement exponential backoff + jitter; re-exports types in src/root.zig; includes unit tests.
Subscription API
src/subscription.zig
Adds nextBlock, nextLog, nextTxHash helpers, new SubscriptionError variants (InvalidNotification, NullResult), getNotificationResult, and imports required parsing modules.
Provider parsing
src/provider.zig
Promotes parseSingleLog and parseBlockHeader to pub fn and expands parseSingleLog to populate full receipt_mod.Log with improved memory/error handling.
ABI encoding
src/abi_encode.zig
Changes dynamic encoding functions to return EncodeError!void, replaces hard asserts with runtime TooManyValues checks, and propagates errors via try through dynamic/write paths.
Multicall decoding
src/multicall.zig
Replaces naive bounds checks with overflow-protected std.math.add end-pointer calculations and explicit bounds validations, returning InvalidAbiData on violations.
Tests
tests/unit_tests.zig
Adds assertion to ensure eth.retry_provider middleware is loaded in unit tests.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant RetryingProvider
    participant RetryState
    participant InnerProvider
    participant Network

    Client->>RetryingProvider: getBalance(address)
    activate RetryingProvider

    RetryingProvider->>InnerProvider: getBalance(address)
    activate InnerProvider
    InnerProvider->>Network: RPC request
    Network-->>InnerProvider: network error
    deactivate InnerProvider

    RetryingProvider->>RetryState: init(self)
    activate RetryState
    RetryState->>RetryState: compute backoff & jitter
    RetryState->>RetryState: sleep
    RetryState-->>RetryingProvider: allow retry?
    deactivate RetryState

    RetryingProvider->>InnerProvider: getBalance(address) (retry)
    activate InnerProvider
    InnerProvider->>Network: RPC request
    Network-->>InnerProvider: success
    InnerProvider-->>RetryingProvider: result
    deactivate InnerProvider

    RetryingProvider-->>Client: result
    deactivate RetryingProvider
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hop through retries with jittered cheer,
Backoff ticks down, then hope draws near,
Logs and blocks I fetch with nimble paws,
Bounds are checked, and encoders pause,
Build runs, tests sing — a rabbit's small applause!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title comprehensively summarizes the four major changes: typed subscriptions, retry middleware, Makefile, and security fixes—matching all significant components in the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch koko/medium-fixes

Comment @coderabbitai help to get the list of available commands and usage tips.

@koko1123
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

… issues

- subscription.zig: add nextBlock(), nextLog(), nextTxHash() typed helpers
  that parse eth_subscription notifications directly into BlockHeader/Log/[32]u8,
  removing boilerplate from bot code (closes #43)
- retry_provider.zig: new RetryingProvider wrapper with exponential backoff,
  configurable max attempts, jitter, and retryable error policy (closes #41)
- provider.zig: expose parseBlockHeader and parseSingleLog as pub for reuse
  by subscription typed helpers
- root.zig: export RetryingProvider and RetryOpts at top level
- Makefile: add build/test/fmt/lint/ci/integration-test/bench targets so
  contributors can run the full CI check locally before pushing
- multicall.zig: replace plain overflow-prone additions with std.math.add
  checked arithmetic in decodeAggregate3Results boundary checks
- abi_encode.zig: replace std.debug.assert with error.TooManyValues propagation
  through encodeValuesIntoNoAlloc and encodeDynamicValueInto recursive chain
- .gitignore: exclude docs/plans/ (local planning files)
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/retry_provider.zig (1)

141-146: Consider documenting idempotency concerns for sendRawTransaction.

Retrying sendRawTransaction is generally safe because Ethereum transactions are idempotent—the same signed transaction can only be included once due to nonce constraints. However, if the first attempt actually succeeded but the response was lost due to a network error, the retry will fail with a "nonce already used" error (which is not in the retryable list, so it will propagate correctly).

This is the expected behavior, but it may be worth documenting in the API or RetryOpts that callers should be prepared to handle this case by checking the transaction status separately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/retry_provider.zig` around lines 141 - 146, Document the idempotency and
retry semantics for sendRawTransaction on RetryingProvider: explain that
sendRawTransaction retries are usually safe because signed Ethereum transactions
are nonce-protected, but a successful first submission with a lost response can
cause a subsequent retry to fail with a non-retryable "nonce already used"
error; add this note to the sendRawTransaction API docs and to the RetryOpts
documentation (or README) so callers know to verify transaction status
separately when they see nonce-related errors; reference sendRawTransaction,
RetryingProvider, RetryState and RetryOpts in the doc text so readers can find
the relevant code paths.
src/abi_encode.zig (1)

305-306: Remaining std.debug.assert in writeValuesDirect.

The assertion at line 305 is retained in writeValuesDirect. This is safe because this function is only called from encodeValues and encodeFunctionCall, which already validate values.len <= max_tuple_values before calling calcEncodedSize and writeValuesDirect. The assertion here serves as an internal invariant check rather than input validation.

If you prefer full consistency with the error-returning pattern for defense-in-depth, this could be converted too—but it would require changing writeValuesDirect to return an error type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/abi_encode.zig` around lines 305 - 306, The remaining std.debug.assert in
writeValuesDirect is an internal invariant check but looks inconsistent with the
rest of the API which returns errors; to fix, either remove/replace it with an
explicit error-return path (update writeValuesDirect's signature to return an
error and propagate that from calcEncodedSize/encodeValues/encodeFunctionCall)
or, simpler, keep the assertion but add a clear comment above writeValuesDirect
stating that values.len is validated by callers (encodeValues and
encodeFunctionCall) and that the assert is only an internal invariant; choose
the approach that matches the project's error-handling style and update caller
signatures if you pick the error-return option.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/abi_encode.zig`:
- Around line 305-306: The remaining std.debug.assert in writeValuesDirect is an
internal invariant check but looks inconsistent with the rest of the API which
returns errors; to fix, either remove/replace it with an explicit error-return
path (update writeValuesDirect's signature to return an error and propagate that
from calcEncodedSize/encodeValues/encodeFunctionCall) or, simpler, keep the
assertion but add a clear comment above writeValuesDirect stating that
values.len is validated by callers (encodeValues and encodeFunctionCall) and
that the assert is only an internal invariant; choose the approach that matches
the project's error-handling style and update caller signatures if you pick the
error-return option.

In `@src/retry_provider.zig`:
- Around line 141-146: Document the idempotency and retry semantics for
sendRawTransaction on RetryingProvider: explain that sendRawTransaction retries
are usually safe because signed Ethereum transactions are nonce-protected, but a
successful first submission with a lost response can cause a subsequent retry to
fail with a non-retryable "nonce already used" error; add this note to the
sendRawTransaction API docs and to the RetryOpts documentation (or README) so
callers know to verify transaction status separately when they see nonce-related
errors; reference sendRawTransaction, RetryingProvider, RetryState and RetryOpts
in the doc text so readers can find the relevant code paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f0344600-dd90-4597-9334-82638fe094b6

📥 Commits

Reviewing files that changed from the base of the PR and between c8818cc and a6a643b.

📒 Files selected for processing (8)
  • .gitignore
  • Makefile
  • src/abi_encode.zig
  • src/multicall.zig
  • src/provider.zig
  • src/retry_provider.zig
  • src/root.zig
  • src/subscription.zig

- abi_encode.zig: add pre-condition doc to writeValuesDirect clarifying
  that all callers (including recursive paths through writeDynamicValueDirect)
  must satisfy values.len <= max_tuple_values; the assert is an internal
  invariant, not a validation gap
- retry_provider.zig: document sendRawTransaction retry semantics explaining
  nonce-protection safety, the lost-response edge case, and the recommended
  fallback of checking receipt status on nonce errors
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/abi_encode.zig (1)

212-249: ⚠️ Potential issue | 🔴 Critical

Propagate TooManyValues through the direct writer as well.

This new EncodeError path never reaches the public encoders: Line 69 and Line 81 still call writeValuesDirect, and nested .array / .fixed_array / .tuple values recurse back into the assert at Line 312. A payload like .{ .array = &items_33 } still aborts in safe builds and can walk past the fixed offsets buffer in optimized builds, so the public API still does not honor error.TooManyValues for deeply nested values.

Suggested direction
 pub fn encodeValues(allocator: std.mem.Allocator, values: []const AbiValue) EncodeError![]u8 {
     if (values.len > max_tuple_values) return error.TooManyValues;
     const total = calcEncodedSize(values);
     const buf = try allocator.alloc(u8, total);
     errdefer allocator.free(buf);
-    writeValuesDirect(buf, values);
+    try writeValuesDirect(buf, values);
     return buf;
 }
 
-fn writeValuesDirect(buf: []u8, values: []const AbiValue) void {
+fn writeValuesDirect(buf: []u8, values: []const AbiValue) EncodeError!void {
+    if (values.len > max_tuple_values) return error.TooManyValues;
     ...
-            pos += writeDynamicValueDirect(buf[pos..], val);
+            pos += try writeDynamicValueDirect(buf[pos..], val);
         }
     }
 }
 
-fn writeDynamicValueDirect(buf: []u8, val: AbiValue) usize {
+fn writeDynamicValueDirect(buf: []u8, val: AbiValue) EncodeError!usize {
     ...
-            writeValuesDirect(buf[32..], items);
+            try writeValuesDirect(buf[32..], items);
             return 32 + calcEncodedSize(items);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/abi_encode.zig` around lines 212 - 249, The public direct-writer path
isn't propagating error.TooManyValues: change writeValuesDirect to return
EncodeError!void (instead of void), have it perform the same max_tuple_values
check and return error.TooManyValues when exceeded, and update its callers (the
public encoders that call writeValuesDirect and any other call sites) to
propagate the error (use try or return) so deeply nested
.array/.fixed_array/.tuple cases can surface TooManyValues instead of aborting;
ensure encodeDynamicValueInto and encodeValuesIntoNoAlloc continue to propagate
errors up the chain.
🧹 Nitpick comments (1)
src/retry_provider.zig (1)

214-228: Jitter adds delay but never subtracts — document or consider full jitter.

The current jitter implementation (lines 219-221) adds 0 to jitter * backoff_ms to the delay, meaning retries are always at least the base backoff. Some implementations use "full jitter" where the delay can be reduced (e.g., base ± jitter%), which provides better spread for thundering-herd mitigation.

The current approach is valid and conservative, but consider either:

  • Documenting that jitter is additive-only, or
  • Switching to full jitter: backoff_ms * (1.0 - jitter + random * 2 * jitter)

This is a minor design choice, not a bug.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/retry_provider.zig` around lines 214 - 228, The jitter currently only
adds a positive extra to backoff in RetryState.next (see jitter_ms, extra,
provider.opts.jitter and provider.prng.random().int(u64)); update the
implementation to use full jitter instead of additive-only by computing a random
multiplier in the range (1.0 - jitter) .. (1.0 + jitter) and applying it to
backoff_ms when sleeping, or alternatively add a clarifying comment/doc
explaining that jitter is intentionally additive-only; if switching to full
jitter, replace the jitter_ms/extra computation with a random float-based
multiplier derived from provider.prng.random() and apply it to self.backoff_ms
before sleeping, then keep the rest of next_backoff/backoff cap logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/retry_provider.zig`:
- Around line 39-53: Replace uses of std.rand.DefaultPrng with
std.Random.DefaultPrng throughout this file (e.g., the prng field on
RetryingProvider and the constructor RetryingProvider.init) to match Zig 0.15.2
API; ensure DefaultPrng.init(seed) calls use std.Random. Also add a short doc
comment on RetryingProvider (near the struct and the prng field) stating that
the PRNG state is mutated on each retry and therefore RetryingProvider instances
are not thread-safe / should not be shared concurrently.

---

Outside diff comments:
In `@src/abi_encode.zig`:
- Around line 212-249: The public direct-writer path isn't propagating
error.TooManyValues: change writeValuesDirect to return EncodeError!void
(instead of void), have it perform the same max_tuple_values check and return
error.TooManyValues when exceeded, and update its callers (the public encoders
that call writeValuesDirect and any other call sites) to propagate the error
(use try or return) so deeply nested .array/.fixed_array/.tuple cases can
surface TooManyValues instead of aborting; ensure encodeDynamicValueInto and
encodeValuesIntoNoAlloc continue to propagate errors up the chain.

---

Nitpick comments:
In `@src/retry_provider.zig`:
- Around line 214-228: The jitter currently only adds a positive extra to
backoff in RetryState.next (see jitter_ms, extra, provider.opts.jitter and
provider.prng.random().int(u64)); update the implementation to use full jitter
instead of additive-only by computing a random multiplier in the range (1.0 -
jitter) .. (1.0 + jitter) and applying it to backoff_ms when sleeping, or
alternatively add a clarifying comment/doc explaining that jitter is
intentionally additive-only; if switching to full jitter, replace the
jitter_ms/extra computation with a random float-based multiplier derived from
provider.prng.random() and apply it to self.backoff_ms before sleeping, then
keep the rest of next_backoff/backoff cap logic unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99076e6f-0c9e-49c5-a810-7f07d76ad4b2

📥 Commits

Reviewing files that changed from the base of the PR and between a6a643b and 5ec4141.

📒 Files selected for processing (2)
  • src/abi_encode.zig
  • src/retry_provider.zig

std.rand doesn't exist in Zig 0.15.2; the correct path is
std.Random.DefaultPrng. This was a latent compile error because
retry_provider.zig wasn't included in unit_tests.zig.

Also adds thread-safety doc comment on the prng field and includes
retry_provider in the unit test suite.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/unit_tests.zig (1)

59-60: Also smoke-test the advertised top-level aliases.

This only proves eth.retry_provider is exported. If src/root.zig accidentally drops pub const RetryingProvider or pub const RetryOpts, this test still passes. Adding _ = eth.RetryingProvider; and _ = eth.RetryOpts; here would keep that public API under test.

🧪 Minimal addition
     // Middleware
     _ = eth.retry_provider;
+    _ = eth.RetryingProvider;
+    _ = eth.RetryOpts;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit_tests.zig` around lines 59 - 60, Add smoke-tests for the top-level
exported aliases so the public API surface is validated: in
tests/unit_tests.zig, alongside the existing `_ = eth.retry_provider;` add `_ =
eth.RetryingProvider;` and `_ = eth.RetryOpts;` to ensure `src/root.zig` still
`pub const`-exports those symbols (RetryingProvider and RetryOpts) and cause the
test to fail if either alias is removed.
src/retry_provider.zig (1)

316-347: Keep the backoff test off wall-clock time.

RetryState.next() always sleeps backoff_ms; jitter = 0 only removes the random extra delay. This case still waits about 1.5s end-to-end, which is avoidable CI tax. Extract the delay calculation or inject a sleeper so the backoff progression can be asserted without real sleeps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/retry_provider.zig` around lines 316 - 347, The test currently relies on
real sleeps because RetryState.next() calls the sleeper using backoff_ms (even
with jitter=0); refactor to make the sleep injectable or expose the backoff
calculation so tests can assert progression without wall-clock delays: add a
pluggable sleeper callback or sleep interface on RetryingProvider/RetryState
(referencing RetryingProvider, RetryState.init, RetryState.next, RetryOpts, and
the backoff_ms field), update the production construction to use the real
sleeper, and update the test to provide a no-op/mock sleeper (or call the
extracted compute_backoff/backoff calculation function directly) so backoff_ms
can be checked deterministically without sleeping.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/retry_provider.zig`:
- Around line 17-31: RetryOpts values need normalization before use: clamp
max_attempts to at least 1 and ensure initial_backoff_ms does not exceed
max_backoff_ms (also clamp max_backoff_ms and jitter within expected ranges) by
adding normalization in RetryOpts.init (or in RetryState.init) so the public
contract holds at runtime; update RetryOpts.init/RetryState.init to coerce
max_attempts = max(1, max_attempts), initial_backoff_ms =
min(initial_backoff_ms, max_backoff_ms), and enforce sensible bounds for
max_backoff_ms and jitter to prevent first sleep > max and zero-attempt cases.

---

Nitpick comments:
In `@src/retry_provider.zig`:
- Around line 316-347: The test currently relies on real sleeps because
RetryState.next() calls the sleeper using backoff_ms (even with jitter=0);
refactor to make the sleep injectable or expose the backoff calculation so tests
can assert progression without wall-clock delays: add a pluggable sleeper
callback or sleep interface on RetryingProvider/RetryState (referencing
RetryingProvider, RetryState.init, RetryState.next, RetryOpts, and the
backoff_ms field), update the production construction to use the real sleeper,
and update the test to provide a no-op/mock sleeper (or call the extracted
compute_backoff/backoff calculation function directly) so backoff_ms can be
checked deterministically without sleeping.

In `@tests/unit_tests.zig`:
- Around line 59-60: Add smoke-tests for the top-level exported aliases so the
public API surface is validated: in tests/unit_tests.zig, alongside the existing
`_ = eth.retry_provider;` add `_ = eth.RetryingProvider;` and `_ =
eth.RetryOpts;` to ensure `src/root.zig` still `pub const`-exports those symbols
(RetryingProvider and RetryOpts) and cause the test to fail if either alias is
removed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1d66f834-cbca-4492-be8f-7e94396673e9

📥 Commits

Reviewing files that changed from the base of the PR and between 5ec4141 and f619ae7.

📒 Files selected for processing (2)
  • src/retry_provider.zig
  • tests/unit_tests.zig

Comment on lines +17 to +31
/// Configuration for RetryingProvider.
pub const RetryOpts = struct {
/// Maximum number of attempts (1 = no retry). Default: 3.
max_attempts: u32 = 3,
/// Initial backoff delay in milliseconds. Default: 100.
initial_backoff_ms: u64 = 100,
/// Backoff multiplier applied after each failed attempt. Default: 2.0 (exponential).
backoff_multiplier: f64 = 2.0,
/// Maximum backoff delay in milliseconds. Default: 5_000.
max_backoff_ms: u64 = 5_000,
/// Jitter factor 0.0–1.0 added to each backoff to prevent thundering herd. Default: 0.1.
jitter: f64 = 0.1,
/// Which errors trigger a retry. Default: connection errors only.
retryable: RetryableErrors = .connection_errors,
};
Copy link
Contributor

@coderabbitai coderabbitai bot Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Normalize RetryOpts before first use.

initial_backoff_ms is copied through unchanged, so initial_backoff_ms > max_backoff_ms makes the first retry sleep longer than the documented maximum. max_attempts = 0 is also accepted even though the docs define 1 as the no-retry floor. Clamp these invariants in init (or RetryState.init) so the runtime behavior matches the public contract.

🔧 Possible fix
 pub fn init(inner: *Provider, opts: RetryOpts) RetryingProvider {
     const seed = std.crypto.random.int(u64);
+    const normalized_opts = RetryOpts{
+        .max_attempts = if (opts.max_attempts == 0) 1 else opts.max_attempts,
+        .initial_backoff_ms = `@min`(opts.initial_backoff_ms, opts.max_backoff_ms),
+        .backoff_multiplier = opts.backoff_multiplier,
+        .max_backoff_ms = opts.max_backoff_ms,
+        .jitter = opts.jitter,
+        .retryable = opts.retryable,
+    };
     return .{
         .inner = inner,
-        .opts = opts,
+        .opts = normalized_opts,
         .prng = std.Random.DefaultPrng.init(seed),
     };
 }

Also applies to: 47-53, 205-210

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/retry_provider.zig` around lines 17 - 31, RetryOpts values need
normalization before use: clamp max_attempts to at least 1 and ensure
initial_backoff_ms does not exceed max_backoff_ms (also clamp max_backoff_ms and
jitter within expected ranges) by adding normalization in RetryOpts.init (or in
RetryState.init) so the public contract holds at runtime; update
RetryOpts.init/RetryState.init to coerce max_attempts = max(1, max_attempts),
initial_backoff_ms = min(initial_backoff_ms, max_backoff_ms), and enforce
sensible bounds for max_backoff_ms and jitter to prevent first sleep > max and
zero-attempt cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai The edge cases are already handled correctly or are non-issues:

  • max_attempts = 0: attempt starts at 0, increments to 1, 1 >= 0 is true → returns false immediately. The call fails on first error with no retry. That's reasonable
    behavior — the caller asked for zero attempts.
  • initial_backoff_ms > max_backoff_ms: The first sleep uses initial_backoff_ms as-is, but the next backoff is clamped via @min(next_backoff, max_backoff_ms) (line 227).
    The first sleep being slightly above max is a minor inconsistency, not a bug — and adding normalization logic to "fix" a misconfiguration the caller explicitly provided
    is over-engineering.
  • jitter out of range: A jitter > 1.0 just means more jitter. A negative jitter would produce a negative jitter_ms which, cast to u64, would be enormous — but this is a
    f64 * f64 producing a float, and @intFromFloat of a negative value is safety-checked in Debug mode. Still, this is "don't pass garbage" territory, not something to
    silently clamp.

These are all plain config structs with documented defaults. The caller owns their values. Adding silent normalization makes the API less predictable — if someone passes
max_attempts = 0 and the library silently changes it to 1, that's surprising. This is the same approach Go's http.Client, Rust's reqwest, and similar libraries take:
document the contract, trust the caller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@koko1123, that's a well-reasoned explanation — the behavior is consistent and intentional across all three cases. The "trust the caller" contract is a valid design choice, and silently coercing values like max_attempts = 0 → 1 would indeed be surprising. I'll withdraw the suggestion.


✏️ Learnings added
Learnt from: koko1123
Repo: StrobeLabs/eth.zig PR: 45
File: src/retry_provider.zig:17-31
Timestamp: 2026-03-12T12:53:49.429Z
Learning: In `src/retry_provider.zig` (eth.zig project), `RetryOpts` is a plain config struct that intentionally does NOT normalize edge-case inputs. The design philosophy is "trust the caller":
- `max_attempts = 0` is valid: the retry loop immediately returns false (no retry), which is the caller's intent.
- `initial_backoff_ms > max_backoff_ms` is allowed: only the first sleep may exceed max; subsequent backoffs are clamped via `min` in `RetryState.next`. This is a minor inconsistency, not a bug.
- Out-of-range `jitter` is the caller's responsibility; negative jitter is caught by `intFromFloat` safety checks in Debug mode.
Do not suggest silent normalization/clamping in `RetryingProvider.init` or `RetryState.init` — it makes the API less predictable.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@koko1123 koko1123 merged commit 2175e77 into main Mar 12, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant