Expose per-upstream client timeouts and retries in ClientConfig #203
Merged
Conversation
`Client::new` already accepts `request_timeout`, `connection_timeout`, and `retries` arguments, but `from_config` hardcodes all three to `None` because `ClientConfig` only exposes `endpoints` and `shuffle_endpoints`. As a result, the only way to override the 30s per-upstream request timeout (and the 30s connection timeout, and the default retry count) is to construct `Client` directly in Rust, which isn't reachable from the YAML-driven config.

Adds three optional fields to `ClientConfig`:

- `request_timeout_seconds`
- `connection_timeout_seconds`
- `retries`

`from_config` plumbs them into `Client::new`. None of the existing defaults change when the fields are omitted.

The motivating case is heavy storage queries against slow public RPCs (Acala under load is the case that surfaced this in `polkadot-fellows/runtimes#1180` / `open-web3-stack/polkadot-ecosystem-tests#621`), where 30s per upstream is not enough and Subway exhausts its endpoint cycle without serving a response.
…t timeout, and retries
xlc
approved these changes
May 20, 2026
rockbmb
added a commit
to open-web3-stack/polkadot-ecosystem-tests
that referenced
this pull request
May 20, 2026
Subway's default per-upstream request timeout is 30s. With three Acala public RPC endpoints, heavy storage queries that take longer than 30s cause Subway to cycle through all three endpoints (~90s) before any single upstream has a chance to respond, and the test-side waiting client times out. `request_timeout_seconds` was added to `ClientConfig` in AcalaNetwork/subway#203 (Subway v0.1.1+). Setting it to 90 lets a single upstream attempt run long enough to complete those queries instead of being preempted by Subway's own per-endpoint clock. The companion exclusion of Acala tests in `vitest.config.mts` is intentionally left in place; this commit only restores Subway's ability to wait long enough. Lifting the exclusion is a separate verification step.
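The timing described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not Subway's actual scheduling logic: it assumes each endpoint is tried once, in order, with no retries, and that the query takes the same time on every upstream. The function name `simulate_cycle` and the 45s query duration are hypothetical.

```rust
use std::time::Duration;

/// Toy model (an assumption, not Subway's real implementation): each endpoint
/// is tried once, in order, and an attempt is abandoned as soon as
/// `per_upstream_timeout` elapses. Returns the index of the endpoint that
/// answered, if any, together with the total time spent.
fn simulate_cycle(
    endpoints: usize,
    per_upstream_timeout: Duration,
    query_time: Duration,
) -> (Option<usize>, Duration) {
    let mut elapsed = Duration::ZERO;
    for index in 0..endpoints {
        // The attempt succeeds only if the upstream answers before the timeout.
        if query_time <= per_upstream_timeout {
            return (Some(index), elapsed + query_time);
        }
        // Otherwise the full timeout is spent and the next endpoint is tried.
        elapsed += per_upstream_timeout;
    }
    (None, elapsed)
}

fn main() {
    // Hypothetical heavy storage query that needs 45s on any upstream.
    let slow_query = Duration::from_secs(45);

    // Default 30s timeout, three endpoints: the whole cycle is spent waiting.
    println!("{:?}", simulate_cycle(3, Duration::from_secs(30), slow_query));

    // 90s timeout: the first upstream gets long enough to answer.
    println!("{:?}", simulate_cycle(3, Duration::from_secs(90), slow_query));
}
```

Under these assumptions, the 30s setting burns the full three-endpoint cycle without an answer, while the 90s setting lets the first attempt complete.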
rockbmb
added a commit
to open-web3-stack/polkadot-ecosystem-tests
that referenced
this pull request
May 20, 2026
…pstream timeout (#622)

* Install Subway from upstream `v0.1.0` musl release in `ci.yml`

  Switches `cargo install --git` to a `curl | tar -xz` of the released static binary (https://github.com/AcalaNetwork/subway/releases/tag/v0.1.0, published by AcalaNetwork/subway#202). Removes the Rust toolchain install, Subway-HEAD commit-hash lookup, and Swatinem cache layer that existed only to amortise the `cargo install` cost; none of them have any other consumer in this workflow.

* Install Subway from upstream `v0.1.0` musl release in `update-known-good.yml`

  Same swap as the previous commit, applied to the periodic block-number update workflow.

* Install Subway from upstream `v0.1.0` musl release in `update-snapshot.yml`

  Same swap as the previous two commits, applied to the snapshot-update workflow.

* Fail Subway download fast on HTTP errors (`curl -f`)

  Without `-f`, an HTTP 4xx/5xx response (e.g. release deleted, GitHub degraded) leaves `curl` exiting zero with the error body on stdout, and the downstream `tar -xz` fails with a confusing "not in gzip format" message instead. Per review on PR #622.

* Install Subway by extracting binary from `acala/subway:v0.1.1` Docker image

  The `v0.1.1` GitHub Release at AcalaNetwork/subway is missing its `x86_64-unknown-linux-musl.tar.gz` asset; the release workflow's `Build release binary` step failed (`cargo build --locked` mismatched the bumped `Cargo.toml` version), so the upload was skipped. The upstream tag still produces a working Docker image because `docker.yml` doesn't use `--locked`, so `acala/subway:v0.1.1` is the only working consumption path for v0.1.1. The image's binary lives at `/usr/local/bin/subway` (per Subway's Dockerfile); copying it out with `docker create` + `docker cp` lands in roughly the same wall time as the curl-and-untar path and unblocks consumption of PR #203's `request_timeout_seconds` config field.

* Set Subway per-upstream `request_timeout_seconds` to 90s

  Subway's default per-upstream request timeout is 30s. With three Acala public RPC endpoints, heavy storage queries that take longer than 30s cause Subway to cycle through all three endpoints (~90s) before any single upstream has a chance to respond, and the test-side waiting client times out. `request_timeout_seconds` was added to `ClientConfig` in AcalaNetwork/subway#203 (Subway v0.1.1+). Setting it to 90 lets a single upstream attempt run long enough to complete those queries instead of being preempted by Subway's own per-endpoint clock. The companion exclusion of Acala tests in `vitest.config.mts` is intentionally left in place; this commit only restores Subway's ability to wait long enough. Lifting the exclusion is a separate verification step.

* Re-enable Acala test suites

  `request_timeout_seconds: 90` on Subway's upstream client (added to `subway-template.yml` in the previous commit) gives Subway enough time per upstream attempt for Acala storage queries to land before the 30s default forced it to cycle endpoints. The exclusion added in PR #621 is no longer needed and is removed; the exclusion comment is narrowed to bifrostKusama, which still lacks a workable endpoint set.
rockbmb
added a commit
to rockbmb/runtimes
that referenced
this pull request
May 20, 2026
… image

The `v0.1.1` GitHub Release artifact is missing because the release workflow's `Build release binary` step failed against a stale `Cargo.lock`; the upload step was skipped. The Docker image build at the same tag succeeded (it doesn't use `--locked`), so `acala/subway:v0.1.1` is the only working consumption path for the release that includes AcalaNetwork/subway#203's new `request_timeout_seconds` field, which the next commit relies on. Mirrors the equivalent change in open-web3-stack/polkadot-ecosystem-tests#622.
rockbmb
added a commit
to rockbmb/runtimes
that referenced
this pull request
May 20, 2026
Subway's default per-upstream request timeout is 30s. With three Acala public RPC endpoints, heavy storage queries that take longer than 30s cause Subway to cycle through all three (~90s) before any single upstream has time to respond, and the chopsticks-side client times out. `request_timeout_seconds` is the new field added in AcalaNetwork/subway#203 (Subway v0.1.1+, installed in the previous commit). Setting it to 90 lets a single upstream attempt run long enough to complete those queries instead of being preempted by Subway's per-endpoint clock. Mirrors the equivalent change in open-web3-stack/polkadot-ecosystem-tests#622.
Motivation

`Client::new` already accepts `request_timeout`, `connection_timeout`, and `retries` arguments, but `from_config` hardcodes all three to `None` because `ClientConfig` only exposes `endpoints` and `shuffle_endpoints`. As a result, the only way to override the 30s per-upstream request timeout (and the 30s connection timeout, and the default retry count) is to construct `Client` directly in Rust, which isn't reachable from the YAML-driven config that downstream Subway deployments use.

The motivating case is heavy storage queries against slow public RPCs. When chopsticks issues a `chain_getBlockHash` (or similar) to Subway, the upstream chain doesn't always respond inside 30s. Subway cycles to the next endpoint, waits another 30s, and so on, never serving a response to chopsticks; the call eventually times out higher up the stack. Increasing chopsticks' own `rpc-timeout` doesn't help because Subway never gets past the 30s/endpoint cycle. This is what surfaced in polkadot-fellows/runtimes#1180 / open-web3-stack/polkadot-ecosystem-tests#621 (Acala under sharded-CI load).

Change

Adds three optional fields to `ClientConfig`:

- `request_timeout_seconds`
- `connection_timeout_seconds`
- `retries`

`from_config` plumbs them into `Client::new`. None of the existing defaults change when the fields are omitted (30s request timeout, 30s connection timeout, default retry count). The three test-internal `ClientConfig` literals are updated to set the new fields to `None` for parity with the previous behaviour.

Trivial diff, four files, no behavioural change in the absence of new YAML entries.
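For readers who have not opened the diff, the plumbing described above might look roughly like the following. This is a minimal standalone sketch, not the code from this PR: the struct here is a hypothetical mirror of `ClientConfig` without its YAML deserialization, `ClientArgs` and `args_from_config` are illustrative stand-ins for the arguments passed to `Client::new`, and the integer types are assumptions.

```rust
use std::time::Duration;

/// Hypothetical mirror of the config struct; the real `ClientConfig` is
/// deserialized from YAML and may differ in field types and attributes.
#[allow(dead_code)]
#[derive(Debug, Default, Clone)]
struct ClientConfig {
    endpoints: Vec<String>,
    shuffle_endpoints: bool,
    // The three new optional fields; `None` means "keep the built-in default".
    request_timeout_seconds: Option<u64>,
    connection_timeout_seconds: Option<u64>,
    retries: Option<u32>,
}

/// Illustrative stand-in for the optional arguments `Client::new` accepts.
#[derive(Debug, PartialEq)]
struct ClientArgs {
    request_timeout: Option<Duration>,
    connection_timeout: Option<Duration>,
    retries: Option<u32>,
}

/// Converts the config fields into constructor arguments instead of
/// hardcoding all three to `None`.
fn args_from_config(config: &ClientConfig) -> ClientArgs {
    ClientArgs {
        request_timeout: config.request_timeout_seconds.map(Duration::from_secs),
        connection_timeout: config.connection_timeout_seconds.map(Duration::from_secs),
        retries: config.retries,
    }
}

fn main() {
    // Omitting the new fields leaves every argument as `None`, so the
    // existing defaults apply.
    let untouched = ClientConfig::default();
    println!("{:?}", args_from_config(&untouched));

    // Overriding only the request timeout leaves the other two untouched.
    let tuned = ClientConfig {
        request_timeout_seconds: Some(90),
        ..ClientConfig::default()
    };
    println!("{:?}", args_from_config(&tuned));
}
```

Keeping the fields as `Option` is what preserves the "no behavioural change when omitted" property: an absent YAML key maps to `None`, and `None` falls through to whatever default the client already used.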