fix(nodectl): bound ton-http-api per-endpoint wait so daemon cannot hang on unreachable upstream #158
Merged
Keshoid merged 5 commits on May 20, 2026
Pull request overview
This PR prevents nodectl daemon handlers from hanging indefinitely when configured ton-http-api upstream endpoints are unreachable by bounding each per-endpoint JSON-RPC attempt with a timeout budget and improving failure diagnostics.
Changes:
- Introduces configurable per-endpoint timeout settings in `TonHttpApiConfig` (`connect_timeout_secs`, `request_timeout_secs`) and a resolved `EndpointTimeouts` type.
- Wraps each endpoint JSON-RPC attempt in `tokio::time::timeout(EndpointTimeouts::total())`, and returns an aggregated error listing all attempted endpoints and their failure reasons.
- Threads the resolved timeouts through all `ClientJsonRpc::connect_many` call sites; adds tests for timeout bounding and aggregated errors; adds `tokio` (`time` feature) to `ton-http-api-client`.
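The bounded failover described in the list above can be sketched without any dependencies. This is only an illustration under stated assumptions: the PR itself wraps an async call in `tokio::time::timeout`, whereas this sketch swaps in a helper thread plus `recv_timeout` so it compiles with the standard library alone. The names `call_with_budget`, `failover`, and `Attempt` are hypothetical and do not come from the PR.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// One endpoint attempt: returns the response body or a failure reason.
/// Hypothetical type, used only in this sketch.
type Attempt = Box<dyn FnOnce() -> Result<String, String> + Send + 'static>;

/// Runs `call` on a helper thread and waits at most `budget` for its result.
/// Stand-in for `tokio::time::timeout`, which the PR uses on the async path.
fn call_with_budget<F>(budget: Duration, call: F) -> Result<String, String>
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the budget expired; ignore that.
        let _ = tx.send(call());
    });
    match rx.recv_timeout(budget) {
        Ok(result) => result,
        Err(mpsc::RecvTimeoutError::Timeout) => {
            Err(format!("timed out after {}s", budget.as_secs()))
        }
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err("worker exited without a result".to_string())
        }
    }
}

/// Tries each endpoint in order. Returns the first success, or one aggregated
/// error naming every endpoint that was tried and why it failed.
fn failover(endpoints: Vec<(String, Attempt)>, budget: Duration) -> Result<String, String> {
    let total = endpoints.len();
    let mut failures = Vec::with_capacity(total);
    for (url, attempt) in endpoints {
        match call_with_budget(budget, attempt) {
            Ok(response) => return Ok(response),
            Err(reason) => failures.push(format!("{}: {}", url, reason)),
        }
    }
    Err(format!(
        "ton-http-api unreachable: tried {} endpoint(s): [{}]",
        total,
        failures.join("; ")
    ))
}
```

Unlike a cancelled async future, the helper thread here keeps running after the budget expires; only the caller stops waiting. That difference is one reason the async timeout is the better fit inside a daemon.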
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/node-control/ton-http-api-client/src/v2/client_json_rpc.rs` | Adds per-endpoint timeout enforcement, aggregated errors/logging, and new tests (including blackhole endpoint simulation). |
| `src/node-control/ton-http-api-client/Cargo.toml` | Adds tokio dependency for runtime timeout support. |
| `src/node-control/service/src/runtime_config.rs` | Passes resolved endpoint timeouts when constructing the JSON-RPC client. |
| `src/node-control/service/src/auth/user_store.rs` | Updates test mock client construction to pass default timeouts. |
| `src/node-control/common/src/app_config.rs` | Adds timeout config fields/defaults, `EndpointTimeouts`, and serde/default tests. |
| `src/node-control/commands/src/commands/ton_http_api/get_config_param_cmd.rs` | Threads resolved timeouts into CLI-side RPC client construction (and removes unused `clap::command` import). |
| `src/node-control/commands/src/commands/nodectl/utils.rs` | Threads resolved timeouts into RPC client construction. |
Comment on lines +164 to +170

    let reason = format!("timed out after {}s", per_endpoint_budget.as_secs());
    tracing::debug!(
        method,
        endpoint = %endpoint.url,
        attempt = attempt + 1,
        total_attempts = total,
        timeout_secs = per_endpoint_budget.as_secs(),
Comment on lines +505 to +514

    // Accept and hold connections open without responding.
    loop {
        if let Ok((socket, _)) = listener.accept().await {
            // Park the socket so the request never completes; drop on task abort.
            tokio::spawn(async move {
                let _socket = socket;
                std::future::pending::<()>().await;
            });
        } else {
            break;
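The blackhole simulation in the excerpt above can be reproduced with the standard library alone. The following is a minimal sketch, not the PR's test: it uses blocking sockets and a thread instead of `tokio`, and `spawn_blackhole`, `probe`, and the request bytes are hypothetical names and values chosen for illustration.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Starts a local listener that accepts connections and never answers.
/// Hypothetical helper, a blocking analogue of the listener in the excerpt.
fn spawn_blackhole() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        // Keep accepted sockets alive so the peer sees silence, not a reset.
        let mut parked = Vec::new();
        while let Ok((socket, _peer)) = listener.accept() {
            parked.push(socket);
        }
    });
    Ok(addr)
}

/// Connects, sends a request, and waits at most `budget` for the first
/// response byte. Against a blackhole the read fails with a timeout error.
fn probe(addr: SocketAddr, budget: Duration) -> io::Result<usize> {
    let mut stream = TcpStream::connect_timeout(&addr, budget)?;
    stream.set_read_timeout(Some(budget))?;
    // Illustrative request bytes; the server never reads them.
    stream.write_all(b"POST / HTTP/1.1\r\nHost: localhost\r\nContent-Length: 0\r\n\r\n")?;
    let mut buf = [0u8; 1];
    stream.read(&mut buf)
}
```

Calling `probe(spawn_blackhole()?, Duration::from_millis(200))` should return an error whose kind is `WouldBlock` or `TimedOut`, depending on the platform, because the connection is accepted but no response byte ever arrives.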
Comment on lines +44 to +46

    /// `connect` bounds the initial TCP/TLS handshake, `request` bounds the
    /// overall per-endpoint wall-clock budget. Their sum caps the time spent
    /// on any single endpoint before failing over.
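To make the relationship between the two settings concrete, here is a minimal sketch of how optional config values could resolve into an `EndpointTimeouts` value. The type and method names (`TonHttpApiConfig`, `EndpointTimeouts`, `resolved_timeouts`, `total`) and the 3s / 5s defaults come from the PR; the field layout, the constants, and the omission of serde attributes are assumptions made for illustration.

```rust
use std::time::Duration;

// Defaults stated in the PR description (3s connect, 5s request).
// The constant names are hypothetical.
const DEFAULT_CONNECT_TIMEOUT_SECS: u64 = 3;
const DEFAULT_REQUEST_TIMEOUT_SECS: u64 = 5;

/// Resolved per-endpoint timeouts. Field layout is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EndpointTimeouts {
    connect: Duration,
    request: Duration,
}

impl EndpointTimeouts {
    /// Upper bound on the time spent on one endpoint before failing over.
    fn total(&self) -> Duration {
        self.connect + self.request
    }
}

/// Only the two timeout fields are modelled here; the real config type
/// carries more settings and serde attributes.
#[derive(Debug, Default)]
struct TonHttpApiConfig {
    connect_timeout_secs: Option<u64>,
    request_timeout_secs: Option<u64>,
}

impl TonHttpApiConfig {
    /// Fills unset fields with the defaults and converts to `Duration`.
    fn resolved_timeouts(&self) -> EndpointTimeouts {
        EndpointTimeouts {
            connect: Duration::from_secs(
                self.connect_timeout_secs.unwrap_or(DEFAULT_CONNECT_TIMEOUT_SECS),
            ),
            request: Duration::from_secs(
                self.request_timeout_secs.unwrap_or(DEFAULT_REQUEST_TIMEOUT_SECS),
            ),
        }
    }
}
```

With nothing set, `TonHttpApiConfig::default().resolved_timeouts().total()` is 8 seconds, which matches the `timed out after 8s` reason quoted in the PR description.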
mrnkslv approved these changes May 19, 2026
ITBear approved these changes May 20, 2026
Summary

When the configured `ton-http-api` endpoint is unreachable (blackholed IP, wrong port, network split), nodectl daemon HTTP handlers that depend on `rpc_client` were hanging for as long as the kernel kept the outbound TCP connect attempt open. The CLI eventually failed with the local request timeout, blaming the nodectl service rather than ton-http-api.

Root cause: `toncenter-rs::BaseApiClient::new` uses `reqwest::Client::new()` with no timeouts; nothing in `ClientJsonRpc` capped the per-endpoint wait. Total wait grew linearly with the number of configured endpoints.

Changes

- `TonHttpApiConfig` gains two optional fields, `connect_timeout_secs` and `request_timeout_secs` (defaults 3s / 5s), plus a `resolved_timeouts()` helper that returns the new `EndpointTimeouts` value type.
- `ClientJsonRpc::connect_many` now takes `EndpointTimeouts`. Each per-endpoint call in `json_rpc` is wrapped in `tokio::time::timeout(connect + request)`, so an unreachable upstream can no longer stall the failover loop.
- When every endpoint fails, the error is aggregated as `ton-http-api unreachable: tried N endpoint(s): [url1: timed out after 8s; url2: connection refused; ...]` and a `tracing::warn!` is emitted, so the operator can identify ton-http-api as the source rather than the local nodectl service.
- Threaded `resolved_timeouts()` through: `service/src/runtime_config.rs`, `commands/.../nodectl/utils.rs`, `commands/.../ton_http_api/get_config_param_cmd.rs`. The mock in `service/src/auth/user_store.rs` was updated as well.
- `ton-http-api-client` gains a `tokio` dependency with the `time` feature.
- New tests in `client_json_rpc.rs` (single dead endpoint times out, N dead endpoints stay within total budget, aggregated error lists every attempted endpoint) and in `app_config.rs` (default/explicit timeouts, serde round-trip, skip-when-unset).

Notes

- … `state="error"` in `/v1/pools`) and into daemon logs.
- The new fields are `Option<u64>` with `#[serde(default, skip_serializing_if = "Option::is_none")]`, so on-disk configs are unchanged until an operator sets a value.

Closes SMA-95
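As a worked example of the budget arithmetic in the description: with every endpoint dead, the failover loop waits at most the per-endpoint budget once per endpoint. The helper below is a hypothetical sketch of that bound and is not part of the PR.

```rust
use std::time::Duration;

/// Worst-case time spent in the failover loop when every endpoint is dead:
/// each of the `endpoints` attempts may use the full connect + request budget.
/// Hypothetical helper for illustration only.
fn worst_case_wait(endpoints: u32, connect: Duration, request: Duration) -> Duration {
    (connect + request) * endpoints
}
```

With the default 3s / 5s timeouts, three dead endpoints give `worst_case_wait(3, Duration::from_secs(3), Duration::from_secs(5))`, which is 24 seconds. The wait still grows with the endpoint count, but it is now bounded rather than left to the kernel's connect behaviour.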