fix(nodectl): bound ton-http-api per-endpoint wait so daemon cannot hang on unreachable upstream #158

Merged
Keshoid merged 5 commits into release/nodectl/v0.5.0 from feature/sma-95-nodectl-daemon-hangs-when-ton-http-api-endpoint-is
May 20, 2026

Keshoid commented May 19, 2026

Summary

When the configured ton-http-api endpoint is unreachable (blackholed IP, wrong port, network split), nodectl daemon HTTP handlers that depend on rpc_client were hanging for as long as the kernel kept the outbound TCP connect attempt open. The CLI eventually failed with the local request timeout, blaming the nodectl service rather than ton-http-api.

Root cause: toncenter-rs::BaseApiClient::new uses reqwest::Client::new() with no timeouts; nothing in ClientJsonRpc capped the per-endpoint wait. Total wait grew linearly with the number of configured endpoints.

Changes

  • TonHttpApiConfig gains two optional fields, connect_timeout_secs and request_timeout_secs (defaults 3s / 5s), plus a resolved_timeouts() helper that returns the new EndpointTimeouts value type.
  • ClientJsonRpc::connect_many now takes EndpointTimeouts. Each per-endpoint call in json_rpc is wrapped in tokio::time::timeout(connect + request) — an unreachable upstream can no longer stall the failover loop.
  • On total failure, the error now reads ton-http-api unreachable: tried N endpoint(s): [url1: timed out after 8s; url2: connection refused; ...] and a tracing::warn! is emitted, so the operator can identify ton-http-api as the source rather than the local nodectl service.
  • Call sites updated to thread resolved_timeouts() through: service/src/runtime_config.rs, commands/.../nodectl/utils.rs, commands/.../ton_http_api/get_config_param_cmd.rs. The mock in service/src/auth/user_store.rs was updated as well.
  • ton-http-api-client gains a tokio dependency with the time feature.
  • Tests added in client_json_rpc.rs (single dead endpoint times out, N dead endpoints stay within total budget, aggregated error lists every attempted endpoint) and in app_config.rs (default/explicit timeouts, serde round-trip, skip-when-unset).
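
The failover behavior described above can be sketched roughly as follows. This is not the PR's code: it swaps `tokio::time::timeout` for a helper thread plus `std::sync::mpsc::Receiver::recv_timeout` so the sketch needs only the standard library. `call_with_budget` and `try_endpoints` are hypothetical names, and the budget is reported in milliseconds rather than seconds so the sketch can be exercised quickly.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Runs `attempt` on a helper thread and waits at most `budget` for its result.
/// Assumption: stands in for `tokio::time::timeout`. Unlike a dropped future,
/// the helper thread is not cancelled on timeout; its late result is discarded.
fn call_with_budget<F>(budget: Duration, attempt: F) -> Result<String, String>
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore send errors.
        let _ = tx.send(attempt());
    });
    match rx.recv_timeout(budget) {
        Ok(result) => result,
        Err(RecvTimeoutError::Timeout) => {
            Err(format!("timed out after {}ms", budget.as_millis()))
        }
        Err(RecvTimeoutError::Disconnected) => Err("attempt aborted".to_string()),
    }
}

/// Tries each endpoint in order with the same per-endpoint budget.
/// Returns the first success, or one aggregated error naming every endpoint
/// that was attempted and why it failed.
pub fn try_endpoints<F>(urls: &[&str], budget: Duration, call: F) -> Result<String, String>
where
    F: Fn(&str) -> Result<String, String> + Clone + Send + 'static,
{
    let mut failures = Vec::with_capacity(urls.len());
    for url in urls {
        let owned_url = url.to_string();
        let call = call.clone();
        match call_with_budget(budget, move || call(&owned_url)) {
            Ok(body) => return Ok(body),
            Err(reason) => failures.push(format!("{}: {}", url, reason)),
        }
    }
    Err(format!(
        "ton-http-api unreachable: tried {} endpoint(s): [{}]",
        failures.len(),
        failures.join("; ")
    ))
}
```

Because every attempt is capped at the same budget, the worst case for N dead endpoints is N times the budget, and the aggregated message keeps one reason per endpoint so a timeout on one upstream is not hidden by a refused connection on another.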

Notes

  • Handler behavior (silent degradation on RPC failure per SMA-43) is intentionally unchanged. The new endpoint-failure detail flows through wherever errors do propagate (e.g. per-slot state="error" in /v1/pools) and into daemon logs.
  • Existing configs are forward-compatible: both new fields are Option<u64> with #[serde(default, skip_serializing_if = "Option::is_none")], so on-disk configs are unchanged until an operator sets a value.
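
For readers who want to see how the optional fields resolve to a budget, here is a minimal sketch under stated assumptions. The struct is a stand-in that models only the two new fields and omits the serde attributes so it compiles without external crates. The field layout of `EndpointTimeouts` and the `worst_case_failover_wait` helper are hypothetical; the 3s / 5s defaults are the ones quoted in the description.

```rust
use std::time::Duration;

// Defaults quoted in the PR description: 3s connect, 5s request.
const DEFAULT_CONNECT_TIMEOUT_SECS: u64 = 3;
const DEFAULT_REQUEST_TIMEOUT_SECS: u64 = 5;

/// Simplified stand-in for the real config; only the two new fields are modelled.
/// The real struct also carries serde attributes, which are omitted here.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TonHttpApiConfig {
    pub connect_timeout_secs: Option<u64>,
    pub request_timeout_secs: Option<u64>,
}

/// Resolved value type; the field names are an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EndpointTimeouts {
    pub connect: Duration,
    pub request: Duration,
}

impl EndpointTimeouts {
    /// Budget for a single endpoint attempt: connect + request.
    pub fn total(&self) -> Duration {
        self.connect + self.request
    }
}

impl TonHttpApiConfig {
    /// Falls back to the defaults for any field the operator left unset.
    pub fn resolved_timeouts(&self) -> EndpointTimeouts {
        EndpointTimeouts {
            connect: Duration::from_secs(
                self.connect_timeout_secs.unwrap_or(DEFAULT_CONNECT_TIMEOUT_SECS),
            ),
            request: Duration::from_secs(
                self.request_timeout_secs.unwrap_or(DEFAULT_REQUEST_TIMEOUT_SECS),
            ),
        }
    }
}

/// Hypothetical helper: upper bound on the failover loop when every endpoint is dead.
pub fn worst_case_failover_wait(timeouts: EndpointTimeouts, endpoint_count: u32) -> Duration {
    timeouts.total() * endpoint_count
}
```

With nothing set, the per-endpoint budget is 3s + 5s = 8s, which matches the `timed out after 8s` text in the example error message, and three dead endpoints would be bounded at 24s in this model.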

Closes SMA-95

Copilot AI review requested due to automatic review settings May 19, 2026 16:44
linear Bot commented May 19, 2026

SMA-95

Copilot AI left a comment

Pull request overview

This PR prevents nodectl daemon handlers from hanging indefinitely when configured ton-http-api upstream endpoints are unreachable by bounding each per-endpoint JSON-RPC attempt with a timeout budget and improving failure diagnostics.

Changes:

  • Introduces configurable per-endpoint timeout settings in TonHttpApiConfig (connect_timeout_secs, request_timeout_secs) and a resolved EndpointTimeouts type.
  • Wraps each endpoint JSON-RPC attempt in tokio::time::timeout(EndpointTimeouts::total()), and returns an aggregated error listing all attempted endpoints and their failure reasons.
  • Threads the resolved timeouts through all ClientJsonRpc::connect_many call sites; adds tests for timeout bounding and aggregated errors; adds tokio (time feature) to ton-http-api-client.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/node-control/ton-http-api-client/src/v2/client_json_rpc.rs | Adds per-endpoint timeout enforcement, aggregated errors/logging, and new tests (including blackhole endpoint simulation). |
| src/node-control/ton-http-api-client/Cargo.toml | Adds tokio dependency for runtime timeout support. |
| src/node-control/service/src/runtime_config.rs | Passes resolved endpoint timeouts when constructing the JSON-RPC client. |
| src/node-control/service/src/auth/user_store.rs | Updates test mock client construction to pass default timeouts. |
| src/node-control/common/src/app_config.rs | Adds timeout config fields/defaults, EndpointTimeouts, and serde/default tests. |
| src/node-control/commands/src/commands/ton_http_api/get_config_param_cmd.rs | Threads resolved timeouts into CLI-side RPC client construction (and removes unused clap::command import). |
| src/node-control/commands/src/commands/nodectl/utils.rs | Threads resolved timeouts into RPC client construction. |

Comment on lines +164 to +170
let reason = format!("timed out after {}s", per_endpoint_budget.as_secs());
tracing::debug!(
    method,
    endpoint = %endpoint.url,
    attempt = attempt + 1,
    total_attempts = total,
    timeout_secs = per_endpoint_budget.as_secs(),
Comment on lines +505 to +514
// Accept and hold connections open without responding.
loop {
    if let Ok((socket, _)) = listener.accept().await {
        // Park the socket so the request never completes; drop on task abort.
        tokio::spawn(async move {
            let _socket = socket;
            std::future::pending::<()>().await;
        });
    } else {
        break;
Comment on lines +44 to +46
/// `connect` bounds the initial TCP/TLS handshake, `request` bounds the
/// overall per-endpoint wall-clock budget. Their sum caps the time spent
/// on any single endpoint before failing over.
@Keshoid Keshoid merged commit 7a23e60 into release/nodectl/v0.5.0 May 20, 2026
5 checks passed
@Keshoid Keshoid deleted the feature/sma-95-nodectl-daemon-hangs-when-ton-http-api-endpoint-is branch May 20, 2026 06:45

4 participants