S12.17: Tunable TCP connect and TLS handshake timeouts #282

@DavidCozens

Parent epic: #31 (E12: Error Handling)
Follow-on from: #276 (S12.14 — fail-fast streams + eager Service drain)

Background

Two timeouts are currently hardcoded by S12.14:

  • TCP connect — 200 ms
    (Platform/Windows/Source/SolidSyslogWinsockTcpStream.c:88-101,
    Platform/Posix/Source/SolidSyslogPosixTcpStream.c:24).
  • TLS handshake — 5 s
    (HANDSHAKE_TIMEOUT_MS in Platform/OpenSsl/Source/SolidSyslogTlsStream.c).

Both values were tuned for the BlockStore drain-rate scenario on
loopback / LAN, which is the right default for the BDD suite. They are
too tight for real WAN deployments, e.g. a far cloud SIEM behind a
high-RTT link or a transparent proxy, where connect and handshake
routinely exceed the 200 ms and 5 s budgets even when the path is healthy.

Tradeoff to surface

  • Tight bounds keep the BlockStore discard policy responsive during
    outages — overflow is observable in tests within seconds.
  • Loose bounds match real WAN deployments but slow the drop-detection
    path during a true outage; overflow scenarios take proportionally
    longer to trigger.

This story does not change the defaults — it only makes them tunable
per deployment.

Scope

Two design options, one to be chosen before implementation:

A. Compile-time CMake variables. SOLIDSYSLOG_TCP_CONNECT_TIMEOUT_MS
and SOLIDSYSLOG_TLS_HANDSHAKE_TIMEOUT_MS, plumbed via
add_compile_definitions. Cheapest; integrators rebuild to change.
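
A minimal sketch of how option A might look on the C side, assuming the
two macro names above are supplied by the build (for example as
-DSOLIDSYSLOG_TCP_CONNECT_TIMEOUT_MS=1500) and fall back to today's
values when absent. The accessor functions are hypothetical and exist
only to make the example checkable; the real sources may consume the
macros directly.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Fall back to the values currently hardcoded by S12.14 when the
 * build does not define an override. */
#ifndef SOLIDSYSLOG_TCP_CONNECT_TIMEOUT_MS
#define SOLIDSYSLOG_TCP_CONNECT_TIMEOUT_MS 200
#endif

#ifndef SOLIDSYSLOG_TLS_HANDSHAKE_TIMEOUT_MS
#define SOLIDSYSLOG_TLS_HANDSHAKE_TIMEOUT_MS 5000
#endif

/* Reject nonsensical overrides at compile time rather than at runtime. */
static_assert(SOLIDSYSLOG_TCP_CONNECT_TIMEOUT_MS > 0,
              "TCP connect timeout must be positive");
static_assert(SOLIDSYSLOG_TLS_HANDSHAKE_TIMEOUT_MS > 0,
              "TLS handshake timeout must be positive");

/* Hypothetical accessors, not part of the library. */
static uint32_t Example_ConnectTimeoutMs(void)
{
    return (uint32_t) SOLIDSYSLOG_TCP_CONNECT_TIMEOUT_MS;
}

static uint32_t Example_HandshakeTimeoutMs(void)
{
    return (uint32_t) SOLIDSYSLOG_TLS_HANDSHAKE_TIMEOUT_MS;
}
```

With no override defined, the accessors return 200 and 5000, so the
defaults stay unchanged.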

B. Config-injected fields. Add connectTimeoutMs to
SolidSyslogStreamSenderConfig and handshakeTimeoutMs to
SolidSyslogTlsStreamConfig. Zero / unset means "use the existing
default constant". Higher cost; integrators set per-deployment without
rebuilding.
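
A minimal sketch of the "zero means default" rule in option B. The
struct definitions here are hypothetical stand-ins that carry only the
proposed field; the real SolidSyslogStreamSenderConfig and
SolidSyslogTlsStreamConfig have other members not shown, and the helper
names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Existing defaults, as stated in the Background section. */
enum
{
    EXAMPLE_DEFAULT_CONNECT_TIMEOUT_MS   = 200,
    EXAMPLE_DEFAULT_HANDSHAKE_TIMEOUT_MS = 5000
};

/* Hypothetical stand-ins for the real config structs. */
struct ExampleStreamSenderConfig
{
    uint32_t connectTimeoutMs; /* 0 = use the default */
};

struct ExampleTlsStreamConfig
{
    uint32_t handshakeTimeoutMs; /* 0 = use the default */
};

/* Zero (including a zero-initialised config) selects the default. */
static uint32_t Example_ResolveTimeoutMs(uint32_t configuredMs, uint32_t defaultMs)
{
    return (configuredMs != 0U) ? configuredMs : defaultMs;
}

static uint32_t Example_ConnectTimeoutMs(const struct ExampleStreamSenderConfig* config)
{
    assert(config != NULL);
    return Example_ResolveTimeoutMs(config->connectTimeoutMs,
                                    EXAMPLE_DEFAULT_CONNECT_TIMEOUT_MS);
}

static uint32_t Example_HandshakeTimeoutMs(const struct ExampleTlsStreamConfig* config)
{
    assert(config != NULL);
    return Example_ResolveTimeoutMs(config->handshakeTimeoutMs,
                                    EXAMPLE_DEFAULT_HANDSHAKE_TIMEOUT_MS);
}
```

Treating zero as "unset" means existing callers that zero-initialise
their config keep today's behaviour without source changes.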

Recommend B — config injection is the pattern the rest of the
library has been moving toward (see CLAUDE.md "Callback Conventions");
compile-time variables would be a step backwards. Confirm before
implementation.

Verification

  • Unit: assert the configured values reach the bounded select()
    loop in PosixTcpStream / WinsockTcpStream and the bounded
    handshake retry loop in TlsStream. Existing fakes
    (SocketFake, WinsockFake, plus the OpenSSL test seam) capture the
    deadline argument.
  • No BDD changes — the BDD harness keeps the current default; tunable
    values are a deployment-time concern, not a behavioural one for the
    suite.
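
The unit check described above could be sketched as follows. This is an
illustration only: the function-pointer seam, the fake, and every
Example_ name are assumptions, not the actual SocketFake / WinsockFake
API, and the real code waits in a bounded loop rather than the single
wait shown here. It assumes a POSIX struct timeval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h> /* struct timeval (POSIX) */

/* Hypothetical seam standing in for the select() call. */
typedef int (*ExampleWaitWritableFn)(int fd, const struct timeval* timeout);

/* State recorded by the fake so a test can inspect it afterwards. */
static struct timeval exampleLastTimeout;
static int            exampleWaitCallCount;

/* Fake wait: captures the timeout it was given and reports "writable". */
static int ExampleFake_WaitWritable(int fd, const struct timeval* timeout)
{
    (void) fd;
    assert(timeout != NULL);
    exampleLastTimeout = *timeout;
    exampleWaitCallCount++;
    return 1;
}

/* Split milliseconds into the seconds / microseconds pair select() takes. */
static struct timeval Example_MsToTimeval(uint32_t ms)
{
    struct timeval tv;
    tv.tv_sec  = (time_t) (ms / 1000U);
    tv.tv_usec = (suseconds_t) ((ms % 1000U) * 1000U);
    return tv;
}

/* Code under test: resolves the configured value (0 = 200 ms default)
 * and passes it to the wait seam. */
static int Example_BoundedConnectWait(int fd, uint32_t connectTimeoutMs,
                                      ExampleWaitWritableFn waitWritable)
{
    uint32_t       effectiveMs = (connectTimeoutMs != 0U) ? connectTimeoutMs : 200U;
    struct timeval tv          = Example_MsToTimeval(effectiveMs);
    return waitWritable(fd, &tv);
}
```

A test would call Example_BoundedConnectWait with a configured value
such as 1500 and then assert that the fake recorded 1 s and 500000 us;
a second call with 0 should record 0 s and 200000 us, covering the
defaults-unchanged branch.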

Out of scope

  • Read / Send timeouts during normal operation: unrelated, and not part
    of the fail-fast contract.
  • Adaptive / exponential backoff between reconnect attempts — separate
    story if needed.

Acceptance

  • Defaults unchanged (BDD suite still passes with no config change).
  • Unit tests observe a non-zero connectTimeoutMs / handshakeTimeoutMs
    reaching the bounded-wait paths.
  • Coverage stays at 100% line/branch.

Labels: story