Warn on partial UDP send failure, raise SendError when all fail#23
Merged
Conversation
A host with both A and AAAA records on a network with no route for one family (e.g. v4-only network with AAAA record) will fail the send for the unreachable family while the other family succeeds. The previous "Sparoid error sending ..." wording made the partial failure look fatal; reword to "Sparoid warn: skip <ip>: <reason>" to make clear it is recoverable, mirroring sparoid.rb#20.
When a host resolves to both A and AAAA addresses but the network only routes one family, the unreachable family's send raises EHOSTUNREACH. Previously every per-addr failure was logged, which is noisy when the other family's send succeeded and the call as a whole worked. Collect errors and only log them if every address failed; include the original hostname alongside the IP for easier triage.
The previous integration tests relied on "send to 0.0.0.0 fails with EHOSTUNREACH" to exercise the all-failed branch, which is true on macOS but not on the Linux CI runners where the kernel accepts the send. Extract the error-reporting decision into a pure helper that can be unit-tested with synthetic inputs, independent of OS networking behavior.
- Per-address failure with at least one success: STDERR warn line
("Sparoid warn: skip <host> (<ip>): <reason>") so v4-only networks
don't make AAAA-resolved hosts log catastrophically.
- Every address failed: raise Sparoid::Client::SendError with all
per-address details. The CLI already catches and exits 1, so this
surfaces a clear actionable failure instead of returning quietly and
failing later in fdpass.
Apps embedding sparoid as a shard at scale (15k+ servers) need partial
UDP send failures classified as WARN-level by their monitoring, not as
ERROR-level (which most aggregators infer from STDERR by default).
- Sparoid::Client::Log = ::Log.for(self) declares a log source so apps
can filter sparoid output specifically.
- udp_send emits per-address skips via Log.warn { ... }.
- client-cli configures Log.setup_from_env with an STDERR backend (in
:connect mode STDOUT is the unix-domain FD-passing channel and isn't
safe for free-form output).
There was a problem hiding this comment.
Pull request overview
This PR changes the client-side UDP send path so mixed-family delivery failures are downgraded from unconditional STDERR errors to structured warnings, while escalating the fully-failed case into a Sparoid::Client::SendError. In the broader codebase, this is aimed at making sparoid's client behavior less noisy in partial-success environments while preserving a hard failure when no address can be reached.
Changes:
- Add per-address error collection in
Sparoid::Client, warning on partial send failures and raisingSendErrorwhen all sends fail. - Configure the CLI logger to write to STDERR so
connectmode does not contaminate the FD-passing STDOUT channel. - Add a new spec for the error-formatting helper used to build the raised exception message.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/client.cr |
Adds aggregated UDP send failure handling, warning logs, and SendError formatting. |
src/client-cli.cr |
Initializes logging to STDERR for CLI execution, especially connect mode. |
spec/sparoid_spec.cr |
Adds unit coverage for the new send-error message formatting helper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- SendError no longer prepends 'Sparoid:' to the message — the CLI's rescue clause already prefixes 'Sparoid error:', so the previous format produced 'Sparoid error: Sparoid: failed to send...'. - Replace the format_send_errors formatting helper with a process_send_results helper that owns the partial-vs-total decision. This raises SendError directly when every send failed and returns the per-address partial-failure errors otherwise. - Specs now cover both branches plus the all-succeeded and empty-input cases, exercising the actual control flow rather than just message formatting.
Contributor
Author
|
Ran metrics-shovel with this patch, output now as warn level at least. And can be configured to be silent by |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Sparoid error sending #<...>per per-address failure even when the other family succeeded — at shard-in-app scale (15k+ servers) this floods log aggregators with what monitoring classifies as ERROR-level lines.Log.warn { "skip <host> (<ip>): <reason>" }per failed addr, classified as WARN by any properly configured backend. Apps embedding the shard can filtersparoid.clientspecifically. Returns normally.Sparoid::Client::SendErrorwith all per-address details. The CLI's existingrescue ex(src/client-cli.cr:62) printsSparoid error: <msg>and exits 1.Sparoid::Client::Log = ::Log.for(self)declares a dedicated log source.Log::IOBackend.new(STDERR)because in:connectmode STDOUT is the unix-domain socket used forSCM_RIGHTSFD passing — free-form text there would queue ahead of thesendmsgand confuse SSH'srecvmsg.fdpassalready handles per-addr connect failures correctly (each attempt runs in its own fiber, first to connect wins, rest silent), so onlyudp_sendneeded the change.Context
EHOSTUNREACHerrors into the log even though the v4 send worked. The shard is used at scale to send to thousands of brokers.fdpass).Test plan
crystal spec spec/sparoid_spec.cr— coversformat_send_errors(used to build the raised message). The unrelatedclient can send another IPfailure exists onmainfor the same network-setup reason; on Linux CI it still passes.crystal tool format --check— cleanbin/ameba— clean