Summary
connectivity wait is documented as a way to gate other processes on dependency readiness (init containers, deploy scripts, etc.). It polls forever with a fixed 15s sleep and no overall deadline. If a dependency is permanently broken — a typo'd hostname, a service that never comes up, a destination behind a firewall change — wait hangs indefinitely. The orchestrator (k8s, systemd, GitHub Actions) eventually times out and reports a useless "still waiting" failure with no diagnostic surface.
For an SRE, the desired UX is:
- connectivity wait --timeout 5m exits with a distinct non-zero code on timeout
- The log clearly identifies which destination(s) were not reached
- The error is detectable from process exit code without parsing logs
Code
destinations.go:219-230:
func (dest *Destination) WaitFor() {
	for {
		reachable := dest.Check()
		if reachable {
			LogDestination(dest, "Connected")
			return
		}
		time.Sleep(15 * time.Second)
	}
}
connectivity.go:155-166:
func WaitLoop(destinations []*Destination) {
	var wg sync.WaitGroup
	for _, dest := range destinations {
		wg.Add(1)
		go func(dest *Destination) {
			defer wg.Done()
			dest.WaitFor()
		}(dest)
	}
	wg.Wait()
}
No context.Context, no deadline, no progress reporting beyond per-attempt logs.
Suggested fix
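Add a --timeout flag (the default could stay unlimited, but the docs should recommend setting one) and propagate it through a context. Note that context.WithTimeout with a zero duration expires immediately, so an "unlimited" default has to be special-cased rather than passed straight through: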
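ctx, cancel := context.WithTimeout(context.Background(), *timeout)
defer cancel()

var wg sync.WaitGroup
errs := make(chan string, len(destinations))
for _, dest := range destinations {
	wg.Add(1)
	go func(d *Destination) {
		defer wg.Done()
		// WaitFor(ctx) returns false if ctx expired before the destination was reached.
		if !d.WaitFor(ctx) {
			errs <- d.Label
		}
	}(dest)
}
wg.Wait()
close(errs)

// Drain the closed channel into a slice of unreached destination labels.
var failed []string
for label := range errs {
	failed = append(failed, label)
}
if len(failed) > 0 {
	log.Printf("Timed out waiting for: %s", strings.Join(failed, ", "))
	os.Exit(1) // 1 = timeout
}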
WaitFor(ctx) should also use exponential backoff (capped) rather than 15s flat — so a fast-failing destination doesn't issue 4 DNS lookups per minute for an hour, and a slow-converging one isn't punished by an aggressive cadence.
Distinct exit codes (e.g., 1 = timeout, 2 = config error) would also help orchestrator-level alerting distinguish "dependency missing" from "we never started".