Summary
Two related problems make the test suite a maintenance liability rather than an asset.
1. Tests aren't hermetic
Several tests reach the real network at test time:
resolver_test.go:45-66 — DNS lookups for 8.8.8.8, example.com, a.b.c
router_test.go:84-89 — routing to 1.2.3.4 (requires real default route)
source_test.go:38-43 — reads local network interfaces
These cause flakes on offline laptops, in restricted CI environments, and in build sandboxes — and they prevent enforcing CI on PRs from environments without internet (some self-hosted runners, GitHub Action retries from cache, contributor laptops on a hotel network).
2. Most production code paths have no tests at all
Untested files: config.go, connectivity.go, dialer.go, https.go, pinger.go, statsd.go, logging.go, help.go. That's the majority of the codebase, including everything related to the CLI dispatcher, statsd protocol formatting (where #14's escaping bug lives), config parsing (where #5 and #6 live), and the HTTP/dial codepaths (where #7, #8, #12, #15 live).
The cause is structural: the production code is hard to test as written:
- Global state in
statsd.go:11 (var queue = make(chan string, 100)) — can't be reset between tests
LoadConfig calls log.Fatalf on error — kills the test process instead of returning an error
main, ShowDestinations, ParseDestinations call os.Exit directly
- Functions take concrete
*Destination everywhere — no interfaces to fake
- Everything in
package main — internal-package boundaries can't enforce abstraction
Suggested fix
Quick wins:
-
Gate network-touching tests behind a build tag or testing.Short:
func TestLookupPublicIp(t *testing.T) {
if testing.Short() {
t.Skip("requires network")
}
...
}
Run go test -short ./... in CI by default, and go test ./... in a separate "integration" job.
-
Make LoadConfig return an error instead of calling log.Fatalf. Move the fatal handling to main.
-
Add basic unit tests for the deterministic surfaces with no excuses for being missing:
Structural follow-ups (larger change, optional):
- Extract
internal/probe, internal/config, internal/statsd packages.
- Introduce small interfaces at boundaries (
type Dialer interface{ Dial(network, addr string) (net.Conn, error) }) so the HTTP/dial paths can be tested without real sockets.
Relates to #23 (CI coverage), and would catch regressions for many of the open bugs.
Summary
Two related problems make the test suite a maintenance liability rather than an asset.
1. Tests aren't hermetic
Several tests reach the real network at test time:
resolver_test.go:45-66— DNS lookups for8.8.8.8,example.com,a.b.crouter_test.go:84-89— routing to1.2.3.4(requires real default route)source_test.go:38-43— reads local network interfacesThese cause flakes on offline laptops, in restricted CI environments, and in build sandboxes — and they prevent enforcing CI on PRs from environments without internet (some self-hosted runners, GitHub Action retries from cache, contributor laptops on a hotel network).
2. Most production code paths have no tests at all
Untested files:
config.go,connectivity.go,dialer.go,https.go,pinger.go,statsd.go,logging.go,help.go. That's the majority of the codebase, including everything related to the CLI dispatcher, statsd protocol formatting (where #14's escaping bug lives), config parsing (where #5 and #6 live), and the HTTP/dial codepaths (where #7, #8, #12, #15 live).The cause is structural: the production code is hard to test as written:
statsd.go:11(var queue = make(chan string, 100)) — can't be reset between testsLoadConfigcallslog.Fatalfon error — kills the test process instead of returning an errormain,ShowDestinations,ParseDestinationscallos.Exitdirectly*Destinationeverywhere — no interfaces to fakepackage main— internal-package boundaries can't enforce abstractionSuggested fix
Quick wins:
Gate network-touching tests behind a build tag or
testing.Short:Run
go test -short ./...in CI by default, andgo test ./...in a separate "integration" job.Make
LoadConfigreturn an error instead of callinglog.Fatalf. Move the fatal handling tomain.Add basic unit tests for the deterministic surfaces with no excuses for being missing:
EscapeTag(statsd.go)formatTags(statsd.go)Count/Timer/Gaugepayload formattingConfigYAML loading (table-driven)GetURLs/ParseDestinations(the source of bugParseDestinationssilently drops the first URL when loaded from a config file #5)UrlStringpassword redaction (the property URL passwords leak into logs viahttp.Geterror messages #12 depends on)Structural follow-ups (larger change, optional):
internal/probe,internal/config,internal/statsdpackages.type Dialer interface{ Dial(network, addr string) (net.Conn, error) }) so the HTTP/dial paths can be tested without real sockets.Relates to #23 (CI coverage), and would catch regressions for many of the open bugs.