Skip to content

Tests aren't hermetic and major modules are untested; production code is hard to test #24

@dolph

Description

@dolph

Summary

Two related problems make the test suite a maintenance liability rather than an asset.

1. Tests aren't hermetic

Several tests reach the real network at test time:

  • resolver_test.go:45-66 — DNS lookups for 8.8.8.8, example.com, a.b.c
  • router_test.go:84-89 — routing to 1.2.3.4 (requires real default route)
  • source_test.go:38-43 — reads local network interfaces

These cause flakes on offline laptops, in restricted CI environments, and in build sandboxes — and they prevent enforcing CI on PRs from environments without internet (some self-hosted runners, GitHub Action retries from cache, contributor laptops on a hotel network).

2. Most production code paths have no tests at all

Untested files: config.go, connectivity.go, dialer.go, https.go, pinger.go, statsd.go, logging.go, help.go. That's the majority of the codebase, including everything related to the CLI dispatcher, statsd protocol formatting (where #14's escaping bug lives), config parsing (where #5 and #6 live), and the HTTP/dial codepaths (where #7, #8, #12, #15 live).

The cause is structural: the production code is hard to test as written:

  • Global state in statsd.go:11 (var queue = make(chan string, 100)) — can't be reset between tests
  • LoadConfig calls log.Fatalf on error — kills the test process instead of returning an error
  • main, ShowDestinations, ParseDestinations call os.Exit directly
  • Functions take concrete *Destination everywhere — no interfaces to fake
  • Everything in package main — internal-package boundaries can't enforce abstraction

Suggested fix

Quick wins:

  1. Gate network-touching tests behind a build tag or testing.Short:

    func TestLookupPublicIp(t *testing.T) {
        if testing.Short() {
            t.Skip("requires network")
        }
        ...
    }

    Run go test -short ./... in CI by default, and go test ./... in a separate "integration" job.

  2. Make LoadConfig return an error instead of calling log.Fatalf. Move the fatal handling to main.

  3. Add basic unit tests for the deterministic surfaces with no excuses for being missing:

Structural follow-ups (larger change, optional):

  • Extract internal/probe, internal/config, internal/statsd packages.
  • Introduce small interfaces at boundaries (type Dialer interface{ Dial(network, addr string) (net.Conn, error) }) so the HTTP/dial paths can be tested without real sockets.

Relates to #23 (CI coverage), and would catch regressions for many of the open bugs.

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions