Skip to content

CI runs -race but no test exercises the concurrent code paths (StatsdSender queue, MonitorLoop, WaitLoop) #49

@dolph

Description

@dolph

go test -race is documented in AGENTS.md and run in CI, but the race detector only flags races on code that executes. The concurrent surface of this codebase is currently unexercised by any test:

  • StatsdSender (statsd.go) is a goroutine spawned from main that reads forever from a package-global queue chan string of capacity 100. The producers — Increment / Count / Timer / Gauge — write to the same channel from any goroutine, including from inside Destination.Check(). There is no test that runs these concurrently.
  • MonitorLoop (connectivity.go) spawns len(destinations) goroutines that each call dest.Monitor() in an infinite time.Sleep-loop. No test.
  • WaitLoop uses a sync.WaitGroup over goroutine-per-destination. No test.
  • The queue channel blocks producers when full (Statsd emitter can stall checks when statsd is down; opens a new connection per metric #11 documents the production hazard). A test that fills the queue with statsd down would expose the back-pressure deadlock the bug describes — that's the closing test for Statsd emitter can stall checks when statsd is down; opens a new connection per metric #11 once it's fixed.

Why this is a real test-engineering problem, not "more tests"

A passing -race build today is a false signal. It says "no races were observed in the code paths we ran," but the only goroutines that the test binary ever spins up are the testing framework's own. The repo's concurrency is currently certified by code review alone.

A handful of small, hermetic tests would change that:

  1. Statsd back-pressure test — fill queue to capacity with no consumer, then call Increment and assert it blocks (use a done channel + select with timeout). This is the regression test for Statsd emitter can stall checks when statsd is down; opens a new connection per metric #11.
  2. Statsd consumer test — spin up a UDP listener on 127.0.0.1:0, point a *Config at it, run StatsdSender in a goroutine, send N metrics from M goroutines, assert all N arrive and the strings match the wire format. Exercises both the goroutine and the format functions referenced in the other test issue.
  3. WaitLoop test — pass a slice of destinations where each WaitFor is replaced by a hermetic stub (requires a small refactor for testability — pass a func() bool or an interface), then drive the loop and verify wg.Wait() returns.
  4. Drain-on-shutdown — once Process lifecycle is brittle: no signal handling, no statsd drain, no panic recovery #17 (signal handling) lands, a test should verify the queue is drained before exit. File this as a follow-up under Process lifecycle is brittle: no signal handling, no statsd drain, no panic recovery #17.

Prerequisites

The current design — package-global queue, StatsdSender(*Config) with no shutdown signal, Destination.Monitor() with an infinite loop and unmockable time.Sleep — is not amenable to any of the above. Each of these tests forces a small testability seam:

  • queue → an injectable channel or a *StatsdEmitter struct.
  • StatsdSender → accept a context.Context for cancellation (aligns with AGENTS.md's context.Context plumbing guidance).
  • Monitor / WaitFor → accept an injectable sleeper/clock or a max-iteration count, or factor the loop body into a separately testable function.

Refs #24 (umbrella). Refs #11 (statsd back-pressure — this would be the regression test). Refs #17 (lifecycle — drain-on-shutdown test belongs there).

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions