flake: TestPostWorkspacesByOrganization/AllProvisionersStale #1288

@flake-investigator

CI Run Link: https://github.com/coder/coder/actions/runs/21235972985

Failing job:

  • Workflow: nightly-gauntlet
  • Job: test-go-pg (macos-latest)
  • Completed at: 2026-01-22T04:36:34Z (run_attempt=1)

Commit:

  • 26ce070393466ec05036375992c0ec657b95fec0 (DevCats) — "feat: update doc-check workflow to utilize claude-skills (#21588)"

Failure evidence:

  • File: coderd/workspaces_test.go
  • Test: TestPostWorkspacesByOrganization/AllProvisionersStale
  • Lines: 1392-1393

Observed failure:

workspaces_test.go:1392:
    Error: Should be zero, but was 1
    Test:  TestPostWorkspacesByOrganization/AllProvisionersStale
workspaces_test.go:1393:
    Error: Not equal:
           expected: 2026-01-22 03:28:27.445925 +0000 UTC
           actual  : 2026-01-22 04:28:27.364759 +0000 UTC

What the test is doing (relevant snippet):

  • Starts coderd with IncludeProvisionerDaemon: true
  • Manually backdates provisioner daemons:
    • newLastSeenAt := dbtime.Now().Add(-time.Hour)
    • UPDATE provisioner_daemons SET last_seen_at = $1
  • Creates a workspace and expects:
    • MatchedProvisioners.Count == 1
    • MatchedProvisioners.Available == 0
    • MatchedProvisioners.MostRecentlySeen.Time == newLastSeenAt

Hypothesis / likely root cause:

  • The in-memory provisioner daemon is still running and can heartbeat/update provisioner_daemons.last_seen_at concurrently with the test’s manual UPDATE.
  • If the daemon updates last_seen_at after the test sets it stale (but before the workspace creation response is assembled), then:
    • MostRecentlySeen becomes ~now (not -1h)
    • Available can become 1 (daemon considered healthy), violating the test’s expectation.

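This matches the observed failure: MostRecentlySeen was about one hour later than expected (close to the current time rather than the backdated value), and Available was 1 instead of 0.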

Not a data race / crash:

  • No WARNING: DATA RACE, panic:, or OOM indicators seen in the job logs.

Suggested fix direction:

  • Make the daemon deterministically stale for the duration of the assertion, e.g.:
    • stop/pause the provisioner daemon heartbeat before updating last_seen_at, or
    • run the test without a live provisioner daemon and insert a stale provisioner_daemons row directly, or
    • add test helpers to control/override daemon last-seen timestamps without racing a background heartbeat.

Duplicate search (coder/internal):

  • Searched: AllProvisionersStale, TestPostWorkspacesByOrganization, workspaces_test.go:1392, provisioner_daemons last_seen_at — no matches.

Assignment analysis:

  • Recent changes in the relevant area include provisioner operation/test plumbing changes in 3194bcfc (Steven Masley). The existing related flake in TestTemplateVersionDryRun/ImportNotFinished is also owned by the templates/provisioner code.
  • Assigning to the most recent meaningful modifier in the provisioner/test integration area.
