flake: TestAgent_ReconnectNoLifecycleReemit #1410

@flake-investigator

CI run: https://github.com/coder/coder/actions/runs/23242064104
Failing job: https://github.com/coder/coder/actions/runs/23242064104/job/67560833697 (test-go-pg (macos-latest))
Commit: 4f566f92b524a8eb5a339b062cf13e26eca07365 (Danielle Maywood) coder/coder@4f566f9

Failure:

  • Test: agent TestAgent_ReconnectNoLifecycleReemit
  • Symptom: slogtest fails any test that logs at ERROR level. During the reconnect, the agent logs an ERROR ("reporting script completed: connection closed") when it tries to report script completion after the coordinator has disconnected.

Key log excerpt:

=== FAIL: agent TestAgent_ReconnectNoLifecycleReemit (6.79s)
    t.go:120: 2026-03-18 11:20:27.465 [erro]  agent: reporting script completed: connection closed; connection closed  log_source_id=00000000-0000-0000-0000-000000000000  log_path=/var/folders/1r/b8l6nzhj4zdbz5jmjdhqbprw0000gn/T/coder-script-00000000-0000-0000-0000-000000000000.log  script_data_dir=/var/folders/1r/b8l6nzhj4zdbz5jmjdhqbprw0000gn/T/coder-script-data/00000000-0000-0000-0000-000000000000
         *** slogtest: log detected at level ERROR; TEST FAILURE ***

Root cause assessment:

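  • Flaky test (timing-sensitive). The reconnect path closes the coordinator connection, and the script-completion report races with that disconnect. When the report loses the race (observed on macos-latest), it fails with "connection closed" and the agent logs it at ERROR, which slogtest flags.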
  • No panic/OOM/race detector output observed.

Assignment analysis:

  • git blame -L 3042,3092 agent/agent_test.go points to commit 22a87f6 ("fix: filter sub-agents from build duration metric"), which introduced TestAgent_ReconnectNoLifecycleReemit.
  • Commit author: Jon Ayers. The author cannot be assigned in coder/internal, so this is assigned to the agent-area owner for triage.

Suggested next steps:

  • Consider downgrading this specific error log to WARN/DEBUG in the reconnect path, or updating the test to allow the expected connection-closed error during shutdown/reconnect on macOS.

Repro:

  • go test ./agent -run TestAgent_ReconnectNoLifecycleReemit -count=50

Related issues:

  • None found after searching for the test name and log message in coder/internal.
