flake: TestTools/GetTaskLogs (ByUUID, ByIdentifier) #1111

@flake-investigator

Description

CI Run Link: https://github.com/coder/coder/actions/runs/18962366101
Workflow Job: test-go-pg (macos-latest)
Commit: 9298e7e073970011f7711cf37ffce5c5defe1a8d by Jake Howell
When: 2025-10-31 04:21 UTC

What failed

  • Package: github.com/coder/coder/v2/codersdk/toolsdk
  • Tests: TestTools/GetTaskLogs subtests [ByUUID, ByIdentifier]

Evidence from logs

=== FAIL: codersdk/toolsdk TestTools/GetTaskLogs/ByUUID (0.86s)
    toolsdk_test.go:1570:
        Error Trace: /Users/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1570
        Error:        Received unexpected error:
                      get task logs "7218018d-547e-4d68-9793-1799c19d69e1":
                          github.com/coder/coder/v2/codersdk/toolsdk.init.func32
                              /Users/runner/work/coder/coder/codersdk/toolsdk/toolsdk.go:2134
                        - GET http://127.0.0.1:59633/api/experimental/tasks/nervous-tesla6-bvG/7218018d-547e-4d68-9793-1799c19d69e1/logs: unexpected status code 400: Task status must be active.
                          Error: Task status is "initializing", it must be "active" to interact with the task.

Classification

  • Type: Flaky test (timing-dependent; task still initializing when logs are requested)
  • Not a matrix cancellation artifact (only macOS job failed; Windows succeeded)
  • No data race indicators (no "WARNING: DATA RACE" present)
  • No panic/OOM signatures

Precise assignment analysis (Test function blame)

  • Failing test block: the "GetTaskLogs" table tests in toolsdk_test.go, approximately lines 1493–1580.
    Commands used:
    • grep -n "GetTaskLogs" codersdk/toolsdk/toolsdk_test.go
    • git blame -L 1493,1580 codersdk/toolsdk/toolsdk_test.go
  • Recent modification to this exact test section: a1fa58ac17c4 (2025-10-28) "fix: update dbgen and dbfake task creation and toolsdk test fixtures" by Mathias Fredriksson (refactor to use dbfake.WithTask and new task model). This is the last change touching the failing lines.
  • Assigning to: @mafredri (last modifier of the failing test function lines per blame).

Root cause hypothesis

  • Server endpoint for task logs correctly enforces that task status is "active" before interaction.
  • The test appears to create a task and request its logs immediately after the agent connects, without waiting for the task to progress from "initializing" to "active". When the request lands inside that window, the server returns the intermittent 400.
  • Recent changes to task creation/linking in tests (switch to dbfake.WithTask and explicit task IDs) may have narrowed timing margins, increasing likelihood of hitting the initializing window.

Related issues

  • None found (searched coder/internal, open and closed, for "GetTaskLogs" and the error text).

Proposed fix

  • In TestTools/GetTaskLogs, poll until the created task reaches status "active" before requesting logs. Alternatively, have the test confirm that the agent-side task app is ready before it requests and asserts on logs.

Reproduction hints

  • Re-run on macOS runner:
    • go test ./codersdk/toolsdk -run "TestTools/GetTaskLogs" -count=50
  • Expect occasional failures with the 400 "Task status must be active" response if no readiness wait is added.

Quality checklist

  • Used grep to identify failing test and lines; blame points to last modifier of the failing function
  • Searched for data race/panic/OOM (none)
  • Searched coder/internal (open/closed) for duplicates: "GetTaskLogs", error text (none found)
  • Assignment based on test ownership (blame), not CI run author
