flake: TestTools/GetTaskLogs (ByUUID, ByIdentifier)

CI Run Link: https://github.com/coder/coder/actions/runs/18962366101
Workflow Job: test-go-pg (macos-latest)
Commit: 9298e7e073970011f7711cf37ffce5c5defe1a8d by Jake Howell
When: 2025-10-31 04:21 UTC

What failed
- Package: github.com/coder/coder/v2/codersdk/toolsdk
- Tests: TestTools/GetTaskLogs subtests [ByUUID, ByIdentifier]

Evidence from logs
```
=== FAIL: codersdk/toolsdk TestTools/GetTaskLogs/ByUUID (0.86s)
    toolsdk_test.go:1570:
        Error Trace: /Users/runner/work/coder/coder/codersdk/toolsdk/toolsdk_test.go:1570
        Error:        Received unexpected error:
                      get task logs "7218018d-547e-4d68-9793-1799c19d69e1":
                          github.com/coder/coder/v2/codersdk/toolsdk.init.func32
                              /Users/runner/work/coder/coder/codersdk/toolsdk/toolsdk.go:2134
                        - GET http://127.0.0.1:59633/api/experimental/tasks/nervous-tesla6-bvG/7218018d-547e-4d68-9793-1799c19d69e1/logs: unexpected status code 400: Task status must be active.
                          Error: Task status is "initializing", it must be "active" to interact with the task.
```

Classification
- Type: Flaky test (timing-dependent; task still initializing when logs are requested)
- Not a matrix cancellation artifact (only macOS job failed; Windows succeeded)
- No data race indicators (no "WARNING: DATA RACE" present)
- No panic/OOM signatures

Precise assignment analysis (Test function blame)
- Failing test block: the "GetTaskLogs" table tests in toolsdk_test.go, approximately lines 1493–1580.
  Commands used:
  - grep -n "GetTaskLogs" codersdk/toolsdk/toolsdk_test.go
  - git blame -L 1493,1580 codersdk/toolsdk/toolsdk_test.go
- Recent modification to this exact test section: a1fa58ac17c4 (2025-10-28) "fix: update dbgen and dbfake task creation and toolsdk test fixtures" by Mathias Fredriksson (refactor to use dbfake.WithTask and new task model). This is the last change touching the failing lines.
- Assigning to: @mafredri (last modifier of the failing test function lines per blame).

Root cause hypothesis
- Server endpoint for task logs correctly enforces that task status is "active" before interaction.
- The test appears to create a task and request logs immediately after agent connect, without waiting for the task to progress from "initializing" to "active", causing intermittent 400s.
- Recent changes to task creation/linking in tests (switch to dbfake.WithTask and explicit task IDs) may have narrowed timing margins, increasing likelihood of hitting the initializing window.

Related issues
- Possibly related to an earlier tasks-tools flake (DeleteTask variant), https://github.com/coder/internal/issues/1103, which was resolved by a change from the same author/area.

Proposed fix
- In TestTools/GetTaskLogs, poll until the created task reaches status "active" before requesting logs. Alternatively, make the test less timing-sensitive by ensuring the agent-side task app is ready before asserting on logs.

Reproduction hints
- Re-run on macOS runner:
  - `go test ./codersdk/toolsdk -run "TestTools/GetTaskLogs" -count=50`
- Expect occasional failures with the 400 "Task status must be active" response if no readiness wait is added.

Quality checklist
- Used grep to identify failing test and lines; blame points to last modifier of the failing function
- Searched for data race/panic/OOM (none)
- Searched coder/internal (open/closed) for duplicates: "GetTaskLogs", error text (none found)
- Assignment based on test ownership (blame), not CI run author

flake: TestTools/GetTaskLogs (ByUUID, ByIdentifier) #1111