flake: doc-check task monitoring timeout #1469

@flake-investigator

Description

CI Run Link: https://github.com/coder/coder/actions/runs/24457555405/job/71462305765?pr=24382

Failing job:

  • Workflow: AI Documentation Check
  • Job: Analyze PR for Documentation Updates Needed
  • Completed at: 2026-04-15T13:46:09Z (from job logs; same day as Slack alert)
  • Run attempt: 1 (not a matrix cancellation)

Commit info:

  • SHA: 6ee9572e63a95a0260430bf57fba0a950bdbbc85
  • Author: Atif Ali (me@matifali.dev)
  • Message: docs(ai-gateway): use full env var name for copilot base URL override

Failure evidence:

[0s] Waiting for task status...
...
##[error]Task monitoring timed out after 600s

Additional errors during cleanup/log fetch:

Encountered an error running "coder task logs"
error: Trace=[resolve task "doc-check-24382": ]
Resource not found or you do not have access to this resource
...
Encountered an error running "coder task delete"
error: Trace=[resolve task "doc-check-24382": ]
Resource not found or you do not have access to this resource

What failed / root cause classification:

  • Infrastructure: The doc-check workflow creates or reuses a Coder task and then waits up to 10 minutes for completion.
  • The task status polling never observed a completion message and timed out after 600s.
  • Follow-up calls to fetch logs/delete the task returned “Resource not found or you do not have access to this resource”, suggesting the task disappeared or the task service/API was unavailable mid-run.

Relevant workflow logic:

  • .github/workflows/doc-check.yaml: the "Wait for Task Completion" step polls coder task status and fails after 600s.

Duplicate search (coder/internal):

  • Searched: "doc-check" "Task monitoring timed out", "create-task-action" timeout, "coder task" timeout, "Resource not found" "task logs" — no matches.

Assignment analysis:

  • The failure is in the doc-check workflow’s task monitoring step, not a test.
  • Recent relevant changes to .github/workflows/doc-check.yaml are by DevCats and Atif Ali; the latest workflow ownership change with CI permissions is a21f00d (Atif Ali).
  • Assigning to the most recent meaningful modifier of the doc-check workflow who is an org member.

Suggested next steps:

  • Verify task service health during CI runs and whether coder task status may intermittently return invalid JSON or stall.
  • Consider increasing the timeout or adding retry/backoff to task status polling, and handle the missing-task case explicitly so it is reported as a distinct failure rather than a timeout.
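
The retry/backoff suggestion above could look roughly like the sketch below. It is illustrative only and is not the logic currently in doc-check.yaml. Assumptions: `fetch_status` is a hypothetical callable that returns the raw text of one status query; the `state` field and the `complete`/`failed` values are hypothetical placeholders for whatever the real status output contains; the only string taken from this report is the "Resource not found" message seen in the logs.

```python
import json
import time
from typing import Callable, FrozenSet


class TaskMissingError(RuntimeError):
    """The task could not be resolved (deleted, or access was lost)."""


class TaskTimeoutError(RuntimeError):
    """The deadline passed without the task reaching a terminal state."""


def poll_task(
    fetch_status: Callable[[], str],  # hypothetical: returns raw status text
    timeout_s: float = 600.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    # Hypothetical state names; adjust to the real status output.
    terminal_states: FrozenSet[str] = frozenset({"complete", "failed"}),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a terminal state, backing off exponentially between tries."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        raw = fetch_status()

        # Fail fast with a distinct error when the task has disappeared,
        # instead of waiting out the full timeout.
        if "Resource not found" in raw:
            raise TaskMissingError(raw.strip())

        # Treat invalid or unexpected JSON as a transient condition.
        try:
            state = json.loads(raw).get("state")
        except (json.JSONDecodeError, AttributeError):
            state = None

        if state in terminal_states:
            return state

        remaining = deadline - clock()
        if remaining <= 0:
            raise TaskTimeoutError(f"no terminal state after {timeout_s:.0f}s")

        # Never sleep past the deadline; double the delay up to the cap.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

In a workflow, `fetch_status` would wrap whatever command the step already runs to query the task. Raising `TaskMissingError` separately from `TaskTimeoutError` would let the job log distinguish "task vanished" from "task still running at 600s", which this run could not do.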

Reproduction:

  • Re-run the AI Documentation Check workflow for PR #24382 (or manually trigger workflow_dispatch).
