
feat(ci): add diagnose command#506

Merged
121watts merged 7 commits into main from watts/dep-4264-ci-diagnose-cli on May 13, 2026

121watts (Contributor) commented May 8, 2026

Summary

The human-facing command is depot ci diagnose. Pass the ID of a failed run, workflow, job, or attempt, and the CLI renders the API's FailureDiagnosis response as a readable triage summary.

Review guide

This looks larger than it is because generated protobuf/OpenAPI output is included.

Please skip generated files.

Human review surface:

  • CLI behavior: pkg/cmd/ci/diagnose.go

What was happening

Users could get logs and summaries, but the CLI had no first-class way to ask which failure mattered. Matrix jobs made that worse: several attempts could fail or be cancelled for the same underlying reason, and repeating every representative drill-down command in a footer made large diagnoses noisy.

What happens now

  • Adds depot ci diagnose <id> with automatic target resolution and --output json.
  • Keeps a hidden --type run|workflow|job|attempt disambiguation flag for rare ID collisions.
  • Calls the API failure diagnosis endpoint and renders grouped failures, representative attempts, relevant log lines, and contextual follow-up depot ci logs / depot ci summary commands.
  • Avoids repeating representative drill-down commands in a giant Next commands footer; text output keeps those commands under the representative attempt where they are useful.
  • Uses explicit representative wording, such as "Showing 3 of 7 similar attempts for this group.", so bounded output looks intentional instead of incomplete.
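
To illustrate the bounded-output wording from the last bullet, here is a minimal Go sketch. The function name boundedNote and its parameters are hypothetical and are not taken from pkg/cmd/ci/diagnose.go; only the message text comes from the description above.

```go
package main

import "fmt"

// boundedNote is a hypothetical helper. It returns a one-line notice when only
// some of a group's attempts are shown, and "" when nothing was left out, so
// callers can skip printing it.
func boundedNote(shown, total int) string {
	if shown >= total {
		return ""
	}
	return fmt.Sprintf("Showing %d of %d similar attempts for this group.", shown, total)
}

func main() {
	// Prints: Showing 3 of 7 similar attempts for this group.
	fmt.Println(boundedNote(3, 7))
}
```

Returning an empty string for the unbounded case keeps the notice out of groups where every attempt is already listed.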

Sample Response

Target: workflow 2d2xsk1rtq (failed)
Failure groups: 2

Group 1: [Errno -2] Name or service not known
  2 failures
  Where: Step 24 (Set up databases (slow path - run migrations)): script exited with code 1

  Diagnosis:
    The Django `setup_test_environment` management command ... could not resolve the PostgreSQL
    database hostname...

  Possible fix:
    Ensure the PostgreSQL service container ... is defined and running...

  Attempts:
    - #1 dq1t0hmhn2  Node.js Tests (2/3) (shard=2) (failed)
      Logs: depot ci logs dq1t0hmhn2 --org cl0wyyk6k39487ebgraxasinja
      View: https://depot.dev/orgs/...
    - #1 cdtlqs7fjp  Node.js Tests (3/3) (shard=3) (failed)
      Logs: depot ci logs cdtlqs7fjp --org cl0wyyk6k39487ebgraxasinja
      View: https://depot.dev/orgs/...

  Evidence:
    - ... django.db.utils.OperationalError: [Errno -2] Name or service not known
    - ... psycopg.OperationalError: [Errno -2] Name or service not known

Validation

  • make generate
  • go test ./pkg/cmd/ci ./pkg/api
  • go test ./pkg/cmd/ci -run Diagnose
  • go test ./pkg/cmd/ci
  • go test ./...
  • git diff --check
  • go build -o bin/depot-dev ./cmd/depot
  • make bin/depot
  • Live smoke via DEPOT_API_URL=http://localhost:18080 ./bin/depot ci diagnose h5753524rz --org cl0wyyk6k39487ebgraxasinja
  • Live smoke via DEPOT_API_URL=http://127.0.0.1:18080 ./bin/depot-dev ci diagnose cvhm9dnf32: next_commands=0, logs=13, summaries=0

Depends on https://github.com/depot/api/pull/3656.
Linear: https://linear.app/depot/issue/DEP-4264/ci-diagnose-failure-triage-for-runs-workflows-jobs-and-attempts


Note

Medium Risk
Adds a new CI API wrapper and a sizable new depot ci diagnose command with custom text/JSON rendering logic; primary risk is output/UX correctness and handling of bounded/truncated API responses, but it doesn’t touch auth flows beyond existing token/org headers.

Overview
Adds a new CIGetFailureDiagnosis API wrapper and wires it into a new depot ci diagnose subcommand for diagnosing failed runs/workflows/jobs/attempts.

The command validates exactly one target selector, calls the failure-diagnosis endpoint, and renders either human-readable triage output (grouped/focused/over-limit/empty states, representative attempts, evidence, and drill-down commands) or normalized JSON (stringified enums, bounds/capabilities, and machine-safe argv).
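
The "exactly one target selector" check described above could look roughly like the following Go sketch. The targetSelectors type, the resolveTarget function, and the error messages are all assumptions made for illustration; the real command may resolve its target differently.

```go
package main

import (
	"errors"
	"fmt"
)

// targetSelectors is a hypothetical holder for the four ways to name a target.
type targetSelectors struct {
	RunID, WorkflowID, JobID, AttemptID string
}

// resolveTarget returns the kind and ID of the single selector that is set,
// or an error when zero or more than one selector is provided.
func resolveTarget(s targetSelectors) (kind, id string, err error) {
	candidates := []struct{ kind, id string }{
		{"run", s.RunID},
		{"workflow", s.WorkflowID},
		{"job", s.JobID},
		{"attempt", s.AttemptID},
	}
	for _, c := range candidates {
		if c.id == "" {
			continue
		}
		if id != "" {
			return "", "", fmt.Errorf("specify only one target (got %s and %s)", kind, c.kind)
		}
		kind, id = c.kind, c.id
	}
	if id == "" {
		return "", "", errors.New("a run, workflow, job, or attempt ID is required")
	}
	return kind, id, nil
}

func main() {
	kind, id, err := resolveTarget(targetSelectors{WorkflowID: "2d2xsk1rtq"})
	fmt.Println(kind, id, err)
}
```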

Includes comprehensive tests for the new API wrapper and CLI formatting/edge-cases (org flag propagation, omission of unavailable summary commands, evidence filtering, and truncation/bounds behavior).
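
As a rough sketch of the org flag propagation mentioned above, the follow-up logs command can be built as an argv slice instead of a shell string. The logsArgv helper is hypothetical; the command shape matches the depot ci logs lines in the sample response.

```go
package main

import (
	"fmt"
	"strings"
)

// logsArgv is a hypothetical helper that builds the argv for a follow-up
// "depot ci logs" command. The --org flag is appended only when an org ID
// was supplied.
func logsArgv(attemptID, orgID string) []string {
	argv := []string{"depot", "ci", "logs", attemptID}
	if orgID != "" {
		argv = append(argv, "--org", orgID)
	}
	return argv
}

func main() {
	// Prints: depot ci logs dq1t0hmhn2 --org cl0wyyk6k39487ebgraxasinja
	fmt.Println(strings.Join(logsArgv("dq1t0hmhn2", "cl0wyyk6k39487ebgraxasinja"), " "))
}
```

Keeping the arguments as separate slice elements means a consumer of the JSON output can run the command without re-parsing or quoting a shell string.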

Reviewed by Cursor Bugbot for commit ba30e6f. Bugbot is set up for automated code reviews on this repo.


linear-code Bot commented May 8, 2026

DEP-4264

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: TruncatedContextFields serializes as null instead of empty array
    • Changed line 637 to use make([]string, 0, len(...)) to guarantee a non-nil slice that serializes as [] instead of null, matching the nil-context guard and other array fields.

Or push these changes by commenting:

@cursor push 36e7c44691
Preview (36e7c44691)
diff --git a/pkg/cmd/ci/diagnose.go b/pkg/cmd/ci/diagnose.go
--- a/pkg/cmd/ci/diagnose.go
+++ b/pkg/cmd/ci/diagnose.go
@@ -613,6 +613,8 @@
 	if context == nil {
 		return diagnoseContextJSON{TruncatedContextFields: []string{}}
 	}
+	truncatedFields := make([]string, 0, len(context.GetTruncatedContextFields()))
+	truncatedFields = append(truncatedFields, context.GetTruncatedContextFields()...)
 	return diagnoseContextJSON{
 		RunID:                  context.GetRunId(),
 		Repo:                   context.GetRepo(),
@@ -634,7 +636,7 @@
 		Attempt:                context.GetAttempt(),
 		AttemptStatus:          context.GetAttemptStatus(),
 		AttemptConclusion:      context.GetAttemptConclusion(),
-		TruncatedContextFields: append([]string(nil), context.GetTruncatedContextFields()...),
+		TruncatedContextFields: truncatedFields,
 	}
 }
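
The behaviour behind this fix can be reproduced with the standard library alone. The sketch below uses a hypothetical payload type and helper names; it shows that appending nothing to a nil slice leaves it nil, which encoding/json encodes as null, while a slice created with make encodes as [].

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload is a hypothetical stand-in for the JSON context struct.
type payload struct {
	Fields []string `json:"fields"`
}

// copyNilable mirrors the original pattern: the result is nil when src is empty.
func copyNilable(src []string) []string {
	return append([]string(nil), src...)
}

// copyNonNil mirrors the fixed pattern: the result is never nil.
func copyNonNil(src []string) []string {
	out := make([]string, 0, len(src))
	return append(out, src...)
}

// marshalFields encodes the slice inside payload and returns the JSON text.
func marshalFields(fields []string) string {
	b, err := json.Marshal(payload{Fields: fields})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshalFields(copyNilable(nil))) // {"fields":null}
	fmt.Println(marshalFields(copyNonNil(nil)))  // {"fields":[]}
}
```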


Comment thread pkg/cmd/ci/diagnose.go Outdated
121watts force-pushed the watts/dep-4264-ci-diagnose-cli branch from 89f292c to 3551bb9 on May 8, 2026 11:24
121watts changed the title from "Add ci diagnose command" to "feat(ci): add diagnose command" on May 8, 2026
121watts force-pushed the watts/dep-4264-ci-diagnose-cli branch from 3551bb9 to 180881c on May 8, 2026 11:45
121watts changed the title from "feat(ci): add diagnose command" to "feat(ci): add command" on May 8, 2026
121watts changed the title from "feat(ci): add command" to "feat(ci): add diagnose command" on May 8, 2026
121watts force-pushed the watts/dep-4264-ci-diagnose-cli branch from b1fd5bb to 8624f9c on May 8, 2026 13:55
121watts force-pushed the watts/dep-4264-ci-diagnose-cli branch from 8624f9c to 8ebec37 on May 8, 2026 15:05
cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Empty-string conflation suppresses differing representative errors
    • Added non-empty check to else-if condition at line 305 to distinguish between 'no errors' and 'different errors', preventing suppression of differing representative error messages.

Or push these changes by commenting:

@cursor push 872e3d894f
Preview (872e3d894f)
diff --git a/pkg/cmd/ci/diagnose.go b/pkg/cmd/ci/diagnose.go
--- a/pkg/cmd/ci/diagnose.go
+++ b/pkg/cmd/ci/diagnose.go
@@ -302,7 +302,7 @@
 		if representativeError != "" && representativeError != group.GetErrorMessage() {
 			fmt.Fprintf(w, "  Where: %s\n", representativeError)
 			showRepresentativeErrors = false
-		} else if representativeError == group.GetErrorMessage() {
+		} else if representativeError != "" && representativeError == group.GetErrorMessage() {
 			showRepresentativeErrors = false
 		}
 		if group.GetDiagnosis() != "" {
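
The condition change can be isolated in a small Go sketch. It assumes that an empty representative error means the representative attempts have differing errors or none; that reading comes from the fix description, not from the surrounding code. The function names are hypothetical.

```go
package main

import "fmt"

// buggyShow mirrors the pre-fix branch structure. When both strings are empty,
// the else-if matches and per-attempt errors are suppressed.
func buggyShow(representativeError, groupError string) bool {
	show := true
	if representativeError != "" && representativeError != groupError {
		show = false // shared error is printed once as the "Where:" line
	} else if representativeError == groupError {
		show = false
	}
	return show
}

// fixedShow adds the non-empty check, so "no shared error" is no longer
// treated as "same as the group error".
func fixedShow(representativeError, groupError string) bool {
	show := true
	if representativeError != "" && representativeError != groupError {
		show = false
	} else if representativeError != "" && representativeError == groupError {
		show = false
	}
	return show
}

func main() {
	fmt.Println(buggyShow("", "")) // false: differing per-attempt errors are hidden
	fmt.Println(fixedShow("", "")) // true: per-attempt errors are still shown
}
```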


Reviewed by Cursor Bugbot for commit 6d1310f.

Comment thread pkg/cmd/ci/diagnose.go
121watts merged commit f46e4c3 into main on May 13, 2026
11 checks passed
121watts deleted the watts/dep-4264-ci-diagnose-cli branch on May 13, 2026 15:45