test: validate graceful Claude API skip (dotCMS/core#35328) #460

Closed
sfreudenthaler wants to merge 2 commits into main from feat/35328-test-graceful-api-skip

Conversation

@sfreudenthaler
Member

Summary

This PR tests the fix from dotCMS/ai-workflows@feat/35328-graceful-api-unavailability.

The Claude orchestrator now:

  1. Checks the Anthropic API availability before running
  2. Skips gracefully with a warning when the API returns 5xx or is unreachable
  3. Re-checks after runtime failures to distinguish service outages from real errors

What to verify

  • The claude-automatic-review and claude-rollback-safety-check jobs should run successfully
  • The pre-flight check step should show ✅ Claude API is available in the logs
  • If the API were down, the jobs would show a warning and pass (not fail)

Ref: dotCMS#35328

…esting

Test dotCMS/ai-workflows@feat/35328-graceful-api-unavailability, which adds
a pre-flight API availability check to skip the Claude step gracefully when
the service is down.

Ref: dotCMS#35328

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sfreudenthaler sfreudenthaler requested a review from a team as a code owner April 15, 2026 18:40
@github-actions

❌ Issue Linking Required

This PR could not be linked to an issue. All PRs must be linked to an issue for tracking purposes.

How to fix this:

Option 1: Add keyword to PR body (Recommended - auto-removes this comment)
Edit this PR description and add one of these lines:

  • This PR fixes #123 or Fixes: #123

  • This PR closes #123 or Closes: #123

  • This PR resolves #123 or Resolves: #123

  • Other supported keywords: fix, fixed, close, closed, resolve, resolved

Option 2: Link via GitHub UI (Note: won't clear the failed check)

  1. Go to the PR → Development section (right sidebar)

  2. Click "Link issue" and select an existing issue

  3. Push a new commit or re-run the workflow to clear the failed check

Option 3: Use branch naming
Create a new branch with one of these patterns:

  • 123-feature-description (number at start)

  • issue-123-feature-description (issue-number at start)

  • feature-issue-123 (issue-number anywhere)

Why is this required?

Issue linking ensures proper tracking, documentation, and helps maintain project history. It connects your code changes to the problem they solve.

This comment was automatically generated by the issue linking workflow


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94e811290e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

  )
- uses: dotCMS/ai-workflows/.github/workflows/claude-orchestrator.yml@v2.0.0
+ uses: dotCMS/ai-workflows/.github/workflows/claude-orchestrator.yml@feat/35328-graceful-api-unavailability

P1: Pin orchestrator workflow to an immutable ref

Switching uses: from @v2.0.0 to the mutable feature branch makes these jobs depend on a temporary ref that can be force-pushed or deleted; if feat/35328-graceful-api-unavailability is removed, the reusable workflow reference cannot be resolved and the Claude jobs will stop starting. Use a released tag or commit SHA for stability (the same issue is repeated at lines 111 and 139).
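
One way to catch mutable refs before they land is a small lint helper run over each `uses:` value. The sketch below is an illustration only: `is_sha_pinned` is a hypothetical name, not part of any dotCMS workflow, and it checks only the stricter commit-SHA case even though the review also accepts a released tag.

```shell
# Hypothetical lint helper; the function name is illustrative only.
# Succeeds when the ref after "@" in a `uses:` value is a full
# 40-character lowercase hex commit SHA.
is_sha_pinned() {
  ref="${1##*@}"                 # keep everything after the last "@"
  case "$ref" in
    *[!0-9a-f]*) return 1 ;;     # non-hex character: a tag or branch name
  esac
  [ "${#ref}" -eq 40 ]           # a full SHA-1 is 40 hex characters
}
```

For example, `is_sha_pinned "dotCMS/ai-workflows/.github/workflows/claude-orchestrator.yml@feat/35328-graceful-api-unavailability"` returns non-zero, so a CI step could fail on it.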


sfreudenthaler added a commit to dotCMS/ai-workflows that referenced this pull request Apr 15, 2026
## Summary

- Add pre-flight API availability check before running
`claude-code-action`
- Skip the Claude step gracefully (warning, not failure) when the API
returns 5xx or is unreachable
- Belt-and-suspenders: `continue-on-error: true` + post-execution
re-check distinguishes service outages from legitimate errors

## Problem

When the Anthropic API is down, the Claude step fails with a hard error,
blocking the entire CI pipeline. Example: [dotCMS/core run
24461196854](https://github.com/dotCMS/core/actions/runs/24461196854)

```
API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"}} · check status.claude.com
```

## Solution

Two layers of protection in `claude-executor.yml`:

**Layer 1 — Pre-flight check** (catches most outages):
- `curl` the `/v1/models` endpoint with a 15s timeout before running
Claude
- 5xx / network failures → `available=false` → skip Claude step → warn
and succeed
- Auth errors (401/403), rate limits (429) → `available=true` → proceed
so action can surface the specific error
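
The Layer 1 classification above could be sketched in shell roughly as follows. This is a minimal illustration, not the contents of `claude-executor.yml`: the function names `classify_status` and `preflight_status`, the `ANTHROPIC_API_KEY` variable, and the exact request headers are assumptions.

```shell
# Hypothetical sketch of the Layer 1 decision; names are illustrative only.

# Map an HTTP status code to available=true/false.
# "000" is what curl's %{http_code} reports when no response was received.
classify_status() {
  case "$1" in
    5[0-9][0-9]|000|"") echo "false" ;;  # 5xx or network failure: skip Claude
    *)                  echo "true"  ;;  # 2xx, 401/403, 429, ...: let the action run
  esac
}

# Probe the models endpoint with a 15s timeout and print only the status code.
# Assumes the key is exported as ANTHROPIC_API_KEY (an assumption).
preflight_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 15 \
    -H "x-api-key: ${ANTHROPIC_API_KEY:-}" \
    -H "anthropic-version: 2023-06-01" \
    https://api.anthropic.com/v1/models || true
}
```

In a workflow step, the result could be exposed to later steps with something like `echo "available=$(classify_status "$(preflight_status)")" >> "$GITHUB_OUTPUT"`.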

**Layer 2 — Runtime protection** (catches mid-execution degradation):
- `continue-on-error: true` on the Claude step
- Post-execution step checks if Claude failed
- If failed AND API is now returning 500 → skip gracefully (service
issue)
- If failed AND API is now returning 200 → re-fail with "legitimate
error" message
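
The Layer 2 triage can be read as a small decision table. The sketch below is one possible shape under stated assumptions: `triage_result` and `handle_result` are hypothetical names, and the first argument is assumed to come from `steps.<id>.outcome`, which reports the step result before `continue-on-error` is applied.

```shell
# Hypothetical sketch of the Layer 2 triage; names are illustrative only.
# $1: outcome of the Claude step ("success" or "failure")
# $2: API availability re-checked after the step ("true" or "false")
triage_result() {
  if [ "$1" != "failure" ]; then
    echo "pass"   # Claude step did not fail; nothing to triage
  elif [ "$2" = "true" ]; then
    echo "fail"   # API is healthy, so the failure is a legitimate error
  else
    echo "skip"   # API is down; treat the failure as a service issue
  fi
}

# Turn the verdict into step behaviour: warn and succeed, or re-fail the job.
handle_result() {
  case "$(triage_result "$1" "$2")" in
    skip) echo "::warning::Claude API unavailable, skipping"; return 0 ;;
    fail) echo "::error::Claude failed while the API was available (legitimate error)"; return 1 ;;
    *)    return 0 ;;
  esac
}
```

Keeping the verdict in a pure function separates the policy from the workflow commands, so the decision can be exercised without a real outage.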

## Test

Validated in `dotCMS/core-workflow-test#460`:
- Pre-flight check correctly passes when API is available
- `Handle Claude execution result` correctly re-fails for non-service
errors (workflow validation failure in test PR)
- The skip path is code-correct (would activate when API returns 5xx)

## Consumer repos to update after merge

- `dotCMS/core` — update `@v2.0.0` → new tag
- `dotCMS/core-workflow-test` — update `@v2.0.0` → new tag

Fixes: dotCMS/core#35328

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Update ai-workflows reference from v2.0.0 to v2.1.0 which adds
graceful handling when the Anthropic API is unavailable.

Ref: dotCMS#35328

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-merge-queue bot pushed a commit to dotCMS/core that referenced this pull request Apr 15, 2026
…ty skip (#35336)

## Summary

- Bump `ai-workflows` reference from `v2.0.0` → `v2.1.0` in
`ai_claude-orchestrator.yml`
- `v2.1.0` adds pre-flight Anthropic API availability check + runtime
error triage

## Problem

When the Claude service has an outage, all PR pipelines fail with:
```
API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"}} · check status.claude.com
```

Example blocked run:
https://github.com/dotCMS/core/actions/runs/24461196854

## What changed in `v2.1.0` (`ai-workflows`)

**Pre-flight check (Layer 1):**
- Calls `GET /v1/models` before running claude-code-action (15s timeout,
2 retries)
- 5xx or network failure → `available=false` → skip Claude step
gracefully with `::warning::`
- 401/403/429 or other codes → proceed so the action surfaces the
specific error
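
A retrying version of the pre-flight check might look like the sketch below. It is an illustration under assumptions, not the `v2.1.0` implementation: `probe_with_retries` is a hypothetical name, "2 retries" is read as one initial attempt plus up to two more, and the 5-second default delay is invented for the example.

```shell
# Hypothetical sketch of a retrying pre-flight probe; names are illustrative only.
# $1: command that prints an HTTP status code ("000" when no response arrived)
# $2: number of retries after the first attempt (default 2)
# $3: seconds to wait between attempts (default 5, an assumed value)
probe_with_retries() {
  probe="$1"; retries="${2:-2}"; delay="${3:-5}"
  attempt=0
  while :; do
    code="$("$probe")"
    case "$code" in
      5[0-9][0-9]|000|"") ;;                  # outage-like response: maybe retry
      *) echo "available=true"; return 0 ;;   # any other code: proceed to Claude
    esac
    if [ "$attempt" -ge "$retries" ]; then
      break                                   # retries exhausted
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "available=false"
}
```

Passing the probe as a command name keeps the loop independent of curl, so a stub such as `always_500() { echo 500; }` is enough to exercise the outage path.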

**Runtime protection (Layer 2):**
- `continue-on-error: true` on the Claude step
- Post-execution step re-checks the API if Claude failed
- API available after failure → re-fail the job ("legitimate error")
- API unavailable after failure → skip gracefully ("service
degradation")

## Test

Validated in dotCMS#460 (also updated to `v2.1.0`):
- Pre-flight check correctly identifies API availability
- Legitimate errors still surface and fail the job (correct)
- Service outage path is code-correct (pre-flight would skip before
Claude runs)

Fixes #35328

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>