Allow sst refresh to fail without blocking deploy#614
Conversation
sst refresh exits non-zero when it detects state changes, even on success. Adding continue-on-error so deploy still proceeds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| aws-region: ${{ env.AWS_REGION_INPUT }} | ||
|
|
||
| - name: Refresh SST state | ||
| continue-on-error: true |
There was a problem hiding this comment.
🔴 sst deploy proceeds unconditionally after sst refresh failure, risking production deployment on stale/corrupt state
Adding continue-on-error: true to the "Refresh SST state" step means that if sst refresh --stage production fails (e.g., due to state corruption, resource drift, or permission errors), the workflow silently continues to sst deploy --stage production at line 74. Since sst refresh synchronizes the state file with actual cloud infrastructure, deploying against a stale or inconsistent state can cause resource conflicts, duplicate resources, or failed deployments that are harder to recover from. The deploy step has no if: condition checking the refresh outcome — it runs regardless. At minimum, the refresh step should have an id and the deploy step should either gate on the refresh outcome or log a prominent warning.
Prompt for agents
In .github/workflows/deploy-web.yml, instead of unconditionally swallowing the sst refresh failure, give the refresh step an id and add a conditional warning + gating logic:
1. On line 61, add `id: refresh` to the "Refresh SST state" step (keep continue-on-error: true).
2. Before the "Deploy SST app" step (line 69), add a new step that checks the refresh outcome and emits a warning:
- name: Warn on refresh failure
if: steps.refresh.outcome == 'failure'
run: echo '::warning::SST refresh failed — deploying with potentially stale state'
3. Alternatively, if a refresh failure should block deployment in most cases, remove continue-on-error: true and instead add retry logic (e.g., using nick-fields/retry action) to handle transient failures while still failing on genuine state corruption.
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
continue-on-error: trueto the refresh stepsst refreshexits non-zero when it detects state changes (even when it succeeds), which was blocking the deploy stepTest plan
🤖 Generated with Claude Code