Skip to content

fix(app-platform): handle ActiveDeployment.Phase transition states (#48)#49

Merged
intel352 merged 1 commit intomainfrom
fix/health-poll-active-slot-phase
May 2, 2026
Merged

fix(app-platform): handle ActiveDeployment.Phase transition states (#48)#49
intel352 merged 1 commit intomainfrom
fix/health-poll-active-slot-phase

Conversation

@intel352
Copy link
Copy Markdown
Contributor

@intel352 intel352 commented May 2, 2026

Summary

Closes #48. Fixes appHealthResult false-negative when the ActiveDeployment slot is populated but its Phase has not yet transitioned to Active.

Bug

Surfaced in core-dump staging deploy 25258558716: 1-second transition from "deployment in progress: DEPLOYING" to "no deployment found" → 10-minute timeout instead of detecting the eventually-healthy deployment.

When DO promotes a deployment from InProgressDeployment to ActiveDeployment, there is a window where InProgressDeployment is nil but ActiveDeployment.Phase is still Deploying (or another transitioning phase). The previous code fell through all checks and returned "no deployment found" in that window.

Fix

When ActiveDeployment is populated but Phase != Active, classify the deployment by phase (mirrors the existing InProgressDeployment switch logic). The "no deployment found" terminal branch only fires when all three slots (ActiveDeployment, InProgressDeployment, PendingDeployment) are nil.

Test plan

  • go test ./internal/drivers/ -run TestAppHealth — 4 new tests cover the active-slot transition states + priority-vs-InProgress regression
  • go test ./... — full suite green (159 tests across 3 packages)
  • Validates post-merge against core-dump staging deploy via PR-D3 lockfile bump chain

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

When DO promotes a deployment from the InProgressDeployment slot to the
ActiveDeployment slot, there is a window where InProgressDeployment is
nil but ActiveDeployment.Phase is still transitioning (e.g. Deploying,
PendingDeploy) before reaching Active. The previous appHealthResult
returned "no deployment found" in that window, locking polling callers
into a false-negative loop until timeout.

Surfaced during core-dump staging deploy (run 25258558716): 1-second
transition from "deployment in progress: DEPLOYING" to "no deployment
found" → 10-minute timeout instead of healthy completion.

Fix: when ActiveDeployment is populated but Phase != Active, classify
based on the phase using the same priority order as the InProgress
switch. Building/PendingDeploy/Deploying → in-progress (active slot);
Error/Canceled/Superseded → failed (active slot). The "no deployment
found" branch only fires when all three slots (Active, InProgress,
Pending) are nil.

Tests cover the four scenarios:
- ActiveDeployment with Deploying phase → in-progress
- ActiveDeployment with Error phase → failed
- ActiveDeployment with Active phase → healthy (regression)
- Active(Active) + InProgress(Deploying) priority → healthy

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 2, 2026 18:32
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes App Platform health-check classification during a DigitalOcean deployment slot transition where ActiveDeployment is populated before its phase becomes Active, preventing false "no deployment found" results and long health-check timeouts. It updates the shared deployment-slot health logic used by App Platform resources and adds regression tests around the transition window described in issue #48.

Changes:

  • Teach appHealthResult to classify non-Active ActiveDeployment phases as in-progress, failed, or unknown instead of falling through to "no deployment found".
  • Preserve the existing healthy-path precedence when ActiveDeployment.Phase == Active, even if InProgressDeployment is also present.
  • Add App Platform regression tests covering active-slot deploying, error, healthy, and precedence scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
internal/drivers/app_platform.go Extends shared App Platform/API Gateway health evaluation to handle transitional ActiveDeployment phases.
internal/drivers/app_platform_test.go Adds regression tests for active-slot transition behavior and priority handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +947 to +963
// TestAppHealthResult_ActiveSlotErrorPhase covers terminal failure when the
// ActiveDeployment slot is populated with a failed phase.
func TestAppHealthResult_ActiveSlotErrorPhase(t *testing.T) {
d := drivers.NewAppPlatformDriverWithClient(&mockAppClient{
app: appWithPhases(phasePtr(godo.DeploymentPhase_Error), nil, nil),
}, "nyc3")
result, err := d.HealthCheck(context.Background(), interfaces.ResourceRef{Name: "phased-app", ProviderID: "f8b6200c-3bba-48a7-8bf1-7a3e3a885eb5"})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if result.Healthy {
t.Error("expected Healthy=false for ActiveDeployment.Phase=Error")
}
if !strings.Contains(result.Message, "failed") {
t.Errorf("message should contain 'failed', got: %q", result.Message)
}
}
Comment on lines +251 to +275
if dep := app.ActiveDeployment; dep != nil {
switch dep.Phase {
case godo.DeploymentPhase_PendingBuild,
godo.DeploymentPhase_Building,
godo.DeploymentPhase_PendingDeploy,
godo.DeploymentPhase_Deploying:
return &interfaces.HealthResult{
Healthy: false,
Message: fmt.Sprintf("deployment in progress (active slot): %s", dep.Phase),
}
case godo.DeploymentPhase_Error,
godo.DeploymentPhase_Canceled,
godo.DeploymentPhase_Superseded:
return &interfaces.HealthResult{
Healthy: false,
Message: fmt.Sprintf("deployment failed (active slot): %s", dep.Phase),
}
default:
// Forward-compat: a future godo release may add new phases.
// Report "unknown" rather than "failed" to avoid mislabeling.
return &interfaces.HealthResult{
Healthy: false,
Message: fmt.Sprintf("unknown phase (active slot): %s", dep.Phase),
}
}
Comment on lines +246 to +251
// 2a. ActiveDeployment populated but Phase not yet Active — covers the
// post-promotion-pre-active window where DO has moved the deployment out of
// the InProgressDeployment slot but its Phase is still transitioning toward
// Active. Without this check, the polling loop falls through to "no
// deployment found" and loops until timeout (issue #48).
if dep := app.ActiveDeployment; dep != nil {
Comment on lines +915 to +926
// ── Active-slot phase-transition tests (issue #48) ───────────────────────────
//
// When DO promotes a deployment from InProgressDeployment to ActiveDeployment,
// there is a window where InProgressDeployment is nil but
// ActiveDeployment.Phase is still transitioning (e.g. Deploying, PendingDeploy)
// before reaching Active. The previous appHealthResult returned "no deployment
// found" in that window, locking polling callers into a false-negative loop.

// TestAppHealthResult_ActiveSlotDeployingPhase covers the core bug: ActiveDeployment
// with Deploying phase and no InProgressDeployment must return in-progress, not
// "no deployment found".
func TestAppHealthResult_ActiveSlotDeployingPhase(t *testing.T) {
@intel352 intel352 merged commit d826aed into main May 2, 2026
7 checks passed
@intel352 intel352 deleted the fix/health-poll-active-slot-phase branch May 2, 2026 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AppPlatformDriver health-poll: 'no deployment found' false-negative during InProgress→Active transition

2 participants