fix: handle stopped web app during deploy status tracking #7773
Pull request overview
Fixes an azd deploy infinite polling scenario when deploying to an administratively stopped Linux App Service by detecting “RuntimeStarting with 0 instances” and short-circuiting after a small threshold, and adds an opt-out env var to skip runtime status tracking.
Changes:
- Add “no instances” detection + threshold logic to App Service zip deploy status tracking to prevent indefinite polling for stopped apps.
- Introduce AZD_DEPLOY_{SERVICE_NAME}_SKIP_STATUS_CHECK to bypass runtime status tracking for a given App Service service.
- Add unit tests and a functional sample to reproduce/validate the stopped-webapp scenario; update env var documentation.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| cli/azd/pkg/azsdk/zip_deploy_client.go | Adds no-instances threshold handling and richer status evaluation result for deploy polling. |
| cli/azd/pkg/azsdk/zip_deploy_client_test.go | Adds unit tests covering zero-instance detection, nil fields, and error cases. |
| cli/azd/pkg/azapi/webapp.go | Adds skipStatusCheck to optionally bypass runtime deploy status tracking. |
| cli/azd/pkg/azapi/azure_client_linuxwebapp_test.go | Extends tests to cover the skip-status-check path with basic zip deploy mocks. |
| cli/azd/pkg/project/service_target_appservice.go | Reads service-scoped env var to skip status tracking and plumbs the flag into deploy calls. |
| cli/azd/pkg/project/service_targets_coverage3_test.go | Adds tests validating env var name generation for the new skip-status-check variable. |
| cli/azd/docs/environment-variables.md | Documents the new AZD_DEPLOY_{SERVICE_NAME}_SKIP_STATUS_CHECK variable (and slot env var). |
| cli/azd/test/functional/testdata/samples/webapp-stopped/azure.yaml | Adds a functional test sample project targeting App Service. |
| cli/azd/test/functional/testdata/samples/webapp-stopped/src/app.py | Minimal Flask app for the new stopped-webapp functional sample. |
| cli/azd/test/functional/testdata/samples/webapp-stopped/src/requirements.txt | Python dependencies for the new functional sample. |
| cli/azd/test/functional/testdata/samples/webapp-stopped/infra/main.bicep | Subscription-scope infra for the new functional sample (RG + module). |
| cli/azd/test/functional/testdata/samples/webapp-stopped/infra/resources.bicep | Defines App Service plan + Linux web app resources for the sample. |
| cli/azd/test/functional/testdata/samples/webapp-stopped/infra/main.parameters.json | Parameters file wiring env name/location into the sample deployment. |
jongio left a comment
The tests cover the new paths well. One architectural question worth considering before merge:
The app object in DeployAppServiceZip already carries app.Properties.State (READ-ONLY on SiteProperties, values like Running / Stopped). That lets you detect a stopped app deterministically and skip DeployTrackStatus without the noInstancesThreshold heuristic or the new env var workaround. It addresses the root cause (a stopped app will never register instances) instead of inferring intent from the poll pattern.
If you go that route, the env var becomes an escape hatch for similar API anomalies rather than the primary fix. If you keep the current approach, the threshold of 3 needs justification, since 3 polls is roughly 9 seconds at the existing 3s interval and a healthy cold-starting container can briefly report 0 instances.
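The deterministic check suggested here could look roughly like the sketch below. It is a minimal illustration, not the code in this PR: the `Site` and `SiteProperties` types are simplified stand-ins for the SDK types, modeling only the `State` field mentioned above, and treating a missing state as "not stopped" is an assumption of the sketch.

```go
package main

import "fmt"

// SiteProperties and Site are simplified, hypothetical stand-ins for the
// App Service SDK types; only the State field from the review is modeled.
type SiteProperties struct {
	State *string // read-only, e.g. "Running" or "Stopped"
}

type Site struct {
	Properties *SiteProperties
}

// isAppStopped reports whether the site state is exactly "Stopped".
// A nil site, nil properties, or nil state is treated as "not stopped"
// (an assumption), so status tracking still runs when the state is unknown.
func isAppStopped(app *Site) bool {
	if app == nil || app.Properties == nil || app.Properties.State == nil {
		return false
	}
	return *app.Properties.State == "Stopped"
}

func main() {
	stopped := "Stopped"
	app := &Site{Properties: &SiteProperties{State: &stopped}}
	if isAppStopped(app) {
		fmt.Println("app is stopped: skip runtime status tracking")
	}
}
```

Because the state is read once from the app object, there is no timing window to tune, which is the main difference from a poll-count threshold.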
Also: the webapp-stopped testdata sample isn't referenced by any Go test in the repo. If it's only for manual repro, add a short README in the sample folder explaining that.
When deploying to a stopped Linux web app, azd deploy would poll
indefinitely showing 'Starting runtime process, 0 in progress instances,
0 successful instances'. This happened because the deployment status API
never transitions to RuntimeSuccessful when the app has no running
instances.
Changes:
- Detect 0 total instances during RuntimeStarting status and treat as
successful deployment after 3 consecutive polls (avoids false positives
from transient states)
- Add nil-pointer guards for instance counter fields in the status
response to prevent panics
- Add AZD_DEPLOY_{SERVICE_NAME}_SKIP_STATUS_CHECK env var to let users
opt out of runtime status tracking entirely (workaround for similar
issues)
- Add unit tests for the zero-instance detection, nil counters, skip
status check, and env var name generation
- Add webapp-stopped test sample for reproducing the issue
- Document new env var in environment-variables.md
Fixes #7708
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
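The zero-instance heuristic described in this commit message can be sketched as follows. This is only an illustration of the idea (count consecutive RuntimeStarting polls with no instances, with nil-safe counters); the `deployStatus` struct, its field names, and the `zeroInstanceTracker` type are hypothetical and are not taken from the azd source.

```go
package main

import "fmt"

// deployStatus is a hypothetical, simplified view of one status poll.
// Counters are pointers because the service may omit them.
type deployStatus struct {
	Status             string
	NumberOfInProgress *int32
	NumberOfSuccessful *int32
	NumberOfFailed     *int32
}

// noInstancesThreshold is the number of consecutive empty polls to tolerate.
const noInstancesThreshold = 3

// deref returns 0 for a nil counter instead of panicking.
func deref(p *int32) int32 {
	if p == nil {
		return 0
	}
	return *p
}

// zeroInstanceTracker counts consecutive polls that report
// RuntimeStarting with zero total instances.
type zeroInstanceTracker struct {
	consecutive int
}

// observe records one poll and reports whether the threshold was reached.
// Any poll that shows an instance or a different status resets the count.
func (t *zeroInstanceTracker) observe(s deployStatus) bool {
	total := deref(s.NumberOfInProgress) + deref(s.NumberOfSuccessful) + deref(s.NumberOfFailed)
	if s.Status == "RuntimeStarting" && total == 0 {
		t.consecutive++
	} else {
		t.consecutive = 0
	}
	return t.consecutive >= noInstancesThreshold
}

func main() {
	var tracker zeroInstanceTracker
	for poll := 1; poll <= noInstancesThreshold; poll++ {
		done := tracker.observe(deployStatus{Status: "RuntimeStarting"})
		fmt.Printf("poll %d: stop polling = %v\n", poll, done)
	}
}
```

Resetting the counter on any non-empty poll is what keeps a briefly empty cold start from being mistaken for a stopped app, although the threshold itself remains a judgment call.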
- Add nil guard for response.Properties/Status in poller.Done() branch
- Add BuildSuccessful to allowed terminal states
- Use t.Context() instead of context.Background() in new test
- Fix 'the the' typo in bicep description
- Simplify new(int32(0)) to new(int32) for zero values
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename test to follow _Coverage3 suffix convention
- Use 'response or its properties are empty' error message for nil properties/status to match resumeDeployment fallback handling
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
d0808af to 617dca7
…istic
- Replace noInstances polling heuristic with isAppStopped() check on app.Properties.State (addresses root cause directly)
- Use strconv.ParseBool for SKIP_STATUS_CHECK env var so false/0 works
- Add functional test Test_CLI_Deploy_StoppedWebApp
- Add isAppStopped unit tests (Stopped, Running, NilState, NilProperties)
- Add StoppedApp integration test in azure_client_linuxwebapp_test.go
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Simplified az CLI calls in Test_CLI_Deploy_StoppedWebApp and added //nolint:gosec directives since arguments come from test infrastructure, not user input. Removed unnecessary azd x run fallback pattern.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The functional test Test_CLI_Deploy_StoppedWebApp uses az CLI calls (webapp stop/show) that bypass the recording proxy, so it cannot be replayed. Added playback skip guard and empty cassette file so the test is skipped in CI playback mode.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jongio left a comment
Switching to isAppStopped(app) makes this deterministic, no heuristic timing window to defend. The functional test exercises the real path (provision, az webapp stop, deploy), and strconv.ParseBool on the env var handles the false/0 case correctly. Nice follow-through on the iteration.
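For readers unfamiliar with the false/0 point, here is a minimal sketch of parsing such a flag with `strconv.ParseBool`. The `skipStatusCheck` helper and its fallback to false for unset or unparsable values are assumptions for illustration, not the behavior confirmed by this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// skipStatusCheck interprets the raw value of a skip-status-check variable.
// Unset or unparsable values fall back to false (an assumption of this
// sketch), so status tracking stays enabled by default.
func skipStatusCheck(raw string, isSet bool) bool {
	if !isSet {
		return false
	}
	skip, err := strconv.ParseBool(raw)
	if err != nil {
		return false
	}
	return skip
}

func main() {
	for _, v := range []string{"true", "1", "false", "0", "yes"} {
		fmt.Printf("%q -> %v\n", v, skipStatusCheck(v, true))
	}
}
```

A plain "is the variable non-empty" test would treat `false` and `0` as opting in, which is the case the parsed form avoids.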
LGTM.
wbreza left a comment
Looks good, @vhvb1989. The pivot to deterministic isAppStopped(app) per @jongio's feedback is the right call, nil guards are correct, the BuildSuccessful terminal-status addition is a nice adjacent fix, and the env-var escape hatch is documented in docs/environment-variables.md. Approving.
A few follow-up items that aren't blockers for this PR but worth tracking (happy to file issues if helpful):
🟡 Follow-ups (not blocking)
- Other non-running states. isAppStopped matches only State == "Stopped". App Service can also report Suspended (free-tier quota) and StoppingSite/StartingSite transients; those would still re-enter the infinite-poll trap. The env-var escape hatch is the only remedy today.
- Slot deploys bypass the check. DeployAppServiceSlotZip (pkg/azapi/webapp.go:333) is a separate code path that doesn't call isAppStopped and doesn't receive skipStatusCheck. Deploying to a stopped slot will hang with the same symptom. Scoping to "main site only" is reasonable for this fix; just wanted to flag the gap.
- Parent-vs-slot state mismatch. isAppStopped(app) reads the root site's State, not the target slot's. Related to the slot item above.
- Post-deploy UX when app is stopped. Today we log "Web app is stopped — skipping runtime deployment status tracking" and then return success. A user who forgot they stopped the app could see a green deploy and wonder why nothing is serving the new code. Consider either (a) appending a one-line follow-up ("To serve the deployed code, start the app: az webapp start --name ... --resource-group ...") or (b) surfacing this in ServiceDeployResult.OperationDisplayMessages.
- Env-var opt-in has no user-visible signal. If someone sets AZD_DEPLOY_{SERVICE}_SKIP_STATUS_CHECK=true on a running app, the deploy silently skips status tracking with no log. A one-line "Skipping runtime status tracking (AZD_DEPLOY_…_SKIP_STATUS_CHECK set)" would improve diagnosability.
🟢 Minor
- deploymentStatusResult is a single-field struct today, vestigial from the earlier heuristic design. Keep if you're expecting to add fields soon; otherwise it could revert to error.
- Test_CLI_Deploy_StoppedWebApp is live-only (cassette is a shim; t.Skip in playback). Consistent with repo convention, so no action. Just noting that the functional regression for #7708 relies on the existing unit coverage (azure_client_linuxwebapp_test.go, zip_deploy_client_test.go) for CI, which looks solid.
⚠️ CI
azure-dev - cli Linux leg showed a failure (1h28m) while Mac/Windows/ARM64 are green — almost certainly a flake but worth a re-run before merging.
Nice work on #7708.
Problem
When deploying to a stopped Linux web app, azd deploy polls indefinitely, showing 'Starting runtime process, 0 in progress instances, 0 successful instances'. This causes CI/CD pipelines to time out.
Root Cause
The deployment status API (DeployTrackStatus) returns RuntimeStarting with 0 total instances for a stopped app, but the code treats this as "keep polling". The poller never transitions to RuntimeSuccessful, causing an infinite loop.
Fix
Core Fix
In logWebAppDeploymentStatus(), detect when status is RuntimeStarting and total instances (in-progress + successful + failed) == 0. After 3 consecutive polls with 0 instances (to avoid false positives from transient states), treat the deployment as successful since the zip upload already completed.
Env Var Workaround
Added AZD_DEPLOY_{SERVICE_NAME}_SKIP_STATUS_CHECK env var to let users opt out of runtime status tracking entirely. This provides an immediate workaround for users affected by this or similar issues.
Additional Improvements
- Changed logWebAppDeploymentStatus return type to a struct for richer signaling
Testing
- Added webapp-stopped test sample in test/functional/testdata/samples/ for reproducing the issue
- Stopped the app with az webapp stop, confirmed the bug, then verified the fix resolves it
Documentation
- Updated cli/azd/docs/environment-variables.md with the new env var
Fixes #7708