Detect active deployments before provisioning #7251
Pull request overview
Adds a pre-deployment check that detects in-progress ARM deployments at the target scope and waits for them to complete, avoiding DeploymentActive failures during provisioning.
Changes:
- Introduces `waitForActiveDeployments()` in the Bicep provisioning flow and polls until deployments clear or a timeout is reached.
- Adds `IsActiveDeploymentState()` plus new tests to classify which provisioning states are considered "active".
- Extends `infra.Scope` with `ListActiveDeployments()` and adds a `DeploymentActive` error suggestion rule.
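A minimal sketch of the state classification described in the list above. The `ProvisioningState` type and constant names here are hypothetical stand-ins, not the actual `azapi` definitions. Only the four transitional states named in the PR description are taken from the source; `Running` and `Accepted` are assumed examples, and the real helper reportedly covers 11 states in total.

```go
package main

import "fmt"

// ProvisioningState is a hypothetical stand-in for the azapi state type.
type ProvisioningState string

const (
	// Transitional states named in the PR description.
	StateCanceling               ProvisioningState = "Canceling"
	StateDeleting                ProvisioningState = "Deleting"
	StateDeletingResources       ProvisioningState = "DeletingResources"
	StateUpdatingDenyAssignments ProvisioningState = "UpdatingDenyAssignments"
	// Assumed examples of in-progress states (not confirmed by the PR text).
	StateRunning  ProvisioningState = "Running"
	StateAccepted ProvisioningState = "Accepted"
)

// activeStates is an illustrative subset; the real list is longer.
var activeStates = map[ProvisioningState]struct{}{
	StateCanceling:               {},
	StateDeleting:                {},
	StateDeletingResources:       {},
	StateUpdatingDenyAssignments: {},
	StateRunning:                 {},
	StateAccepted:                {},
}

// IsActiveDeploymentState reports whether a deployment in the given state
// could still block a new deployment on the same scope.
func IsActiveDeploymentState(s ProvisioningState) bool {
	_, ok := activeStates[s]
	return ok
}

func main() {
	for _, s := range []ProvisioningState{StateCanceling, "Succeeded", "Failed"} {
		fmt.Printf("%s active=%t\n", s, IsActiveDeploymentState(s))
	}
}
```

A set lookup keeps the classification in one place, so adding or removing a state does not touch the callers.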
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| cli/azd/resources/error_suggestions.yaml | Adds a user-facing suggestion for DeploymentActive ARM errors. |
| cli/azd/pkg/infra/scope.go | Extends scope interface + implements ListActiveDeployments() for RG and subscription scopes. |
| cli/azd/pkg/infra/provisioning/bicep/bicep_provider.go | Adds wait loop before deployment submission, with polling/timeout defaults. |
| cli/azd/pkg/infra/provisioning/bicep/bicep_provider_test.go | Updates mocked scope to satisfy the new Scope interface. |
| cli/azd/pkg/infra/provisioning/bicep/active_deployment_check_test.go | Adds tests covering wait-loop behavior, errors, cancellation, and timeout. |
| cli/azd/pkg/azapi/deployments.go | Adds IsActiveDeploymentState() helper. |
| cli/azd/pkg/azapi/deployment_state_test.go | Adds unit tests for active/inactive state classification. |
f456f95 to 8829c13
Telemetry Context: DeploymentActive + Retry Behavior
This PR addresses
Retry behavior makes this especially valuable
Of machines that hit
Time savings
This is a clean win: the fix is architecturally simple (poll + wait) and eliminates a category of failure that can never be solved by retrying.
8829c13 to 38516a8
@copilot - will you check to ensure we have metrics coverage so we can see how often this error and fix occur after merging this change?
@kristenwomack I've opened a new pull request, #7288, to work on those changes. Once the pull request is ready, I'll request review from you.
Converting to draft: the active deployment check integration point in
Need to either:
The standalone tests for
3732718 to f1f8bfc
/azp run
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.
Before starting a Bicep deployment, check the target scope for in-progress ARM deployments and wait for them to complete. This avoids the DeploymentActive error that ARM returns after ~5 minutes when a concurrent deployment is already running on the same resource group.

Changes:
- Add IsActiveDeploymentState() helper in azapi to classify provisioning states as active or terminal.
- Add ListActiveDeployments() to the infra.Scope interface and both ResourceGroupScope / SubscriptionScope implementations.
- Add waitForActiveDeployments() in the Bicep provider, called after preflight validation and before deployment submission. It polls until active deployments clear or a 30-minute timeout is reached.
- Add a DeploymentActive error suggestion rule to error_suggestions.yaml.
- Add unit tests for state classification, polling, timeout, error handling, and context cancellation.

Fixes #7248

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…per, refresh timeout names

- Fix 'range 200' compile error (not valid in all Go versions)
- Make DeploymentActive YAML rule scope-agnostic
- Extract filterActiveDeployments helper to deduplicate scope logic
- Refresh deployment names from latest poll on timeout message

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
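For context on the 'range 200' fix: ranging over an integer (`for range 200`) is only accepted from Go 1.22 onward, so it fails to compile on older toolchains. The PR's actual code is not shown here; the following is a hedged sketch of the portable form, using a hypothetical `manyNames` helper.

```go
package main

import "fmt"

// manyNames builds n placeholder deployment names with a classic
// three-clause loop, which compiles on every Go version.
// (Hypothetical helper for illustration only.)
func manyNames(n int) []string {
	names := make([]string, 0, n)
	for i := 0; i < n; i++ {
		names = append(names, fmt.Sprintf("deploy-%d", i))
	}
	return names
}

func main() {
	fmt.Println(len(manyNames(200))) // prints 200
}
```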
Move ListActiveDeployments to a standalone function instead of adding it to the exported Scope interface. Adding methods to exported interfaces is a breaking change for any external implementation (including test mocks in CI). The standalone infra.ListActiveDeployments(ctx, scope) function calls scope.ListDeployments and filters for active states, achieving the same result without widening the interface contract.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The agent rewrote the entire YAML file, causing a 925-line diff that broke CI tests. Reset to main and add only the DeploymentActive rule (11 lines) as intended.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The deployment object returned by generateDeploymentObject embeds a Scope that can be nil in test environments (e.g. mockedScope returns an empty SubscriptionDeployment). Using scopeForTemplate resolves the scope from the provider's configuration, avoiding nil panics in existing tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
f1f8bfc to 74a9edb
Add lessons learned from recent PR reviews (#7290, #7251, #7250, #7247, #7236, #7235, #7202, #7039) as agent instructions to prevent recurring review findings.

New sections:
- Error handling: ErrorWithSuggestion completeness, telemetry service attribution, scope-agnostic messages
- Architecture boundaries: pkg/project target-agnostic, extension docs
- Output formatting: shell-safe paths, consistent JSON contracts
- Path safety: traversal validation, quoted paths in messages
- Testing best practices: test actual rules, extract shared helpers, correct env vars, TypeScript patterns, efficient dir checks
- CI/GitHub Actions: permissions, PATH handling, artifact downloads, prefer ADO for secrets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add lessons learned from team and Copilot reviews across PRs #7290, #7251, #7250, #7247, #7236, #7235, #7202, #7039 as agent instructions to prevent recurring review findings.

New/expanded sections:
- Error handling: ErrorWithSuggestion field completeness, telemetry service attribution, scope-agnostic messages, link/suggestion parity, stale data in polling loops
- Architecture boundaries: pkg/project target-agnostic, extension docs separation, env var verification against source code
- Output formatting: shell-safe quoted paths, consistent JSON types
- Path safety: traversal validation, quoted paths in messages
- Code organization: extract shared logic across scopes
- Documentation standards: help text consistency, no dead references, PR description accuracy
- Testing best practices: test YAML rules e2e, extract shared helpers, correct env vars (AZD_FORCE_TTY, NO_COLOR), TypeScript patterns, reasonable timeouts, cross-platform paths, test new JSON fields
- CI / GitHub Actions: permissions blocks, PATH handling, cross-workflow artifacts, prefer ADO for secrets, no placeholder steps

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/azp run azure-dev - cli
Azure Pipelines successfully started running 1 pipeline(s).
Summary
Fixes #7248
Before starting a deployment, azd now checks for active deployments on the target scope. If another deployment is in progress, it warns the user and waits for it to complete, avoiding the `DeploymentActive` ARM error that wastes ~5 minutes of the user's time.
Telemetry Context
- `DeploymentActive` failures in March (~270/month projected)
- `provision`, 19% from `up`
Changes
Pre-deployment active check (`bicep_provider.go`)
Added `waitForActiveDeployments()` between preflight validation and deployment submission:
- `ErrDeploymentsNotFound` (scope doesn't exist yet); other errors propagate
Active state classification (`deployments.go`)
`IsActiveDeploymentState()` classifies 11 provisioning states as active, including transitional states (`Canceling`, `Deleting`, `DeletingResources`, `UpdatingDenyAssignments`) that can still block new deployments.
Scope interface (`scope.go`)
Added `ListActiveDeployments()` to both `ResourceGroupScope` and `SubscriptionScope`.
Error suggestion (`error_suggestions.yaml`)
Added `DeploymentActive` rule with user-friendly message and ARM troubleshooting link.
Test Coverage (8 tests, 24 subtests)
- `TestIsActiveDeploymentState`
- `TestWaitForActiveDeployments_NoActive`
- `TestWaitForActiveDeployments_InitialListError_NotFound`
- `TestWaitForActiveDeployments_InitialListError_Other`
- `TestWaitForActiveDeployments_ActiveThenClear`
- `TestWaitForActiveDeployments_CancelledContext`
- `TestWaitForActiveDeployments_PollError`
- `TestWaitForActiveDeployments_Timeout`
Related