Detect active deployments before provisioning #7251

Draft
spboyer wants to merge 11 commits into main from fix/deployment-active-conflict

@spboyer
Member

@spboyer spboyer commented Mar 23, 2026

Summary

Fixes #7248

Before starting a deployment, azd now checks for active deployments on the target scope. If another deployment is in progress, it warns the user and waits for it to complete. This avoids the DeploymentActive ARM error, which wastes ~5 minutes of the user's time.

Telemetry Context

  • 199 DeploymentActive failures in March (~270/month projected)
  • Average wait before failure: 5.3 minutes (P90: 12.2 min)
  • 78% from provision, 19% from up

Changes

Pre-deployment active check (bicep_provider.go)

Added waitForActiveDeployments() between preflight validation and deployment submission:

  • Lists deployments filtered for active provisioning states
  • If found: warns with deployment names, polls at 30s intervals
  • Timeout: 30 minutes (matches typical long deployments)
  • Only ignores ErrDeploymentsNotFound (scope doesn't exist yet); other errors propagate

Active state classification (deployments.go)

IsActiveDeploymentState() classifies 11 provisioning states as active, including transitional states (Canceling, Deleting, DeletingResources, UpdatingDenyAssignments) that can still block new deployments.
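
A classifier of this kind might look like the sketch below. The set shown is an illustrative subset only: the four transitional states come from the description above, while treating Accepted and Running as active is an assumption. `isActiveState` is a hypothetical name and plain strings stand in for the SDK's provisioning-state type.

```go
package main

import "fmt"

// activeStates is an illustrative subset, not the PR's full list of 11.
var activeStates = map[string]bool{
	// Assumed here to count as active (not confirmed by the PR text).
	"Accepted": true,
	"Running":  true,
	// Transitional states named in the PR description.
	"Canceling":               true,
	"Deleting":                true,
	"DeletingResources":       true,
	"UpdatingDenyAssignments": true,
}

// isActiveState reports whether a provisioning state can still block
// a new deployment. Unknown states are treated as inactive.
func isActiveState(state string) bool {
	return activeStates[state]
}

func main() {
	for _, s := range []string{"Running", "Canceling", "Succeeded", "Failed"} {
		fmt.Printf("%-10s active=%v\n", s, isActiveState(s))
	}
}
```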

Scope interface (scope.go)

Added ListActiveDeployments() to both ResourceGroupScope and SubscriptionScope.

Error suggestion (error_suggestions.yaml)

Added DeploymentActive rule with user-friendly message and ARM troubleshooting link.

Test Coverage (8 tests, 24 subtests)

Test | Coverage
TestIsActiveDeploymentState | 17 subtests covering all provisioning states
TestWaitForActiveDeployments_NoActive | Happy path
TestWaitForActiveDeployments_InitialListError_NotFound | RG doesn't exist yet
TestWaitForActiveDeployments_InitialListError_Other | Auth/throttle errors propagate
TestWaitForActiveDeployments_ActiveThenClear | Polling until clear
TestWaitForActiveDeployments_CancelledContext | Context cancellation
TestWaitForActiveDeployments_PollError | Error during polling
TestWaitForActiveDeployments_Timeout | 30min timeout

Related

Copilot AI review requested due to automatic review settings March 23, 2026 15:17
@spboyer spboyer added the bug (Something isn't working) label Mar 23, 2026
@spboyer spboyer self-assigned this Mar 23, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds a pre-deployment check that detects in-progress ARM deployments at the target scope and waits for them to complete, avoiding DeploymentActive failures during provisioning.

Changes:

  • Introduces waitForActiveDeployments() in the Bicep provisioning flow and polls until deployments clear or a timeout is reached.
  • Adds IsActiveDeploymentState() plus new tests to classify which provisioning states are considered “active”.
  • Extends infra.Scope with ListActiveDeployments() and adds a DeploymentActive error suggestion rule.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File | Description
cli/azd/resources/error_suggestions.yaml | Adds a user-facing suggestion for DeploymentActive ARM errors.
cli/azd/pkg/infra/scope.go | Extends scope interface + implements ListActiveDeployments() for RG and subscription scopes.
cli/azd/pkg/infra/provisioning/bicep/bicep_provider.go | Adds wait loop before deployment submission, with polling/timeout defaults.
cli/azd/pkg/infra/provisioning/bicep/bicep_provider_test.go | Updates mocked scope to satisfy the new Scope interface.
cli/azd/pkg/infra/provisioning/bicep/active_deployment_check_test.go | Adds tests covering wait-loop behavior, errors, cancellation, and timeout.
cli/azd/pkg/azapi/deployments.go | Adds IsActiveDeploymentState() helper.
cli/azd/pkg/azapi/deployment_state_test.go | Adds unit tests for active/inactive state classification.

@spboyer spboyer force-pushed the fix/deployment-active-conflict branch from f456f95 to 8829c13 Compare March 23, 2026 22:24
@spboyer
Member Author

spboyer commented Mar 23, 2026

Telemetry Context: DeploymentActive + Retry Behavior

This PR addresses DeploymentActive (199 errors in March, ~270/month projected). Additional context from the deep dive:

Retry behavior makes this especially valuable

Of machines that hit InvalidTemplateDeployment errors (which includes DeploymentActive in the chain):

  • 66% retry without changing anything — for DeploymentActive, this means they re-submit and hit the same active deployment again
  • Average 3.6 retries per machine before they either succeed or give up
  • The detect-and-wait pattern in this PR would break that retry loop immediately

Time savings

  • Users who hit DeploymentActive currently wait roughly the length of a deployment before failing, then retry
  • With this PR: one wait period (with progress feedback) instead of N failed attempts × deployment time each

This is a clean win: the fix is architecturally simple (poll + wait) and eliminates a category of failure that retrying can never solve.

@spboyer spboyer force-pushed the fix/deployment-active-conflict branch from 8829c13 to 38516a8 Compare March 24, 2026 13:34
@kristenwomack
Collaborator

@copilot - will you check to ensure we have metrics coverage so we can see how often this error and fix occur after merging this change?

Contributor

Copilot AI commented Mar 24, 2026

@kristenwomack I've opened a new pull request, #7288, to work on those changes. Once the pull request is ready, I'll request review from you.

@spboyer spboyer marked this pull request as draft March 24, 2026 16:39
@spboyer
Member Author

spboyer commented Mar 24, 2026

Converting to draft: the active deployment check integration point in Deploy() causes nil-pointer panics in existing test mocks. The test infrastructure creates Deployment objects with nil inner scopes, and scopeForTemplate requires provider state that isn't available in all test paths.

Need to either:

  1. Restructure as a standalone pre-deployment middleware (avoids touching the Deploy code path)
  2. Add the check at the provision action level (internal/cmd/provision.go) before it calls into the bicep provider

The standalone tests for waitForActiveDeployments, IsActiveDeploymentState, and the error suggestion rule all pass. Only the integration with the existing Deploy flow is problematic.

@spboyer spboyer force-pushed the fix/deployment-active-conflict branch from 3732718 to f1f8bfc Compare March 24, 2026 19:22
@spboyer
Member Author

spboyer commented Mar 25, 2026

/azp run

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

spboyer and others added 7 commits March 24, 2026 17:33
Before starting a Bicep deployment, check the target scope for
in-progress ARM deployments and wait for them to complete. This avoids
the DeploymentActive error that ARM returns after ~5 minutes when a
concurrent deployment is already running on the same resource group.

Changes:
- Add IsActiveDeploymentState() helper in azapi to classify provisioning
  states as active or terminal.
- Add ListActiveDeployments() to the infra.Scope interface and both
  ResourceGroupScope / SubscriptionScope implementations.
- Add waitForActiveDeployments() in the Bicep provider, called after
  preflight validation and before deployment submission. It polls until
  active deployments clear or a 30-minute timeout is reached.
- Add a DeploymentActive error suggestion rule to error_suggestions.yaml.
- Add unit tests for state classification, polling, timeout, error
  handling, and context cancellation.

Fixes #7248

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…per, refresh timeout names

- Fix 'range 200' compile error (range over an integer requires Go 1.22 or later)
- Make DeploymentActive YAML rule scope-agnostic
- Extract filterActiveDeployments helper to deduplicate scope logic
- Refresh deployment names from latest poll on timeout message
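
For context on the first item: ranging over an integer (`for i := range 200`) was added in Go 1.22, so older toolchains reject it. A portable form uses a three-clause loop, as in this small sketch. `countIterations` is a hypothetical helper, not code from the PR.

```go
package main

import "fmt"

// countIterations loops n times using the three-clause form, which
// compiles on Go versions older than 1.22 as well as newer ones.
func countIterations(n int) int {
	count := 0
	for i := 0; i < n; i++ {
		count++
	}
	return count
}

func main() {
	fmt.Println(countIterations(200))
}
```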

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer and others added 4 commits March 24, 2026 17:33
Move ListActiveDeployments to a standalone function instead of adding
it to the exported Scope interface. Adding methods to exported
interfaces is a breaking change for any external implementation
(including test mocks in CI).

The standalone infra.ListActiveDeployments(ctx, scope) function calls
scope.ListDeployments and filters for active states, achieving the
same result without widening the interface contract.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The agent rewrote the entire YAML file, causing a 925-line diff that
broke CI tests. Reset to main and add only the DeploymentActive rule
(11 lines) as intended.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The deployment object returned by generateDeploymentObject embeds a
Scope that can be nil in test environments (e.g. mockedScope returns
an empty SubscriptionDeployment). Using scopeForTemplate resolves
the scope from the provider's configuration, avoiding nil panics
in existing tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the fix/deployment-active-conflict branch from f1f8bfc to 74a9edb Compare March 25, 2026 00:33
spboyer added a commit that referenced this pull request Mar 25, 2026
Add lessons learned from recent PR reviews (#7290, #7251, #7250,
#7247, #7236, #7235, #7202, #7039) as agent instructions to prevent
recurring review findings.

New sections:
- Error handling: ErrorWithSuggestion completeness, telemetry service
  attribution, scope-agnostic messages
- Architecture boundaries: pkg/project target-agnostic, extension docs
- Output formatting: shell-safe paths, consistent JSON contracts
- Path safety: traversal validation, quoted paths in messages
- Testing best practices: test actual rules, extract shared helpers,
  correct env vars, TypeScript patterns, efficient dir checks
- CI/GitHub Actions: permissions, PATH handling, artifact downloads,
  prefer ADO for secrets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
spboyer added a commit that referenced this pull request Mar 25, 2026
Add lessons learned from team and Copilot reviews across PRs #7290,
#7251, #7250, #7247, #7236, #7235, #7202, #7039 as agent instructions
to prevent recurring review findings.

New/expanded sections:
- Error handling: ErrorWithSuggestion field completeness, telemetry
  service attribution, scope-agnostic messages, link/suggestion parity,
  stale data in polling loops
- Architecture boundaries: pkg/project target-agnostic, extension docs
  separation, env var verification against source code
- Output formatting: shell-safe quoted paths, consistent JSON types
- Path safety: traversal validation, quoted paths in messages
- Code organization: extract shared logic across scopes
- Documentation standards: help text consistency, no dead references,
  PR description accuracy
- Testing best practices: test YAML rules e2e, extract shared helpers,
  correct env vars (AZD_FORCE_TTY, NO_COLOR), TypeScript patterns,
  reasonable timeouts, cross-platform paths, test new JSON fields
- CI / GitHub Actions: permissions blocks, PATH handling, cross-workflow
  artifacts, prefer ADO for secrets, no placeholder steps

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer
Member Author

spboyer commented Mar 25, 2026

/azp run azure-dev - cli

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Handle DeploymentActive conflict -- detect and wait for in-progress deployments

4 participants