[K9VULN-14660] fix(agentless-azure): harden agentless azure setup#185

Merged: mohamed-challal merged 20 commits into main from mohamed.challal/harden-agentless-azure-setup on May 12, 2026
@mohamed-challal mohamed-challal commented May 11, 2026

Summary

Hardens the Azure Agentless Scanner setup against the failure modes we expect to hit once it leaves pre-release: re-runs from a fresh Cloud Shell, repeated deploys with different SCANNER_RESOURCE_GROUP values, partial cleanups that leave Azure soft-deleted resources behind, and delegated-permission enterprise environments. Also roughly halves first-deploy time by parallelising the slowest phases.

The plumbing change underneath is a new per-install identifier (install-id = sha256(scanner_sub | rg)[:12]) used to derive Storage Account and Key Vault names, plus an Azure-tag-based discovery mechanism (DatadogAgentlessScanner=true) that replaces local state as the source of truth for "is there already a deployment in this subscription?".
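The derivation above can be sketched in a few lines of Python. This is an illustrative sketch, not the code from this PR: the function names, the lower-casing of inputs, and the `ddscan` / `dd-scan-` prefixes are assumptions; only the `sha256(scanner_sub | rg)[:12]` shape comes from the description.

```python
import hashlib


def derive_install_id(scanner_subscription: str, resource_group: str) -> str:
    """Return the first 12 hex characters of sha256("<sub>|<rg>")."""
    # Assumption: inputs are lower-cased, since Azure treats resource group
    # names case-insensitively. The real script may normalise differently.
    key = f"{scanner_subscription.lower()}|{resource_group.lower()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def storage_account_name(install_id: str) -> str:
    # Storage account names allow 3-24 lowercase letters and digits only.
    # "ddscan" is a hypothetical prefix chosen for this example.
    return f"ddscan{install_id}"


def key_vault_name(install_id: str) -> str:
    # Key Vault names allow 3-24 alphanumerics and hyphens, starting with a letter.
    # "dd-scan-" is a hypothetical prefix chosen for this example.
    return f"dd-scan-{install_id}"
```

Because the identifier depends only on the subscription and resource group, a re-run from a fresh Cloud Shell recomputes the same names without needing local state.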

Changes by theme

Robustness for re-runs and edge cases

  • fix(agentless-azure): handle unpurged secret key during re-setup with a different RG. Recovers Azure soft-deleted Key Vaults instead of crashing with ConflictError on re-deploy.
  • fix(agentless-azure): handle resource group mismatch on deploy/destroy. Fails loudly (no silent overwrite) with actionable guidance when the user changes SCANNER_RESOURCE_GROUP between runs.
  • agentless azure: derive install_id and discover existing deployments via RG tag. Tag-based RG discovery runs on both deploy and destroy:
    • 1 tagged RG, env var unset → silently adopt it (fresh Cloud Shell re-runs work without env var)
    • 1 tagged RG, env var disagrees → ConfigurationError with mismatch guidance
    • ≥2 tagged RGs → single-install policy: deploy refuses, destroy requires explicit env var
  • agentless azure: scope storage account and key vault names to install_id. Resource names now derive from (scanner_sub, rg), so two RGs in the same subscription will not collide in a future multi-install iteration. Drops the legacy SA-RG lookup fallback.
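The discovery rules listed above amount to a small decision table, sketched below. The function name, the `action` parameter, the error messages, and the case-insensitive comparison are assumptions made for illustration. The source does not say what happens when no tagged RG exists, so this sketch simply returns the env var value (or `None`) and leaves that decision to the caller.

```python
from typing import Optional, Sequence


class ConfigurationError(Exception):
    """Raised when the requested RG conflicts with what discovery found."""


def resolve_resource_group(
    tagged_rgs: Sequence[str], env_rg: Optional[str], action: str
) -> Optional[str]:
    """Pick the resource group to operate on; action is "deploy" or "destroy"."""
    if len(tagged_rgs) >= 2:
        # Single-install policy: deploy refuses, destroy needs an explicit choice.
        if action == "deploy":
            raise ConfigurationError("multiple tagged RGs found; refusing to deploy")
        if env_rg is None:
            raise ConfigurationError("multiple tagged RGs found; set SCANNER_RESOURCE_GROUP")
        return env_rg
    if len(tagged_rgs) == 1:
        found = tagged_rgs[0]
        if env_rg is None:
            return found  # silently adopt the existing deployment
        # Assumption: names are compared case-insensitively.
        if env_rg.lower() != found.lower():
            raise ConfigurationError(
                f"SCANNER_RESOURCE_GROUP={env_rg} does not match existing RG {found}"
            )
        return found
    # No tagged RG: behaviour not specified in the source; defer to the caller.
    return env_rg
```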

Performance

First-deploy wall-clock time on Cloud Shell drops by roughly half.

  • perf(agentless-azure): parallelize lookup checks and resource creation. Storage Account and Key Vault control-plane work runs concurrently.
  • perf(agentless-azure): run preflight checks in parallel
  • perf(agentless-azure): bump Terraform parallelism 10 -> 20
  • fix(agentless-azure): skip Key Vault secret retries when Secrets Officer already exists
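As a rough illustration of running independent checks concurrently, here is a minimal sketch using the standard library's thread pool. The helper name and the shape of the `checks` mapping are hypothetical; the actual script may structure its preflight phase differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_checks_in_parallel(
    checks: Dict[str, Callable[[], None]], max_workers: int = 8
) -> List[Tuple[str, str]]:
    """Run every check concurrently and return (name, error message) per failure."""
    failures: List[Tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the checks overlap on network latency.
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        # Collect in submission order so the failure report is deterministic.
        for name, future in futures.items():
            try:
                future.result()
            except Exception as exc:
                failures.append((name, str(exc)))
    return failures
```

Collecting every failure, rather than stopping at the first one, lets a single run report all missing prerequisites. The Terraform change presumably corresponds to the `-parallelism` flag of `terraform apply`, whose default is 10.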

Permissions / enterprise delegation

  • add(agentless-azure): make permissions check softer in preflight if RG already exists. When the RG is pre-created (admin-provisioned enterprise pattern), the resource-creation actions are probed at RG scope rather than subscription scope, so engineers with RG-only Contributor pass preflight.
  • add(agentless-azure): add roleDefinitions/write permission in preflight. The Terraform roles module creates a custom scanning role whose assignableScopes covers every scan target; surfacing this in preflight turns a confusing mid-apply 403 into an actionable error at the start of the run.
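The scope selection described in the first item can be sketched as follows. The function name and parameters are hypothetical; only the Azure Resource Manager scope format is standard.

```python
def probe_scope(subscription_id: str, resource_group: str, rg_exists: bool) -> str:
    """Return the ARM scope at which resource-creation actions should be probed."""
    subscription_scope = f"/subscriptions/{subscription_id}"
    if rg_exists:
        # Pre-created RG: RG-level Contributor is enough, so probe at RG scope.
        return f"{subscription_scope}/resourceGroups/{resource_group}"
    # The script must create the RG itself, so probe at subscription scope.
    return subscription_scope
```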

Observability

  • fix(agentless): mark active workflow step FAILED when Azure deploy exits with error. Failures now flip the in-progress step to FAILED on the workflow-status API, so the Datadog UI setup-progress timeline no longer spins forever when, e.g., terraform apply exits with insufficient permissions.
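One way to get this behaviour is to wrap each step so that any exit path reports a terminal status. The sketch below is an assumption-laden illustration: `report` stands in for whatever client calls the workflow-status API, and every status name except FAILED is invented for the example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical reporter signature: (step name, status, optional message).
Report = Callable[[str, str, Optional[str]], None]


def run_steps(steps: List[Tuple[str, Callable[[], None]]], report: Report) -> None:
    """Run steps in order, marking the active one FAILED if it raises."""
    for name, action in steps:
        report(name, "IN_PROGRESS", None)
        try:
            action()
        except BaseException as exc:
            # BaseException also covers SystemExit, e.g. sys.exit() after a
            # failed subprocess, so the step never stays in progress.
            report(name, "FAILED", str(exc))
            raise
        report(name, "FINISHED", None)
```

Re-raising after the report keeps the original exit code and traceback intact for the CLI user.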

Cleanup / housekeeping

  • refactor(agentless-azure): remove dead code
  • docs(agentless-azure): update readme
  • build(agentless-azure): re-build dist scripts. Refreshes dist/azure_agentless_setup.pyz.

Risk & compatibility

  • Single-install policy is now enforced. Any subscription that somehow ended up with two tagged Agentless RGs will fail fast on deploy (this is the intended behaviour; the alternative would be a silent pick).
  • Tag contract. Discovery relies on the DatadogAgentlessScanner=true tag the script and Terraform module already apply on resource creation. The script never re-tags resources it didn't create (admin-pre-created RGs stay untagged and require the env var on subsequent re-runs).

@mohamed-challal mohamed-challal self-assigned this May 11, 2026
@mohamed-challal mohamed-challal changed the title fix(agentless-azure): harden agentless azure setup [K9VULN-14660] fix(agentless-azure): harden agentless azure setup May 11, 2026
@mohamed-challal mohamed-challal force-pushed the mohamed.challal/harden-agentless-azure-setup branch from bc649a9 to e4657b0 Compare May 11, 2026 16:16
@mohamed-challal
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e4657b04b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread azure/agentless/src/azure_agentless_setup/metadata.py
Comment thread azure/agentless/src/azure_agentless_setup/config.py
@mohamed-challal mohamed-challal force-pushed the mohamed.challal/harden-agentless-azure-setup branch from e4657b0 to 3e73347 Compare May 11, 2026 16:28
@mohamed-challal mohamed-challal marked this pull request as ready for review May 11, 2026 16:29
@mohamed-challal mohamed-challal requested a review from a team as a code owner May 11, 2026 16:29
@mohamed-challal mohamed-challal requested a review from parsons90 May 11, 2026 16:29
Collaborator

@tedkahwaji tedkahwaji left a comment

LGTM, mind updating the code owners so the gcp/azure agentless scripts tag your team?

@mohamed-challal mohamed-challal requested a review from a team as a code owner May 12, 2026 12:52
@BraisCaboFelpete BraisCaboFelpete left a comment

LGTM

@mohamed-challal mohamed-challal merged commit 9047840 into main May 12, 2026
12 checks passed
@mohamed-challal mohamed-challal deleted the mohamed.challal/harden-agentless-azure-setup branch May 12, 2026 14:36
4 participants