fix(security/azure): migrate Logic App SCHEDULED_TASK_SECRET to Key Vault connection#74
Conversation
…ault connection
The Logic App workflows that fan out scheduled task calls to the Container
App were still embedding `${var.scheduled_task_secret}` plaintext in their
outgoing `Authorization: Bearer ...` headers (3 sites in scheduled-tasks.tf).
The secret therefore lived in:
- the Logic App workflow definition stored in Azure Resource Manager,
- the Terraform state file,
- any captured `terraform plan/show` output (CI logs, operator screens).
Migrate to Azure Logic Apps Key Vault references via system-assigned
managed identity:
- Each workflow (recommendations, ri_exchange, cleanup) gets a
system-assigned managed identity granted "Key Vault Secrets User" on
the existing Key Vault.
- A new `get-secret` action (custom HTTP action with
`authentication.type = ManagedServiceIdentity`, audience
`https://vault.azure.net`) fetches scheduled-task-secret from KV at
workflow runtime via the data-plane REST API.
- The call-endpoint action references `@body('get-secret')['value']`
in its outgoing Authorization header. The plaintext is never
interpolated into the workflow definition or Terraform state.
- `runtimeConfiguration.secureData` masks the secret in workflow run
history (outputs of get-secret, inputs of the downstream call).
Module API change: `var.scheduled_task_secret` (sensitive value) is
replaced with `var.scheduled_task_secret_name` (just the KV secret name).
Env-level caller updated to pass `module.secrets.scheduled_task_secret_name`
instead of `.scheduled_task_secret_value`. The KV access uses the existing
`var.key_vault_id` and `var.key_vault_uri` already plumbed for the
Container App's runtime identity — no new env-level wiring required.
The deferred section in known_issues/18_tf_env_azure.md is updated to
reflect that the Logic App leak is now closed.
Closes #50
|
@coderabbitai review |
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThe pull request resolves a security vulnerability where the Changes
Sequence Diagram(s)sequenceDiagram
participant LogicApp as Logic App Workflow
participant KeyVault as Azure Key Vault
participant ManagedId as Managed Identity
participant Endpoint as Scheduled Tasks Endpoint
rect rgba(34, 139, 34, 0.5)
Note over LogicApp,Endpoint: Updated Flow (Secure)
LogicApp->>ManagedId: Request access token
ManagedId-->>LogicApp: Access token (authenticated)
LogicApp->>KeyVault: Authenticate with managed identity token<br/>GET secret
KeyVault-->>LogicApp: Secret value (fetched at runtime)
LogicApp->>Endpoint: Call endpoint with retrieved secret<br/>Authorization: Bearer [secret]
Endpoint-->>LogicApp: Success
end
rect rgba(178, 34, 34, 0.5)
Note over LogicApp,Endpoint: Previous Flow (Insecure)
LogicApp->>LogicApp: Secret embedded in workflow definition<br/>Authorization: Bearer [plaintext_secret]
LogicApp->>Endpoint: Call endpoint with hardcoded secret
Endpoint-->>LogicApp: Success
Note over LogicApp,Endpoint: ✗ Plaintext visible in: workflow def,<br/>Terraform state, logs
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
known_issues/18_tf_env_azure.md (1)
3-3:⚠️ Potential issue | 🟡 MinorUpdate the audit counter — it's now stale.
The summary line still reads
1 partially fixed, but this PR closes the only "partially fixed" item (the scheduled-task-secret entry, now flipped to ✔️ Resolved on line 102). Bump7 resolved→8 resolvedand1 partially fixed→0 partially fixedto keep the audit roll-up honest.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@known_issues/18_tf_env_azure.md` at line 3, Update the audit summary line that currently reads "**Audit status (2026-04-20):** `0 still valid · 7 resolved · 1 partially fixed · 0 moved · 0 needs triage`" to reflect the closed item: change `7 resolved` → `8 resolved` and `1 partially fixed` → `0 partially fixed`; the scheduled-task-secret entry referenced (marked ✔️ Resolved at the content around the scheduled-task-secret section) is the item causing this counter change so update that single-line roll-up accordingly.
🧹 Nitpick comments (1)
terraform/modules/compute/azure/container-apps/scheduled-tasks.tf (1)
334-356: Add explicit dependency from workflow actions to role assignments to avoid first-run 403s.Azure RBAC propagation can take 5–10 minutes after an
azurerm_role_assignmentis created. Currently, there is no dependency edge from the*_get_secretactions to the corresponding*_kv_secrets_userassignments, so Terraform may create them in parallel. A manual "Run trigger" immediately afterapplycan 403 until propagation completes. Scheduled runs at 02:00/03:00/06h cadence are unlikely to hit this, but the runbook will benefit from the determinism.♻️ Proposed adjustment (one of the workflows shown; mirror for the other two)
resource "azurerm_logic_app_action_custom" "recommendations_get_secret" { count = var.enable_scheduled_tasks ? 1 : 0 name = "get-secret" logic_app_id = azurerm_logic_app_workflow.recommendations[0].id body = jsonencode({ ... }) + + depends_on = [azurerm_role_assignment.recommendations_kv_secrets_user] }Alternatively, keep the dependency at the workflow level so all actions inherit it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@terraform/modules/compute/azure/container-apps/scheduled-tasks.tf` around lines 334 - 356, Add an explicit dependency so each Logic App workflow waits for its Key Vault role assignment to propagate: update azurerm_logic_app_workflow.recommendations, azurerm_logic_app_workflow.ri_exchange, and azurerm_logic_app_workflow.cleanup to include a depends_on that references their respective azurerm_role_assignment resources (azurerm_role_assignment.recommendations_kv_secrets_user, azurerm_role_assignment.ri_exchange_kv_secrets_user, azurerm_role_assignment.cleanup_kv_secrets_user) so Terraform creates the role assignment before the workflow (or its _get_secret actions) to prevent first-run 403s.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@terraform/modules/compute/azure/container-apps/scheduled-tasks.tf`:
- Around line 24-30: Add a plan-time guard so the secret name cannot be empty
when scheduled tasks are enabled: add a Terraform precondition/validation that
when var.enable_scheduled_tasks || var.enable_ri_exchange_schedule is true then
var.scheduled_task_secret_name != "" (or length(var.scheduled_task_secret_name)
> 0); this prevents scheduled_task_secret_url from resolving to the list-secrets
endpoint and surfaces the error at plan/apply. Ensure you reference the existing
vars (var.enable_scheduled_tasks, var.enable_ri_exchange_schedule,
var.scheduled_task_secret_name) when implementing the precondition/validation.
In `@terraform/modules/compute/azure/container-apps/variables.tf`:
- Around line 207-211: The variable scheduled_task_secret_name currently allows
an empty string which lets local.scheduled_task_secret_url resolve to the
list-secrets endpoint; add a validation block on variable
"scheduled_task_secret_name" requiring it to be a non-empty string (e.g. length
> 0) so plan-time errors occur when not provided, and also add a precondition
check where local.scheduled_task_secret_url (or the code building the Key Vault
URL in scheduled-tasks.tf) is constructed to assert that enable_scheduled_tasks
== false OR scheduled_task_secret_name != "" so the enabled+empty combination
fails during plan.
---
Outside diff comments:
In `@known_issues/18_tf_env_azure.md`:
- Line 3: Update the audit summary line that currently reads "**Audit status
(2026-04-20):** `0 still valid · 7 resolved · 1 partially fixed · 0 moved · 0
needs triage`" to reflect the closed item: change `7 resolved` → `8 resolved`
and `1 partially fixed` → `0 partially fixed`; the scheduled-task-secret entry
referenced (marked ✔️ Resolved at the content around the scheduled-task-secret
section) is the item causing this counter change so update that single-line
roll-up accordingly.
---
Nitpick comments:
In `@terraform/modules/compute/azure/container-apps/scheduled-tasks.tf`:
- Around line 334-356: Add an explicit dependency so each Logic App workflow
waits for its Key Vault role assignment to propagate: update
azurerm_logic_app_workflow.recommendations,
azurerm_logic_app_workflow.ri_exchange, and azurerm_logic_app_workflow.cleanup
to include a depends_on that references their respective azurerm_role_assignment
resources (azurerm_role_assignment.recommendations_kv_secrets_user,
azurerm_role_assignment.ri_exchange_kv_secrets_user,
azurerm_role_assignment.cleanup_kv_secrets_user) so Terraform creates the role
assignment before the workflow (or its _get_secret actions) to prevent first-run
403s.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 55793367-840d-471e-9415-d34e8fbc5c73
📒 Files selected for processing (5)
known_issues/18_tf_env_azure.mdterraform/environments/azure/.terraform.lock.hclterraform/environments/azure/compute.tfterraform/modules/compute/azure/container-apps/scheduled-tasks.tfterraform/modules/compute/azure/container-apps/variables.tf
| variable "scheduled_task_secret_name" { | ||
| description = "Key Vault secret name (NOT the value) holding the shared secret for authenticating scheduled task HTTP calls. The Logic App workflows fetch this secret at runtime via their managed identity, so the plaintext never lands in the workflow definition or Terraform state." | ||
| type = string | ||
| default = "" | ||
| sensitive = true | ||
| } |
There was a problem hiding this comment.
Add validation so an empty scheduled_task_secret_name can't silently produce a malformed Key Vault URL.
With default = "" and enable_scheduled_tasks defaulting to true, a caller that forgets to pass this name will produce …/secrets/?api-version=7.4 in local.scheduled_task_secret_url — that's the list-secrets endpoint, not a single-secret GET. The managed identity has read access at vault scope, so the call succeeds, body('get-secret')['value'] resolves to an array, and the downstream Authorization header becomes garbage. The failure only surfaces at runtime as a 401 from the Container App.
A validation block keeps the failure at plan time:
🛡️ Proposed fix
variable "scheduled_task_secret_name" {
description = "Key Vault secret name (NOT the value) holding the shared secret for authenticating scheduled task HTTP calls. The Logic App workflows fetch this secret at runtime via their managed identity, so the plaintext never lands in the workflow definition or Terraform state."
type = string
default = ""
+
+ validation {
+ condition = !contains(["/", "?", "#", " "], substr(var.scheduled_task_secret_name, 0, 0)) && !can(regex("[/?# ]", var.scheduled_task_secret_name))
+ error_message = "scheduled_task_secret_name must be a bare Key Vault secret name (no '/', '?', '#', or whitespace)."
+ }
}Plus a precondition next to where the URL is built (see scheduled-tasks.tf comment) so that the empty-string + feature-enabled combination is caught.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| variable "scheduled_task_secret_name" { | |
| description = "Key Vault secret name (NOT the value) holding the shared secret for authenticating scheduled task HTTP calls. The Logic App workflows fetch this secret at runtime via their managed identity, so the plaintext never lands in the workflow definition or Terraform state." | |
| type = string | |
| default = "" | |
| sensitive = true | |
| } | |
| variable "scheduled_task_secret_name" { | |
| description = "Key Vault secret name (NOT the value) holding the shared secret for authenticating scheduled task HTTP calls. The Logic App workflows fetch this secret at runtime via their managed identity, so the plaintext never lands in the workflow definition or Terraform state." | |
| type = string | |
| default = "" | |
| validation { | |
| condition = var.scheduled_task_secret_name != "" && !can(regex("[/?# ]", var.scheduled_task_secret_name)) | |
| error_message = "scheduled_task_secret_name must be a non-empty, bare Key Vault secret name (no '/', '?', '#', or whitespace)." | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@terraform/modules/compute/azure/container-apps/variables.tf` around lines 207
- 211, The variable scheduled_task_secret_name currently allows an empty string
which lets local.scheduled_task_secret_url resolve to the list-secrets endpoint;
add a validation block on variable "scheduled_task_secret_name" requiring it to
be a non-empty string (e.g. length > 0) so plan-time errors occur when not
provided, and also add a precondition check where
local.scheduled_task_secret_url (or the code building the Key Vault URL in
scheduled-tasks.tf) is constructed to assert that enable_scheduled_tasks ==
false OR scheduled_task_secret_name != "" so the enabled+empty combination fails
during plan.
Apply three actionable items from CodeRabbit's review of the Logic App Key Vault migration: 1. **Plan-time precondition guard** (scheduled-tasks.tf): an empty `var.scheduled_task_secret_name` would silently turn the KV URL into `<vault>/secrets/?api-version=7.4` (the list-secrets endpoint), which the workflow's managed identity would 403 against at runtime — a late, hard-to-diagnose failure. Add a `terraform_data` resource with two preconditions that fail at plan/apply when scheduled tasks are enabled but the secret name is empty or `key_vault_uri` is missing its trailing slash. 2. **RBAC propagation `depends_on`** (scheduled-tasks.tf): each `*_get_secret` custom action now `depends_on` its corresponding `*_kv_secrets_user` role assignment. Azure RBAC propagation can take 5–10 minutes after `azurerm_role_assignment` apply; without the dependency Terraform may schedule them in parallel, so a manual "trigger now" right after apply could 403 until propagation completes. The 02:00/03:00 UTC scheduled runs almost always fall after propagation, but this keeps the post-apply runbook deterministic. 3. **known_issues audit roll-up** (18_tf_env_azure.md): bump the summary counter from `7 resolved · 1 partially fixed` to `8 resolved · 0 partially fixed` to match the now-resolved scheduled-task-secret entry, and stamp the audit-status date to today (2026-04-25). Variable description for `scheduled_task_secret_name` updated to mention the new precondition. Refs #50, PR #74.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
CodeRabbit flagged the missing variable-level validation alongside the
existing precondition. Their literal suggestion required `var != ""`
unconditionally, which would fail plan for callers that legitimately
disable both `enable_scheduled_tasks` and `enable_ri_exchange_schedule`
(the variable's default is `""`). Variable validation can't reference
sibling variables so the "is non-empty when needed" rule must stay in
the resource precondition in scheduled-tasks.tf.
Add a narrower validation that only fires when a value is supplied: it
must be a bare Key Vault secret name with no `/`, `?`, `#`, or
whitespace. The original concern was that an empty value silently
produced the list-secrets endpoint URL — that's fully covered by the
precondition. The new validation closes the secondary risk that a
caller passes a value containing path/query/fragment characters and
escapes the `${key_vault_uri}secrets/${name}?api-version=…` URL
template.
`terraform validate` clean.
|
@coderabbitai review |
…led-task secret Make the rule split between `variables.tf` and `scheduled-tasks.tf` self- documenting. The variable's `validation` block can only handle the character-set half (because variable validation runs unconditionally and can't see other variables); the conditional non-empty rule lives on the precondition resource. Comments at both ends now point at each other so a reader landing on either file knows where the other half is, and why they're not unified. Tightened the variable comment in passing (typo'd "fragment fragment"). No functional change.
✅ Actions performedReview triggered.
|
…ntity CodeRabbit nits: two security/architectural overview comments still described the original system-assigned identity approach from #74: - File header comment at the top of scheduled-tasks.tf - Step 1 comment above `azurerm_logic_app_action_custom.recommendations_get_secret` Both now describe the user-assigned identity wiring this PR introduces. The other remaining system-assigned references in the file are deliberate — they explain *why* the migration was needed (the SA `principal_id` plan-time-null bug from azurerm 3.117.1) and should stay. Comment-only change. No terraform plan output difference.
…d identities (unbreak Azure deploy) (#142) * fix(security/azure): switch Logic App scheduled tasks to user-assigned identities PR #74 introduced system-assigned managed identities on the three Logic App workflows (recommendations, ri_exchange, cleanup) and used `azurerm_logic_app_workflow.<name>[0].identity[0].principal_id` as the principal in the Key Vault role assignments. Every Azure deploy since then fails terraform plan with: Error: Missing required argument with module.compute_container_apps[0].azurerm_role_assignment.recommendations_kv_secrets_user[0], ... 379: principal_id = azurerm_logic_app_workflow.recommendations[0].identity[0].principal_id The argument "principal_id" is required, but no definition was found. In azurerm 3.117.1, `azurerm_logic_app_workflow.<x>.identity[0].principal_id` evaluates to null at plan time when the resource is being created from scratch, and `azurerm_role_assignment.principal_id` is a required string the validator can't accept null for. Result: the deploy is permanently broken on first apply with an SA-identity-based config. Fix: replace the system-assigned identity on each Logic App with a user-assigned managed identity. UA identities are first-class resources whose `principal_id` is populated at plan time, so the role assignment resolves cleanly. The same module already uses an UA identity for the Container App itself (azurerm_user_assigned_identity.container_app), so the pattern is consistent. Changes in scheduled-tasks.tf: - Added 3 azurerm_user_assigned_identity resources, one per workflow, gated on the same enable_scheduled_tasks / enable_ri_exchange_schedule flags as their workflows. - Each azurerm_logic_app_workflow's identity block changed from `type = "SystemAssigned"` to `type = "UserAssigned"; identity_ids = [...]` pointing at its UA identity. - Each azurerm_role_assignment's principal_id reads `azurerm_user_assigned_identity.<x>[0].principal_id` directly — no more identity[0] chain. - Each get-secret action's body JSON authentication block gains an `identity = <UA-id>` field. With UA identities the Logic Apps runtime needs to know which identity to use for the KV call (with SA there was exactly one, picked by default). Verified: `terraform validate` clean against terraform/environments/azure. The same flags + secret name plumbing in compute.tf and variables.tf are unchanged. * docs(azure): refresh scheduled-tasks.tf comments to user-assigned identity CodeRabbit nits: two security/architectural overview comments still described the original system-assigned identity approach from #74: - File header comment at the top of scheduled-tasks.tf - Step 1 comment above `azurerm_logic_app_action_custom.recommendations_get_secret` Both now describe the user-assigned identity wiring this PR introduces. The other remaining system-assigned references in the file are deliberate — they explain *why* the migration was needed (the SA `principal_id` plan-time-null bug from azurerm 3.117.1) and should stay. Comment-only change. No terraform plan output difference.
Summary
Closes #50.
The Logic App workflows that fan out scheduled task calls to the Container App were still embedding the plaintext
${var.scheduled_task_secret}in their outgoingAuthorization: Bearer ...headers (3 sites interraform/modules/compute/azure/container-apps/scheduled-tasks.tf). The secret therefore lived in the Logic App workflow definition (visible viaaz logicapp show), the Terraform state file, and any capturedterraform plan/showoutput (CI logs, operator screens) — even after the Container App side was migrated to a Key Vault reference.This PR migrates all three workflows (
recommendations,ri_exchange,cleanup) to fetch the secret from Key Vault at runtime via system-assigned managed identity:azurerm_logic_app_workflowgetsidentity { type = "SystemAssigned" }and is grantedKey Vault Secrets Useron the existing Key Vault (var.key_vault_id, already plumbed for the Container App's runtime identity — no env-level wiring required).get-secretaction (custom HTTP action,authentication.type = ManagedServiceIdentity, audiencehttps://vault.azure.net) calls the Key Vault data-plane REST API at workflow runtime.@body('get-secret')['value']in its outgoingAuthorizationheader. The plaintext is never interpolated into the workflow definition or Terraform state.runtimeConfiguration.secureDatamasks the secret in workflow run history (outputsof get-secret,inputsof the downstream call).Module API change:
var.scheduled_task_secret(sensitive value) is replaced withvar.scheduled_task_secret_name(just the KV secret name). Env-level caller (terraform/environments/azure/compute.tf) updated to passmodule.secrets.scheduled_task_secret_nameinstead of.scheduled_task_secret_value.known_issues/18_tf_env_azure.mdis updated — the Logic App deferred section is now closed; the original audit finding is fully resolved on Azure.Decision: full migration (no canary)
The 3 sites are structurally identical (same secret URL, same auth model, same managed-identity grant). Splitting would have created two near-identical migration paths to maintain — riskier than landing them together. Local
terraform validatepasses for both the module and the env.Verification done
terraform fmt -recursive -check— passed.terraform validateinterraform/modules/compute/azure/container-apps/— passed.terraform validateinterraform/environments/azure/— passed.Test plan
terraform planagainst an Azure environment and confirms the diff matches the design (3 workflow identities added, 3 KV role assignments added, 6 custom-action resources, 3 oldazurerm_logic_app_action_httpremoved).az logic show -n <wf>no longer reveals the secret in the workflow definition.get-secretaction returns the value, the call-endpoint action authenticates successfully against the Container App).The post-apply checks are operator-only (require a live Azure tenant); the static verification is captured in this PR.
Out of scope
for_eachmap — would change the resource address shape and force a destroy/create on operator state. Left for a future cleanup if the workflow set keeps growing.Summary by CodeRabbit
Bug Fixes
Documentation