[Epic] Phased deployment: split Terraform monolith into stacked root modules (ADR-013) #119

@Cataldir

Description

Summary

The current infra/terraform/main.tf provisions ~30 resources in a single terraform apply — VNet, ACR, ACA environment, 8 ACA stub apps, Cosmos DB (+ private endpoint), APIM Consumption, AI Foundry (AVM module), SWA, and 16+ role assignments. On fresh environments this causes Azure control-plane saturation: all 8 ACA apps time out under a single correlation ID (see #118).

This is the parent epic for restructuring the deployment into 7 ordered phases:

  1. Foundation — RG, VNet, ACR, ACA env, Storage, SWA stub, DNS
  2. ACA Apps — 8× Container App stubs + AcrPull RBAC
  3. Data + AI — Cosmos DB, Private Endpoint, AI Foundry (parallel with Phase 2)
  4. Gateway + RBAC — APIM Consumption, Cognitive Services RBAC, Agent RBAC
  5. Image Deploy — Docker build → ACR push → az containerapp update (existing job, re-sequenced)
  6. Seed Data — scripts/seed_demo_data.py + Foundry agent provisioning (new job)
  7. Validate — Backend health + frontend→backend connectivity (existing job, enhanced)
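
Each provisioning phase above would map to its own root module with an isolated state file. A minimal sketch of what the Phase 2 stack's backend block could look like, assuming an azurerm remote backend (resource names and state key are illustrative, not final):

```hcl
# stacks/aca-apps/backend.tf — hypothetical layout; names are illustrative
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstate"
    container_name       = "tfstate"
    key                  = "aca-apps.tfstate" # one state file per stack
  }
}
```

Keeping one state key per stack is what lets each phase plan/apply independently and keeps blast radius small during the state surgery in Sprint 2.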

Architectural Decision

Adopt stacked Terraform root modules under infra/terraform/stacks/ with terraform_remote_state data sources between them. This extends the existing pattern already used for stacks/services/ (APIM service-edge) and stacks/frontend/.
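
The cross-stack wiring would look roughly like this — a sketch assuming the foundation stack exports aca_environment_id and resource_group_name as outputs (actual output names and backend config TBD):

```hcl
# stacks/aca-apps/data.tf — consume Phase 1 outputs via remote state (sketch)
data "terraform_remote_state" "foundation" {
  backend = "azurerm"
  config = {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstate"
    container_name       = "tfstate"
    key                  = "foundation.tfstate"
  }
}

# Example stub app wired to the foundation stack's ACA environment
resource "azurerm_container_app" "stub" {
  name                         = "ca-example-stub"
  container_app_environment_id = data.terraform_remote_state.foundation.outputs.aca_environment_id
  resource_group_name          = data.terraform_remote_state.foundation.outputs.resource_group_name
  revision_mode                = "Single"

  template {
    container {
      name   = "stub"
      image  = "mcr.microsoft.com/azuredocs/containerapps-helloworld:latest"
      cpu    = 0.25
      memory = "0.5Gi"
    }
  }
}
```

This mirrors how stacks/services/ and stacks/frontend/ already consume upstream outputs, so no new pattern is introduced.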

See ADR-013 for full analysis of alternatives (-target, depends_on, parallelism-only) and why stacked roots were chosen.

Target Workflow DAG

resolve-release → detect-changes
  ├─→ Phase 1: provision-foundation          [needs: detect-changes]
  │       ├─→ Phase 2a: provision-aca-apps    [needs: provision-foundation]  ──┐
  │       └─→ Phase 2b: provision-data-ai     [needs: provision-foundation]  ──┤ (parallel)
  │                                                                            │
  │           Phase 4: provision-gateway-rbac  [needs: aca-apps, data-ai]  ◄───┘
  │               └─→ Phase 4b: provision-service-infra  (existing, per-service matrix)
  │
  ├─→ Phase 5: deploy-backend-services  [needs: aca-apps, service-infra, data-ai]
  │       └─→ Phase 5b: seed-data  [needs: deploy-backend-services, data-ai]
  │
  └─→ Phase 7: post-deploy-guardrails  [needs: deploy-backend-services, seed-data]

Sprint Plan

Sprint  Issue            Scope                                                      Risk
1       #[aca-apps]      Extract stacks/aca-apps/ — 8 ACA apps + AcrPull RBAC       Medium
2       #[data-ai]       Extract stacks/data-ai/ — Cosmos + Foundry                 High (state surgery)
3       #[gateway-rbac]  Extract stacks/gateway-rbac/ — APIM + cross-service RBAC   Low
4       #[workflow]      Restructure azd-deploy.yml job DAG + seed-data job         Medium

Acceptance Criteria

  • Fresh dev environment provisions successfully in < 40 minutes
  • All 8 ACA apps reach the Succeeded provisioning state without Operation expired errors
  • terraform plan on each stack shows 0 changes after initial apply
  • Existing service-edge stacks (stacks/services/) continue to work unchanged
  • Seed data runs and populates demo records via APIM
  • Post-deploy guardrails pass including frontend→backend connectivity
  • Workflow is idempotent — rerun after success produces no changes
  • ADR-013 documented in docs/adr/
