Make azd ai agent init Bicep-less by default; add --infra to eject #8065

@therealjohn

We want the azure.ai.agents extension to default to Bicep-less for azd ai agent init projects. The infrastructure templates that live in Azure-Samples/azd-ai-starter-basic today move into the extension itself, get synthesized in memory at azd provision time from azure.yaml, and only hit disk when the developer asks for it with azd ai agent init --infra.

Today, azd ai agent init clones a ~300-line infra/main.bicep that pre-bakes Storage, Search, Bing, ACR, and monitoring whether the developer uses them or not. Every shipped extension version reads from the same template repo, so we can't slim that Bicep without breaking projects initialized by older extensions. Two coupled problems: the on-disk templates are bloated, and there's no in-place fix. This RFC is the mechanism that breaks the coupling.

Scope: the requirements for Bicep-less default, the eject command (azd ai agent init --infra), the version-stability contract, and how we retire the existing Azure-Samples/azd-ai-starter-basic story. Out of scope: the unified azure.yaml schema (covered in Azure/azure-dev#7962) and incremental composition (covered in Azure/azure-dev#8049).

Dependencies: this RFC assumes the consolidated azure.ai.project and azure.ai.agent host kinds from Azure/azure-dev#7962. The synthesis input is the unified azure.yaml; without it, there's no source of truth to synthesize from.

Current Problems

  1. Init clones a starter repo from GitHub on every new project. Before the developer has written any agent code, they get a full Bicep tree and an infra/ directory -- regardless of which resources they actually need.

  2. The starter template is a program in template clothing. It's roughly 300 lines of conditionals: shouldCreateAcr, useExistingAiProject ? X : Y, dual-path outputs for new-vs-existing on every resource, 30-plus outputs to cover every permutation. That's business logic dressed up as IaC, written to make init work, not to serve the developer's day-to-day needs.

  3. Template changes break older extension versions. The Bicep lives in a sample repo, not the extension. Every shipped extension reads from the same main branch. We can't tailor or slim that Bicep without affecting every project initialized by every prior extension build. This is the structural reason the "tailored, minimal Bicep" idea has gone nowhere: there's no place to put the new shape that doesn't break someone.

  4. Init forces a long, all-up-front menu. Because the template is one-size-fits-all, init has to ask about every resource it might provision: existing project or new, monitoring or not, which model deployment, which region. Most of these answers don't affect what the developer is about to build; they're forced because the template needs them resolved at clone time.

  5. The infra/ directory becomes a maintenance burden the developer never asked for. Feedback from developers new to Hosted Agents is that they don't need to see Bicep, but today they have no choice -- it lands in their repo at init.

  6. No path to "I just want a working agent." Even after init clones the template, the developer still has to run azd provision to stand up resources before they can deploy. There's no shape in which init -> deploy works without writing or accepting Bicep.

Solution Hypothesis

The shape we want:

  • Default mode is Bicep-less. azd ai agent init produces an azure.yaml, an agent code project, and nothing else on disk. No infra/ directory, no main.bicep, no ACR/Search/Storage shells the developer didn't ask for. The developer doesn't need to know about provider routing or declare anything special in azure.yaml.
  • Templates ship inside the extension binary. The Bicep modules that live in Azure-Samples/azd-ai-starter-basic today move into the extension as embedded templates. They version with the extension, not with a moving sample repo.
  • The extension owns provisioning for azure.ai.* projects. When azd provision runs against a project whose only services are azure.ai.* and there's no on-disk infra/, AZD routes provision/preview/destroy through the extension. The extension synthesizes Bicep from azure.yaml and applies it.
  • This is a known shape, not a new mechanism. AZD already supports Bicep-less projects via the compose path (azd add writes resources: to azure.yaml; AZD synthesizes Bicep at provision time without ever writing it to disk). The extension needs the same kind of synthesize-and-provision pipeline, scoped to its host kinds. AZD also already supports extension-provided provisioning providers; what's missing is the auto-trigger that routes to one when the project signals it (without forcing the developer to write a provider declaration).
  • Eject is azd ai agent init --infra. The same init verb writes infrastructure to disk. Run it on an empty directory to start a project with on-disk Bicep from day one. Run it inside an existing Bicep-less project to materialize the templates the extension has been synthesizing in memory. Eject is infra-only when the project already exists -- it does not re-prompt for templates, add agents, or touch code.
  • Eject produces semantic parity with synthesized Bicep. The eject pathway runs the same synthesis as in-memory provisioning; the only difference is that it writes the output to ./infra/. The Bicep that gets applied to ARM is the same in both modes. After eject, future provisions read from ./infra/ instead of going through the extension -- consistent with how AZD already treats the presence of on-disk Bicep.
  • Synthesis is best-effort stable within a minor extension version. Within 0.2.x, the same azure.yaml produces semantically identical Bicep. Across minors the output may change; we'll document this in the changelog and recommend azd provision --preview after upgrades.

The mental model shifts from "init clones a template you maintain forever" to "init writes config; the extension knows how to provision from that config." Eject becomes the off-ramp for developers who want full Bicep control, not the on-ramp.
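The version-stability bullet above reduces to a simple comparison. A minimal sketch, assuming extension versions are dotted strings such as 0.2.1; the function names are hypothetical and not part of the extension:

```python
def same_minor(a: str, b: str) -> bool:
    """True when two versions share major.minor, e.g. 0.2.1 and 0.2.7."""
    return a.split(".")[:2] == b.split(".")[:2]


def should_recommend_preview(previous: str, current: str) -> bool:
    """Across minors the synthesized Bicep may change, so suggest a preview."""
    return not same_minor(previous, current)
```

Under this reading, an upgrade from 0.2.1 to 0.2.7 keeps the synthesized output semantically identical, while 0.2.9 to 0.3.0 is the case where azd provision --preview is recommended.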

How This Replaces Azure-Samples/azd-ai-starter-basic

| Today | Bicep-less default |
| --- | --- |
| init clones the starter repo from GitHub | init writes azure.yaml and code; no clone, no network call for templates |
| infra/main.bicep lives in the user's repo | Templates live in the extension binary |
| Same main.bicep serves all extension versions | Templates are pinned to the extension version |
| Slimming the template breaks older users | Slimming the template ships in the next extension release |
| Developer maintains the Bicep | Developer maintains azure.yaml; Bicep is invisible until they ask for it |
| No way to opt out of Bicep | Bicep is opt-in via --infra |

The Azure-Samples/azd-ai-starter-basic repo is retired as the init target. It can stick around as an example of "what the extension would generate," but the CLI no longer clones it.

Brownfield: Existing Foundry Projects

Today the extension reads USE_EXISTING_AI_PROJECT and AZURE_AI_PROJECT_ID and the starter Bicep branches on them. The Bicep-less model replaces that with an explicit field on the project service:

services:
  foundry-project:
    host: azure.ai.project
    resourceId: ${AZURE_AI_PROJECT_ID}   # presence -> existing project mode
    config:
      toolboxes: { ... }

When resourceId: is set, synthesis:

  • Omits the Foundry project ARM resource entirely from the generated Bicep.
  • Generates references to the existing project so outputs can wire AZURE_AI_PROJECT_ENDPOINT, AZURE_AI_PROJECT_ID, AZURE_RESOURCE_GROUP, and tenant/subscription/location for downstream consumers.
  • Still synthesizes any ARM-backed children the developer added under config: (e.g. additional model deployments) and routes data-plane resources to the existing project's deploy verb.

init detects the brownfield path the same way it does today (the user picks "use an existing project" in the prompt or passes --project-id) and writes resourceId: ${AZURE_AI_PROJECT_ID} into azure.yaml along with the env var. No more useExistingAiProject ternary in the generated Bicep -- the conditional collapses to "is the field present in azure.yaml?" at synthesis time.
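The "is the field present?" check can be sketched as follows. This is an illustration under assumptions, not the extension's synthesizer: plan_project and the shape of its return value are hypothetical, and the service is assumed to arrive as a plain dictionary parsed from azure.yaml.

```python
from typing import Any

# Outputs named in this RFC that downstream consumers read.
PROJECT_OUTPUTS = [
    "AZURE_AI_PROJECT_ENDPOINT",
    "AZURE_AI_PROJECT_ID",
    "AZURE_RESOURCE_GROUP",
]


def plan_project(service: dict[str, Any]) -> dict[str, Any]:
    """Decide whether the Foundry project is created or only referenced."""
    existing = bool(service.get("resourceId"))
    return {
        # Brownfield: the project ARM resource is left out of the Bicep.
        "create_project": not existing,
        # Brownfield: emit a reference to the existing project instead.
        "reference_existing": existing,
        # Outputs are wired in both modes.
        "outputs": list(PROJECT_OUTPUTS),
    }
```

The point of the sketch is that the branch lives in the synthesizer, so the generated Bicep contains either the resource or the reference, never a ternary choosing between them.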

azd ai agent init --infra

--infra is the eject flag. It behaves differently in each of the four contexts below and is infra-only -- it never adds an agent, prompts for templates, or writes code:

| Context | Behavior |
| --- | --- |
| Empty directory | Run init normally, then also write ./infra/ from synthesized templates. Project is on-disk Bicep from the start. |
| Existing Bicep-less azd agent project | Synthesize the current azure.yaml and write ./infra/. Do not invoke any of the normal init flow (no template prompts, no code scaffold, no agent add). The presence of ./infra/ flips the project to on-disk on the next provision; the extension stops synthesizing. |
| Existing on-disk project (already has infra/) | Refuse to overwrite. Print an error pointing the developer at azd ai agent init --infra --force. |
| Project that is not an azd agent project | Refuse with a targeted message: "no azure.ai.* services found in azure.yaml; nothing to eject." |

Eject is all-or-nothing for the whole project. A project either gets infrastructure synthesized by the extension or uses on-disk Bicep -- there's no partial mode where some agents get synthesized and others sit on disk. That avoids two synthesis paths racing on the same ARM resources.
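The four contexts above can be read as one decision function. A minimal sketch, assuming "empty directory" means no azure.yaml is present and that --force allows regeneration over an existing infra/; EjectAction and decide_eject are hypothetical names used only for illustration:

```python
from enum import Enum


class EjectAction(Enum):
    INIT_THEN_WRITE_INFRA = "init_then_write_infra"    # empty directory
    WRITE_INFRA_ONLY = "write_infra_only"              # existing Bicep-less project
    REFUSE_INFRA_EXISTS = "refuse_infra_exists"        # infra/ present, no --force
    REFUSE_NOT_AGENT_PROJECT = "refuse_not_agent_project"


def decide_eject(has_azure_yaml: bool, host_kinds: list,
                 infra_exists: bool, force: bool = False) -> EjectAction:
    """Map the project state to the behavior listed for init --infra."""
    if not has_azure_yaml:
        return EjectAction.INIT_THEN_WRITE_INFRA
    if not any(h.startswith("azure.ai.") for h in host_kinds):
        return EjectAction.REFUSE_NOT_AGENT_PROJECT
    if infra_exists and not force:
        return EjectAction.REFUSE_INFRA_EXISTS
    return EjectAction.WRITE_INFRA_ONLY
```

The order of the last two checks is a choice made for the sketch: a project with no azure.ai.* services is refused as "nothing to eject" even if it happens to have an infra/ directory.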

Example output, ejecting an existing project:

> azd ai agent init --infra

Generating infrastructure files from azure.yaml...

  Created infra/main.bicep
  Created infra/main.parameters.json
  Created infra/modules/ai-project.bicep
  Created infra/modules/acr.bicep

Future provisions will read from ./infra/.

Next steps:
  azd provision    Apply changes

Example output, eject refused:

> azd ai agent init --infra

Error: ./infra/ already exists.

To regenerate from azure.yaml, run azd ai agent init --infra --force, or delete the infra directory and run the command again.

Post-Eject Behavior of CLI Commands

After eject, the CLI keeps modifying azure.yaml (CLI = config, coding agents = code). But CLI commands that imply ARM changes -- adding monitoring, adding a second container agent that needs ACR, declaring a new model deployment -- now create a drift risk: azure.yaml says one thing, on-disk Bicep says another.

The contract:

| Command class | Bicep-less project | On-disk project (post-eject) |
| --- | --- | --- |
| Modifies only azure.yaml data-plane (e.g. add tool, add toolbox, add connection) | Apply normally | Apply normally -- nothing in Bicep changes |
| Modifies azure.yaml in a way that requires new ARM resources (e.g. add monitoring, second container agent) | Apply normally; next provision synthesizes the new resources | Apply to azure.yaml and warn: "your project uses on-disk Bicep; run azd ai agent init --infra --force to regenerate Bicep, or edit infra/ manually" |
| Eject (init --infra) | Allowed | Refused unless --force |

The CLI never silently patches user-owned Bicep. The on-disk path is opt-in to manual ownership; we surface the gap and let the developer decide whether to regenerate or hand-edit.
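The first two rows of the contract come down to one rule: azure.yaml is always updated, and a warning is added only when an ARM-affecting change meets an on-disk project. A hedged sketch with a hypothetical function name:

```python
from typing import Optional, Tuple

DRIFT_WARNING = (
    "your project uses on-disk Bicep; run azd ai agent init --infra --force "
    "to regenerate Bicep, or edit infra/ manually"
)


def apply_config_command(requires_arm: bool, on_disk: bool) -> Tuple[bool, Optional[str]]:
    """Return (update_azure_yaml, warning) for a CLI command that edits config.

    azure.yaml is updated in every case; user-owned Bicep is never patched.
    """
    warning = DRIFT_WARNING if (requires_arm and on_disk) else None
    return True, warning
```

Returning the warning as data rather than printing it keeps the decision testable and leaves presentation to the caller.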

Validation Before Synthesis

Synthesis must not run on an invalid azure.yaml. The extension validates in this order on every provision, preview, and init --infra:

  1. Schema validation. Each azure.ai.* service's config: block is validated against its JSON schema. Schema errors fail fast with the file/line of the offending field.
  2. Service graph invariants.
    • Exactly one azure.ai.project service must exist (multi-project is out of scope for V1).
    • Every azure.ai.agent service must declare a uses: entry that points at exactly one azure.ai.project service. A missing uses: or a dangling reference is an error.
    • No cycles in uses:.
  3. Deploy-mode invariant. Each azure.ai.agent must have exactly one of runtime: or docker:. Both is an error. Neither is an error. (Matches Azure/azure-dev#7962.)
  4. Env reference resolution. Any ${VAR} referenced in config: blocks must resolve from the AZD environment, or surface as an explicit "missing env var" error before any Bicep gets generated.
  5. Brownfield consistency. If resourceId: is set on the project, the value must be a syntactically valid Foundry project ARM resource ID (not an existence check -- that happens at deploy).

All five run before synthesis. Failures return structured errors with field paths so the developer sees services.foundry-project.config.modelDeployments[0].sku: required rather than a Bicep parse error two layers down.
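A minimal sketch of steps 2 and 3, returning structured errors with field paths. It assumes services is the parsed services: mapping from azure.yaml, that uses: is a list of service names, and that runtime: and docker: sit directly on the agent service; these shapes are assumptions, and the names below are hypothetical. Schema validation, cycle detection, env resolution, and the brownfield check are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationError:
    path: str      # field path, e.g. "services.my-agent.uses"
    message: str


def validate_service_graph(services: dict) -> list:
    """Check the service-graph and deploy-mode invariants before synthesis."""
    errors = []
    projects = [n for n, s in services.items() if s.get("host") == "azure.ai.project"]
    if len(projects) != 1:
        errors.append(ValidationError(
            "services",
            f"expected exactly one azure.ai.project service, found {len(projects)}"))
    for name, svc in services.items():
        if svc.get("host") != "azure.ai.agent":
            continue
        uses = svc.get("uses") or []
        for target in uses:
            if target not in services:
                errors.append(ValidationError(
                    f"services.{name}.uses", f"dangling reference to '{target}'"))
        if sum(1 for t in uses if t in projects) != 1:
            errors.append(ValidationError(
                f"services.{name}.uses",
                "must reference exactly one azure.ai.project service"))
        # Exactly one of runtime: / docker: -- both or neither is an error.
        if ("runtime" in svc) == ("docker" in svc):
            errors.append(ValidationError(
                f"services.{name}", "must set exactly one of runtime: or docker:"))
    return errors
```

Collecting every error instead of stopping at the first lets the developer fix azure.yaml in one pass; whether the extension should fail fast per step instead is a design decision this sketch does not settle.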

What We Need From AZD Core

The bulk of this work lives in the extension. Core has two asks:

  1. Auto-route to an extension-provided provisioning provider when the project signals it. AZD already supports extension-provided provisioning providers, and it already auto-routes to the compose path when azure.yaml declares resources:. We need the equivalent for extension providers: when there's no on-disk infra/, no resources:, and an extension owns one of the project's services, route provision/preview/destroy through that extension. This should be a generalizable mechanism -- not specific to azure.ai.agents -- so other extensions that want to own provisioning for their host kinds get it for free.

  2. Surface uses and runtime to extensions. The extension-facing ServiceConfig that AZD passes to extensions today doesn't expose uses or a typed runtime block, both of which synthesis needs. This is a small addition that rides alongside the schema work in Azure/azure-dev#7962.

We're explicitly not asking developers to declare a provider in azure.yaml. The compose precedent is auto-detection from a signal in the file (resources:); ours is the same shape with a different signal (presence of azure.ai.* services + no on-disk infra). Developers shouldn't have to learn what a provider is.
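The routing precedence described in this section can be sketched as a single function. This is an illustration of the requested behavior, not AZD's actual provider-selection code; the function name and the returned labels are hypothetical.

```python
def select_provision_source(infra_exists: bool, has_resources_block: bool,
                            extension_owned_hosts: list) -> str:
    """Pick who provisions, in the order of precedence this RFC describes."""
    if infra_exists:
        return "on_disk"      # on-disk Bicep always wins
    if has_resources_block:
        return "compose"      # existing compose path (resources: in azure.yaml)
    if extension_owned_hosts:
        return "extension"    # an extension owns at least one service
    return "none"             # nothing to provision from
```

For a default azd ai agent init project (no infra/, no resources:, azure.ai.* services present), this returns "extension"; after eject the same project returns "on_disk" without any change to azure.yaml.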

Tailored, Minimal Bicep

Slimming the template is a partner workstream that ships alongside Bicep-less. We're including it here because the two are coupled:

  • Bicep-less makes slimming safe: changing the embedded template only affects projects on the new extension version, not every project that ran the old init.
  • Slimming makes Bicep-less viable: a synthesized 300-line main.bicep full of shouldCreate* conditionals is no better than today, just hidden.

The slimmed shape, per azure.yaml service:

| Service in azure.yaml | Bicep generated at provision time | Created at deploy time (Foundry API) |
| --- | --- | --- |
| azure.ai.project (always present in agent projects) | Foundry project ARM resource, model deployments declared in config: (ARM-backed Cognitive Services deployments) | Toolboxes, connections, eval datasets, vector indexes -- everything in config: that is data-plane only |
| azure.ai.project with resourceId: set (existing project) | None for the project itself; outputs wire AZURE_AI_PROJECT_ENDPOINT/AZURE_AI_PROJECT_ID from the existing resource | Same as above, against the existing project |
| azure.ai.agent with runtime: (code-deploy) | None | Agent runtime created via Foundry zip-deploy API |
| azure.ai.agent with docker: (container) | ACR resource (only when this is the first container service in the project) | Agent runtime created via Foundry container-deploy API |
| Monitoring / Insights | Only when added explicitly via azd ai agent add monitoring (see Azure/azure-dev#8049) | n/a |

Model deployments are ARM-backed because they're Microsoft.CognitiveServices/accounts/deployments resources with quota, SKU, and RBAC implications -- they belong in Bicep, not the data plane. Toolboxes and connections are data-plane: they have no ARM representation and are created through the Foundry API at deploy time. The azure.ai.project.config block carries a mix of both; the synthesizer picks out the ARM-backed fields and routes the rest to deploy. This is the same split that Azure/azure-dev#7962 draws between the provision and deploy verbs for the project service.

What goes away from today's template:

  • shouldCreateAcr: ACR is now present only when there's a container service.
  • useExistingAiProject ternaries: now handled by the resourceId: field on the project service. If it is set, the Foundry project resource is omitted from the generated Bicep entirely.
  • Pre-baked Storage/Search/Bing: now added on demand, not at init.
  • Most of the 30-plus outputs: synthesis now emits only the outputs the deploy verb actually consumes.
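The provision/deploy split and the ACR rule can be sketched like this. It assumes modelDeployments is the only ARM-backed key under config: and that a container agent is one with a docker: block on the service; both are assumptions made for illustration, and the names are hypothetical.

```python
# Assumption: the only ARM-backed key in azure.ai.project config.
ARM_BACKED_CONFIG_KEYS = {"modelDeployments"}


def split_project_config(config: dict) -> tuple:
    """Split config: into (ARM-backed for Bicep, data-plane for deploy)."""
    arm = {k: v for k, v in config.items() if k in ARM_BACKED_CONFIG_KEYS}
    data_plane = {k: v for k, v in config.items() if k not in ARM_BACKED_CONFIG_KEYS}
    return arm, data_plane


def needs_acr(services: dict) -> bool:
    """ACR is synthesized only when at least one agent is container-deployed."""
    return any(
        svc.get("host") == "azure.ai.agent" and "docker" in svc
        for svc in services.values()
    )
```

Because needs_acr is derived from the services themselves, there is no shouldCreateAcr parameter to keep in sync: removing the last container agent removes ACR from the next synthesis.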

Downstream Impact

  • AZD Core: the auto-route-to-extension-provider behavior described in "What We Need From AZD Core," plus exposing uses and runtime on the extension-facing ServiceConfig. Telemetry needs to distinguish "extension-provided provider" from "on-disk Bicep" so we can measure adoption.
  • Foundry Toolkit (VS Code): reads azure.yaml. No new files to read. Should treat the absence of ./infra/ as normal, not as project corruption.
  • Azure-Samples/azd-ai-starter-basic: retired as an init target. Repo can stay as a reference. Sample owners should add a README note pointing at the extension.
  • Other AZD samples that embed agent definitions: the init -t <other-sample> flow keeps working unchanged. Those samples bring their own infra/ and the extension respects them. Only the default azd ai agent init (no -t) goes Bicep-less.
  • Telemetry: add provision.synthesis_source (embedded vs. on_disk) and init.infra_flag (true vs. false) so we can measure the eject rate.
  • Documentation: new doc explaining the Bicep-less default, the eject command, and the version-stability contract. Migration guide for existing 0.1.x projects (which already have infra/ on disk -- they stay on the on-disk path, no action needed).

Scope Boundaries

In scope for this RFC:

  • The Bicep-less default behavior and what it means for azd ai agent init
  • azd ai agent init --infra
  • Retiring Azure-Samples/azd-ai-starter-basic as the init target
  • The slimmed Bicep shape that ships inside the extension
  • The Core changes required to route provisioning to the extension

Out of scope (covered elsewhere):

  • The unified azure.yaml schema -- Azure/azure-dev#7962
  • azd ai agent add and incremental composition -- Azure/azure-dev#8049
  • Coding-agent (Copilot) compatibility for the new flags -- separate proposal

Explicitly not in scope:

  • A Bicep module registry or any external module dependency at provision time
  • Rewriting AZD's Bicep provisioning pipeline; we plug into it as it exists
  • Any change to how azd init -t <sample> works for non-default samples

Labels: ext-agents (azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes} extensions), feature (Feature request)
