feat(ai-agents): fetch hosted-agent supported regions from manifest #7930
Replace the hardcoded supportedHostedAgentRegions slice with a fetch from a JSON manifest committed in the repo. Adding or removing regions no longer requires shipping a new extension release. The manifest URL temporarily points at raw.githubusercontent.com on main; will switch to an aka.ms link once provisioned.
Pull request overview
This PR updates the azure.ai.agents extension to dynamically retrieve the hosted-agent supported Azure regions from a JSON manifest (hosted via GitHub raw) and cache the result per-process, replacing the previously hardcoded Go list.
Changes:
- Add runtime fetch + per-process cache for hosted-agent supported regions, backed by a committed JSON manifest.
- Thread context.Context and error handling through region-dependent helpers (supportedRegionsForInit, supportedModelLocations) and update call sites.
- Add unit tests for manifest fetching/normalization/timeout behavior and update model-location intersection tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cli/azd/extensions/azure.ai.agents/internal/exterrors/codes.go | Adds a new structured error code for region-manifest fetch failures. |
| cli/azd/extensions/azure.ai.agents/internal/cmd/init_models.go | Updates model-location prompt flow to handle supportedModelLocations errors. |
| cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations.go | Implements manifest fetch, normalization, caching, and region/model-location helpers. |
| cli/azd/extensions/azure.ai.agents/internal/cmd/init_foundry_resources_helpers.go | Switches location allowlist retrieval to the new runtime-fetched region list. |
| cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations_test.go | Adds tests for manifest fetching and updates tests for the new helper signatures/caching. |
| cli/azd/extensions/azure.ai.agents/hosted-agent-regions.json | Introduces the hosted-agent supported regions manifest consumed at runtime. |
- Switch manifest URL to https://aka.ms/azd-ai-agents/regions and make it a var so tests can override it.
- supportedModelLocations now returns a structured CodeNoSupportedModelLocations error when the intersection is empty (an empty allowlist would otherwise disable downstream filtering and let users pick unsupported regions).
- init_models.go call sites handle CodeNoSupportedModelLocations gracefully by continuing the recovery loop with a helpful message instead of aborting.
- TestSupportedRegionsForInit_FetchesOnceAndCaches now exercises the real cached-fetch path via a URL override; tests touching the shared regionsCache no longer run in parallel.
- Added test asserting the structured error code.
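The empty-intersection behavior described in the list above can be sketched roughly as follows. This is only an illustration: intersectLocations and errNoSupportedModelLocations are hypothetical names, and a plain sentinel error stands in for the extension's structured CodeNoSupportedModelLocations error, whose real type is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// errNoSupportedModelLocations is a hypothetical stand-in for the structured
// CodeNoSupportedModelLocations error mentioned in the commit message.
var errNoSupportedModelLocations = errors.New("no supported model locations")

// intersectLocations keeps the model locations that are also hosted-agent
// regions. An empty result is reported as an error rather than returned as an
// empty allowlist, because an empty allowlist could be read as "no filtering".
func intersectLocations(modelLocations, hostedRegions []string) ([]string, error) {
	var out []string
	for _, loc := range modelLocations {
		if slices.Contains(hostedRegions, loc) {
			out = append(out, loc)
		}
	}
	if len(out) == 0 {
		return nil, errNoSupportedModelLocations
	}
	return out, nil
}

func main() {
	// Region names below are placeholders, not the real supported list.
	hosted := []string{"regiona", "regionb"}
	fmt.Println(intersectLocations([]string{"regionb", "regionc"}, hosted))
	fmt.Println(intersectLocations([]string{"regionc"}, hosted))
}
```

A caller in a prompt loop could check for the sentinel, show a message, and let the user pick a different model instead of aborting.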
trangevi
left a comment
Please address the Copilot comments; otherwise looks fine.
wbreza
left a comment
Overall Assessment
Good design: decoupling region updates from extension releases is a real operational improvement, and the implementation uses modern Go 1.26 patterns well. The main concern is resilience: there is no fallback or retry when the manifest fetch fails, which is a regression from the always-available hardcoded list. A few input validation and error handling gaps round out the findings. The existing Copilot-bot findings (mutex held during fetch, unbounded io.ReadAll) also remain unaddressed.
What's Done Well
- Modern Go 1.26 patterns throughout: t.Context(), slices.Clone(), errors.AsType
- Cache returns cloned slices to prevent caller mutation — solid defensive programming
- Tests correctly avoid .Parallel() for shared-cache tests (addressed from prior review)
- Structured error code (CodeNoSupportedModelLocations) with graceful recovery loop in init_models.go
- Commits 2-3 thoroughly address therealjohn's feedback
High Priority
H1. No fallback when manifest fetch fails
init_locations.go — supportedRegionsForInit() ~L135-142
When the manifest URL is unreachable (network down, air-gapped environment), the function returns a hard error with no fallback to the previously-hardcoded region list. This is a user-facing regression: the old hardcoded behavior always worked regardless of connectivity.
Suggestion: Consider embedding the committed hosted-agent-regions.json via go:embed as a fallback when the remote fetch fails. This preserves the operational benefit (update by merging JSON) while maintaining offline resilience.
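A minimal sketch of this fallback, under stated assumptions: regionsWithFallback, parseManifest, and the fetch callbacks are hypothetical names, and the {"regions": [...]} manifest shape is assumed because the actual schema is not shown in this review. In the extension the fallback bytes would come from a //go:embed directive; a literal stands in here so the sketch compiles without the JSON file.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// embeddedManifest stands in for the bytes a
// `//go:embed hosted-agent-regions.json` directive would provide.
// The schema and the region names are placeholders, not the real manifest.
var embeddedManifest = []byte(`{"regions": ["regiona", "regionb"]}`)

var errLiveFetch = errors.New("live manifest fetch failed")

type manifest struct {
	Regions []string `json:"regions"`
}

func parseManifest(data []byte) ([]string, error) {
	var m manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("parsing regions manifest: %w", err)
	}
	if len(m.Regions) == 0 {
		return nil, errors.New("regions manifest lists no regions")
	}
	return m.Regions, nil
}

// regionsWithFallback prefers the live manifest and falls back to the embedded
// copy when the live fetch or parse fails. The bool reports whether the
// fallback was used, so callers can log or warn about it.
func regionsWithFallback(fetchLive func() ([]byte, error)) ([]string, bool, error) {
	if data, err := fetchLive(); err == nil {
		if regions, err := parseManifest(data); err == nil {
			return regions, false, nil
		}
	}
	regions, err := parseManifest(embeddedManifest)
	return regions, true, err
}

// liveOK and liveDown simulate a reachable and an unreachable manifest URL.
func liveOK() ([]byte, error)   { return []byte(`{"regions": ["regionc"]}`), nil }
func liveDown() ([]byte, error) { return nil, errLiveFetch }

func main() {
	for _, fetch := range []func() ([]byte, error){liveOK, liveDown} {
		regions, usedFallback, err := regionsWithFallback(fetch)
		fmt.Println(regions, usedFallback, err)
	}
}
```

The embedded copy is only as fresh as the build, so it keeps init working offline but does not replace the live fetch.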
H2. No retry logic for transient failures
init_locations.go: fetchHostedAgentRegionsFromURL() ~L174-216
Single HTTP attempt with a 5s timeout. A brief DNS hiccup or transient 503 causes immediate failure, and users must manually retry the entire azd init flow.
Suggestion: Add 2-3 retries with exponential backoff (e.g., 100ms, 500ms, 1s) for transient errors (timeouts, 5xx). Don't retry permanent errors (404, malformed JSON).
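One way the retry suggestion could look, as a sketch rather than a drop-in change: retryFetch, errTransient, errPermanent, and runDemo are hypothetical, and the real code would classify errors from the HTTP status or error type instead of sentinels. The demo uses millisecond delays so it runs quickly; the delays proposed above would be 100ms, 500ms, 1s.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinels: a real fetcher would map timeouts and 5xx to
// "transient" and 404 or malformed JSON to "permanent".
var (
	errTransient = errors.New("transient failure")
	errPermanent = errors.New("permanent failure")
)

// retryFetch calls fetch once, then retries once per entry in delays, sleeping
// for that delay first. Permanent errors and context cancellation stop early.
func retryFetch(ctx context.Context, delays []time.Duration,
	fetch func(context.Context) ([]byte, error)) ([]byte, error) {
	for attempt := 0; ; attempt++ {
		data, err := fetch(ctx)
		if err == nil {
			return data, nil
		}
		if errors.Is(err, errPermanent) || attempt >= len(delays) {
			return nil, fmt.Errorf("fetch failed after %d attempt(s): %w", attempt+1, err)
		}
		select {
		case <-time.After(delays[attempt]):
		case <-ctx.Done():
			return nil, fmt.Errorf("fetch canceled while backing off: %w", ctx.Err())
		}
	}
}

// runDemo simulates a fetcher that fails `failures` times with failWith and
// then succeeds. It returns how many calls were made and the final error.
func runDemo(failures int, failWith error) (int, error) {
	calls := 0
	fetch := func(context.Context) ([]byte, error) {
		calls++
		if calls <= failures {
			return nil, failWith
		}
		return []byte(`{}`), nil
	}
	delays := []time.Duration{time.Millisecond, 2 * time.Millisecond, 4 * time.Millisecond}
	_, err := retryFetch(context.Background(), delays, fetch)
	return calls, err
}

func main() {
	fmt.Println(runDemo(2, errTransient)) // succeeds on the third call
	fmt.Println(runDemo(1, errPermanent)) // gives up after one call
}
```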
Medium Priority
M1. Error chain broken
init_locations.go: regionsFetchError()
The error helper uses fmt.Sprintf("...: %v", err) which discards the error chain. Per repo standards (fmt.Errorf("context: %w", err)), callers should be able to use errors.Is/errors.As on the underlying cause to distinguish timeouts from parse errors.
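The difference can be shown with a small standalone example. The helpers flattened and wrapped are hypothetical and only mimic the two formatting styles; the actual regionsFetchError helper returns a structured error whose shape is not shown in this review.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// timeoutCause is the underlying error a timed-out fetch would surface.
var timeoutCause error = context.DeadlineExceeded

// flattened mimics the current helper: %v copies only the message text,
// so the cause is no longer reachable through the error chain.
func flattened(err error) error {
	return errors.New(fmt.Sprintf("fetching regions manifest: %v", err))
}

// wrapped uses %w, which keeps the cause in the chain.
func wrapped(err error) error {
	return fmt.Errorf("fetching regions manifest: %w", err)
}

// isTimeout is what a caller might do to treat timeouts differently.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(isTimeout(flattened(timeoutCause))) // false
	fmt.Println(isTimeout(wrapped(timeoutCause)))   // true
}
```

Both errors print the same message, so the loss is invisible in logs and only shows up when a caller inspects the chain.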
M2. No manifest schema versioning
hosted-agent-regions.json
The JSON manifest has no version field. If the schema needs to change in the future, older clients will fail with cryptic parse errors rather than a clear "unsupported schema version" message.
Suggestion: Add "version": 1 and validate on parse.
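A rough sketch of version validation on parse. The field names, the supportedManifestVersion constant, and the two-step decode are assumptions for illustration; the current manifest schema is not shown in this review.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// supportedManifestVersion is the only schema version this hypothetical
// client understands.
const supportedManifestVersion = 1

// manifest assumes a {"version": 1, "regions": [...]} layout.
type manifest struct {
	Version int      `json:"version"`
	Regions []string `json:"regions"`
}

// parseManifest decodes the version field first, so that a future schema with
// differently shaped fields fails with a clear version message rather than a
// type-mismatch error from the full decode.
func parseManifest(data []byte) ([]string, error) {
	var header struct {
		Version int `json:"version"`
	}
	if err := json.Unmarshal(data, &header); err != nil {
		return nil, fmt.Errorf("parsing regions manifest: %w", err)
	}
	if header.Version != supportedManifestVersion {
		return nil, fmt.Errorf("unsupported regions manifest schema version %d (this client supports %d)",
			header.Version, supportedManifestVersion)
	}
	var m manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("parsing regions manifest: %w", err)
	}
	return m.Regions, nil
}

func main() {
	fmt.Println(parseManifest([]byte(`{"version": 1, "regions": ["regiona"]}`)))
	fmt.Println(parseManifest([]byte(`{"version": 2, "regions": {"ga": ["regiona"]}}`)))
}
```

A manifest without a version field decodes as version 0 and is rejected, so adding the field and the check would need to ship together.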
M3. Timing-dependent test
init_locations_test.go — TestFetchHostedAgentRegionsFromURL_RespectsTimeout
The test sleeps for timeout+2s then asserts elapsed < timeout+1s. On slow CI runners, the margin may be insufficient, causing flaky failures.
Suggestion: Remove the elapsed-time assertion (just verify the timeout error occurs) or widen the margin significantly.
M4. No Content-Type validation on HTTP response
init_locations.go ~L190
The response body is processed as JSON regardless of Content-Type. If the aka.ms redirect hits a captive portal or error page serving HTML, json.Unmarshal produces a cryptic error rather than a clear "unexpected content type" message.
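A possible shape for such a check, as a sketch only: checkJSONContentType and headerWith are hypothetical, and the accepted media types are an assumption. text/plain is allowed here on the assumption that raw GitHub content is served that way; a strict application/json check might otherwise reject the current manifest source.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
)

// checkJSONContentType rejects responses whose Content-Type is clearly not a
// JSON manifest, such as an HTML captive-portal or error page.
func checkJSONContentType(h http.Header) error {
	raw := h.Get("Content-Type")
	mediaType, _, err := mime.ParseMediaType(raw)
	if err != nil {
		return fmt.Errorf("unparseable Content-Type %q: %w", raw, err)
	}
	switch mediaType {
	case "application/json", "text/plain":
		return nil
	}
	return fmt.Errorf("unexpected content type %q for regions manifest", mediaType)
}

// headerWith builds a response header for the demo.
func headerWith(contentType string) http.Header {
	h := http.Header{}
	if contentType != "" {
		h.Set("Content-Type", contentType)
	}
	return h
}

func main() {
	fmt.Println(checkJSONContentType(headerWith("application/json; charset=utf-8")))
	fmt.Println(checkJSONContentType(headerWith("text/html; charset=utf-8")))
}
```

Whether a missing Content-Type header should be an error, as it is here, or tolerated is a judgment call that depends on how the manifest host behaves.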
M5. Region strings not format-validated
init_locations.go ~L204-209
Region names from the manifest are normalized (lowered/trimmed) but not validated against a format pattern. Invalid entries would pass through and fail later at Azure API call time with confusing errors.
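A sketch of format validation layered on the existing normalization. The pattern (a lowercase letter followed by lowercase letters or digits) is an assumption about what Azure region names look like, and normalizeRegions is a hypothetical helper, not the function in init_locations.go.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// regionPattern is an assumed format: names like "eastus" or "uksouth",
// optionally containing digits, with no spaces or punctuation.
var regionPattern = regexp.MustCompile(`^[a-z][a-z0-9]*$`)

// normalizeRegions lowercases and trims each entry, then rejects the whole
// manifest if any entry does not match the expected format.
func normalizeRegions(raw []string) ([]string, error) {
	out := make([]string, 0, len(raw))
	for _, r := range raw {
		name := strings.ToLower(strings.TrimSpace(r))
		if !regionPattern.MatchString(name) {
			return nil, fmt.Errorf("invalid region name %q in manifest", r)
		}
		out = append(out, name)
	}
	return out, nil
}

func main() {
	fmt.Println(normalizeRegions([]string{" EastUS ", "uksouth"}))
	fmt.Println(normalizeRegions([]string{"east us"}))
}
```

Rejecting the whole manifest surfaces a bad merge early; skipping invalid entries with a warning would be the more lenient alternative.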
Low Priority
L1. Error code naming inconsistency — CodeNoSupportedModelLocations uses a negation-noun pattern unlike existing codes (CodeUnsupportedHost, CodeUnsupportedAgentKind). Consider CodeUnsupportedModelLocations for consistency.
L2. URL var comment could be more descriptive — The comment at L109 explains the var is for test overrides but doesn't describe what the manifest contains or its expected format.
Note
This review supplements the existing Copilot-bot findings regarding (1) the HTTP fetch running while regionsCache.mu is held and (2) io.ReadAll without a size limit, both of which remain unaddressed on the latest commit.
Summary: 2 High, 5 Medium, 2 Low findings. The High findings (no fallback + no retry) are the main blockers — they introduce a resilience regression vs. the previous hardcoded approach.
jongio
left a comment
The design is clean - runtime fetch with caching, structured errors, and normalization all follow the extension's existing patterns well.
Main concern: this introduces a hard network dependency where there wasn't one. If https://aka.ms/azd-ai-agents/regions is unreachable (corp proxy, offline, outage), azd init fails before the user can even select a subscription. The old hardcoded list had no network requirement at this stage. Consider embedding the committed hosted-agent-regions.json as a compile-time fallback so the fetch degrades gracefully instead of halting init.
Also: the JSON manifest drops several regions from the old hardcoded list (eastus, canadaeast, centralus, germanywestcentral, italynorth, southcentralus, uaenorth, uksouth, westeurope). If that's intentional, a quick note in the PR description would help reviewers.
Fetch the regions manifest without holding regionsCache.mu so a caller's canceled context returns immediately instead of waiting up to the fetch timeout. The fetch is coordinated via an in-flight handle so concurrent callers share a single network round-trip; on failure the in-flight slot clears so the next caller retries instead of latching the error.
Also bound the response body with io.LimitReader (1 MiB cap) to guard against unexpectedly large or hostile responses from the source URL. Adds tests for the size cap and for concurrent callers sharing a single fetch.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
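The body cap from this commit can be illustrated with a short sketch. readCapped and bodyOfSize are hypothetical names, and treating an oversized body as an error (rather than silently truncating it) is an assumption about the intended behavior, not something the commit message states.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// maxManifestBytes mirrors the 1 MiB cap named in the commit message.
const maxManifestBytes = 1 << 20

// readCapped reads at most limit+1 bytes. Reading one byte past the limit is
// what lets it tell "exactly at the limit" apart from "too large".
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, fmt.Errorf("reading regions manifest: %w", err)
	}
	if int64(len(data)) > limit {
		return nil, fmt.Errorf("regions manifest exceeds %d bytes", limit)
	}
	return data, nil
}

// bodyOfSize fakes a response body of n bytes for the demo.
func bodyOfSize(n int) io.Reader {
	return strings.NewReader(strings.Repeat("x", n))
}

func main() {
	_, err := readCapped(bodyOfSize(maxManifestBytes), maxManifestBytes)
	fmt.Println(err)
	_, err = readCapped(bodyOfSize(maxManifestBytes+1), maxManifestBytes)
	fmt.Println(err)
}
```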
If the live manifest fetch fails (transient network, restrictive proxy, outage), fall back to the build-time embedded copy of hosted-agent-regions.json so 'azd init' is not blocked. The JSON moves into the cmd package because //go:embed cannot reference files outside the package directory.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use context.WithoutCancel(callerCtx) so the goroutine inherits ctx values without being abortable by any single caller. Resolves the gosec G118 lint failure flagging context.Background() inside the goroutine.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fetch the hosted-agent supported regions list at runtime from a JSON manifest, instead of hardcoding it. This lets us update the region list by merging JSON to main without cutting an extension release.
Embedded fallback
The same manifest is embedded at build time via //go:embed and used as a fallback when the live fetch fails (transient network, restrictive proxy, outage), so azd init is not blocked by network issues. The JSON lives inside the cmd package because //go:embed cannot reference files outside the package directory.
Dropped regions
The new manifest drops several regions that were in the prior hardcoded list (eastus, canadaeast, centralus, germanywestcentral, italynorth, southcentralus, uaenorth, uksouth, westeurope). The removals are intentional: the hosted-agent service is not available in those regions today.