feat(ai-agents): fetch hosted-agent supported regions from manifest #7930

Merged
trangevi merged 6 commits into Azure:main from antriksh30:antriksh30/fetch-hosted-agent-regions
Apr 28, 2026

@antriksh30 commented Apr 27, 2026

Fetch the hosted-agent supported regions list at runtime from a JSON manifest, instead of hardcoding it. This lets us update the region list by merging JSON to main without cutting an extension release.

Embedded fallback

The same manifest is embedded at build time via //go:embed and used as a fallback when the live fetch fails (transient network, restrictive proxy, outage), so azd init is not blocked by network issues. The JSON lives inside the cmd package because //go:embed cannot reference files outside the package directory.

Dropped regions

The new manifest drops several regions that were in the prior hardcoded list (eastus, canadaeast, centralus, germanywestcentral, italynorth, southcentralus, uaenorth, uksouth, westeurope). These are intentional — the hosted-agent service is not available in those regions today.

Replace the hardcoded supportedHostedAgentRegions slice with a fetch from a JSON manifest committed in the repo. Adding or removing regions no longer requires shipping a new extension release. The manifest URL temporarily points at raw.githubusercontent.com on main; it will switch to an aka.ms link once provisioned.
Copilot AI left a comment

Pull request overview

This PR updates the azure.ai.agents extension to dynamically retrieve the hosted-agent supported Azure regions from a JSON manifest (hosted via GitHub raw) and cache the result per-process, replacing the previously hardcoded Go list.

Changes:

  • Add runtime fetch + per-process cache for hosted-agent supported regions, backed by a committed JSON manifest.
  • Thread context.Context + error handling through region-dependent helpers (supportedRegionsForInit, supportedModelLocations) and update call sites.
  • Add unit tests for manifest fetching/normalization/timeout behavior and update model-location intersection tests.
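To illustrate the normalization step listed above, here is a minimal sketch. Lowercasing and trimming match what the later review describes; dropping blank entries, removing duplicates, and the `normalizeRegions` name are hypothetical additions, not confirmed PR behavior.

```go
package main

import "strings"

// normalizeRegions lowercases and trims each entry, drops blanks, and removes
// duplicates while keeping the manifest's original order.
func normalizeRegions(raw []string) []string {
	seen := make(map[string]bool, len(raw))
	out := make([]string, 0, len(raw))
	for _, r := range raw {
		r = strings.ToLower(strings.TrimSpace(r))
		if r == "" || seen[r] {
			continue
		}
		seen[r] = true
		out = append(out, r)
	}
	return out
}
```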

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Summary per file:

  • cli/azd/extensions/azure.ai.agents/internal/exterrors/codes.go: Adds a new structured error code for region-manifest fetch failures.
  • cli/azd/extensions/azure.ai.agents/internal/cmd/init_models.go: Updates the model-location prompt flow to handle supportedModelLocations errors.
  • cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations.go: Implements manifest fetch, normalization, caching, and region/model-location helpers.
  • cli/azd/extensions/azure.ai.agents/internal/cmd/init_foundry_resources_helpers.go: Switches location allowlist retrieval to the new runtime-fetched region list.
  • cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations_test.go: Adds tests for manifest fetching and updates tests for the new helper signatures/caching.
  • cli/azd/extensions/azure.ai.agents/hosted-agent-regions.json: Introduces the hosted-agent supported regions manifest consumed at runtime.

Comment thread cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations_test.go Outdated
Comment thread cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations.go Outdated
Comment thread cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations.go Outdated
- Switch manifest URL to https://aka.ms/azd-ai-agents/regions and make
  it a var so tests can override it.
- supportedModelLocations now returns a structured CodeNoSupportedModelLocations
  error when the intersection is empty (an empty allowlist would otherwise
  disable downstream filtering and let users pick unsupported regions).
- init_models.go callsites handle CodeNoSupportedModelLocations gracefully by
  continuing the recovery loop with a helpful message instead of aborting.
- TestSupportedRegionsForInit_FetchesOnceAndCaches now exercises the real
  cached-fetch path via a URL override; tests touching the shared regionsCache
  no longer run in parallel.
- Added test asserting the structured error code.
@antriksh30 requested a review from Copilot April 27, 2026 17:10
@antriksh30 marked this pull request as ready for review April 27, 2026 17:17
@antriksh30 requested a review from therealjohn April 27, 2026 17:19
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment thread cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations.go
Comment thread cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations.go Outdated
Comment thread cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations.go Outdated
@Azure deleted a comment from Copilot AI Apr 27, 2026
@JeffreyCA left a comment

The change looks good from my end; I'll let @therealjohn/@trangevi give final approval.

@JeffreyCA linked an issue Apr 27, 2026 that may be closed by this pull request
@trangevi left a comment

Please address the Copilot comments; otherwise looks fine.

@wbreza left a comment

Overall Assessment

Good design: decoupling region updates from extension releases is a real operational improvement, and the implementation uses modern Go 1.26 patterns well. The main concerns are resilience — there's no fallback or retry when the manifest fetch fails, which is a regression from the always-available hardcoded list. A few input validation and error handling gaps round out the findings. The existing Copilot-bot findings (mutex held during fetch, unbounded io.ReadAll) also remain unaddressed.

What's Done Well

  • Modern Go 1.26 patterns throughout: t.Context(), slices.Clone(), errors.AsType
  • Cache returns cloned slices to prevent caller mutation — solid defensive programming
  • Tests correctly avoid t.Parallel() for shared-cache tests (addressed from prior review)
  • Structured error code (CodeNoSupportedModelLocations) with graceful recovery loop in init_models.go
  • Commits 2-3 thoroughly address therealjohn's feedback

High Priority

H1. No fallback when manifest fetch fails
init_locations.go — supportedRegionsForInit() ~L135-142

When the manifest URL is unreachable (network down, air-gapped environment), the function returns a hard error with no fallback to the previously-hardcoded region list. This is a user-facing regression: the old hardcoded behavior always worked regardless of connectivity.

Suggestion: Consider embedding the committed hosted-agent-regions.json via go:embed as a fallback when the remote fetch fails. This preserves the operational benefit (update by merging JSON) while maintaining offline resilience.

H2. No retry logic for transient failures
init_locations.go — fetchHostedAgentRegionsFromURL() ~L174-216

Single HTTP attempt with a 5s timeout. A brief DNS hiccup or transient 503 causes immediate failure — users must manually retry the entire azd init flow.

Suggestion: Add 2-3 retries with exponential backoff (e.g., 100ms, 500ms, 1s) for transient errors (timeouts, 5xx). Don't retry permanent errors (404, malformed JSON).

Medium Priority

M1. Error chain broken
init_locations.go — regionsFetchError()

The error helper uses fmt.Sprintf("...: %v", err), which discards the error chain. Per repo standards (fmt.Errorf("context: %w", err)), callers should be able to use errors.Is/errors.As on the underlying cause to distinguish timeouts from parse errors.

M2. No manifest schema versioning
hosted-agent-regions.json

The JSON manifest has no version field. If the schema needs to change in the future, older clients will fail with cryptic parse errors rather than a clear "unsupported schema version" message.

Suggestion: Add "version": 1 and validate on parse.
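A sketch of the suggested check, assuming the manifest gains a top-level integer `version` field next to a `regions` array (both field names, and the helper names, are assumptions). Decoding the version on its own first means a future, incompatible layout fails with a clear message instead of a field-level JSON error.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// supportedManifestVersion is the schema version this client understands (hypothetical).
const supportedManifestVersion = 1

type versionedManifest struct {
	Version int      `json:"version"`
	Regions []string `json:"regions"`
}

func parseVersionedManifest(data []byte) ([]string, error) {
	// Decode only the version first; unknown fields such as "regions" are ignored here.
	var header struct {
		Version int `json:"version"`
	}
	if err := json.Unmarshal(data, &header); err != nil {
		return nil, fmt.Errorf("parsing region manifest: %w", err)
	}
	if header.Version != supportedManifestVersion {
		return nil, fmt.Errorf("unsupported region manifest schema version %d (this build supports %d); update the extension",
			header.Version, supportedManifestVersion)
	}
	var m versionedManifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("parsing region manifest: %w", err)
	}
	return m.Regions, nil
}
```

A manifest with no `version` field decodes as version 0 and is rejected here; if unversioned manifests should keep working during a transition, treating 0 as 1 is a one-line change.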

M3. Timing-dependent test
init_locations_test.go — TestFetchHostedAgentRegionsFromURL_RespectsTimeout

The test sleeps for timeout+2s then asserts elapsed < timeout+1s. On slow CI runners, the margin may be insufficient, causing flaky failures.

Suggestion: Remove the elapsed-time assertion (just verify the timeout error occurs) or widen the margin significantly.

M4. No Content-Type validation on HTTP response
init_locations.go ~L190

The response body is processed as JSON regardless of Content-Type. If the aka.ms redirect hits a captive portal or error page serving HTML, json.Unmarshal produces a cryptic error rather than a clear "unexpected content type" message.

M5. Region strings not format-validated
init_locations.go ~L204-209

Region names from the manifest are normalized (lowered/trimmed) but not validated against a format pattern. Invalid entries would pass through and fail later at Azure API call time with confusing errors.
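A sketch of format validation applied after normalization. The pattern is an assumption based on the shape of ARM region names such as `eastus2` or `swedencentral` (a lowercase letter followed by lowercase letters or digits); it is deliberately loose and is not an authoritative region list. `validateRegions` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// regionNamePattern is an assumed, loose shape for normalized Azure region names.
var regionNamePattern = regexp.MustCompile(`^[a-z][a-z0-9]*$`)

// validateRegions returns an error listing every entry that does not look like a region.
func validateRegions(regions []string) error {
	var bad []string
	for _, r := range regions {
		if !regionNamePattern.MatchString(r) {
			bad = append(bad, r)
		}
	}
	if len(bad) > 0 {
		return fmt.Errorf("region manifest contains malformed region names: %q", bad)
	}
	return nil
}
```

Failing at manifest-load time names the bad entries directly, instead of surfacing later as an opaque Azure API error.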

Low Priority

L1. Error code naming inconsistency — CodeNoSupportedModelLocations uses a negation-noun pattern unlike existing codes (CodeUnsupportedHost, CodeUnsupportedAgentKind). Consider CodeUnsupportedModelLocations for consistency.

L2. URL var comment could be more descriptive — The comment at L109 explains the var is for test overrides but doesn't describe what the manifest contains or its expected format.

Note

This review supplements the existing Copilot-bot findings regarding (1) the HTTP fetch being held while regionsCache.mu is locked and (2) io.ReadAll without a size limit, both of which remain unaddressed on the latest commit.


Summary: 2 High, 5 Medium, 2 Low findings. The High findings (no fallback + no retry) are the main blockers — they introduce a resilience regression vs. the previous hardcoded approach.

@jongio left a comment

The design is clean - runtime fetch with caching, structured errors, and normalization all follow the extension's existing patterns well.

Main concern: this introduces a hard network dependency where there wasn't one. If https://aka.ms/azd-ai-agents/regions is unreachable (corp proxy, offline, outage), azd init fails before the user can even select a subscription. The old hardcoded list had no network requirement at this stage. Consider embedding the committed hosted-agent-regions.json as a compile-time fallback so the fetch degrades gracefully instead of halting init.

Also: the JSON manifest drops several regions from the old hardcoded list (eastus, canadaeast, centralus, germanywestcentral, italynorth, southcentralus, uaenorth, uksouth, westeurope). If that's intentional, a quick note in the PR description would help reviewers.

Comment thread cli/azd/extensions/azure.ai.agents/internal/cmd/init_locations.go
Antriksh Jain and others added 3 commits April 27, 2026 23:53
Fetch the regions manifest without holding regionsCache.mu so a caller's
canceled context returns immediately instead of waiting up to the fetch
timeout. The fetch is coordinated via an in-flight handle so concurrent
callers share a single network round-trip; on failure the in-flight slot
clears so the next caller retries instead of latching the error.

Also bound the response body with io.LimitReader (1 MiB cap) to guard
against unexpectedly large or hostile responses from the source URL.

Adds tests for the size cap and for concurrent callers sharing a single
fetch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
If the live manifest fetch fails (transient network, restrictive proxy,
outage), fall back to the build-time embedded copy of
hosted-agent-regions.json so 'azd init' is not blocked.

The JSON moves into the cmd package because //go:embed cannot reference
files outside the package directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use context.WithoutCancel(callerCtx) so the goroutine inherits ctx values
without being abortable by any single caller. Resolves the gosec G118 lint
failure flagging context.Background() inside the goroutine.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@trangevi merged commit 0a113e6 into Azure:main Apr 28, 2026
19 checks passed

Development

Successfully merging this pull request may close these issues.

Hide unsupported regions for hosted agents

7 participants