feat: launch Agent Inspector from azd ai agent run by anchenyi · Pull Request #8264 · Azure/azure-dev

anchenyi · 2026-05-20T07:27:02Z

Summary

azd ai agent run now automatically launches Agent Inspector for the local agent run. Users can opt out with --no-inspector.

Changes

Add --no-inspector to azd ai agent run.
Launch Agent Inspector through the azd workflow service.
Suppress Inspector CLI output when launched from azd ai agent run, so the terminal keeps showing the agent run output.
Show install guidance when azure.ai.inspector is not installed:
- azd extension install azure.ai.inspector
Keep azd ai inspector as a command group; Inspector only starts from azd ai inspector launch.

Inspector Web Demo

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds automatic Agent Inspector launch when running a local agent via azd ai agent run, with an opt-out flag, and introduces a standalone azure.ai.inspector extension that serves the Inspector SPA and proxies JSON-RPC/WebSocket + HTTP/SSE traffic to the local agent.

Changes:

Add --no-inspector to azd ai agent run and launch Inspector via the azd workflow service using a silent mode.
Implement the Inspector extension server/runtime (embedded SPA, WS JSON-RPC bridge, HTTP fetch/invoke + SSE proxying) and add tests.
Update extension packaging (version bump, metadata/README/changelog, build scripts, lint config) and include bundled SPA assets.

Reviewed changes

Copilot reviewed 34 out of 119 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
cli/azd/extensions/azure.ai.inspector/version.txt	Bumps inspector extension version.
cli/azd/extensions/azure.ai.inspector/main.go	Updates module import path for extension entrypoint.
cli/azd/extensions/azure.ai.inspector/internal/version/version.go	Adds build-time version/commit/date variables.
cli/azd/extensions/azure.ai.inspector/internal/inspector/server_test.go	Adds server routing test for SPA index fallback.
cli/azd/extensions/azure.ai.inspector/internal/inspector/server.go	Adds local HTTP server + SPA asset handler + WS endpoint.
cli/azd/extensions/azure.ai.inspector/internal/inspector/rpc.go	Implements JSON-RPC handling over WebSocket for the SPA.
cli/azd/extensions/azure.ai.inspector/internal/inspector/proxy_sse_test.go	Adds test for upstream SSE cancellation on WS close.
cli/azd/extensions/azure.ai.inspector/internal/inspector/proxy_sse.go	Implements SSE proxying and streaming to SPA (and optional sink).
cli/azd/extensions/azure.ai.inspector/internal/inspector/proxy_fetch.go	Implements HTTP fetch/invoke proxying and response shaping.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/index.html	Adds embedded SPA entrypoint HTML.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/qwen-T0CAGeOv.svg	Adds bundled SPA asset.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/qwen-B0layTYq.svg	Adds bundled SPA asset.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/minimax20-MO7AnRq7.svg	Adds bundled SPA asset.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/minimax-1g4txH6T.svg	Adds bundled SPA asset.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/kimi_o_80-BtOGfzOQ.svg	Adds bundled SPA asset.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/kimi_o_20-DhxCfzjk.svg	Adds bundled SPA asset.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/hljs/github.css	Adds bundled SPA syntax highlighting theme.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/hljs/github-dark.css	Adds bundled SPA syntax highlighting theme.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/foundry-openai-tmS3rrL9.svg	Adds bundled SPA asset.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/foundry-openai-kEJt47xx.svg	Adds bundled SPA asset.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/fireworks-openai-OsxdyxhS.svg	Adds bundled SPA asset (composite icon).
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/fireworks-openai-Da_qp4ay.svg	Adds bundled SPA asset (composite icon).
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/fireworks-minimax-cFev7e0G.svg	Adds bundled SPA asset (composite icon).
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/fireworks-minimax-BCqYI7Go.svg	Adds bundled SPA asset (composite icon).
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/fireworks-kimi-BZ4xkefv.svg	Adds bundled SPA asset (composite icon).
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/fireworks-kimi-B-UniIay.svg	Adds bundled SPA asset (composite icon).
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/fireworks-deepseek-DDRLQatW.svg	Adds bundled SPA asset (composite icon).
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/fireworks-deepseek-BG-COsKx.svg	Adds bundled SPA asset (composite icon).
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets/assets/ai21-labs-D2iYCjBT.svg	Adds bundled SPA asset.
cli/azd/extensions/azure.ai.inspector/internal/inspector/assets.go	Embeds SPA assets into the extension binary.
cli/azd/extensions/azure.ai.inspector/internal/cmd/version.go	Switches version reporting to new internal/version package.
cli/azd/extensions/azure.ai.inspector/internal/cmd/sse_render.go	Adds SSE rendering for terminal mirroring.
cli/azd/extensions/azure.ai.inspector/internal/cmd/root_test.go	Adds tests ensuring launch is an explicit subcommand + metadata.
cli/azd/extensions/azure.ai.inspector/internal/cmd/root.go	Refactors root command, adds launch/listen/metadata, adds debug logging hook.
cli/azd/extensions/azure.ai.inspector/internal/cmd/metadata.go	Removes redundant metadata command wrapper.
cli/azd/extensions/azure.ai.inspector/internal/cmd/inspector.go	Adds `azd ai inspector launch` command implementation.
cli/azd/extensions/azure.ai.inspector/internal/cmd/debug.go	Adds debug logging routing to file (or discard).
cli/azd/extensions/azure.ai.inspector/internal/cmd/context.go	Removes context command (no longer part of extension surface).
cli/azd/extensions/azure.ai.inspector/go.sum	Updates module dependency checksums.
cli/azd/extensions/azure.ai.inspector/go.mod	Renames module path and updates dependencies for inspector runtime.
cli/azd/extensions/azure.ai.inspector/extension.yaml	Updates extension metadata, version, examples, and required azd version.
cli/azd/extensions/azure.ai.inspector/cspell.yaml	Adds dictionary words for extension-specific terms.
cli/azd/extensions/azure.ai.inspector/ci-build.ps1	Updates build flags and linker vars to new version package + hardening.
cli/azd/extensions/azure.ai.inspector/build.sh	Updates linker vars path for version embedding.
cli/azd/extensions/azure.ai.inspector/build.ps1	Updates linker vars path for version embedding.
cli/azd/extensions/azure.ai.inspector/README.md	Adds detailed usage and development docs.
cli/azd/extensions/azure.ai.inspector/CHANGELOG.md	Updates changelog for 0.1.0-preview.
cli/azd/extensions/azure.ai.inspector/AGENTS.md	Adds extension-specific conventions/instructions.
cli/azd/extensions/azure.ai.inspector/.golangci.yaml	Adds lint configuration for the extension module.
cli/azd/extensions/azure.ai.inspector/.gitattributes	Disables whitespace normalization for bundled SPA artifacts.
cli/azd/extensions/azure.ai.agents/internal/cmd/run_test.go	Adds tests for `--no-inspector` and workflow-based inspector launch.
cli/azd/extensions/azure.ai.agents/internal/cmd/run.go	Launches inspector by default after agent is ready; adds opt-out and install guidance.

Comments suppressed due to low confidence (5)

cli/azd/extensions/azure.ai.inspector/internal/inspector/proxy_sse.go:1

This code emits user-facing output directly to stdout/stderr (printUserInput, fmt.Printf, fmt.Fprintln) even when the inspector is launched with --silent from azd ai agent run, which contradicts the PR’s goal to suppress inspector output. Additionally, the error message hardcodes POST even though the request method can vary. Route all output through the configurable logger/SSESink (or gate it on a verbosity/debug flag), and construct the error string using the resolved request method.
cli/azd/extensions/azure.ai.inspector/internal/inspector/proxy_sse.go:1
This code emits user-facing output directly to stdout/stderr (printUserInput, fmt.Printf, fmt.Fprintln) even when the inspector is launched with --silent from azd ai agent run, which contradicts the PR’s goal to suppress inspector output. Additionally, the error message hardcodes POST even though the request method can vary. Route all output through the configurable logger/SSESink (or gate it on a verbosity/debug flag), and construct the error string using the resolved request method.
cli/azd/extensions/azure.ai.inspector/internal/inspector/proxy_fetch.go:1
Echoing raw user input directly to stderr can leak potentially sensitive content into terminals/log capture (and currently bypasses --silent). Consider removing this entirely or only emitting it via the logger under an explicit debug/verbose flag so it is opt-in.
cli/azd/extensions/azure.ai.inspector/internal/inspector/server.go:1
The SPA fallback currently serves index.html for any fs.Stat error, not just missing files. That can mask real problems (e.g., permission/corrupt embed) and turn them into misleading 200 responses. Prefer falling back only on not-exist errors (e.g., errors.Is(err, fs.ErrNotExist)), and return a 500 for other errors.
cli/azd/extensions/azure.ai.inspector/internal/inspector/rpc.go:1
Spawning an unbounded goroutine per inbound WS message can create avoidable memory/CPU pressure if the client misbehaves (or if the UI sends bursts). Consider handling non-streaming methods inline, and only offloading streaming-related methods; or introduce a bounded worker/semaphore to limit concurrent handlers.

wbreza

Thanks for this — the Inspector auto-launch is a great UX. I've focused this review on the new attack surface, concurrency, and the default-behavior change. Findings are grouped by priority.

Existing Copilot review findings (proxy_sse stdout bypass, proxy_fetch user-input echo, server.go SPA fallback, rpc.go unbounded goroutine, debug.go log file leak, inspector.go scanner.Err) still apply at HEAD — please resolve those alongside the items below; several chain together.

🔴 Blockers

1. Local SSRF via unauthenticated WS + arbitrary-URL HTTP proxy

internal/inspector/server.go (handleWS) + proxy_fetch.go / proxy_sse.go

/agentdev/ws/rpc is exposed with no token / no handshake — only an Origin check (bypassable, see #2). Once connected, webviewProxy/fetch, webviewProxy/invoke, and webviewProxy/fetchSSE accept a client-supplied URL and forward it via http.NewRequest with no scheme/host validation.

Any local process (poisoned npm postinstall, malicious VS Code extension, a browser tab once #2 is exploited) can:

Exfiltrate IMDS managed-identity tokens (http://169.254.169.254/metadata/identity/...)
Read intranet services / loopback dev databases / other CLIs' RPC ports
POST arbitrary payloads to the local agent or any other localhost service

Auto-launching this from azd ai agent run by default makes the dev-machine surface materially larger.

Suggested fix:

Mint a per-session random token on Server.Start, embed it in index.html, require it on WS upgrade (Sec-WebSocket-Protocol or query param). Reject upgrades without it.
Constrain proxy URLs: scheme in {http, https}, host resolves to 127.0.0.1/::1/localhost, ideally port == cfg.AgentPort.

2. DNS rebinding bypasses `CheckOrigin`

internal/inspector/server.go:62-68

CheckOrigin: func(r *http.Request) bool {
    origin := r.Header.Get("Origin")
    if origin == "" { return true }
    return origin == "http://"+r.Host || origin == "https://"+r.Host
}

r.Host is attacker-controlled. Classic DNS-rebinding: developer visits evil.example → JS rebinds the host to 127.0.0.1 → browser sends Host: evil.example:8087, Origin: http://evil.example, check passes. Binding to 127.0.0.1 does not prevent this — the browser resolves locally itself.

Combined with #1, any browser visit to a malicious page while Inspector is running becomes click-zero exploitation for the full inspector lifetime.

Suggested fix: Validate r.Host against an explicit allowlist (127.0.0.1:<port>, [::1]:<port>, localhost:<port>); reject empty Origin on WS upgrades; ideally also gate on the session token from #1.

3. Nil-map panic race: `registerStream` after `cleanup()`

internal/inspector/rpc.go:94, 295 ↔ proxy_sse.go:46, proxy_fetch.go:138

handleWS dispatches each message in a new goroutine (go sess.handleMessage(&msg)). On WS close, defer sess.cleanup() sets s.streams = nil. An already-scheduled webviewProxy/fetchSSE or proxyInvoke goroutine then calls s.registerStream(id, cancel) → s.streams[id] = cancel on a nil map → process-wide panic. Reproducible on any disconnect-during-request scenario.

Suggested fix:

func (s *rpcSession) registerStream(id string, cancel context.CancelFunc) {
    s.streamsMu.Lock()
    defer s.streamsMu.Unlock()
    if s.closed { cancel(); return }
    s.streams[id] = cancel
}

…and in cleanup(): clear(s.streams) + s.closed = true under the same mutex.

4. No `recover` in per-message goroutines

internal/inspector/rpc.go:94

go sess.handleMessage(&msg) runs against client-controlled JSON; any nil-deref / OOB in any handler crashes the entire inspector process — and in auto-launch mode also propagates back to the parent azd ai agent run.

Suggested fix: Wrap the goroutine body in defer func() { if r := recover(); r != nil { /* log + send RPC error if msg.ID != nil */ } }().

🟠 Major

Empty Origin accepted unconditionally (server.go:64-66) — if origin == "" { return true } lets any non-browser local process reach #1's SSRF without needing the rebinding chain.
openUrlInBrowser is an unauth'd URI-handler launcher (rpc.go:186-201) — Any WS client can drive browser.OpenURL with arbitrary schemes (ms-msdt:, file:, vscode:, etc.). Pairs poorly with the Follina-class URI-handler vulnerabilities that ship regularly. Allowlist http/https only; reject embedded credentials.
Unbounded WS frame size & no deadlines → memory DoS (rpc.go:59-95) — No conn.SetReadLimit, no SetReadDeadline, no ping/pong. A single 10 GB frame OOMs the process. Set conn.SetReadLimit(1<<20), periodic ping/pong, cap concurrent sessions.
Body logging bypasses --silent via s.logger.Printf (proxy_fetch.go:111, 152) — s.logger.Printf("invoke ... body: %s", ..., p.Body) is not gated by cfg.Silent. With auto-launch, every prompt + every model response lands on the parent's stderr. (Distinct from Copilot's fmt.Fprintln finding in proxy_sse.go — this is the structured logger path.) When Silent, log only metadata (requestID, method, status, length).
Auto-launched inspector lifetime not bound to parent process (extensions/azure.ai.agents/internal/cmd/run.go:258-326 + inspector.go:124-132) — workflow.Run(ctx, ...) spawns inspector with no PID capture, no explicit Kill on cancel(). Please verify: after Ctrl+C on azd ai agent run, does netstat -ano | findstr :8087 show port 8087 still bound? If yes, inspector keeps proxying with no UI, subsequent runs collide on the port, and a stale proxy stays reachable. Capture child PID and kill on cleanup, or pass --parent-pid for self-termination.
streamSSELines SSESink deadlock / goroutine leak (proxy_sse.go:116-165 ↔ inspector.go:141 injectSSEEvents) — readSSEStream returns early on response.completed, dropping the inner pipe reader; injectSSEEvents blocks on Fprintln into an unbuffered pipe with no reader → goroutine + HTTP body pinned until streamCtx is cancelled. Happens on normal completion when trailing events arrive. Close the sink writer on completion, or make injectSSEEvents ctx-aware.
proxyInvoke streaming branch can leak response body on panic (proxy_fetch.go:137-145) — go s.pumpSSE(p.RequestID, resp, true) has no top-level defer resp.Body.Close(); body closure depends on the cancel-driven forcing goroutine that may not fire on panic / early return. Add defer resp.Body.Close() at top of pumpSSE (idempotent with existing path).
Default-behavior UX change is ungated and undocumented (extensions/azure.ai.agents/internal/cmd/run.go + azure.ai.agents/README.md) — Auto-launching Inspector is a material UX shift with no env-var/global opt-out (only per-invocation --no-inspector), no Preview gating per docs/reference/feature-status.md, no README update on the agents extension explaining the new default / --no-inspector / headless behavior, and no telemetry distinguishing auto-launch attempts/successes/failures. CI and headless dev environments will see surprising behavior. Add an AZD_AI_AGENT_AUTO_INSPECTOR=false (or similar) global opt-out, document the new default, mark as Preview, emit telemetry.
No Content-Security-Policy or hardening headers on the served SPA (server.go:129-137) — Only Content-Type + Cache-Control set. Given the 2.6 MB upstream JS bundle, a supply-chain compromise or XSS gets full origin powers. Add Content-Security-Policy: default-src 'self'; connect-src 'self' ws://localhost:<port>, X-Content-Type-Options: nosniff, X-Frame-Options: DENY, Referrer-Policy: no-referrer.
Missing OSS attribution for embedded third-party SPA bundle — Embeds ~2.6 MB from qidon/tryinspector (KaTeX fonts, highlight.js themes, etc.) with no NOTICE / THIRD_PARTY_NOTICES / LICENSE references. Microsoft OSS policy generally requires attribution for redistributed third-party code, and incompatible licenses (GPL/AGPL) would block release. Please confirm upstream license compatibility and add the appropriate notices to cli/azd/extensions/azure.ai.inspector/.
No integration test that Inspector failure doesn't break agent run (extensions/azure.ai.agents/internal/cmd/run_test.go) — The core safety property of the new default-on behavior — "inspector launch failure must not crash agent run" — has no test. Mock the workflow client to return RPC errors and panics; assert the agent continues. Also add an explicit test that --no-inspector skips the workflow call entirely.

🟡 Minor

isInspectorExtensionMissingMessage brittle string match (run.go:369-373) — Substring-matches "unknown command"/"inspector"/"unknown flag: --port". If azd's workflow runner changes wording, install guidance silently regresses to a generic warning. Prefer a typed sentinel/status code, or rely on the existing pre-check.
Windows Ctrl+C kills agent child ungracefully (run.go:203-245) — exec.CommandContext on Windows = TerminateProcess; no chance to flush OTel/logs. Use CREATE_NEW_PROCESS_GROUP + GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT) with grace timeout.
loadAzdEnvironment error silently swallowed (run.go:177-182) — User gets confusing downstream "endpoint is empty" errors. Mirror the warning pattern used right below for resolveConnectionCredentials.
No size limit on error/buffered io.ReadAll(resp.Body) (proxy_sse.go:~76, proxy_fetch.go:~55) — Malicious/buggy upstream can OOM Inspector. Use io.LimitReader (4 KB for error bodies, 16 MB for buffered).
streamHTTPClient has no response-header timeout (proxy_fetch.go:58) — Same client is used for proxyInvoke before content-type is known; a hung non-SSE upstream blocks the goroutine. Set Transport.ResponseHeaderTimeout: 30s.
apim-request-id to stdout in standalone mode (proxy_sse.go:75-77) — fmt.Printf("Trace ID: %s\n", ...) writes to stdout; breaks | jq / stdout-as-data flows. Use stderr.
.golangci.yaml line-length 220 vs azd core 125 — Inherit from core or use file-scoped //nolint:lll for justified long lines (mostly the embedded asset SHA-stamped paths).
runInspector browser-open goroutine leak on Start failure (inspector.go:124-132) — Goroutine blocks on <-ready forever if Start fails before binding. Close ready in error path, or select on ctx.Done().
Defense-in-depth on assetsHandler path handling (server.go:139-150) — http.ServeMux cleans paths today, but a future re-mount with StripPrefix could bypass that. Explicit path.Clean + escape rejection.
Test coverage gaps — Missing tests for: SSE upstream 4xx/5xx + read-error paths, malformed SSE lines, SPA fallback for real fs.Stat errors vs ErrNotExist, JSON-RPC error response structure, very long data: lines (>64 KB scanner buffer), --no-inspector skips workflow call.
Test flakiness — time.Sleep / hardcoded timeouts in proxy_sse_test.go, server_test.go — replace with synchronization primitives or t.Context() deadlines.
Install warning has no rate-limit (run.go ~200) — Prints on every run if extension absent. Consider a once-per-day marker file.
CHANGELOG entry minimal — Agents extension uses richer entries linking PR/feature.
requiredAzdVersion accuracy (extension.yaml:8) — Verify >1.23.13 matches the minimum azd version supporting workflow.Run for extensions.
AGENTS.md error-handling guidance gap (extensions/azure.ai.inspector/AGENTS.md:51) — Clarify this extension uses plain wrapped errors (no exterrors), unlike the agents extension.

⚪ Nits

srv.URL() returns http://localhost:%d while listener binds 127.0.0.1 only (server.go:73-75, 85)
cspell.yaml — comment why azureaiinspector (Go module camelCase) is allowed
build.sh:12 — comment casing vs actual $EXTENSION_ID_SAFE
debug.go log file close error swallowed
root_test.go findCommand no recursion depth guard
run_test.go appendPortEnvVars relies on slice index ordering

Happy to discuss any of the above — particularly grateful for the contribution and the careful refactor of the extension internals. The blockers are all addressable with localized changes (session token + URL host check + nil-map fix + recover).

trangevi

There is a \azure-dev.github\workflows\approval-ext-azure-ai-inspector.yml file which controls a github pipeline to block PRs on a separate approval list from the standard codeowners file. This is intended to still get helpful reviews from the broader AZD group (in terms of sticking to AZD standards, GO best practices, etc.) while still blocking the PR on a smaller set of people who know the proper code flow for each extension. We should update that file with folks on your team for proper ownership of this new extension.

In .github/workflows/approval-ext-azure-ai-inspector.yml, add a REQUIRED_APPROVERS environment variable to the step:

       - name: Check for required team approval
         uses: actions/github-script@v9
         env:
           EXTENSION_PATH: "cli/azd/extensions/azure.ai.inspector/"
           WORKFLOW_PATH: ".github/workflows/approval-ext-azure-ai-inspector.yml"
           OVERRIDE_COMMAND: "/inspector-extension-approval override"
           REQUIRED_APPROVERS: '["user1", "user2", "user3"]'  # ← add this
         with:
           script: |
             const script = require('./.github/scripts/pr-approval-foundry-extensions-shared.js');
             await script({ github, context, core });

The shared script (line 31-33) checks for this env var and uses it instead of the default list.

anchenyi · 2026-05-21T06:20:29Z

Thanks for this — the Inspector auto-launch is a great UX. I've focused this review on the new attack surface, concurrency, and the default-behavior change. Findings are grouped by priority.

Existing Copilot review findings (proxy_sse stdout bypass, proxy_fetch user-input echo, server.go SPA fallback, rpc.go unbounded goroutine, debug.go log file leak, inspector.go scanner.Err) still apply at HEAD — please resolve those alongside the items below; several chain together.

🔴 Blockers

1. Local SSRF via unauthenticated WS + arbitrary-URL HTTP proxy

internal/inspector/server.go (handleWS) + proxy_fetch.go / proxy_sse.go

/agentdev/ws/rpc is exposed with no token / no handshake — only an Origin check (bypassable, see #2). Once connected, webviewProxy/fetch, webviewProxy/invoke, and webviewProxy/fetchSSE accept a client-supplied URL and forward it via http.NewRequest with no scheme/host validation.

Any local process (poisoned npm postinstall, malicious VS Code extension, a browser tab once #2 is exploited) can:

Exfiltrate IMDS managed-identity tokens (http://169.254.169.254/metadata/identity/...)

Read intranet services / loopback dev databases / other CLIs' RPC ports

POST arbitrary payloads to the local agent or any other localhost service

Auto-launching this from azd ai agent run by default makes the dev-machine surface materially larger.

Suggested fix:

Mint a per-session random token on Server.Start, embed it in index.html, require it on WS upgrade (Sec-WebSocket-Protocol or query param). Reject upgrades without it.

Constrain proxy URLs: scheme in {http, https}, host resolves to 127.0.0.1/::1/localhost, ideally port == cfg.AgentPort.

2. DNS rebinding bypasses CheckOrigin

internal/inspector/server.go:62-68
CheckOrigin: func(r *http.Request) bool {
    origin := r.Header.Get("Origin")
    if origin == "" { return true }
    return origin == "http://"+r.Host || origin == "https://"+r.Host
}
r.Host is attacker-controlled. Classic DNS-rebinding: developer visits evil.example → JS rebinds the host to 127.0.0.1 → browser sends Host: evil.example:8087, Origin: http://evil.example, check passes. Binding to 127.0.0.1 does not prevent this — the browser resolves locally itself.

Combined with #1, any browser visit to a malicious page while Inspector is running becomes click-zero exploitation for the full inspector lifetime.

Suggested fix: Validate r.Host against an explicit allowlist (127.0.0.1:<port>, [::1]:<port>, localhost:<port>); reject empty Origin on WS upgrades; ideally also gate on the session token from #1.

3. Nil-map panic race: registerStream after cleanup()

internal/inspector/rpc.go:94, 295 ↔ proxy_sse.go:46, proxy_fetch.go:138

handleWS dispatches each message in a new goroutine (go sess.handleMessage(&msg)). On WS close, defer sess.cleanup() sets s.streams = nil. An already-scheduled webviewProxy/fetchSSE or proxyInvoke goroutine then calls s.registerStream(id, cancel) → s.streams[id] = cancel on a nil map → process-wide panic. Reproducible on any disconnect-during-request scenario.

Suggested fix:
func (s *rpcSession) registerStream(id string, cancel context.CancelFunc) {
    s.streamsMu.Lock()
    defer s.streamsMu.Unlock()
    if s.closed { cancel(); return }
    s.streams[id] = cancel
}
…and in cleanup(): clear(s.streams) + s.closed = true under the same mutex.

4. No recover in per-message goroutines

internal/inspector/rpc.go:94

go sess.handleMessage(&msg) runs against client-controlled JSON; any nil-deref / OOB in any handler crashes the entire inspector process — and in auto-launch mode also propagates back to the parent azd ai agent run.

Suggested fix: Wrap the goroutine body in defer func() { if r := recover(); r != nil { /* log + send RPC error if msg.ID != nil */ } }().

🟠 Major

Empty Origin accepted unconditionally (server.go:64-66) — if origin == "" { return true } lets any non-browser local process reach Test pipeline triggers #1's SSRF without needing the rebinding chain.

openUrlInBrowser is an unauth'd URI-handler launcher (rpc.go:186-201) — Any WS client can drive browser.OpenURL with arbitrary schemes (ms-msdt:, file:, vscode:, etc.). Pairs poorly with the Follina-class URI-handler vulnerabilities that ship regularly. Allowlist http/https only; reject embedded credentials.

Unbounded WS frame size & no deadlines → memory DoS (rpc.go:59-95) — No conn.SetReadLimit, no SetReadDeadline, no ping/pong. A single 10 GB frame OOMs the process. Set conn.SetReadLimit(1<<20), periodic ping/pong, cap concurrent sessions.

Body logging bypasses --silent via s.logger.Printf (proxy_fetch.go:111, 152) — s.logger.Printf("invoke ... body: %s", ..., p.Body) is not gated by cfg.Silent. With auto-launch, every prompt + every model response lands on the parent's stderr. (Distinct from Copilot's fmt.Fprintln finding in proxy_sse.go — this is the structured logger path.) When Silent, log only metadata (requestID, method, status, length).

Auto-launched inspector lifetime not bound to parent process (extensions/azure.ai.agents/internal/cmd/run.go:258-326 + inspector.go:124-132) — workflow.Run(ctx, ...) spawns inspector with no PID capture, no explicit Kill on cancel(). Please verify: after Ctrl+C on azd ai agent run, does netstat -ano | findstr :8087 show port 8087 still bound? If yes, inspector keeps proxying with no UI, subsequent runs collide on the port, and a stale proxy stays reachable. Capture child PID and kill on cleanup, or pass --parent-pid for self-termination.

streamSSELines SSESink deadlock / goroutine leak (proxy_sse.go:116-165 ↔ inspector.go:141 injectSSEEvents) — readSSEStream returns early on response.completed, dropping the inner pipe reader; injectSSEEvents blocks on Fprintln into an unbuffered pipe with no reader → goroutine + HTTP body pinned until streamCtx is cancelled. Happens on normal completion when trailing events arrive. Close the sink writer on completion, or make injectSSEEvents ctx-aware.

proxyInvoke streaming branch can leak response body on panic (proxy_fetch.go:137-145) — go s.pumpSSE(p.RequestID, resp, true) has no top-level defer resp.Body.Close(); body closure depends on the cancel-driven forcing goroutine that may not fire on panic / early return. Add defer resp.Body.Close() at top of pumpSSE (idempotent with existing path).

Default-behavior UX change is ungated and undocumented (extensions/azure.ai.agents/internal/cmd/run.go + azure.ai.agents/README.md) — Auto-launching Inspector is a material UX shift with no env-var/global opt-out (only per-invocation --no-inspector), no Preview gating per docs/reference/feature-status.md, no README update on the agents extension explaining the new default / --no-inspector / headless behavior, and no telemetry distinguishing auto-launch attempts/successes/failures. CI and headless dev environments will see surprising behavior. Add an AZD_AI_AGENT_AUTO_INSPECTOR=false (or similar) global opt-out, document the new default, mark as Preview, emit telemetry.

No Content-Security-Policy or hardening headers on the served SPA (server.go:129-137) — Only Content-Type + Cache-Control set. Given the 2.6 MB upstream JS bundle, a supply-chain compromise or XSS gets full origin powers. Add Content-Security-Policy: default-src 'self'; connect-src 'self' ws://localhost:<port>, X-Content-Type-Options: nosniff, X-Frame-Options: DENY, Referrer-Policy: no-referrer.

Missing OSS attribution for embedded third-party SPA bundle — Embeds ~2.6 MB from qidon/tryinspector (KaTeX fonts, highlight.js themes, etc.) with no NOTICE / THIRD_PARTY_NOTICES / LICENSE references. Microsoft OSS policy generally requires attribution for redistributed third-party code, and incompatible licenses (GPL/AGPL) would block release. Please confirm upstream license compatibility and add the appropriate notices to cli/azd/extensions/azure.ai.inspector/.

No integration test that Inspector failure doesn't break agent run (extensions/azure.ai.agents/internal/cmd/run_test.go) — The core safety property of the new default-on behavior — "inspector launch failure must not crash agent run" — has no test. Mock the workflow client to return RPC errors and panics; assert the agent continues. Also add an explicit test that --no-inspector skips the workflow call entirely.

🟡 Minor

isInspectorExtensionMissingMessage brittle string match (run.go:369-373) — Substring-matches "unknown command"/"inspector"/"unknown flag: --port". If azd's workflow runner changes wording, install guidance silently regresses to a generic warning. Prefer a typed sentinel/status code, or rely on the existing pre-check.

Windows Ctrl+C kills agent child ungracefully (run.go:203-245) — exec.CommandContext on Windows = TerminateProcess; no chance to flush OTel/logs. Use CREATE_NEW_PROCESS_GROUP + GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT) with grace timeout.

loadAzdEnvironment error silently swallowed (run.go:177-182) — User gets confusing downstream "endpoint is empty" errors. Mirror the warning pattern used right below for resolveConnectionCredentials.

No size limit on error/buffered io.ReadAll(resp.Body) (proxy_sse.go:~76, proxy_fetch.go:~55) — Malicious/buggy upstream can OOM Inspector. Use io.LimitReader (4 KB for error bodies, 16 MB for buffered).

streamHTTPClient has no response-header timeout (proxy_fetch.go:58) — Same client is used for proxyInvoke before content-type is known; a hung non-SSE upstream blocks the goroutine. Set Transport.ResponseHeaderTimeout: 30s.

apim-request-id to stdout in standalone mode (proxy_sse.go:75-77) — fmt.Printf("Trace ID: %s\n", ...) writes to stdout; breaks | jq / stdout-as-data flows. Use stderr.

.golangci.yaml line-length 220 vs azd core 125 — Inherit from core or use file-scoped //nolint:lll for justified long lines (mostly the embedded asset SHA-stamped paths).

runInspector browser-open goroutine leak on Start failure (inspector.go:124-132) — Goroutine blocks on <-ready forever if Start fails before binding. Close ready in error path, or select on ctx.Done().

Defense-in-depth on assetsHandler path handling (server.go:139-150) — http.ServeMux cleans paths today, but a future re-mount with StripPrefix could bypass that. Explicit path.Clean + escape rejection.

Test coverage gaps — Missing tests for: SSE upstream 4xx/5xx + read-error paths, malformed SSE lines, SPA fallback for real fs.Stat errors vs ErrNotExist, JSON-RPC error response structure, very long data: lines (>64 KB scanner buffer), --no-inspector skips workflow call.

Test flakiness — time.Sleep / hardcoded timeouts in proxy_sse_test.go, server_test.go — replace with synchronization primitives or t.Context() deadlines.

Install warning has no rate-limit (run.go ~200) — Prints on every run if extension absent. Consider a once-per-day marker file.

CHANGELOG entry minimal — Agents extension uses richer entries linking PR/feature.

requiredAzdVersion accuracy (extension.yaml:8) — Verify >1.23.13 matches the minimum azd version supporting workflow.Run for extensions.

AGENTS.md error-handling guidance gap (extensions/azure.ai.inspector/AGENTS.md:51) — Clarify this extension uses plain wrapped errors (no exterrors), unlike the agents extension.

⚪ Nits

srv.URL() returns http://localhost:%d while listener binds 127.0.0.1 only (server.go:73-75, 85)

cspell.yaml — comment why azureaiinspector (Go module camelCase) is allowed

build.sh:12 — comment casing vs actual $EXTENSION_ID_SAFE

debug.go log file close error swallowed

root_test.go findCommand no recursion depth guard

run_test.go appendPortEnvVars relies on slice index ordering

Happy to discuss any of the above — particularly grateful for the contribution and the careful refactor of the extension internals. The blockers are all addressable with localized changes (session token + URL host check + nil-map fix + recover).

@wbreza

Thanks for the detailed review. I went through the blocker, major, minor, and nit items and made targeted changes based on the threat model for this feature.

Agent Inspector is a local developer UI served on loopback and launched from azd ai agent run. The goal is to keep the local happy path simple while preventing the concrete abuse cases: using the inspector as an open proxy to arbitrary network targets, launching arbitrary URI handlers, leaking raw prompt/response bodies in silent mode, or leaving stale listeners after the agent run exits.

For items implemented exactly as requested, I marked them as resolved. For items that are partial or intentionally left out of this PR, I included the reason.

Blockers

#	Status	Reply
1. Local SSRF via WS + arbitrary proxy URL	Partially addressed	Proxy URLs are constrained to `http`/`https`, local hosts only (`localhost`, `127.0.0.1`, `::1`), and the configured agent port. This prevents proxying to IMDS, intranet hosts, unrelated local services, or a different localhost port. I did not add a per-session token in this PR because the proxy URL allowlist plus Host/Origin checks cover the concrete proxy-abuse path without changing the local request contract. If we want a stronger CSRF-style boundary for the loopback UI, I would track that as a follow-up design change.
2. DNS rebinding / `CheckOrigin`	Resolved
3. Stream registration race after cleanup	Resolved
4. Panic in per-message goroutines	Resolved

Major

#	Status	Reply
5. Empty `Origin` accepted	Resolved
6. `openUrlInBrowser` arbitrary URI launcher	Resolved
7. WS frame size / deadlines	Partially addressed	Added `SetReadLimit`, read deadline, pong deadline refresh, periodic ping, and write deadline. The frame limit is bounded but set at 16 MB to avoid breaking larger local prompt/request payloads. I did not add a concurrent session cap because the server is loopback-only and now has bounded frame size/deadline behavior; we can add a cap later if there is a concrete stress case.
8. Body logging with `--silent`	Resolved
9. Inspector lifetime after Ctrl+C	Verified, no code change	Verified with `azd-dev ai agent run`: Ctrl+C releases both 8087 and 8088. I did not observe a stale inspector process or port collision.
10. SSESink deadlock / goroutine leak	Resolved
11. Streaming response body close	Resolved
12. Default behavior docs / gating / telemetry	Partially addressed	Added agents README documentation for the default inspector launch and `--no-inspector`. I did not update the changelog because changelogs are maintained with release/version updates. I also did not add a global env opt-out, preview gate, or telemetry in this PR. Since this is a local developer convenience launched from an explicit local run command, I think the per-command `--no-inspector` opt-out is sufficient for this iteration.
13. CSP and hardening headers	Resolved
14. OSS attribution	Resolved
15. Tests for inspector launch failure / `--no-inspector`	Partially addressed	Added targeted tests that workflow launch failure only warns and does not fail agent run, and that `--no-inspector` skips workflow launch. I did not add full E2E coverage in this PR.

Minor

#	Status	Reply
16. Brittle missing-extension string match	Not changed	There is now an installed-extension pre-check before launch. The string match is only fallback handling for unexpected workflow errors, so I left it as-is instead of introducing a new typed workflow error contract in this PR.
17. Windows Ctrl+C graceful child shutdown	Not changed	This is broader process-control behavior. I verified the required lifecycle property for this PR: Ctrl+C releases both the agent and inspector ports.
18. `loadAzdEnvironment` error swallowed	Resolved
19. No size limit on buffered/error bodies	Resolved
20. `streamHTTPClient` response-header timeout	Not changed	The streaming client is used for long-lived SSE paths and is bound to the per-stream context. Adding a short response-header timeout risks breaking slow local agents; I left this out unless we have a concrete hang case to tune against.
21. `apim-request-id` to stdout	Resolved
22. `.golangci.yaml` line length	Not changed	This is lint configuration churn and does not affect the security or runtime behavior addressed in this PR.
23. Browser-open goroutine on `Start` failure	Resolved
24. Asset handler path/fallback hardening	Resolved
25. Test coverage gaps	Partially addressed	Added targeted tests for the current fixes, including body limits and asset fallback behavior. I did not expand this into full E2E or exhaustive SSE/parser coverage in this PR.
26. Test sleep/timeouts	Not changed	Existing tests are passing; replacing the broader timeout style is test refactoring outside the current fix.
27. Install warning rate-limit	Not changed	This is product behavior/state management and would add marker-file complexity. The current warning is explicit and actionable.
28. CHANGELOG entry	Not changed	Changelog updates are release/version-bump work, not part of this PR fix.
29. `requiredAzdVersion` accuracy	Verified, no code change	`WorkflowService.Run` was introduced by #4985 and first appears in stable `azd` 1.15.0, so `>1.23.13` is stricter than the workflow-service minimum. I left the current value because this extension is built and tested against the current azd extension SDK line and stays aligned with the existing `azure.ai.agents` extension requirement.
30. AGENTS.md error-handling guidance	Resolved

Nits

Item	Status	Reply
`srv.URL()` uses `localhost` while listener binds `127.0.0.1`	Resolved
`cspell.yaml` comment for `azureaiinspector`	Resolved
`build.sh` comment casing	Resolved
`debug.go` close error swallowed	Resolved
`root_test.go findCommand` recursion guard	Resolved
`run_test.go appendPortEnvVars` ordering	Resolved

anchenyi · 2026-05-21T06:21:56Z

There is a \azure-dev.github\workflows\approval-ext-azure-ai-inspector.yml file which controls a github pipeline to block PRs on a separate approval list from the standard codeowners file. This is intended to still get helpful reviews from the broader AZD group (in terms of sticking to AZD standards, GO best practices, etc.) while still blocking the PR on a smaller set of people who know the proper code flow for each extension. We should update that file with folks on your team for proper ownership of this new extension.
In .github/workflows/approval-ext-azure-ai-inspector.yml, add a REQUIRED_APPROVERS environment variable to the step:

       - name: Check for required team approval
         uses: actions/github-script@v9
         env:
           EXTENSION_PATH: "cli/azd/extensions/azure.ai.inspector/"
           WORKFLOW_PATH: ".github/workflows/approval-ext-azure-ai-inspector.yml"
           OVERRIDE_COMMAND: "/inspector-extension-approval override"
           REQUIRED_APPROVERS: '["user1", "user2", "user3"]'  # ← add this
         with:
           script: |
             const script = require('./.github/scripts/pr-approval-foundry-extensions-shared.js');
             await script({ github, context, core });

The shared script (line 31-33) checks for this env var and uses it instead of the default list.

Added "anchenyi", "XiaofuHuang", "swatDong" in the REQUIRED_APPROVERS list

anchenyi added 5 commits May 20, 2026 14:22

feat: migrated from qidon/tryinspector

d180789

chore: azd ai inspector launch

5581f61

fix: version

8f03c50

feat: auto launch inspector in azd ai agent run

bfb5546

chore: update SPA

9de65bc

microsoft-github-policy-service Bot added the customer-reported identify a customer issue label May 20, 2026

anchenyi marked this pull request as ready for review May 20, 2026 07:31

Copilot AI review requested due to automatic review settings May 20, 2026 07:31

anchenyi requested review from JeffreyCA, hemarina, jongio, rajeshkamal5050, tg-msft, therealjohn, trangevi, trrwilson, vhvb1989, wbreza and weikanglim as code owners May 20, 2026 07:31

Copilot started reviewing on behalf of anchenyi May 20, 2026 07:37 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread cli/azd/extensions/azure.ai.inspector/internal/cmd/debug.go

Comment thread cli/azd/extensions/azure.ai.inspector/ci-build.ps1 Outdated

Comment thread cli/azd/extensions/azure.ai.inspector/internal/cmd/inspector.go

fix: address inspector review comments

3409ee1

wbreza reviewed May 20, 2026

View reviewed changes

trangevi requested changes May 20, 2026

View reviewed changes

Comment thread cli/azd/extensions/azure.ai.agents/internal/cmd/run.go

JeffreyCA added the ext-agents azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes} extensions label May 20, 2026

This was referenced May 20, 2026

refactor: migrate project endpoint commands to new scaffold #8243

Open

feat(toolboxes): add azd ai toolbox direct commands #8203

Open

trangevi linked an issue May 20, 2026 that may be closed by this pull request

Add azd ai agent Inspector auto-launch and structured event stream for the agent inner loop #8193

Open

chore: set inspector extension approvers

f172bed

anchenyi requested a review from danieljurek as a code owner May 21, 2026 01:38

anchenyi added 5 commits May 21, 2026 10:51

fix: resolve blockers

2d2bfc1

fix: resolve majors

fe36192

fix: resolve minors and nits

53fa6b7

fix: minors and nits

276bf94

fix: CI

52d6264

anchenyi requested a review from trangevi May 21, 2026 06:22

chore: update REQUIRED_APPROVERS

7eb5418

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: launch Agent Inspector from azd ai agent run#8264

feat: launch Agent Inspector from azd ai agent run#8264
anchenyi wants to merge 13 commits into
Azure:mainfrom
anchenyi:feat/inspector-extension

anchenyi commented May 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wbreza left a comment

Uh oh!

trangevi left a comment

Uh oh!

Uh oh!

anchenyi commented May 21, 2026

🔴 Blockers

1. Local SSRF via unauthenticated WS + arbitrary-URL HTTP proxy

2. DNS rebinding bypasses `CheckOrigin`

3. Nil-map panic race: `registerStream` after `cleanup()`

4. No `recover` in per-message goroutines

🟠 Major

🟡 Minor

⚪ Nits

Uh oh!

anchenyi commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

anchenyi commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Inspector Web Demo

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wbreza left a comment

Choose a reason for hiding this comment

🔴 Blockers

1. Local SSRF via unauthenticated WS + arbitrary-URL HTTP proxy

2. DNS rebinding bypasses CheckOrigin

3. Nil-map panic race: registerStream after cleanup()

4. No recover in per-message goroutines

🟠 Major

🟡 Minor

⚪ Nits

Uh oh!

trangevi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

anchenyi commented May 21, 2026

🔴 Blockers

1. Local SSRF via unauthenticated WS + arbitrary-URL HTTP proxy

2. DNS rebinding bypasses CheckOrigin

3. Nil-map panic race: registerStream after cleanup()

4. No recover in per-message goroutines

🟠 Major

🟡 Minor

⚪ Nits

Blockers

Major

Minor

Nits

Uh oh!

anchenyi commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

anchenyi commented May 20, 2026 •

edited

Loading

2. DNS rebinding bypasses `CheckOrigin`

3. Nil-map panic race: `registerStream` after `cleanup()`

4. No `recover` in per-message goroutines

2. DNS rebinding bypasses `CheckOrigin`

3. Nil-map panic race: `registerStream` after `cleanup()`

4. No `recover` in per-message goroutines