Extension framework - propagate error suggestions from event handlers and custom service targets#7791
Extension framework - propagate error suggestions from event handlers and custom service targets#7791JeffreyCA wants to merge 3 commits intoAzure:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves azd’s extension framework error propagation so extension-authored structured errors (including Suggestion) survive event handler/custom service target boundaries and aren’t overridden by azd’s YAML-based suggestion pipeline.
Changes:
- Preserve original error types when raising a single event handler error (instead of flattening to a string).
- Extend the event gRPC contract to include structured
ExtensionErrordetails in handler status messages and unwrap them on the host side. - Skip azd’s YAML error-suggestion pipeline when an extension error already includes a suggestion; add a JSON-output regression test for structured errors.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/pkg/ext/event_dispatcher.go | Preserves single handler error types; aggregates suggestions from extension errors. |
| cli/azd/pkg/azdext/event_manager.go | Sends structured ExtensionError back to host on handler failure. |
| cli/azd/pkg/azdext/event.pb.go | Regenerated protobuf output to include ExtensionError fields. |
| cli/azd/internal/grpcserver/event_service.go | Unwraps structured handler errors coming back from extensions. |
| cli/azd/grpc/proto/event.proto | Adds ExtensionError fields to handler status messages. |
| cli/azd/cmd/middleware/error.go | Skips YAML suggestion pipeline when extension suggestion is present. |
| cli/azd/cmd/middleware/ux_test.go | Adds coverage for JSON output shape when returning ErrorWithSuggestion. |
Comments suppressed due to low confidence (1)
cli/azd/pkg/ext/event_dispatcher.go:157
- When aggregating multiple handler errors, the returned ErrorWithSuggestion sets Suggestion but leaves Message empty. Since Message is part of the structured error contract (and is asserted in other tests), consider setting Message to a user-friendly summary (or the combined error text) so JSON/UX output isn’t missing the explanatory field.
if len(suggestions) > 0 {
return &internal.ErrorWithSuggestion{
Err: combinedErr,
Suggestion: strings.Join(suggestions, "\n"),
}
…service targets Extension errors with suggestions (LocalError / ServiceError) raised from project/service event handlers and custom service targets were losing their structured information by the time they reached middleware, so the host fell back to generic error rendering and the YAML error-suggestion pipeline overrode the extension's specific guidance. - proto: add ExtensionError to ProjectHandlerStatus and ServiceHandlerStatus - event_manager: WrapError on the extension side when reporting failed status - event_service: UnwrapError on the host side to restore typed errors - ext/event_dispatcher: when only a single handler error occurs, return it unchanged so structured types survive for downstream errors.As; also pick up suggestions from azdext extension errors in the multi-error path - cmd/middleware/error: short-circuit the YAML pipeline when the error already carries an extension-supplied suggestion - cmd/middleware/ux_test: regression test that ErrorWithSuggestion in JSON output mode does not emit a leading consoleMessage envelope Fixes Azure#7706 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1e64bfc to
f1de4cf
Compare
Address PR Azure#7791 review feedback: - pkg/ext/event_dispatcher.go: fix "their" -> "there" typo in comment - pkg/azdext/event_manager_test.go: assert that the failed-status reply carries a populated ExtensionError (Message/Suggestion/LocalErrorDetail) so the host can unwrap it back into a typed *LocalError - cmd/middleware/error_test.go: add regression test that an azdext.LocalError with a Suggestion bypasses the YAML error-suggestion pipeline (using a message that would otherwise match the QuotaExceeded rule) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jongio
left a comment
There was a problem hiding this comment.
Approach looks right. The end-to-end story (extension -> ExtensionError proto -> UnwrapError on host -> middleware bypass when Suggestion is set) is coherent and the existing Copilot thread captures most of the technical depth. Two things worth tightening before merge.
Address PR Azure#7791 review feedback from @jongio: - event_dispatcher.go: collect suggestions with else-if so a wrapped ErrorWithSuggestion containing a *LocalError doesn't append both its own Suggestion and the inner extension error's Suggestion (azd-wrapped wins). - event_dispatcher_test.go: add Test_Single_Error_Preserves_Type to lock in the load-bearing single-error fast path that lets typed extension errors (LocalError, ErrorWithSuggestion) survive RaiseEvent for Azure#7706. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
wbreza
left a comment
There was a problem hiding this comment.
Thanks @JeffreyCA — the intent of preserving structured extension errors across the gRPC boundary is a solid direction, and the proto additions are cleanly scoped. Requesting changes because of one architectural gap (Links not transportable) plus a few correctness/test coverage items. Happy to pair on whichever option you pick for the Links question below.
🔴 Architectural gap — full payload not transported
H1. Links (and other ErrorWithSuggestion fields) cannot cross the gRPC boundary
Core error types vs. what the proto transports today:
| Field | errorhandler.ErrorWithSuggestion |
ExtensionError proto |
LocalError / ServiceError (extension side) |
|---|---|---|---|
Err (original) |
✅ | ❌ (only string message) | ❌ |
Message |
✅ | ✅ | ✅ |
Suggestion |
✅ | ✅ | ✅ |
Links []ErrorLink (URL+Title) |
✅ | ❌ not in proto | ❌ not in struct |
Code / Category (local) |
— | ✅ | ✅ |
ErrorCode / StatusCode / ServiceName (service) |
— | ✅ | ✅ |
Concretely: grpc/proto/errors.proto:35-47 has no links field, and pkg/azdext/extension_error.go's LocalError/ServiceError lack a Links field. An extension that wants to surface "see for help" through UxMiddleware (cmd/middleware/ux.go:71-83) can't — the links are dropped before they ever reach the host. This partially defeats the PR's stated intent of preserving rich extension-authored error guidance.
Options to close the gap
Option A — Minimal: add repeated ErrorLink links to ExtensionError (plus matching Links []ErrorhandlerLink on LocalError/ServiceError, and round-trip in WrapError/UnwrapError).
- Pros: small surface (one proto field, one struct field, two helper updates), fully proto-typed, forward-compatible additive change.
- Cons: doesn't preserve opaque error chain; each future
ErrorWithSuggestionfield requires another proto bump. - Effort: small (~1 day incl. tests).
Option B — Canonical: google.rpc.Status + google.protobuf.Any rich-error details. Define ErrorWithSuggestionDetails (message/suggestion/links/err/code/category), pack into Any, return as gRPC status.Error.
- Pros: industry-standard gRPC rich-error pattern; already used in this repo (
extensions/azure.ai.agents/internal/exterrors/errors.go:216-220unpackserrdetails.ErrorInfo); full-fidelity; extensions can add custom detail types without further proto changes. - Cons: larger refactor (service-side error returns become
status.Error(...); envelopes and handler signatures change). - Effort: medium (~3–5 days).
Option C — JSON blob (bytes error_payload_json). Serialize ErrorWithSuggestion directly.
- Pros: trivial, preserves all fields implicitly.
- Cons: opaque to proto tooling; no schema evolution safety; harder to debug on the wire.
- Effort: smallest.
Option D — gRPC trailing metadata. Not recommended: metadata is string-only, size-limited, and duplicates the role of google.rpc.Status + Details.
Recommendation
Start with Option A for this PR (keeps the scope manageable and unblocks the stated goal), and open a follow-up to migrate to Option B as the long-term direction — especially given this codebase already consumes google.rpc.Status details in at least one extension. Option B also eliminates the need for H1-style table tracking in the future: any new ErrorWithSuggestion field just rides inside the Any payload.
If you prefer to go straight to Option B in this PR, I'm happy to help sketch the proto + envelope changes.
🔴 Other High
H2. Host-side UnwrapError round-trip untested
cli/azd/internal/grpcserver/event_service.go:182,303 calls azdext.UnwrapError(statusMsg.*.Error), but no test drives a failed ProjectHandlerStatus{Error: &ExtensionError{...}} through createProjectEventHandler / createServiceEventHandler and asserts the returned typed error. The extension-side test only verifies status.Error is populated; WrapError/UnwrapError are tested in isolation. Add a round-trip case in event_service_test.go asserting errors.AsType[*azdext.LocalError] on the returned error (including Suggestion, and eventually Links).
H3. Backward-compat fallback untested
When an older extension sends Status:"failed" with Error == nil, the code must fall through to the original fmt.Errorf("extension %s project hook %s failed: %s", ...). No test covers this path. Silent drop risk for pre-structured-error extensions — worth a two-line regression test.
🟡 Medium
M1. UnwrapError treats empty Message as nil
pkg/azdext/extension_error.go:152-153 — return nil when msg.GetMessage() == "". An extension constructing &LocalError{Code:"x", Suggestion:"y", Message:""} round-trips to nil, and the host then builds "extension X hook Y failed: " with an empty trailer. Guard should be msg == nil, and WrapError should default Message to err.Error() for typed errors with empty Message.
M2. Forward-compat contract not documented in .proto
grpc/proto/event.proto:73,88 — the new ExtensionError error field is silently ignored by older azd CLIs (proto3 unknown-field behavior), so new extensions must also continue populating status+message. The .proto doc currently says only "Optional structured error details". Please add a note: "Also populate message for compatibility with older azd hosts that don't read this field." A third-party extension author reading only the proto today could reasonably populate error alone and break users on older azd.
M3. Middleware bypass gated only on Suggestion
cli/azd/cmd/middleware/error.go:179-181 — bypass triggers only if azdext.ErrorSuggestion(err) != "". A typed *LocalError{Code:"x", Category:"validation", Suggestion:""} still enters errorPipeline.Process; if a YAML rule matches the message, the returned *ErrorWithSuggestion wraps the typed error and downstream telemetry classification no longer sees the LocalError type. This risks mis-attributing error.service.name for a purely local error (AGENTS.md: "Only set error.service.name when an external service actually returned the error"). Suggest bypassing when errors.AsType[*LocalError]/[*ServiceError] succeeds, regardless of Suggestion.
M4. ServiceError variant uncovered in handler round-trip tests
event_manager_test.go:178-218, 333-375 only return *LocalError from failing handlers. The WrapError branch for *azdext.ServiceError (Origin=SERVICE, ServiceErrorDetail fields) at extension_error.go:75-88 is not exercised by the new handler tests. Add a parametrized case.
M5. context.Background() in new tests
event_dispatcher_test.go:485, 491, 506, 512 — 4 new call sites use context.Background(). AGENTS.md mandates t.Context() (Go 1.26).
🟢 Low
- L1. Multi-handler aggregation still flattens types —
pkg/ext/event_dispatcher.go:130-148useserrors.New(strings.Join(lines, ","))when ≥2 handlers fail, discarding typed*LocalError/*ServiceError. Considererrors.Join(handlerErrors...)soerrors.AsTypestill works upstream. - L2. Host loses extension identity on the structured path —
internal/grpcserver/event_service.go:182,304returnsextErrunchanged, so debug logs no longer record whichextension.Id/eventNameproduced the failure. Alog.Printf("extension %s hook %s failed: %v", ...)right beforereturn extErrkeeps diagnostic context without changing UX. - L3. Default category drift for legacy extensions —
pkg/azdext/extension_error.go:177-196falls through to&LocalError{Category: LocalErrorCategoryLocal}forOrigin=UNSPECIFIEDwith no source. Previously these surfaced as*fmt.wrapError; worth a telemetry-dashboard regression check. - L4. Redundant message fields —
pkg/azdext/event_manager.go:235-245, 286-296sets bothhandlerMessageandhandlerError.Message. Harmless today but they can diverge if future types overrideError(). DerivehandlerMessagefromhandlerError.GetMessage()afterWrapError, or drop one. - L5. Tests not table-driven —
Test_Single_Error_Preserves_Typeuses two hand-writtent.Runblocks with ~90% duplicated setup (event_dispatcher_test.go:474+). AGENTS.md: "prefer table-driven tests." - L6. Inverse-case missing for middleware bypass —
error_test.go:235covers "extension suggestion present → bypass"; a companion assertingLocalError{Suggestion:""}still flows through the YAML pipeline would lock the semantics paired with M3 above.
Note: None of the above overlap with the 17 inline comments or the prior reviewers' feedback — these are supplemental. Happy to iterate on the Links design; Option A is a quick unblock and Option B can be the follow-up.
jongio
left a comment
There was a problem hiding this comment.
Re-reviewed against 62e665f. My earlier comments are resolved.
@wbreza's review covers the remaining depth - H1 (Links not transportable across the gRPC boundary), M1 (UnwrapError treats empty Message as nil), and M3 (middleware bypass gated only on Suggestion, which leaks error.service.name for typed local errors) are the load-bearing items. Agree with all three.
One concrete add on the Copilot wrap-with-%w threads: the structured-error path in event_service.go silently drops debug context. The new branch returns extErr directly, and there's no log statement above it that captures extension.Id + eventName + the error. wbreza's L2 captures this. A two-line log.Printf right before return extErr keeps the user-facing message clean (which is the point of declining the wrap) while preserving diagnostic context for support investigations.
Resolves #7706
With this PR, extension-authored error suggestions now flow through to the user when an error is raised from a project/service event handler or a custom service target.
Previously, the rich
LocalError/ServiceErrorreturned by the extension was flattened into a plain string before reaching the host's middleware, so:Suggestiontext was dropped, and