feat: add telemetry, documentation, and edge case handling for extension upgrade#7853
Conversation
1e87e77 to
2e112b1
Compare
jongio
left a comment
There was a problem hiding this comment.
Clean decomposition of the upgrade loop into upgradeOneExtension with proper context cancellation. Telemetry follows existing span patterns, and the atomic config write is a solid improvement over the truncate-and-write pattern.
One thing worth tracking: TestUpgradeOneExtension_NetworkFailure_SourceCreation documents that network failures during source resolution silently become "delisted" (0 matches, no error from SourceManager). In a --all batch with a network outage, every extension would show "no longer available in any configured registry" rather than a connectivity error. The isNetworkError check on FindExtensions errors works when the error propagates, but the silent-drop path in SourceManager bypasses it. Worth a follow-up if this UX gap matters for users.
2e112b1 to
18679f6
Compare
|
Addressed review feedback in commit 18679f6: Redundant type assertions (line 1567, @jongio): Simplified Network outage edge case (line 1218, @jongio): Created follow-up issue to surface SourceManager errors so network failures aren't silently treated as delisted extensions. (See #7973.) Rebase (line 44, @vhvb1989): Rebased onto updated |
jongio
left a comment
There was a problem hiding this comment.
Addresses my previous feedback. isNetworkError is cleaner now with just the net.Error interface check + string fallback. Follow-up issue #7973 for the SourceManager silent-drop path is the right call - that's a separate concern. The new upgrade_result.go extraction is clean, and the JSON contract documentation is helpful.
…ion upgrade - Add per-extension telemetry events (ext.upgrade) with name, version from/to, source, duration_ms, and outcome attributes - Add distinct promotion telemetry events (ext.promote) with source.from and source.to attributes for tracking registry promotion adoption - Update azd extension upgrade help text to document default registry behavior, auto-promotion, --source override, continue-on-error in --all mode, and --output json for structured reports - Handle delisted extensions as skipped (not failed) so batch continues - Detect network errors from FindExtensions/Upgrade and show retry suggestion distinguishing connectivity issues from missing extensions - Implement atomic config writes (write-to-temp-then-rename) to prevent corruption during interrupted batch upgrades - Add telemetry field constants for extension upgrade tracking - Update snapshot tests for new help text Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Verify atomic write already uses os.CreateTemp (no code change needed) - Add tests for temp file cleanup: success path, directory creation failure with clear error, and original file preservation on write failure - Add doc comments on upgradeOneExtension and emitPromotionEvent documenting the telemetry testing strategy Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
18679f6 to
9116471
Compare
9116471 to
12b6c74
Compare
There was a problem hiding this comment.
Pull request overview
This PR enhances azd extension upgrade with per-extension telemetry, improved help text/snapshots, and more resilient behavior during batch upgrades (including delisted extensions and network failures), plus safer (atomic) config persistence.
Changes:
- Add OpenTelemetry event names and attribute keys for extension upgrade/promotion spans.
- Update
azd extension upgradehelp text and snapshot/fig spec outputs. - Implement atomic config file writes (temp file + rename) and add tests around atomicity/cleanup/overwrite behavior.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/pkg/config/file_config_manager.go | Switch config saves to temp-file + rename for atomic writes. |
| cli/azd/pkg/config/file_config_manager_test.go | Add tests for atomic save behavior, overwrite, and temp-file cleanup. |
| cli/azd/internal/tracing/fields/fields.go | Add new tracing attribute keys for upgrade/promotion telemetry. |
| cli/azd/internal/tracing/events/events.go | Add new tracing event constants for upgrade/promotion. |
| cli/azd/cmd/extension.go | Add per-extension telemetry spans, improved help text, and delisted/network edge handling. |
| cli/azd/cmd/extension_upgrade_test.go | Update delisted behavior expectations and add tests for network error classification. |
| cli/azd/cmd/testdata/TestUsage-azd-extension.snap | Update usage snapshot for new upgrade description text. |
| cli/azd/cmd/testdata/TestUsage-azd-extension-upgrade.snap | Update usage snapshot for new upgrade description text. |
| cli/azd/cmd/testdata/TestFigSpec.ts | Update Fig completion descriptions for the upgrade command. |
| cli/azd/.vscode/cspell.yaml | Add “delisted” to spellchecker allowlist. |
12b6c74 to
8ee4616
Compare
Remove redundant net.DNSError and net.OpError type assertions from isNetworkError - both implement net.Error, which is already checked. Keep the interface check and string-matching fallback. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
8ee4616 to
909d36e
Compare
jongio
left a comment
There was a problem hiding this comment.
Both prior findings addressed - isNetworkError simplified to the net.Error interface check, and delisted extensions now skip cleanly in batch mode (issue #7973 tracks the SourceManager silent-drop path separately). Telemetry spans, atomic config writes, and network error handling all look solid with good test coverage.
jongio
left a comment
There was a problem hiding this comment.
Incremental change since my last review: mutex-based concurrency in file_config_manager.go is cleaner than the previous atomic temp-file-rename approach, and correctly avoids deadlock on the recursive vault save. Error messages also improved. No new concerns.
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
Summary
Implements telemetry, documentation updates, and edge case handling for
azd extension upgrade, addressing #7840.Problem
No per-extension telemetry exists for upgrade operations. Help text doesn't document the new default registry or promotion behavior. Edge cases (delisted extensions, network failures) aren't handled gracefully, and config writes aren't atomic.
Solution
Per-Extension Telemetry
ext.upgradespan with: extension name, version from/to, source, duration_ms, outcomeext.promotespan with source.from/source.to for tracking adoption rates independentlyinternal/tracing/CLI Help Text Updates
ShortandLongdescriptions forazd extension upgrade--sourceoverride,--allcontinue-on-error,--output jsonDelisted Extension Handling
FindExtensions()returns 0 matches: reports(-) Skipped:with "extension no longer available in any configured registry"Network Failure Handling
isNetworkError()helper detectsnet.Error,net.DNSError,net.OpError, and common network error patterns(x) Failed:with actionable message; batch continuesAtomic Config Writes
pkg/config/file_config_manager.goChanges
New/Modified Files
internal/tracing/events/events.go— 2 new event constants (ext.upgrade,ext.promote)internal/tracing/fields/fields.go— 7 new telemetry field constantscmd/extension.go— Telemetry emission, help text, delisted handling, network error detectionpkg/config/file_config_manager.go— Atomic write patterncmd/extension_upgrade_test.go— Tests for delisted, network errorspkg/config/file_config_manager_test.go— Atomic write testsTesting
TestIsNetworkError— 9 table-driven cases for network error detectionTestUpgradeOneExtension_DelistedSkipped— verifies delisted → skipped behaviorTestUpgradeOneExtension_NetworkFailure— verifies network errors with retry guidanceTest_FileConfigManager_Save_Atomic— verifies atomic write patternTest_FileConfigManager_Save_Overwrite— verifies overwrite correctnessVerification
go build— compiles cleanlygolangci-lint— 0 issuesResolves #7840
Relates to #6235
Depends on #7852