Enhance crash telemetry with richer diagnostics and EndBuild hang detection #13304
Merged
YuliiaKovalova merged 3 commits into main on Mar 2, 2026
…ection
- Add StackCaller: skips throw-helper frames to find actual crash site
- Add FullStackTrace: multi-frame sanitized trace (4096 char cap)
- Add ExceptionMessage: truncated + path-redacted to avoid PII
- Add CrashThreadName: captures thread identity at crash time
- Add EndBuild hang detection: replace infinite WaitOne() with timed 30s loops that emit periodic diagnostic telemetry
- Add CrashExitType.EndBuildHang with 6 diagnostic properties: EndBuildWaitPhase, EndBuildWaitDurationMs, PendingSubmissionCount, SubmissionsWithResultNoLogging, ThreadExceptionRecorded, UnmatchedProjectStartedCount
- Add DumpHangDiagnosticsToFile: persists hang state to disk
- PII protection: regex path redaction in exception messages, SanitizeFilePathsInText for stack traces, SanitizeStackFrame for individual frames
- 61 tests covering all new functionality
Pull request overview
This PR enhances MSBuild's crash telemetry infrastructure to improve diagnostics for two critical scenarios: crashes through throw-helpers (like ErrorUtilities.ThrowInternalError) and EndBuild hangs. Previously, all throw-helper crashes appeared identical in telemetry because only the throw-helper frame was captured, making triage nearly impossible. Additionally, when EndBuild hung indefinitely, no diagnostic data was emitted since the process never reached the finally block.
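The throw-helper problem described above can be illustrated with a small sketch. It is written in Python for brevity and is not the PR's C# implementation: the function name `extract_stack_caller`, the marker list, and the `Example.*` frames are all hypothetical. The idea is only that the top frame is always the helper, so the first frame that is not a helper is the one that distinguishes crashes.

```python
from typing import Optional

# Assumed marker list for illustration; the real set of throw-helpers
# is defined in MSBuild's C# sources, not here.
THROW_HELPER_MARKERS = ("ThrowInternalError", "VerifyThrow")


def extract_stack_caller(stack_trace: str) -> Optional[str]:
    """Return the first 'at ...' frame that is not a throw-helper, or None."""
    for line in stack_trace.splitlines():
        frame = line.strip()
        if not frame.startswith("at "):
            continue  # skip the exception message line and blank lines
        if any(marker in frame for marker in THROW_HELPER_MARKERS):
            continue  # skip helper frames; they look the same for every crash
        return frame
    return None


# Hypothetical .NET-style trace: the top frame is the helper, the caller is below it.
trace = (
    "InternalErrorException: MSB0001: Internal MSBuild Error: boom\n"
    "   at Microsoft.Build.Shared.ErrorUtilities.ThrowInternalError(String message)\n"
    "   at Example.Scheduler.AssignWork(Int32 nodeId)\n"
    "   at Example.Program.Main()\n"
)
caller = extract_stack_caller(trace)
```

With this trace, a top-of-stack capture would record only the `ThrowInternalError` frame, while the sketch returns the `Example.Scheduler.AssignWork` frame.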
Changes:
- Added richer crash diagnostics: StackCaller (skips throw-helpers to reveal the actual bug location), FullStackTrace (complete sanitized stack), ExceptionMessage (truncated with PII redaction), and CrashThreadName
- Implemented EndBuild hang detection by replacing infinite WaitOne() calls with 30-second timed loops that periodically emit diagnostic telemetry via CrashExitType.EndBuildHang
- Applied comprehensive PII protection through path redaction in stack frames, exception messages, and diagnostic outputs
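The timed-loop pattern in the second item can be sketched as follows. This is a Python analogue, not the C# code in BuildManager.cs: `threading.Event.wait(timeout=...)` stands in for `WaitOne(timeout)`, and the function name, the `emit` callback, and the phase value `"WaitingForSubmissions"` are assumptions made for the example. The interval is shortened from 30 seconds so the demo finishes quickly.

```python
import threading
import time


def wait_with_diagnostics(done, interval_s, emit, phase):
    """Block until `done` is set, reporting progress after every timed-out interval.

    Unlike an untimed wait, control returns to the loop each interval, so a
    hang still produces telemetry instead of silence.
    """
    start = time.monotonic()
    while not done.wait(timeout=interval_s):  # False means the interval elapsed
        elapsed_ms = int((time.monotonic() - start) * 1000)
        emit({"EndBuildWaitPhase": phase, "EndBuildWaitDurationMs": elapsed_ms})
    return int((time.monotonic() - start) * 1000)


# Demo: the event is signalled after 0.25 s, and the loop reports every 0.1 s.
reports = []
done = threading.Event()
threading.Timer(0.25, done.set).start()
total_ms = wait_with_diagnostics(done, 0.1, reports.append, "WaitingForSubmissions")
```

The wait still has no overall deadline, so behaviour for a build that finishes normally is unchanged; the only difference is that a stalled wait now emits a report per interval.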
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Shared/ExceptionHandling.cs | Added DumpHangDiagnosticsToFile to persist hang diagnostics to disk for later retrieval from customer machines |
| src/Framework/Telemetry/CrashTelemetryRecorder.cs | Added CollectAndEmitEndBuildHangDiagnostics for immediate hang telemetry emission and defined 30-second diagnostic interval constant |
| src/Framework/Telemetry/CrashTelemetry.cs | Added new telemetry properties (StackCaller, FullStackTrace, ExceptionMessage, CrashThreadName, EndBuild hang properties), PII sanitization methods (TruncateMessage, ExtractStackCaller, ExtractFullStackTrace, SanitizeFilePathsInText), and new EndBuildHang exit type |
| src/Framework.UnitTests/CrashTelemetry_Tests.cs | Added 16 new tests covering StackCaller extraction for all throw-helpers, PII redaction, truncation, hang diagnostic serialization, and verification that dropped properties are absent |
| src/Build/BackEnd/BuildManager/BuildManager.cs | Replaced infinite WaitOne() calls in EndBuild with timed 30-second loops that emit periodic diagnostics via EmitEndBuildHangDiagnostics |
Account for the suffix length when truncating, so total output (content + '... [truncated]') never exceeds MaxStackTraceLength. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
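A minimal Python sketch of the fix this commit describes, assuming the cap is the 4096-character limit mentioned for FullStackTrace; the function name is hypothetical and the real change is in the C# sources. The point is that the suffix must be budgeted inside the cap rather than appended after cutting at the cap.

```python
TRUNCATION_SUFFIX = "... [truncated]"
MAX_STACK_TRACE_LENGTH = 4096  # assumed to match the 4096 char cap on FullStackTrace


def truncate_with_suffix(text: str, max_length: int = MAX_STACK_TRACE_LENGTH) -> str:
    """Truncate so that content plus suffix never exceeds max_length."""
    if len(text) <= max_length:
        return text
    # Reserve room for the suffix before cutting the content.
    keep = max(0, max_length - len(TRUNCATION_SUFFIX))
    # The final slice covers caps smaller than the suffix itself.
    return (text[:keep] + TRUNCATION_SUFFIX)[:max_length]


long_trace = "x" * 5000
result = truncate_with_suffix(long_trace)
```

Cutting at `max_length` first and then appending the suffix would overshoot the cap by the suffix length, which is the off-by-suffix error the commit message refers to.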
Add null-forgiving operator after ShouldNotBeNull() assertions for StackTop and FullStackTrace properties.
MichalPavlik approved these changes on Mar 2, 2026
Summary
When MSBuild crashes via throw-helpers like ErrorUtilities.ThrowInternalError, crash telemetry previously captured only the throw-helper frame in StackTop, making triage nearly impossible since all InternalErrorException crashes look identical. Additionally, when EndBuild() hangs waiting for submissions or nodes, no telemetry was emitted at all because the crash telemetry in the finally block is unreachable during a hang.
This PR addresses both problems.
Changes
Richer crash diagnostics (all crash types)
- StackCaller: the first frame past the throw-helpers (ThrowInternalError, VerifyThrow, etc.), the frame you actually need for triage
- FullStackTrace: multi-frame sanitized trace (4096 char cap)
- ExceptionMessage: truncated and path-redacted to avoid PII
- CrashThreadName: thread identity at crash time
EndBuild hang detection
Replaces infinite WaitOne() calls in EndBuild() with timed 30-second loops that emit periodic diagnostic telemetry via CrashExitType.EndBuildHang, which carries six diagnostic properties:
- EndBuildWaitPhase
- EndBuildWaitDurationMs
- PendingSubmissionCount
- SubmissionsWithResultNoLogging: submissions where LoggingCompleted is false, the ones blocking EndBuild
- ThreadExceptionRecorded
- UnmatchedProjectStartedCount
Hang state is also persisted to %TEMP%\MSBuild_pid-{pid}.hang.txt via DumpHangDiagnosticsToFile for later retrieval from customer machines.
PII protection
All new telemetry properties are sanitized to prevent PII leaks:
- Stack frames: file paths → <redacted> (preserves line numbers)
- Exception messages: Windows (C:\...) and Unix (/...) paths → <path>
- … MSB0001: Internal MSBuild Error: prefix
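The two redaction rules above can be approximated with regular expressions. The sketch below is Python, not the PR's C# code, and the patterns and function names are assumptions chosen for illustration; in particular, they do not handle paths containing spaces. It shows one rule replacing Windows and Unix paths in free text with `<path>`, and another replacing the file path in a stack frame with `<redacted>` while keeping the line number.

```python
import re

# Assumed patterns: a drive-letter path, and an absolute Unix path with
# at least two segments (so a lone "and/or" is not treated as a path).
WINDOWS_PATH = re.compile(r"[A-Za-z]:\\[^\s\"'<>|]+")
UNIX_PATH = re.compile(r"(?<![\w/])/[\w.\-]+(?:/[\w.\-]+)+")
# .NET-style frame tail: " in <file>:line <n>" at the end of the frame.
FRAME_FILE = re.compile(r" in .+?(:line \d+)$")


def redact_paths_in_message(message: str) -> str:
    """Replace Windows and Unix paths in free text with <path>."""
    message = WINDOWS_PATH.sub("<path>", message)
    return UNIX_PATH.sub("<path>", message)


def sanitize_stack_frame(frame: str) -> str:
    """Replace the file path in a stack frame with <redacted>, keeping the line number."""
    return FRAME_FILE.sub(r" in <redacted>\1", frame)


message = r"Could not load C:\Users\alice\src\app.csproj or /home/alice/src/app.csproj"
redacted_message = redact_paths_in_message(message)

frame = r"at Example.Foo.Bar() in C:\Users\alice\src\Foo.cs:line 42"
redacted_frame = sanitize_stack_frame(frame)
```

Frames get their own rule because the free-text pattern would swallow the `:line 42` suffix along with the path, and the line number is the part worth keeping for triage.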