[Diagnostics] In-Proc CrashReport lifecycle#128738
Merged
Merged
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce InProcCrashReportLifecycle to own the on-disk lifecycle of
in-proc crash reports: choosing the output directory, validating its
writability, generating report names, and enforcing the retention cap so
the crash path only has to emit a report into a prepared location.
The crash-time entry points (preparing the report file, building paths)
remain allocation-free and signal-safe; the initialization-time work
(directory probing, pruning) runs once during startup before the crash
handler is armed and may use the heap and non-async-signal-safe calls.
Shared string copy/append helpers used by both the reporter and the
lifecycle are factored into CrashReportStringUtils
(crashreportstringutils.{h,cpp}) so the two callers no longer carry
duplicate implementations.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add design notes co-located with the in-proc crash report lifecycle source, capturing the Android/iOS crash-artifact investigation, the configuration and on-disk layout, the init- vs crash/signal-path split, the retention model, and the key design decisions and review findings that shaped the implementation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FinishReportFile published the completed report via link() followed by an unconditional unlink() of the temp file, discarding the link() return value. Android and Apple mobile app-private storage reject hard links with EPERM, so every crash report was silently lost on the exact platforms this in-proc feature targets. Replace the link+unlink commit with a same-directory rename, which is POSIX async-signal-safe and permitted in those sandboxes. A best-effort access(final, F_OK) check before the rename preserves the exclusive-create intent that link's EEXIST previously provided; the final name is collision-resistant by construction, so the residual TOCTOU window is benign. On any failure the temp file is still removed. Update the header contract and design doc to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ProbeDirectoryWritable now also exercises a same-directory rename (the primitive FinishReportFile uses to publish a completed report), not just create/delete. If a sandbox permits create/delete but rejects rename, file output is disabled at initialization with a clear diagnostic instead of silently losing every report on the signal path at crash time. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
Contributor
There was a problem hiding this comment.
Pull request overview
Adds lifecycle-managed storage for CoreCLR in-proc crash reports, including a managed directory layout, retention modes, and safer temp-to-final publishing for platforms where reports are written from the faulting process.
Changes:
- Adds crash report lifecycle management with root path configuration, retention parsing, directory setup, pruning, and temp-file publishing.
- Wires lifecycle-managed output into the existing in-proc crash reporter while preserving legacy
DbgMiniDumpNamebehavior. - Adds shared crash-report string helpers, CMake integration, config entries, and design documentation.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/coreclr/vm/crashreportstackwalker.cpp |
Reads new crash-report lifecycle configuration and passes it into reporter settings. |
src/coreclr/inc/clrconfigvalues.h |
Adds CrashReportRootPath and CrashReportMaxFileCount configuration entries. |
src/coreclr/debug/crashreport/inproccrashreportlifecycle.h |
Declares the lifecycle manager for report directory setup, retention, and crash-path publishing. |
src/coreclr/debug/crashreport/inproccrashreportlifecycle.cpp |
Implements lifecycle initialization, report scanning, pruning, path construction, and final publishing. |
src/coreclr/debug/crashreport/inproccrashreportlifecycle.design.md |
Documents the design rationale and retention model. |
src/coreclr/debug/crashreport/inproccrashreporter.h |
Extends reporter settings with lifecycle root and retention options. |
src/coreclr/debug/crashreport/inproccrashreporter.cpp |
Integrates lifecycle-managed file output into report creation and finalization. |
src/coreclr/debug/crashreport/crashreportstringutils.h |
Adds shared signal-safe string utility declarations. |
src/coreclr/debug/crashreport/crashreportstringutils.cpp |
Implements shared bounded string copy/append helpers. |
src/coreclr/debug/crashreport/CMakeLists.txt |
Includes the new lifecycle and string utility sources in the crashreport object library. |
lateralusX
reviewed
Jun 1, 2026
lateralusX
reviewed
Jun 1, 2026
lateralusX
reviewed
Jun 1, 2026
lateralusX
reviewed
Jun 1, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
lateralusX
reviewed
Jun 2, 2026
Consolidates the follow-up changes addressing maintainer review of the in-proc crash report lifecycle into a single net change over the prior pushed state, so there are no intermediate commits that add behavior a later commit removes. Net behavior of the lifecycle and its reporter integration: - Publish completed reports with a same-directory rename guarded by an ENOENT existence check, instead of link(); rename is async-signal-safe and is permitted in the mobile app-private storage sandboxes where hard links are rejected with EPERM. - Probe the report directory for create/rename/delete at init using local stack buffers, disabling file output up front on an unusable directory. - Reuse CrashReportStringUtils::CopyString rather than a second local string-copy helper. - Require an absolute CrashReportRootPath; the runtime no longer expands a leading '~' or $HOME. The integration layer supplies a concrete path. - Drop the per-app process-name path segment: on mobile the root is already app-private, so reports live directly under <root>/.dotnet/crash-reports/. - Name reports with a nanosecond CLOCK_REALTIME timestamp and drop the suffix-retry loop; nanosecond resolution makes collisions vanishingly unlikely, so no disambiguating suffix is needed. - Replace the retention modes with a single bounded model: prune to maxFileCount at init (caching the oldest report only when at the bound) and unlink that cached oldest on the crash path before publishing a new report, so the bound holds whether or not a crash follows. - Remove the process-liveness probe for stale temp cleanup; a leftover .tmp can only come from a defunct run of the same app, so it is removed unconditionally. - Make the lifecycle-managed root the only JSON sink: drop DbgMiniDumpName (and its template expansion and cached hostname) from the in-proc reporter. With no root configured the reporter runs compact-log-only. - Clamp configured maxFileCount to [1, INT32_MAX], falling back to the default for out-of-range or unparseable values. - Update the design notes to match the implemented model. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Integrate the in-proc crash report watchdog feature from upstream with the crash report lifecycle work. Conflict resolution: - InProcCrashReporterSettings keeps lifecycleRootPath (replacing the removed reportPath/DbgMiniDumpName mechanism) and adds the watchdog timeoutSeconds. - CreateReport keeps the watchdog CrashReportWatchdogScope. - CrashReportConfigure wires both the lifecycle (lifecycleRootPath/maxFileCount) and the watchdog (timeoutSeconds); the old reportPath/dumpName assignment is dropped. - GetCrashReportMaxFileCount now shares the watchdog's TryParseCrashReportConfigurationInteger bounded parser (strict, [1, INT32_MAX]) instead of a local strtol, so both crash-report config integers validate the same way. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename settings field lifecycleRootPath -> reportRootPath for clarity and consistency with the CrashReportRootPath configuration knob. - Inline the single-use DeleteCachedReport helper into the crash path. - Document why retention pruning uses a linear oldest-scan rather than a min-heap. - Document why TryParseUnsigned is used instead of strtoul/strtoull. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove duplicate <errno.h> (crashreportstackwalker.cpp) and <stdlib.h> (inproccrashreporter.cpp) includes. - Correct the CrashReportMaxFileCount config description to a positive integer bound; the -1/0 modes were dropped during the earlier simplification. - Inline strtoull/strtoul in TryParseReportName and remove the TryParseUnsigned helper. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lateralusX
reviewed
Jun 3, 2026
lateralusX
reviewed
Jun 3, 2026
lateralusX
reviewed
Jun 3, 2026
lateralusX
reviewed
Jun 3, 2026
lateralusX
reviewed
Jun 3, 2026
Address reviewer feedback on the in-proc crash report lifecycle init path: - Value-initialize the kept-report array so any never-populated slots stay well-defined (hygiene; the slots were never read uninitialized). - Populate each kept slot directly from the local full path instead of copying into a temporary FileInfo and then assigning the whole nested struct, removing a redundant ~1KB path copy per retained report. The full-set overflow comparison is inlined (timestamp, then path) to avoid materializing a candidate FileInfo. - Use O_TRUNC instead of O_CREAT|O_EXCL for the throwaway temp report file and the directory-writability probe. Neither needs the never-overwrite guarantee, so O_TRUNC tolerates a stale leftover while still failing on a genuinely unwritable directory. This also lets the probe drop its preceding unlink of a stale probe file. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
On platforms where createdump can't fork a helper process (notably Android and Apple mobile under CoreCLR), crash reports are written in-proc from the faulting thread. Today those reports are written to a single configured path with no management: nothing controls where they accumulate or how many are kept, so they pile up unbounded in app-private storage.
This feature gives the in-proc crash reporter a durable, bounded, app-local directory for its *.crashreport.json output. The design borrows the proven mechanics that the mobile platforms already use for their own crash artifacts:
Android (tombstones). The OS keeps a bounded rotating set of tombstone files. tombstoned writes completed dumps to /data/tombstones/tombstone_00, tombstone_01, …, with a default cap of 32 (tombstoned.max_tombstone_count). Dumps are first written to anonymous temp files and only linked into their final tombstone_NN path once the dumper reports completion, so a failed or timed-out dump never leaves a named partial tombstone.
iOS / Apple. Crash reports are OS-managed diagnostic logs (.ips/.crash), generated after the app is allowed to finish crashing. They live in the device's Analytics Data and are surfaced via Xcode/Console, TestFlight, and the App Store. Retention and pruning are entirely Apple/OS-managed — there is no app-controlled rotation contract.
This PR introduces an opt-in crash-report lifecycle that owns the report directory layout and applies a retention policy, all using async-signal-safe primitives on the crash path.
<root>/.dotnet/crash-reports/report-<timestampNs>-<pid>.crashreport.jsonConfiguration
Testing
Validated end-to-end on an Android x64 emulator (CoreCLR): the bounded retention policy holds (oldest pruned at the cap), reports are valid JSON, the never-overwrite invariant holds, and the init probe leaves no stray artifacts. Design notes are in inproccrashreportlifecycle.design.md.