Track NVMe unsafe shutdowns and flag controller resets by LucaCappelletti94 · Pull Request #1 · LucaCappelletti94/cargo-slow

LucaCappelletti94 · 2026-06-02T07:41:22Z

Adds the NVMe unsafe_shutdowns lifetime counter as a tracked metric and a live recommendation that flags it climbing without a reboot.

Tracking

The counter is parsed from the smartctl NVMe health log (nvme_smart_health_information_log.unsafe_shutdowns), with SATA power-loss attribute fallbacks for non-NVMe drives. It is aggregated per device and logged to CSV as two columns: smart_unsafe_shutdowns_total (scalar aggregate) and smart_unsafe_shutdowns (per-disk string, e.g. nvme0:44,nvme2:0). The dual representation mirrors the existing disk_temp_max / disk_temps convention, keeping which-drive detail in the log alongside a scalar to scan. The total also shows in the TUI Temps/Sys panel.

The fields stay None unless at least one disk actually reports the counter, through either the NVMe field or a SATA fallback. A host whose drives expose neither therefore does not log a misleading 0.
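
A minimal sketch of the per-device aggregation. It assumes each disk has already been resolved to an Option<u64> (NVMe field first, SATA fallback second). It also assumes the scalar column is a sum, as the _total suffix suggests, and that non-reporting disks are left out of the per-disk string. DiskReading and aggregate are hypothetical names, not the crate's actual types.

```rust
/// One disk's lifetime unsafe-shutdown count, already resolved from the NVMe
/// health log or a SATA power-loss attribute; None if the drive exposes neither.
/// (Hypothetical type for this sketch.)
#[derive(Debug, Clone, Copy)]
struct DiskReading<'a> {
    name: &'a str,
    unsafe_shutdowns: Option<u64>,
}

/// Builds the two CSV values: (smart_unsafe_shutdowns_total, smart_unsafe_shutdowns).
/// Both stay None when no disk reports the counter, so no fabricated 0 is logged.
fn aggregate(readings: &[DiskReading]) -> (Option<u64>, Option<String>) {
    // Keep only the disks that actually reported the counter.
    let reporting: Vec<(&str, u64)> = readings
        .iter()
        .filter_map(|r| r.unsafe_shutdowns.map(|n| (r.name, n)))
        .collect();
    if reporting.is_empty() {
        return (None, None);
    }
    // Scalar column: sum across reporting disks (assumed aggregate).
    let total: u64 = reporting.iter().map(|&(_, n)| n).sum();
    // Per-disk column, e.g. "nvme0:44,nvme2:0".
    let per_disk = reporting
        .iter()
        .map(|(name, n)| format!("{name}:{n}"))
        .collect::<Vec<_>>()
        .join(",");
    (Some(total), Some(per_disk))
}
```

With nvme0 at 44, nvme2 at 0 and a SATA disk that reports nothing, this yields a total of 44 and the string nvme0:44,nvme2:0. With only non-reporting disks, both columns stay None.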

Detection

build_recommendations is a new history-aware entry point that combines the existing single-snapshot checks with checks that need more than one sample. generate_recommendations is unchanged, so its tests stay valid.
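
How the two entry points might compose, as a sketch. Snapshot, Recommendation, HistoryCheck and the dummy "Recently booted" rule are hypothetical. Passing the history checks as a slice is an illustrative choice and not necessarily how the PR wires them. The point is that generate_recommendations keeps its single-snapshot signature and only ever sees the newest sample.

```rust
/// Hypothetical monitoring sample, reduced to one field for the sketch.
#[derive(Debug, Clone)]
struct Snapshot {
    uptime_secs: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct Recommendation {
    title: String,
}

/// Stand-in for the unchanged single-snapshot entry point: it only sees the
/// newest sample. The "Recently booted" rule is a dummy for illustration.
fn generate_recommendations(latest: &Snapshot) -> Vec<Recommendation> {
    let mut recs = Vec::new();
    if latest.uptime_secs < 60 {
        recs.push(Recommendation { title: "Recently booted".to_string() });
    }
    recs
}

/// A check that needs the whole history rather than a single sample.
type HistoryCheck = fn(&[Snapshot]) -> Option<Recommendation>;

/// History-aware entry point: single-snapshot checks on the newest sample,
/// followed by whatever the multi-sample checks find.
fn build_recommendations(history: &[Snapshot], checks: &[HistoryCheck]) -> Vec<Recommendation> {
    let Some(latest) = history.last() else {
        return Vec::new(); // no samples yet, nothing to check
    };
    let mut recs = generate_recommendations(latest);
    recs.extend(checks.iter().filter_map(|check| check(history)));
    recs
}
```

Because the single-snapshot path is called unchanged on the newest sample, its existing tests keep exercising exactly the same code.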

unsafe_shutdown_recommendation flags the counter rising while the host stays up. The baseline is taken from the start of the current uninterrupted uptime run, found by walking back through the in-memory history until uptime drops (a reboot boundary). Any increment that coincides with a real reboot or power loss is therefore excluded, and only the controller-reset-on-bus signature is reported. When it fires:
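
The baseline walk can be sketched as follows. Sample, uptime_run_start and unsafe_shutdowns_without_reboot are hypothetical names. The sketch assumes the history is ordered oldest to newest and that a drop in uptime between consecutive samples marks a reboot. Because the baseline is the first sample after the most recent reboot, an increment recorded at that reboot is already inside the baseline and cancels out.

```rust
/// One history sample (hypothetical shape): host uptime and the aggregated counter.
#[derive(Debug, Clone, Copy)]
struct Sample {
    uptime_secs: u64,
    unsafe_shutdowns_total: Option<u64>,
}

/// Index of the first sample in the current uninterrupted uptime run.
/// Walks backwards from the newest sample and stops where uptime drops,
/// since a drop means the host rebooted between those two samples.
fn uptime_run_start(history: &[Sample]) -> usize {
    let mut start = history.len().saturating_sub(1);
    while start > 0 && history[start - 1].uptime_secs <= history[start].uptime_secs {
        start -= 1;
    }
    start
}

/// Increase of the counter since the start of the current uptime run.
/// Returns None when there is no history, no reported counter, or no increase.
fn unsafe_shutdowns_without_reboot(history: &[Sample]) -> Option<u64> {
    let latest = history.last()?.unsafe_shutdowns_total?;
    let start = uptime_run_start(history);
    // Baseline: first sample in the run that actually reports the counter.
    let baseline = history[start..].iter().find_map(|s| s.unsafe_shutdowns_total)?;
    let delta = latest.saturating_sub(baseline);
    (delta > 0).then_some(delta)
}
```

Taking the first reporting sample in the run, rather than strictly the first sample, is an assumption meant to tolerate samples where smartctl returned nothing.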

NVMe Reset Detected (Warning): N unsafe shutdown(s) with no reboot: controller reset itself. Check: dmesg | grep -i nvme. Try nvme_core.default_ps_max_latency_us=0 (APST), then firmware update; RMA if it recurs
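
A sketch of how the counter delta could be rendered into that finding. The message text is copied from above. Severity, Recommendation and nvme_reset_recommendation are hypothetical names. For context, nvme_core.default_ps_max_latency_us=0 is the Linux kernel parameter that disables NVMe Autonomous Power State Transitions (APST), which is why the message pairs the two.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Info,
    Warning,
}

#[derive(Debug, Clone, PartialEq)]
struct Recommendation {
    severity: Severity,
    title: &'static str,
    detail: String,
}

/// Turns a counter increase within one uptime run into the warning;
/// a zero delta produces no finding.
fn nvme_reset_recommendation(delta: u64) -> Option<Recommendation> {
    if delta == 0 {
        return None;
    }
    Some(Recommendation {
        severity: Severity::Warning,
        title: "NVMe Reset Detected",
        // The trailing backslashes join the lines into one sentence-separated string.
        detail: format!(
            "{delta} unsafe shutdown(s) with no reboot: controller reset itself. \
             Check: dmesg | grep -i nvme. \
             Try nvme_core.default_ps_max_latency_us=0 (APST), then firmware update; \
             RMA if it recurs"
        ),
    })
}
```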

Tests

Adds tests for parsing the NVMe health-log field, for per-device aggregation, and for four detection cases: a flat counter stays quiet, a rise with monotonic uptime fires, a rise across an uptime reset is ignored, and build_recommendations surfaces the finding end to end. The full suite has 54 passing tests, and fmt and clippy -Dwarnings are clean.

Track NVMe unsafe shutdowns and flag controller resets

05bafb2

LucaCappelletti94 merged commit b373082 into main Jun 2, 2026
7 checks passed

LucaCappelletti94 merged 1 commit into main from track-nvme-unsafe-shutdowns