
boot/nxboot: add flush barriers and CRC-validate primary before boot#3428

Merged
michallenc merged 3 commits into apache:master from neilberkman:fix/nxboot-power-loss-hardening
Mar 18, 2026

@neilberkman
Contributor

Summary

Two hardening fixes for nxboot power-loss resilience:

  1. Flush barriers between critical partition operations: add flash_partition_flush() calls after each copy_partition() in perform_update(). Without explicit barriers, writes may remain buffered in RAM when nxboot proceeds to the next phase. A power loss between phases can then leave the recovery image uncommitted even though the staging partition has already been consumed.

  2. Full CRC validation before booting primary: replace validate_image_header() with validate_image() in the final primary validation path of nxboot_perform_update(). The header-only check does not CRC-check the image body, so after an interrupted update a corrupt primary with an intact header would pass it and be booted.

Impact

  • boot/nxboot only. No impact on other bootloaders or applications.
  • Adds one fsync() call after each copy_partition() in the update path. On platforms without write buffering this is effectively a no-op.
  • Adds one full-image CRC computation at the end of the update path before booting. On SAMv7 this adds negligible boot time (confirmed by the nxboot maintainer).

Testing

Tested with fault injection under Renode emulation of nxboot on nucleo-h743zi. With FTL write buffering enabled, the flush barriers and CRC validation together eliminate the persistent boot failure observed before the fix: the failure count dropped from 92/94 to 0/94.

cederom previously approved these changes Mar 17, 2026
Contributor

@cederom cederom left a comment

Thank you @neilberkman, good catch! :-)

Two hardening fixes for nxboot power-loss resilience:

1. Add flash_partition_flush() calls between critical partition
   operations in perform_update(). Without explicit flush barriers,
   writes may remain buffered in RAM (e.g. via FTL rwbuffer) when
   nxboot proceeds to the next phase. A power loss between phases
   can leave the recovery image uncommitted while the staging
   partition has already been consumed.

   Flush points added:
   - After copy_partition(primary, recovery) completes
   - After copy_partition(update, primary) completes, before
     erasing the staging first sector

2. Replace validate_image_header() with validate_image() in the
   final primary validation path of nxboot_perform_update(). The
   header-only check validates magic and platform identifier but
   does not CRC-check the image body. After an interrupted update,
   a corrupt primary with an intact header would pass this check
   and be booted, resulting in a persistent boot failure.

Signed-off-by: Neil Berkman <neil@xuku.com>
@neilberkman neilberkman dismissed stale reviews from xiaoxiang781216 and cederom via 8dbc177 March 18, 2026 03:10
@neilberkman neilberkman force-pushed the fix/nxboot-power-loss-hardening branch from 6e81e34 to 8dbc177 March 18, 2026 03:10
@neilberkman
Contributor Author

neilberkman commented Mar 18, 2026

Force-pushed to fix a build error on arm-13: validate_image() takes a single fd argument, not two, so I removed the stale &header parameter. No other changes; the fix is a one-line correction.

Update: the new failures appear to be CI flakiness.

The comment previously stated CRC was not calculated before
boot. This is no longer accurate after adding full image CRC
validation in validate_image().

Signed-off-by: Neil Berkman <neil@xuku.com>
@neilberkman neilberkman force-pushed the fix/nxboot-power-loss-hardening branch from 225cc54 to 66c9805 March 18, 2026 07:39
The header variable in nxboot_perform_update() is no longer
used after validate_image() was changed to take only the fd.

Signed-off-by: Neil Berkman <neil@xuku.com>
Contributor

@michallenc michallenc left a comment

Also tested both kernel and apps patches on SAMv7, everything works fine, thanks!

@cederom
Contributor

cederom commented Mar 18, 2026

@michallenc when all is set please do the honors (merge) :-) :-)

@michallenc michallenc merged commit cb880b7 into apache:master Mar 18, 2026
40 checks passed
@neilberkman neilberkman deleted the fix/nxboot-power-loss-hardening branch March 18, 2026 13:57
5 participants