-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix double swap on interrupted revert #485
Conversation
I confirm this fix. I've tested this PR with my app from #480. |
0aa34d3
to
8fd2b83
Compare
Note: I force pushed a commit with documentation and comment changes. I did not change any code with this force push. (I messed up the trailer structure representation in my first commit). |
@ccollins476ad Please cherry-pick the commit from PR #482 |
@ccollins476ad From your commit message:
More importantly than |
@utzig, that is a good point, and something that I completely missed. At the very least, I will update the commit message. Do you think this could cause a problem? I think the scratch area is guaranteed to be large enough to accommodate this change, but I could use a second opinion! |
@ccollins476ad I don't think it will cause issues; so I was thinking about this and it seems to me that if you cannot fit the data of the last sector with metadata on scratch, than it won't fit on the last sector as well, so in the end this seems like a good move into simplifying the code. Also metadata on scratch is only ever used it there's data to be swapped in any slot's last sectors, thus it's a bit of a corner case that would be better off having a specific test added on the simulator. Your changes LGTM, I just need to finish reviewing one commit (the big one!). |
@@ -616,6 +642,15 @@ boot_set_pending(int permanent) | |||
rc = boot_write_image_ok(fap); | |||
} | |||
|
|||
if (rc == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In BOOT_MAGIC_GOOD case swap_type should be checked and checked in case operation was interrupted here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think anyone would be able to schedule a new swap operation if we were resuming an interruption; what exact issue do you have mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't think we need to worry about the resume case here. boot_set_pending()
doesn't get called from the boot loader, it gets called from the application. If the application is running, there can't be a swap in progress.
This is a little weird, though. If the application tries to schedule a swap operation, and one is already scheduled, we just return 0. It seems like we ought to check if the two swap operations have the same type. If they are different, we should return an error. I think this should go in a separate PR, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was a power-down had occurred just before writing the swap_type. Shouldn't an application be able to recognize such a undone request unambiguously, and be able to renew boot_set_pending() after that? For sure it is possible to check that the application had already collected proper upgrade image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nvlsianpu I think I see what you mean.
Application schedules a "test" swap. This requires two flash writes:
- Write
magic
to secondary trailer. - Write
swap_type
to secondary trailer.
If the device reboots after step 1 (before step 2), the trailer will be incomplete.
However, I don't think this is a problem. Since this is a new swap (not a resume), mcuboot won't even look at the swap_type
field at startup. It should execute the "test" swap as expected.
If mcuboot then reboots in the middle of this swap, then we have a potential issue. Mcuboot will recognize that a swap was interrupted and needs to be resumed, but it won't be able to determine the swap type. I think this just puts us back where we were before this bug fix ("revert" will fail; all other swap types happen to work by chance). Since you cannot specify a "revert" swap type with the boot_set_pending()
function, I don't think there is a problem here.
I do think there is a general issue of the device resetting with a partially-written trailer. This issue has always existed, though, so it is probably best left for a separate PR.
Please let me know if I am still missing something.
@ccollins476ad Please fixup the last commit. |
Prior to this change, the scratch image trailer had a different format from a slot image trailer. Specifically: 1. The scratch trailer only contained a single set of status entries (three bytes); the slot trailer contained `BOOT_STATUS_MAX_ENTRIES` sets of status entries. 2. The scratch trailer did not contain the `copy_done` field. This inconsistency required some extra conditional logic in the trailer handling code. It is simpler to just use the same trailer format everywhere. This commit removes this inconsistency. Now, the scratch trailer structure is identical to that of the slot trailer. Signed-off-by: Christopher Collins <ccollins@apache.org>
Signed-off-by: Christopher Collins <ccollins@apache.org>
This fixes mcu-tools#480. When mcuboot rewrites image trailers during a swap, some information is lost. If a reset occurs before the swap completes, mcuboot may not be able to determine what which swap type to resume upon startup. Specifically, if a "revert" swap gets interupted, mcuboot will perform an extraneous swap on the subsequent boot. See mcu-tools#480 for details. This commit adds an additional field to the image trailer: `swap-type`. The new trailer structure is illustrated below: ``` 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ ~ ~ Swap status (BOOT_MAX_IMG_SECTORS * min-write-size * 3) ~ ~ ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ Encryption key 0 (16 octets) [*] ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ Encryption key 1 (16 octets) [*] ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Swap size | 0xff padding (4 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Swap type | 0xff padding (7 octets) ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Copy done | 0xff padding (7 octets) ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Image OK | 0xff padding (7 octets) ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ MAGIC (16 octets) ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ``` The `swap-type` field contains one of the `BOOT_SWAP_TYPE_[...]` constants. Every time a trailer is written, this field is written along with it. When resuming an interrupted swap, mcuboot uses this field alone to determine the type of swap being resumed. For new swap operations (non-resume case), this field is not read at all; instead, mcuboot consults the `boot_swap_tables` array to determine the swap operation to perform (as it did prior to this commit). Some additional changes were necessary to make all the simulated unit tests pass: * Before initiating a new swap operation, always write the image trailer to the scratch area. This step allows mcuboot to persist the `swap-type` field somewhere before erasing the trailer in the primary slot. If a reset occurs immediately after the erase, mcuboot recovers by using the trailer in the scratch area. * Related to the above: if the scratch area is being used to hold status bytes (because there are no spare sectors in the primary slot), erase the scratch area immediately after the trailer gets written to the primary slot. This eliminates ambiguity regarding the location of the current trailer in case a reset occurs shortly afterwards. Signed-off-by: Christopher Collins <ccollins@apache.org>
Make images slightly larger to allow more swap status metadata to be written to flash, to increase amount of debugging info and possibility of failures on random write fails. Signed-off-by: Fabio Utzig <utzig@apache.org> Signed-off-by: Christopher Collins <ccollins@apache.org>
This extends the test+revert case with an interruption on the revert stage, as it was previously only interrupted on the test stage. For simplicity the interruption happens on the same interruption point for both test and revert stages. Signed-off-by: Fabio Utzig <utzig@apache.org> Signed-off-by: Christopher Collins <ccollins@apache.org>
Thanks, @utzig. I have fixed up that commit, and reworded the "scratch trailer structure" commit as discussed above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I suggest waiting for an approval by @nvlsianpu as well!
Mcuboot changed TLV in PR: Fix double swap on interrupted revert (mcu-tools/mcuboot#485) Above bugfix changes a little way for upgrade request. This path introduces re-formatted original mcuboot bootutil_misc.c code as much as it was reasonable which includes the bugfix adaptation and support for devices witch have bite erased sate different than 1 as well. Signed-off-by: Andrzej Puzdrowski <andrzej.puzdrowski@nordicsemi.no>
Mcuboot changed TLV in PR: Fix double swap on interrupted revert (mcu-tools/mcuboot#485) Above bugfix changes a little way for upgrade request. This path introduces re-formatted original mcuboot bootutil_misc.c code as much as it was reasonable which includes the bugfix adaptation and support for devices witch have bite erased sate different than 1 as well. Signed-off-by: Andrzej Puzdrowski <andrzej.puzdrowski@nordicsemi.no>
This fixes #480.
When mcuboot rewrites image trailers during a swap, some information is lost. If a reset occurs before the swap completes, mcuboot may not be able to determine which swap type to resume upon startup. Specifically, if a "revert" swap gets interupted, mcuboot will perform an extraneous swap on the subsequent boot. See #480 for details.
NOTE: A different PR (#482) adds unit tests which test this fix.
This PR adds an additional field to the image trailer:
swap-type
. The new trailer structure is illustrated below:The
swap-type
field contains one of theBOOT_SWAP_TYPE_[...]
constants. Every time a trailer is written, this field is written along with it. When resuming an interrupted swap, mcuboot uses this field alone to determine the type of swap being resumed. For new swap operations (non-resume case), this field is not read at all; instead, mcuboot consults theboot_swap_tables
array to determine the swap operation to perform (as it did prior to this PR).Some additional changes were necessary to make all the simulated unit tests pass:
Before initiating a new swap operation, always write the image trailer to the scratch area. This step allows mcuboot to persist the
swap-type
field somewhere before erasing the trailer in the primary slot. If a reset occurs immediately after the erase, mcuboot recovers by using the trailer in the scratch area.Related to the above: if the scratch area is being used to hold status bytes (because there are no spare sectors in the primary slot), erase the scratch area immediately after the trailer gets written to the primary slot. This eliminates ambiguity regarding the location of the current trailer in case a reset occurs shortly afterwards.