Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TDBStore bugfix: won't rely on flash erase value to detect is a sector erased #11349

Merged
merged 1 commit into from
Aug 29, 2019
Merged

TDBStore bugfix: won't rely on flash erase value to detect is a sector erased #11349

merged 1 commit into from
Aug 29, 2019

Conversation

VeijoPesonen
Copy link
Contributor

@VeijoPesonen VeijoPesonen commented Aug 27, 2019

Description

When flashing a binary STLink won't skip writing padding which happens to be the same value as flash's erase value. STM32L4 based targets have an additional 8-bit of embedded ECC for each 64-bit word of data. The initial value, when a sector is erased, for the ECC bits is 0xFF.
When you write the erase value to a given address these bits gets modified to something different due to the ECC algorithm in use. The visible bits are intact but difference in ECC value prevents flipping any 1's to 0's. Only way to proceed is to erase the whole sector.

Mbed OS HAL API doesn't provide a way to check is a sector erased or not. In this case code was relying on the fact that the erase value would indicate is a sector erased.

For further details please see STM32L475 Internal Flash driver write issue

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@kjbracey-arm
@SeppoTakalo
@JammuKekkonen - original author of the fix.

Release Notes

When flashing a binary STLink won't skip writing padding which happens
to be the same value as flash's erase value. STM32L4 based targets
have an additional 8-bit of embedded ECC for each 64-bit word of data.
The initial value, when a sector is erased, for the ECC bits is 0xFF.
When you write the erase value to a given address these bits gets
modified to something different due to the ECC algoritm in use. The
visible bits are intact but difference in ECC value prevents flipping
any 1's to 0's. Only way to proceed is to erase the whole sector.
@VeijoPesonen VeijoPesonen changed the title Bugfix: won't rely on erase value to detect is a sector erased Bugfix: won't rely on flash erase value to detect is a sector erased Aug 27, 2019
@VeijoPesonen VeijoPesonen changed the title Bugfix: won't rely on flash erase value to detect is a sector erased TDBStore bugfix: won't rely on flash erase value to detect is a sector erased Aug 27, 2019
@ciarmcom
Copy link
Member

@VeijoPesonen, thank you for your changes.
@SeppoTakalo @JammuKekkonen @kjbracey-arm @ARMmbed/mbed-os-storage @ARMmbed/mbed-os-maintainers please review.

@0xc0170 0xc0170 requested a review from a team August 27, 2019 13:08
@SeppoTakalo
Copy link
Contributor

@VeijoPesonen Should we drop the whole int is_erase_unit_erased(uint8_t area, uint32_t offset, bool &erased); API as it is not working?

@kjbracey
Copy link
Contributor

Hmm, this is quite interesting. There are presumably a large class of devices where 0xFFFFFFFF does mean erased, and you can always flip bits from 1->0. Do we have any other flash users around that try to do that sort of thing? If so, they'd not work in this ECCed flash either.

Does this end up greatly increasing the number of erase cycles? Do higher levels do erase then write, and this then ends up doing erase,erase,write? There must have been a reason for putting the is_erased optimisation in in the first place, right?

@VeijoPesonen
Copy link
Contributor Author

Hmm, this is quite interesting. There are presumably a large class of devices where 0xFFFFFFFF does mean erased, and you can always flip bits from 1->0. Do we have any other flash users around that try to do that sort of thing? If so, they'd not work in this ECCed flash either.

Does this end up greatly increasing the number of erase cycles? Do higher levels do erase then write, and this then ends up doing erase,erase,write? There must have been a reason for putting the is_erased optimisation in in the first place, right?

The original implementation is done by @davidsaada - PR #8667. David, what was the original reason to add the check - if a region is already erased - before actually carrying out the procedure?

@davidsaada
Copy link
Contributor

Hmm, this is quite interesting. There are presumably a large class of devices where 0xFFFFFFFF does mean erased, and you can always flip bits from 1->0. Do we have any other flash users around that try to do that sort of thing? If so, they'd not work in this ECCed flash either.
Does this end up greatly increasing the number of erase cycles? Do higher levels do erase then write, and this then ends up doing erase,erase,write? There must have been a reason for putting the is_erased optimisation in in the first place, right?

The original implementation is done by @davidsaada - PR #8667. David, what was the original reason to add the check - if a region is already erased - before actually carrying out the procedure?

TDBStore implements an "erase as you go" method of operation. This means that instead of erasing an entire TDBStore area upon init/reset/GC, we only erase the first sector (in order to keep the area invalid), and then upon crossing sector boundaries on writes, we check whether the next sector is erased. If not - we erase it. The reason for this MO is that the alternative of erasing the whole area in advance (in the aforementioned cases) can take an unacceptably long time. Again - we don't do it on each write, but only when we cross a sector boundary (otherwise it would be very inefficient).
In order to solve the problem you raised from its root, one will need to replace the current implementation of comparing to the erase value with the usage of a newly added "is sector erased" API. However, this would require an extremely large scale change. It would first need to be added to the HAL layer (where most implementations would still use the current way of comparing to the erase value). Then you will need to add this API to the block device's interface (BlockDevice.h) and to all relevant block devices.

@SeppoTakalo
Copy link
Contributor

@0xc0170 This is needed for 5.14.

No need for more review. Separate email discussion ongoing whether this functionality has ever worked correctly, and whether we should drop it. But for now on, this immediate fix is required.

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 28, 2019

CI started

@mbed-ci
Copy link

mbed-ci commented Aug 28, 2019

Test run: FAILED

Summary: 1 of 4 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_build-IAR

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 28, 2019

Build restarted, known build issue

@mbed-ci
Copy link

mbed-ci commented Aug 28, 2019

Test run: FAILED

Summary: 3 of 4 test jobs failed
Build number : 2
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_build-GCC_ARM
  • jenkins-ci/mbed-os-ci_build-IAR
  • jenkins-ci/mbed-os-ci_build-ARM

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 28, 2019

The crypto example should be fixed, CI was restarted

@mbed-ci
Copy link

mbed-ci commented Aug 29, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 3
Build artifacts

@0xc0170 0xc0170 merged commit c4a2e3f into ARMmbed:master Aug 29, 2019
@VeijoPesonen VeijoPesonen deleted the tdbstore_ecc_fix branch September 6, 2019 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants