TDBStore bugfix: won't rely on flash erase value to detect is a sector erased #11349

VeijoPesonen · 2019-08-27T12:42:24Z

Description

When flashing a binary, STLink doesn't skip writing padding bytes that happen to equal the flash's erase value. STM32L4-based targets have an additional 8 bits of embedded ECC for each 64-bit word of data. When a sector is erased, the ECC bits are initially 0xFF.
When you write the erase value to a given address, the ECC algorithm changes these bits to something else. The visible data bits are unchanged, but the modified ECC value prevents flipping any of the 1s to 0s later. The only way to proceed is to erase the whole sector.

The Mbed OS HAL API doesn't provide a way to check whether a sector is erased. Instead, the code relied on the assumption that a sector reading back as the erase value is erased.
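
To make the failure mode concrete, here is a simplified C++ model of a single flash word with hidden ECC state, together with the kind of read-back check the code relied on. The names `EccFlashWord` and `looks_erased` are hypothetical, and the model reduces the ECC behaviour to a single "ECC written" flag; it is not the real STM32L4 ECC algorithm.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of one STM32L4-style flash word: 64 visible data bits
// plus hidden ECC bits. Simplifying assumption: once the ECC bits have been
// written with anything, the word cannot be programmed again until the
// whole sector is erased. This is not the real ECC algorithm.
struct EccFlashWord {
    static constexpr uint64_t kEraseValue = 0xFFFFFFFFFFFFFFFFull;

    uint64_t data = kEraseValue;  // what a normal read returns
    bool ecc_written = false;     // hidden: ECC bits no longer 0xFF

    // Returns false if the hardware would reject the write.
    bool program(uint64_t value) {
        if (ecc_written) {
            return false;         // only a sector erase can recover
        }
        data = value;
        ecc_written = true;       // true even when value == kEraseValue
        return true;
    }

    void erase() {
        data = kEraseValue;
        ecc_written = false;
    }
};

// The heuristic the old code relied on: "reads back as the erase value,
// therefore erased". It only sees the data bits, never the ECC state.
bool looks_erased(const EccFlashWord &word) {
    return word.data == EccFlashWord::kEraseValue;
}
```

With this model, writing padding equal to the erase value leaves `looks_erased()` returning true, yet the next `program()` call fails until the word is erased again.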

For further details, please see "STM32L475 Internal Flash driver write issue".

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@kjbracey-arm
@SeppoTakalo
@JammuKekkonen - original author of the fix.

Release Notes

When flashing a binary, STLink doesn't skip writing padding bytes that happen to equal the flash's erase value. STM32L4-based targets have an additional 8 bits of embedded ECC for each 64-bit word of data. When a sector is erased, the ECC bits are initially 0xFF. When you write the erase value to a given address, the ECC algorithm changes these bits to something else. The visible data bits are unchanged, but the modified ECC value prevents flipping any of the 1s to 0s later. The only way to proceed is to erase the whole sector.

ciarmcom · 2019-08-27T13:00:47Z

@VeijoPesonen, thank you for your changes.
@SeppoTakalo @JammuKekkonen @kjbracey-arm @ARMmbed/mbed-os-storage @ARMmbed/mbed-os-maintainers please review.

SeppoTakalo · 2019-08-27T13:18:06Z

@VeijoPesonen Should we drop the whole int is_erase_unit_erased(uint8_t area, uint32_t offset, bool &erased); API as it is not working?

kjbracey · 2019-08-27T13:36:02Z

Hmm, this is quite interesting. There is presumably a large class of devices where 0xFFFFFFFF does mean erased, and where you can always flip bits from 1 to 0. Do we have any other flash users that try to do that sort of thing? If so, they wouldn't work on this ECC flash either.
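
A minimal model of the conventional NOR behaviour described here, assuming the common case where programming can only clear bits, so the stored value becomes the bitwise AND of the old and new data. `NorFlashWord` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of conventional NOR flash without ECC (assumption: typical
// NOR behaviour). Programming can only clear bits (1 -> 0), so the stored
// value becomes old & new; only an erase sets bits back to 1.
struct NorFlashWord {
    static constexpr uint32_t kEraseValue = 0xFFFFFFFFu;
    uint32_t data = kEraseValue;

    void program(uint32_t value) { data &= value; }
    void erase() { data = kEraseValue; }
};
```

Under this model, writing the erase value over an erased word changes nothing, so the word can still be programmed afterwards. That is exactly the property the read-back check depends on, and the one the STM32L4 ECC flash in this PR lacks.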

Does this end up greatly increasing the number of erase cycles? Do higher levels do erase then write, so that this ends up doing erase, erase, write? There must have been a reason for putting the is_erased optimisation in in the first place, right?

VeijoPesonen · 2019-08-28T06:16:19Z

The original implementation was done by @davidsaada in PR #8667. David, what was the original reason for checking whether a region is already erased before actually erasing it?

davidsaada · 2019-08-28T06:51:25Z

TDBStore implements an "erase as you go" mode of operation. Instead of erasing an entire TDBStore area on init, reset or garbage collection, we only erase the first sector (to keep the area invalid). Then, whenever a write crosses a sector boundary, we check whether the next sector is erased and, if not, erase it. We do this because erasing the whole area in advance in those cases can take an unacceptably long time. Note that we don't do the check on every write, only when crossing a sector boundary; doing it on every write would be very inefficient.
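
The erase-as-you-go policy can be sketched roughly as follows. `FakeArea` and `EraseAsYouGoWriter` are hypothetical stand-ins, not TDBStore's real classes, and the flash is simulated in RAM. Unlike the pre-fix code, this sketch erases each newly reached sector unconditionally instead of first checking whether it reads back as the erase value; whether the merged fix does exactly this is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical RAM-backed stand-in for a flash area (not an Mbed OS class).
class FakeArea {
public:
    FakeArea(uint32_t sector_size, uint32_t num_sectors)
        : sector_size_(sector_size), bytes_(sector_size * num_sectors, 0xFF) {}

    uint32_t sector_size() const { return sector_size_; }
    uint32_t erase_count() const { return erase_count_; }

    void erase_sector(uint32_t index) {
        std::memset(&bytes_[index * sector_size_], 0xFF, sector_size_);
        ++erase_count_;
    }
    void program(uint32_t offset, const uint8_t *data, uint32_t size) {
        std::memcpy(&bytes_[offset], data, size);
    }

private:
    uint32_t sector_size_;
    std::vector<uint8_t> bytes_;
    uint32_t erase_count_ = 0;
};

// Sequential writer: reset() erases only the first sector; later sectors
// are erased when a write first reaches them. Sketch assumption: the new
// sector is always erased, never skipped because it "reads back as 0xFF".
// No bounds checking: callers must stay within the area.
class EraseAsYouGoWriter {
public:
    explicit EraseAsYouGoWriter(FakeArea &area) : area_(area) { reset(); }

    void reset() {
        area_.erase_sector(0);
        offset_ = 0;
        next_unerased_ = 1;  // sectors below this index are known erased
    }

    void write(const uint8_t *data, uint32_t size) {
        if (size == 0) {
            return;
        }
        uint32_t last_sector = (offset_ + size - 1) / area_.sector_size();
        // Erase only when the write crosses into sectors not yet erased.
        while (next_unerased_ <= last_sector) {
            area_.erase_sector(next_unerased_++);
        }
        area_.program(offset_, data, size);
        offset_ += size;
    }

private:
    FakeArea &area_;
    uint32_t offset_ = 0;
    uint32_t next_unerased_ = 0;
};
```

Only boundary crossings trigger an erase, so a run of small writes inside one sector costs no extra erase cycles.
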
To solve the problem you raised at its root, one would need to replace the current comparison against the erase value with a newly added "is sector erased" API. However, this would be an extremely large change. The API would first need to be added to the HAL layer (where most implementations would still compare against the erase value). It would then need to be added to the block device interface (BlockDevice.h) and to all relevant block devices.
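
A rough sketch of what such a layered API might look like. `SketchBlockDevice`, `is_erased()` and `EccFlashSketch` are hypothetical; Mbed OS has no such API. The default `is_erased()` falls back to comparing against the erase value, while a device that can track real erase state overrides it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical interface: neither this class nor is_erased() exist in
// Mbed OS; they only illustrate the proposed layered API.
class SketchBlockDevice {
public:
    virtual ~SketchBlockDevice() = default;
    virtual int read(void *buffer, uint64_t addr, uint64_t size) = 0;

    // Erase value, or -1 if the device has none (same convention as
    // Mbed OS BlockDevice::get_erase_value()).
    virtual int get_erase_value() const { return -1; }

    // Default implementation: compare against the erase value, as most
    // devices would keep doing.
    virtual bool is_erased(uint64_t addr, uint64_t size) {
        int erase_value = get_erase_value();
        if (erase_value < 0) {
            return false;  // no erase value: cannot tell, report "not erased"
        }
        std::vector<uint8_t> buf(size);
        if (read(buf.data(), addr, size) != 0) {
            return false;
        }
        for (uint8_t b : buf) {
            if (b != static_cast<uint8_t>(erase_value)) {
                return false;
            }
        }
        return true;
    }
};

// A device that knows whether a sector has been written since its last
// erase (here via a per-sector flag) overrides is_erased() with that state.
// Sizes passed to program() and is_erased() must be non-zero.
class EccFlashSketch : public SketchBlockDevice {
public:
    EccFlashSketch(uint64_t sector_size, uint64_t sectors)
        : sector_size_(sector_size),
          data_(sector_size * sectors, 0xFF),
          written_(sectors, false) {}

    int read(void *buffer, uint64_t addr, uint64_t size) override {
        std::memcpy(buffer, &data_[addr], size);
        return 0;
    }
    int get_erase_value() const override { return 0xFF; }

    void program(uint64_t addr, const uint8_t *src, uint64_t size) {
        std::memcpy(&data_[addr], src, size);
        for (uint64_t s = addr / sector_size_; s <= (addr + size - 1) / sector_size_; ++s) {
            written_[s] = true;  // set even if src was all 0xFF
        }
    }
    void erase_sector(uint64_t index) {
        std::memset(&data_[index * sector_size_], 0xFF, sector_size_);
        written_[index] = false;
    }

    bool is_erased(uint64_t addr, uint64_t size) override {
        for (uint64_t s = addr / sector_size_; s <= (addr + size - 1) / sector_size_; ++s) {
            if (written_[s]) {
                return false;
            }
        }
        return true;
    }

private:
    uint64_t sector_size_;
    std::vector<uint8_t> data_;
    std::vector<bool> written_;
};
```

The base-class fallback still reports a sector programmed with 0xFF padding as erased, which is why each affected device would need its own override.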

SeppoTakalo · 2019-08-28T14:19:22Z

@0xc0170 This is needed for 5.14.

No need for more review. A separate email discussion is ongoing about whether this functionality has ever worked correctly and whether we should drop it. For now, though, this immediate fix is required.

0xc0170 · 2019-08-28T14:28:31Z

CI started

mbed-ci · 2019-08-28T17:13:53Z

Test run: FAILED

Summary: 1 of 4 test jobs failed
Build number: 1
Build artifacts

Failed test jobs:

jenkins-ci/mbed-os-ci_build-IAR

0xc0170 · 2019-08-28T17:26:19Z

Build restarted, known build issue

mbed-ci · 2019-08-28T18:40:40Z

Test run: FAILED

Summary: 3 of 4 test jobs failed
Build number: 2
Build artifacts

Failed test jobs:

jenkins-ci/mbed-os-ci_build-GCC_ARM
jenkins-ci/mbed-os-ci_build-IAR
jenkins-ci/mbed-os-ci_build-ARM

0xc0170 · 2019-08-28T19:21:43Z

The crypto example should now be fixed; CI was restarted.

mbed-ci · 2019-08-29T00:23:45Z

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number: 3
Build artifacts

VeijoPesonen changed the title ~~Bugfix: won't rely on erase value to detect is a sector erased~~ Bugfix: won't rely on flash erase value to detect is a sector erased Aug 27, 2019

VeijoPesonen changed the title ~~Bugfix: won't rely on flash erase value to detect is a sector erased~~ TDBStore bugfix: won't rely on flash erase value to detect is a sector erased Aug 27, 2019

ciarmcom requested review from JammuKekkonen, kjbracey, SeppoTakalo and a team August 27, 2019 13:00

ciarmcom added the needs: review label Aug 27, 2019

JammuKekkonen approved these changes Aug 27, 2019

0xc0170 requested a review from a team August 27, 2019 13:08

SeppoTakalo approved these changes Aug 27, 2019

0xc0170 added the release-version: 5.14.0-rc1 label Aug 28, 2019

0xc0170 approved these changes Aug 28, 2019

0xc0170 added needs: CI and removed needs: review labels Aug 28, 2019

0xc0170 added ready for merge and removed needs: CI labels Aug 29, 2019

0xc0170 merged commit c4a2e3f into ARMmbed:master Aug 29, 2019

0xc0170 removed the ready for merge label Aug 29, 2019

VeijoPesonen deleted the tdbstore_ecc_fix branch September 6, 2019 10:41
