Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TDBStore bugfix: won't rely on flash erase value to detect is a sector erased #11349

Merged
merged 1 commit into from Aug 29, 2019

Conversation

@VeijoPesonen
Copy link
Contributor

commented Aug 27, 2019

Description

When flashing a binary STLink won't skip writing padding which happens to be the same value as flash's erase value. STM32L4 based targets have an additional 8-bit of embedded ECC for each 64-bit word of data. The initial value, when a sector is erased, for the ECC bits is 0xFF.
When you write the erase value to a given address these bits gets modified to something different due to the ECC algorithm in use. The visible bits are intact but difference in ECC value prevents flipping any 1's to 0's. Only way to proceed is to erase the whole sector.

Mbed OS HAL API doesn't provide a way to check is a sector erased or not. In this case code was relying on the fact that the erase value would indicate is a sector erased.

For further details please see STM32L475 Internal Flash driver write issue

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@kjbracey-arm
@SeppoTakalo
@JammuKekkonen - original author of the fix.

Release Notes

When flashing a binary STLink won't skip writing padding which happens
to be the same value as flash's erase value. STM32L4 based targets
have an additional 8-bit of embedded ECC for each 64-bit word of data.
The initial value, when a sector is erased, for the ECC bits is 0xFF.
When you write the erase value to a given address these bits gets
modified to something different due to the ECC algoritm in use. The
visible bits are intact but difference in ECC value prevents flipping
any 1's to 0's. Only way to proceed is to erase the whole sector.
@VeijoPesonen VeijoPesonen changed the title Bugfix: won't rely on erase value to detect is a sector erased Bugfix: won't rely on flash erase value to detect is a sector erased Aug 27, 2019
@VeijoPesonen VeijoPesonen changed the title Bugfix: won't rely on flash erase value to detect is a sector erased TDBStore bugfix: won't rely on flash erase value to detect is a sector erased Aug 27, 2019
@ciarmcom ciarmcom requested review from JammuKekkonen, kjbracey-arm, SeppoTakalo and ARMmbed/mbed-os-maintainers Aug 27, 2019
@ciarmcom

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

@0xc0170 0xc0170 requested a review from ARMmbed/team-st-mcd Aug 27, 2019
@SeppoTakalo

This comment has been minimized.

Copy link
Contributor

commented Aug 27, 2019

@VeijoPesonen Should we drop the whole int is_erase_unit_erased(uint8_t area, uint32_t offset, bool &erased); API as it is not working?

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Aug 27, 2019

Hmm, this is quite interesting. There are presumably a large class of devices where 0xFFFFFFFF does mean erased, and you can always flip bits from 1->0. Do we have any other flash users around that try to do that sort of thing? If so, they'd not work in this ECCed flash either.

Does this end up greatly increasing the number of erase cycles? Do higher levels do erase then write, and this then ends up doing erase,erase,write? There must have been a reason for putting the is_erased optimisation in in the first place, right?

@VeijoPesonen

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2019

Hmm, this is quite interesting. There are presumably a large class of devices where 0xFFFFFFFF does mean erased, and you can always flip bits from 1->0. Do we have any other flash users around that try to do that sort of thing? If so, they'd not work in this ECCed flash either.

Does this end up greatly increasing the number of erase cycles? Do higher levels do erase then write, and this then ends up doing erase,erase,write? There must have been a reason for putting the is_erased optimisation in in the first place, right?

The original implementation is done by @davidsaada - PR #8667. David, what was the original reason to add the check - if a region is already erased - before actually carrying out the procedure?

@davidsaada

This comment has been minimized.

Copy link
Contributor

commented Aug 28, 2019

Hmm, this is quite interesting. There are presumably a large class of devices where 0xFFFFFFFF does mean erased, and you can always flip bits from 1->0. Do we have any other flash users around that try to do that sort of thing? If so, they'd not work in this ECCed flash either.
Does this end up greatly increasing the number of erase cycles? Do higher levels do erase then write, and this then ends up doing erase,erase,write? There must have been a reason for putting the is_erased optimisation in in the first place, right?

The original implementation is done by @davidsaada - PR #8667. David, what was the original reason to add the check - if a region is already erased - before actually carrying out the procedure?

TDBStore implements an "erase as you go" method of operation. This means that instead of erasing an entire TDBStore area upon init/reset/GC, we only erase the first sector (in order to keep the area invalid), and then upon crossing sector boundaries on writes, we check whether the next sector is erased. If not - we erase it. The reason for this MO is that the alternative of erasing the whole area in advance (in the aforementioned cases) can take an unacceptably long time. Again - we don't do it on each write, but only when we cross a sector boundary (otherwise it would be very inefficient).
In order to solve the problem you raised from its root, one will need to replace the current implementation of comparing to the erase value with the usage of a newly added "is sector erased" API. However, this would require an extremely large scale change. It would first need to be added to the HAL layer (where most implementations would still use the current way of comparing to the erase value). Then you will need to add this API to the block device's interface (BlockDevice.h) and to all relevant block devices.

@SeppoTakalo

This comment has been minimized.

Copy link
Contributor

commented Aug 28, 2019

@0xc0170 This is needed for 5.14.

No need for more review. Separate email discussion ongoing whether this functionality has ever worked correctly, and whether we should drop it. But for now on, this immediate fix is required.

@0xc0170 0xc0170 added needs: CI and removed needs: review labels Aug 28, 2019
@0xc0170

This comment has been minimized.

Copy link
Member

commented Aug 28, 2019

CI started

@mbed-ci

This comment has been minimized.

Copy link

commented Aug 28, 2019

Test run: FAILED

Summary: 1 of 4 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_build-IAR
@0xc0170

This comment has been minimized.

Copy link
Member

commented Aug 28, 2019

Build restarted, known build issue

@mbed-ci

This comment has been minimized.

Copy link

commented Aug 28, 2019

Test run: FAILED

Summary: 3 of 4 test jobs failed
Build number : 2
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_build-GCC_ARM
  • jenkins-ci/mbed-os-ci_build-IAR
  • jenkins-ci/mbed-os-ci_build-ARM
@0xc0170

This comment has been minimized.

Copy link
Member

commented Aug 28, 2019

The crypto example should be fixed, CI was restarted

@mbed-ci

This comment has been minimized.

Copy link

commented Aug 29, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 3
Build artifacts

@0xc0170 0xc0170 added ready for merge and removed needs: CI labels Aug 29, 2019
@0xc0170 0xc0170 merged commit c4a2e3f into ARMmbed:master Aug 29, 2019
25 checks passed
25 checks passed
continuous-integration/jenkins/pr-head This commit looks good
Details
jenkins-ci/build-ARM Success
Details
jenkins-ci/build-GCC_ARM Success
Details
jenkins-ci/build-IAR Success
Details
jenkins-ci/cloud-client-test Success
Details
jenkins-ci/dynamic-memory-usage RTOS ROM(+0 bytes) RAM(+0 bytes)
Details
jenkins-ci/exporter Success
Details
jenkins-ci/greentea-test Success
Details
jenkins-ci/mbed2-build-ARM Success
Details
jenkins-ci/mbed2-build-GCC_ARM Success
Details
jenkins-ci/mbed2-build-IAR Success
Details
jenkins-ci/unittests Success
Details
travis-ci/astyle Success!
Details
travis-ci/docs Success!
Details
travis-ci/doxy-spellcheck Success!
Details
travis-ci/events Success! Runtime is 8573 cycles.
Details
travis-ci/gitattributestest Success!
Details
travis-ci/include_check Success!
Details
travis-ci/licence_check Success!
Details
travis-ci/littlefs Success! Code size is 8464B.
Details
travis-ci/psa-autogen Success!
Details
travis-ci/tools-py2.7 Success!
Details
travis-ci/tools-py3.5 Success!
Details
travis-ci/tools-py3.6 Success!
Details
travis-ci/tools-py3.7 Success!
Details
@VeijoPesonen VeijoPesonen deleted the VeijoPesonen:tdbstore_ecc_fix branch Sep 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.