Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FlashIAP driver: Add retries to erase and program operations. #9548

Merged
merged 1 commit into from
Jan 30, 2019

Conversation

davidsaada
Copy link
Contributor

Description

Few boards may fail the write actions due to HW limitations (like critical drivers that
disable flash operations). Just retry a few times until success.
In addition, remove the redundant retries in NVStore (not needed now).

This fixes failures we saw in client tests, mainly with the UBLOX_EVK_ODIN_W2 board.
Tested with all flash related tests (FlashIAP, NVStore, KVStore) on K64F and UBLOX_EVK_ODIN_W2.

Pull request type

[x] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

@ciarmcom
Copy link
Member

@davidsaada, thank you for your changes.
@ARMmbed/mbed-os-storage @ARMmbed/mbed-os-core @ARMmbed/mbed-os-maintainers please review.

@ciarmcom ciarmcom requested review from a team January 30, 2019 12:00
if (ret != 0) {
ret = -1;
break;
// Few boards may fail the write actions due to HW limitations (like critical drivers that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the comment to "erase" instead of "write"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

// Few boards may fail the write actions due to HW limitations (like critical drivers that
// disable flash operations). Just retry a few times until success.
ret = -1;
for (unsigned int retry = 0; retry < num_write_retries && ret; retry++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be an exit trigger of retries loop in case of success (maybe if total_size == 0)?

Copy link
Contributor Author

@davidsaada davidsaada Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ret is the exit trigger (ret == 0 will exit on the first iteration).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I missed ret at the condition.
Could flash_program_page() be called several times for one "program" ?
If so, what happens if not the first one fails , will it retry from the beginning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. It will retry from the beginning. This will likely succeed in most of the flash components (programming the same value twice), if not all, but there's always a chance that this logic won't always work. Will try to modify the code accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the code to retry the physical operation (and not on the entire process). Please re-review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Liked the idea (-:

Few boards may fail the write actions due to HW limitations (like critical
drivers that disable flash operations). Just retry a few times until success.
In addition, remove the redundant retries in NVStore (not needed now).
Copy link
Contributor

@offirko offirko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JanneKiiskila
Copy link
Contributor

Client test results looking ok, please get this to 5.11.3 ASAP. @adbridge @0xc0170

Copy link

@deepikabhavnani deepikabhavnani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look fine to me.

For my understanding:
Issue was noticed earlier with NVStore and fix is now made generic for FlashIAP as I understand. But any reason for magic number 16 (I am happy that we are limiting to some number and not in while(1)) just curious.

Also description states failure is due to HW limitation like critical drivers that disable flash operations, do we have more information on this? Which driver disables flash operation in Mbed OS.

@cmonr
Copy link
Contributor

cmonr commented Jan 30, 2019

CI started

@davidsaada
Copy link
Contributor Author

Changes look fine to me.

For my understanding:
Issue was noticed earlier with NVStore and fix is now made generic for FlashIAP as I understand. But any reason for magic number 16 (I am happy that we are limiting to some number and not in while(1)) just curious.

Also description states failure is due to HW limitation like critical drivers that disable flash operations, do we have more information on this? Which driver disables flash operation in Mbed OS.

@deepikabhavnani We started noticing this issue in NVStore, but were afraid to make the change in the driver back then, which was a mistake apparently. We saw it happening with WiFi (which is also what the client test application operated in the ODIN board), but then learned that this problem could also happen in other protocols such as BLE.
As for the retries number - it just seemed like a reasonable one. In practice, we saw that 1-2 retires were enough, but couldn't tell whether other protocols would block the driver for longer times, so 16 seemed like a good number.

@0xc0170
Copy link
Contributor

0xc0170 commented Jan 30, 2019

Also description states failure is due to HW limitation like critical drivers that disable flash operations, do we have more information on this? Which driver disables flash operation in Mbed OS.

I would like to also understand this. So some stacks just disable these operations and led to failures in kvstore.

@davidsaada
Copy link
Contributor Author

Also description states failure is due to HW limitation like critical drivers that disable flash operations, do we have more information on this? Which driver disables flash operation in Mbed OS.

I would like to also understand this. So some stacks just disable these operations and led to failures in kvstore.

Was KVStore in this case (discovered by client tests), but the failures were lower - in the flash driver, hence the change.

@mbed-ci
Copy link

mbed-ci commented Jan 30, 2019

Test run: SUCCESS

Summary: 12 of 12 test jobs passed
Build number : 1
Build artifacts

@cmonr cmonr merged commit 87e57d3 into ARMmbed:master Jan 30, 2019
cmonr pushed a commit that referenced this pull request Jan 30, 2019
FlashIAP driver: Add retries to erase and program operations.
@cmonr
Copy link
Contributor

cmonr commented Jan 30, 2019

#9508 (comment)

@davidsaada davidsaada deleted the david_flashiap_retries branch February 22, 2019 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants