Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve QSPIFBlockDevice conformance to SFDP #11531

Merged
merged 26 commits into from Nov 13, 2019

Conversation

@kyle-cypress
Copy link

kyle-cypress commented Sep 19, 2019

Description

Proposed fix for #11530 .

In several cases QSPIFBlockDevice does not conform to the SFDP standard JESD216. These include:

  • Status register reading and writing
  • Soft resetting
  • Enabling of 4-byte addressing
  • Enabling of quad output mode
  • Use of the legacy erase instruction

In these cases QSPIFBlockDevice often assumes that certain instructions or methods are universal across flash devices when they are not. This can cause issues in devices that do not match these assumptions. This pull request resolves many of these issues and better aligns QSPIFBlockDevice with JESD216.

This PR is dependent on #11604

Pull request type

[x] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

Release Notes

@kyle-cypress

This comment has been minimized.

Copy link
Author

kyle-cypress commented Sep 19, 2019

This is a WIP submission to gather feedback while we finalize our internal review and testing.

@ciarmcom ciarmcom requested review from ARMmbed/mbed-os-maintainers Sep 19, 2019
@ciarmcom

This comment has been minimized.

@0xc0170 0xc0170 removed the request for review from ARMmbed/mbed-os-test Sep 20, 2019
@@ -380,7 +380,7 @@ int QSPIFBlockDevice::erase(bd_addr_t addr, bd_size_t in_size)
int type = 0;
uint32_t offset = 0;
uint32_t chunk = 4096;
unsigned int cur_erase_inst = _erase_instruction;
qspi_inst_t cur_erase_inst = _erase_instruction;

This comment has been minimized.

Copy link
@0xc0170

0xc0170 Sep 20, 2019

Member

What is the reason for this change? I dont see anything in the commit msg to explain.

This comment has been minimized.

Copy link
@kyle-cypress

kyle-cypress Sep 23, 2019

Author

Consistency - the current code uses a mixture of signed and unsigned integers to represent instruction values
Clarity - convey the meaning better of the data than a generic int (and encourage consistency)
Efficiency: The instruction is only 8 bits wide, so size the data type to fit.

@0xc0170 0xc0170 removed request for ARMmbed/mbed-os-core Sep 26, 2019
@0xc0170

This comment has been minimized.

Copy link
Member

0xc0170 commented Sep 26, 2019

@ARMmbed/mbed-os-storage Can you please review this proposal?

@@ -160,7 +166,7 @@ class QSPI : private NonCopyable<QSPI> {
* @returns
* Returns QSPI_STATUS_SUCCESS on successful reads and QSPI_STATUS_ERROR on failed reads.
*/
qspi_status_t read(int instruction, int alt, int address, char *rx_buffer, size_t *rx_length);
qspi_status_t read(qspi_inst_t instruction, int alt, int address, char *rx_buffer, size_t *rx_length);

This comment has been minimized.

Copy link
@0xc0170

0xc0170 Sep 26, 2019

Member

int should be sufficient for inst as it can be any number, narrow down this won't make much difference here.

Same as alt, address and others.

This is breaking change, so needs to be justified to go in.

This comment has been minimized.

Copy link
@0xc0170

0xc0170 Sep 26, 2019

Member

The usage of this in the block device was wrong, should follow what the driver specifies.

This comment has been minimized.

Copy link
@kyle-cypress

kyle-cypress Sep 26, 2019

Author

The usage of this in the block device was wrong, should follow what the driver specifies.

Can you clarify what you mean? Are you saying that the use of unsigned int instead of int in the block device was wrong?

This comment has been minimized.

Copy link
@kyle-cypress

kyle-cypress Sep 26, 2019

Author

int should be sufficient for inst as it can be any number, narrow down this won't make much difference here.

Same as alt, address and others.

This is breaking change, so needs to be justified to go in.

This change is not mandatory for the fix here so I can pull it out of this review and into a separate PR if it makes things easier. But in principle, shouldn't values which correspond to bit patterns of a particular width be represented as types of particular widths? If so, then I would argue that a separate PR should also replace alt and address with more specific types.

This comment has been minimized.

Copy link
@SeppoTakalo

SeppoTakalo Sep 27, 2019

Contributor

I'm not that familiar with SQPIF, but if commands are limited to 8 bytes, then I would say that this is acceptable, and it is not a breaking change.

Mbed OS have not guaranteed to have binary compatibility, so moving from int to uint8_t should be OK.

@0xc0170

This comment has been minimized.

Copy link
Member

0xc0170 commented Sep 26, 2019

This again changes a driver and the block device - as these are two different areas, it would be good to split them, driver first - if accepted, block device can be updated. Otherwise this takes a longer time to review and integrate (they are dependent on each other).

Copy link
Contributor

SeppoTakalo left a comment

There are lots of changes not related to QSPIF, can you split your changes to separate PRs so that we can properly review each change on its own.

Currently I'm not able to understand why this PR touches things like LittleFS, KVStore and DeviceKey.

@adbridge adbridge added needs: work and removed needs: review labels Sep 27, 2019
@kyle-cypress

This comment has been minimized.

Copy link
Author

kyle-cypress commented Sep 27, 2019

Yes, I will split out the related refactoring and cleanups into their own PRs that this one can then depend on.

@kyle-cypress

This comment has been minimized.

Copy link
Author

kyle-cypress commented Oct 1, 2019

Separated out #11604 for the qspi_inst_t change (will rebase this PR to remove that commit soon)

@kyle-cypress

This comment has been minimized.

Copy link
Author

kyle-cypress commented Oct 3, 2019

I've made the following updates:

  • Rebased onto #11604 . This can be rebased onto master instead if necessary, but when I tried it was not applying cleanly due the changes in types, so I'd like to hold off on doing so unless it is looking like #11604 is not going to go in before this one.
  • The lfs changes were extracted into #11614 (which may be superceded by #11624). One of the two is required for features-storage-tests-kvstore-filesystemstore_tests to pass, but there is not a strict ordering dependency.
  • Removed the devicekey/tdb changes. A modified variant of one portion of the devicekey tests was extracted as #11628 . The broader refactoring will be submitted as an additional PR.
  • I left the changes to enable more tests on PSoC6 in this PR.

_mutex.lock();

// All commands other than Read and RSFDP use default 1-1-1 bus mode (Program/Erase are constrained by flash memory performance more than bus performance)
_qspi.configure_format(QSPI_CFG_BUS_SINGLE, QSPI_CFG_BUS_SINGLE, _address_size, QSPI_CFG_BUS_SINGLE,

This comment has been minimized.

Copy link
@SeppoTakalo

SeppoTakalo Oct 4, 2019

Contributor

Do you need to check return value from configure_format()?

This comment has been minimized.

Copy link
@kyle-cypress

kyle-cypress Oct 4, 2019

Author

Yes, the return value needs to be checked for configure_format across the board, but that is handled in #11603 (applying the check to additional/relocated configure_format calls would be handled during the rebase of whichever PR goes in second).

@maclobdell

This comment has been minimized.

Copy link
Contributor

maclobdell commented Oct 8, 2019

@ARMmbed/mbed-os-maintainers - Can this be merged. It seems the issues have been addressed. Please confirm.

@adbridge adbridge added needs: CI and removed needs: work labels Oct 11, 2019
@adbridge

This comment has been minimized.

Copy link
Contributor

adbridge commented Oct 11, 2019

CI started

Kyle Kearney added 6 commits Oct 21, 2019
QSPIFBlockDevice::_clear_block_protection() has logic to retain the
WIP and WEL bits in status register 1, but it failed to account for
the situation where the QE bit is also in status register 1.
In _sfdp_set_quad_enabled, note the status register and bit therein
for the quad enable, so that _clear_block_protection can retain it.
Introduce a separate function for handling alterations to device interaction
which are not covered by the SFDP tables and therefore require checking against
the vendor id.
Default to 2 status registers, but update this value if necessary
 during vendor quirk handling for parts (currently only Macronix)
 which have one status register and two control registers. For the
 purposes of QSPIFBlockDevice, these are all considered status
 (or at least "status-like") registers because they are all written
 via the Write Status Register instruction.
Set the custom RDCR instruction for Macronix during quirk handling.
Update reading and writing of status registers to handle a variable
 number of status registers.
Use a vendor id check to only perform this enable on devices which define the
 second configuration register where the fast mode enable bit lives.
Change _enable_fast_mode to use the standard status register reading and writing functions
4-byte addressing has been seen to cause failures on NORDIC
boards and with Macronix memories. Suppress the attempt to enable it
on that hardware (via vendor quirks and a target check) until either
the failure cause can be fixed or a more robust suppression mechanism
is implemented.
The addition of trace logging during greentea tests pushes the multithreaded
read-write test beyond the limits of the stack it allocates for its threads.
The increase of 128 bytes was chosen by experimentation.
@kyle-cypress kyle-cypress force-pushed the kyle-cypress:pr/qspi-sfdp branch from 073285a to 0103e3a Nov 13, 2019
@kyle-cypress

This comment has been minimized.

Copy link
Author

kyle-cypress commented Nov 13, 2019

1.) Rebased on latest master in hopes that, if I understood the comment correctly, this will pull in a fix for the issue @jeromecoutant was encountering.
2.) Slightly increased the thread stack size for the block device multithreaded test to fix the stack overflow.

This will probably still fail on NRF52840_DK with a timeout. This is because the extensive debug-level tracing in QSPIFBlockDevice, which is now enabled during the CI, causes the general block device test to slow down substantially while it waits for the UART (in my testing it increased from ~2 minutes to 10-15). The kvstore static test looks like it has a similar issue. @maclobdell plans to seek agreement on the proper way to resolve this early tomorrow morning.

@jeromecoutant

This comment has been minimized.

Copy link
Contributor

jeromecoutant commented Nov 13, 2019

debug-level tracing (...) is now enabled during the CI

@0xc0170 Really? trace should not be enabled by default....

@0xc0170

This comment has been minimized.

Copy link
Member

0xc0170 commented Nov 13, 2019

@jeromecoutant This was to help debugging the previous issues, before we start a new CI, will disable that.

@0xc0170

This comment has been minimized.

Copy link
Member

0xc0170 commented Nov 13, 2019

CI restarted (tracing disabled)

@mbed-ci

This comment has been minimized.

Copy link

mbed-ci commented Nov 13, 2019

Test run: FAILED

Summary: 1 of 11 test jobs failed
Build number : 8
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_greentea-test
@0xc0170

This comment has been minimized.

Copy link
Member

0xc0170 commented Nov 13, 2019

@kyle-cypress One failure for k64f in the logs, please review

mbedgt: :463::FAIL: Expected 0 Was 263 [1573647330.85][CONN][RXD] :463::FAIL: Expected 0 Was 263

Looks like deinit test case failing only, @ARMmbed/mbed-os-storage would you be able to reproduce this one for k64f?

@adbridge

This comment has been minimized.

Copy link
Contributor

adbridge commented Nov 13, 2019

@kyle-cypress Looks like the failure is K64F on gcc_arm:

[1573647393.38][CONN][INF] found KV pair in stream: {{__testcase_finish;SEC_kvstore_deinit;1;0}}, queued...
[1573647393.48][CONN][RXD] >>> Test cases: 110 passed, 1 failed with reason 'Test Cases Failed'
[1573647393.48][CONN][RXD] >>> TESTS FAILED!
@maclobdell

This comment has been minimized.

Copy link
Contributor

maclobdell commented Nov 13, 2019

@adbridge, @0xc0170 - K64F does not have QSPIF. I strongly believe these failures are unrelated to the changes in this PR and are a result of unstable kvstore tests.

@maclobdell

This comment has been minimized.

Copy link
Contributor

maclobdell commented Nov 13, 2019

after talking to the maintainers, I think the K64F test fail could be real. I will test it manually. Since the kvstore tests were touched and this PR is the only one that K64F kvstore tests are failing, it needs t obe investigated. Perhaps another increase of stack size is necessary. Will try to resolve by the end of today. Any help and support anyone can provide is much appreciated.

@kyle-cypress

This comment has been minimized.

Copy link
Author

kyle-cypress commented Nov 13, 2019

Thanks @maclobdell . Please let me know if you need any help debugging. For reference, the commit that changed that test is 92829bd (a1c7403 also touches that file, but just to add PSOC6 to the set of targets on which it runs).

@kyle-cypress

This comment has been minimized.

Copy link
Author

kyle-cypress commented Nov 13, 2019

The kvstore failure on K64F definitely seems to be intermittent though. In the second run of the results referenced by #11531 (comment), the kvstore tests pass on K64F and the only failure was networking related.

@adbridge

This comment has been minimized.

Copy link
Contributor

adbridge commented Nov 13, 2019

@0xc0170 CI now seems to have passed on this ?

@0xc0170

This comment has been minimized.

Copy link
Member

0xc0170 commented Nov 13, 2019

yes, restarted as above made a point about the last run (this confirms it, 2x subsequent run are all OK). Lets get this in and check nightly in the morning

@0xc0170 0xc0170 added ready for merge and removed needs: CI labels Nov 13, 2019
@0xc0170 0xc0170 merged commit 539779f into ARMmbed:master Nov 13, 2019
26 checks passed
26 checks passed
continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-ci/build-ARM Success
Details
jenkins-ci/build-GCC_ARM Success
Details
jenkins-ci/build-IAR Success
Details
jenkins-ci/cloud-client-test Success
Details
jenkins-ci/dynamic-memory-usage RTOS ROM(+0 bytes) RAM(+0 bytes)
Details
jenkins-ci/exporter Success
Details
jenkins-ci/greentea-test Success
Details
jenkins-ci/mbed2-build-ARM Success
Details
jenkins-ci/mbed2-build-GCC_ARM Success
Details
jenkins-ci/mbed2-build-IAR Success
Details
jenkins-ci/unittests Success
Details
travis-ci/astyle Success!
Details
travis-ci/docs Success!
Details
travis-ci/doxy-spellcheck Success!
Details
travis-ci/events Success! Runtime is 8725 cycles.
Details
travis-ci/gitattributestest Success!
Details
travis-ci/include_check Success!
Details
travis-ci/licence_check Success!
Details
travis-ci/littlefs Success! Code size is 8420B.
Details
travis-ci/psa-autogen Success!
Details
travis-ci/tools-py2.7 Success!
Details
travis-ci/tools-py3.5 Success!
Details
travis-ci/tools-py3.6 Success!
Details
travis-ci/tools-py3.7 Success!
Details
@maclobdell

This comment has been minimized.

Copy link
Contributor

maclobdell commented Nov 13, 2019

I have been running kvstore tests on 3 different FRDM-K64F on the master branch, before and after this PR was merged. I see different intermittent failures across different boards, but I see the same failures before and after this PR.

mostly I see these tests failing, but it doesn't seem to be related at all to this PR, unless the flash is getting left in a weird state and causing the test to fail subsequent runs.

features-storage-tests-kvstore-filesystemstore_tests | Testing Edge Cases
features-storage-tests-kvstore-general_tests_phase_1 | TDB_kvstore_init

@0xc0170

This comment has been minimized.

Copy link
Member

0xc0170 commented Nov 14, 2019

Thanks @maclobdell for the report. I've just noticed failures in some other PRs and nightly as well, going to create a new issue to track this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.