Silent file corruption on NRF devices #3623
The nice_nano does not have an external QSPI flash chip, and uses the nRF52840's internal flash for a filesystem. On the Particle Xenon, CIRCUITPY is on the external flash chip. Do you see any difference between boards that use the internal flash and boards that use external flash when you see this problem?
I remembered incorrectly. Sorry for that mistake, the Xenon works perfectly.
OK, so this is probably internal filesystem only. Thanks! That's very helpful. We have seen this in the past, and I have a vague memory of trying to fix it, but clearly it needs more investigation.
When you think you have a fix, let me know the branch and I'll give it a shot on my devices as well and report back. I don't have nearly a deep enough understanding of CircuitPython to contribute further.
@dhalbert isn't this a bug? I'd expect the bug label and either the 6.x.x milestone if we're going to debug soon or
It is a bug that I marked it with the wrong milestone :)
Wanted to follow this up with tests on Mac and Windows. Same results. Writing the file reported success, and I safely ejected the disk. The corruption showed up when running code, as there were errors in the code that was read back on the nice_nano itself. After rebooting the microcontroller (hard reset), the corruption could be seen in that file on the computer. Edit: Looks like I may have spoken too soon. It may be writing correctly, but reporting to the OS that it's writing faster than it is. I left the board for 5 minutes after the copy, and eventually it seemed to settle if I didn't interrupt it. Running
Following up that this is still an issue.
Cleanly formatting storage with storage.erase_filesystem seems to help for a while. Eventually I have to run that, or all files are corrupt and the device disconnects me from the serial connection (REPL) any time I copy a file.
I've managed to come up with a way to consistently get corruption. I've frozen in as many modules as I could to fill up the primary storage. My custom CircuitPython image is at 1.1MB. After adding only a few small files onto the CIRCUITPY drive, I see corruption very often. Hopefully this will help you reproduce this more quickly to see what is causing the issue. I can also send over the UF2 for the nice_nano that I used to trigger this, if it is at all helpful.
Do you see this even if you don't run any code, or do you have to run code and eventually it happens? Is the code you're running using
There is only 1MB of flash on the nRF52840! So I'm not sure why it's not catching that your image is too big. Even before this experiment of adding additional frozen modules, did you have some frozen modules? (EDIT) I think you may be referring to the .uf2 size, which is considerably larger than the actual size of the firmware in flash. (There is empty space in the .uf2 file, and overhead).
I am on the trail of a possible issue.
The code does run Bluetooth pairing, but there is no microcontroller.nvm usage.
I was referring to the UF2 size, my apologies. This issue was opened using only stock CircuitPython, downloaded from the official site.
Wanted to report back that I now have an unbootable microcontroller. Even after flashing fresh (official) CircuitPython to the device, it does not seem to get to a REPL. I assume that this may be related to this bug, as the device has only been used with CircuitPython.
I have reproduced bad filesystem writes by writing large files (~180k) on a PCA10059 (uses internal filesystem). They sometimes take a very long time to write. I am investigating a couple of hypotheses about what's wrong. @kdb424 You could try erasing everything and re-flashing the bootloader. See the instructions in https://github.com/adafruit/Adafruit_nRF52_Bootloader/blob/master/README.md
I re-tested this with 7.0.0-beta.0, and I can no longer reproduce the problem. I tried copying a 200k file several times. We have fixed a number of things about background tasks, USB, and time-keeping since this report, though I'm not sure which fix may have finally fixed this problem, if indeed it is fixed. I will close this for now, but please reopen if you see it again with 7.0.0-beta.0 or later. Thanks for your report.
Large disk writes seem to corrupt the filesystem on the disk. The write reports as working properly to the host OS, even after a reboot of the controller. This has been tested on several devices, but is most problematic on the nice_nano, which has almost identical specs to the itsybitsyNRF. Tested using several CircuitPython versions, including 6 rc.0.
Steps to reproduce
https://github.com/KMKfw/kmk_firmware
sync on Linux
This issue is much more likely to occur while trying to replace files on disk, though even the initial copy seems to fail, and half of the time the board crashes to safe mode if a disk sync is forced. rsync reports that the files are correct, but when the code is run, random files have errors that make no sense and are corrupted with garbage data. Copying the same files over them works eventually, and all of the code then runs properly with zero changes.
Tested on Linux, as well as a Windows 10 VM on VirtualBox.
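For anyone trying to confirm the corruption from the host side, a minimal sketch of a checksum comparison follows. It is not part of the original report: the function names `sha256_of` and `find_mismatches` are hypothetical, and it assumes the source tree and the mounted CIRCUITPY drive are both reachable as ordinary directories. Because the host can serve reads from its own cache, a clean result right after copying may not reflect what is actually in flash, so it is probably most useful after a hard reset and remount of the board.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_root, drive_root):
    """Compare every file under source_root with its copy under drive_root.

    Returns a sorted list of (relative_path, reason) tuples, where reason
    is "missing" or "differs". An empty list means every file matched.
    """
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            source_path = os.path.join(dirpath, name)
            relative = os.path.relpath(source_path, source_root)
            drive_path = os.path.join(drive_root, relative)
            if not os.path.isfile(drive_path):
                mismatches.append((relative, "missing"))
            elif sha256_of(source_path) != sha256_of(drive_path):
                mismatches.append((relative, "differs"))
    return sorted(mismatches)
```

Calling `find_mismatches("kmk_firmware", "/media/user/CIRCUITPY")` (both paths are placeholders) lists any files whose contents on the drive differ from the originals.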