-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serialize lfs #397
Serialize lfs #397
Conversation
This is great - thanks Henry and all. I have seen a file corruption issue remaining with 372 (much more infrequent than before but still there). Wonderful to have this wrapped up. |
@mattratto -- You're right! This will be wonderful to have wrapped up. It'd be great to have some indication that this resolves a real-world reported instance, such as your remaining corruption problem. Would you be willing to reply confirming that you no longer see your remaining corruption after using this fix? (after you are fairly confident, of course) Let's hope this closes this one for good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and all of your hard work, and sorry for the late response. I have been busy with usb things in the last few days. To be honest, I don't like the wrap function, it make code difficult to read and maintain especially adding more API. We should just drop the VERIFY_LFS() and use normal way to return error instead.
@henrygab do you think we also need mutex for the |
I make PR to your branch to unwrap the VERIFY_LFS() here, since it it a bit too much work to ask you to change, please let's me know what you think. |
I still see the assert file with the stress test, we definitely need to wrap up the
|
fa2e67a
to
c054219
Compare
Yes, I think we need to also use the mutex when lfs APIs are used from within |
@hathach -- the I understand that you have indicated that you do not prefer the use of a wrapper function. I am happy to consider alternatives, but believe the safest change is to wrap each APIs that is called from another public API, so as to allow the serialization without major changes to the existing, tested, code flow. What do you think? |
@hathach To maximize working time, I've gone ahead and put forward change that continues to work for the stress test. It was not as simple as I had hoped. Three issues in particular arose:
Accordingly, I strongly request your careful code review and testing for commit 2a2f771. |
2a2f771
to
446c6ee
Compare
Rebase changes to be based from 0.15.0 ... |
I haven’t tried the rebase, but use of the previous version still results in periodic lockups - only on sequential removal of files. This is happening after the saving of approximately 100 separate 50byte files. When I loop through and remove them, sometimes the code halts. I haven’t yet run this with debug on, but once I do, I will post more info. I am wondering if I am not allowing enough of a delay between each remove, though from my reading of the new code this should not be an issue. I should add that this is occurring in a production environment which includes Bluetooth transactions. Matt |
Another note - this is occurring on brand new boards. I am not running a file system format before placing files on the flash, following initial install of software. Is it possible that reliability would be increased by adding an initial format prior to adding files? |
Sorry for late response, is busy migrating from travis to github action, and bunch o f unnamed work at the end of year. Will look at this next. |
@mattratto -- Question: Are your files in the root directory, or are they all in a subdirectory that only you use? Why: As a test, if you they were in root, and you moved all your files out into a subdirectory, would you still see a problem?
|
They were all in root. I’m moving to subdirectory and continuing to test. Is there anything fancy I should be doing with this? I’ve just appended /data/ to the file names for the write, and open /data/ for the reads and deletes. As always, my thanks Henry! |
First, I have very little understanding of what the SD does. This currently is limiting me to (hopefully educated) guesses on how this might occur. That said... No, nothing fancy. A subdirectory is all that's needed to get the potential reduction in interactions. @hathach -- do you know when, why, and how the SD is saving bluetooth pairing data to flash? |
So I’ve put my data files in a directory and now, instead of deleting them individually, I delete and recreate the directory. Either the use of the directory, or this new deletion process seems to have resolved remaining issues. I am of course continuing to test, but looking good so far! |
@mattratto That data supports my current suspicion that the softdevice is messing with (writing directly to) sectors that Adafruit_LittleFS is using. (( I'm curious if there is any repro without sd enabled, but I can no longer repro the issue with this branch )) Can you get a repro if you go back to the prior behavior (saving in root directory), and not enable the softdevice / bluetooth? Obviously, this presumes you have multiple tasks that each use the filesystem. If you can repro even without the softdevice enabled, then something else is awry. |
@mattratto -- Matt, I want to be clear that, if the SD is the cause of corruption by writing to the LFS file system without being serialized, using a subdirectory will only make the problem much more rare, and not eliminate it entirely. This is because there are still globally shared data structures on the media, so if/when certain conditions get hit, it would be more likely to cause conflict. I think your prior method of deleting many files in sequence may have been a good test case (but also good to know the recursive directory deletion can also further reduce the likelihood of conflict with SD). |
Sorry for another delay, was busy with other works lately.
To be clear SD doesn't write to LFS, only the Bluefruit library save information to LFS when
CCCD changes occurs more often then pairing with new device, if there is conflict, it is probably it. |
char const* name (void); | ||
|
||
bool isDirectory (void); | ||
File openNextFile (uint8_t mode = FILE_O_READ); | ||
void rewindDirectory (void); | ||
|
||
// Has to be public, in order to grant 'friend' permissions to Adafruit_LittleFS private member _mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@henrygab as long as it works, but I think we should wrap up the LFS lock/unlock() as _lock()/_unlock()
with the comment to tell use not to call it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
Because this PR fixes data corruption, I think it is important to accept even if you want to change some function names later.
Separately, if you want the design to better hide this, I would recommend following the PIMPL (Pointer-to-Implementation) solution. This would allow 100% hiding of all the non-public APIs, and the interactions between the FS / File abstractions, without exposing any of this internal shared locking. However, changing to PIMPL would be a bigger architectural change … my goal was to minimize changes while serializing access to LFS APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agrees, let's merge this first. We can make further modification and naming later on if needed.
No, I don't need any fancy way to absolute hide API from user. I would simply add underscore to _lock/_unlock
public API with comment saying it is intended to be used internally and expect user not to call it :D
https://github.com/adafruit/Adafruit_nRF52_Arduino/blob/master/libraries/Bluefruit52Lib/src/bluefruit.h#L239
Per @hathach's PR, this removes the wrapper functions.
This solution appears to work. However, it does not follow all best practices. For example, the use of friend declarations, and the need to make the lock/unlock functions in File public (in order to get friend status). However, it passes the stress test without errors.
wow, you are too quick, I made a modification and submit it as PR (as usual) to your branch, but look like github has difficulty to pick up the differences. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for super quick fix for comments, I am kind of get lost when differences presented by github now. I agreed that this fixes the stress_test and could be merged. I can then manually compare with my modification and submit as separated PR for further discussion.
char const* name (void); | ||
|
||
bool isDirectory (void); | ||
File openNextFile (uint8_t mode = FILE_O_READ); | ||
void rewindDirectory (void); | ||
|
||
// Has to be public, in order to grant 'friend' permissions to Adafruit_LittleFS private member _mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agrees, let's merge this first. We can make further modification and naming later on if needed.
No, I don't need any fancy way to absolute hide API from user. I would simply add underscore to _lock/_unlock
public API with comment saying it is intended to be used internally and expect user not to call it :D
https://github.com/adafruit/Adafruit_nRF52_Arduino/blob/master/libraries/Bluefruit52Lib/src/bluefruit.h#L239
Thank you very much @henrygab for spending lots of time fixing this issue. This has been draining a lot more time than expected ( and I am got pulled to work on other things meanwhile). I am very much appreciated for your time and effort. Looking forward to see more of your PRs in the future :) |
@hathach -- I am humbled by the many volunteers, without whose help (reporting, testing the many incomplete fixes, and final validation) this would not have been possible, including:
A special thanks also to @geky for helping me understand LFS! |
Amazing! Great to watch this work being done! |
Indeed, thank you everyone to help fixing this annoying bugs and be PATIENT with me :D |
HI all, and specifically Henry, after months of no corruption issues, I have now seen 5 issues, exhibited with both Adafruit BSP 0.16.0 and with Adafruit BSP 0.20.5 (most recent release.) In each case, the error is the same LFS 494 Corrupted Directory blocks. It is possible that these devices had an older version of the BSP installed previously and thus the corruption may have occurred while using an older version of the Adafruit littlefs libraries. It is resolved by a format, and has not reoccurred in any of the above examples. However, I am wondering how I might insulate my code against corruption issues. My main problem is that when my code runs into a corruption issue, it locks up. Does anyone have any suggestions how I could prevent this? I would like to show an error message and potentially kick off a format when the code encounters file corruption, but I have no idea how to do this since the lock up occurs while executing a file open. thanks, Matt |
In the interests of full disclosure, below is the code that is locking up when it encounters an error. If this is really a code rather than a library issue, I do apologize: InternalFS.begin(); //start file system
File item(InternalFS);
File dir("data/", FILE_O_READ, InternalFS);
while( (item = dir.openNextFile(FILE_O_READ)) )
{
String filename = item.name();
if(filename.indexOf("data") == 0) {
uint32_t readlen;
char buffer[999] = { 0 };
readlen = item.read(buffer, sizeof(buffer));
item.close();
for (int e=0; e<readlen; e++) { //now parse and separate
Serial.print(buffer[e]);
}
Serial.println("---");
}
}
dir.close();
} |
after further thought, am thinking this is a code issue rather than a library issue per se. For anyone who wanders by, I realized that the 'Internal_ListFiles' example does not lock up, but my code above does. The only difference is that the example code only grabs the file names, but I get the names but also try to read the file contents. So I am wondering if there is a way to check the file integrity prior to attempting to read - maybe in fact, the item.name read is returning an error code? |
@mattratto the file system should done the checking for us, once the FS is corrupted, it is too hard to check that, since you need to understand how the files is stored on the flash and understand how these inode chain work etc... Are you still seeing corruption issue with latest library ? |
@hathach I am getting file corruption error messages when running code with latest library. However, I do not know if they were created by the library, or occurred when running an older library, prior to update to 0.20.5 BSP. These show up when I run Internal_ListFiles example code and seem to be happening around the use of openNextFile. I went and looked at this function in Adafruit_LittleFS_file.cpp and found on line 393: (void)ret._open(filepath, mode); // return value is ignored ... caller is expected to check isOpened() I couldn't find isOpened, but am now trying isOpen() on each item prior to proceeding: while( (item = dir.openNextFile(FILE_O_READ)) ) I will test this and report back. |
@mattratto I see your problem now, since this PR is merged and closed. You should really re-test the library with newly format drive, if you still have issue with latest code then. Feel free to open an new issue, that make following it easier ( your could point a ref link to this if needed). |
OK. Will do. Thanks @hahatch
Matt Ratto
Associate Professor
Bell University Labs Chair in Human-Computer Interaction
Faculty of Information
University of Toronto
@mattratto
From: Ha Thach<mailto:notifications@github.com>
Sent: Sunday, August 30, 2020 5:33 AM
To: adafruit/Adafruit_nRF52_Arduino<mailto:Adafruit_nRF52_Arduino@noreply.github.com>
Cc: Matt Ratto<mailto:matt.ratto@utoronto.ca>; Mention<mailto:mention@noreply.github.com>
Subject: Re: [adafruit/Adafruit_nRF52_Arduino] Serialize lfs (#397)
@mattratto<https://github.com/mattratto> I see your problem now, since this PR is merged and closed. You should really re-test the library with newly format drive, if you still have issue with latest code then. Feel free to open an new issue, that make following it easier ( your could point a ref link to this if needed).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#397 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAFCK6EJM72ZQQTLPSDHIJLSDIMGRANCNFSM4J53UWTQ>.
|
Fix #350
Fix #325
Fix #227
Fix #222
Fix #393
@hathach -- This PR supercedes #372 (Guard against race conditions in Flash FS cache), as #72 was not really providing a proper fix of the underlying problem:
LFS itself is not safe to call simultaneously from multiple contexts.
This PR solves this by wrapping the InternalFS API with a non-recursive mutex, as recommended.
I realize the PR uses newly declared wrapper functions. I tried to avoid this, but the changes got messy quite quickly, mostly due to some of the macros used to verify return codes exiting the function (which would leak the mutex). This is a reasonably clean solution, without the more drastic move to use of RAII objects such as
std::lock_guard<>
.