Reconsider use of fdatasync() #22
Comments
Hi,

My reading of this is that with a slow disk you would potentially get cache thrashing. Maybe we should provide an option/env var to make the […]

Julius
Addendum: Let me quote Tobi Oetiker on this: […]
Maybe we could add an option to postpone […]
It’s a nice idea, but I wonder if it helps in this use case (lots of very small files). Do you propose to leave the fds open? In that case we will easily overflow the fd limit somewhere if we just leave the fds open when the caller does a close().
If the user calls […] I haven't thought this through yet, but the general idea remains. I'm speaking my mind here:
Ad 1., just considering the possibility: we have no clue about what happens to a file after we close the fd. We might be right in most cases by remembering the file name and then acting on a re-opened version of it. But how do you find the name of the file that was opened? Consider multiple hard links to the same inode, etc. I think this adds a lot of potential for errors or inconsistent behavior, while not really gaining much.
I know there are issues about that in general, but in this specific case (rsync and possibly tar and others), it makes sense.
There are two ways: […]
Regarding hard links, it doesn't matter which hard link you use to reference the corresponding cached pages; they're physically the same. Am I missing something here? @hhoffstaette: that said, isn't there a dontneed feature for rsync that's already available?
FWIW: We could close it upon user request, but just before that use […]

@hhoffstaette: Any thoughts on the ideas discussed here…?
@Feh Sorry for the silence. Yes, lots of ideas, and I even started to hack on it a bit already. :)
@noushi Yes, I had Tobi Oetiker's original --drop-cache patch in rsync, but it had the same behaviour (fsync for every file) and was more or less unmaintained/unhackable. Also the rsync maintainers continue to refuse to merge it - probably a good thing in hindsight, considering the bad performance implications. nocache with its library/wrapper-script approach is more versatile.

Of course. Patches and pull requests welcome! :-)
I'd like to use nocache on the server side for rsync backups. There rsync will only read files, so it'd be best to use POSIX_FADV_NOREUSE, without any syncs of course. Therefore I think an option to choose the argument to posix_fadvise() would be worth it.
@hhoffstaette Any new developments on your part? What would you think about a simple option that’ll just disable calling fdatasync()?
Julius Plenz (notifications@github.com) wrote on 14 July 2014 12:48:
And the change to POSIX_FADV_NOREUSE instead of POSIX_FADV_DONTNEED.
AFAICS, this would make the semantics correct, and that’s what […]

As you can see in mm/fadvise.c, […]
While looking at the (unrelated) current issue with glibc 2.28, I just realized I never commented on […]. Long story short: I never ended up with the planned "complicated" solution of batching small-file writes together and doing syncfs() for a whole target fs, with a delayed fadvise() for the whole batch. Instead I just applied a patch that simply skips fsync() for small files (< 8 MB), and that fixed 99% of my performance problem while at the same time cleaning the page cache "well enough" after copying larger files. If anybody cares, the commit in question is here. Maybe this can be an inspiration for an official flag to reduce fsync() overhead. Given all my other commitments I don't see myself contributing to this any time soon.
Nocache is a great way to prevent buffer cache pollution e.g. by rsync.
Unfortunately I just found out the hard way that it also totally destroys rsync performance with small-file workloads due to calling fdatasync() (via sync_if_writable() in free_unclaimed_pages()). Since this happens for every little file (incl. temp files!) on close(), performance becomes completely unacceptable. Not only is it dog slow, it is also not helpful for drive longevity.
I have for now preloaded libeatmydata in addition to nocache, which negates this effect and restores performance; however this should not be necessary. fadvise() works just fine without prior fsync() - buffer pages are flushed out in nice batches, and still marked for reuse.
Unless there is a technical reason that I'm unaware of, please consider removing the use of fdatasync().