fs/cache: fix parent not getting pinned when remote is a file & hasher: look for cached hash if passed hash unexpectedly blank #7668
base: master
Conversation
Before this change, when cache.GetFn was called on a file rather than a directory, two cache entries would be added (the file + its parent), but only one of them would get pinned if the caller then called Pin(f). This left the other one exposed to expiration if ci.FsCacheExpireDuration was reached. This was problematic because both entries point to the same Fs, and if one entry expires while the other is pinned, the Shutdown method gets erroneously called on an Fs that is still in use.

An example of the problem showed up in the Hasher backend, which uses the Shutdown method to stop the bolt db used to store hashes. If a command was run on a Hasher file (ex. `rclone md5sum --download hasher:somelargefile.zip`) and hashing the file took longer than the --fs-cache-expire-duration (5m by default), the bolt db was stopped before the hashing operation completed, resulting in an error.

This change fixes the issue by ensuring that the Pin/Unpin functions also Pin/Unpin any aliases in lockstep, so that pinning a file also pins the cache entry for its parent.
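The lockstep Pin/Unpin idea can be illustrated with a minimal sketch (not rclone's actual cache code; `entry`, `aliases`, and `pins` are illustrative names): pinning one entry also pins every entry registered as its alias, so neither can expire while the other is in use.

```go
package main

import "fmt"

// entry is a hypothetical cache entry whose aliases point at other keys
// that share the same underlying Fs (e.g. a file and its parent dir).
type entry struct {
	pins    int
	aliases []string
}

type cache struct{ m map[string]*entry }

// addPin applies delta to key and, recursively, to all of its aliases.
// The seen map guards against cycles (file <-> parent alias each other).
func (c *cache) addPin(key string, delta int, seen map[string]bool) {
	if seen[key] {
		return
	}
	seen[key] = true
	e, ok := c.m[key]
	if !ok {
		return
	}
	e.pins += delta
	for _, a := range e.aliases {
		c.addPin(a, delta, seen)
	}
}

// Pin pins key and all of its aliases in lockstep.
func (c *cache) Pin(key string) { c.addPin(key, +1, map[string]bool{}) }

// Unpin unpins key and all of its aliases in lockstep.
func (c *cache) Unpin(key string) { c.addPin(key, -1, map[string]bool{}) }

func main() {
	c := &cache{m: map[string]*entry{
		"hasher:dir":          {aliases: []string{"hasher:dir/file.zip"}},
		"hasher:dir/file.zip": {aliases: []string{"hasher:dir"}},
	}}
	c.Pin("hasher:dir/file.zip") // pinning the file also pins the parent
	fmt.Println(c.m["hasher:dir"].pins, c.m["hasher:dir/file.zip"].pins)
}
```

With this shape, the expiry pass can keep skipping any entry whose pin count is non-zero, and the parent can no longer expire out from under a pinned file.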
Before this change, Hasher would sometimes try to stop a bolt db that was already stopped, resulting in an error. This change fixes the issue by checking first whether the db is already stopped. https://forum.rclone.org/t/hasher-with-gdrive-backend-does-not-return-sha1-sha256-for-old-files/44680/11?u=nielash
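A guard of the kind described can be sketched with Go's sync.Once (illustrative names, not Hasher's actual code): the close logic runs exactly once, so a second stop is a harmless no-op instead of an error.

```go
package main

import (
	"fmt"
	"sync"
)

// db stands in for Hasher's bolt-db wrapper.
type db struct {
	stopOnce sync.Once
	stopped  bool
}

// stopDB is safe to call more than once: sync.Once runs the body
// exactly once, so an already-stopped db is simply left alone.
func (d *db) stopDB() error {
	d.stopOnce.Do(func() {
		d.stopped = true // real code would close the bolt db here
		fmt.Println("db stopped")
	})
	return nil
}

func main() {
	d := &db{}
	d.stopDB() // stops the db
	d.stopDB() // no-op: already stopped, no error returned
}
```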
Before this change, Hasher did not check whether a "passed hash" (hashtype natively supported by the wrapped backend) returned from a backend was blank, and would sometimes return a blank hash to the caller even when a non-blank hash was already stored in the db. This caused issues with, for example, Google Drive, which has SHA1 / SHA256 hashes for some files but not others (https://rclone.org/drive/#sha1-or-sha256-hashes-may-be-missing) and sometimes also does not have hashes for very recently modified files. After this change, Hasher will check if the received "passed hash" is unexpectedly blank, and if so, it will continue to try other enabled methods, such as retrieving a value from the database, or possibly regenerating it. https://forum.rclone.org/t/hasher-with-gdrive-backend-does-not-return-sha1-sha256-for-old-files/44680/9?u=nielash
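The fallback order described above can be sketched as follows (a hypothetical helper, not Hasher's real API): only a non-blank "passed hash" short-circuits; a blank one falls through to the cached value and then to regeneration.

```go
package main

import "fmt"

// hashFor tries each source in order and returns the first non-blank
// hash: the backend's native ("passed") hash, then the db cache, then
// regeneration as a last resort.
func hashFor(passed, cached, regenerate func() string) string {
	if h := passed(); h != "" {
		return h // backend supplied the hash natively
	}
	if h := cached(); h != "" {
		return h // backend returned blank, but the db has a value
	}
	return regenerate() // compute it ourselves
}

func main() {
	// e.g. Google Drive has no SHA1/SHA256 for some files; the cached
	// db value should win when the backend returns blank.
	got := hashFor(
		func() string { return "" },         // backend returned blank
		func() string { return "cafebabe" }, // value stored in the db
		func() string { return "recomputed" },
	)
	fmt.Println(got)
}
```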
Force-pushed from 98068b5 to 5466cb2
> Before this change, when cache.GetFn was called on a file rather than a directory, two cache entries would be added (the file + its parent)

I wonder if this is the actual bug - adding two entries.

The thinking behind caching two entries was that the backend instance returned when you call NewFs("remote:dir") and NewFs("remote:dir/file.txt") is the same. However, this is not true - the result of the Root() call is different for the two backends, so they should be treated as two backends, not one.

So I think a better fix for this might be to remove the caching of the parent in the pointing-to-a-file case. What do you think?
PS I've cherry-picked the two hasher fixes for the release. This fix can go in the release if done in time or in a point release.
I agree! Adding two entries never totally made sense to me.

Do you want to have a go at that change? Then you can see if it fixes your hasher oddities.

Sure! Looking at it now.

I am realizing a slight problem here, which is that …

Also, it looks to me like this used to be true, but was changed very recently in c69eb84. As a result, if all you have is …
This reverts commit c4ac032.
Before this change, when cache.GetFn was called on a file rather than a directory, two cache entries would be added (the file + its parent), but only one of them would get pinned if the caller then called Pin(f). This left the other one exposed to expiration if ci.FsCacheExpireDuration was reached. This was problematic because both entries point to the same Fs, and if one entry expires while the other is pinned, the Shutdown method gets erroneously called on an Fs that is still in use.

An example of the problem showed up in the Hasher backend, which uses the Shutdown method to stop the bolt db used to store hashes. If a command was run on a Hasher file (ex. `rclone md5sum --download hasher:somelargefile.zip`) and hashing the file took longer than the --fs-cache-expire-duration (5m by default), the bolt db was stopped before the hashing operation completed, resulting in an error.

This change fixes the issue by ensuring that:

1. only one entry is added to the cache (the file's parent, not the file).
2. future lookups correctly find the entry regardless of whether they are called with the parent name or one of its children.
3. fs.ErrorIsFile is returned when (and only when) fsString points to a file (preserving the fix from rclone@8d5bc7f).

Note that f.Root() should always point to the parent dir as of rclone@c69eb84
Force-pushed from e59f5e6 to 6c808e6
Here's an alternate fix -- see what you think: 6c808e6. It eliminates the duplicate entry, but the one it keeps is the parent rather than the file. This is because of what I mentioned above -- that after c69eb84 the … This version should ensure that: …
If we wanted to do the reverse and cache the file but not its parent, I think what we'd probably have to do is either revert c69eb84 and have …
Wow, that is quite complicated! Probably too complicated for a point release - I'm really allergic to breaking stuff in a point release!
Note that commit is just one of many fixing the same problem in lots of backends!
I think those seem correct. Can we do it more simply? One thing I was thinking is that if we changed the contents of the cache from

```go
struct {
	f     fs.Fs
	files []string
}
```

Then when inserting a …
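The struct-with-files idea floated above could be fleshed out along these lines (illustrative names and a string stand-in for fs.Fs; not a real implementation): each entry lives under the parent's key and records the file names inserted through it, so a lookup by the parent or by a child resolves to the same single entry.

```go
package main

import "fmt"

// cacheEntry keeps the Fs for a parent directory plus the file names
// recorded against it, matching the struct sketched in the comment.
type cacheEntry struct {
	f     string   // stands in for fs.Fs
	files []string // children inserted via this entry
}

// insert records file under parent's entry, creating the entry on first
// use. An empty file means "look up the parent itself".
func insert(cache map[string]*cacheEntry, parent, file string) *cacheEntry {
	e, ok := cache[parent]
	if !ok {
		e = &cacheEntry{f: "Fs(" + parent + ")"}
		cache[parent] = e
	}
	if file != "" {
		e.files = append(e.files, file)
	}
	return e
}

func main() {
	cache := map[string]*cacheEntry{}
	a := insert(cache, "remote:dir", "file.txt") // file goes under the parent key
	b := insert(cache, "remote:dir", "")         // parent lookup finds the same entry
	fmt.Println(a == b, a.files, len(cache))
}
```

Since file and parent share one entry, a single pin count on that entry covers both, which is the simplification being proposed.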
I don't want to undo c69eb84 as it fixed a small but important data loss bug. The whole returning-fs.ErrorIsFile business is not very pleasant, but I haven't thought of anything better! One idea I had was that you always open a backend at the root with …
I would argue that it is already broken and this patch un-breaks it 😄 But I hear you! It seems plausible that the same issue could be breaking …
My first version (c4ac032) is still an option. It's much simpler. (Leave the duplicate but Pin/Unpin them in lockstep)
That's very similar to what my newest version is doing. I think it would work -- but it might mean having to call …
I don't think we need to.
Yeah, I don't have any perfect solutions either. One idea might be to have …

My overall observation as someone still learning all of this is that paths are much easier to reason about on the CLI than in the code -- …
What is the purpose of this change?
Three related bug fixes identified on https://forum.rclone.org/t/hasher-with-gdrive-backend-does-not-return-sha1-sha256-for-old-files/44680/9?u=nielash
1. fs/cache: fix parent not getting pinned when remote is a file
2. hasher: fix error from trying to stop an already-stopped db
3. hasher: look for cached hash if passed hash unexpectedly blank

2 and 3 are specific to hasher, but 1 might possibly explain some unexplained behavior in other backends.
Was the change discussed in an issue or in the forum before?