Make fewer lookup requests when inode type might be known #12

Open
jamesbornholt opened this issue Oct 29, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@jamesbornholt
Member

Right now our lookup implementation does two concurrent ListObjectsV2 requests. List requests are expensive, both cost-wise and performance-wise. We should think about a better strategy here, perhaps starting with a HeadObject request. If there's an existing inode, we could also use that as a hint as to which requests are most likely to succeed (only as a hint, because the bucket might have changed).

@jamesbornholt jamesbornholt added this to the beta milestone Oct 29, 2022
jamesbornholt added a commit that referenced this issue Feb 1, 2023
Our current `lookup` does two concurrent ListObjects requests. After
thinking about it a bit more carefully, one of them can be replaced with
a cheaper, faster HeadObject request. The "unsuffixed" request we were
doing was purely to discover whether an object of the exact looked-up
name existed, which is what HeadObject does. Switching to HeadObject
reduces the request costs of a lookup.

One disadvantage of HeadObject is when looking up directories. The
unsuffixed ListObjects we're replacing here could discover a common
prefix and return it immediately without waiting for the other request
to complete. But in practice, the two requests were dispatched
concurrently, so the customer still paid for both requests, and the
latency was the minimum of the two concurrent ListObjects requests. Now,
the latency for a directory lookup will be the maximum of a concurrent
ListObjects and HeadObject.

An issue in this change is that we expect HeadObject to return 404 when
doing directory lookups, but right now the way our error types are
structured gives us no way to distinguish 404s from other errors. For
now, I'm just swallowing all errors on the HeadObject request, and I'll
follow up with a broader change to fix our error handling story to make
this work.

This is a partial fix for #12, but in future we can do better for
lookups against objects we've seen before by remembering their type.

Signed-off-by: James Bornholt <bornholt@amazon.com>
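
To make the shape of this change concrete, here is a minimal sketch of the lookup flow it describes. The two futures are hypothetical stand-ins for the HeadObject-on-the-exact-key probe and the suffixed ListObjects probe, and `InodeKind` is a made-up name; the real code in this repository is structured differently.

use std::future::Future;

#[derive(Debug, Clone, Copy)]
enum InodeKind {
    File,
    Directory,
}

// Sketch only: `head_object` stands for HeadObject on the exact key, `list_directory`
// for ListObjects under `key/`. Ok(true) means the probe found something.
// Assumes the `futures` crate for the join! macro.
async fn lookup<H, L, E>(head_object: H, list_directory: L) -> Option<InodeKind>
where
    H: Future<Output = Result<bool, E>>,
    L: Future<Output = Result<bool, E>>,
{
    // Both probes are dispatched concurrently, so a directory lookup now pays the
    // maximum of the two latencies rather than the minimum of two ListObjects.
    let (head, list) = futures::join!(head_object, list_directory);

    if list.unwrap_or(false) {
        // Something exists under `key/`, so treat this name as a directory.
        return Some(InodeKind::Directory);
    }
    // For now, all HeadObject errors are swallowed, since an expected 404 (a directory
    // lookup) can't yet be told apart from a genuine failure.
    if head.unwrap_or(false) {
        return Some(InodeKind::File);
    }
    None
}

A directory lookup still issues both requests; the saving is that the unsuffixed probe is now a cheaper HeadObject rather than a second ListObjects.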
jorajeev pushed a commit that referenced this issue Feb 6, 2023
@jamesbornholt
Member Author

#69 improved this by replacing one ListObjects with a HeadObject. What's left to do is to use cached state as a hint to optimize the requests here—if we already suspect something is a file (or directory), we can start with the HeadObject (or ListObjects) and only do the other request if it fails.
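
As a rough sketch of what that ordering could look like (hypothetical names again, reusing the `InodeKind` from the earlier sketch, and simplified so each probe just reports the kind it found, if any):

use std::future::Future;

// Sketch only: `hint` is the kind we remembered from a previous lookup; the probes stand
// for HeadObject and ListObjects and return Some(kind) on a hit, None on a miss.
async fn speculative_lookup<H, L>(
    hint: Option<InodeKind>,
    head_object: H,
    list_directory: L,
) -> Option<InodeKind>
where
    H: Future<Output = Option<InodeKind>>,
    L: Future<Output = Option<InodeKind>>,
{
    match hint {
        // We last saw a file here: try HeadObject first, and only list on a miss.
        Some(InodeKind::File) => match head_object.await {
            Some(kind) => Some(kind),
            None => list_directory.await,
        },
        // We last saw a directory: list first, and only fall back to HeadObject on a miss.
        Some(InodeKind::Directory) => match list_directory.await {
            Some(kind) => Some(kind),
            None => head_object.await,
        },
        // No hint: dispatch both concurrently, as today (a directory result wins).
        None => {
            let (head, list) = futures::join!(head_object, list_directory);
            list.or(head)
        }
    }
}

When the hint is wrong, the lookup degrades to two sequential requests instead of two concurrent ones, which is why it can only ever be a hint: the bucket may have changed since we last saw the key.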

@jamesbornholt jamesbornholt changed the title More frugal lookup requests Make lookup requests more frugal when inode type might be known Feb 7, 2023
@jamesbornholt jamesbornholt modified the milestones: beta, 1.0 Feb 7, 2023
@passaro passaro removed this from the General Availability (1.0) milestone Jul 24, 2023
@jamesbornholt jamesbornholt added the enhancement New feature or request label Aug 5, 2023
@jamesbornholt jamesbornholt changed the title Make lookup requests more frugal when inode type might be known Make fewer lookup requests when inode type might be known Aug 5, 2023
@plundra

plundra commented Sep 6, 2023

Not sure if this is applicable, as I'm not familiar with the internals of FUSE, but:

In mount.fuse(8) I see two options:

entry_timeout=T
       The timeout in seconds for which name lookups will be cached. The default is 1.0 second. For all the timeout options, it is possible to give fractions of a second as well (e.g. entry_timeout=2.8)
attr_timeout=T
       The timeout in seconds for which file/directory attributes are cached.  The default is 1.0 second.

Both of which sound excellent, if mount-s3 could set/use these?

We have a very static object structure, so caching would be great and work very well.
I did some measurements, and for one test 30-50% of the requests are list-type requests, which are costly both in round-trip time and in money :-)

(Thanks for a very exciting and promising project btw!)

@jamesbornholt
Member Author

Yeah, we actually do set those timeouts, here:

impl Default for CacheConfig {
    fn default() -> Self {
        // We want to do as little caching as possible, but Linux filesystems behave badly when the
        // TTL is exactly zero. For example, results from `readdir` will expire immediately, and so
        // the kernel will immediately re-lookup every entry returned from `readdir`. So we apply
        // small non-zero TTLs. The goal is to be small enough that the impact on consistency is
        // minimal, but large enough that a single cache miss doesn't cause a cascading effect where
        // every other cache entry expires by the time that cache miss is serviced. We also apply a
        // longer TTL for directories, which are both less likely to change on the S3 side and
        // checked more often (for directory permissions checks).
        let file_ttl = Duration::from_millis(100);
        let dir_ttl = Duration::from_millis(1000);
        Self { file_ttl, dir_ttl }
    }
}
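
Roughly speaking, these TTLs are what get handed to the kernel as the FUSE entry/attribute timeouts mentioned above, on each reply. A minimal sketch of that, assuming a fuser-based lookup handler (`SketchFs` and `attr_for` are hypothetical stand-ins, not the actual Mountpoint code):

use std::ffi::OsStr;
use fuser::{FileAttr, FileType, Filesystem, ReplyEntry, Request};

// Sketch only: shows where a CacheConfig-style TTL ends up. The fuser trait and reply
// types are real; everything named Sketch* or attr_for is hypothetical.
struct SketchFs {
    cache_config: CacheConfig, // the struct from the snippet above
}

impl SketchFs {
    // Hypothetical helper: the real filesystem would consult its inode table here.
    fn attr_for(&self, _parent: u64, _name: &OsStr) -> Option<FileAttr> {
        None
    }
}

impl Filesystem for SketchFs {
    fn lookup(&mut self, _req: &Request<'_>, parent: u64, name: &OsStr, reply: ReplyEntry) {
        match self.attr_for(parent, name) {
            Some(attr) => {
                // Directories get the longer TTL, files the shorter one.
                let ttl = if matches!(attr.kind, FileType::Directory) {
                    self.cache_config.dir_ttl
                } else {
                    self.cache_config.file_ttl
                };
                // This is the same knob that entry_timeout/attr_timeout expose in
                // mount.fuse(8): the kernel caches this entry for `ttl`.
                reply.entry(&ttl, &attr, 0);
            }
            None => reply.error(libc::ENOENT),
        }
    }
}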

We set them very low because we want to preserve S3's strong consistency model by default. But we know for some workloads, the bucket doesn't change very much/at all, and so we could cache that metadata much longer. We're tracking that as a roadmap item in #255.

This issue is tracking a smaller improvement we could make: if we've listed a file/directory previously, then when the cache expires we could speculate that it's still a file/directory when we try to look it up from S3 again. I think with the way that lookup works right now, that would allow us to skip some HeadObject requests, but probably not any ListObjects requests.
