Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use HeadObject for lookup #69

Merged
merged 1 commit into from
Feb 6, 2023
Merged

Use HeadObject for lookup #69

merged 1 commit into from
Feb 6, 2023

Conversation

jamesbornholt
Copy link
Member

Our current lookup does two concurrent ListObjects requests. After
thinking about it a bit more carefully, one of them can be replaced with
a cheaper, faster HeadObject request. The "unsuffixed" request we were
doing was purely to discover whether an object of the exact looked-up
name existed, which is what HeadObject does. Switching to HeadObject
reduces the request costs of a lookup.

One disadvantage of HeadObject is when looking up directories. The
unsuffixed ListObjects we're replacing here could discover a common
prefix and return it immediately without waiting for the other request
to complete. But in practice, the two requests were dispatched
concurrently, so the customer still pays for both requests, and the
latency is the minimum latency of two concurrently ListObjects. Now,
the latency for a directory lookup will be the maximum of a concurrent
ListObjects and HeadObject.

An issue in this change is that we expect HeadObject to return 404 when
doing directory lookups, but right now the way our error types are
structured gives us no way to distinguish 404s from other errors. For
now, I'm just swallowing all errors on the HeadObject request, and I'll
follow up with a broader change to fix our error handling story to make
this work.

This is a partial fix for #12, but in future we can do better for
lookups against objects we've seen before by remembering their type.

Signed-off-by: James Bornholt bornholt@amazon.com


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Our current `lookup` does two concurrent ListObjects requests. After
thinking about it a bit more carefully, one of them can be replaced with
a cheaper, faster HeadObject request. The "unsuffixed" request we were
doing was purely to discover whether an object of the exact looked-up
name existed, which is what HeadObject does. Switching to HeadObject
reduces the request costs of a lookup.

One disadvantage of HeadObject is when looking up directories. The
unsuffixed ListObjects we're replacing here could discover a common
prefix and return it immediately without waiting for the other request
to complete. But in practice, the two requests were dispatched
concurrently, so the customer still pays for both requests, and the
latency is the minimum latency of two concurrently ListObjects. Now,
the latency for a directory lookup will be the maximum of a concurrent
ListObjects and HeadObject.

An issue in this change is that we expect HeadObject to return 404 when
doing directory lookups, but right now the way our error types are
structured gives us no way to distinguish 404s from other errors. For
now, I'm just swallowing all errors on the HeadObject request, and I'll
follow up with a broader change to fix our error handling story to make
this work.

This is a partial fix for #12, but in future we can do better for
lookups against objects we've seen before by remembering their type.

Signed-off-by: James Bornholt <bornholt@amazon.com>
@jamesbornholt jamesbornholt marked this pull request as ready for review February 1, 2023 23:14
// TODO: 404s currently become client errors, but they are expected when looking
// up a directory, so we just swallow all errors for now. Fix when we model
// service errors correctly.
if let Ok(result) = result.map_err(|e| InodeError::ClientError(e.into())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear why you're doing a map_err when you just care about the Ok case. Maybe just warn! with the error code and say we're ignoring the errors for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is just leftover from the old code, which mapped the error so it could return it with ?. I’d prefer not to add the warning since it will be very noisy (every directory lookup) and I’m fast-following this change with the one that handles this case correctly, so this change is really a temporary state.

@jorajeev jorajeev merged commit 2c2c23c into main Feb 6, 2023
@jamesbornholt jamesbornholt deleted the lookup-head-object branch February 6, 2023 18:28
jamesbornholt added a commit that referenced this pull request Feb 14, 2023
Our current ObjectClient allows each implementer to provide its own
error types for each request. This is nice and flexible, but prevents
callers of an ObjectClient from being generic if they want to detect
common service errors like NoSuchKey -- they must know the concrete
error type of the particular client they're using to match on these
errors. We've been getting away with this until #69, where we need to be
able to distinguish (expected) 404 errors from other errors on
HeadObject.

This change refactors ObjectClient to provide a common service error
type for each operation. ObjectClients now return an error that is
*either* a modeled service error like NoSuchKey *or* a client-specific
error. This allows callers to be generic over the ObjectClient and still
discriminate on the interesting error types, where by "interesting" I
mean things I think it's likely a caller might want to know about.

The diff was getting pretty big so I'm splitting this into two commits.
This is Part 1, which just does the refactoring, but doesn't change our
S3CrtClient to return the new modeled service errors. That means this
change shouldn't cause any functional change -- every error will be a
client error, like it was before this commit.  I'll follow up with Part
2 that adds the service errors to S3CrtClient (so does XML parsing etc).

Signed-off-by: James Bornholt <bornholt@amazon.com>
jamesbornholt added a commit that referenced this pull request Feb 14, 2023
Our current ObjectClient allows each implementer to provide its own
error types for each request. This is nice and flexible, but prevents
callers of an ObjectClient from being generic if they want to detect
common service errors like NoSuchKey -- they must know the concrete
error type of the particular client they're using to match on these
errors. We've been getting away with this until #69, where we need to be
able to distinguish (expected) 404 errors from other errors on
HeadObject.

This change refactors ObjectClient to provide a common service error
type for each operation. ObjectClients now return an error that is
*either* a modeled service error like NoSuchKey *or* a client-specific
error. This allows callers to be generic over the ObjectClient and still
discriminate on the interesting error types, where by "interesting" I
mean things I think it's likely a caller might want to know about.

The diff was getting pretty big so I'm splitting this into two commits.
This is Part 1, which just does the refactoring, but doesn't change our
S3CrtClient to return the new modeled service errors. That means this
change shouldn't cause any functional change -- every error will be a
client error, like it was before this commit.  I'll follow up with Part
2 that adds the service errors to S3CrtClient (so does XML parsing etc).

Signed-off-by: James Bornholt <bornholt@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants