Align GETs to multi-part object boundaries #73

Closed
jamesbornholt opened this issue Feb 8, 2023 · 1 comment
Labels
good first issue Good for newcomers

jamesbornholt commented Feb 8, 2023

It's an S3 performance best practice to align range GETs to multi-part upload boundaries. Our current prefetcher doesn't do a good job of this: it's not aware of the underlying part size, and in its current configuration:

https://github.com/awslabs/s3-file-connector/blob/875508253753e071ed532192194adc35ce607916/s3-file-connector/src/prefetch.rs#L47

it will never line up with the part boundaries because 256k is not a multiple of the part size.

We could do arbitrarily fancy things here by querying the object attributes to discover the parts, since in principle a multi-part object could have arbitrary part boundaries. That's probably overkill. But we should at least improve the common case: assume that the object's parts are the size the connector is configured with (currently 8MB), and align our range GETs to those boundaries.
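As a rough illustration of the common case, here's a minimal sketch of the alignment arithmetic. It assumes a fixed part size and reads "8MB" as 8 MiB. The names `PART_SIZE` and `aligned_get_len` are hypothetical and not taken from `prefetch.rs`:

```rust
/// One mebibyte, in bytes.
const MIB: u64 = 1024 * 1024;

/// Assumed part size (hypothetical constant; reads the issue's "8MB" as 8 MiB).
const PART_SIZE: u64 = 8 * MIB;

/// Length of the next range GET starting at `offset`.
///
/// If the desired range reaches at least one part boundary past `offset`,
/// the range is shortened so that it ends exactly on the last such boundary.
/// Otherwise (small reads, or the tail of the object) it is only clamped to
/// the object size.
fn aligned_get_len(offset: u64, desired_len: u64, part_size: u64, object_size: u64) -> u64 {
    assert!(part_size > 0, "part size must be non-zero");
    assert!(offset <= object_size, "offset is past the end of the object");

    // Unaligned end of the request, clamped to the end of the object.
    let end = offset.saturating_add(desired_len).min(object_size);
    // Round the end down to a multiple of the part size.
    let aligned_end = (end / part_size) * part_size;

    if aligned_end > offset {
        aligned_end - offset
    } else {
        end - offset
    }
}
```

For example, a 10 MiB request starting at offset 3 MiB would be cut to 5 MiB so that it ends at the 8 MiB boundary, and every later request then starts part-aligned.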

Concretely for our current prefetcher, once the sequential prefetch window gets bigger than the part size:

https://github.com/awslabs/s3-file-connector/blob/875508253753e071ed532192194adc35ce607916/s3-file-connector/src/prefetch.rs#L101-L102

we should transition to aligning reads on the part boundary (possibly requiring a single weirdly-sized GET to shift out the offset to the next part boundary). Or maybe we should just change our prefetching config to be aware of the part size?
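To make the transition concrete, here's one possible sketch of how a prefetch window could be split into GETs once it exceeds the part size. `plan_gets` is a hypothetical helper, not an existing function in the connector, and it assumes that windows at or below the part size should keep their current behaviour and that any trailing partial part is left for the next window:

```rust
use std::ops::Range;

/// Split a prefetch window starting at `offset` into byte ranges for GETs.
///
/// Windows no larger than the part size are issued as a single GET.
/// Larger windows get at most one odd-sized GET that runs up to the next
/// part boundary, followed by whole, part-aligned GETs. A trailing partial
/// part is not requested, so the next window also starts on a boundary.
fn plan_gets(offset: u64, window: u64, part_size: u64) -> Vec<Range<u64>> {
    assert!(part_size > 0, "part size must be non-zero");
    let end = offset + window;

    // Small window: leave the request unaligned (assumed current behaviour).
    if window <= part_size {
        return vec![offset..end];
    }

    let mut ranges = Vec::new();
    let mut cursor = offset;

    // One odd-sized GET to shift the offset out to the next part boundary.
    let misalignment = cursor % part_size;
    if misalignment != 0 {
        let boundary = cursor + (part_size - misalignment);
        ranges.push(cursor..boundary);
        cursor = boundary;
    }

    // Whole parts that fit entirely inside the window.
    while cursor + part_size <= end {
        ranges.push(cursor..cursor + part_size);
        cursor += part_size;
    }

    ranges
}
```

With an 8 MiB part size, a 16 MiB window at offset 3 MiB would become a 5 MiB GET followed by one full-part GET, and the next window would pick up at 16 MiB. Whether each part should be its own GET or several parts should be merged into one larger aligned request is a separate choice that this sketch doesn't try to settle.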
