It's an S3 performance best practice to align range GETs to multi-part upload boundaries. Our current prefetcher doesn't do a good job of this: it's not aware of the underlying part size, and in its current configuration:
https://github.com/awslabs/s3-file-connector/blob/875508253753e071ed532192194adc35ce607916/s3-file-connector/src/prefetch.rs#L47
it will never line up with the part boundaries because 256k is not a multiple of the part size.
We could do arbitrarily fancy things here—in principle a multi-part object could have arbitrary part boundaries—by querying the object attributes to discover the parts. That's probably overkill. But we should at least improve the common case: assume that the object's part boundaries are the part size the connector is configured with (currently 8MB), and align our range GETs to those boundaries.
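As a rough illustration of the common-case alignment described above, here is a minimal sketch that splits a byte range into sub-ranges that never cross a part boundary. The function name `part_aligned_ranges` is hypothetical and not part of the connector; it also assumes every part boundary sits at a multiple of the configured part size, which only holds if the object was uploaded with that part size.

```rust
/// Splits the half-open byte range `[start, end)` into sub-ranges that never
/// cross a multiple of `part_size`. Hypothetical helper for illustration only.
fn part_aligned_ranges(start: u64, end: u64, part_size: u64) -> Vec<(u64, u64)> {
    assert!(part_size > 0, "part size must be non-zero");
    let mut ranges = Vec::new();
    let mut cur = start;
    while cur < end {
        // First assumed part boundary strictly after `cur`.
        let next_boundary = (cur / part_size + 1).saturating_mul(part_size);
        // Stop at the boundary or at the end of the requested range.
        let stop = next_boundary.min(end);
        ranges.push((cur, stop));
        cur = stop;
    }
    ranges
}
```

With an 8 MiB part size, a read from offset 256 KiB to 9 MiB would become two GETs under this sketch: one ending at the 8 MiB boundary and one covering the remaining 1 MiB.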
Concretely for our current prefetcher, once the sequential prefetch window gets bigger than the part size:
https://github.com/awslabs/s3-file-connector/blob/875508253753e071ed532192194adc35ce607916/s3-file-connector/src/prefetch.rs#L101-L102
we should transition to aligning reads on the part boundary (possibly requiring a single weirdly-sized GET to shift out the offset to the next part boundary). Or maybe we should just change our prefetching config to be aware of the part size?
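One way the transition could look, sketched under assumptions: `next_request_size` is a hypothetical helper, not the prefetcher's actual code, and it takes the current window size as an input rather than modelling how the window grows. Below the part size it leaves requests unchanged; at or above it, it issues one odd-sized GET to reach the next boundary and whole parts after that.

```rust
/// Returns the size of the next GET starting at `offset`, given the current
/// sequential prefetch `window`. Hypothetical sketch; assumes part boundaries
/// sit at multiples of `part_size`.
fn next_request_size(offset: u64, window: u64, part_size: u64) -> u64 {
    assert!(part_size > 0, "part size must be non-zero");
    if window < part_size {
        // Window is still smaller than a part: keep the unaligned behaviour.
        return window;
    }
    let misalignment = offset % part_size;
    if misalignment != 0 {
        // One odd-sized GET that ends exactly on the next part boundary.
        part_size - misalignment
    } else {
        // Already aligned: fetch whole parts only, rounding the window down.
        window - window % part_size
    }
}
```

For example, with an 8 MiB part size and a 16 MiB window, a request starting at offset 3 MiB would be 5 MiB long, and the following request, now starting at 8 MiB, would cover two full parts.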