Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): constrain multipart upload size to fixed length #2606

Merged
merged 3 commits into from
Jun 19, 2024

Conversation

abhiaagarwal
Copy link
Contributor

Description

Object stores expected fixed lengths for all multipart upload parts right up until the last part. The original logic just flushed when it exceeded the threshold. Now, it flushes when the threshold is met exclusively with the same fixed buffer, unless we're completing the transaction, in which case the last piece is allowed to be smaller.

Bumps the constant to reflect that the minimum expected size by most object stores is 5MiB. Also adds a UserWarning if a constant is specified to be less.

Also releases the GIL in more places by moving the flushing logic to a free function.

Related Issue(s)

Closes #2605

Documentation

See: MultipartUpload docs

@github-actions github-actions bot added the binding/python Issues for the Python package label Jun 16, 2024
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@abhiaagarwal abhiaagarwal changed the title fix(python): Constrain multipart upload size to fixed length fix(python): constrain multipart upload size to fixed length Jun 16, 2024
@ion-elgreco
Copy link
Collaborator

@abhiaagarwal looks good, just one remark on the buffer size warning

@abhiaagarwal
Copy link
Contributor Author

@ion-elgreco don't merge yet — I found about an edge case in my logic that won't work (what happens when remaining = max_buffer_length = len(chunk)?) and I want to fix up the test to actually parameterize it + check for the warning message. I also opened an upstream issue in object_store to pull out the "fixed buffer" logic into its own struct so it can live upstream, for the future :)

@rtyler rtyler marked this pull request as draft June 17, 2024 15:37
@rtyler
Copy link
Member

rtyler commented Jun 17, 2024

marking this as a draft because that also helps me of void hitting that big tempting green button

@abhiaagarwal abhiaagarwal marked this pull request as ready for review June 19, 2024 00:42
Copy link
Collaborator

@ion-elgreco ion-elgreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thankss!

@ion-elgreco ion-elgreco enabled auto-merge (squash) June 19, 2024 06:10
@ion-elgreco ion-elgreco merged commit 6205f00 into delta-io:main Jun 19, 2024
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0.18.1 reintroduces S3 multipart upload bug
3 participants