Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust chunksize if necessary #49

Merged
merged 2 commits into from Sep 8, 2016

Conversation

JordonPhillips
Copy link
Contributor

This will adjust the chunksize of an upload or copy if it is too small, too
large, or if it looks like max parts will be exceeded. This is unnecessary
for downloads since there are no such restrictions for ranged downloading.

cc @kyleknap @jamesls

This passes in chunksize instead of the entire config, allowing
a custom chunksize to be provided.
@jamesls
Copy link
Member

jamesls commented Aug 31, 2016

What do you all think about not making the adjust chunk size "global"? That's one of the quirks I remember from the CLI/boto3 that this meant you had to update every single test and use "realistic" sized data in unit tests when they don't really matter (for those particular tests).

What if these objects instead took a "part size adjuster". We could then have the tests use the "noop" version for unit tests that don't actually care and in the "real" code we could pass the actual chunk size adjuster.

Also, just a heads up, the travis build failed in case you haven't seen it yet.

@jamesls
Copy link
Member

jamesls commented Sep 6, 2016

Just wanted to capture offline conversation I had with @kyleknap . Let's go ahead and update this to be configurable. I think having a this validation be global was reasonable for the CLI, but I think for this library it makes sense to allow this to be configurable.

@kyleknap let me know if I missed anything.

@JordonPhillips
Copy link
Contributor Author

As we discussed offline, there's some consideration on how we would want to open this up in the future. In the meantime, we'll settle for leaving it open to configuration by allowing an adjuster to be provided with main_kwargs.

self.min_size = min_size
self.max_parts = max_parts

def adjust_chunksize(self, current_chunksize, file_size=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we allow file_size to be None? Do we actually need that behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We don't know the file size when uploading non-seekable files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. It would be good to document what None implies here though.

@jamesls
Copy link
Member

jamesls commented Sep 7, 2016

I'll probably defer to @kyleknap on this, but after seeing this version, I'm a little hesitant about this new concept of default values in task submission. These are essentially params that will never be invoked by "real" code (i.e through the TransferManager public API).

If we're not going to offer a public API right now, I'm not sure if there's much benefit in plumbing this in right now.

I suppose that puts me at -0 but I can be convinced.

# If we try to upload a 5TB file, we'll need to use 896MB part
# sizes.
new_size = self.adjuster.adjust_chunksize(chunksize, file_size)
self.assertEqual(new_size, 896 * (1024 ** 2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also test for when the file size is None.

@kyleknap
Copy link
Member

kyleknap commented Sep 7, 2016

Yeah I would prefer holding off on making the ChunksizeAdjuster an optional parameter you can provide to _submit() mainly because it is not really used in the code at all.

Otherwise it looks good. Just had comments on the tests.

@JordonPhillips
Copy link
Contributor Author

Updated

num_parts = file_size / new_size
self.assertLessEqual(num_parts, MAX_PARTS)

def test_unknown_file_size(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add a test when we don't know the file size and we actually make an adjustment (either above/below the thresholds).

@jamesls
Copy link
Member

jamesls commented Sep 8, 2016

Looks good to me, just had a small request for a test.

@JordonPhillips
Copy link
Contributor Author

Tests added

@kyleknap
Copy link
Member

kyleknap commented Sep 8, 2016

Thanks. Looks good. 🚢 Also make sure you add a changelog entry.

@jamesls
Copy link
Member

jamesls commented Sep 8, 2016

:shipit:

This will adjust the chunksize of an upload or copy if it is too
small, too large, or if it looks like max parts will be exceeded.
This is unnecessary for downloads since there are no such
restrictions for ranged downloading.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants