Config multipart chunk size in s3 etag and checksumming buffered reader by mweiden · Pull Request #2 · HumanCellAtlas/dcplib

mweiden · 2018-08-01T16:29:27Z

Connects to HumanCellAtlas/dcp-cli#149
Blocks HumanCellAtlas/dcp-cli#150

codecov-io · 2018-08-01T17:23:50Z

Codecov Report

Merging #2 into master will increase coverage by 0.04%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master       #2      +/-   ##
==========================================
+ Coverage   92.69%   92.73%   +0.04%     
==========================================
  Files          10       10              
  Lines         301      303       +2     
==========================================
+ Hits          279      281       +2     
  Misses         22       22

| Impacted Files | Coverage Δ | |
|---|---|---|
| dcplib/checksumming_io/checksumming_sink.py | `92.3% <100%> (ø)` | ⬆️ |
| ...ib/checksumming_io/checksumming_buffered_reader.py | `91.3% <100%> (ø)` | ⬆️ |
| dcplib/checksumming_io/s3_etag.py | `67.85% <71.42%> (+2.47%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0dbb0ff...6554c47.

ttung · 2018-08-01T20:56:58Z


 class ChecksummingBufferedReader:

    def __init__(self, *args, **kwargs):


why not just make read_file_size a formal parameter?
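For context on the question above, here is a minimal sketch of the difference between pulling an option out of `**kwargs` and declaring it as a formal parameter. The class names `KwargsReader` and `FormalReader` are hypothetical and are not the code in this PR; only the parameter name `read_file_size` comes from the comment.

```python
import inspect


class KwargsReader:
    """Hypothetical: the option is hidden inside **kwargs."""

    def __init__(self, *args, **kwargs):
        # The option does not appear in the signature and silently defaults to None.
        self.read_file_size = kwargs.pop("read_file_size", None)
        self.rest = (args, kwargs)


class FormalReader:
    """Hypothetical: the option is a formal, required parameter."""

    def __init__(self, read_file_size, *args, **kwargs):
        # Omitting the argument now fails at the call site with a TypeError.
        self.read_file_size = read_file_size
        self.rest = (args, kwargs)


# The formal parameter is visible to callers, documentation tools, and linters.
print(list(inspect.signature(KwargsReader.__init__).parameters))
print(list(inspect.signature(FormalReader.__init__).parameters))
```

With the formal parameter, a forgotten argument is reported immediately instead of falling back to a default that may not match the file being read.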

ttung · 2018-08-01T20:59:32Z


    def __init__(self, *args, **kwargs):
+        """
+        :param read_file_size: optional file size for correctly setting the s3 etag chunk size


so if this is optional, what you probably want to do is to track the total number of bytes sent to ChecksummingBufferedReader. if it would have warranted a larger chunk size, it should throw an exception.

In what case would it merit a larger chunk size?

At the end, if the chunk size is unspecified, you could call get_s3_multipart_chunk_size(..) with the number of bytes that got processed. If that is not S3Etag.default_chunk_size, then the checksum calculated is bad.

Or you could just make the parameter required. :)

Making it required. This might be a pain in some cases, but it will make it easier to understand.
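The end-of-stream check suggested in this thread might look roughly like the sketch below. It assumes a 64 MiB default (the `default_chunk_size` value quoted in this PR) and S3's limit of 10,000 parts per multipart upload. `chunk_size_for` is a hypothetical stand-in for `get_s3_multipart_chunk_size(..)`, whose real implementation is not shown here, and `check_default_chunk_size` is likewise an illustrative name.

```python
MiB = 1024 * 1024
DEFAULT_CHUNK_SIZE = 64 * MiB  # same value as default_chunk_size in this PR
MAX_PARTS = 10000              # S3 allows at most 10,000 parts per multipart upload


def chunk_size_for(file_size):
    """Hypothetical stand-in for get_s3_multipart_chunk_size(..)."""
    if file_size <= DEFAULT_CHUNK_SIZE * MAX_PARTS:
        return DEFAULT_CHUNK_SIZE
    # Smallest per-part size that fits the file into MAX_PARTS parts (integer ceiling).
    per_part = -(-file_size // MAX_PARTS)
    # Round up to a whole number of MiB.
    return -(-per_part // MiB) * MiB


def check_default_chunk_size(bytes_processed):
    """Raise if the default chunk size was wrong for the bytes actually read."""
    needed = chunk_size_for(bytes_processed)
    if needed != DEFAULT_CHUNK_SIZE:
        raise ValueError(
            "etag was computed with chunk size {} but {} bytes require {}".format(
                DEFAULT_CHUNK_SIZE, bytes_processed, needed
            )
        )
```

Making the parameter required, as decided above, avoids this after-the-fact check entirely: the caller states the chunk size up front, so a bad checksum cannot be discovered only at the end of the stream.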

ttung · 2018-08-01T22:34:50Z

-    def __init__(self, *args, **kwargs):
+    def __init__(self, file_handler, read_chunk_size, *args, **kwargs):
+        """
+        :param read_file_size: optional file size for correctly setting the s3 etag chunk size;


remove optional.

Good catch ✅

ttung · 2018-08-01T22:35:02Z

+        """
+        :param read_file_size: optional file size for correctly setting the s3 etag chunk size;
+            defaults to S3Etag default if None
+        :param file_handler: the file handler to read from


reorder this above the docblock for read_file_size

ttung · 2018-08-01T22:35:26Z

-    def __init__(self):
+    default_chunk_size = 64 * 1024 * 1024
+
+    def __init__(self, chunk_size=None):


Pls make chunk_size mandatory.

ttung

small fixes to s3_etag pls, but otherwise lgtm.

ttung · 2018-08-01T23:54:10Z

-    etag_stride = 64 * 1024 * 1024

-    def __init__(self):
+    default_chunk_size = 64 * 1024 * 1024


this is not necessary any more, right?

Doh. Yes. ✅

ttung · 2018-08-01T23:54:22Z

        self._etag_bytes = 0
        self._etag_parts = []
        self._etag_hasher = hashlib.md5()
+        self._chunk_size = chunk_size or self.default_chunk_size


this should just be self._chunk_size = chunk_size
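To illustrate why the chunk size matters for the result, here is a rough sketch of an S3-style multipart ETag calculation with a mandatory `chunk_size`. `EtagSketch` is a hypothetical class, not the `S3Etag` implementation in this PR. It assumes the commonly observed scheme: data that fits in one part yields a plain MD5 hex digest, and larger data yields the MD5 of the concatenated per-part MD5 digests followed by `-<number of parts>`.

```python
import hashlib


class EtagSketch:
    """Illustrative S3-style ETag calculator; not the dcplib S3Etag class."""

    def __init__(self, chunk_size):
        # chunk_size is mandatory: there is no default to fall back on.
        if not isinstance(chunk_size, int) or chunk_size <= 0:
            raise ValueError("chunk_size must be a positive integer")
        self._chunk_size = chunk_size
        self._part_digests = []       # binary MD5 digest of each completed part
        self._hasher = hashlib.md5()  # hasher for the part in progress
        self._part_bytes = 0          # bytes fed into the part in progress
        self._whole = hashlib.md5()   # MD5 of all data, for the single-part case

    def update(self, data):
        self._whole.update(data)
        view = memoryview(data)
        while len(view) > 0:
            if self._part_bytes == self._chunk_size:
                # The current part is full and more data follows: close it.
                self._part_digests.append(self._hasher.digest())
                self._hasher = hashlib.md5()
                self._part_bytes = 0
            take = min(len(view), self._chunk_size - self._part_bytes)
            self._hasher.update(view[:take])
            self._part_bytes += take
            view = view[take:]

    def hexdigest(self):
        digests = list(self._part_digests)
        if self._part_bytes:
            digests.append(self._hasher.digest())
        if len(digests) <= 1:
            return self._whole.hexdigest()
        combined = hashlib.md5(b"".join(digests)).hexdigest()
        return "{}-{}".format(combined, len(digests))
```

The same bytes hashed with two different chunk sizes generally produce different ETags, which is why a silently applied default can yield a checksum that does not match what S3 reports.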

mweiden requested review from sampierson and ttung August 1, 2018 16:29

mweiden force-pushed the mweiden-dcp-cli-issue149 branch 2 times, most recently from b7ea160 to 1b30f0a Compare August 1, 2018 17:23

mweiden changed the title ~~Config multipart chunk size in s3 etag and checksumming buffered reader~~ [WIP] Config multipart chunk size in s3 etag and checksumming buffered reader Aug 1, 2018

mweiden force-pushed the mweiden-dcp-cli-issue149 branch 2 times, most recently from 9774de6 to bc20de4 Compare August 1, 2018 18:07

mweiden changed the title ~~[WIP] Config multipart chunk size in s3 etag and checksumming buffered reader~~ Config multipart chunk size in s3 etag and checksumming buffered reader Aug 1, 2018

mweiden mentioned this pull request Aug 1, 2018

Use s3 multipart constants as defined in dcplib HumanCellAtlas/dcp-cli#150

Merged

mweiden force-pushed the mweiden-dcp-cli-issue149 branch from bc20de4 to 2bfd956 Compare August 1, 2018 18:51

ttung requested changes Aug 1, 2018

mweiden force-pushed the mweiden-dcp-cli-issue149 branch 3 times, most recently from 12c6ea2 to 57f346e Compare August 1, 2018 21:59

ttung requested changes Aug 1, 2018

mweiden force-pushed the mweiden-dcp-cli-issue149 branch from 57f346e to 5dfba2e Compare August 1, 2018 22:59

ttung approved these changes Aug 1, 2018

mweiden added 2 commits August 2, 2018 08:03

Config multipart chunk size in s3 etag and checksumming buffered reader

23499a9

Bump version to 1.3.2

6554c47

mweiden force-pushed the mweiden-dcp-cli-issue149 branch from 5dfba2e to 6554c47 Compare August 2, 2018 15:04

ttung approved these changes Aug 2, 2018

mweiden merged commit 574640c into master Aug 2, 2018

mweiden deleted the mweiden-dcp-cli-issue149 branch August 2, 2018 21:33

