Config multipart chunk size in s3 etag and checksumming buffered reader#2
Codecov Report

```
@@            Coverage Diff             @@
##           master       #2      +/-   ##
==========================================
+ Coverage   92.69%   92.73%   +0.04%
==========================================
  Files          10       10
  Lines         301      303       +2
==========================================
+ Hits          279      281       +2
  Misses         22       22
==========================================
```
```python
class ChecksummingBufferedReader:

    def __init__(self, *args, **kwargs):
```
Why not just make `read_file_size` a formal parameter?
```python
    def __init__(self, *args, **kwargs):
        """
        :param read_file_size: optional file size for correctly setting the s3 etag chunk size
```
So if this is optional, what you probably want to do is track the total number of bytes sent to `ChecksummingBufferedReader`. If that total would have warranted a larger chunk size, it should throw an exception.
In what case would it merit a larger chunk size?
At the end, if the chunk size is unspecified, you could call `get_s3_multipart_chunk_size(..)` with the number of bytes that got processed. If that is not `S3Etag.default_chunk_size`, then the checksum calculated is bad.
Or you could just make the parameter required. :)
Making it required. This might be a pain in some cases, but it will make it easier to understand.
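The mismatch the reviewers describe can be made concrete. A minimal sketch, assuming a hypothetical `get_s3_multipart_chunk_size` helper (the real one lives elsewhere in this repository) that grows the part size only when S3's 10,000-part limit would otherwise be exceeded:

```python
import math

MIN_CHUNK_SIZE = 64 * 1024 * 1024  # the 64 MiB default used in this PR
MAX_PARTS = 10_000                 # S3's hard limit on multipart upload parts
MiB = 1024 * 1024

def get_s3_multipart_chunk_size(file_size: int) -> int:
    """Pick a part size that keeps the upload under S3's part limit.

    Hypothetical sketch of the helper referenced in this review thread.
    """
    if file_size <= MAX_PARTS * MIN_CHUNK_SIZE:
        return MIN_CHUNK_SIZE
    # Round the minimum viable part size up to a whole MiB.
    return math.ceil(file_size / MAX_PARTS / MiB) * MiB
```

Under this sketch, a file over ~640 GiB needs parts larger than 64 MiB, so an etag computed with the default chunk size would not match what S3 reports; requiring the caller to pass the file size up front avoids discovering that only after all the bytes have been hashed.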
```diff
-    def __init__(self, *args, **kwargs):
+    def __init__(self, file_handler, read_chunk_size, *args, **kwargs):
         """
-        :param read_file_size: optional file size for correctly setting the s3 etag chunk size
+        :param read_file_size: optional file size for correctly setting the s3 etag chunk size;
+            defaults to S3Etag default if None
+        :param file_handler: the file handler to read from
```
Reorder this above the docblock line for `read_file_size`.
```diff
-    def __init__(self):
-        default_chunk_size = 64 * 1024 * 1024
+    def __init__(self, chunk_size=None):
```
ttung left a comment: small fixes to s3_etag pls, but otherwise lgtm.
```python
    etag_stride = 64 * 1024 * 1024

    def __init__(self):
        default_chunk_size = 64 * 1024 * 1024
```
This is not necessary any more, right?
```python
        self._etag_bytes = 0
        self._etag_parts = []
        self._etag_hasher = hashlib.md5()
        self._chunk_size = chunk_size or self.default_chunk_size
```
This should just be `self._chunk_size = chunk_size`.
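For context, the etag logic the fields above belong to can be sketched as follows. This is a hypothetical reconstruction of the class under review (not the repository's actual implementation), with `chunk_size` made a required argument as the reviewers requested; it computes the multipart etag the way S3 does, as the MD5 of the concatenated per-part MD5 digests with a `-N` part-count suffix:

```python
import hashlib

class S3Etag:
    """Incrementally compute an S3-style multipart etag (sketch)."""

    def __init__(self, chunk_size):
        self._etag_bytes = 0
        self._etag_parts = []
        self._etag_hasher = hashlib.md5()
        self._chunk_size = chunk_size  # required; no default fallback

    def update(self, data: bytes):
        # Finalize a part's MD5 each time we cross a part boundary.
        while self._etag_bytes + len(data) >= self._chunk_size:
            take = self._chunk_size - self._etag_bytes
            self._etag_hasher.update(data[:take])
            self._etag_parts.append(self._etag_hasher.digest())
            self._etag_hasher = hashlib.md5()
            self._etag_bytes = 0
            data = data[take:]
        self._etag_hasher.update(data)
        self._etag_bytes += len(data)

    def hexdigest(self) -> str:
        if not self._etag_parts:
            # Files smaller than one part get a plain MD5 etag, no suffix.
            return self._etag_hasher.hexdigest()
        parts = list(self._etag_parts)
        if self._etag_bytes:
            parts.append(self._etag_hasher.digest())
        return hashlib.md5(b"".join(parts)).hexdigest() + "-" + str(len(parts))
```

Because the chunk size determines where part boundaries fall, hashing with the wrong chunk size silently produces a different etag, which is why the thread settles on making it a required parameter rather than defaulting it.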
5dfba2e to
6554c47
Compare
Connects to HumanCellAtlas/dcp-cli#149
Blocks HumanCellAtlas/dcp-cli#150