Conversation

Successful retry & resume.

maxing out on attempts.
Codecov Report

@@           Coverage Diff            @@
##      tonytung-break    #101  +/- ##
==========================================
- Coverage    87.93%   86.56%    -1.37%
==========================================
  Files           29       29
  Lines          978     1020       +42
==========================================
+ Hits           860      883       +23
- Misses         118      137       +19

Continue to review full report at Codecov.
```python
for chunk in response.iter_content(chunk_size=1024 * 1024):
    if chunk:
        fh.write(chunk)
```
Use checksummingio.ChecksummingSink to compute the checksum as you download.
Force-pushed from e129216 to b388f1e.
Modified to use
Force-pushed from ac2fc54 to aaae6f3.
Is there a way to get the checksummer to compute only sha256? I'm a bit concerned about the extra overhead of the other checksums.
```python
while consume_bytes > 0:
    bytes_to_read = min(consume_bytes, 1024 * 1024)
    content = response.iter_content(chunk_size=bytes_to_read)
```
Is there a reason why you use response.iter_content instead of response.raw.read?
It's a documented API. response.raw.read is not.
Also, the implementation of response.iter_content seems to branch on whether the backend is urllib3 or not, and in the case of urllib3, it doesn't even call response.raw.read.
Hmm ok. I think it is documented (http://docs.python-requests.org/en/master/api/#requests.Response.raw and http://urllib3.readthedocs.io/en/latest/reference/index.html#urllib3.response.HTTPResponse.read) and I'm vaguely unhappy with the extra indirection happening here for no obvious benefit, but I guess it's OK.
As an example, iter_content() in requests calls raw.stream(decode_content=True) for urllib3. If I just call read() directly, decode_content=True will not be passed.
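The decode_content difference can be demonstrated without a network: wrap a gzip-compressed body in a urllib3 HTTPResponse and read it both ways. Constructing HTTPResponse directly like this is test-style usage, not the normal requests path, and is a sketch of the behavior under discussion.

```python
import gzip
import io

from urllib3.response import HTTPResponse

body = gzip.compress(b"hello world")


def make_response():
    # Build an HTTPResponse around a pre-gzipped body, roughly as the
    # urllib3 backend would for a response with Content-Encoding: gzip.
    return HTTPResponse(
        body=io.BytesIO(body),
        headers={"content-encoding": "gzip"},
        status=200,
        preload_content=False,
    )


# With decode_content=True, read() transparently gunzips the body;
# with decode_content=False, it returns the raw compressed bytes.
decoded = make_response().read(decode_content=True)
raw = make_response().read(decode_content=False)
```

This is the gap the comment describes: iter_content() opts into decoding for you, while a bare read() leaves it to the caller.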
This allows us to do ranged gets rather than restart the entire download.
Verify the resulting download for extra correctness.
Test plan: Start a download, yank the ethernet cable, plug it back in, and watch the download succeed.
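The resume trick hinges on the HTTP Range header: after a dropped connection, the client re-requests bytes from its current offset onward instead of starting at byte 0. A minimal helper (hypothetical name, illustrating the header format) might look like:

```python
def range_header(offset, total_size=None):
    """Build an HTTP Range header requesting bytes from `offset` onward.

    When total_size is known, request an explicit closed range;
    otherwise use the open-ended "bytes=offset-" form.
    (Hypothetical helper illustrating the ranged-get resume above.)
    """
    if total_size is None:
        return {"Range": "bytes=%d-" % offset}
    return {"Range": "bytes=%d-%d" % (offset, total_size - 1)}
```

A resuming client would pass this to something like requests.get(url, headers=range_header(offset), stream=True) and expect a 206 Partial Content response.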
Updated to just use
kislyuk left a comment
LGTM. For reference, the retry management logic is also available via https://github.com/shazow/urllib3/blob/master/urllib3/util/retry.py; for example, it is used here: https://github.com/HumanCellAtlas/dcp-cli/blob/master/hca/dss/__init__.py#L45-L58. You may want to consider using that instead of rolling your own.
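Wiring urllib3's Retry policy into a requests session looks roughly like this. The parameter values are illustrative, not taken from the linked code:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Illustrative policy: up to 5 attempts with exponential backoff,
# retrying on common transient server errors.
retry_policy = Retry(
    total=5,
    backoff_factor=0.5,
    status_forcelist=[500, 502, 503, 504],
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry_policy)
session.mount("http://", adapter)
session.mount("https://", adapter)
```

Every request made through `session` then inherits the retry behavior, which is the "use that instead of rolling your own" option described above.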
I tried it, but in my crude experimentation with Network Link Conditioner, the number of retries can quickly be exhausted and the transfer fails. My approach is more like TCP, where some data making it through increases forgiveness.