New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix resumable downloads crash when gzip #6996
fix resumable downloads crash when gzip #6996
Conversation
@@ -120,19 +120,17 @@ def get_total_length(): | |||
progress.update(read_response(chunk_size)), | |||
file_path | |||
) | |||
|
|||
gzip = (response.headers.get("content-encoding") == "gzip") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also consider other possible compression algorithms?
gzip = (response.headers.get("content-encoding") == "gzip") | |
is_compressed = (response.headers.get("content-encoding") in {"gzip", "deflate", "br"}) |
You could test them e.g. here: https://httpbin.org/#/Response_formats
As an alternative you could compute the total_length with len(response.content)
in those cases, ignoring the Content-Length from headers...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to just "revert" that regression, and we could open a new ticket for improving this for next releases. These other compressions types haven't been an issue so far (they would have raised exceptions in the past, just by using them), so I think this is ok as-is.
It looks like we are not the only ones with this problem and it is something we cannot fix from the client-side [1] (unless we do not request gzip
from the client?). We can merge this one if we are going to release 1.25
to fix the bug/regression (excluding other compression types as @flashdagger suggests probably is needed).
So my understanding is this: You can only check if you received the full data requested if:
If the content body is sent compressed (e.g. using gzip) Content-Length represents the compressed size (what was sent over the wire) or is even omitted. In that case transmission should fail anyway if data is missing. Not a case we should consider. So we cannot resume compressed responses since we don't read from the raw socket and let the requests library do the decompression. |
Changelog: Bugfix: Resumable download introduced a bug when there is a fronted (like Apache) to Artifactory or other server that gzips the returned files, returning an incorrect
Content-Length
header that doesn't match the real content length.Docs: Omit
Fix: #6984
This is a fix to #6984, but it would be good to implement something more robust that handles the compression from servers in a better way.