Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix resumable downloads crash when gzip #6996

Merged

Conversation

memsharded
Copy link
Member

@memsharded memsharded commented May 11, 2020

Changelog: Bugfix: Resumable download introduced a bug when there is a fronted (like Apache) to Artifactory or other server that gzips the returned files, returning an incorrect Content-Length header that doesn't match the real content length.
Docs: Omit

Fix: #6984

This is a fix to #6984, but it would be good to implement something more robust that handles the compression from servers in a better way.

@memsharded memsharded added this to the 1.25.1 milestone May 11, 2020
@@ -120,19 +120,17 @@ def get_total_length():
progress.update(read_response(chunk_size)),
file_path
)

gzip = (response.headers.get("content-encoding") == "gzip")
Copy link
Contributor

@flashdagger flashdagger May 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also consider other possible compression algorithms?

Suggested change
gzip = (response.headers.get("content-encoding") == "gzip")
is_compressed = (response.headers.get("content-encoding") in {"gzip", "deflate", "br"})

You could test them e.g. here: https://httpbin.org/#/Response_formats

As an alternative you could compute the total_length with len(response.content) in those cases, ignoring the Content-Length from headers...

Copy link
Member Author

@memsharded memsharded May 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to just "revert" that regression, and we could open a new ticket for improving this for next releases. These other compressions types haven't been an issue so far (they would have raised exceptions in the past, just by using them), so I think this is ok as-is.

jgsogo
jgsogo approved these changes May 12, 2020
Copy link
Member

@jgsogo jgsogo left a comment

It looks like we are not the only ones with this problem and it is something we cannot fix from the client-side [1] (unless we do not request gzip from the client?). We can merge this one if we are going to release 1.25 to fix the bug/regression (excluding other compression types as @flashdagger suggests probably is needed).


[1] https://serverfault.com/questions/183843/content-length-not-sent-when-gzip-compression-enabled-in-apache

@flashdagger
Copy link
Contributor

@flashdagger flashdagger commented May 12, 2020

So my understanding is this:

You can only check if you received the full data requested if:

  • Content-Encoding is "identity" (the response default)
  • Content-Length is sent in response

If the content body is sent compressed (e.g. using gzip) Content-Length represents the compressed size (what was sent over the wire) or is even omitted. In that case transmission should fail anyway if data is missing. Not a case we should consider. So we cannot resume compressed responses since we don't read from the raw socket and let the requests library do the decompression.

@memsharded memsharded merged commit 1216c50 into conan-io:release/1.25 May 12, 2020
2 checks passed
@memsharded memsharded deleted the fix/resumable_download_gzip branch May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants