auto_decompress=False modified binary data somehow #3182

Closed
bmbouter opened this issue Aug 9, 2018 · 7 comments
Labels
invalid This doesn't seem right outdated

Comments

@bmbouter
Contributor

bmbouter commented Aug 9, 2018

Long story short

Use aiohttp with auto_decompress=False and try to download binary data. Here is an example Python snippet, which is also available in this gist.

import asyncio
import aiohttp

async def my_download():
    with open('/tmp/somefileintmp', 'wb') as the_file:
        async with aiohttp.ClientSession(auto_decompress=False) as session:
            async with session.get('https://repos.fedorapeople.org/pulp/pulp/fixtures/python-pypi/packages/shelf_reader-0.1-py2-none-any.whl') as resp:
                while True:
                    chunk = await resp.content.read(1024 * 1024)
                    if not chunk:
                        break  # the download is done
                    the_file.write(chunk)

loop = asyncio.get_event_loop()
loop.run_until_complete(my_download())
loop.close()

Now let's download it with wget and compare the sha256 checksums. Run these shell commands:

$ sha256sum /tmp/somefileintmp 
ecaca570978364bcec8e0a4c2796361aecd6600c0d9f949cefba3b2afbdf4141  /tmp/somefileintmp
$ wget https://repos.fedorapeople.org/pulp/pulp/fixtures/python-pypi/packages/shelf_reader-0.1-py2-none-any.whl
$ sha256sum shelf_reader-0.1-py2-none-any.whl
2eceb1643c10c5e4a65970baf63bde43b79cbdac7de81dae853ce47ab05197e9  shelf_reader-0.1-py2-none-any.whl

Expected behaviour

When auto_decompress=False, I expect the data to be read without alteration.

Actual behaviour

The data is being transformed into a gzip file. At least that is what the file utility says. When I run file on both downloaded versions I get:

$ file /tmp/somefileintmp 
/tmp/somefileintmp: gzip compressed data, from Unix
$ file ~/shelf_reader-0.1-py2-none-any.whl 
/home/vagrant/shelf_reader-0.1-py2-none-any.whl: Zip archive data, at least v2.0 to extract

Steps to reproduce

Run the scripts above in a Python 3.5 environment.

Your environment

I'm using aiohttp v3.3.2 as a client.

@bmbouter
Contributor Author

When the server responds with a header such as 'Content-Type': 'application/octet-stream', the checksum of the data written by aiohttp matches, e.g. for this url.

When the server responds with 'Content-Encoding': 'gzip', the checksum does not match, e.g. for this url.
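
To make the two cases above concrete, here is a minimal sketch of how a client reading with auto_decompress=False could undo a gzip Content-Encoding itself, chunk by chunk. The helper name decode_chunks and the sample payload are hypothetical and only illustrate the idea. The sketch uses just the standard-library gzip and zlib modules and does not call aiohttp.

```python
import gzip
import zlib


def decode_chunks(chunks, content_encoding):
    """Yield body bytes, undoing a gzip Content-Encoding if one is declared.

    `chunks` is any iterable of bytes, e.g. the pieces returned by repeated
    reads of a response body. Hypothetical helper, not part of aiohttp.
    """
    if (content_encoding or "").lower() != "gzip":
        # No gzip encoding declared: pass the bytes through untouched.
        yield from chunks
        return
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


payload = b"example payload " * 1000   # made-up file contents
wire = gzip.compress(payload)          # body as sent with Content-Encoding: gzip
wire_chunks = [wire[i:i + 64] for i in range(0, len(wire), 64)]

decoded = b"".join(decode_chunks(wire_chunks, "gzip"))
untouched = b"".join(decode_chunks(wire_chunks, None))
```

With the gzip header declared, decoded equals the original payload. Without it, the bytes come back exactly as they were on the wire, which is why the checksum differs in the second case.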

@bmbouter
Contributor Author

bmbouter commented Aug 10, 2018

While debugging, I confirmed that in the troubling case (compression='gzip' and auto_decompress=False) the DeflateBuffer is not used: I verified that this line is never called.

Since the compression part of the payload handling is not being called, the difference must be some kind of bug in the feed_data() handler.

@webknjaz
Member

I'm not sure what's wrong here. What else would you expect from writing raw gzipped bytes to disk?

@asvetlov
Member

You downloaded the raw gzipped content with aiohttp but the uncompressed content with wget.
Gunzip the file downloaded by aiohttp and you'll get the same result as with wget.
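
The point above can be checked with a small sketch. It works on in-memory stand-in bytes rather than the real wheel, so the variable names and contents are assumptions made for illustration. Only hashlib and gzip from the standard library are used.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex sha256 digest, like the sha256sum utility prints."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the file as published (made-up bytes, not the real wheel).
original = b"PK\x03\x04 pretend wheel contents " * 100

# Body sent over the wire when the response carries Content-Encoding: gzip.
on_the_wire = gzip.compress(original)

# With auto_decompress=False the wire bytes are written to disk as-is.
saved_raw = on_the_wire

# Gunzipping the saved file recovers the original bytes.
recovered = gzip.decompress(saved_raw)

checksums_differ_before = sha256_hex(saved_raw) != sha256_hex(original)
checksums_match_after = sha256_hex(recovered) == sha256_hex(original)
```

On the command line, the equivalent check would be to gunzip the aiohttp download and run sha256sum on the result.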

@asvetlov asvetlov added the invalid This doesn't seem right label Aug 10, 2018
@bmbouter
Contributor Author

Thank you for looking at this. Now that I have found the root cause, I can confirm that (for me) all the aiohttp features are working as expected.

The root cause was that the test server was serving .tar.gz files and incorrectly setting content-encoding: gzip even though the tar.gz data is pre-compressed. Here is the response from the server:

[bmbouter@localhost pulp_python]$ curl -I https://repos.fedorapeople.org/pulp/pulp/fixtures/python-pypi/packages/shelf-reader-0.1.tar.gz
HTTP/1.1 200 OK
Date: Mon, 13 Aug 2018 18:55:49 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Last-Modified: Mon, 13 Aug 2018 02:53:10 GMT
ETag: "4a99-57348319d1186"
Accept-Ranges: bytes
Content-Length: 19097
Cache-Control: max-age=1800
Expires: Mon, 13 Aug 2018 19:25:49 GMT
X-GitProject: (null)
AppTime: D=191
AppServer: people02.fedoraproject.org
Content-Encoding: gzip
Content-Type: text/plain; charset=UTF-8

Also this blog post seems to describe this webserver's behavior exactly.
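
As a rough illustration of the root cause described above, the sketch below simulates a server that sends an already-compressed .tar.gz unchanged while labelling it Content-Encoding: gzip. All bytes and names are made up for the example; looks_like_gzip is a hypothetical helper that only checks the two-byte gzip magic number.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(data: bytes) -> bool:
    """Hypothetical helper: True if the data starts with the gzip magic."""
    return data[:2] == GZIP_MAGIC


# Stand-in for a tarball (made-up bytes; a real .tar has its own structure).
tar_bytes = b"pretend tar archive contents " * 50

# The .tar.gz as stored on the server: already gzip-compressed once.
tar_gz_on_disk = gzip.compress(tar_bytes)

# The misconfigured server sends the stored file unchanged but labels it
# Content-Encoding: gzip.
headers = {"Content-Encoding": "gzip"}
body_on_wire = tar_gz_on_disk

# A client that honours the header strips the gzip layer and is left with
# the bare tar, which no longer matches the published .tar.gz.
decoded_by_client = gzip.decompress(body_on_wire)

# A client that skips decoding keeps the .tar.gz bytes intact.
kept_raw = body_on_wire
```

In this simulated setup, skipping decoding is what preserves the published file, because the gzip layer belongs to the file itself rather than to the transfer.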

@asvetlov
Member

Good!

@lock

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019