
When compressing responses with deflate encoding, output doesn't include Zlib headers. #4506

Closed
@PaulJuliusMartinez

Description

Long story short

The wbits argument to zlib.compressobj in this method should be zlib.MAX_WBITS when using the deflate encoding, rather than -zlib.MAX_WBITS.
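A minimal sketch of the proposed change, assuming a helper that picks `wbits` from the negotiated content encoding. The names `make_compressor` and `compress_body` are hypothetical and are not aiohttp's actual code; only the `zlib` calls are real.

```python
import zlib


def make_compressor(encoding: str):
    """Hypothetical helper: choose wbits from the Content-Encoding token."""
    if encoding == "gzip":
        # 16 + window size selects a gzip header and trailer.
        wbits = 16 + zlib.MAX_WBITS
    elif encoding == "deflate":
        # Positive wbits selects the zlib wrapper (RFC 1950),
        # which is what HTTP calls "deflate".
        wbits = zlib.MAX_WBITS
    else:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    return zlib.compressobj(wbits=wbits)


def compress_body(encoding: str, body: bytes) -> bytes:
    """Hypothetical helper: compress a complete body in one call."""
    compressor = make_compressor(encoding)
    return compressor.compress(body) + compressor.flush()
```

With this choice, `zlib.decompress(compress_body("deflate", b"hello"))` succeeds using the default `wbits`, which is what a spec-conforming client would attempt.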

Expected behaviour

The deflate encoding as specified in RFC2616 states the following:

   deflate
        The "zlib" format defined in RFC 1950 [31] in combination with
        the "deflate" compression mechanism described in RFC 1951 [29].

The zlib website attempts to clear up the confusion:

What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?
"gzip" is the gzip format, and "deflate" is the zlib format. They should probably have called the second one "zlib" instead to avoid confusion with the raw deflate compressed data format. While the HTTP 1.1 RFC 2616 correctly points to the zlib specification in RFC 1950 for the "deflate" transfer encoding, there have been reports of servers and browsers that incorrectly produce or expect raw deflate data per the deflate specification in RFC 1951, most notably Microsoft. So even though the "deflate" transfer encoding using the zlib format would be the more efficient approach (and in fact exactly what the zlib format was designed for), using the "gzip" transfer encoding is probably more reliable due to an unfortunate choice of name on the part of the HTTP 1.1 authors.

RFC7230 additionally notes:

Note: Some non-conformant implementations send the "deflate" compressed data without the zlib wrapper.

The documentation for Python zlib library explains how to produce different outputs:

zlib.compressobj(level=-1, method=DEFLATED, wbits=MAX_WBITS, memLevel=DEF_MEM_LEVEL, strategy=Z_DEFAULT_STRATEGY[, zdict])

...

The wbits argument controls the size of the history buffer (or the “window size”) used when compressing data, and whether a header and trailer is included in the output. It can take several ranges of values, defaulting to 15 (MAX_WBITS):

  • +9 to +15: The base-two logarithm of the window size, which therefore ranges between 512 and 32768. Larger values produce better compression at the expense of greater memory usage. The resulting output will include a zlib-specific header and trailer.

  • −9 to −15: Uses the absolute value of wbits as the window size logarithm, while producing a raw output stream with no header or trailing checksum.

  • +25 to +31 = 16 + (9 to 15): Uses the low 4 bits of the value as the window size logarithm, while including a basic gzip header and trailing checksum in the output.

The key point is the difference between the +9 to +15 range and the −9 to −15 range: a negative value for wbits produces raw output with no header or trailer, while a positive value produces the zlib-specific header and trailer, which is what the HTTP spec actually requires.
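To make the three wbits ranges concrete, the following sketch compresses the same input three ways and compares the outputs. The helper name `compress_with` is an assumption for illustration. The comparisons rely on each wrapper enclosing the same raw deflate stream when all other parameters are equal.

```python
import zlib


def compress_with(wbits: int, data: bytes) -> bytes:
    """Illustrative helper: one-shot compression with an explicit wbits."""
    compressor = zlib.compressobj(wbits=wbits)
    return compressor.compress(data) + compressor.flush()


data = b"hello"
zlib_out = compress_with(zlib.MAX_WBITS, data)        # zlib header + Adler-32 trailer
raw_out = compress_with(-zlib.MAX_WBITS, data)        # no header, no trailer
gzip_out = compress_with(16 + zlib.MAX_WBITS, data)   # gzip header + CRC-32 trailer

# The zlib format is a 2-byte header, the raw deflate stream,
# and a 4-byte big-endian Adler-32 checksum of the uncompressed data.
zlib_wraps_raw = zlib_out[2:-4] == raw_out
trailer_is_adler32 = int.from_bytes(zlib_out[-4:], "big") == zlib.adler32(data)

# gzip output starts with the magic bytes 0x1f 0x8b.
gzip_has_magic = gzip_out[:2] == b"\x1f\x8b"
```

The low four bits of the first zlib header byte are the compression method (8 for deflate), so a raw stream usually fails a zlib header check immediately.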

Actual behaviour

aiohttp returns raw deflate output, which spec-conforming clients cannot parse. As the updated HTTP RFC and the zlib website note, many servers return data in this format, so many clients handle it gracefully anyway: notably the major browsers, curl (via a workaround), and Python's urllib3 (via an explicit try/except).
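The kind of fallback those tolerant clients use can be sketched as follows. This is an illustration of the general approach, not the actual code of urllib3 or any other client, and `decode_deflate` is a hypothetical name.

```python
import zlib


def decode_deflate(body: bytes) -> bytes:
    """Hypothetical tolerant decoder for a "deflate" response body."""
    try:
        # Spec-conforming case: zlib-wrapped data (RFC 1950).
        return zlib.decompress(body)
    except zlib.error:
        # Non-conforming case: raw deflate data (RFC 1951), no wrapper.
        return zlib.decompress(body, -zlib.MAX_WBITS)
```

A client without the `except` branch, as described for Ruby's built-in HTTP library below, fails on the raw stream.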

Unfortunately, many Ruby clients cannot process the response, because Ruby's built-in HTTP library does not fall back to interpreting the data as raw deflate. This issue has been open for over four years.

Steps to reproduce

The following script demonstrates the difference between the two encodings:

import zlib

zlib_compressor = zlib.compressobj(wbits=zlib.MAX_WBITS)
zlib_deflate_output = zlib_compressor.compress(b'hello')
zlib_deflate_output += zlib_compressor.flush()

zlib.decompress(zlib_deflate_output)
# b'hello'

raw_compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_deflate_output = raw_compressor.compress(b'hello')
raw_deflate_output += raw_compressor.flush()

zlib.decompress(raw_deflate_output)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# zlib.error: Error -3 while decompressing data: incorrect header check
