HTTP: respect gzip in serialization #4957
  end

  def self.serialize_headers_and_string_body(io, headers, body)
    if headers.includes_word?("Content-Encoding", "gzip")
We should compare with == instead of includes_word? here, to match what we do when decompressing (we only accept "gzip" exactly). This is because Content-Encoding can represent multiple encodings applied in sequence. We only want to support this when the only encoding is gzip, not just when gzip is part of the encoding sequence. This is very rare in practice, but it should be taken into consideration.
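The difference between the two checks can be seen in a small sketch. It assumes that `HTTP::Headers#includes_word?` matches a comma-separated token anywhere in the header value, as in the diff above; the header values are made up for illustration.

```crystal
require "http/headers"

# Hypothetical header sets, for illustration only.
only_gzip = HTTP::Headers{"Content-Encoding" => "gzip"}
chained = HTTP::Headers{"Content-Encoding" => "deflate, gzip"}

# Token match: true whenever gzip appears anywhere in the sequence.
word_only = only_gzip.includes_word?("Content-Encoding", "gzip")
word_chained = chained.includes_word?("Content-Encoding", "gzip")

# Exact match: true only when gzip is the sole encoding.
exact_only = only_gzip["Content-Encoding"]? == "gzip"
exact_chained = chained["Content-Encoding"]? == "gzip"

puts "includes_word?: #{word_only} / #{word_chained}"
puts "==: #{exact_only} / #{exact_chained}"
```

With the chained value, only the exact comparison rejects the header, which is the behavior asked for here.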
Nice! You are right. I'll fix it. Thanks.
    if headers.includes_word?("Content-Encoding", "gzip")
      body = IO::Memory.new.tap do |buf|
        gzip = Gzip::Writer.new(buf)
        gzip.print body
I think this is a bit confusing with the two bodies and the tap, could you clean this up?
Yup. I ended up with two candidates for this.
body = IO::Memory.new.tap do |buf|
  Gzip::Writer.open buf, &.print body
end
and
buf = IO::Memory.new
Gzip::Writer.open buf, &.print body
body = buf
I like the latter. Although the code is a bit redundant, it's easy to read because there are no nested bodies. Right?
I would format it a bit nicer though, the whitespace is very confusing here.
buf = IO::Memory.new
Gzip::Writer.open(buf, &.print(body))
body = buf
I think there's actually a bunch of cases where we use
  it "gzip body if has 'gzip' in Content-Encoding header" do
    gzipped_body = String.build do |io|
      Gzip::Writer.new(io).tap { |io| io.print "Hello"; io.close }
You can do:
Gzip::Writer.open io, &.print("Hello")
I'm not sure this is OK. What if I send an already compressed gzip to HTTP::Client and add a
asterite:
Yes, I understand what you are worried about. Indeed, there can be a case where the HTTP server already has a compressed body and set
We must take care of it.
@maiha raven.cr does exactly that.
I realized that the essence of the problem is that there would be a contradiction between
So it'd be better
Here is a sketch for https://github.com/crystal-lang/crystal/blob/master/src/http/common.cr#L45-L48
  encoding = headers["Content-Encoding"]?
  case encoding
  when "gzip"
    body = Gzip::Reader.new(body, sync_close: true)
+   headers.delete("Content-Encoding")
Thoughts?
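The idea can be tried outside the standard library with a small sketch. The helper `decode_body` below is hypothetical and is not the actual code in `src/http/common.cr`. It also uses `Compress::Gzip`, the name under which newer Crystal releases ship the `Gzip` module referenced in this thread.

```crystal
require "compress/gzip"
require "http/headers"

# Hypothetical helper: returns a reader over the decoded body and drops
# the Content-Encoding header so that headers and body stay consistent.
def decode_body(headers : HTTP::Headers, body : IO) : IO
  case headers["Content-Encoding"]?
  when "gzip"
    headers.delete("Content-Encoding")
    Compress::Gzip::Reader.new(body, sync_close: true)
  else
    body
  end
end

# Build a gzipped payload to stand in for a server response body.
raw = IO::Memory.new
Compress::Gzip::Writer.open(raw, &.print("Hello"))
raw.rewind

headers = HTTP::Headers{"Content-Encoding" => "gzip"}
decoded = decode_body(headers, raw).gets_to_end

puts decoded
puts headers.has_key?("Content-Encoding")
```

After decoding, the headers no longer claim a gzip body, so serializing the response again would not announce an encoding that the body does not have.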
I like the idea of the headers being consistent with the way the HTTP body appears to the reader, instead of being the original HTTP headers. Although keeping the original HTTP headers around would probably be a good idea. It has the possibility of being confusing though with 2 sets of HTTP headers. It should be well-documented.
Go seems to remove the header but sets a boolean in the response indicating that this happened. Check here https://golang.org/pkg/net/http/ for
Go doesn't automatically compress content, and neither should we.
Thinking about it further, we already have the
Yup, I agree that we should remove the headers but should not compress the body in
Can we either simply force-push this PR with the new implementation, or close this now? We tend to accumulate a lot of "resolved" issues and I'd like to be more proactive with closing.
I see. I'll close this first and create another PR when it's ready. Thanks.
This allows HTTP::Client::Response to provide transparent transformations between to_io and from_io with respect to gzip.
Case of Content-Encoding: gzip
Assume that an API server returned an HTTP response with a binary (gzipped data) body.
from_io to_io from_io text Invalid gzip header
to_io can't restore the original response data.
This PR
to_io generates gzipped binary when Content-Encoding: gzip, otherwise generates text in the same way as before.
from_io binary to_io binary from_io binary binary to_io binary
Thanks.