Large (~13.8kb) HTTP gzip responses aren't decompressed #11252

Open

syeopite opened this issue Sep 28, 2021 · 8 comments
Labels
help wanted (This issue is generally accepted and needs someone to pick it up), topic:stdlib:networking

Comments

@syeopite
Contributor

syeopite commented Sep 28, 2021

Bug Report

If I have this server:

require "http/server"
require "compress/gzip"

server = HTTP::Server.new do |context|
  rand_ = Random.new(1234567899).hex(7071)
  io = IO::Memory.new(rand_)

  Compress::Gzip::Writer.open(context.response) do |gzip|
    IO.copy(io, gzip)
  end

  context.response.headers["content-encoding"] = "gzip"
  context.response.content_type = "text/plain"
end

address = server.bind_tcp 8080
puts "Listening on http://#{address}"
server.listen

And a client requesting it like so:

require "http"
require "json"
require "uri"

client = HTTP::Client.new(URI.parse("http://localhost:8080"))
client.compress = true
results = client.get("/")

puts results.body

I'd expect Crystal to decompress the response automatically and give me the resulting data. While this is the behavior for smaller responses, anything larger than ~13.8kb seems to cause this step to be skipped: the code above just returns gibberish binary data instead of the long hex string.

@ysbaddaden
Contributor

I reported a similar issue to @straight-shoota yesterday where the HTTP::Client response's body is binary. I have yet to determine whether it's gibberish or whether the gzip encoding simply wasn't decompressed. It's very intermittent and happens for small data (1kb) on one specific HTTP domain (api.airtable.com); other domains are fine.

@straight-shoota
Member

straight-shoota commented Sep 28, 2021

The issue here is that you write the content to the HTTP response before setting the Content-Encoding header. That means the content is sent without that header and the client has no idea that it is supposed to decompress it. You can test that with any HTTP client.

It works with smaller content sizes because the data is buffered, so the headers haven't been sent yet by the time you set them.

You have to move the configuration lines setting content encoding and content type above the IO copy.
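
For illustration, here is the handler from the original report with the header assignments moved before the write (a minimal sketch, otherwise unchanged):

require "http/server"
require "compress/gzip"

server = HTTP::Server.new do |context|
  # Set headers first, while they can still be sent to the client.
  context.response.headers["Content-Encoding"] = "gzip"
  context.response.content_type = "text/plain"

  rand_ = Random.new(1234567899).hex(7071)
  io = IO::Memory.new(rand_)

  Compress::Gzip::Writer.open(context.response) do |gzip|
    IO.copy(io, gzip)
  end
end

address = server.bind_tcp 8080
puts "Listening on http://#{address}"
server.listen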

@ysbaddaden This appears to be unrelated to your problem.

@straight-shoota
Member

This is not a bug, but I believe we can improve the developer experience by raising an error on incorrect usage.

Response#content_type= could easily raise an error if headers have already been written (we already have check_headers for that).

This is not as easy for arbitrary headers (such as response.headers["content-encoding"] = "gzip") because the mutation happens in the Headers object, which has no reference to the response and thus can't know whether it still accepts headers.

Maybe we could consider adding set_header and get_header methods to Response which could check for already-written headers, like content_type= does.
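
As a standalone illustration of the proposed behaviour (not the actual stdlib API; the class and method names here are made up), a setter could simply refuse mutation once headers have gone out:

# Illustration only: a header container that raises once headers are flushed,
# instead of silently dropping late mutations.
class GuardedHeaders
  def initialize
    @headers = Hash(String, String).new
    @written = false
  end

  def []=(name : String, value : String)
    raise "Can't set #{name}: headers already written" if @written
    @headers[name] = value
  end

  def mark_written!
    @written = true
  end
end

headers = GuardedHeaders.new
headers["Content-Type"] = "text/plain"
headers.mark_written!
headers["Content-Encoding"] = "gzip" # raises instead of being silently ignored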

@syeopite
Contributor Author

syeopite commented Sep 28, 2021

The issue here is that you write the content to the HTTP response before setting the Content-Encoding header. That means the content is sent without that header and the client has no idea that it is supposed to decompress it. You can test that with any HTTP client.

Oh, thanks! But the server code above was actually an attempt to replicate the issue I was seeing locally, and it looks like I messed that up. Oops 😅. From your explanation, the actual problem I'm hitting might be closer to what @ysbaddaden is experiencing.

But the gist of it is the same: HTTP::Client doesn't seem to decompress correctly. Here's example code that queries the website I was testing against:

require "http"
require "json"
require "uri"

client = HTTP::Client.new(URI.parse("https://www.youtube.com"))
client.compress = true

post_data = {"context" => {"client" => {"hl" => "en", "gl" => "US", "clientName" => "WEB",
                                        "clientVersion" => "2.20210721.00.00", "clientScreen" => "WATCH_FULL_SCREEN"}},
"continuation" => "4qmFsgJIEhhVQ1h1cVNCbEhBRTZYdy15ZUpBMFR1bncaLEVnWjJhV1JsYjNNd0FqZ0JZQUZxQUxnQkFDQUE2Z01JUTJkU1JGRlZSVGs9",
}.to_json

yt_headers = HTTP::Headers{
  "Content-Type"    => "application/json",
  "Accept-Encoding" => "gzip",
}

results = client.post("/youtubei/v1/browse?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8", body: post_data,
  headers: yt_headers)

# Should be a plain string, but instead it's binary data
results.body

This returns gibberish binary data, while replicating the same query with curl returns the correct (decompressed) JSON response.

@straight-shoota
Member

straight-shoota commented Sep 28, 2021

You're actively opting out of the automatic decompression by setting the Accept-Encoding header.
This is documented here: https://crystal-lang.org/api/1.1.1/HTTP/Client.html#compression

You don't need to assign client.compress = true because that's the default. But you must not set a custom Accept-Encoding header or the client won't automatically decompress.
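
For illustration, a minimal variant of the client snippet above without the custom header (an untested sketch; the request body is trimmed to its essentials):

require "http"
require "json"
require "uri"

client = HTTP::Client.new(URI.parse("https://www.youtube.com"))
# compress is true by default, so there's no need to set it explicitly.

post_data = {"context" => {"client" => {"hl" => "en", "gl" => "US", "clientName" => "WEB",
                                        "clientVersion" => "2.20210721.00.00"}}}.to_json

# No Accept-Encoding here: the client negotiates gzip itself and
# transparently decompresses the response body.
yt_headers = HTTP::Headers{"Content-Type" => "application/json"}

results = client.post("/youtubei/v1/browse?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8", body: post_data,
  headers: yt_headers)

puts results.body # decompressed text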

@SamantazFox

You're actively opting out of the automatic decompression by setting the Accept-Encoding header. This is documented here: https://crystal-lang.org/api/1.1.1/HTTP/Client.html#compression

However, that page doesn't explain that passing an Accept-Encoding header disables the automatic decompression!

@straight-shoota
Member

If [...] no `Accept-Encoding` header is explicitly specified, ...

@BrucePerens

BrucePerens commented Nov 12, 2021

I am hitting this or #11354. In the body, the first bytes of the returned string are reported as "\u001F\x8B". I'm a little confused about whether this is reporting an extra zero byte or not. The Content-Encoding header says "gzip". It's intermittent: it does not always occur on the same server response.
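
(Aside: 0x1F 0x8B is the gzip magic number, so a body starting with those bytes is still compressed. A minimal manual-decompression workaround, assuming results is an HTTP::Client::Response whose body was read as a String, could look like this sketch:)

require "compress/gzip"

# Workaround sketch: if the body still starts with the gzip magic bytes,
# decompress it by hand.
body = results.body
bytes = body.to_slice
if bytes.size >= 2 && bytes[0] == 0x1f && bytes[1] == 0x8b
  body = Compress::Gzip::Reader.open(IO::Memory.new(body)) { |gzip| gzip.gets_to_end }
end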

straight-shoota added the help wanted (This issue is generally accepted and needs someone to pick it up) label on Oct 27, 2023