Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic decompression does not remove related response headers #5182

Closed
michael-o opened this issue Apr 4, 2020 · 18 comments
Closed

Automatic decompression does not remove related response headers #5182

michael-o opened this issue Apr 4, 2020 · 18 comments

Comments

@michael-o
Copy link
Contributor

@michael-o michael-o commented Apr 4, 2020

This comes from https://stackoverflow.com/a/61011806/696632.

Running on:

curl 7.69.1 (amd64-portbld-freebsd12.1) libcurl/7.69.1 OpenSSL/1.1.1e zlib/1.2.11 nghttp2/1.40.0
Release-Date: 2020-03-11
Protocols: file http https smtp smtps
Features: AsynchDNS GSS-API HTTP2 HTTPS-proxy Kerberos Largefile libz SPNEGO SSL UnixSockets

Consider the following:

$ curl 'https://deblndw011x.ad001.siemens.net/~osipovmi/random.txt'  --verbose -s -H "Accept-Encoding: gzip" --compressed | head
* Uses proxy env variable NO_PROXY == 'localhost .siemens.net .siemens.com .siemens.de'
* Connected to deblndw011x.ad001.siemens.net (147.54.64.17) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs/
...
*  SSL certificate verify ok.
} [5 bytes data]
> GET /~osipovmi/random.txt HTTP/1.1
> Host: deblndw011x.ad001.siemens.net
> User-Agent: curl/7.69.1
> Accept: */*
> Accept-Encoding: gzip
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Sat, 04 Apr 2020 12:17:57 GMT
< Server: Apache/2.4.41 (FreeBSD) OpenSSL/1.1.1e-freebsd PHP/7.2.29 SVN/1.10.6 mod_auth_gssapi/1.6.1 mod_wsgi/4.7.0 Python/3.7
< X-Frame-Options: SAMEORIGIN
< Last-Modified: Sat, 04 Apr 2020 12:01:06 GMT
< ETag: "31e6c-5a275c9efbb51-gzip"
< Accept-Ranges: bytes
< Vary: Accept-Encoding
< Content-Encoding: gzip
< Content-Length: 56426
< Content-Type: text/plain
<
{ [10 bytes data]
Quasi eos necessitatibus corrupti enim illo. Exercitationem quaerat eligendi dignissimos praesentium cumque cum. Quibusdam nulla architecto iusto et velit tempora vel. Reprehenderit et et minima nihil. Est nobis maiores voluptatibus veritatis accusamus autem quo. Maiores voluptate reiciendis eum.

Architecto sed eos totam at unde aspernatur reprehenderit. Aut consectetur ut est necessitatibus voluptatem eum voluptas dolores. Autem quis qui eaque.

A consequuntur sapiente illum. Quam quis et deleniti qui vel cumque in harum. Minima hic reprehenderit autem quibusdam error magnam quam quod.

Veritatis esse molestiae quo. Suscipit quos nihil et. Sed rerum alias ut beatae doloremque rerum. Autem ut magnam id. Rem ut sit veniam praesentium.

Tempora laborum optio illo perspiciatis amet. Nemo natus facilis et exercitationem. Recusandae quasi sunt consequatur et aut quia.

* Failed writing body (0 != 9405)
 14 56426   14  8106    0     0   129k      0 --:--:-- --:--:-- --:--:--  129k
* Closing connection 0
} [5 bytes data]
* TLSv1.3 (OUT), TLS alert, close notify (256):
} [2 bytes data]
curl: (23) Failed writing body (0 != 9405)

What happens here:

  1. I do request HTTPd to compress the random.txt which is generated by Text::Lorem.
  2. I request curl to automatically decompress
  3. The headers related to (gzip) compression are retained although the client receives completely decompressed content. So the compression is not visible for the client. Apache HttpClient removes them.

I'd expect curl to do the same. (a bug?)

WDYT?

@bagder
Copy link
Member

@bagder bagder commented Apr 4, 2020

the compression is not visible for the client

Isn't curl the client? Who's the client that needs this to "be visible" ?

I'd expect curl to do the same

curl has no logic to ever remove headers from the output and it has never done this. I wouldn't call it a bug.

It's like when you have curl handle chunked-encoding, the chunked headers are still in the output even though curl "decoded" the body....

@bagder bagder added the HTTP label Apr 4, 2020
@michael-o
Copy link
Contributor Author

@michael-o michael-o commented Apr 4, 2020

the compression is not visible for the client

Isn't curl the client? Who's the client that needs this to "be visible" ?

My bad, this should have been "consumer which uses curl".

I'd expect curl to do the same

curl has no logic to ever remove headers from the output and it has never done this. I wouldn't call it a bug.

The problem is that if you compare the content length header value and the real value on disk, they do not match. Some consumers might trip here because they will consider this as a transport error.

It's like when you have curl handle chunked-encoding, the chunked headers are still in the output even though curl "decoded" the body....

That is true, but no transformation happens here and no content length is provided which might trick the client.

@bagder
Copy link
Member

@bagder bagder commented Apr 4, 2020

no transformation happens here and not content length is provided which might trick the client.

The headers and the body don't match, which is my point. The headers would imply something that the body isn't syntactically correct for. Very similar to the compression case. I think we can come up with more examples. curl doesn't ever inhibit or "filter" individual headers. Headers are provided exactly as sent from the server.

@accountantM
Copy link

@accountantM accountantM commented Apr 4, 2020

The problem is that if you compare the content length header value and the real value on disk, they do not match. Some consumers might trip here because they will consider this as a transport error.

The Content-Length header is not the size of the decompressed response on the desk, it's just a header sent from the server, and as a consumer of an httpclient I expect to see my headers when I say something like myHttpClient.showMeAllHeaders(), I don't expect my httpClient to remove them or filter them. Or at least give me an access to the raw headers received from the server :(

The size of the response on disk after decompression is something else and the consumer is aware that it has nothing to do with the http response headers.

@michael-o
Copy link
Contributor Author

@michael-o michael-o commented Apr 5, 2020

no transformation happens here and not content length is provided which might trick the client.

The headers and the body don't match, which is my point. The headers would imply something that the body isn't syntactically correct for. Very similar to the compression case. I think we can come up with more examples. curl doesn't ever inhibit or "filter" individual headers. Headers are provided exactly as sent from the server.

I don't think that content encoding and transfer encoding can be compared. TE is a inheritable part of HTTP. You'd never expose the chunked stream to a consumer, would you? But CE is perfactly valid because the client asks for it with Accept-Encoding. You cannot ask for transfer encoding or a specific content length.

@bagder
Copy link
Member

@bagder bagder commented Apr 5, 2020

It doesn't matter if you don't think they compare, this is how curl works and the delivery of the headers exactly verbatim as the server sent them was always intended. Changing this behavior would be a modified behavior that would need to be opted in by users.

@michael-o
Copy link
Contributor Author

@michael-o michael-o commented Apr 5, 2020

It doesn't matter if you don't think they compare, this is how curl works and the delivery of the headers exactly verbatim as the server sent them was always intended. Changing this behavior would be a modified behavior that would need to be opted in by users.

Don't they do that by supplying --compressed?

@bagder
Copy link
Member

@bagder bagder commented Apr 5, 2020

Request a compressed response using one of the algorithms curl supports, and save the uncompressed document.

There's not a word about stripping or updating headers there and since we haven't done that for the 17 years we've supported that command line option so far, I think we can say that the current behavior is the established that we can't break without careful consideration.

@bagder
Copy link
Member

@bagder bagder commented Apr 8, 2020

This is working as intended. Closing.

@bagder bagder closed this as completed Apr 8, 2020
@michael-o
Copy link
Contributor Author

@michael-o michael-o commented Apr 8, 2020

This is working as intended. Closing.

I concur, at least documentation should be updated for this.

@jzakrzewski
Copy link
Contributor

@jzakrzewski jzakrzewski commented Apr 9, 2020

I concur, at least documentation should be updated for this.

Why? The documentation should mention any modification done to the headers (and there's none). It's kinda reasonable to assume no other modifications than specified. Curl promises to only to decode the body.

@bagder
Copy link
Member

@bagder bagder commented Apr 9, 2020

at least documentation should be updated for this.

Please suggest!

@michael-o
Copy link
Contributor Author

@michael-o michael-o commented Apr 9, 2020

I concur, at least documentation should be updated for this.

Why? The documentation should mention any modification done to the headers (and there's none). It's kinda reasonable to assume no other modifications than specified. Curl promises to only to decode the body.

The modification happens on the body.

@michael-o
Copy link
Contributor Author

@michael-o michael-o commented Apr 9, 2020

at least documentation should be updated for this.

Please suggest!

Working on it.

@jzakrzewski
Copy link
Contributor

@jzakrzewski jzakrzewski commented Apr 9, 2020

The modification happens on the body.

Yep. And is documented to happen under certain conditions. There is nothing documented to happen with the incoming headers.

But I guess if there's still someone confused, it shouldn't hurt anybody to add "headers remain untouched" line to the documentation.

@michael-o
Copy link
Contributor Author

@michael-o michael-o commented Apr 9, 2020

Here is my proposal:

(HTTP) Request a compressed response using one of the algorithms curl supports, and save the uncompressed resource. The response headers Content-Encoding and Content-Length will not correspond to the uncompressed resource in such a case. If this option is used and the server sends an unsupported encoding, curl will report an error.

@bagder
Copy link
Member

@bagder bagder commented Apr 9, 2020

This is pretty clear, but the way it is written makes the reader maybe think the headers are updated in other cases and not just for this. How about...

Associated received response headers may of course not match the processed, uncompressed body, in regards to content size etc.

@michael-o
Copy link
Contributor Author

@michael-o michael-o commented Apr 9, 2020

This is pretty clear, but the way it is written makes the reader maybe think the headers are updated in other cases and not just for this. How about...

Associated received response headers may of course not match the processed, uncompressed body, in regards to content size etc.

Fully acceptable.

bagder added a commit that referenced this issue Apr 11, 2020
bagder added a commit that referenced this issue Apr 11, 2020
bagder added a commit that referenced this issue Apr 12, 2020
Suggested-by: Michael Osipov
Assisted-by: Jay Satiro
Bug: #5182 (comment)
Closes #5217
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants