Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace reqwest's auto-decompression with a custom one #241

Merged
merged 13 commits into from
Apr 16, 2022

Conversation

ducaale
Copy link
Owner

@ducaale ducaale commented Mar 13, 2022

Doing so will allow us to preserve Content-Encoding and Content-Length for compressed responses. This also opens the door for handling responses from servers that misinterpret the deflate encoding as deflate instead of deflate + zlib headers e.g xh example.org accept-encoding:deflate.

One side effect from this PR is that the --download flag no longer decompresses responses but since we are explicitly sending accept-identity: identity and HTTPie doesn't handle it perfectly either, maybe we can ignore it?

$ http httpbin.org/gzip -d

HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Encoding: gzip
Content-Length: 216
Content-Type: application/json
Date: Sun, 13 Mar 2022 14:58:59 GMT
Server: gunicorn/19.9.0

Downloading 216.00 B to "gzip.json"
 |
http: error: Incomplete download: size=216; downloaded=279

Done. 279.00 B in 0.00061s (448.50 kB/s)

Fixes #210

@blyxxyz
Copy link
Collaborator

blyxxyz commented Mar 13, 2022

One side effect from this PR is that the --download flag no longer decompresses responses

Neither do wget, curl -O, and aria2c, so maybe it's acceptable. But I don't feel I understand it well enough yet.

@ducaale
Copy link
Owner Author

ducaale commented Mar 19, 2022

I need to figure out how to implement a fallback for decompress_stream so responses with incorrect content-encoding can be handled e.g normal response with content-encoding: gzip.

Edit: This doesn't seem to be a trivial thing so maybe we should look into it in the future.

@ducaale ducaale requested a review from blyxxyz March 26, 2022 11:57
src/printer.rs Outdated Show resolved Hide resolved
@@ -423,14 +428,16 @@ impl Printer {

pub fn print_response_body(
&mut self,
mut response: Response,
response: &mut Response,
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, this is just the change I needed in #240 i.e not consuming Response so response_meta can be accessed later.

@ducaale ducaale requested a review from blyxxyz April 10, 2022 10:04
Copy link
Collaborator

@blyxxyz blyxxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great! Just a few small suggestions left. Sorry for taking a while to review.

src/printer.rs Outdated Show resolved Hide resolved
src/printer.rs Outdated Show resolved Hide resolved
src/printer.rs Outdated
let encoding = encoding.or_else(|| get_charset(&response));
let encoding = encoding.or_else(|| get_charset(response));
let compression_type = get_compression_type(response.headers());
let mut body = decompress_stream(response, compression_type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error for failing to decode changes from error decoding response body: unexpected end of file to failed to fill whole buffer, which is a little worse.

HTTPie's is ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check')), which is messy but informative.

I don't think this is a blocking problem and I don't know how to solve it, but I figured I'd make a note.

Copy link
Owner Author

@ducaale ducaale Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did notice the change in the error message, but I can't remember the steps to reproduce it. Did you set up a server that returns the wrong content-encoding?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, with flask:

from flask import Flask

app = Flask(__name__)

@app.route('/badgz')
def badgz():
    return 'bad', {'Content-Encoding': 'gzip'}

Copy link
Owner Author

@ducaale ducaale Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. So the piece I was missing is that small responses lead to unexpected end of file failed to fill whole buffer while anything else will return the underlying error i.e corrupt deflate stream or invalid gzip header.

Slightly related, I was thinking of improving decompression error messages by modifying decompress() to return a Decompress/Decode wrapper that implements Read and prepends errors with error decoding response body. However, only io errors can be returned from the Read trait's methods so that is a dead end (I think).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a custom message to an io::Error.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I will try using that now.

tests/cli.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@blyxxyz blyxxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call no_gzip and friends in case the reqwest features get re-enabled somehow.


One side effect from this PR is that the --download flag no longer decompresses responses

How hard would this be to fix? Could we wrap decompress_stream around pb.wrap_read(response) or is there another obstacle I'm missing?

We could even make copy_largebuf return the number of bytes actually written, and print that as an extra message if it doesn't match the progress bar. (But that might be overkill for something the server shouldn't do to begin with.)

tests/cli.rs Outdated Show resolved Hide resolved
@ducaale ducaale force-pushed the disable-reqwest-decompression branch from ac17d1a to cdf9591 Compare April 15, 2022 15:11
@ducaale
Copy link
Owner Author

ducaale commented Apr 15, 2022

We could even make copy_largebuf return the number of bytes actually written, and print that as an extra message if it doesn't match the progress bar. (But that might be overkill for something the server shouldn't do to begin with.)

I will be leaving out this part since I am not sure what is the best way to display the extra information.

@ducaale ducaale requested a review from blyxxyz April 15, 2022 16:48
@ducaale ducaale force-pushed the disable-reqwest-decompression branch from ef4aa70 to 9a6f6fe Compare April 15, 2022 18:49
src/utils.rs Outdated
match self {
Decoder::PlainText(decoder) => decoder.read(buf),
Decoder::Gzip(decoder) => decoder.read(buf).map_err(|e| {
io::Error::new(e.kind(), format!("error decoding response body: {}", e))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mention the compression type here?

src/utils.rs Outdated
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
match self {
Decoder::PlainText(decoder) => decoder.read(buf),
Decoder::Gzip(decoder) => decoder.read(buf).map_err(|e| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To tell the difference between a decompression error and an I/O error this scheme would be possible:

  • Add another wrapper between GzDecoder and R
  • If R returns an error, set a bool field in that wrapper
  • If GzDecoder returns an error, use get_ref() to check that field
    • If it's false, the error happened while decompressing

I don't know if that's worth the effort.

@ducaale ducaale force-pushed the disable-reqwest-decompression branch from a9763f1 to 7d1a47f Compare April 16, 2022 16:12
Copy link
Collaborator

@blyxxyz blyxxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful!

@ducaale ducaale merged commit 5f2e7a3 into develop Apr 16, 2022
@ducaale ducaale deleted the disable-reqwest-decompression branch April 16, 2022 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants