Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --preserve-encoding option to keep response body encoding #294

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

trueb2
Copy link

@trueb2 trueb2 commented Nov 29, 2022

Changes

Add a --download-encoding flag to choose an Accept-Encoding header other than "identity" during a download request. During the download, the "decompress" function that reads the response and writes it to the output will passthrough data in whatever Content-Encoding was received if --download-encoding was specified.

Motivation

When downloading a large file, I want to be able to transfer it compressed and leave it compressed. For example, stream download a 500MiB CSV file in the gzip encoding, saving it to a data.csv.gz file.

jwtrueb@jmbp xh % cargo run --release -- get http://localhost:3000/download/csv/asdf -d --download-encoding gzip
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Type: application/csv
Date: Tue, 29 Nov 2022 23:17:12 GMT
Transfer-Encoding: chunked
Vary: accept-encoding

Downloading to "asdf"
Done. 147B in 0.00177s (80.92KiB/s)
jwtrueb@jmbp xh % gzcat asdf
ts,user_id,device_id,source,pressure,temperature,humidity
2022-11-28T20:13:12.704030+00:00,2,5,1,99999.0,25.3,16.4
2022-11-28T20:14:02.446438+00:00,2,5,1,99699.0,25.1,16.2

@trueb2 trueb2 changed the title Add request and keep encoding for download body flag Add flag to request and keep encoding for a download body Nov 29, 2022
@blyxxyz
Copy link
Collaborator

blyxxyz commented Nov 30, 2022

One reason to use Accept-Encoding: identity is to be able to resume downloads (with --continue). We'd have to figure out how these features interact. Maybe that should be an error message.

This option seems too specific. Maybe we should allow overriding the encoding via a normal header argument and add a --preserve-encoding/--no-decode/--verbatim-body/whatever option? Then you'd be able to run xh -d --preserve-encoding :3000/download/csv/asdf accept-encoding:gzip.

We have to coordinate this with HTTPie, that's where we got the current behavior. Are you interested in making an issue?

@trueb2
Copy link
Author

trueb2 commented Nov 30, 2022

One reason to use Accept-Encoding: identity is to be able to resume downloads (with --continue). We'd have to figure out how these features interact. Maybe that should be an error message.

This option seems too specific. Maybe we should allow overriding the encoding via a normal header argument and add a --preserve-encoding/--no-decode/--verbatim-body/whatever option? Then you'd be able to run xh -d --preserve-encoding :3000/download/csv/asdf accept-encoding:gzip.

We have to coordinate this with HTTPie, that's where we got the current behavior. Are you interested in making an issue?

Sure whatever works. I couldn't figure out how to get HTTPie to do what I wanted as easily as xh. Looks like it wouldn't be too bad to switch this to a --preseve-encoding and a manual header. That still seems to be in the spirit of the CLI.

I don't really care to use --continue for downloads, so I did not consider how that interacts with anything other than identity. I think an error message makes sense if both --continue and --preserve-encoding are used.

I could just implement that instead of making an issue. Is it a problem to have features in xh that aren't in HTTPie?

@ducaale
Copy link
Owner

ducaale commented Nov 30, 2022

Is it a problem to have features in xh that aren't in HTTPie?

We are open to adding features not supported in HTTPie. But we don't want to end up in a situation where the same feature is implemented differently in xh and HTTPie. Many xh users are coming from HTTPie, and some of them might even rely on HTTPie docs while using xh.

Let's either open a new issue or maybe leave a comment on httpie/cli#178, and then we can continue with --preserve-encoding/--no-decode/--verbatim-body/something else.

@trueb2
Copy link
Author

trueb2 commented Dec 5, 2022

@ducaale That is quite the old issue you dug up! I see it is related to #130

I think we could comment on the xh approach in the httpie issue, then implement an approach. It seems like this isn't a very hot feature request, and I don't want it to end up as another issue that goes 9 years with an implementation as easy as this. What is your preferred implementation/flag?

@ducaale
Copy link
Owner

ducaale commented Dec 13, 2022

We can go with the proposed implementation in #294 (comment) and use --preserve-encoding (cURL uses --raw but that option is already used in HTTPie/xh).

Note that we currently don't allow the user to set accept-encoding in download mode (see #141). We can roll back that behavior and error when --continue is used with a custom accept-encoding.

tests/cli.rs Outdated Show resolved Hide resolved
tests/cli.rs Outdated Show resolved Hide resolved
src/cli.rs Outdated Show resolved Hide resolved
@trueb2 trueb2 changed the title Add flag to request and keep encoding for a download body Add --preserve-encoding option to keep response body encoding Jan 3, 2023
@@ -501,6 +509,7 @@ fn run(args: Cli) -> Result<i32> {
printer.print_request_body(&mut request)?;
}

let preserve_encoding = args.preserve_encoding;
Copy link
Owner

@ducaale ducaale Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can you move this line to after L503 and replace all instances of args.preserve_encoding with preserve_encoding?

let response_charset = args.response_charset;
let response_mime = args.response_mime.as_deref();
let preserve_encoding = args.preserve_encoding;

Comment on lines +366 to +369
} else {
request_builder =
request_builder.header(ACCEPT_ENCODING, HeaderValue::from_static("identity"));
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the else branching so that ACCEPT_ENCODING is set to a default value when args.resume && encoding != HeaderValue::from_static("identity") is false?

Suggested change
} else {
request_builder =
request_builder.header(ACCEPT_ENCODING, HeaderValue::from_static("identity"));
}
}
request_builder =
request_builder.header(ACCEPT_ENCODING, HeaderValue::from_static("identity"));

This doesn't matter that much since it will be overridden later with header values provided by the user. However, reducing possible branches helps with any potential debugging needed in the future.

Comment on lines +437 to +442
let (may_decode, compression_type) = if !preserve_encoding {
let compression_type = get_compression_type(response.headers());
(true, compression_type)
} else {
(false, None)
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use the same approach you took in download.rs?

Suggested change
let (may_decode, compression_type) = if !preserve_encoding {
let compression_type = get_compression_type(response.headers());
(true, compression_type)
} else {
(false, None)
};
let compression_type = if !preserve_encoding {
get_compression_type(response.headers())
} else {
None
};

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it is possible since the logic below assumes that the content has been decompressed before colorizing, formatting, or decoding. We wouldn't want that logic to run on potentially encoded data. The other branches call code like decode_stream to interpret UTF8 or other format types.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Although we have some logic for binary detection (see decode_blob and BinaryGuard), they might not cover all cases.

At the moment, plain text is treated as binary if preserve_encoding is true. Can you handle that case?

// TODO: rename preserve_encoding to skip_decompression
let (mut body, is_binary) = if preserve_encoding {
    let compression_type = get_compression_type(response.headers());
    let body = decompress(response, None);
    (body, compression_type.is_some())
} else {
    let compression_type = get_compression_type(response.headers());
    let body = decompress(response, compression_type);
    (body, false)
};

Also, the term decode is a bit overloaded in this module. Can we rename this function's preserve_encoding parameter to skip_decompression?

Comment on lines +162 to +168
/// During download, keep the raw encoding of the body. Intended for use with download or
/// to debug an encoded response body.
///
/// For example, set Accept-Encoding: gzip and use --preserve-encoding to skip decompression.
#[clap(long)]
pub preserve_encoding: bool,

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's shorten the first line to fit within ~80 chars. Any extended help can go in subsequent lines.

(to see the difference, try cargo run -- --help and cargo run -- help)

/// Skip decompressing the response body
///
/// longer help

Also, should we move this option to after response_mime to improve discoverability?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants