Support overwriting response's mime and charset #184

ducaale · 2021-10-24T14:18:46Z

Ticks off another item from #4.

blyxxyz

HTTPie crashes (quickly) if it can't parse the encoding.
HTTPie recognizes some encoding names encoding_rs doesn't, notably utf16. (It has to be spelled utf-16.)

The first one shouldn't be too hard: just change the Args field to response_charset: Option<Encoding> and do the parsing earlier, so a bad value is rejected as an argument error.
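
A minimal sketch of what "parsing earlier" could look like, using only the standard library. `Charset`, `parse_charset` and `parse_args` are hypothetical stand-ins (the real field would hold an encoding_rs encoding and be filled in by the argument parser); the only point is that an unknown label is rejected while building `Args`, before any request is made.

```rust
/// Hypothetical stand-in for `&'static encoding_rs::Encoding`.
#[derive(Debug, PartialEq)]
enum Charset {
    Utf8,
    Utf16Le,
}

/// Hypothetical stand-in for the parsed command-line arguments.
#[derive(Debug, PartialEq)]
struct Args {
    response_charset: Option<Charset>,
}

/// Toy label parser: only a few labels, just enough to show the control flow.
fn parse_charset(label: &str) -> Result<Charset, String> {
    match label.to_ascii_lowercase().as_str() {
        "utf-8" | "utf8" => Ok(Charset::Utf8),
        "utf-16" | "utf-16le" => Ok(Charset::Utf16Le),
        _ => Err(format!("{} is not a supported encoding", label)),
    }
}

/// Validate the raw flag value while building `Args`, so a bad label
/// surfaces as an argument error instead of failing during printing.
fn parse_args(raw_charset: Option<&str>) -> Result<Args, String> {
    let response_charset = raw_charset.map(parse_charset).transpose()?;
    Ok(Args { response_charset })
}
```

With this shape, `parse_args(Some("utf32"))` returns an `Err` immediately, and the printer only ever sees an already-validated `Option<Charset>`.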

The second one is tricky:

HTTPie defers to Python (this table); we defer to encoding-rs (this table).
encoding-rs doesn't support all the encodings Python does. That's too bad if you find a server that serves up UTF-7 but there's not much we can do about it.
Python normalizes the names. It seems to ignore case as well as most non-alphanumeric ASCII characters (like - and * and \x7F, but not . for some reason, and not unicode characters). So --response-charset=~~~~UtF////16@@ just works.
Python supports some aliases we don't (and vice versa, but that's less important). For example u8/u16 for UTF-8/UTF-16.

I think we can get quite far by first normalizing it by removing all ASCII non-alphanumeric characters except for _, - and : (lowercasing isn't necessary, encoding-rs does that) and then:

Try it as-is.
Check if it's one of the aliases u8/u16/u32/utf.
Try it with all - and _ removed.
Try it with all _ turned into -.
Try it with all - turned into _.
Try it with all - and _ removed and a - inserted before the first digit.

If all that fails, we can put a link to the table in the error message.
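
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: `normalize`, `remaining_candidates`, `resolve` and `toy_lookup` are hypothetical names, and the lookup is passed in as a closure so the example compiles without encoding_rs (in practice it would presumably wrap something like `Encoding::for_label`, which takes the label as bytes).

```rust
/// Drop every ASCII character that is not alphanumeric, `_`, `-` or `:`.
/// Non-ASCII characters are kept, so they still make the lookup fail later.
fn normalize(label: &str) -> String {
    label
        .chars()
        .filter(|&c| !c.is_ascii() || c.is_ascii_alphanumeric() || c == '_' || c == '-' || c == ':')
        .collect()
}

/// Labels to try after the as-is and alias steps, in the order listed above.
fn remaining_candidates(normalized: &str) -> Vec<String> {
    let stripped: String = normalized.chars().filter(|&c| c != '-' && c != '_').collect();
    // Insert a `-` before the first ASCII digit, e.g. "utf16" -> "utf-16".
    let dashed = match stripped.find(|c: char| c.is_ascii_digit()) {
        Some(i) => format!("{}-{}", &stripped[..i], &stripped[i..]),
        None => stripped.clone(),
    };
    vec![
        stripped,
        normalized.replace('_', "-"),
        normalized.replace('-', "_"),
        dashed,
    ]
}

/// Resolve a user-supplied label with the given lookup function.
fn resolve<T>(label: &str, lookup: impl Fn(&str) -> Option<T>) -> Option<T> {
    let normalized = normalize(label);
    // Step 1: try the normalized label as-is.
    if let Some(found) = lookup(normalized.as_str()) {
        return Some(found);
    }
    // Step 2: Python-style aliases. "u32" only resolves if the lookup knows UTF-32.
    let alias = match normalized.to_ascii_lowercase().as_str() {
        "u8" | "utf" => Some("utf-8"),
        "u16" => Some("utf-16"),
        "u32" => Some("utf-32"),
        _ => None,
    };
    if let Some(canonical) = alias {
        return lookup(canonical);
    }
    // Remaining steps: separator variations.
    for candidate in remaining_candidates(&normalized) {
        if let Some(found) = lookup(candidate.as_str()) {
            return Some(found);
        }
    }
    None
}

/// Hypothetical, case-insensitive stand-in for a real label table.
fn toy_lookup(label: &str) -> Option<&'static str> {
    match label.to_ascii_lowercase().as_str() {
        "utf-8" | "utf8" => Some("UTF-8"),
        "utf-16" => Some("UTF-16LE"),
        "iso-8859-6" | "iso8859-6" => Some("ISO-8859-6"),
        _ => None,
    }
}
```

With the toy table, `resolve("~~~~UtF////16@@", toy_lookup)` normalizes to `UtF16` and only succeeds at the last step, once the `-` is inserted before the first digit. Which labels succeed with a real table depends entirely on that table.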

src/printer.rs

ducaale · 2021-10-25T14:52:59Z

src/cli.rs

+
+    match normalized_encoding.as_str() {
+        "u8" | "utf" => return Ok(encoding_rs::UTF_8),
+        "u16" => return Ok(encoding_rs::UTF_16LE),


encoding_rs associates the label utf-16 with UTF_16LE but I am not sure if it is the same in Python.

I also don't think encoding_rs supports utf-32.

On my phone (ARM) it's little-endian:

>>> 'a'.encode('utf_16_be').decode('utf16')
'愀'
>>> 'a'.encode('utf_16_le').decode('utf16')
'a'

I wouldn't be surprised if it depended on the machine's architecture. x86 is also little-endian so we'd agree on most machines.
In any case, encoding-rs is made for the web, so if it disagrees with Python then it's probably Python which is wrong.
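
To make the Python session above concrete, here is a small standard-library sketch of decoding UTF-16 with an explicit byte order. `decode_utf16` is a hypothetical helper, not something from the PR or from encoding_rs. The letter `a` is the bytes `00 61` in big-endian and `61 00` in little-endian; reading the big-endian bytes as little-endian yields the code unit 0x6100, which is the character 愀.

```rust
/// Decode raw bytes as UTF-16 with an explicit byte order.
/// Returns `None` for an odd byte count or for invalid surrogate sequences.
fn decode_utf16(bytes: &[u8], little_endian: bool) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|chunk| {
            let pair = [chunk[0], chunk[1]];
            if little_endian {
                u16::from_le_bytes(pair)
            } else {
                u16::from_be_bytes(pair)
            }
        })
        .collect();
    String::from_utf16(&units).ok()
}
```

Because the byte order is a parameter here, the result does not depend on the machine the code runs on.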

ducaale · 2021-10-25T14:54:23Z

src/cli.rs

+    fn parse_encoding_label() {
+        assert_eq!(
+            parse_encoding("~~~~UtF////16@@").unwrap(),
+            encoding_rs::UTF_16LE
+        );
+        assert_eq!(parse_encoding("utf_8").unwrap(), encoding_rs::UTF_8);
+        assert_eq!(parse_encoding("utf8").unwrap(), encoding_rs::UTF_8);
+        assert_eq!(parse_encoding("utf-8").unwrap(), encoding_rs::UTF_8);
+        assert_eq!(
+            parse_encoding("iso8859_6").unwrap(),
+            encoding_rs::ISO_8859_6
+        );
+    }


Are there any test cases you think are worth adding here?

blyxxyz · 2021-10-25T18:08:35Z

I'd also try u8 (for the alias), utf16, iso_8859-2:1987 (for the colon), l1, elot-928 (the original has a required underscore), utf_16_be, utf16be, utf-16-be.

And maybe "notreal" and "" to round it off?

I guess that's a lot of cases. You could put them in a const &[(&str, &Encoding)] and test in a loop.
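
The table-driven shape suggested above might look like this. `parse_encoding_stub` is a hypothetical, deliberately tiny stand-in so the example compiles without encoding_rs; in the PR the table would pair labels with `&Encoding` values rather than names. The expected values below only describe the stub, not what encoding_rs returns.

```rust
/// Hypothetical stand-in for the PR's `parse_encoding`; it knows only a few labels.
fn parse_encoding_stub(label: &str) -> Option<&'static str> {
    match label.to_ascii_lowercase().as_str() {
        "u8" | "utf8" | "utf-8" | "utf_8" => Some("UTF-8"),
        "utf16" | "utf-16" => Some("UTF-16LE"),
        "utf16be" | "utf-16-be" | "utf_16_be" => Some("UTF-16BE"),
        _ => None,
    }
}

/// Each row pairs an input label with the result the stub should give;
/// `None` marks labels that must be rejected.
const CASES: &[(&str, Option<&str>)] = &[
    ("u8", Some("UTF-8")),
    ("utf16", Some("UTF-16LE")),
    ("utf_16_be", Some("UTF-16BE")),
    ("utf16be", Some("UTF-16BE")),
    ("utf-16-be", Some("UTF-16BE")),
    ("notreal", None),
    ("", None),
];

/// Check every row; the label is included in the panic message so a
/// failing case is easy to identify.
fn check_all() {
    for &(label, expected) in CASES {
        assert_eq!(parse_encoding_stub(label), expected, "label: {:?}", label);
    }
}
```

Adding a case is then a one-line change to `CASES`, and the body of `check_all` can be dropped into a `#[test]` function as-is.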

blyxxyz · 2021-10-25T18:03:14Z

src/cli.rs

+    Err(Error::with_description(
+        &format!(
+            "{} is not a supported encoding, please refer to https://encoding.spec.whatwg.org/#names-and-labels\
+             for supported encodings",
+            encoding
+        ),
+        ErrorKind::InvalidValue,
+    ))


This is missing a space after the link, and you get the bold red "error:" twice because structopt wraps another clap error around this one:

error: Invalid value for '--response-charset ': error: utf32 is not a supported encoding, please refer to https://encoding.spec.whatwg.org/#names-and-labelsfor supported encodings

I think an error type of &'static str would work.

There are flags other than --response-charset that currently return a clap error instead of a &str. I fixed those as well.
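
For the missing space noted above: in a Rust string literal, a trailing `\` joins the line with the next one and skips the next line's leading whitespace, so the space has to come before the backslash. A minimal sketch, with `unsupported_encoding_message` as a hypothetical helper name; it returns a plain `String` that can be wrapped in whichever error type is chosen.

```rust
/// Build the "unsupported encoding" message. The trailing `\` swallows the
/// newline and the next line's indentation, so the space before it is what
/// separates the URL from "for".
fn unsupported_encoding_message(encoding: &str) -> String {
    format!(
        "{} is not a supported encoding, please refer to https://encoding.spec.whatwg.org/#names-and-labels \
         for supported encodings",
        encoding
    )
}
```

Returning only the message text also leaves the "error:" prefix to the outer layer, which avoids printing it twice.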


ducaale added 2 commits October 24, 2021 16:03

support HTTPie's --response-charset flag

8abdaf3

support HTTPie's --response-mime flag

06bb6d4

ducaale changed the title ~~Support overriding response's content-type and charset~~ Support overwriting response's mime and charset Oct 24, 2021

ducaale mentioned this pull request Oct 24, 2021

HTTPie feature parity checklist #4

Open

28 tasks

blyxxyz requested changes Oct 24, 2021

ducaale added 2 commits October 24, 2021 22:08

get rid of the unsafe block

a356dbb

more robust encoding label parsing

2618a6d

ducaale commented Oct 25, 2021

ducaale requested a review from blyxxyz October 25, 2021 14:55

blyxxyz requested changes Oct 29, 2021

ducaale added 3 commits October 29, 2021 22:42

return anyhow error instead of clap error

e8ed077

add more test cases for parse_encoding

6cfb234

return anyhow error instead of clap error

8e7fd1f

blyxxyz approved these changes Nov 8, 2021

ducaale merged commit 1f0d775 into develop Nov 8, 2021

ducaale deleted the custom-encoding branch November 8, 2021 10:38
