Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tool_writeout_json: fix JSON encoding of non-ascii bytes #12434

Closed
wants to merge 1 commit into from

Conversation

emanuele6
Copy link
Contributor

@emanuele6 emanuele6 commented Dec 1, 2023

When the JSON encoding code was refactored to be reused for {{var:json}} variables were redeclared as char* instead of unsigned char*.

char variables if unspecified can be either signed or unsigned depending on the platform according to the C standard; in most platforms, they are signed.

This meant that the *i<32 was always true for bytes with the top bit set. So they were always getting encoded as \uXXXX, and then since they were also signed negative, they were getting extended with 1s causing '\xe2' to be expanded to \uffffffe2, for example:

$ curl --variable 'v=“' --expand-write-out '{{v:json}}\n' file:///dev/null
\uffffffe2\uffffff80\uffffff9c

I fixed this bug by making the code use explicitly unsigned char* variables instead of char* variables.

I also added an explicit (unsigned) cast for correctness since the curlx_dyn_addf() function takes a va_list, and expects a unsigned while *i is an unsigned char.

Reported by iconoclast_hero on the #bash IRC channel of libera.chat.

@iconoclasthero
Copy link

Cool!

@emanuele6
Copy link
Contributor Author

Actually this has always been bugged, ever since that function was added to support -w %{json}
04c0341

I assumed it used to not be bugged because the jsonString() used by trurl correctly uses unsigned variables, and I thought it was copied from curl.

src/tool_writeout_json.c Outdated Show resolved Hide resolved
@emanuele6
Copy link
Contributor Author

emanuele6 commented Dec 1, 2023

I need to rewrite the test to read data from a file, because it seems cmd.exe is passing \x93 to curl instead of \xe2\x80\x9c on Windows.

  test 0268...[JSON encoding of unicode string]
  
   268: protocol FAILED:
  --- log/check-expected	2023-12-01 01:44:12.016402600 +0000
  +++ log/check-generated	2023-12-01 01:44:12.016402600 +0000
  @@ -2,7 +2,7 @@
   Host: 127.0.0.1:1060[CR][LF]
   User-Agent: curl/8.5.0-DEV[CR][LF]
   Accept: */*[CR][LF]
  -Content-Length: 3[CR][LF]
  +Content-Length: 1[CR][LF]
   Content-Type: application/x-www-form-urlencoded[CR][LF]
   [CR][LF]
  -%e2%80%9c
  +%93

@emanuele6 emanuele6 force-pushed the fixjson branch 2 times, most recently from 82fc413 to 8dbf0e4 Compare December 1, 2023 04:55
@jay
Copy link
Member

jay commented Dec 1, 2023

For windows you may have to read the data from a file instead of the command line. The reason is because in Windows you can build curl with or without unicode support. Even with unicode builds I get weird output with your pr. for example:

> curld --variable "v=®" --expand-write-out "{{v:json}}\n" file:///NUL
®

that should output an ® but it doesn't

edit: this is due to incomplete windows utf-8 support in the curl tool. the hex output is c2 ae 0a. c2 ae is a utf-8 encoded ®. windows unicode builds of curl print write-out output in the local codepage but the arguments are read as utf-8. so it's basically printing ® in utf-8 encoding even though that is not supported (yet).

char variables if unspecified can be either signed or unsigned depending
on the platform according to the C standard; in most platforms, they are
signed.

This meant that the  *i<32  waas always true for bytes with the top bit
set. So they were always getting encoded as \uXXXX, and then since they
were also signed negative, they were getting extended with 1s causing
'\xe2' to be expanded to \uffffffe2, for example:

  $ curl --variable 'v=“' --expand-write-out '{{v:json}}\n' file:///dev/null
  \uffffffe2\uffffff80\uffffff9c

I fixed this bug by making the code use explicitly unsigned char*
variables instead of char* variables.

Reported-by: iconoclasthero
@bagder bagder closed this in 6c7da81 Dec 1, 2023
@bagder
Copy link
Member

bagder commented Dec 1, 2023

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants