tool_writeout_json: fix JSON encoding of non-ascii bytes#12434
emanuele6 wants to merge 1 commit into
|
Cool! |
|
Actually this has always been bugged, ever since that function was added to support I assumed it used to not be bugged because the |
|
I need to rewrite the test to read data from a file, because it seems |
82fc413 to
8dbf0e4
|
For Windows you may have to read the data from a file instead of the command line, because on Windows curl can be built with or without Unicode support. Even with Unicode builds I get weird output with your PR. for example: that should output an ® but it doesn't. Edit: this is due to incomplete Windows UTF-8 support in the curl tool. The hex output is c2 ae 0a. |
char variables if unspecified can be either signed or unsigned depending
on the platform according to the C standard; in most platforms, they are
signed.
This meant that the *i<32 was always true for bytes with the top bit
set. So they were always getting encoded as \uXXXX, and then since they
were also signed negative, they were getting extended with 1s causing
'\xe2' to be expanded to \uffffffe2, for example:
$ curl --variable 'v=“' --expand-write-out '{{v:json}}\n' file:///dev/null
\uffffffe2\uffffff80\uffffff9c
I fixed this bug by making the code use explicitly unsigned char*
variables instead of char* variables.
Reported-by: iconoclasthero
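The sign-extension effect described in this commit message can be reproduced in isolation. The following is only a minimal sketch, not curl's code: the helper names are hypothetical, `signed char` is spelled out so that the behaviour does not depend on whether plain `char` is signed on the build platform, and the `ffffffe2` result assumes a 32-bit `unsigned int`.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration only; none of these exist in curl. */

/* With a signed byte, 0xe2 is stored as -30, so "< 32" is true. */
int is_control_signed(signed char c)
{
  return c < 32;
}

/* With an unsigned byte, 0xe2 is 226, so "< 32" is false. */
int is_control_unsigned(unsigned char c)
{
  return c < 32;
}

/* Formats a signed byte as \uXXXX. A negative value is widened to int
   first, then converted to unsigned, which fills the upper bits with 1s
   (assuming a 32-bit unsigned int). */
void format_signed(char *out, size_t outsize, signed char c)
{
  snprintf(out, outsize, "\\u%04x", (unsigned int)c);
}

/* Formats an unsigned byte as \uXXXX; the value stays in 0..255. */
void format_unsigned(char *out, size_t outsize, unsigned char c)
{
  snprintf(out, outsize, "\\u%04x", (unsigned int)c);
}
```

Calling `format_signed()` with the value -30 (the byte 0xe2 seen through a signed char) reproduces the stray `ffffff` prefix from the example above, while `format_unsigned()` with 0xe2 does not.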
|
Thanks! |
When the JSON encoding code was refactored to be reused for {{var:json}}, variables were redeclared as char* instead of unsigned char*.
char variables if unspecified can be either signed or unsigned depending on the platform according to the C standard; in most platforms, they are signed.
This meant that the *i<32 was always true for bytes with the top bit set. So they were always getting encoded as \uXXXX, and then since they were also signed negative, they were getting extended with 1s causing '\xe2' to be expanded to \uffffffe2, for example:
$ curl --variable 'v=“' --expand-write-out '{{v:json}}\n' file:///dev/null
\uffffffe2\uffffff80\uffffff9c
I fixed this bug by making the code use explicitly unsigned char* variables instead of char* variables.
I also added an explicit (unsigned) cast for correctness since the curlx_dyn_addf() function takes a va_list, and expects an unsigned while *i is an unsigned char.
Reported by iconoclast_hero on the #bash IRC channel of libera.chat.
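The shape of the fix can be sketched as a small standalone escaping loop. This is an illustration under assumptions, not the code in this PR: `json_escape_sketch()` is a hypothetical function that writes into a caller-supplied buffer, whereas curl appends to a dynbuf through `curlx_dyn_addf()`. It also handles only a handful of escapes. What it shares with the fix is that it walks the input through an `unsigned char *` and casts to `unsigned` before formatting with `%04x`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the escaping loop; not curl's actual code.
   Writes a JSON-escaped copy of the NUL-terminated string `in` to `out`.
   Returns 0 on success, -1 if `out` is too small. */
int json_escape_sketch(const char *in, char *out, size_t outsize)
{
  /* Walk the input as unsigned bytes so that 0x80..0xff never compare
     below 32 and are never sign-extended. */
  const unsigned char *i = (const unsigned char *)in;
  size_t used = 0;

  if(outsize == 0)
    return -1;

  for(; *i; i++) {
    char tmp[8]; /* large enough for "\u001f" plus the terminating NUL */
    int n;

    switch(*i) {
    case '"':
    case '\\':
      n = snprintf(tmp, sizeof(tmp), "\\%c", *i);
      break;
    case '\n':
      n = snprintf(tmp, sizeof(tmp), "\\n");
      break;
    case '\t':
      n = snprintf(tmp, sizeof(tmp), "\\t");
      break;
    default:
      if(*i < 32)
        /* explicit cast: %x expects an unsigned int argument */
        n = snprintf(tmp, sizeof(tmp), "\\u%04x", (unsigned int)*i);
      else
        /* printable ASCII and bytes >= 0x80 are copied through as-is */
        n = snprintf(tmp, sizeof(tmp), "%c", *i);
      break;
    }

    /* keep one byte free for the terminating NUL */
    if(n < 0 || used + (size_t)n >= outsize)
      return -1;
    memcpy(out + used, tmp, (size_t)n);
    used += (size_t)n;
  }
  out[used] = '\0';
  return 0;
}
```

With this sketch, the three UTF-8 bytes e2 80 9c of the “ character are copied through unchanged, while a real control byte such as 0x01 is still emitted as a \u escape.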