tool_writeout_json: fix JSON encoding of non-ascii bytes #12434
Cool!
Actually, this has always been bugged, ever since that function was added to support I assumed it used to not be bugged because the
I need to rewrite the test to read data from a file, because it seems
Force-pushed from 82fc413 to 8dbf0e4.
On Windows you may have to read the data from a file instead of the command line, because curl can be built there with or without Unicode support. Even with Unicode builds I get weird output with your PR, for example:
That should output an ® but it doesn't. Edit: this is due to incomplete Windows UTF-8 support in the curl tool. The hex output is c2 ae 0a.
char variables, if unspecified, can be either signed or unsigned depending on the platform according to the C standard; on most platforms, they are signed.

This meant that the *i<32 check was always true for bytes with the top bit set. So they were always getting encoded as \uXXXX, and then since they were also signed negative, they were getting extended with 1s, causing '\xe2' to be expanded to \uffffffe2, for example:

    $ curl --variable 'v=“' --expand-write-out '{{v:json}}\n' file:///dev/null
    \uffffffe2\uffffff80\uffffff9c

I fixed this bug by making the code use explicitly unsigned char* variables instead of char* variables.

Reported-by: iconoclasthero
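A minimal sketch of the failure mode described in this commit message, not curl's actual code. escape_signed and escape_unsigned are hypothetical helpers, and signed char is forced explicitly so the result does not depend on the platform's default for plain char. It assumes a 32-bit unsigned int and two's complement, which is what produces the ffffff prefix.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: reproduces the bug by treating the byte as signed.
   Every byte >= 0x80 is negative here, so the "< 32" test is true for it. */
static int escape_signed(signed char c, char *out, size_t outsz)
{
  if(c < 32)
    /* Converting a negative value to unsigned int wraps around, which is
       what sign extension looks like when printed with %x. */
    return snprintf(out, outsz, "\\u%04x", (unsigned int)c);
  return snprintf(out, outsz, "%c", c);
}

/* Hypothetical helper: the same logic with an unsigned byte. Only real
   control characters (0..31) are escaped; 0x80..0xff pass through. */
static int escape_unsigned(unsigned char c, char *out, size_t outsz)
{
  if(c < 32)
    return snprintf(out, outsz, "\\u%04x", (unsigned int)c);
  return snprintf(out, outsz, "%c", c);
}
```

Calling escape_signed with (signed char)-30, the same bit pattern as 0xe2 on two's complement machines, writes \uffffffe2, while escape_unsigned with 0xe2 writes the byte unchanged.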
Thanks!
When the JSON encoding code was refactored to be reused for {{var:json}}, variables were redeclared as char* instead of unsigned char*.

char variables, if unspecified, can be either signed or unsigned depending on the platform according to the C standard; on most platforms, they are signed.

This meant that the *i<32 check was always true for bytes with the top bit set. So they were always getting encoded as \uXXXX, and then since they were also signed negative, they were getting extended with 1s, causing '\xe2' to be expanded to \uffffffe2, for example:

    $ curl --variable 'v=“' --expand-write-out '{{v:json}}\n' file:///dev/null
    \uffffffe2\uffffff80\uffffff9c

I fixed this bug by making the code use explicitly unsigned char* variables instead of char* variables.

I also added an explicit (unsigned) cast for correctness since the curlx_dyn_addf() function takes a va_list, and expects an unsigned while *i is an unsigned char.

Reported by iconoclast_hero on the #bash IRC channel of libera.chat.
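The shape of the fix can be sketched as a whole-string escaper. This is an illustration under assumptions, not the code in this PR: json_escape is a hypothetical helper that writes into a fixed buffer with snprintf() instead of curl's dynbuf API, and it escapes every control character as \uXXXX rather than using shorthand such as \n. It shows the two points made above: the input is walked through an unsigned char pointer, and the byte is cast to unsigned before being passed to a variadic printf-style function, where an unsigned char would otherwise be promoted to int.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: JSON-escape `len` bytes of `in` into `out`, which
   has room for `outsz` bytes including the terminating NUL.
   Returns 0 on success, -1 if the output buffer is too small. */
static int json_escape(const char *in, size_t len, char *out, size_t outsz)
{
  /* Read the input as unsigned bytes so values >= 0x80 stay positive. */
  const unsigned char *i = (const unsigned char *)in;
  const unsigned char *end = i + len;
  size_t used = 0;

  if(outsz == 0)
    return -1;
  out[0] = '\0';

  for(; i < end; i++) {
    int n;
    if(*i == '"' || *i == '\\')
      n = snprintf(out + used, outsz - used, "\\%c", *i);
    else if(*i < 32)
      /* %x expects an unsigned int, hence the explicit cast. */
      n = snprintf(out + used, outsz - used, "\\u%04x", (unsigned int)*i);
    else
      n = snprintf(out + used, outsz - used, "%c", *i);

    if(n < 0 || (size_t)n >= outsz - used)
      return -1; /* output was truncated */
    used += (size_t)n;
  }
  return 0;
}
```

With this sketch, the three UTF-8 bytes of “ (e2 80 9c) are copied through unchanged, while a newline becomes \u000a and a double quote becomes \".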