Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need help with serving JSON with UTF8 string property #302

Closed
mosolovsa opened this issue Dec 16, 2021 · 2 comments · Fixed by #304
Closed

Need help with serving JSON with UTF8 string property #302

mosolovsa opened this issue Dec 16, 2021 · 2 comments · Fixed by #304
Labels
bug Something isn't working

Comments

@mosolovsa
Copy link

mosolovsa commented Dec 16, 2021

Hi! I need help with serving JSON with UTF8 string.

Currently as far as I understood json response serialized by crow::json::dump_internal:

inline void dump_internal(const wvalue& v, std::string& out) const

which in turn calls crow::json::escape for string:

case type::String: dump_string(v.s, out); break;

inline void dump_string(const std::string& str, std::string& out) const

inline void escape(const std::string& str, std::string& ret)

Commit cdd6139 removed 0 <= c && and changed char to unsigned char, so the logic stays the same - escape invisible chars 0 < ch < 32.
Commit df41cbe by @lcsdavid changes unsigned char to auto (e.g. char).

char - type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type).

I'm not sure what exactly problem that was supposed to solve, but now not only invisible chars are escaped now (0 < char < 0x20 https://www.asciitable.com/), but the Unicode sequences may be escaped too from now on (if auto -> char -> signed char is true on this architecture).

All bytes of multibyte utf8 codepoints contain the most significant bit on (e.g. 0x80), so signed char with the leading bit on is always negative for a two's complement (almost any architecture), and ch < 0x20 would be now true for any Unicode symbol.
https://en.wikipedia.org/wiki/UTF-8#Encoding

The original project took solution to store UTF8 sequences in std::string:
ipkn/crow#189

But with the mentioned commit that solution can't be applied.

Middleware just adds UTF8 headers to text and I'm not sure if the middleware is the right place to cancel mentioned escapes.
#202

I have almost no grasp at the codebase, but for me, it seems like it would be nice to have a customization point for defining escape function somehow or to introduce a new JSON value type raw_string that would not be escaped later. Maybe default

I could make a PR with high-level guidance about what may be acceptable in this situation.

For now, I just revert to unsigned char and that's totally fine for me, could someone kindly explain why that's wrong? And elaborating on how it must be done would be even greater! Thanks!

@The-EDev
Copy link
Member

First off thanks for the detailed description. I really appreciate it.

could someone kindly explain why that's wrong?

The problem was that using unsigned char on Windows would cause incorrect characters to be dumped when using multi-byte characters.

Since the problem seems to be with values lower than 0, we can just change the check from c < 0x20 to c > 0 && c < 0x20. Would that work?

mosolovsa pushed a commit to mosolovsa/Crow that referenced this issue Dec 17, 2021
CrowCpp#302
Escape only the invisible characters from 0 to 31 inclusive
Motivation: do not escape UTF8 encoding bytes
@mosolovsa
Copy link
Author

Thanks for the quick reply and clarification!

Since the problem seems to be with values lower than 0, we can just change the check from c < 0x20 to c > 0 && c < 0x20. Would that work?

Yes, that's fit my use case. I've send the PR #304
And I believe we want to escape the null-symbol too since it's invisible, so it's c >= 0 && c < 0x20.

@The-EDev The-EDev added the bug Something isn't working label Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants