First-class string type in serialization specification #13
Comments
Any news on this?
Packing all strings as raw byte arrays makes it very easy to figure out how to unpack them correctly: as Buffers/byte[]s/.... Libraries could have
instead of |
@andrewschaaf The issue is more along the lines of dealing with cross-system messages. For example, one system may have a native in-memory representation of strings as UTF-16, while another may use UTF-8. Since UTF-8 is usually the most efficient, it would make sense to have a string type that is always UTF-8 encoded without a BOM.
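To make the cross-system case concrete, here is a minimal Python sketch. The helper names `to_wire` and `from_wire` are hypothetical and not part of any MessagePack library; the sketch only assumes the string payload is fixed to UTF-8 without a BOM, as proposed above.

```python
import codecs

def to_wire(text: str) -> bytes:
    """Hypothetical helper: encode a string payload as UTF-8, no BOM."""
    # Python's "utf-8" codec never prepends a BOM ("utf-8-sig" would).
    return text.encode("utf-8")

def from_wire(payload: bytes) -> str:
    """Hypothetical helper: decode a payload that is known to be UTF-8."""
    return payload.decode("utf-8")

# System A holds the string as UTF-16 in memory.
utf16_native = "héllo".encode("utf-16-le")

# Before sending, A converts to the agreed wire encoding.
wire = to_wire(utf16_native.decode("utf-16-le"))

# System B decodes without having to know anything about A's internals.
received = from_wire(wire)

has_bom = wire.startswith(codecs.BOM_UTF8)
```

With a mandatory encoding, neither side has to know or guess the other's native representation.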
For that matter, you could just put the UTF-8 encoded byte order mark (BOM) at the beginning of your raw data; when reading it back, you'll "know" that it's a UTF-8 string.
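A rough Python sketch of this BOM-prefix convention follows. The function names are hypothetical, and the convention is an application-level agreement between sender and receiver, not something the format itself defines.

```python
import codecs

def pack_with_bom(text: str) -> bytes:
    """Hypothetical helper: mark a raw payload as text with a UTF-8 BOM."""
    return codecs.BOM_UTF8 + text.encode("utf-8")

def unpack_maybe_text(raw: bytes):
    """Return str if the payload starts with the UTF-8 BOM, else the raw bytes."""
    if raw.startswith(codecs.BOM_UTF8):
        # Strip the 3-byte marker (EF BB BF) before decoding.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw

as_text = unpack_maybe_text(pack_with_bom("héllo"))
as_bytes = unpack_maybe_text(b"\x00\x01\x02")
```

Note that this is only a heuristic: arbitrary binary data that happens to begin with the bytes EF BB BF would be treated as text, which is one reason a dedicated string type is less ambiguous.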
The discussion of this issue is just exploding in #121 (And I'll plug http://tools.ietf.org/html/draft-bormann-apparea-bpack here, too.)
Well, it seems we are continuing the technical discussion in #128 today.
See the new spec
Packing all strings as raw byte arrays makes it very difficult to figure out how to unpack them correctly. In particular, it is impossible to know what encoding was used when encoding the string as a sequence of bytes. To address this, it would be nice to have a first-class MSGPACK_OBJECT_STRING type with a mandatory encoding (say, UTF-8).
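As an illustration of the ambiguity, the following Python sketch (plain standard library, no MessagePack API involved) shows the same text producing different bytes under two encodings, and a receiver getting the wrong result when it guesses:

```python
text = "café"

# Two senders serialize the same string with different encodings.
raw_utf8 = text.encode("utf-8")      # 5 bytes: b'caf\xc3\xa9'
raw_latin1 = text.encode("latin-1")  # 4 bytes: b'caf\xe9'

# A receiver that sees only raw bytes must guess the encoding.
# Guessing Latin-1 for UTF-8 data silently yields mojibake.
wrong_guess = raw_utf8.decode("latin-1")

# Guessing UTF-8 for Latin-1 data fails outright.
try:
    raw_latin1.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Nothing in the byte sequence itself tells the receiver which interpretation was intended, which is the case for a string type with a mandatory encoding.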