Compat with msgpack-js #6
Comments
|
The only hard question is what to do when encoding a lua string and sending it to a javascript process. The lua string doesn't have any text encoding and could be binary data. It should be sent over msgpack as the raw type since it's the most common, but should the javascript side assume it's utf8 encoded text or assume it's char* data? Going the other way, js strings should be lua strings and js buffers should become "uint8_t[?]" cdata I think? |
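To make the concern concrete, here is a minimal sketch of why a receiver cannot safely treat every raw value as UTF-8 text. The helper name `survivesUtf8RoundTrip` is hypothetical and not part of either library; it only uses the standard `TextDecoder` and `TextEncoder` globals.

```javascript
// Returns true when the bytes come back unchanged after being decoded as
// UTF-8 text and encoded again. Invalid sequences are replaced with U+FFFD
// during decoding, so arbitrary binary data usually does not survive.
function survivesUtf8RoundTrip(bytes) {
  const text = new TextDecoder("utf-8").decode(bytes);
  const again = new TextEncoder().encode(text);
  if (again.length !== bytes.length) return false;
  return again.every((b, i) => b === bytes[i]);
}

// Valid UTF-8 text survives the round trip.
const textBytes = new TextEncoder().encode("héllo");
// 0xff and 0xfe can never appear in valid UTF-8, so these bytes do not.
const binaryBytes = Uint8Array.of(0xff, 0xfe, 0x41);

console.log(survivesUtf8RoundTrip(textBytes));
console.log(survivesUtf8RoundTrip(binaryBytes));
```

This is why a separate binary type matters on the javascript side: once raw bytes have been decoded to a js string, the original data may be unrecoverable.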
|
Regarding NaN, +Inf and -Inf: I used the same conventions as Kengo Nakajima (you may know his lua-msgpack-native). Technically I think we should all try to support standard IEEE floats. +Inf and -Inf are easy since they have a single representation. NaN-s are harder: there are two kinds of NaN-s (QNaN-s and SNaN-s - see standard) and both have several valid binary values. When we pack data we should use QNaN-s; I chose the same as Kengo: Decoding of NaN-s is harder. I think in theory all QNaN-s and SNaN-s should be decoded as NaN. However this is not the case with my current implementation, probably because LuaJIT uses some NaN values internally, so I would have to change my decoding implementation to support these cases. I am not sure I want to do it though because it may result in a speed penalty for the decoding of all floats and doubles. Anyway, the values I have used for NaN are decoded correctly. Regarding strings: MessagePack does not have a "string" type. IMO the raw type should be decoded to the thing used to represent binary data in the target language, converting to a string type if needed is left to the user. I have chosen to decode the raw type to Lua strings (not You may or may not think this is a deficiency in the MessagePack format that it does not have a UTF-8 type. If you want to agree on a convention to store UTF-8 strings you could use some of the reserved prefixes, for instance 0xd8 for |
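The float handling described above can be sketched on the javascript side as follows. This is an illustration under assumptions, not the code of either library: `packDouble` and `unpackDouble` are hypothetical names, and the quiet NaN pattern 0x7ff8000000000000 is one common choice, since the exact value chosen in this thread is not shown. The 0xcb prefix is the MessagePack double type.

```javascript
// Pack a JS number as a MessagePack double: prefix 0xcb followed by the
// 8-byte big-endian IEEE 754 representation.
function packDouble(value) {
  const out = new Uint8Array(9);
  out[0] = 0xcb;
  const view = new DataView(out.buffer);
  if (Number.isNaN(value)) {
    // Always emit one fixed quiet NaN (assumed pattern 0x7ff8000000000000)
    // so the bytes on the wire do not depend on the JS engine.
    view.setUint32(1, 0x7ff80000);
    view.setUint32(5, 0x00000000);
  } else {
    view.setFloat64(1, value); // DataView writes big-endian by default
  }
  return out;
}

// Unpack a MessagePack double. Every NaN bit pattern, quiet or signaling,
// reads back as the single JS NaN value.
function unpackDouble(bytes) {
  if (bytes[0] !== 0xcb) throw new Error("not a MessagePack double");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return view.getFloat64(1);
}

// A signaling NaN pattern (exponent all ones, quiet bit clear, payload 1).
const signalingNaN = Uint8Array.of(0xcb, 0x7f, 0xf0, 0, 0, 0, 0, 0, 0x01);

console.log(unpackDouble(packDouble(Infinity)));
console.log(unpackDouble(packDouble(-Infinity)));
console.log(Number.isNaN(unpackDouble(signalingNaN)));
```

In javascript the decoding side needs no special cases, because the language exposes only one NaN value. The speed concern raised above is specific to the LuaJIT implementation.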
|
Interesting. I didn't know that Inf, -Inf, and NaN were encoded as part of floats. I understand that only supporting raw as lua strings makes perfect sense for lua, but for JavaScript it's not as clear. For the same reason you say that returning an FFI type to lua users is confusing, returning a Buffer or ArrayBuffer to js users is confusing. Msgpack has almost the same semantics as JSON (except for raw vs strings) and most JSON is unicode strings. In node programs it's very useful to be able to pass either a unicode string or a binary buffer in one side and get the same thing out on the other side. Would you be willing to extend your protocol to decode 0xd8 and 0xd9 as ffi char* arrays and encode cdata types using the same? I agree that the common type (raw) should be represented as lua strings. Also if you add 0xc4 to decode to nil that could be useful. (though it's not critical, I'm not sure how much value there is in knowing the difference between js |
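A rough sketch of how a javascript decoder might treat the prefixes proposed here. Several details are assumptions: the function name `decodeOne` is hypothetical, and the length fields for 0xd8 and 0xd9 are assumed to mirror raw 16 and raw 32 (a 16-bit and a 32-bit big-endian length), which the thread does not spell out. It follows the proposal in this thread only; the current MessagePack specification assigns 0xc4, 0xd8 and 0xd9 to other types.

```javascript
// Decode a single value from the start of `bytes`, covering only the
// prefixes discussed in this thread.
function decodeOne(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const utf8 = new TextDecoder("utf-8");
  const prefix = bytes[0];

  // fixraw (0xa0..0xbf): length in the low 5 bits, decoded as a js string.
  if (prefix >= 0xa0 && prefix <= 0xbf) {
    const len = prefix & 0x1f;
    return utf8.decode(bytes.subarray(1, 1 + len));
  }

  switch (prefix) {
    case 0xc0: // nil
      return null;
    case 0xc4: // proposed extension: js undefined
      return undefined;
    case 0xda: { // raw 16, decoded as a js string
      const len = view.getUint16(1);
      return utf8.decode(bytes.subarray(3, 3 + len));
    }
    case 0xd8: { // proposed extension: binary blob, assumed 16-bit length
      const len = view.getUint16(1);
      return bytes.slice(3, 3 + len);
    }
    case 0xd9: { // proposed extension: binary blob, assumed 32-bit length
      const len = view.getUint32(1);
      return bytes.slice(5, 5 + len);
    }
    default:
      throw new Error("prefix not covered by this sketch: 0x" + prefix.toString(16));
  }
}

console.log(decodeOne(Uint8Array.of(0xa2, 0x68, 0x69)));
console.log(decodeOne(Uint8Array.of(0xd8, 0x00, 0x02, 0xff, 0x00)));
console.log(decodeOne(Uint8Array.of(0xc4)));
```

With this split, a string passed in on one side comes out as a string and a buffer comes out as a buffer, which is the round-trip property asked for above.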
|
These changes only affect reserved prefixes, so I can do that. If the "official" MessagePack spec ever decides to use them for something else I will implement that instead though, but it is unlikely. I will implement these new prefixes when I find the time. |
|
Found the time :) I used |
|
Perfect! |
I maintain a pure javascript implementation of msgpack for node.js and the browser. I've extended the msgpack format to allow the js type undefined as well as binary blobs (since js strings are unicode and can't hold arbitrary binary data).
I noticed that you extended the format to support nan, inf, and -inf. Since JavaScript also has these values (and I want to use your library for my luvit project), I propose we merge our extensions to become compatible.
What do you think?