This repository has been archived by the owner on Jan 30, 2023. It is now read-only.

Compat with msgpack-js #6

Closed
creationix opened this issue Aug 27, 2012 · 6 comments

Comments

@creationix

I maintain a pure javascript implementation of msgpack for node.js and the browser. I've extended the msgpack format to allow the js type undefined as well as binary blobs (since js strings are unicode and can't hold arbitrary binary data).

I noticed that you extended the format to support nan, inf, and -inf. Since JavaScript also has these values (and I want to use your library for my luvit project), I propose we merge our extensions to become compatible.

What do you think?

@creationix
Author

The only hard question is what to do when encoding a lua string and sending it to a javascript process. The lua string doesn't have any text encoding and could be binary data. It should be sent over msgpack as the raw type since it's the most common, but should the javascript side assume it's utf8 encoded text or assume it's char* data?

Going the other way, js strings should be lua strings and js buffers should become "uint8_t[?]" cdata I think?
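
To illustrate why this choice matters on the JavaScript side, here is a minimal Node.js sketch (not taken from either library) showing that arbitrary bytes do not survive a round trip through a UTF-8 string. The sample bytes are made up for the illustration.

```javascript
// Bytes that are not valid UTF-8, as a Lua string might well contain.
const original = Buffer.from([0xff, 0xfe, 0x00, 0x41]);

// Treating the raw payload as UTF-8 text replaces each invalid byte with U+FFFD.
const asText = original.toString('utf8');

// Re-encoding that text does not give back the original bytes.
const roundTripped = Buffer.from(asText, 'utf8');

console.log(original.equals(roundTripped)); // false: the data was altered
```

If the decoder always assumes text, binary payloads are silently corrupted; if it always returns a buffer, text users have to convert by hand.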

@catwell
Owner

catwell commented Aug 28, 2012

Regarding NaN, +Inf and -Inf: I used the same conventions as Kengo Nakajima (you may know his lua-msgpack-native).

Technically I think we should all try to support standard IEEE floats. +Inf and -Inf are easy since they have a single representation. NaN-s are harder: there are two kinds of NaN-s (QNaN-s and SNaN-s - see standard) and both have several valid binary values. When we pack data we should use QNaN-s; I chose the same as Kengo: 0xff880000 for floats and 0xfff8000000000000 for doubles.
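
As a rough illustration of what ends up on the wire for doubles, the following Node.js sketch writes the standard MessagePack double layout (prefix 0xcb followed by 8 big-endian bytes). The helper name `encodeDouble` is made up for this example and is not part of either library.

```javascript
// Sketch: a MessagePack double is the prefix 0xcb followed by 8 big-endian bytes.
function encodeDouble(value) {
  const out = Buffer.alloc(9);
  out[0] = 0xcb;
  out.writeDoubleBE(value, 1);
  return out;
}

// The NaN pattern quoted above for doubles, written out byte by byte
// so the result does not depend on which NaN the JS engine produces.
const NAN_DOUBLE = Buffer.from('cbfff8000000000000', 'hex');

console.log(encodeDouble(Infinity).toString('hex'));   // cb7ff0000000000000
console.log(encodeDouble(-Infinity).toString('hex'));  // cbfff0000000000000
console.log(Number.isNaN(NAN_DOUBLE.readDoubleBE(1))); // true
```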

Decoding of NaN-s is harder. I think in theory all QNaN-s and SNaN-s should be decoded as NaN. However this is not the case with my current implementation, probably because LuaJIT uses some NaN values internally, so I would have to change my decoding implementation to support these cases. I am not sure I want to do it though because it may result in a speed penalty for the decoding of all floats and doubles. Anyway, the values I have used for NaN are decoded correctly.
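
For reference, a decoder that wanted to treat every NaN pattern uniformly could inspect the raw bits before converting. The sketch below is written in JavaScript for illustration only and is not how luajit-msgpack-pure works; it assumes the common convention where a set top mantissa bit marks a quiet NaN, and the function name is hypothetical.

```javascript
// Classify the 8 big-endian bytes of an IEEE 754 double without converting to a number.
function classifyDoubleBits(bytes) {
  const hi = bytes.readUInt32BE(0); // sign, exponent, top 20 mantissa bits
  const lo = bytes.readUInt32BE(4); // low 32 mantissa bits
  const exponent = (hi >>> 20) & 0x7ff;
  const mantissaHi = hi & 0xfffff;

  if (exponent !== 0x7ff) return 'finite';
  if (mantissaHi === 0 && lo === 0) return (hi >>> 31) === 1 ? '-inf' : '+inf';
  // Assumption: top mantissa bit set means quiet, clear means signaling.
  return (mantissaHi & 0x80000) !== 0 ? 'qnan' : 'snan';
}

console.log(classifyDoubleBits(Buffer.from('fff8000000000000', 'hex'))); // qnan
console.log(classifyDoubleBits(Buffer.from('7ff0000000000001', 'hex'))); // snan
```

The extra branch on every float or double is the kind of per-value cost the speed concern above refers to.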

Regarding strings: MessagePack does not have a "string" type. IMO the raw type should be decoded to the thing used to represent binary data in the target language, converting to a string type if needed is left to the user.

I have chosen to decode the raw type to Lua strings (not uint8_t[?]), which is the type used in plain Lua for binary data, because I do not want to confuse users by returning a FFI type (it is not usually expected from a library).

You may or may not think it is a deficiency of the MessagePack format that it does not have a UTF-8 type. If you want to agree on a convention to store UTF-8 strings you could use some of the reserved prefixes, for instance 0xd8 for utf-8 16 and 0xd9 for utf-8 32. That being said Lua does not have the concept of a unicode string so that would not change much...

@creationix
Author

Interesting. I didn't know that Inf, -Inf, and NaN were encoded as part of floats.

I understand that only supporting raw as lua strings makes perfect sense for lua, but for JavaScript it's not as clear. For the same reason you say that returning an FFI type to lua users is confusing, returning a Buffer or ArrayBuffer to js users is confusing. Msgpack has almost the same semantics as JSON (except for raw vs strings) and most JSON is unicode strings.

In node programs it's very useful to be able to pass either a unicode string or a binary buffer in one side and get the same thing out on the other side.

Would you be willing to extend your protocol to decode 0xd8 and 0xd9 as ffi char* arrays and encode cdata types using the same? I agree that the common type (raw) should be represented as lua strings.

Also if you add 0xc4 to decode to nil that could be useful (though it's not critical, I'm not sure how much value there is in knowing the difference between js null and undefined). I might just change my encoder to encode undefined as msgpack Nil.
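
A minimal sketch of how the encoding side of this proposal could look in Node.js follows. The function name `encodeExtended` is hypothetical and this is not the msgpack-js source; the 16-bit big-endian length after 0xd8 matches the bytes shown in the transcript later in this thread, while the 32-bit length after 0xd9 is an assumption based on the "16"/"32" naming.

```javascript
// Sketch of the extension tags discussed above (helper name is hypothetical).
function encodeExtended(value) {
  if (value === null) return Buffer.from([0xc0]);      // standard nil
  if (value === undefined) return Buffer.from([0xc4]); // extension: undefined
  if (Buffer.isBuffer(value)) {
    if (value.length < 0x10000) {
      const head = Buffer.alloc(3);
      head[0] = 0xd8;                                   // extension: buffer, 16-bit length
      head.writeUInt16BE(value.length, 1);
      return Buffer.concat([head, value]);
    }
    const head = Buffer.alloc(5);
    head[0] = 0xd9;                                     // extension: buffer, 32-bit length (assumed layout)
    head.writeUInt32BE(value.length, 1);
    return Buffer.concat([head, value]);
  }
  if (typeof value === 'string') {
    const bytes = Buffer.from(value, 'utf8');
    if (bytes.length < 32) {
      // Standard fixraw: 0xa0 plus the byte length.
      return Buffer.concat([Buffer.from([0xa0 | bytes.length]), bytes]);
    }
    if (bytes.length < 0x10000) {
      const head = Buffer.alloc(3);
      head[0] = 0xda;                                   // standard raw 16
      head.writeUInt16BE(bytes.length, 1);
      return Buffer.concat([head, bytes]);
    }
  }
  throw new TypeError('type not covered by this sketch');
}

console.log(encodeExtended('Hello World').toString('hex'));
// ab48656c6c6f20576f726c64
console.log(encodeExtended(Buffer.from('Hello World')).toString('hex'));
// d8000b48656c6c6f20576f726c64
```

Strings stay on the common raw type, so a plain MessagePack decoder still reads them; only buffers and undefined use the reserved prefixes.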

@catwell
Owner

catwell commented Aug 28, 2012

These changes only affect reserved prefixes, so I can do that. If the "official" MessagePack spec ever decides to use them for something else I will implement that instead though, but it is unlikely.

I will implement these new prefixes when I find the time.

@catwell
Owner

catwell commented Aug 28, 2012

Found the time :) I used unsigned char* instead of char * for the buffer type. Tell me if you're OK with this (reopen if needed).

@creationix
Author

Perfect!

> msgpack = require('msgpack-js')
{ encode: [Function],
  decode: [Function: decode] }
> b = msgpack.encode([null, undefined, 0, 1, "Hello World", new Buffer("Hello World")])
<Buffer 96 c0 c4 00 01 ab 48 65 6c 6c 6f 20 57 6f 72 6c 64 d8 00 0b 48 65 6c 6c 6f 20 57 6f 72 6c 64>
> require('fs').writeFileSync('message.msgpack', b)
Welcome to the Luvit repl
> b = require('fs').readFileSync('message.msgpack')
> msgpack = require('./luajit-msgpack-pure')
> msgpack.unpack(b)
31      { [3] = 0, [4] = 1, [5] = "Hello World", [6] = cdata<unsigned char [?]>: 0x40884030 }
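
For readers who want to follow the bytes in the transcript above by hand, here is a small Node.js decoder covering only the tags that appear in that message. It is a sketch for illustration, not the code of either library, and the name `decodeSubset` is made up.

```javascript
// Minimal decoder for the subset of types in the transcript (function name is hypothetical).
function decodeSubset(buf, offset = 0) {
  const tag = buf[offset];
  if (tag <= 0x7f) return { value: tag, next: offset + 1 };      // positive fixint
  if (tag >= 0x90 && tag <= 0x9f) {                              // fixarray
    const items = [];
    let next = offset + 1;
    for (let i = 0; i < (tag & 0x0f); i++) {
      const item = decodeSubset(buf, next);
      items.push(item.value);
      next = item.next;
    }
    return { value: items, next };
  }
  if (tag >= 0xa0 && tag <= 0xbf) {                              // fixraw, read here as UTF-8 text
    const end = offset + 1 + (tag & 0x1f);
    return { value: buf.toString('utf8', offset + 1, end), next: end };
  }
  if (tag === 0xc0) return { value: null, next: offset + 1 };      // nil
  if (tag === 0xc4) return { value: undefined, next: offset + 1 }; // extension: undefined
  if (tag === 0xd8) {                                            // extension: buffer, 16-bit length
    const start = offset + 3;
    const end = start + buf.readUInt16BE(offset + 1);
    return { value: buf.subarray(start, end), next: end };
  }
  throw new Error('tag not covered by this sketch: 0x' + tag.toString(16));
}

// The exact bytes printed by msgpack.encode in the transcript.
const message = Buffer.from(
  '96c0c40001ab48656c6c6f20576f726c64d8000b48656c6c6f20576f726c64', 'hex');
const decoded = decodeSubset(message);

console.log(decoded.next);  // 31, the same offset the Lua side reports
console.log(decoded.value);
```

On the Lua side both 0xc0 and 0xc4 decode to nil, which is why the printed table starts at index 3.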
