Data Transfer Performance #77

Open
MikeInnes opened this issue Sep 26, 2016 · 12 comments

@MikeInnes
Contributor

MikeInnes commented Sep 26, 2016

I promised a little writeup on this so here goes.

Right now marshalling all data to decimal and back is going to be pretty expensive. In an ideal world the marshalling would just be (the equivalent of):

io = IOBuffer()
xs = Float64[1,2,3]
write(io, xs)
seekstart(io)
ys = reinterpret(Float64, read(io, 3*8))  # 3 values of 8 bytes each
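To put rough numbers on why the decimal route is expensive, here is a minimal Julia sketch comparing a JSON-style decimal encoding of 1,000 random `Float64` values with their raw bytes. The payload and the `join`-based text encoding are assumptions standing in for a real JSON serializer, so the exact text size will vary with the data; the point is that the binary side is a fixed 8 bytes per value and decodes without any number parsing.

```julia
using Random

Random.seed!(0)          # fixed seed so the run is repeatable
xs = rand(1_000)         # hypothetical payload of 1,000 Float64 values

# Decimal route: roughly what a JSON array of these numbers looks like on the wire.
text_payload = "[" * join(string.(xs), ",") * "]"

# Binary route: the raw bytes, 8 per Float64.
io = IOBuffer()
write(io, xs)
bin_payload = take!(io)

println("decimal text: ", sizeof(text_payload), " bytes")
println("raw binary:   ", length(bin_payload), " bytes")

# Decoding the binary form is a reinterpret; no number parsing is involved.
ys = reinterpret(Float64, bin_payload)
```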

In fact, you can probably make things even faster by converting the data to Float16 first. JS can actually support this via the newfangled typed array feature. I don't know of an interchange format that supports these kinds of byte arrays directly, but as luck would have it there's a msgpack implementation which adds it on as an extension.
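A sketch of the Float16 idea on the Julia side, assuming some precision loss is acceptable for plotting: narrowing to `Float16` cuts each value from 8 bytes to 2, at the cost of roughly three significant decimal digits and a largest finite value of 65504 (anything bigger becomes `Inf16`). The sample values are made up, and decoding half floats on the JS side is not shown.

```julia
# Hypothetical sample data; note how 3.14159 and 1000.123 get rounded.
xs = Float64[1.0, 2.5, 3.14159, 1000.123]

# Narrow to half precision before sending.
half = Float16.(xs)

io = IOBuffer()
write(io, half)
bytes = take!(io)        # 2 bytes per value instead of 8

# Receiver side (in Julia, for checking): reinterpret and widen back.
restored = Float64.(reinterpret(Float16, bytes))
```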

With a little bit of hacking on MsgPack.jl it should be straightforward to use buffers to transfer numerical data to the frontend, which should make that transfer at least an order of magnitude faster as well as reducing the memory footprint.
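As a rough illustration of what "using buffers" means at the MessagePack level, the sketch below frames a `Float64` array by hand as a plain `bin 32` value: the marker byte `0xc6`, a 4-byte big-endian length, then the raw bytes. It deliberately avoids MsgPack.jl, whose extension API is not assumed here, and it does not reproduce the JS msgpack implementation's typed-array extension. `pack_float64_bin` and `unpack_float64_bin` are hypothetical names, and the payload is in the host's native byte order, which both ends would have to agree on.

```julia
# Hypothetical helper: frame a Float64 array as a MessagePack "bin 32" value.
function pack_float64_bin(xs::Vector{Float64})
    io = IOBuffer()
    write(io, 0xc6)                          # bin 32 marker byte
    write(io, hton(UInt32(sizeof(xs))))      # payload length in bytes, big-endian
    write(io, xs)                            # raw Float64 bytes, host byte order
    return take!(io)
end

# Hypothetical inverse, used here only to check the round trip.
function unpack_float64_bin(bytes::Vector{UInt8})
    bytes[1] == 0xc6 || error("not a bin 32 value")
    n = Int(ntoh(reinterpret(UInt32, bytes[2:5])[1]))   # decode big-endian length
    return collect(reinterpret(Float64, bytes[6:5+n]))
end
```

Usage: `pack_float64_bin([1.0, 2.0, 3.0])` yields 29 bytes, a 5-byte header plus 24 bytes of data, and `unpack_float64_bin` recovers the original vector.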

One snag is that there's no general support for typed arrays in Plotly yet (plotly/plotly.js#860). This will mean converting the buffer to a JS array, which will negate the memory usage benefit but hopefully won't be too much of a performance hit.

@shashi

shashi commented Sep 27, 2016

This is good thinking. I believe ProtoBuf supports this kind of exchange as well (https://developers.google.com/protocol-buffers/) and there's already ProtoBuf.jl, but I think it needs the schema to be predefined, unlike MsgPack and JSON, which are much more flexible.

A WebSocket can either transfer binary xor text: a socket set up to deliver text will not play well with binary data unless you base64-encode it (in which case it becomes about 4/3 as big, partly defeating the purpose). In other words, you need to go all-in on MsgPack in Blink to do this.
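For a sense of the base64 overhead, this small Julia sketch (using the `Base64` standard library) encodes 3,000 bytes of placeholder data. Because base64 maps every 3 input bytes to 4 ASCII characters, the text form is about a third larger than the raw bytes.

```julia
using Base64

raw = zeros(UInt8, 3_000)      # placeholder for 375 Float64 values' worth of bytes
encoded = base64encode(raw)    # every 3 input bytes become 4 ASCII characters

println(length(raw), " raw bytes -> ", length(encoded), " base64 characters")
# prints "3000 raw bytes -> 4000 base64 characters"
```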

@sglyon
Member

sglyon commented Sep 27, 2016

@MikeInnes thanks for writing this up.

I don't have the time to personally push this forward right now, but it is definitely something that I'd really like to see. I think it would make interactive usage much smoother plus help with constructing plots with many data points.

@MikeInnes
Contributor Author

@spencerlyon2 That's OK, I think Shashi's right that this is going to be happening at the level of Blink (or possibly as part of his exciting WebIO work).

@shashi Protobuf could be an interesting option if we essentially use it to define a JSON schema with buffers, so it's a good idea to consider it. I don't know that it would have any particular advantages over msgpack otherwise, though.

@sglyon
Member

sglyon commented Sep 27, 2016

Sounds good to me!

Really excited for WebIO

@sglyon
Member

sglyon commented Oct 31, 2017

Hey @MikeInnes and @shashi, have either of you thought about or put any time into this effort in Blink, WebIO, or elsewhere?

@MikeInnes
Contributor Author

I've not done any web-y work in ages. But since this issue was opened, WebIO has become pretty solid and represents the latest thinking on all of this stuff.

@shashi

shashi commented Oct 31, 2017

Right now, WebIO uses JSON... This is because IJulia only supports text-based web sockets. But I think we can have a trait system where backends like Mux and Atom can indicate they can transfer binary data. cc @JobJob
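The trait idea could look roughly like the following "Holy trait" sketch, where each backend declares whether its socket can carry binary and the encoder dispatches on that declaration. Everything here is hypothetical: `Backend`, `MuxBackend`, `IJuliaBackend`, `transport_style`, and `encode_array` are illustrative names, not part of WebIO, Mux, or IJulia. The text-only fallback uses base64 from the `Base64` standard library.

```julia
using Base64

# Hypothetical backend types standing in for real Mux/IJulia integrations.
abstract type Backend end
struct MuxBackend <: Backend end
struct IJuliaBackend <: Backend end

# Trait values describing what a backend's socket can carry.
abstract type TransportStyle end
struct BinaryTransport <: TransportStyle end
struct TextTransport <: TransportStyle end

# Default to text-only (the IJulia situation); a backend opts in to binary
# by defining a more specific method.
transport_style(::Type{<:Backend}) = TextTransport()
transport_style(::Type{MuxBackend}) = BinaryTransport()

# Entry point: look up the trait, then dispatch on it rather than on the backend.
encode_array(b::Backend, xs::Vector{Float64}) = encode_array(transport_style(typeof(b)), xs)

# Binary-capable socket: send the raw bytes (host byte order).
encode_array(::BinaryTransport, xs::Vector{Float64}) = collect(reinterpret(UInt8, xs))

# Text-only socket: fall back to base64 text, about 4/3 the size.
encode_array(::TextTransport, xs::Vector{Float64}) = base64encode(collect(reinterpret(UInt8, xs)))
```

With this design, adding binary support to another backend (Atom, say) is a single `transport_style` method, and the encoding code itself never has to list backends.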

@sglyon
Member

sglyon commented Oct 31, 2017

It would be great to have this in any of the backends (are Mux/Atom/IJulia backends or frontends?).

I know the Bokeh Python/JS library recently implemented binary transport for arrays. I believe their primary frontend is Jupyter, so it might be worthwhile to see whether and how they accomplished that.

Ref: bokeh/bokeh#6906

@sglyon sglyon added the perf label Dec 13, 2017
@cstjean
Contributor

cstjean commented Oct 22, 2019

plotly/plotly.js#2388

Plotly has had support for typed arrays since last year! This could be a huge performance improvement for us.

@sglyon
Member

sglyon commented Oct 22, 2019

Absolutely!

@cstjean, would you be willing to take a first pass at it?

@cstjean
Contributor

cstjean commented Oct 22, 2019

I wish, but I can't really give this priority any time soon. Do you have some idea about the modifications that it would entail?

@sglyon
Member

sglyon commented Oct 22, 2019

To be totally honest, I don't.

I'm in a similar boat, without too many cycles to spare in the short run.


4 participants