Improve data transfer, using a binary transfer protocol #5984

Closed
volkerjaenisch opened this issue Mar 12, 2017 · 6 comments · Fixed by #6906

Comments

@volkerjaenisch
Contributor

Currently, data is transferred between the Bokeh server and the client as a JSON structure.
While JSON is an established format for AJAX, it is not suited to serve as a high-performance data transfer mechanism for scientific applications.

Currently, NumPy arrays that could be transferred in binary mode are base64-encoded and transferred as text embedded in a JSON structure.
On the JS side, decoding the base64 text and shuffling it byte by byte into JS typed arrays takes a lot of time and RAM.
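
To make the overhead concrete, here is a minimal sketch that compares the raw size of an array with the size of the same data once it is base64-encoded and wrapped in JSON. It uses the standard-library `array` type as a stand-in for a NumPy array, and the key name `"x"` is only illustrative; it does not reproduce Bokeh's actual serialization.

```python
import base64
import json
from array import array

# Stand-in for a NumPy float64 array: array("d") stores raw doubles.
values = array("d", (i * 0.5 for i in range(1000)))
raw = values.tobytes()

# The text-based path: base64 text embedded in a JSON document.
encoded = json.dumps({"x": base64.b64encode(raw).decode("ascii")})

raw_size = len(raw)
json_size = len(encoded)

# The receiver has to parse the JSON and decode the text again
# before it can rebuild the numeric buffer.
decoded = array("d")
decoded.frombytes(base64.b64decode(json.loads(encoded)["x"]))

print(raw_size, json_size, json_size / raw_size)
```

Base64 maps every 3 bytes to 4 characters, so the payload grows by roughly a third before any decoding cost is paid on the client.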

As an example, we implemented our own binary protocol for data transfer, and it is much faster than Bokeh's current transfer.
Our home-brew data structure for transferring data between the Bokeh server and the client is crude, but at least an order of magnitude faster than Bokeh's.

From the user's perspective, one defines targets and associates data with them in a metadata dict:

data_map = {
        'source.data.x': x_data,
        'source.data.index': x_data,
        'source.data.y': y_data,
        'source.data.y_above': y_above,
        'source.data.y_below': y_below,
    }

We lay it out in a binary data structure:

Length of Metadata
Metadata
dataset1
dataset2

  1. The position and length of every data structure to be submitted are calculated.
  2. The length of the metadata is encoded as a string of 8 digits, which sidesteps endianness problems.
  3. The metadata is JSON-encoded and appended as bytes.
  4. The metadata maps a target (Plot1.source.data.X) to a dataset defined by its position and length in the binary data structure.
  5. A dataset is a binary representation of a NumPy array, fit to be mapped 1:1 to a JS typed array.
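
The steps above can be sketched roughly as follows. This is not the code from our implementation, only an illustration under assumptions: the function name `pack`, the metadata field names (`byteorder`, `targets`, `offset`, `length`, `typecode`), and the use of the standard-library `array` type in place of NumPy arrays are all hypothetical.

```python
import json
import sys
from array import array

HEADER_DIGITS = 8  # metadata length written as 8 ASCII digits (step 2)


def pack(data_map):
    """Pack {target: array} into header + JSON metadata + raw datasets."""
    meta = {"byteorder": sys.byteorder, "targets": {}}
    chunks = []
    offset = 0  # position relative to the start of the dataset region
    for target, arr in data_map.items():
        raw = arr.tobytes()
        # Steps 1 and 4: record where each dataset lives and how long it is.
        meta["targets"][target] = {
            "offset": offset,
            "length": len(raw),
            "typecode": arr.typecode,
        }
        chunks.append(raw)
        offset += len(raw)
    # Step 3: JSON-encode the metadata and add it as bytes.
    meta_bytes = json.dumps(meta).encode("utf-8")
    # Step 2: a fixed-width decimal string has no byte order to get wrong.
    header = f"{len(meta_bytes):0{HEADER_DIGITS}d}".encode("ascii")
    return header + meta_bytes + b"".join(chunks)


x_data = array("d", [0.0, 1.0, 2.0])
y_data = array("d", [10.0, 11.0, 12.0])
message = pack({"source.data.x": x_data, "source.data.y": y_data})
```

One caveat for the receiving side: a JS typed array view over an `ArrayBuffer` requires a byte offset that is a multiple of the element size, so a real implementation would likely pad the metadata or copy each slice before creating the view.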

On the JS side, we disentangle the metadata first. Then we can map the binary data of the NumPy arrays to their target JS typed arrays very quickly, while respecting endianness.
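
The decoding logic is written in JS in our case; the sketch below mirrors it in Python purely for illustration. It assumes a hypothetical metadata layout with `byteorder`, `targets`, `offset`, `length`, and `typecode` fields, and it uses the standard-library `array` type where the JS side would create typed arrays.

```python
import json
import sys
from array import array

HEADER_DIGITS = 8  # metadata length is stored as 8 ASCII digits


def unpack(message):
    """Split a packed message into {target: array}."""
    # Read the fixed-width header, then the JSON metadata it points to.
    meta_len = int(message[:HEADER_DIGITS].decode("ascii"))
    meta_end = HEADER_DIGITS + meta_len
    meta = json.loads(message[HEADER_DIGITS:meta_end].decode("utf-8"))
    payload = message[meta_end:]

    result = {}
    for target, info in meta["targets"].items():
        start = info["offset"]
        arr = array(info["typecode"])
        arr.frombytes(payload[start:start + info["length"]])
        # Respect endianness: swap only if sender and receiver differ.
        if meta["byteorder"] != sys.byteorder:
            arr.byteswap()
        result[target] = arr
    return result


# Build a small message by hand so the example is self-contained.
x = array("d", [1.0, 2.0, 3.0])
meta_bytes = json.dumps({
    "byteorder": sys.byteorder,
    "targets": {
        "source.data.x": {
            "offset": 0,
            "length": len(x.tobytes()),
            "typecode": "d",
        },
    },
}).encode("utf-8")
message = f"{len(meta_bytes):08d}".encode("ascii") + meta_bytes + x.tobytes()

decoded = unpack(message)
```

Because the metadata carries the position and length of every dataset, the receiver never has to scan or parse the numeric payload itself.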

This works quite well, and we would like to implement it in Bokeh.

Volker

@bryevdv
Member

bryevdv commented Mar 13, 2017

A few quick comments:

At the lowest level, the Bokeh protocol already supports multi-fragment messages. This was done explicitly with the idea of sending zero-copy binary buffers (e.g. NumPy memory views) as separate fragments, directly over the websocket to JS typed arrays. However, this capability has not been put to any real use yet. You can see the details here, e.g.:

https://github.com/bokeh/bokeh/blob/master/bokeh/server/protocol/receiver.py#L21-L34
https://github.com/bokeh/bokeh/blob/master/bokeh/server/protocol/message.py#L80-L121

So, I think the plumbing for this already largely exists in Bokeh, and just needs to be put to use in the specific case of the server. We started with a base64 encoding because by itself it already offered a huge improvement over the previous "JSON all the things" approach, and it also worked for both server and standalone docs out of the box.

@bryevdv
Member

bryevdv commented Apr 12, 2017

Remarking as feature. Recently noticed that a binary protocol is on the ipywidgets 7.0 roadmap, so this is certainly worth pursuing.

@bryevdv
Member

bryevdv commented Apr 12, 2017

@volkerjaenisch just checking in, if you have any code you can contribute or share, even if it is just for reference, that would be helpful.

@sandrarum

sandrarum commented May 4, 2017

We just released an example using binary data and JS for bokeh. You can have a look at it: https://github.com/sandrarum/inqbus.graphdemo

@bryevdv
Member

bryevdv commented May 4, 2017

@sandrarum @volkerjaenisch this is definitely exciting, and I look forward to looking at the code in that repo to see how this approach might be incorporated directly into Bokeh's protocols! Before doing so, may I ask that you add a LICENSE.txt file to the repo that states what license the code is provided under?

@sandrarum

License is added

@bryevdv bryevdv modified the milestones: 0.12.7, 0.12.8 Aug 21, 2017
@bryevdv bryevdv mentioned this issue Sep 8, 2017
3 tasks
@bryevdv bryevdv modified the milestones: 0.12.8, 0.12.9 Sep 10, 2017