Improve data transfer, using a binary transfer protocol #5984
Comments
A few quick comments: At the lowest level, the Bokeh protocol already supports multi-fragment messages, which was done explicitly with the notion in mind of sending zero-copy binary buffers (e.g. NumPy memory views) as separate fragments, directly over websocket to JS typed arrays. However, this capability has not been put to any real use yet. You can see the details here, e.g. https://github.com/bokeh/bokeh/blob/master/bokeh/server/protocol/receiver.py#L21-L34 So, I think the plumbing for this already largely exists in Bokeh, and just needs to be put to use in the specific case of the server. We started with a base64 encoding because by itself it already offered a huge improvement over the previous "JSON all the things" approach, and it also worked for both server and standalone docs out of the box.
Remarking as |
@volkerjaenisch just checking in, if you have any code you can contribute or share, even if it is just for reference, that would be helpful.
We just released an example using binary data and JS for Bokeh. You can have a look at it: https://github.com/sandrarum/inqbus.graphdemo
@sandrarum @volkerjaenisch this is definitely exciting, and I look forward to looking at the code in that repo to see how this approach might be incorporated directly into Bokeh's protocols! Before doing so, may I ask that you add a |
The license has been added.
Currently, data is transferred between the Bokeh server and the client as a JSON structure.
While JSON is an established format for AJAX, it is not suitable as a high-performance data transfer mechanism for scientific applications.
At present, NumPy arrays that could be transferred in binary form are base64-encoded and sent as text embedded in a JSON structure.
On the JS side, decoding the base64 and shuffling the result byte by byte into JS typed arrays takes a lot of time and RAM.
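To give a rough sense of the size cost alone, here is a minimal sketch that compares the raw bytes of a NumPy array with its base64 text form. The array size is an arbitrary choice for illustration, and the sketch says nothing about decoding time in the browser.

```python
import base64

import numpy as np

# An arbitrary example array: 300,000 float64 values.
array = np.arange(300_000, dtype=np.float64)

raw = array.tobytes()            # the bytes a binary transfer would send
encoded = base64.b64encode(raw)  # the text that gets embedded in JSON

print(len(raw))      # 2400000
print(len(encoded))  # 3200000, i.e. one third larger than the raw bytes
```

Base64 maps every 3 bytes to 4 characters, so the payload grows by about 33% before any JSON overhead, and the client still has to decode it back into bytes.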
As an example, we implemented our own binary protocol for data transfer, and it is much faster than Bokeh's current transfer.
Our home-brew data structure for transferring data between the Bokeh server and the client is crude, but at least an order of magnitude faster than Bokeh's.
From the user's perspective, one defines targets and associates data with them in a metadata dict:
We align it in a binary data structure:
Length of Metadata
Metadata
dataset1
dataset2
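The layout above could be produced on the server side roughly as follows. This is only a sketch under assumptions that are not taken from the original implementation: the metadata length is a 4-byte little-endian unsigned integer, the metadata is UTF-8 JSON, and the function name `pack_message` and the metadata keys (`target`, `dtype`, `shape`, `offset`, `nbytes`) are hypothetical.

```python
import json
import struct

import numpy as np


def pack_message(datasets):
    """Pack {target_name: ndarray} as: metadata length, metadata, raw buffers.

    Hypothetical helper; the wire format here is an assumption for illustration.
    """
    entries = []
    buffers = []
    offset = 0  # byte offset of each dataset, counted from the end of the metadata
    for target, array in datasets.items():
        array = np.ascontiguousarray(array)
        raw = array.tobytes()
        entries.append({
            "target": target,
            "dtype": array.dtype.str,  # e.g. "<f8": byte order, kind and item size
            "shape": list(array.shape),
            "offset": offset,
            "nbytes": len(raw),
        })
        buffers.append(raw)
        offset += len(raw)

    metadata = json.dumps({"datasets": entries}).encode("utf-8")
    header = struct.pack("<I", len(metadata))  # assumed: 4-byte little-endian length
    return header + metadata + b"".join(buffers)


x = np.arange(4, dtype="<f8")
y = np.array([1, 2, 3], dtype="<i4")
message = pack_message({"x": x, "y": y})
```

One caveat for a real implementation: a JS typed array view over an `ArrayBuffer` needs a byte offset that is a multiple of the element size, so the datasets would likely have to be padded to suitable boundaries. The sketch omits that padding.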
On the JS side, we disentangle the metadata first. Then we can map the binary data of the NumPy arrays to their target JS typed arrays very quickly, while respecting endianness.
This works quite well, and we would like to implement it in Bokeh.
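The decoding step can be sketched in Python as a stand-in for the JS client, where the same job would fall to `DataView` and typed arrays. The wire format is an assumption, not the original implementation: a 4-byte little-endian metadata length, UTF-8 JSON metadata, then the raw buffers. The function name `unpack_message` and the metadata keys are hypothetical.

```python
import json
import struct

import numpy as np


def unpack_message(message):
    """Split a packed message back into {target_name: ndarray}.

    Hypothetical helper; assumes metadata length, JSON metadata, raw buffers.
    """
    # Step 1: read the metadata length and decode the metadata.
    (meta_len,) = struct.unpack_from("<I", message, 0)
    data_start = 4 + meta_len
    metadata = json.loads(bytes(message[4:data_start]).decode("utf-8"))

    # Step 2: map each dataset onto an array view without parsing any text.
    arrays = {}
    for entry in metadata["datasets"]:
        dtype = np.dtype(entry["dtype"])  # carries the sender's byte order
        count = entry["nbytes"] // dtype.itemsize
        view = np.frombuffer(
            message, dtype=dtype, count=count, offset=data_start + entry["offset"]
        )
        # Convert to native byte order only when the sender's order differs.
        native = view.astype(dtype.newbyteorder("="), copy=False)
        arrays[entry["target"]] = native.reshape(entry["shape"])
    return arrays


# Build a small message by hand: one big-endian and one little-endian dataset.
x = np.arange(4, dtype=">f8")
y = np.array([1, 2, 3], dtype="<i4")
meta = json.dumps({"datasets": [
    {"target": "x", "dtype": ">f8", "shape": [2, 2], "offset": 0, "nbytes": 32},
    {"target": "y", "dtype": "<i4", "shape": [3], "offset": 32, "nbytes": 12},
]}).encode("utf-8")
message = struct.pack("<I", len(meta)) + meta + x.tobytes() + y.tobytes()

result = unpack_message(message)
```

When the sender's byte order matches the receiver's, `np.frombuffer` returns a view onto the received bytes and no copy is made. A byte swap happens only for datasets whose declared order differs.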
Volker