Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exceptions when updating plots in new example #4000

Closed
the-neworder opened this issue Mar 7, 2016 · 32 comments
Closed

Exceptions when updating plots in new example #4000

the-neworder opened this issue Mar 7, 2016 · 32 comments

Comments

@the-neworder
Copy link

I created a pretty cool application with bokeh, which shows functionality that is not yet covered by any existing example. The code is attached to this issue and can freely be used as an official example. In the code custom barplots (not charts) for year, month, daily and named data were created. Tapping on a year will trigger a server side select of a month and refreshes the month plot data. The month plot then at the same time selects a day and refreshes the daily data and so on.

You should give it a try it is very cool.

Download and unpack bars_example.zip

Start the bokeh server (iam using bokeh 0.11.1):

bokeh serve

run the example with:

python bars_example.py

Try selecting a year/month/day and see how everything is adapting x_range, y_range, secondary y_range. Also a mechanism was implemented to make it impossible to not have selected any bar. (Try clicking on the white space next to a bar).

The example was created with the goal to be able to continuously add more data and have the barplots adapt accordingly live. E.g. adding data from year 2017 will create the bar 2017 and directly show it if it does not exist yet.

A secondary mode was created which continuously adds days, which will create new month and even years. If new data was created it is directly selected and displayed. You can execute this mode by:

python bars_example.py --follow

The mode works pretty well but if you wait a few minutes the following exception will appear. I tried it on multiple machines .. if you wait long enough the exception always appears.

Sometimes no exception appears and cpu usage by bokeh server is 100% and firefox and chrome usage also goes to 100%, while the plots stop updating.

The Exception that shows up is:

ERROR:bokeh.server.protocol.server_handler:error handling message Message 'PATCH-DOC' (revision 1): KeyError('references',)
DEBUG:bokeh.server.protocol.server_handler:  message header {u'msgid': u'718900fc-f196-4b36-84b9-f1ee7a3654bc', u'msgtype': u'PATCH-DOC'} content {}
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/bokeh/server/protocol/server_handler.py", line 38, in handle
    work = yield handler(message, connection)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1008, in run
    value = future.result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 232, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1017, in run
    yielded = self.gen.send(value)
  File "/usr/local/lib/python2.7/dist-packages/bokeh/server/session.py", line 42, in _needs_document_lock_wrapper
    result = yield yield_for_all_futures(func(self, *args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/bokeh/server/session.py", line 212, in _handle_patch
    message.apply_to_document(self.document)
  File "/usr/local/lib/python2.7/dist-packages/bokeh/server/protocol/messages/patch_doc.py", line 85, in apply_to_document
    doc.apply_json_patch(self.content)
  File "/usr/local/lib/python2.7/dist-packages/bokeh/document.py", line 836, in apply_json_patch
    references_json = patch['references']
KeyError: 'references'

And sometimes directly after starting the program:

ERROR:bokeh.server.views.ws:Bokeh Server protocol error: No 'msgtype'
@benoit9126
Copy link

Hi everybody,

I have the same error with a personal application. I regularly update some plots (by updating the underlying ColumnDataSource) and my bokeh server crashes with the following log:

DEBUG:bokeh.server.tornado:Allowed Host headers: ['localhost:5006']
DEBUG:bokeh.server.tornado:These host origins can connect to the websocket: ['localhost:5006']
DEBUG:bokeh.server.tornado:Patterns are: [('/?', <class 'bokeh.server.views.doc_handler.DocHandler'>, {'bokeh_websocket_path': '/ws', 'application_context': <bokeh.server.application_context.ApplicationContext object at 0x000000000784FEF0>}), ('/ws', <class 'bokeh.server.views.ws.WSHandler'>, {'bokeh_websocket_path': '/ws', 'application_context': <bokeh.server.application_context.ApplicationContext object at 0x000000000784FEF0>}), ('/autoload.js', <class 'bokeh.server.views.autoload_js_handler.AutoloadJsHandler'>, {'bokeh_websocket_path': '/ws', 'application_context': <bokeh.server.application_context.ApplicationContext object at 0x000000000784FEF0>}), ('/static/(.*)', <class 'bokeh.server.views.static_handler.StaticHandler'>)]
INFO:bokeh.command.subcommands.serve:Starting Bokeh server on port 5006 with applications at paths ['/']
INFO:bokeh.server.views.ws:WebSocket connection opened
DEBUG:bokeh.server.views.ws:Receiver created for Protocol('1.0')
DEBUG:bokeh.server.views.ws:ServerHandler created for Protocol('1.0')
INFO:bokeh.server.views.ws:ServerConnection created
DEBUG:bokeh.server.session:pushing doc to session 'yUu48Kc0uM5Fnd51uONPcGBdjBxgDTSskD0WqjrPX28B'
INFO:tornado.access:200 GET /?bokeh-session-id=yUu48Kc0uM5Fnd51uONPcGBdjBxgDTSskD0WqjrPX28B (127.0.0.1) 7.00ms
INFO:bokeh.server.views.ws:WebSocket connection opened
DEBUG:bokeh.server.views.ws:Receiver created for Protocol('1.0')
DEBUG:bokeh.server.views.ws:ServerHandler created for Protocol('1.0')
INFO:bokeh.server.views.ws:ServerConnection created
DEBUG:bokeh.server.session:Sending pull-doc-reply from session 'yUu48Kc0uM5Fnd51uONPcGBdjBxgDTSskD0WqjrPX28B'
DEBUG:bokeh.server.tornado:[pid 6780] 2 clients connected
DEBUG:bokeh.server.tornado:[pid 6780] 2 clients connected
ERROR:bokeh.server.protocol.server_handler:error handling message Message 'PATCH-DOC' (revision 1): KeyError('references',)
DEBUG:bokeh.server.protocol.server_handler:  message header {'msgtype': 'PATCH-DOC', 'msgid': '4bca87ea-e13e-44bc-8f72-aa7212f435a2'} content {}
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\bokeh\server\protocol\server_handler.py", line 38, in handle
    work = yield handler(message, connection)
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1008, in run
    value = future.result()
  File "C:\Anaconda3\lib\site-packages\tornado\concurrent.py", line 232, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "C:\Anaconda3\lib\site-packages\tornado\gen.py", line 1017, in run
    yielded = self.gen.send(value)
  File "C:\Anaconda3\lib\site-packages\bokeh\server\session.py", line 42, in _needs_document_lock_wrapper
    result = yield yield_for_all_futures(func(self, *args, **kwargs))
  File "C:\Anaconda3\lib\site-packages\bokeh\server\session.py", line 212, in _handle_patch
    message.apply_to_document(self.document)
  File "C:\Anaconda3\lib\site-packages\bokeh\server\protocol\messages\patch_doc.py", line 80, in apply_to_document
    doc.apply_json_patch(self.content)
  File "C:\Anaconda3\lib\site-packages\bokeh\document.py", line 815, in apply_json_patch
    references_json = patch['references']
KeyError: 'references'
ERROR:bokeh.server.protocol:Bad header with no msgtype was: {'references': [], 'events': [{'new': {'color_b': ['#ffffbf', '#ffffff', '#ffffbf', '#ffffff', '#ffffbf'], 'radius_c': [20.0, 20.0, 20.0, 20.0, 20.0], 'voltage_a': [230.0, 229.97965629991504, 'NaN', 'NaN', 230.04979730787235], 'color_a': ['#ffffbf', '#ffffbf', '#ffffff', '#ffffff', '#ffffbf'], 'color_c': ['#ffffbf', '#ffffff', '#ffffff', '#ffffbf', '#ffffff'], 'text': ['1', '2', '3', '4', '5, 6'], 'voltage_c': [230.0, 'NaN', 'NaN', 229.98484834643682, 'NaN'], 'voltage_b': [230.0, 'NaN', 229.8752867995332, 'NaN', 229.80124330585198], 'radius_a': [40.0, 40.0, 40.0, 40.0, 40.0], 'radius_b': [30.0, 30.0, 30.0, 30.0, 30.0], 'id_eb': [1, 3, 4, 6, 7], 'y': [5, 3, 3, 1, 1], 'x': [2, 1, 3, 1, 3]}, 'model': {'id': '8ca74995-5d53-40ef-a74e-41a12dd5500c', 'type': 'ColumnDataSource'}, 'attr': 'data', 'kind': 'ModelChanged'}]}
ERROR:bokeh.server.views.ws:Bokeh Server protocol error: No 'msgtype' in header, closing connection
ERROR:bokeh.server.protocol:Bad header with no msgtype was: {'references': [], 'events': [{'new': {'color_b': ['#ffffbf', '#ffffff', '#ffffbf', '#ffffff', '#ffffbf'], 'radius_c': [20.0, 20.0, 20.0, 20.0, 20.0], 'voltage_a': [230.0, 229.97965629991504, 'NaN', 'NaN', 230.04979730787235], 'color_a': ['#ffffbf', '#ffffbf', '#ffffff', '#ffffff', '#ffffbf'], 'color_c': ['#ffffbf', '#ffffff', '#ffffff', '#ffffbf', '#ffffff'], 'text': ['1', '2', '3', '4', '5, 6'], 'voltage_c': [230.0, 'NaN', 'NaN', 229.98484834643682, 'NaN'], 'voltage_b': [230.0, 'NaN', 229.8752867995332, 'NaN', 229.80124330585198], 'radius_a': [40.0, 40.0, 40.0, 40.0, 40.0], 'radius_b': [30.0, 30.0, 30.0, 30.0, 30.0], 'id_eb': [1, 3, 4, 6, 7], 'y': [5, 3, 3, 1, 1], 'x': [2, 1, 3, 1, 3]}, 'model': {'id': '8ca74995-5d53-40ef-a74e-41a12dd5500c', 'type': 'ColumnDataSource'}, 'attr': 'data', 'kind': 'ModelChanged'}]}
ERROR:bokeh.server.views.ws:Bokeh Server protocol error: No 'msgtype' in header, closing connection
ERROR:bokeh.server.protocol:Bad header with no msgtype was: {'references': [], 'events': [{'new': {'color_b': ['#ffffbf', '#ffffff', '#ffffbf', '#ffffff', '#ffffbf'], 'radius_c': [20.0, 20.0, 20.0, 20.0, 20.0], 'voltage_a': [230.0, 229.97965629991504, 'NaN', 'NaN', 230.04979730787235], 'color_a': ['#ffffbf', '#ffffbf', '#ffffff', '#ffffff', '#ffffbf'], 'color_c': ['#ffffbf', '#ffffff', '#ffffff', '#ffffbf', '#ffffff'], 'text': ['1', '2', '3', '4', '5, 6'], 'voltage_c': [230.0, 'NaN', 'NaN', 229.98484834643682, 'NaN'], 'voltage_b': [230.0, 'NaN', 229.8752867995332, 'NaN', 229.80124330585198], 'radius_a': [40.0, 40.0, 40.0, 40.0, 40.0], 'radius_b': [30.0, 30.0, 30.0, 30.0, 30.0], 'id_eb': [1, 3, 4, 6, 7], 'y': [5, 3, 3, 1, 1], 'x': [2, 1, 3, 1, 3]}, 'model': {'id': '8ca74995-5d53-40ef-a74e-41a12dd5500c', 'type': 'ColumnDataSource'}, 'attr': 'data', 'kind': 'ModelChanged'}]}

I am using Bokeh 0.11.0.

@the-neworder
Copy link
Author

I personally think that some kind of race condition in the bokeh server is occuring. It seems that if you update a plot from the server side, the javascript realizes that something has changed and triggers an 'on change' to make the server aware that something happened. Kind of a vicious circle. Most of the time this thing is properly handled by the server but sometimes it even leads to an endless loop between the browser and the bokeh server like I pointed out in my initial post.

@Joe136
Copy link

Joe136 commented Mar 10, 2016

Hey guys, I also experienced exactly the same issue a few month ago and it actually prevented bokeh from being used in our company in a production environment.

@the-neworder
Copy link
Author

Hey Joe I suggest next time you encounter such an issue you better directly post it here since it could have been fixed by now. In my experience the bokeh community is pretty fast dealing with issues especially @bryevdv is very active.

@the-neworder
Copy link
Author

@benoit9126 when exactly does your Exception occur? Does it happen when you update multiple ColumnDataSources on the server side at the same time? This is what I am doing.

@benoit9126
Copy link

@the-neworder : yes exactly. It happens with a plot which is built from two ColumnDataSource that I consecutively update on the the server side with a command like

my_column_data_source.data.update(**my_new_data)

@the-neworder
Copy link
Author

Ah I think we are getting to the core of the problem. In my example from above I have three ColumnDataSources per plot (one for the rects, one for the green line and one for the baseline) all of them get updated. Also I do not only have one but four plots so 4*3 data sources which can get updated at the same time.

@benoit9126
Copy link

In my case, I have two plots. Each of them has two ColumnDataSource. I plot three renderers (circles) from the first ColumnDataSource and four others (multi-lines) from the second ColumnDataSource. The two plots are gathered in a GridPlot.
It seems that there is a conflict between two consecutive patches but I do not know why the missing field called references is not built properly in such a situation...

@bryevdv
Copy link
Member

bryevdv commented Mar 10, 2016

@the-neworder thanks very much for the kind words, I certainly try to do my best to be responsive. I will admit as the interest in Bokeh has recently grown this has become more of a challenge. (The scaling properties of individuals are not great!) I'm excited to see some new people interested in more significant development involvement starting to show up.

Anyway, there is definitely code specifically in place to prevent propagation of updates resulting from the latest patch message, to prevent ping-ponging. It's possible there is a situation that slips through, of course. Or perhaps it is something else. The first step to being able to fix this, and also put in tests to ensure it does not creep back in, is to find a way to reliably reproduce the problem. Any help along these lines would be extremely useful and appreciated. I do think the two of you have narrowed it down nicely, which is a great start.

@benoit9126
Copy link

@bryevdv Here is an example which fails after a few tenth of seconds.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
from threading import Thread

import numpy as np
from bokeh.client import push_session
from bokeh.document import Document
from bokeh.models import ColumnDataSource
from bokeh.plotting import Figure


class Test(Thread):
    def __init__(self, data1, data2):
        super(Test, self).__init__(daemon=True)
        self.data1 = data1
        self.data2 = data2
        self.cds1 = ColumnDataSource(data=data1)
        self.cds2 = ColumnDataSource(data=data2)
        self.fig = Figure(plot_width=500, plot_height=500, toolbar_location='left')
        self.fig.circle_x(source=self.cds1, x='x', y='y', size='size')
        self.fig.circle_cross(source=self.cds2, x='x', y='y', size='size')

    def run(self):
        while True:
            print('ok')
            self.data1['y'] = np.roll(self.data1['y'], 1)
            self.data2['y'] = np.roll(self.data2['y'], 1)
            self.data1['size'] = np.random.random(10) * 20
            self.data2['size'] = np.random.random(10) * 20
            self.cds1.data.update(**self.data1)
            self.cds2.data.update(**self.data2)
            time.sleep(0.1)


data1 = dict(x=np.arange(10), y=np.arange(10), size=np.random.random(10) * 20)
data2 = dict(x=np.arange(10), y=np.arange(10, 0, -1), size=np.random.random(10) * 20)

test = Test(data1=data1, data2=data2)

doc = Document()
doc.add_root(test.fig)
session = push_session(document=doc)
session.show()

test.start()
session.loop_until_closed()

It is not beautiful but it does the work. 🎉

Update: if you try to zoom with the wheel, it sometimes trigger the bug.

@the-neworder
Copy link
Author

@benoit9126 your example is not executable unless i remove daemon=True.
i then did

test.daemon = True

also your code does not produce the error on my machine

@benoit9126
Copy link

I am using python 3.4 and the daemon argument has been added in python 3.3.

This example is pretty unpredictable. Sometimes, it works fine and sometimes it fails (generally quickly)...

@the-neworder
Copy link
Author

iam also trying to create an example but need some time to make it small

@benoit9126
Copy link

I think the quantity of data is a key point...

@the-neworder
Copy link
Author

@benoit9126 can you try my example?
bars_minimal_error.zip

execute

python bars_example_minimal.py --follow

iam using python 2.7
if it works i will try to clean it up
for me this fails < 1 sec

@bryevdv
Copy link
Member

bryevdv commented Mar 10, 2016

You guys are great :) I will commit to running both of these today and try confirm if they reproduce the problem.

Also ping @havocp just in case you have any quick thoughts. I immediately am suspicious that both examples above seem to use threads, and AFAIK none of our examples do. Is there an explicit document lock that ought to be acquired at some place?

@the-neworder
Copy link
Author

I will try to create a threadless version which uses periodic callback

@benoit9126
Copy link

@the-neworder Your code fails more than mine which is good!
@bryevdv In my application, the following threads/processes are created:

  1. The main process starts a bokeh server in an other process using multiprocessing library. (it does not seem to be the cause of the problem)
  2. The main thread (of the main process) creates a thread which updates the ColumnDataSource
  3. The main thread (of the main process) executes session.loop_until_closed()

So there is only one thread which modifies the data...

@the-neworder
Copy link
Author

ok it seems that if data is added through add_periodic_callback, that this synchronized everything a lot better. I cannot get it to fail anymore even if I do:

document.add_periodic_callback(cb, 1)

also it still works if I do:

x = threading.Thread(target=session.loop_until_closed)
x.daemon = True
x.start()

time.sleep(100)

instead of

session.loop_until_closed()

To summarize it:

We get this error independet of using threads if we modify the datasource outside of the periodic callback.

@havocp
Copy link
Contributor

havocp commented Mar 10, 2016

I haven't reviewed the whole discussion, but item 2 above is not allowed; you can't loop in one thread and update data in another. All touching of the document (or any part of bokeh really) needs to be in the ioloop thread. To get data from the worker thread to the ioloop thread, you could add a next tick callback; I'm not sure if we've actually made it safe to add a next tick callback from a thread but we should if not. That's the ONLY operation that tornado itself supports doing from another thread.

Another possible approach is to hide the thread behind a function which returns a future (aka a coroutine) and then call that from the main ioloop thread and handle the future's result when it arrives (in the main thread).

tldr - there are absolutely no thread locks or thread safety attempts in Bokeh that I'm aware of. It is categorically not thread safe.

Doesn't mean threads are the cause of that exception, but I'd expect attempts to use bokeh from multiple threads to explode in all sorts of ways.

@havocp
Copy link
Contributor

havocp commented Mar 10, 2016

fwiw the exceptions here look like corrupted protocol which could certainly happen if two threads both try to modify models (or one touches a model while another receives messages from the client). In that case you have uncontrolled random interleaving of reads/writes from the websocket.

@the-neworder
Copy link
Author

hey @havocp thx for clarification. I now changed my initial example and i can now add data every millisecond and update 4 plots at the same time and it never fails. Probably it should be strongly highlighted somewhere that modifying the data outside of the ioloop thread leads to exceptions. The first thing that i did was to put the loop_until_closed into a thread and I bet a lot of other also do.

@bryevdv
Copy link
Member

bryevdv commented Mar 10, 2016

Yes there will absolutely be documentation support added as a result of this (very productive) discussion. I'm also trying to investigate the next tick callback a bit more, I have not used it but an example might go a long way as well.

@bryevdv
Copy link
Member

bryevdv commented Mar 10, 2016

For reference here is a link to Tornado's add_callback docs:

http://www.tornadoweb.org/en/stable/ioloop.html#tornado.ioloop.IOLoop.add_callback

It explicitly mentions that it "may be used to transfer control from other threads to the IOLoop‘s thread." so perhaps the task is to make an appropriately simple (or at least predictable and documentable) pattern for users to follow.

@havocp
Copy link
Contributor

havocp commented Mar 10, 2016

fwiw I think you could put loop_until_closed in a thread, as long as every other use of Bokeh is also in that same thread. But kind of just a technicality ;-) not useful.

The deal with next tick callback is just that we have our own data structures around it (on Document and ClientSession / ServerSession) which would need locking to make document.add_next_tick_callback threadsafe. If we add that locking, then I think Tornado supports adding a next tick callback from another thread.

In the meantime as a workaround, it would probably work to use Tornado directly to call IOLoop.add_callback and get into the ioloop thread.

Check out the docs at http://tornadokevinlee.readthedocs.org/en/latest/ioloop.html#tornado.ioloop.IOLoop.add_callback

"It is safe to call this method from any thread at any time, except from a signal handler. Note that this is the only method in IOLoop that makes this thread-safety guarantee; all other interaction with the IOLoop must be done from that IOLoop‘s thread. add_callback() may be used to transfer control from other threads to the IOLoop‘s thread."

So only next tick callbacks can be added from another thread - but of course you can always add a next tick callback which in turn adds another kind of callback or does anything else. The next tick callback is just a way to move code into the ioloop thread.

For most applications I think the better option may be to use futures as discussed here:
https://groups.google.com/a/continuum.io/d/msg/bokeh/qVVfCUu_eow/cDZjQcVbBQAJ

We really should have a nice API for that like the server_context.add_session_updater or similar suggested in that post.

Another useful link perhaps about ways to do concurrency/threads with tornado:
https://github.com/tornadoweb/tornado/wiki/Threading-and-concurrency

@benoit9126
Copy link

Thanks a lot for all this documentation. I have never used Tornado IOLoop directly. Tomorrow, I will try to use the add_next_tick_callback method to "plan" data modification instead of roughly modifying them.

@the-neworder
Copy link
Author

Indeed a very productive discussion. I know now how to proceed. Thx a lot.

@benoit9126
Copy link

It seems to work fine with add_next_tick_callback. 👍 Thanks a lot.

@bryevdv
Copy link
Member

bryevdv commented Mar 11, 2016

Awesome! OK I will aim to write up some docs while I am traveling early next week. If there is a small-ish self-contained example that anyone would like to contribute or have added, that would certainly be good. A few questions:

@havocp would it work to just wrap (internally) all callback additions in a next tick callback that actually makes the IOloop call to add the real callback? If I understand this could be a way to allow callback adding methods to be called from any thread? Are there any downsides?

If that is not workable, @the-neworder @benoit9126 do y'all have any ideas or suggestions about either a thin convenience APIs or common patterns we could endorse and promote?

@bryevdv
Copy link
Member

bryevdv commented Jun 25, 2016

I think this can be closed, there is an example in the server docs.

@bryevdv bryevdv closed this as completed Jun 25, 2016
@damianavila damianavila added this to the 0.12 milestone Jun 28, 2016
@OttoAbil
Copy link

Which is example on the server is this referring to? Thanks a lot for making this contribution!

@bryevdv
Copy link
Member

bryevdv commented Jan 23, 2018

https://bokeh.pydata.org/en/latest/docs/user_guide/server.html#updating-from-threads

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants