Increasing memory consumption of bokeh server #7468

Closed
LEMUEGGE opened this Issue Jan 29, 2018 · 24 comments

LEMUEGGE commented Jan 29, 2018

[Image: mprof memory-usage plot]

After noticing that memory was not released when a session's document was destroyed, I tested the Bokeh example app sliders.py with the memory profiler shown above (mprof), and the memory does not seem to be released (similar to #6744).

My packages are:
asn1crypto 0.22.0 py36h265ca7c_1
biopython 1.69 np113py36_0 anaconda
bleach 2.0.0 py36h688b259_0
bokeh 0.12.13 py36h2f9c1c0_0
bzip2 1.0.6 3 anaconda
ca-certificates 2017.08.26 h1d4fec5_0
certifi 2017.11.5 py36hf29ccca_0
cffi 1.10.0 py36had8d393_1
chardet 3.0.4 py36h0f667ec_1
click 6.7 py36h5253387_0
cloudpickle 0.4.0 py36h30f8c20_0
colorcet 1.0.0 py36h86301b6_0 bokeh
conda 4.4.7 py36_0
conda-env 2.6.0 h36134e3_1
cryptography 2.0.3 py36ha225213_1
cycler 0.10.0 py36h93f1223_0
dask 0.15.4 py36h31fc154_0
dask-core 0.15.4 py36h7045e13_0
datashader 0.6.2 py36_0 bokeh
datashape 0.5.4 py36h3ad6b5c_0
dbus 1.10.22 h3b5a359_0
decorator 4.1.2 py36hd076ac8_0
distributed 1.19.1 py36h25f3894_0
entrypoints 0.2.3 py36h1aec115_2
expat 2.2.5 he0dffb1_0
fontconfig 2.12.4 h88586e7_1
freetype 2.8 hab7d2ae_1
glib 2.53.6 h5d9569c_2
gmp 6.1.2 hb3b607b_0
gst-plugins-base 1.12.2 he3457e5_0
gstreamer 1.12.2 h4f93127_0
h5py 2.7.1 py36h3585f63_0
hdf5 1.10.1 h9caa474_1
heapdict 1.0.0 py36h79797d7_0
holoviews 1.9.1
html5lib 0.999999999 py36h2cfc398_0
icu 58.2 h211956c_0
idna 2.6 py36h82fb2a8_1
imageio 2.2.0 py36he555465_0
intel-openmp 2018.0.0 h15fc484_7
ipykernel 4.6.1 py36hbf841aa_0
ipython 6.1.0 py36hc72a948_1
ipython_genutils 0.2.0 py36hb52b0d5_0
ipywidgets 7.0.0 py36h7b55c3a_0
jedi 0.10.2 py36h552def0_0
jinja2 2.9.6 py36h489bce4_1
jpeg 9b h024ee3a_2
jsonschema 2.6.0 py36h006f8b5_0
jupyter 1.0.0 py36h9896ce5_0
jupyter_client 5.1.0 py36h614e9ea_0
jupyter_console 5.2.0 py36he59e554_1
jupyter_core 4.3.0 py36h357a921_0
krb5 1.14.2 hcdc1b81_6 anaconda
libedit 3.1 heed3624_0
libffi 3.2.1 h4deb6c0_3
libgcc 7.2.0 h69d50b8_2
libgcc-ng 7.2.0 h7cc24e2_2
libgfortran-ng 7.2.0 h9f7466a_2 anaconda
libpng 1.6.32 hda9c8bc_2
libpq 9.6.6 h4e02ad2_0 anaconda
libsodium 1.0.13 h31c71d8_2
libstdcxx-ng 7.2.0 h7a57d05_2
libtiff 4.0.8 h90200ff_9
libxcb 1.12 h84ff03f_3
libxml2 2.9.4 h6b072ca_5
llvmlite 0.20.0 py36_0
locket 0.2.0 py36h787c0ad_1
lzo 2.10 h49e0be7_2 anaconda
markupsafe 1.0 py36hd9260cd_1
matplotlib 2.1.0 py36hba5de38_0
memory_profiler 0.47 py36_0 chroxvi
mistune 0.8.1 py36h3d5977c_0
mkl 2018.0.0 hb491cac_4
msgpack-python 0.4.8 py36hec4c5d1_0
multipledispatch 0.4.9 py36h41da3fb_0
nbconvert 5.3.1 py36hb41ffb7_0
nbformat 4.4.0 py36h31c9010_0
ncurses 6.0 h06874d7_1
networkx 2.0 py36h7e96fb8_0
nodejs 6.11.2 h3db8ef7_0
notebook 5.2.1 py36h690a4eb_0
numba 0.35.0 np113py36_10
numexpr 2.6.2 py36hc561933_2 anaconda
numpy 1.13.3 py36ha12f23b_0
olefile 0.44 py36h79f9f78_0
openssl 1.0.2n hb7f436b_0
pandas 0.21.0 py36h78bd809_1
pandoc 1.19.2.1 hea2e7c5_1
pandocfilters 1.4.2 py36ha6701b7_1
param 1.5.1 py36_0 ioam
paramnb 2.0.2 py36_0 ioam
partd 0.3.8 py36h36fd896_0
pcre 8.41 hc71a17e_0
pexpect 4.2.1 py36h3b9d41b_0
phantomjs 1.9.7 0 bokeh
pickleshare 0.7.4 py36h63277f8_0
pillow 4.2.1 py36h9119f52_0
pip 9.0.1 py36h8ec8b28_3
prompt_toolkit 1.0.15 py36h17d85b1_0
psutil 5.4.0 py36h84c53db_0
psycopg2 2.7.3.2 py36h2b1659c_0 anaconda
ptyprocess 0.5.2 py36h69acd42_0
pycosat 0.6.3 py36h0a5515d_0
pycparser 2.18 py36hf9f622e_1
pygments 2.2.0 py36h0d3125c_0
pyopenssl 17.2.0 py36h5cc804b_0
pyparsing 2.2.0 py36hee85983_1
pyqt 5.6.0 py36h0386399_5
pysocks 1.6.7 py36hd97a5b1_1
pytables 3.4.2 py36h3b5282a_2 anaconda
python 3.6.3 hc9025b9_1
python-dateutil 2.6.1 py36h88d3b88_1
pytz 2017.2 py36hc2ccc2a_1
pywavelets 0.5.2 py36he602eb0_0
pyyaml 3.12 py36hafb9ca4_1
pyzmq 16.0.2 py36h3b0cf96_2
qt 5.6.2 h974d657_12
qtconsole 4.3.1 py36h8f73b5b_0
readline 7.0 hac23ff0_3
requests 2.18.4 py36he2e5f8d_1
ruamel_yaml 0.11.14 py36ha2fb22d_2
scikit-image 0.13.1 py36h14c3975_1
scikit-learn 0.19.1 py36h7aa7ec6_0
scipy 1.0.0 py36hbf646e7_0 anaconda
selenium 3.0.2 py36_0 bokeh
setuptools 36.5.0 py36he42e2e1_0
simplegeneric 0.8.1 py36h2cb9092_0
sip 4.18.1 py36h51ed4ed_2
six 1.10.0 py36hcac75e4_1
sortedcontainers 1.5.7 py36hdf89491_0
sqlite 3.20.1 h6d8b0f3_1
tblib 1.3.2 py36h34cf8b6_0
terminado 0.6 py36ha25a19f_0
testpath 0.3.1 py36h8cadb63_0
tk 8.6.7 h5979e9b_1
toolz 0.8.2 py36h81f2dff_0
tornado 4.5.2 py36h1283b2a_0
traitlets 4.3.2 py36h674d592_0
urllib3 1.22 py36hbe7ace6_0
wcwidth 0.1.7 py36hdf4376a_0
webencodings 0.5.1 py36h800622e_1
wheel 0.29.0 py36he7f4e38_1
widgetsnbextension 3.0.2 py36hd01bb71_1
xarray 0.9.6 py36_0
xz 5.2.3 h2bcbf08_1
yaml 0.1.7 h96e3832_1
zeromq 4.2.2 hbedb6e5_2
zict 0.1.3 py36h3a3bf81_0
zlib 1.2.11 hfbfcf68_1

on Debian GNU/Linux 8 (jessie).

I built the example in directory format for the Bokeh server and started it as follows:

mprof run bokeh serve --check-unused-sessions 5000 --unused-session-lifetime 5000 --log-level debug testApp

where testApp is the folder containing main.py with the code from sliders.py (https://github.com/bokeh/bokeh/blob/master/examples/app/sliders.py)

After monitoring memory usage with no active sessions for a few minutes, I opened about 10 sessions and disconnected them a few minutes later. The Bokeh server log showed that the documents had been deleted. However, even after waiting 40 to 50 minutes, the memory was not released.
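mprof reports process-level memory, which can stay high even after Python objects are freed, because the allocator does not always hand memory back to the OS. As a complementary check, here is a minimal sketch (assuming you can run code inside the process being measured) that compares `tracemalloc` snapshots taken before and after a batch of sessions. It does not use Bokeh: `simulate_session` and the module-level `_leaked` list are hypothetical stand-ins for per-session work and for an accidental reference that keeps that work alive.

```python
import tracemalloc

# Hypothetical stand-in for state that outlives a destroyed session.
_leaked = []

def simulate_session(leak):
    """Allocate roughly 100 KB of per-session data, optionally keeping it alive."""
    data = [bytes(1024) for _ in range(100)]
    if leak:
        _leaked.append(data)  # an unintended reference keeps the data reachable
    # Otherwise `data` becomes unreachable as soon as the function returns.

def retained_bytes(n_sessions, leak, top=3):
    """Net traced allocation growth after n_sessions have started and ended."""
    # Ignore allocations made by tracemalloc itself while building snapshots.
    ignore_self = [tracemalloc.Filter(False, tracemalloc.__file__)]
    tracemalloc.start()
    before = tracemalloc.take_snapshot().filter_traces(ignore_self)
    for _ in range(n_sessions):
        simulate_session(leak)
    after = tracemalloc.take_snapshot().filter_traces(ignore_self)
    tracemalloc.stop()
    stats = after.compare_to(before, "lineno")
    for stat in stats[:top]:
        print(stat)  # the allocation sites whose size changed the most
    return sum(stat.size_diff for stat in stats)
```

With `leak=True` the top printed entry should point at the list comprehension in `simulate_session`, and the net growth for 10 sessions is roughly 1 MB; with `leak=False` it stays close to zero.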

Member

bryevdv commented Jan 29, 2018

Can you also characterize whether this is a fixed increase or a per-session one? If it is some fixed overhead that only happens when the very first session is created, we will probably not do anything.

LEMUEGGE commented Jan 29, 2018

It seems to be per-session. I extended the test above and started a second wave, establishing 9 new sessions a few minutes after the old sessions were deleted. The memory does not seem to be freed.
[Image: mprof2 memory-usage plot for the second wave]

With my larger program I ran a similar test over several hours and didn't see any memory being released.
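To answer the fixed-versus-per-session question more mechanically, one could run several waves of sessions and look at how much traced memory grows between waves: a one-time overhead shows up only in the first wave, while a per-session leak keeps growing. The sketch below does this with stand-in session functions; the names, wave sizes, and the 50 KB tolerance are illustrative assumptions, not anything Bokeh provides.

```python
import gc
import tracemalloc

def memory_after_waves(run_session, waves=3, sessions_per_wave=10):
    """Traced memory in bytes after each wave of sessions has been created and discarded."""
    tracemalloc.start()
    readings = []
    for _ in range(waves):
        for _ in range(sessions_per_wave):
            run_session()
        gc.collect()  # also free anything held only by reference cycles
        current, _peak = tracemalloc.get_traced_memory()
        readings.append(current)
    tracemalloc.stop()
    return readings

def classify(readings, tolerance=50_000):
    """'per-session' if memory keeps climbing after the first wave, else 'fixed'."""
    later_growth = [b - a for a, b in zip(readings, readings[1:])]
    return "per-session" if min(later_growth) > tolerance else "fixed"

# Hypothetical session bodies used to exercise the helpers.
_cache = {}
_kept = []

def fixed_overhead_session():
    """Pays a one-time cost on the first session only (e.g. a lazily built table)."""
    if "table" not in _cache:
        _cache["table"] = [bytes(1024) for _ in range(200)]

def leaky_session():
    """Keeps about 100 KB alive for every session that ever ran."""
    _kept.append([bytes(1024) for _ in range(100)])
```

Here `classify(memory_after_waves(leaky_session))` should give `"per-session"` and `classify(memory_after_waves(fixed_overhead_session))` should give `"fixed"`.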

Member

bryevdv commented Mar 2, 2018

Observations from some experimentation:

  • As best I can tell, by returning the id of the module in Document.delete_modules and comparing it afterwards to every id in gc.get_objects, the session script module is in fact getting collected. If I comment out the bulk of delete_modules, it seems not to be collected (as would be expected).

  • Additionally, setting gc.set_debug seems to indicate there is no uncollectable garbage at any point.

  • It does not matter if the large objects (i.e. numpy arrays) are referred to or used by Bokeh models in the script. The memory still leaks even if they are just created and never used anywhere.

This leads me to the strange conclusion that the Bokeh server is holding on to something internal to the script module, but offhand I don't know how this could be possible.
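The collection check in the first bullet can be sketched with a weak reference instead of comparing ids, which avoids a false match when CPython reuses a freed object's id for a new object. This is a self-contained approximation of a per-session script module built with `types.ModuleType`, not Bokeh's `delete_modules` code; `module_is_freed` and `SCRIPT` are hypothetical.

```python
import gc
import types
import weakref

SCRIPT = """
import math
data = [math.sqrt(i) for i in range(100_000)]

def callback():
    return len(data)  # the function's __globals__ is this module's namespace
"""

def module_is_freed(source, holder=None, name="bk_script_demo"):
    """Run `source` in a fresh module, drop it, and report whether it was freed."""
    module = types.ModuleType(name)
    exec(compile(source, name, "exec"), module.__dict__)
    if holder is not None:
        holder.append(module)  # simulate an unexpected extra reference
    alive = weakref.ref(module)  # a weak reference does not keep the module alive
    del module
    gc.collect()
    return alive() is None
```

`module_is_freed(SCRIPT)` should be `True`, while passing any list as `holder` makes it `False` for as long as that list keeps the module.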

Member

bryevdv commented Mar 2, 2018

I think the next thing I will try is to see if this is reproducible using the FunctionHandler. If not, then that would (I think) narrow the problem down to CodeRunner or handlers that depend on it.

Member

bryevdv commented Mar 2, 2018

Using FunctionHandler with this example script:

https://gist.github.com/bryevdv/db82635e8d0cec690e177c70a851080c

This also leaks memory. So I now suspect that something is holding on to the Document after the session is destroyed, though I am not sure how to square that with the last observation above (unless it is an unrelated issue).
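One hedged way to test the "something holds on to the Document" hypothesis is to attach a `weakref.finalize` callback to each document and see whether it ever fires after the session ends. `Document` below is a hypothetical stand-in, not `bokeh.document.Document`, and `run_session` simulates a session that may or may not leak its document.

```python
import gc
import weakref

class Document:
    """Hypothetical stand-in for a document holding some model data."""
    def __init__(self):
        self.roots = [bytearray(1_000_000)]

collected = []  # labels of documents that have actually been freed

def run_session(label, leak_into=None):
    """Create a document for one 'session', optionally leak it, then end the session."""
    doc = Document()
    # The callback must not reference `doc`, or the finalizer would keep it alive.
    weakref.finalize(doc, collected.append, label)
    if leak_into is not None:
        leak_into.append(doc)  # something outside the session still refers to it
    # Returning drops the session's own reference to `doc`.
```

After `run_session("clean")` the label appears in `collected` right away; after `run_session("leaky", held)` nothing is recorded until `held` is cleared.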

Member

bryevdv commented Mar 3, 2018

On closer inspection, it seems that unused large numpy arrays do get collected. The remaining leak is more consistent with the FunctionHandler experiment, which would also be consistent with a Document hanging around being the source of the problem (or near to it).

Member

bryevdv commented Mar 5, 2018

OK, there is significant improvement from adding code that more thoroughly zeroes out some document attributes that are probably responsible for cycles:

+        self._document.clear()
+        self._document._all_models_by_name = None
+        self._document._theme = None
+        self._document._template = None
+        self._document._session_context = None

This does not get quite everything, so I am going to spend a bit more time.
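Why zeroing out attributes helps: objects caught in a reference cycle are not freed by reference counting and have to wait for CPython's cyclic collector, which only runs when its allocation thresholds are reached. Breaking the cycle during teardown lets the memory go immediately. The classes below are hypothetical stand-ins whose attribute names merely echo the patch above; this is not Bokeh's code.

```python
import gc
import weakref

class SessionContext:
    """Hypothetical stand-in that points back at its document."""
    def __init__(self, document):
        self._document = document

class Document:
    """Hypothetical stand-in; attribute names only echo the patch."""
    def __init__(self):
        self._roots = [bytearray(1_000_000)]
        self._session_context = SessionContext(self)  # Document <-> SessionContext cycle

    def clear(self):
        self._roots = []

def destroy_document(doc, break_cycles):
    """Session teardown; optionally zero out the attribute that forms the cycle."""
    if break_cycles:
        doc.clear()
        doc._session_context = None  # cut the back-reference cycle

def freed_without_cyclic_gc(break_cycles):
    """True if the document is freed by reference counting alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # simulate the cyclic collector not having run yet
    try:
        doc = Document()
        alive = weakref.ref(doc)
        destroy_document(doc, break_cycles)
        del doc
        return alive() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up whatever cycle was left behind
```

With `break_cycles=True` the document disappears as soon as the last name is deleted; with `break_cycles=False` it survives until the next cyclic collection.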

bryevdv referenced this issue on Mar 6, 2018 in #7604, "Address leaking memory" (merged)

kagacins commented Mar 11, 2018

Can I ask a question about how server memory management is supposed to happen? A partner and I created a local app for the server that reads a couple of GB into some dataframes, runs some calcs, and outputs the results into the browser (and we don't clear the dataframes for fast follow up recalculations). We love it, and Bokeh. Today, I spent a bunch of hours figuring out how to get this onto AWS. I had to get a 4GB EC2 instance to make this work for one person.

So, when I launch the server and go to my website, all the data loads into memory on the virtual machine and outputs my desired results into the browser. Still love it!!

However, even if I close my Bokeh browser tab, anyone else going to the website results in a MemoryError as I watch things unfold on the terminal. It requires that I stop the server and relaunch it in order for anyone to get back to a functional version of the site. On the one hand, I know we need to change our data infrastructure so we're not loading so much stuff into memory on every call to the server, but on the other, is the data in memory supposed to be purged when I close the tab or is there a way to do this in my Python code?

This would enable a goofy "one person at a time" web app, but it would at least prevent needing a complete server restart after any one person hits it. I sincerely love the work the team has put into this and I'm very excited to have this thing running on AWS, but now I'm very unsure about how to scale to any number of small users. Thank you so much.
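One possible mitigation for the "reload everything per visitor" pattern, assuming the data is identical for every user and never mutated per session, is to load it once per server process and let each session derive small results from it. In a directory-format app, main.py runs again for every new session, so a cached loader like this would need to live in a separate, normally imported module (a later comment notes that such a module is loaded only once). `load_shared_table`, `per_session_result`, and the `"data.csv"` path are hypothetical.

```python
import functools

LOAD_CALLS = []  # records how often the expensive load actually ran

@functools.lru_cache(maxsize=1)
def load_shared_table(source):
    """Load a large, read-only dataset once per server process.

    `source` and the body are placeholders; a real app might call something
    like pandas.read_csv(source) here.
    """
    LOAD_CALLS.append(source)
    return tuple(range(1_000_000))  # stand-in for the big, immutable table

def per_session_result(threshold):
    """Per-session work that derives a small result from the shared table."""
    table = load_shared_table("data.csv")  # hypothetical path
    return sum(1 for value in table if value < threshold)
```

The trade-off is that the shared data stays resident for the lifetime of the process, so this caps memory use at one copy rather than eliminating it.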

LEMUEGGE commented Mar 12, 2018

@bryevdv

I did some tests and agree that #7604 definitely improves memory release. The minimal example from this issue now behaves as expected.
However, there is still no improvement for the complete program I am working on. I am trying to track down the leak and come up with a minimal example.
With --log-level debug, the Bokeh server printed the following message:

Module <module 'bk_script_2b690ecf7da746a3bcb2d5785fc50a23' from '/home/jan/Documents/Visualization/main.py'> has extra unexpected referrers!
This could indicate a serious memory leak. Extra referrers: [(<bokeh.application.handlers.code_runner.CodeRunner object at 0x7f331f492588>,
<module 'bk_script_2b690ecf7da746a3bcb2d5785fc50a23' from '/home/jan/Documents/Visualization/main.py'>,
<function CodeHandler.modify_document.<locals>.post_check at 0x7f331f444378>)]

Do you have any guess what this message could be pointing at?
The structure of my program is as follows: in an external module there is one "layout" class representing the visualization and several "plot" classes for different kinds of plots. The plots are inside holoviews.DynamicMap containers, and the layout class uses the holoviews.renderer to transform the hv.DynamicMaps into Bokeh objects. The layout class instance then builds up the webpage layout, which is added to the session document via curdoc().add_root.
I checked that every session has its own plot elements, i.e. nothing is shared between sessions. Every session has its own class instances.

Using an empty layout (i.e. only some Bokeh widgets and no additional plots), I see that the destructor of the layout class instance is called when the session is discarded. With additional plot classes involved, the destructors are not called. I am still checking for a circular reference that might prevent the garbage collector from freeing them.
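Since Python 3.4 (PEP 442), reference cycles containing objects that define `__del__` are still collected, so destructors that never run usually mean the instances are still reachable from somewhere. To see which objects are cyclic garbage at all, `gc.DEBUG_SAVEALL` makes the collector keep them in `gc.garbage` instead of freeing them. A minimal sketch with hypothetical `Layout` and `Plot` classes:

```python
import gc

class Layout:
    """Hypothetical stand-in for the layout class described above."""
    def __init__(self):
        self.plots = []

class Plot:
    """Hypothetical plot class that keeps a back-reference to its layout."""
    def __init__(self, layout):
        self.layout = layout
        layout.plots.append(self)  # Layout -> list -> Plot -> Layout is a cycle

def cyclic_garbage_types(build):
    """Names of types that only the cyclic collector could reclaim after build()."""
    gc.collect()  # flush unrelated garbage first
    gc.set_debug(gc.DEBUG_SAVEALL)  # keep collected objects in gc.garbage for inspection
    try:
        build()  # should create objects and drop every reference to them
        gc.collect()
        return {type(obj).__name__ for obj in gc.garbage}
    finally:
        gc.set_debug(0)  # note: this clears all gc debug flags
        gc.garbage.clear()

def build_layout_with_plot():
    layout = Layout()
    Plot(layout)
```

`cyclic_garbage_types(build_layout_with_plot)` should include both `"Layout"` and `"Plot"`; a build that creates no cycle, such as `lambda: Layout()`, leaves nothing for the cyclic collector.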

Member

bryevdv commented Mar 12, 2018

@kagacins @LEMUEGGE I'm glad to hear there is some improvement.

Regarding that message, it means something extra is holding on to the module Bokeh creates for foo.py when you run e.g. bokeh serve foo.py, or when you pass a function to a FunctionHandler. I'm not sure offhand what could cause that; we have tests to ensure that standard usage only has the expected number of references. However, once HoloViews is in the picture, I don't have any expertise to contribute. cc. @philippjfr @jlstevens
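A simplified illustration of the idea behind that debug message, not Bokeh's actual implementation: run a script in a fresh module, then ask `gc.get_referrers` who still points at the module. Running frames are skipped because a function's own local variable is an expected reference. `extra_referrers` and `check_session_module` are hypothetical helpers.

```python
import gc
import types

def extra_referrers(obj, expected=()):
    """Objects referring to `obj` other than those in `expected` (compared by identity)."""
    gc.collect()
    expected_ids = {id(e) for e in expected}
    return [
        r for r in gc.get_referrers(obj)
        # Frame objects hold locals of functions that are still running; skip them.
        if id(r) not in expected_ids and not isinstance(r, types.FrameType)
    ]

def check_session_module(source, leak=None, name="bk_script_demo"):
    """Run `source` in a fresh per-session module and list unexpected referrers."""
    module = types.ModuleType(name)
    exec(compile(source, name, "exec"), module.__dict__)
    if leak is not None:
        leak.append(module)  # e.g. user code stashing the module in a global registry
    return extra_referrers(module)
```

For plain code such as `"x = 1"` the result is an empty list; if something stores the module, that container shows up as the extra referrer.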

kagacins commented Mar 12, 2018

I don't know how to profile or contribute to resolving this, but I can say that I'm not using Holoviews - just Bokeh and pandas and the tiniest bit of numpy. All my data is manipulated in pandas, and it seems like that is somehow being retained. In any event, I remain grateful for all the effort.

Member

bryevdv commented Mar 12, 2018

@kagacins to answer your earlier question: modulo some minuscule bookkeeping costs, every session should clean up all of its resources and memory when the session expires or is closed.

Member

bryevdv commented Mar 12, 2018

@LEMUEGGE do you have an MRE with HoloViews code? I think the PR I just merged will address most pure-Bokeh usage (though I could be wrong; an MRE for that would also be the best help).

Member

bryevdv commented Mar 12, 2018

All my data is manipulated in pandas, and it seems like that is somehow being retained.

@kagacins do you mean with a previous release? Or master with the PR applied?

Contributor

philippjfr commented Mar 12, 2018

Missed this earlier. There is an existing issue about memory leaks in HoloViews, which I was hoping to follow up on later this week (see ioam/holoviews#2111). It's possible that references to bokeh models and HoloViews objects persist after a session is closed. If anyone has any recommendations on how best to go about debugging this and secondly on cleaning up after a bokeh session is closed that would be very helpful.

Member

bryevdv commented Mar 12, 2018

@philippjfr well, first things first: you should try your test case with master and see whether you get results similar to what @LEMUEGGE reports.

LEMUEGGE commented Mar 13, 2018

@philippjfr @bryevdv
Okay, so here is an MRE:

https://gist.github.com/LEMUEGGE/4a62a677e33e2cb16e62abfd686f9cea

main.py and external_module.py should be in the same directory (e.g. a folder Visualization).

Then start the app via:

bokeh serve Visualization --show --unused-session-lifetime 2000 --log-level debug

If you comment out the lines

plot_obj.set_up_plot_container()
layout_obj.create_layout_holoviews()

and use only the Bokeh plot, the destructors of both classes get called at the end of the session (at least with the PR).

With the HoloViews plot, only the layout_obj instance is destroyed (at least as far as I understand how the garbage collector in Python 3.6 works).

Contributor

philippjfr commented Mar 13, 2018

Thanks @LEMUEGGE that's helpful, I'll try it out in a bit.

Member

bryevdv commented Mar 13, 2018

@LEMUEGGE can you describe the circumstances in which you see the "This could indicate a serious memory leak" message? Is it when a session times out from inactivity, or when you close a browser tab and it is collected later?

LEMUEGGE commented Mar 14, 2018

@bryevdv It occurred after I closed the browser tab. However, I have trouble reproducing it; it seems to depend on which plots I use (they involve different read-in methods and preprocessing, plus different chart types). Until I can rule out that I messed something up, I would assume the problem is on my part.

@philippjfr Maybe this is helpful: https://gist.github.com/LEMUEGGE/86b293c5630b48f080a0e84a9dd131be

I used the fact that external_module.py is loaded only once by the Bokeh server and created a class Debug that keeps a weak-reference dictionary of all class instances. On session end (in server_lifecycle.py), the garbage collector reports the following objects as unreachable: https://gist.github.com/LEMUEGGE/e2205b99067b4f6e23ef94307cfb1a0f
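The weak-reference tracking described above can be sketched roughly as follows; `Debug`, `register`, and `survivors` are hypothetical names, not the gist's code. The registry has to live in a module that is imported once per process (as external_module.py is), not in the per-session script, so that it outlives individual sessions.

```python
import gc
import weakref

class Debug:
    """Hypothetical registry that remembers instances without keeping them alive."""
    _live = weakref.WeakValueDictionary()
    _next_key = 0

    @classmethod
    def register(cls, obj):
        cls._next_key += 1
        cls._live[cls._next_key] = obj  # weak value: no effect on obj's lifetime
        return obj

    @classmethod
    def survivors(cls):
        """Sorted type names of registered objects that are still alive."""
        gc.collect()  # give cyclic garbage a chance to be freed first
        return sorted(type(obj).__name__ for obj in list(cls._live.values()))
```

Calling `Debug.survivors()` at session end (for example from an `on_session_destroyed` hook) lists only the instances something is still holding on to.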

Member

bryevdv commented Mar 14, 2018

OK, I am going to tentatively close this, since that gist definitely seems HV-specific. I think there are probably remaining (small) things to track down in Bokeh itself, but I believe the big issue was handled by #7604, and it would be more conducive for me to have smaller, focused issues than one giant issue that goes on forever.

bryevdv closed this on Mar 14, 2018

bryevdv modified the milestones: short-term, 0.12.15 on Mar 14, 2018

Member

bryevdv commented Mar 14, 2018

(of course we can re-open if it turns out I am mistaken)

Contributor

philippjfr commented Mar 14, 2018

I think it's likely that the remaining issues are specific to HoloViews, and I've started investigating thanks to your helpful example @LEMUEGGE. To clarify something: does Bokeh provide a way to define lifecycle hooks to clean up after a session is closed without having to define a server_lifecycle.py module?

Member

bryevdv commented Mar 14, 2018

Lifecycle hooks depend on having a handler present that defines the lifecycle methods, not generally on server_lifecycle.py (a particular handler looks for that file as a convenience and installs the methods based on it). So, you could define and use your own ApplicationHandler subclass.
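The hook pattern described above can be illustrated with a self-contained mock. It does not import Bokeh: the method names borrow those of Bokeh's handler base class (assumed here to be `Handler` in `bokeh.application.handlers`; exact signatures, and whether the session hooks are synchronous, differ between Bokeh versions), and `Application`, `CleanupHandler`, and the string session contexts are hypothetical.

```python
class Handler:
    """Mock handler base class with no-op lifecycle hooks (not Bokeh's class)."""
    def modify_document(self, doc):
        pass

    def on_session_created(self, session_context):
        pass

    def on_session_destroyed(self, session_context):
        pass

class Application:
    """Mock application that forwards lifecycle events to each of its handlers."""
    def __init__(self, *handlers):
        self.handlers = list(handlers)

    def create_session(self, session_context, doc):
        for handler in self.handlers:
            handler.modify_document(doc)
            handler.on_session_created(session_context)

    def destroy_session(self, session_context):
        for handler in self.handlers:
            handler.on_session_destroyed(session_context)

class CleanupHandler(Handler):
    """Hypothetical downstream handler that drops per-session state on teardown."""
    def __init__(self):
        self.per_session = {}

    def on_session_created(self, session_context):
        self.per_session[session_context] = []  # e.g. plots created for this session

    def on_session_destroyed(self, session_context):
        self.per_session.pop(session_context, None)  # release the references
```

The design point is that cleanup lives with whichever handler owns the per-session state, so a downstream library can install its own handler instead of relying on a user-written server_lifecycle.py.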

But I'm also happy to consider a proposal/PR for facilities more specifically relevant or accessible to downstream libraries.
