
more routine garbage collection in distributed? #1516

Open
jrmlhermitte opened this issue Oct 31, 2017 · 5 comments


jrmlhermitte commented Oct 31, 2017

I've noticed that memory usage seems to increase over time, so I was worried about memory leaks. I haven't found any (as I'm sure you were very confident I'd say ;-) ).

However, what I have noticed is that sometimes, when a Python client process is killed, the memory usage on the cluster doesn't go back to zero right away. This can be problematic if the memory usage is quite large.

For example, let's say we have the following code, called test_distributed.py:

from distributed import Client
import numpy as np

client = Client("IP:PORT")  # put IP and PORT of sched here

def foo(a):
    return a + 1

# two large arrays, roughly 800 MB each
arr = np.ones(100000000)
arr2 = np.zeros(100000000)

# ship both arrays to the cluster and run foo on them
ff = client.submit(foo, arr)
ff2 = client.submit(foo, arr2)

If I manually run it 5 times (python test_distributed.py), I see the following result for the memory usage:
[screenshot: memory profiles]

The memory usage goes up, then comes down when the process terminates, but does not go back to zero. When I run the same script again, it goes up again but never exceeds the previous peak, which suggests there is no memory leak.

I figured this could perhaps have something to do with the Python garbage collection process, so I went one step further and ran the following script:

from distributed import Client

client = Client("IP:PORT")  # put IP and PORT of sched here

def cleanup():
    # force a full collection on the worker that runs this task
    import gc
    gc.collect()

client.submit(cleanup)

This brought the memory back down to zero.
[screenshot: memory profiles]

My feeling is that the Python garbage collector could sometimes be a bit more aggressive about releasing memory.
For long-running applications like distributed, I think it could be a good idea to force a garbage collection pass every once in a while.

What do you think? Am I correct in my guess, and would there be a way to resolve this on the distributed side? The other obvious solution is for the user to run a cron-style script that sends gc requests to the cluster. However, this is not so clean (and large, intermittent loads may arrive at very irregular times, so a fixed schedule would not match them well).
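For concreteness, a minimal version of such a user-side script might look like the sketch below. It assumes Client.run (which executes a function on every worker and returns a dict keyed by worker address); the scheduler address and the one-hour interval are placeholders, not a recommendation.

import gc
import time

from distributed import Client


def collect():
    # Force a full garbage collection pass on the worker this runs on
    # and return the number of objects collected.
    return gc.collect()


if __name__ == "__main__":
    client = Client("IP:PORT")  # put IP and PORT of sched here, as above

    while True:
        # Maps each worker address to the number of objects it collected.
        freed = client.run(collect)
        print(freed)
        time.sleep(3600)  # placeholder interval; tune to the workload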

I sort of looked around to see if this was mentioned before, and didn't see anything. I apologize if this is a repost. Thanks!


mrocklin commented Oct 31, 2017 via email

@jrmlhermitte (Author)

The version is '1.19.3+17.g74cebfb'
I can pull the latest and test this again right now.

@jrmlhermitte (Author)

I pulled from master (and made sure to delete the pip-installed distributed; it was a pip install directly from GitHub about a week ago anyway, I believe). I also submitted a print(distributed.__file__) just to be sure the correct file was being used.

I see the same result.

In passing, is there a way to see the version of distributed used on the bokeh server? That could be a nice feature.
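For reference, one way to see which version the workers are actually running (short of a bokeh page) is to ask them directly from the client; a small sketch, reusing the same scheduler placeholder as above:

from distributed import Client

client = Client("IP:PORT")  # put IP and PORT of sched here


def version():
    import distributed
    return distributed.__version__


# Maps each worker address to the distributed version it imported.
print(client.run(version))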


mrocklin commented Nov 1, 2017

I recommend checking recent pull requests for the term GC. You'll find a few in the last few weeks.

This may interest @ogrisel and @bluenote10 . Any desire to add an infrequent periodic self._throttled_gc call in Worker.memory_monitor?
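For readers following along, the idea behind a throttled GC call is roughly the following. This is only an illustrative sketch of the pattern, not the actual _throttled_gc implementation in distributed, and the interval value is a placeholder.

import gc
import time


class ThrottledGC:
    """Call gc.collect() at most once every min_interval seconds."""

    def __init__(self, min_interval=60):
        self.min_interval = min_interval
        self._last_collect = 0.0

    def collect(self):
        now = time.time()
        if now - self._last_collect >= self.min_interval:
            gc.collect()
            self._last_collect = now


# Calling collect() from a periodic callback (such as a memory monitor)
# would then trigger real collections no more than once per minute.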


mrocklin commented Nov 1, 2017

In passing, is there a way to see the version of distributed used on the bokeh server? That could be a nice feature.

If you're interested, this could be added easily to the new HTML routes available in the info tab. The templates are in distributed/bokeh/templates/. I recommend changing workers.html to index.html and including more information alongside the workers table already there.
