
Scheduler memory just keep increasing in idle mode #5509

Closed

ridhachahed opened this issue Nov 8, 2021 · 5 comments

@ridhachahed

Hello Dask community !

I am running Dask on a Linux server and I am very limited by my memory. My main problem is that the Dask scheduler process just keeps eating more and more memory even though I haven't submitted any work to the workers yet! I would like to understand what's going on and whether there is any mitigation for this problem.

To reproduce it:
from dask.distributed import Client
client = Client(memory_limit='100MB', processes=False, n_workers=4, threads_per_worker=1)

Then check memory with top | grep python

PS: I succeeded in limiting the worker process memory by passing memory_limit to the Client call, but this doesn't seem to be possible for the scheduler process.
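
To put numbers on this, here is a minimal sketch for sampling the scheduler's memory from the same Python session (it assumes psutil is installed; watch_scheduler_memory is an illustrative helper, not a Dask API; with processes=False the scheduler lives in the current process):

import time
import psutil

def watch_scheduler_memory(samples=12, interval=5):
    # With processes=False the scheduler runs in this process,
    # so the process RSS includes the scheduler's memory.
    proc = psutil.Process()
    for _ in range(samples):
        rss_mb = proc.memory_info().rss / 1e6
        print(f"scheduler process RSS: {rss_mb:.1f} MB")
        time.sleep(interval)

watch_scheduler_memory()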

@ian-r-rose
Collaborator

Hi @ridhachahed, are you perhaps using the dask-labextension? If so, you might be running into dask/dask-labextension#185.

@ridhachahed
Author

ridhachahed commented Nov 8, 2021

Hello @ian-r-rose, unfortunately I am not using dask-labextension; everything is run from the command line without the dashboard. You will find attached the numbers I have, where we can see the scheduler memory increasing while the worker memory reaches a plateau at one point (when we don't put a limit on their memory).

[screenshots: scheduler memory increasing steadily over time while worker memory plateaus]

jrbourbeau transferred this issue from dask/dask Nov 9, 2021
@jrbourbeau
Member

The scheduler has an additional event logging system where certain messages are stored. The size of these logs is limited (the default is 100k entries) to prevent unbounded memory usage:

self.events = defaultdict(
    partial(
        deque, maxlen=dask.config.get("distributed.scheduler.events-log-length")
    )
)

It could be that you're observing these administrative logs building up on the scheduler. What happens if you set the distributed.scheduler.events-log-length configuration value (which controls the size of these event logs) to a much smaller value, e.g. 10?
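
For example (a sketch; the key can also be set through Dask's YAML configuration or an environment variable, and what matters is setting it before the Client, and therefore the scheduler, is created):

import dask
from dask.distributed import Client

# Cap the scheduler's event log before the scheduler is created
dask.config.set({"distributed.scheduler.events-log-length": 10})

client = Client(memory_limit="100MB", processes=False, n_workers=4, threads_per_worker=1)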

@ridhachahed
Author

@jrbourbeau Thanks for your help! It definitely helped mitigate the problem, as you can see in the graphs below, but we are still observing a slower increase. If I understand correctly, self.events is only used for debugging purposes, right? So setting the queue size to 10 will not affect performance. What about self.transition_log, self.log, and self.computation? I am having a hard time tracing the scheduler attributes to see how they are modified during communication; do you have any suggestions?

[screenshots: after capping events-log-length, scheduler memory grows much more slowly but still increases]
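
One rough way to see which of these structures is actually growing is to poll their lengths from the client side (a sketch; it assumes the in-process scheduler is reachable as client.cluster.scheduler, and the attribute names are simply the ones mentioned above):

# Sketch: print the size of the scheduler-side logs.
# Assumes processes=False so the Scheduler object is reachable locally;
# attribute names are taken from this discussion, and len() is only a
# rough indicator (events is a dict of per-topic deques).
s = client.cluster.scheduler
for name in ("events", "transition_log", "log", "computation"):
    obj = getattr(s, name, None)
    if obj is not None:
        print(name, len(obj))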

@gjoseph92
Collaborator

@ridhachahed I see you closed this—what resolution did you come to? Did reducing self.transition_log, self.log and self.computation help?
