startup time with Client is long #1399
It would be nice to find ways to reduce startup time generally, though. I agree that this is much longer than ideal. There are a few things to do here; the first is probably to characterize costs.
That doesn't seem to make much difference for me. I'm willing to help out if you have any suggestions on what to look into.
One issue is just import time:

```
In [1]: %time import dask.distributed
CPU times: user 480 ms, sys: 64 ms, total: 544 ms
Wall time: 613 ms
```

We use a forkserver (see distributed/utils.py) whose first process launch is slow:

```
In [1]: from distributed.utils import mp_context

In [2]: import time

In [3]: %time proc = mp_context.Process(target=time.sleep, args=(0,)); proc.start(); proc.join()
CPU times: user 0 ns, sys: 16 ms, total: 16 ms
Wall time: 1.25 s

In [4]: %time proc = mp_context.Process(target=time.sleep, args=(0,)); proc.start(); proc.join()
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 9.84 ms
```

As you can see, the first time we do this it's quite slow, but the second time around it's quite a bit faster; your second call likely benefits from this warm start. I can imagine starting a process in a background thread when we start the multiprocessing context, just to do a warm start. That may cause other issues though, I'm not sure.

A big help would be to reduce bokeh import times (cc @bryevdv):

```
In [1]: %time import bokeh.plotting
CPU times: user 1.46 s, sys: 116 ms, total: 1.58 s
Wall time: 1.64 s
```

I find that turning off the Bokeh diagnostics server is quite helpful to reduce times:

```
In [1]: from dask.distributed import LocalCluster

In [2]: %time cluster = LocalCluster(n_workers=1)
CPU times: user 728 ms, sys: 92 ms, total: 820 ms
Wall time: 2.71 s

In [3]: %time cluster = LocalCluster(n_workers=1)
CPU times: user 68 ms, sys: 12 ms, total: 80 ms
Wall time: 1.06 s
```

```
In [1]: from dask.distributed import LocalCluster

In [2]: %time cluster = LocalCluster(n_workers=1, diagnostics_port=None)
CPU times: user 60 ms, sys: 16 ms, total: 76 ms
Wall time: 817 ms

In [3]: %time cluster = LocalCluster(n_workers=1, diagnostics_port=None)
CPU times: user 44 ms, sys: 4 ms, total: 48 ms
Wall time: 151 ms
```
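The warm-start idea mentioned above could be sketched roughly as follows. This is a hypothetical helper, not dask's actual code; the demo uses a plain `fork` context (Unix-only), where dask uses a forkserver context:

```python
import multiprocessing
import threading

def prewarm_mp_context(ctx=None):
    """Start and join a throwaway process in a background thread, so the
    first real Process() a user starts doesn't pay the warm-up cost."""
    ctx = ctx or multiprocessing.get_context("fork")  # dask uses "forkserver"

    def _run():
        proc = ctx.Process(target=int)  # trivial, importable target
        proc.start()
        proc.join()

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread  # callers can join() to know warm-up has finished
```

A real implementation would also need to handle platforms where `fork` is unavailable and ensure the background thread cannot race with interpreter shutdown.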
I was thinking about this just the other day. I'm sure it can be improved, and I was planning on looking at it in the next week or two. If you happen to know a good way to get better information about where time is spent, please share it. I have some suspicions, but standard profiling hasn't been very illuminating.
Yeah, I just tried this and had similar luck. I would ask around though; I'm sure there are nicer ways to identify issues than guess-and-check.
There's this: https://github.com/cournape/import-profiler (Python 2 only, I think)
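For reference, later CPython versions (3.7+) grew a built-in per-module import profile, `python -X importtime`, which prints a cumulative timing line per imported module to stderr. A small sketch of driving it from a script (the module being profiled here is arbitrary):

```python
import subprocess
import sys

# Run a child interpreter with -X importtime and capture its stderr,
# which contains one "import time:" line per module that was imported.
proc = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
)
timing_lines = [l for l in proc.stderr.splitlines() if "import time:" in l]

# The final lines cover the top-level modules, with cumulative microseconds.
for line in timing_lines[-3:]:
    print(line)
```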
Thanks @TomAugspurger. Based on the output below, there may be less to do than I'd hoped. Pandas and numpy together account for ~720 ms. If there is a way to optimize these numbers I am all ears. E.g. Pandas is spending 57 ms in pytest? Is there a way to disable that? Perhaps the imports could be moved inside functions. That just shifts the goalposts, but if spreading the time around to optimize startup is the priority, it might be worthwhile.

@mrocklin One thing that could probably be immediately improved is to not compute so much at import time. There are other, smaller hotspots as well.
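The "move imports inside functions" idea above looks like this in general. The function and module here are illustrative stand-ins (`datetime` is cheap; the pattern matters for heavy modules like pandas):

```python
def parse_day(text):
    # Deferred import: the module is loaded on the first call to this
    # function rather than when the enclosing module is imported.
    from datetime import datetime
    return datetime.strptime(text, "%Y-%m-%d").date()

# Nothing heavy is imported at module import time; the cost is paid
# only once, on first use, thanks to sys.modules caching.
print(parse_day("2017-10-01"))  # 2017-10-01
```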
I've done a bit of work on pandas import time. I have a bit more to do and need to push it, but I want to have that done for the next release (~2 weeks away).
@mrocklin I am in the middle of a fairly messy PR, so if I switch over to make this change, then I would merge it ASAP and get you a dev build made. Though again, this only amounts to about half of what numpy and pandas account for (still worth doing; 400 ms is a lot).
Happy to wait. This is definitely a nuisance, but something that we've lived with for a while. It sounds like Pandas is also likely to improve things here in the medium term.
@mrocklin can you make a Bokeh issue and link it to this one?
@mrocklin bokeh/bokeh#6938 should shave ~200-250 ms off. The rest of the ~400 ms was spent elsewhere.
@bryevdv, perhaps it's possible to import pandas, or even numpy, lazily (i.e. only when needed)?
Pandas PR to reduce import time: pandas-dev/pandas#17710
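On the lazy-import question above: one generic pattern (a sketch, not what pandas or bokeh actually adopted) is a proxy object that defers the real `importlib.import_module` call until the first attribute access:

```python
import importlib

class LazyModule:
    """Stand-in for a module that imports it on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for attributes not already set in __init__.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# The module is only actually imported when an attribute is first touched:
lazy_json = LazyModule("json")
print(lazy_json.dumps({"a": 1}))  # {"a": 1}
```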
@mrocklin can this be closed?
It does still take a while:

```
In [1]: from dask.distributed import Client

In [2]: %time client = Client()
CPU times: user 460 ms, sys: 53.4 ms, total: 513 ms
Wall time: 1.56 s
```

Especially on machines with many workers, as described in #2450. This issue doesn't really have a "good enough" target, though. I'm inclined to leave it open if it's not bothering anyone, but I have no strong thoughts; happy to close due to staleness if you prefer.
I'm pretty satisfied with how things are now compared to when I originally opened this issue and would be fine closing it. One way to deal with operations like this that take a long time to finish would be to add some sort of optional progress bar, so the user knows things are happening. I'm not sure how easy that would be to implement here, though.
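The optional progress indicator suggested here could be as simple as a background spinner thread wrapped around the slow call; a rough sketch, not a proposed dask API:

```python
import itertools
import sys
import threading
import time

def with_spinner(fn, *args, message="starting", **kwargs):
    """Run fn in the foreground while a background thread animates a spinner."""
    done = threading.Event()

    def spin():
        for ch in itertools.cycle("|/-\\"):
            if done.is_set():
                break
            sys.stderr.write(f"\r{message}... {ch}")
            sys.stderr.flush()
            time.sleep(0.1)
        sys.stderr.write(f"\r{message}... done\n")

    spinner = threading.Thread(target=spin)
    spinner.start()
    try:
        return fn(*args, **kwargs)
    finally:
        done.set()
        spinner.join()
```

Usage would be something like `client = with_spinner(Client)`, showing activity until startup completes.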
On my remote work machine, `client = Client()` takes about 15 seconds to start, so either my machine is peculiar or this issue did not deserve to be closed prematurely.
Creating a new `Client` without any arguments takes several seconds to start up. I suspect the reason for this may be related to the bokeh server, since running `dask-scheduler --no-bokeh` boots the scheduler significantly faster than `dask-scheduler --bokeh` does. Is there a way to disable the bokeh server when creating a new `Client`? Or is there something else going on under the hood that is slowing things down?

The particular use case here would be prototyping with a local scheduler before scaling up data analysis on a cluster.