startup time with Client is long #1399
It would be nice to find ways to reduce startup time generally, though. I agree that this is much longer than ideal. There are a few things to do here; the first is probably to characterize costs.
That doesn't seem to make much difference for me. I'm willing to help out if you have any suggestions on what to look into.
One issue is just import time:

```
In [1]: %time import dask.distributed
CPU times: user 480 ms, sys: 64 ms, total: 544 ms
Wall time: 613 ms
```

We use a forkserver (see distributed/utils.py) whose first process launch is slow:

```
In [1]: from distributed.utils import mp_context

In [2]: import time

In [3]: %time proc = mp_context.Process(target=time.sleep, args=(0,)); proc.start(); proc.join()
CPU times: user 0 ns, sys: 16 ms, total: 16 ms
Wall time: 1.25 s

In [4]: %time proc = mp_context.Process(target=time.sleep, args=(0,)); proc.start(); proc.join()
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 9.84 ms
```

As you can see, the first time we do this it's quite slow, but the second time around it's quite a bit faster; your second call likely benefits from this warm start. I can imagine starting a process in a background thread when we start the multiprocessing context, just to do a warm start. That may cause other issues though, I'm not sure.

A big help would be to reduce bokeh import times (cc @bryevdv):

```
In [1]: %time import bokeh.plotting
CPU times: user 1.46 s, sys: 116 ms, total: 1.58 s
Wall time: 1.64 s
```

I find that turning off the Bokeh diagnostics server is quite helpful to reduce times:

```
In [1]: from dask.distributed import LocalCluster

In [2]: %time cluster = LocalCluster(n_workers=1)
CPU times: user 728 ms, sys: 92 ms, total: 820 ms
Wall time: 2.71 s

In [3]: %time cluster = LocalCluster(n_workers=1)
CPU times: user 68 ms, sys: 12 ms, total: 80 ms
Wall time: 1.06 s
```

```
In [1]: from dask.distributed import LocalCluster

In [2]: %time cluster = LocalCluster(n_workers=1, diagnostics_port=None)
CPU times: user 60 ms, sys: 16 ms, total: 76 ms
Wall time: 817 ms

In [3]: %time cluster = LocalCluster(n_workers=1, diagnostics_port=None)
CPU times: user 44 ms, sys: 4 ms, total: 48 ms
Wall time: 151 ms
```
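The warm-start idea mentioned above could be sketched roughly as follows. This is a hypothetical helper, not dask's actual code; the demo uses a plain `fork` context (Unix-only), where dask uses a forkserver context:

```python
import multiprocessing
import threading

def prewarm_mp_context(ctx=None):
    """Start and join a throwaway process in a background thread, so the
    first real Process() a user starts doesn't pay the warm-up cost."""
    ctx = ctx or multiprocessing.get_context("fork")  # dask uses "forkserver"

    def _run():
        proc = ctx.Process(target=int)  # trivial, importable target
        proc.start()
        proc.join()

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread  # callers can join() to know warm-up has finished
```

A real implementation would also need to handle platforms where `fork` is unavailable and ensure the background thread cannot race with interpreter shutdown.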
I was thinking about this just the other day. I'm sure it can be improved, and I was planning on looking at it in the next week or two. If you happen to know a good way to get better information about where time is spent, please share it. I have some suspicions, but standard profiling hasn't been very illuminating.
Yeah, I just tried this and had similar luck. I would ask around though; I'm sure there are nicer ways to identify issues than guess-and-check.
There's this: https://github.com/cournape/import-profiler (Python 2 only, I think)
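For reference, later CPython versions (3.7+) grew a built-in per-module import profile, `python -X importtime`, which prints a cumulative timing line per imported module to stderr. A small sketch of driving it from a script (the module being profiled here is arbitrary):

```python
import subprocess
import sys

# Run a child interpreter with -X importtime and capture its stderr,
# which contains one "import time:" line per module that was imported.
proc = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
)
timing_lines = [l for l in proc.stderr.splitlines() if "import time:" in l]

# The final lines cover the top-level modules, with cumulative microseconds.
for line in timing_lines[-3:]:
    print(line)
```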
Thanks @TomAugspurger. Based on the output below, there may be less to do than I'd hoped. Pandas and numpy together account for ~720 ms. If there is a way to optimize these numbers I am all ears. E.g. Pandas is spending 57 ms in pytest? Is there a way to disable that? Perhaps the imports could be moved inside functions. That just shifts the goalposts, but if spreading the time around to optimize startup is the priority, it might be worthwhile.

@mrocklin One thing that could probably be immediately improved is to not compute so much at import time. There are other, smaller hotspots as well.
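The "move imports inside functions" idea above looks like this in general. The function and module here are illustrative stand-ins (`datetime` is cheap; the pattern matters for heavy modules like pandas):

```python
def parse_day(text):
    # Deferred import: the module is loaded on the first call to this
    # function rather than when the enclosing module is imported.
    from datetime import datetime
    return datetime.strptime(text, "%Y-%m-%d").date()

# Nothing heavy is imported at module import time; the cost is paid
# only once, on first use, thanks to sys.modules caching.
print(parse_day("2017-10-01"))  # 2017-10-01
```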
I've done a bit of work on pandas import time. I have a bit more to do and need to push it, but I want to have that done for the next release (~2 weeks away).
@mrocklin I am in the middle of a fairly messy PR, so if I switch over to make this change, then I would merge it ASAP and get you a dev build made. Though again, this only amounts to about half of what numpy and pandas account for (still worth doing; 400 ms is a lot).
Happy to wait. This is definitely a nuisance, but something that we've lived with for a while. It sounds like Pandas is also likely to improve things here in the medium term.
@mrocklin can you make a Bokeh issue and link it to this one?
@mrocklin bokeh/bokeh#6938 should shave ~200-250 ms off. The rest of the ~400 ms was spent elsewhere.
@bryevdv, perhaps it's possible to import pandas, or even numpy, lazily (i.e. only when needed)?
Pandas PR to reduce import time: pandas-dev/pandas#17710
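On the lazy-import question above: one generic pattern (a sketch, not what pandas or bokeh actually adopted) is a proxy object that defers the real `importlib.import_module` call until the first attribute access:

```python
import importlib

class LazyModule:
    """Stand-in for a module that imports it on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for attributes not already set in __init__.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# The module is only actually imported when an attribute is first touched:
lazy_json = LazyModule("json")
print(lazy_json.dumps({"a": 1}))  # {"a": 1}
```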
@mrocklin can this be closed?
It does still take a while:

```
In [1]: from dask.distributed import Client

In [2]: %time client = Client()
CPU times: user 460 ms, sys: 53.4 ms, total: 513 ms
Wall time: 1.56 s
```

Especially on machines with many workers, as described in #2450. This issue doesn't really have a "good enough" target, though. I'm inclined to leave it open if it's not bothering anyone, but I have no strong thoughts; happy to close due to staleness if you prefer.
I'm pretty satisfied with how things are now compared to when I originally opened this issue and would be fine closing it. One way to deal with operations like this that take a long time to finish would be to add some sort of optional progress bar, so the user knows things are happening. I'm not sure how easy that would be to implement here, though.
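The optional progress indicator suggested here could be as simple as a background spinner thread wrapped around the slow call; a rough sketch, not a proposed dask API:

```python
import itertools
import sys
import threading
import time

def with_spinner(fn, *args, message="starting", **kwargs):
    """Run fn in the foreground while a background thread animates a spinner."""
    done = threading.Event()

    def spin():
        for ch in itertools.cycle("|/-\\"):
            if done.is_set():
                break
            sys.stderr.write(f"\r{message}... {ch}")
            sys.stderr.flush()
            time.sleep(0.1)
        sys.stderr.write(f"\r{message}... done\n")

    spinner = threading.Thread(target=spin)
    spinner.start()
    try:
        return fn(*args, **kwargs)
    finally:
        done.set()
        spinner.join()
```

Usage would be something like `client = with_spinner(Client)`, showing activity until startup completes.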
On my remote work machine, `client = Client()` takes about 15 seconds to start, so either my machine is peculiar or this issue did not deserve to be closed prematurely.
Creating a new `Client` without any arguments takes several seconds to start up. I suspect the reason for this may be related to the bokeh server, since running `dask-scheduler --no-bokeh` boots the scheduler significantly faster than `dask-scheduler --bokeh` does. Is there a way to disable the bokeh server when creating a new `Client`? Or is there something else going on under the hood that is slowing things down?

The particular use case here would be prototyping with a local scheduler before scaling up data analysis on a cluster.