Bryanv/reduce import code #8309

bryevdv merged 3 commits into master from bryanv/reduce_import_code on Oct 5, 2018



bryevdv commented Oct 5, 2018

This PR claims some low hanging fruit for reducing Bokeh import times:

  • Don't use PackageLoader for loading Bokeh Jinja templates (the pkg_resources import it requires is very expensive)

  • defer non-stdlib imports in sampledata modules (downloading sampledata is rare usage)

  • reduce some dynamic docstring manipulations
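As a rough illustration of the deferred-import idea in the second bullet, here is a minimal sketch. It is not the code from this PR: the function name `parse_sample` is hypothetical, and the stdlib modules `csv` and `io` stand in for heavier third-party dependencies so the example runs anywhere.

```python
def parse_sample(text):
    """Parse CSV text, importing the parsing modules only when called.

    Hypothetical helper: `csv` and `io` stand in for an expensive
    non-stdlib dependency that most users never need.
    """
    # Deferred imports: these statements run on each call rather than when
    # the enclosing module is imported. After the first call they are cheap,
    # because Python finds the modules in the sys.modules cache.
    import csv
    import io

    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    print(parse_sample("a,b\n1,2\n"))
```

The trade-off is that the import cost moves to the first call, which suits rarely used code paths such as downloading sample data.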

All told, this shaves ~200ms off import bokeh.plotting on my laptop. The results below were produced with:

python3.7 -X importtime -c "import bokeh.plotting" 2> bokeh3.log
tuna bokeh3.log


[Screenshot: tuna view of the import-time profile for import bokeh.plotting]

About half of that ~600ms appears to be NumPy and Pandas, which is borne out by this very rough timing:

In [1]: import pandas, numpy

In [2]:

In [2]: %time import bokeh.plotting
CPU times: user 190 ms, sys: 40.8 ms, total: 231 ms
Wall time: 309 ms

It's worth noting that 60-70 ms for computing bokeh.__version__ disappears in real release packages where the version string is hardcoded.
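For readers without tuna, the log written by -X importtime can also be summarized with a few lines of Python. The sketch below assumes the log format CPython emits (a header line followed by "import time: self | cumulative | module" rows, times in microseconds). The function names and the sample values are illustrative only and are not measurements from this PR.

```python
# Illustrative sample in the -X importtime format; the module names and
# numbers are made up for the example.
SAMPLE_LOG = """\
import time: self [us] | cumulative | imported package
import time:       100 |        100 |   heavy_dep.core
import time:       200 |        300 | heavy_dep
import time:        50 |        400 | my_package
"""


def parse_importtime(log_text):
    """Return (module, self_us, cumulative_us) tuples from an importtime log."""
    prefix = "import time:"
    rows = []
    for line in log_text.splitlines():
        if not line.startswith(prefix):
            continue  # other stderr output
        fields = [f.strip() for f in line[len(prefix):].split("|")]
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # header row or unexpected shape
        rows.append((fields[2], int(fields[0]), int(fields[1])))
    return rows


def slowest(rows, n=10):
    """Return the n rows with the largest cumulative import time."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


if __name__ == "__main__":
    for name, self_us, cumulative_us in slowest(parse_importtime(SAMPLE_LOG)):
        print(f"{cumulative_us:>8} us  {name}")
```

To use it on a real profile, read the log file produced by the command above (for example bokeh3.log) and pass its contents to parse_importtime.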

We can't do much about the NumPy/Pandas burden, but there are still some things we can do later to reduce import time on our end:

  • defer loading default template yaml files until requested (this will take a little more work/care)
  • remove more dynamic module code (e.g. dynamic glyph method construction)
  • defer loading Jinja templates until needed
  • move submodules of bokeh.models to bokeh._modules so that individual models can be imported internally without importing everything in

I would estimate that ~150ms (relative to this laptop) is probably the floor for import bokeh.plotting.


Member Author

bryevdv commented Oct 5, 2018

cc @mrocklin possibly of interest to you.



mrocklin commented Oct 5, 2018

Oooh, I'm very glad to hear it :) This benefit will apply to dask-workers twice, so this is pretty substantial :) (we import bokeh, then spawn a process, then import bokeh again)


Member Author

bryevdv commented Oct 5, 2018

Additionally, in 1.0 Bokeh is switching to "simple ids" by default, i.e. a simple monotonically increasing sequence of integers. This is a ~2x improvement in the time to generate IDs for models. Since you mention separate processes, I will note for completeness: if you are somehow constructing models for a single document across multiple processes (I very much doubt you or anyone is ever doing this), that is the one scenario where you would need to set BOKEH_SIMPLE_IDS=no to return to UUIDs (IDs just need to be unique per document).
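To make the multi-process caveat concrete, here is a small sketch of the two ID schemes. It is not Bokeh's implementation: `make_simple_id_factory` and `make_uuid_id` are hypothetical names, and two independent factories stand in for two separate processes.

```python
import itertools
import uuid


def make_simple_id_factory(start=1):
    """Return a function that yields "1", "2", ... on successive calls.

    Each factory has its own counter, much as each process would have
    its own counter state.
    """
    counter = itertools.count(start)
    return lambda: str(next(counter))


def make_uuid_id():
    """Return a random UUID string, unique across processes in practice."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    process_a = make_simple_id_factory()
    process_b = make_simple_id_factory()
    # Both "processes" hand out the same first ID, which would collide if
    # their models ended up in a single document.
    print(process_a(), process_b())
    print(make_uuid_id())
```

Within one process the counter is cheap and collision-free. The collision only arises when independently counted IDs are merged into one document, which is the case UUIDs avoid.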

@bryevdv bryevdv force-pushed the bryanv/reduce_import_code branch from e87222d to 2fcc317 Oct 5, 2018

@bryevdv bryevdv added this to the 1.0 milestone Oct 5, 2018


Member Author

bryevdv commented Oct 5, 2018

image report looks good, changes are small and should be uncontroversial so merging now

@bryevdv bryevdv merged commit be0ae3e into master Oct 5, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

@bryevdv bryevdv deleted the bryanv/reduce_import_code branch Oct 6, 2018

xavArtley pushed a commit to xavArtley/bokeh that referenced this pull request Oct 15, 2018

Bryanv/reduce import code (bokeh#8309)
* remove docstring formatting and duplication for Figure/figure

* use file system loader to avoid pkg_resources

* defer expensive imports for sampledata