[FEAT] Speed up import deepchecks by making it lazier #2758

Open
a-recknagel opened this issue Mar 26, 2024 · 0 comments
Labels
linear, needs triage (Issue needs to be labeled and prioritized)

a-recknagel commented Mar 26, 2024

Is your feature request related to a problem? Please describe.
Importing deepchecks takes a fair amount of time: ~2 seconds on my machine, or ~1.6 seconds if I set DISABLE_LATEST_VERSION_CHECK. This might not matter much in a notebook, but my unit tests run in <0.1 seconds if I skip the tests that import deepchecks, so the overhead is quite noticeable.

Describe the solution you'd like
It'd be nice if deepchecks were a lot lazier. Right now, testing with deepchecks==0.18.1, just running import deepchecks imports a total of 69 modules from the deepchecks namespace, which in turn import a total of 2024 (!) other third-party modules. As a starting point, I listed all external packages (third-party and standard library) that get imported explicitly, marking with 🐌 those that I feel would be worth loading lazily for someone who only plans to use deepchecks.core and deepchecks.tabular:

  • IPython: 0.166 seconds 🐌
  • bs4: 0.043 seconds
  • decimal: 0.00102 seconds
  • gc: 0.00000009 seconds
  • http: 0.013 seconds
  • importlib: 0.0172 seconds
  • ipykernel: 0.183 seconds 🐌
  • ipywidgets: 0.206 seconds 🐌
  • joblib: 0.124 seconds 🐌
  • json: 0.00109 seconds
  • jsonpickle: 0.211 seconds 🐌
  • matplotlib: 0.284 seconds 🐌
  • multiprocessing: 0.00558 seconds
  • pandas: 0.43 seconds 🐌 (surely unavoidable)
  • pkg_resources: 0.0487 seconds
  • pkgutil: 0.00216 seconds
  • plotly: 0.159 seconds 🐌
  • queue: 0.000632 seconds
  • scipy: 0.295 seconds 🐌 (surely unavoidable)
  • six: 0.000815 seconds
  • sklearn: 0.376 seconds 🐌 (probably unavoidable)
  • tenacity: 0.0344 seconds
  • timeit: 0.000198 seconds
  • tqdm: 0.215 seconds 🐌
  • typing_extensions: 0.00329 seconds
  • unicodedata: 0.000199 seconds
  • urllib: 0.0188 seconds
  • uuid: 0.00205 seconds
  • zipfile: 0.00462 seconds
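Module counts like the 69 and 2024 quoted above can be reproduced roughly as follows. This is a minimal sketch and not necessarily how the numbers in this issue were obtained; `split_by_namespace` and `measure_import` are hypothetical helper names, and the result is only meaningful in a fresh interpreter, since modules that are already loaded are not counted.

```python
import importlib
import sys
import time


def split_by_namespace(module_names, namespace):
    """Split module names into those inside `namespace` and all others."""
    prefix = namespace + "."

    def is_own(name):
        return name == namespace or name.startswith(prefix)

    own = sorted(n for n in module_names if is_own(n))
    other = sorted(n for n in module_names if not is_own(n))
    return own, other


def measure_import(name):
    """Import `name`; return elapsed seconds and the newly loaded modules.

    Modules already present in sys.modules are not imported again, so
    they contribute neither to the time nor to the counts.
    """
    before = set(sys.modules)
    start = time.perf_counter()
    importlib.import_module(name)
    elapsed = time.perf_counter() - start
    own, other = split_by_namespace(set(sys.modules) - before, name)
    return elapsed, own, other


if __name__ == "__main__":
    # Pass the package name on the command line, e.g. "deepchecks".
    # "json" is only a stand-in default so the script runs anywhere.
    target = sys.argv[1] if len(sys.argv) > 1 else "json"
    elapsed, own, other = measure_import(target)
    print(f"{target}: {elapsed:.3f} s, {len(own)} own modules, "
          f"{len(other)} other modules")
```

Saved under a hypothetical file name such as measure_import.py, it would be run as `python measure_import.py deepchecks`.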

If everything related to notebooks and visualization weren't imported at the top level, I'd expect the import time to go down a fair bit.
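To illustrate what "lazier" could look like, here is a minimal sketch of two common deferral patterns: a proxy object that imports a module on first attribute access, and an import moved into the function that needs it. `LazyModule` and `render_report` are hypothetical names, `json` merely stands in for a heavy dependency such as a plotting library, and none of this reflects how deepchecks is actually structured.

```python
import importlib


class LazyModule:
    """Proxy that imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Called only when normal lookup fails, i.e. for everything
        # except _name, _module and the methods defined on this class.
        return getattr(self._load(), attr)


def render_report(data):
    """Serialize `data`; stands in for a function that needs a heavy import."""
    # Deferred import: the cost is paid by callers of this function only,
    # not by everyone who imports the enclosing package.
    import json

    return json.dumps(data, sort_keys=True)


json_lazy = LazyModule("json")       # nothing is imported at this point
print(json_lazy.dumps({"a": 1}))     # the real import happens here
print(render_report({"b": 1, "a": 2}))
```

The standard library also offers `importlib.util.LazyLoader` for a similar effect, and a package `__init__` can define a module-level `__getattr__` (PEP 562) to resolve submodules on demand. Which of these, if any, fits deepchecks is for the maintainers to judge.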

Additional context
Flamegraph of the import time (with DISABLE_LATEST_VERSION_CHECK set), generated with tuna by running

python -X importtime -c "import deepchecks" 2> deepchecks.log
tuna deepchecks.log

[Screenshot: tuna flamegraph of the deepchecks import time]
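The log written by the command above can also be summarized without tuna. The sketch below assumes the usual `-X importtime` line layout (`import time: <self us> | <cumulative us> | <module>`) and sums the self time per top-level package. `self_time_by_package` is a hypothetical helper, and because it adds up self times rather than cumulative times, its totals will not necessarily match the per-package figures listed in this issue.

```python
import re
import sys
from collections import defaultdict

# Matches e.g. "import time:       300 |        400 |   pandas".
# The header line ("self [us] | cumulative | ...") has no leading number
# and is therefore skipped.
LINE = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)")


def self_time_by_package(lines):
    """Sum self time (microseconds) per top-level package, largest first."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue
        self_us = int(match.group(1))
        top_level = match.group(3).split(".")[0]
        totals[top_level] += self_us
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return dict(ordered)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the log path on the command line, e.g. deepchecks.log.
    with open(sys.argv[1], encoding="utf-8") as log:
        for package, micros in self_time_by_package(log).items():
            print(f"{package}: {micros / 1_000_000:.4f} seconds")
```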

github-actions bot added the needs triage and linear labels on Mar 26, 2024