
Improve test suite execution speed #42

Closed
1 of 4 tasks
connortann opened this issue May 15, 2023 · 2 comments
Labels
ci About continuous integration

Comments

@connortann
Collaborator

connortann commented May 15, 2023

I think there are a few areas of the GitHub Actions test suite we could address to improve execution speed. Currently the unit tests take almost 20 minutes to run on CI. Reducing that would shorten the time it takes to validate PRs, improving our effectiveness as reviewers.

TODO

Slowest tests

[Updated] Here is the current set of slowest tests:

============================= slowest 20 durations =============================
55.60s call     tests/explainers/test_partition.py::test_translation
48.50s call     tests/explainers/test_partition.py::test_translation_auto
48.44s call     tests/explainers/test_partition.py::test_translation_algorithm_arg
47.71s call     tests/explainers/test_partition.py::test_serialization
46.61s call     tests/explainers/test_partition.py::test_serialization_custom_model_save
43.77s call     tests/explainers/test_partition.py::test_serialization_no_model_or_masker
40.41s call     tests/explainers/test_gradient.py::test_pytorch_mnist_cnn
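
For reference, a minimal sketch of a CI step that produces a report like the one above; `--durations=20` is a standard pytest flag, and the `tests/` path is an assumption about our layout:

```yaml
# Hypothetical workflow step: run the suite and print the 20 slowest
# test durations at the end of the run.
- name: Run unit tests with timing report
  run: pytest --durations=20 tests/
```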
@connortann changed the title from "Improve test suite execution and reporting" to "Improve test suite execution speed" on May 15, 2023
@dsgibbons added the ci (About continuous integration) label on May 18, 2023
@connortann
Collaborator Author

connortann commented Jun 1, 2023

Caching dependencies: experiment notes

Comparison of various options I've tried for caching dependencies.

N.B. we can view and manage caches via the UI: https://github.com/dsgibbons/shap/actions/caches

Repository caches are limited to 10 GB in total.

Timings

| Env     | Baseline | 1: Cache pip | 2: Cache whole env | 3: Cache some libs |
|---------|----------|--------------|--------------------|--------------------|
| py3.7   | 4m 14s   | n/a          | 1m 34s             | 3m 15s             |
| py3.8   | 5m 6s    | n/a          | 1m 50s             | 3m 4s              |
| py3.9   | 4m 25s   | 4m 34s       | 2m 25s             | 2m 56s             |
| py3.10  | 4m 30s   | 4m 41s       | 1m 44s             | 2m 51s             |
| py3.11  | 4m 42s   | 5m 17s      | 2m 42s             | 2m 51s             |
| Average | 4m 35s   | 4m 50s       | 2m 3s              | 3m 35s             |

Approaches

0. Baseline

Existing approach, just pip-install with no caching.

1. Enable cache in the setup-python action

Caches the downloaded wheels, but not the installed environment, as per the action docs.
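
A minimal sketch of this option; `cache: "pip"` is the documented input of actions/setup-python, while `cache-dependency-path: setup.py` is an assumption about where our dependencies are declared:

```yaml
# Approach 1: let setup-python cache pip's wheel downloads. On a cache
# hit the downloads are skipped, but pip install (and any builds from
# source) still run in full, which limits the speedup.
- uses: actions/setup-python@v4
  with:
    python-version: "3.11"
    cache: "pip"
    cache-dependency-path: setup.py  # assumed dependency spec location
```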

2. Cache the whole Python env

As per this blog post.

  • Each env cache is ~3 GB, so caching all five envs would exceed the 10 GB repository limit
  • Example run
  • Implementation
  • Result: roughly a 2x speedup, but only space to cache 3 of the 5 envs.
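
A minimal sketch of this approach, assuming actions/setup-python has already run (it sets `env.pythonLocation` to the interpreter's install prefix); the `matrix.python-version` key and the `[test]` extra are assumptions about our workflow:

```yaml
# Approach 2: cache the interpreter's entire install directory (~3 GB
# per env). On a cache hit, pip install can be skipped altogether.
- uses: actions/cache@v3
  id: cache-env
  with:
    path: ${{ env.pythonLocation }}
    key: env-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('setup.py') }}

- name: Install dependencies
  if: steps.cache-env.outputs.cache-hit != 'true'
  run: pip install -e '.[test]'
```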

3. Cache specific libraries in site-packages

Cache only the libraries that need to be built, such as pyspark. Leave other libs to be pip-installed as before.

To decide which packages to cache: we want to save the most time whilst keeping under ~2 GB total cache size per env. Some measurements from experimentation, sorted by those that save the most time for the least space:

| Package                   | Size (MB) | Build time (s) | s/MB  |
|---------------------------|-----------|----------------|-------|
| site-packages/pyspark*    | 310       | 12             | 0.039 |
| site-packages/nvidia*     | 1521      | 40             | 0.026 |
| site-packages/torch*      | 619       | 13             | 0.021 |
| site-packages/tensorflow* | 586       | 12             | 0.020 |
| site-packages/xgboost*    | 200       | 4              | 0.020 |

So, we decide to cache just the first three libraries (pyspark, nvidia and torch). If we drop support for any Python versions in future, we can cache more libraries.
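
A minimal sketch of the selective cache; actions/cache accepts wildcard patterns in `path`, and the key shown is illustrative. In practice the matching `*.dist-info` directories would likely need caching too, so that pip recognises the packages as already installed:

```yaml
# Approach 3: cache only the heaviest built packages; everything else
# is pip-installed as before.
- uses: actions/cache@v3
  with:
    path: |
      ${{ env.pythonLocation }}/lib/python${{ matrix.python-version }}/site-packages/pyspark*
      ${{ env.pythonLocation }}/lib/python${{ matrix.python-version }}/site-packages/nvidia*
      ${{ env.pythonLocation }}/lib/python${{ matrix.python-version }}/site-packages/torch*
    key: libs-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('setup.py') }}
```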

Implementing these options in PR #84.

@connortann
Collaborator Author

Ported to shap#3045
