TST: Speed-up test suite when using pytest-xdist #16196

Merged
pllim merged 3 commits into astropy:main from tst-use-threadpoolctl-with-xdist on Mar 13, 2024

Conversation

jdavies-st
Contributor

This PR fixes a test slowdown that appears when the test suite is parallelized over all, or most, of the available cores using pytest-xdist. Other large scientific test suites have hit the same issue and solved it; see scipy/scipy#14441 and scikit-learn/scikit-learn#25918. The solution here is similar.

The solution is to limit the number of threads that OpenBLAS (and the other threaded libraries) use during a parallel pytest run, so that pytest workers do not compete with OpenBLAS threads for cores. This is done in a pytest hook that uses threadpoolctl to set the number of threads to 1, or to some other small value, depending on how many of the available cores pytest-xdist is using. With -n auto, this is equivalent to setting the environment variable OPENBLAS_NUM_THREADS=1.
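A minimal sketch of such a hook, not the code merged in this PR. It assumes that pytest-xdist exposes the worker count to each worker process through the PYTEST_XDIST_WORKER_COUNT environment variable; the helper name threads_per_worker and the even-split rule are illustrative choices:

```python
# conftest.py (sketch, under the assumptions stated above)
import os


def threads_per_worker(cpu_count, worker_count):
    """Split the available cores evenly among workers, at least 1 thread each.

    Hypothetical helper, named here for illustration only.
    """
    return max(cpu_count // worker_count, 1)


def pytest_configure(config):
    # Assumption: pytest-xdist sets this variable inside worker processes only.
    worker_count = os.environ.get("PYTEST_XDIST_WORKER_COUNT")
    if worker_count is None:
        return  # not running under xdist: leave the thread pools untouched

    try:
        from threadpoolctl import threadpool_limits
    except ImportError:
        return  # threadpoolctl not installed: nothing to limit
    else:
        limit = threads_per_worker(os.cpu_count() or 1, int(worker_count))
        # Called as a plain function, this sets the limit for the rest of the
        # process. It only affects libraries that are already loaded.
        threadpool_limits(limits=limit)
```

With -n auto on an 11-core machine, threads_per_worker(11, 11) gives 1, which matches the OPENBLAS_NUM_THREADS=1 behaviour described above. With fewer workers than cores, each worker keeps a few threads.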

Currently on main branch with pytest-xdist turned off, running a small subset of the test suite:

$ tox -e test -- -k lombscargle_multiband
<snip>
===== 960 passed, 51 skipped, 27292 deselected in 31.01s =====

And currently on main branch with pytest-xdist turned on, using all available cores on my 11-core M3 Pro MacBook:

$ tox -e test -- -k lombscargle_multiband -n auto
<snip>
===== 960 passed, 7 skipped in 1807.45s (0:30:07) =====

That is more than 50 times slower (about 58x).

With this PR:

$ tox -e test -- -k lombscargle_multiband -n auto
<snip>
===== 960 passed, 7 skipped in 29.40s =====

Much more reasonable when compared to the first case above.
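For reference, the ratios between the three runs can be worked out directly from the wall-clock times reported above. The dictionary keys are labels chosen for this sketch:

```python
# Wall-clock times in seconds, copied from the three pytest summaries above.
timings = {
    "main_serial": 31.01,      # main, pytest-xdist off
    "main_xdist": 1807.45,     # main, -n auto
    "pr_xdist": 29.40,         # this PR, -n auto
}

# How much slower -n auto is than the serial run on main.
slowdown_on_main = timings["main_xdist"] / timings["main_serial"]

# How much faster -n auto becomes with this PR.
speedup_with_pr = timings["main_xdist"] / timings["pr_xdist"]

print(f"slowdown on main with -n auto: {slowdown_on_main:.1f}x")
print(f"speed-up from this PR with -n auto: {speedup_with_pr:.1f}x")
```

Note that this subset is small, so with this PR the parallel run is only slightly faster than the serial one; the point is that the pathological slowdown is gone.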

Since pytest-xdist is already a dependency in the [test] extra, it makes sense to add threadpoolctl there as well: it is the standard tool for controlling the threads of the linear algebra libraries underlying numpy when tests are parallelized across workers.

Fixes #16195


Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Contributor

@neutrinoceros neutrinoceros left a comment

Nice! I'm really happy to learn how to run pytest in parallel locally and to see it fixed on the same day 😄

@jdavies-st
Contributor Author

Comparing the "Python 3.11 in Parallel with all optional dependencies" CI job on this PR with a run from earlier today:

Earlier

========= 30234 passed, 289 skipped, 1057 xfailed in 264.30s (0:04:24) =========

https://github.com/astropy/astropy/actions/runs/8265771501/job/22612232492#step:10:3642

This PR

========= 30234 passed, 289 skipped, 1057 xfailed in 240.95s (0:04:00) =========

https://github.com/astropy/astropy/actions/runs/8271697027/job/22632049869?pr=16196#step:10:3643

So there is a small improvement (roughly 9%) on CI, where we use -n 4. The larger improvement seems to come as the number of cores goes up.
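One way to see why the gain grows with core count is to count runnable threads per core. The sketch below assumes each worker's BLAS pool defaults to one thread per core when no limit is set; the 4-core CI runner is an assumption for illustration, not a figure from this thread:

```python
def oversubscription(workers, blas_threads_per_worker, cores):
    """Runnable threads per core; 1.0 means no oversubscription.

    Hypothetical helper for illustration only.
    """
    return workers * blas_threads_per_worker / cores


# Local run on main: -n auto on 11 cores, BLAS pool assumed to be 11 threads.
local_before = oversubscription(workers=11, blas_threads_per_worker=11, cores=11)

# Local run with the limit applied: one BLAS thread per worker.
local_after = oversubscription(workers=11, blas_threads_per_worker=1, cores=11)

# CI run with -n 4, assuming a 4-core runner and a 4-thread BLAS pool.
ci_before = oversubscription(workers=4, blas_threads_per_worker=4, cores=4)

print(local_before, local_after, ci_before)
```

Under these assumptions the unlimited case oversubscribes in proportion to the core count, which is consistent with the much larger effect seen on the 11-core machine.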

Review thread on astropy/conftest.py (outdated, resolved)
@pllim pllim added this to the v6.0.1 milestone Mar 13, 2024
Member

@pllim pllim left a comment

Thanks, and I appreciate that you posted the CI timings too, so I don't have to dig. 😆

@pllim
Member

pllim commented Mar 13, 2024

Since the only concern is addressed, merging. Thanks, all!

@pllim pllim merged commit 62aaa29 into astropy:main Mar 13, 2024
29 of 33 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/astropy that referenced this pull request Mar 13, 2024
@jdavies-st jdavies-st deleted the tst-use-threadpoolctl-with-xdist branch March 13, 2024 23:18
pllim added a commit that referenced this pull request Mar 13, 2024
…196-on-v6.0.x

Backport PR #16196 on branch v6.0.x (TST: Speed-up test suite when using pytest-xdist)
d-giles pushed a commit to d-giles/astropy that referenced this pull request Jul 26, 2024
* Use threadpoolctl to limit threads when pytest-xdist in use

* Use threadpoolctl in pytest_configure hook

* Use try/except/else
Development

Successfully merging this pull request may close these issues.

TST: Test suite slowdown with pytest-xdist
4 participants