`pytest-xdist` workers crashing #404

github-actions · 2022-09-29T22:04:45Z

jrbourbeau · 2022-09-30T03:09:09Z

`tests/benchmarks/test_csv.py::test_csv_basic` and `tests/benchmarks/test_dataframe.py::test_dataframe_align` caused `pytest-xdist` workers to crash, and it's not clear why. The crashes aren't specific to these two tests: we've started seeing the same thing with other tests on PRs as well.

[gw2] node down: Not properly terminated
[gw2] [  4%] FAILED tests/benchmarks/test_csv.py::test_csv_basic 

replacing crashed worker gw2

[gw3] node down: Not properly terminated
[gw3] [  4%] FAILED tests/benchmarks/test_dataframe.py::test_dataframe_align 

replacing crashed worker gw3

jrbourbeau · 2022-10-03T15:20:09Z

We saw something similar in #412 with tests/benchmarks/test_array.py::test_anom_mean

[gw0] node down: Not properly terminated
[gw0] [  4%] FAILED tests/benchmarks/test_array.py::test_anom_mean 

replacing crashed worker gw0

jrbourbeau · 2022-10-03T15:21:41Z

We saw something similar in #413 with tests/benchmarks/test_zarr.py::test_select_scalar

[gw1] node down: Not properly terminated
[gw1] [  4%] FAILED tests/benchmarks/test_zarr.py::test_select_scalar 

replacing crashed worker gw1
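
For triaging excerpts like the ones above, here is a minimal sketch that pairs each "node down" line with the test reported as FAILED for the same worker and then counts crashes per test module. The function names `crashed_tests` and `crashes_by_module` are hypothetical, and the regular expressions assume the log lines look exactly like the excerpts in this thread; other pytest-xdist output formats are not handled.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the excerpts in this thread:
#   [gw2] node down: Not properly terminated
#   [gw2] [  4%] FAILED tests/benchmarks/test_csv.py::test_csv_basic
NODE_DOWN = re.compile(r"^\[(gw\d+)\] node down: (.+)$")
FAILED = re.compile(r"^\[(gw\d+)\] \[\s*\d+%\] FAILED (\S+)")


def crashed_tests(log_text):
    """Return (worker, test_id) pairs for tests marked FAILED after their worker went down."""
    down = set()  # workers that reported "node down" and have no FAILED line yet
    pairs = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = NODE_DOWN.match(line)
        if match:
            down.add(match.group(1))
            continue
        match = FAILED.match(line)
        if match and match.group(1) in down:
            pairs.append((match.group(1), match.group(2)))
            down.discard(match.group(1))
    return pairs


def crashes_by_module(pairs):
    """Count crashes per test module (the part of the test id before '::')."""
    return dict(Counter(test_id.split("::")[0] for _, test_id in pairs))
```

Running `crashes_by_module(crashed_tests(log_text))` over several CI logs would show whether crashes cluster in particular modules or are spread across the suite.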

ncclementi · 2022-10-06T14:02:44Z

From a quick look, this seems to be a known issue that is still open: pytest-dev/pytest-xdist#466.
Someone else reported it in pytest-dev/pytest-xdist#714 too, and that didn't get much attention either.

Not quite sure how to proceed here. There are other related issues, some open and some closed, that never got an answer:

Slaves crash on win64 with error "Not properly terminated" pytest-dev/pytest-xdist#70
Another potentially helpful issue: Running tests fails with "node down: Not properly terminated", maybe execnet related? pytest-dev/pytest#3216

ian-r-rose · 2022-10-06T15:41:36Z

It may just be that we have too many concurrent xdist workers. The theory was that not much work is done on the client, so 8 workers would be fine. But that theory might not be correct. In particular, I think that `package_sync` might be fairly expensive for the client. One piece of evidence for this: in #429, all of the worker crashes happen on the first test of a given module, which is when the cluster is being spun up.

Two possible ways to alleviate this:

- Reduce the number of xdist workers (try six or four?)
- Revert "Use single job for all test categories" (#370) and distribute the CI across more runners again.

Thoughts?
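
As a rough illustration of the first option, here is a minimal `conftest.py` sketch that caps the worker count. It assumes the suite is run with `-n auto` and a pytest-xdist version that provides the `pytest_xdist_auto_num_workers` hook; if CI passes an explicit count such as `-n 8`, the hook is not consulted and the number would be changed on the command line instead. The helper name `capped_worker_count` and the default cap of 4 are hypothetical, taken from the "six or four" suggestion above.

```python
import os


def capped_worker_count(cpu_count, cap=4):
    """Return a worker count that is at least 1 and no larger than `cap`."""
    if cpu_count is None:
        # os.cpu_count() can return None when the count is undetermined.
        return 1
    return max(1, min(cpu_count, cap))


def pytest_xdist_auto_num_workers(config):
    # pytest-xdist calls this hook to resolve `-n auto`.
    # Assumption: the cap is hard-coded here for illustration only.
    return capped_worker_count(os.cpu_count())
```

With this in place, a runner with 8 cores would start 4 workers under `-n auto`, while a 2-core runner would still start 2.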

ncclementi · 2022-10-06T15:51:57Z

> Reduce the number of xdist workers (try six or four?)

We can test this on a branch, run it every day for a few days, and see if it fixes the crashes.

github-actions bot added the ci-failure label Sep 29, 2022

This was referenced Sep 30, 2022

⚠️ CI failed ⚠️ #407

Closed

⚠️ CI failed ⚠️ #405

Closed

jrbourbeau mentioned this issue Oct 3, 2022

⚠️ CI failed ⚠️ #412

Closed

jrbourbeau mentioned this issue Oct 3, 2022

⚠️ CI failed ⚠️ #413

Closed

jrbourbeau changed the title ~~⚠️ CI failed ⚠️~~ pytest-xdist workers crashing Oct 3, 2022

jrbourbeau mentioned this issue Oct 3, 2022

⚠️ CI failed ⚠️ #374

Closed

This was referenced Oct 3, 2022

⚠️ CI failed ⚠️ #376

Closed

⚠️ CI failed ⚠️ #418

Closed

This was referenced Oct 4, 2022

⚠️ CI failed ⚠️ #422

Closed

⚠️ CI failed ⚠️ #423

Closed

This was referenced Oct 5, 2022

⚠️ CI failed ⚠️ #426

Closed

⚠️ CI failed ⚠️ #425

Closed

⚠️ CI failed ⚠️ #428

Closed

jrbourbeau mentioned this issue Oct 6, 2022

⚠️ CI failed ⚠️ #429

Closed

ncclementi mentioned this issue Oct 6, 2022

Slaves crash on win64 with error "Not properly terminated" pytest-dev/pytest-xdist#70

Closed

ian-r-rose self-assigned this Oct 6, 2022

ian-r-rose mentioned this issue Oct 6, 2022

CI Fixes #432

Merged

ncclementi closed this as completed in #432 Oct 7, 2022

YuanTingHsieh mentioned this issue May 22, 2024

Fix CI failure NVIDIA/NVFlare#2590

Merged

6 tasks

bongbui321 mentioned this issue Jul 29, 2024

CI: fix CI to run consistent tests 100% of the time commaai/openpilot#33089

Closed

1 task

benHeid mentioned this issue Aug 30, 2024

[MNT] Windows issue on python 3.10-3.12 sktime/pytorch-forecasting#1632

Open
