Do not rely on logging for `SubprocessCluster` #8398

hendrikmakait · 2023-12-12T09:37:33Z

Partially addresses #8392, #8393

This PR relies on the `scheduler_file`, instead of logging, to propagate the scheduler address to the worker.
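To illustrate the idea, here is a minimal, standard-library-only sketch of passing an address through a file. It is not the code from this PR: the helper names `write_scheduler_file` and `read_scheduler_address` are hypothetical, and the file layout (a JSON object with an `address` key) is an assumption made for the example.

```python
import json
import os
import tempfile
import time


def write_scheduler_file(path: str, address: str) -> None:
    """Scheduler side (hypothetical helper): publish the address in a JSON file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        # Assumed layout: a JSON object with an "address" key.
        json.dump({"address": address}, f)
    # Rename into place so a reader never observes a half-written file.
    os.replace(tmp, path)


def read_scheduler_address(path: str, timeout: float = 5.0, interval: float = 0.05) -> str:
    """Worker side (hypothetical helper): poll until the file exists, then read it."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                return json.load(f)["address"]
        except FileNotFoundError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no scheduler file at {path!r} after {timeout}s")
            time.sleep(interval)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        scheduler_file = os.path.join(tmpdir, "scheduler.json")
        write_scheduler_file(scheduler_file, "tcp://127.0.0.1:8786")
        print(read_scheduler_address(scheduler_file))
```

Because the address travels through the file, the reader does not depend on what the writer logs or at which level.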

Note:
Conceptually, the same fix can be applied to `SSHCluster`, but things get slightly more complicated because the client and the scheduler may be on different file systems.

- Tests added / passed
- Passes `pre-commit run --all-files`

github-actions · 2023-12-12T10:28:13Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files ±0        27 suites ±0     11h 52m 36s ⏱️ +14m 41s
 3 935 tests +1     3 819 ✔️ -2        110 💤 ±0      6 ❌ +3
49 501 runs  +31   47 191 ✔️ +29    2 293 💤 -3     17 ❌ +5

For more details on these failures, see this check.

Results for commit 2c49f6b. ± Comparison against base commit c408b6a.

crusaderky · 2023-12-14T17:23:05Z

The new test is very flaky:
https://github.com/dask/distributed/actions/runs/7210951073/job/19645256909?pr=8404

crusaderky · 2023-12-14T17:24:30Z

distributed/deploy/tests/test_subprocess.py

+    with new_config_file(
+        {"distributed": {"logging": {"distributed": logging.CRITICAL + 1}}}
+    ):
+        await asyncio.wait_for(_start(), timeout=2)


I don't understand the purpose of this very short timeout. Why isn't the timeout built into `gen_test` what we want here?

hendrikmakait · 2023-12-14

I wrote the test before the fix and didn't want to wait the entire 30 seconds while fixing it.
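As a rough illustration of the trade-off discussed here, the sketch below wraps a start-up coroutine in `asyncio.wait_for` with a short deadline. `hanging_start`, `quick_start`, and `start_with_deadline` are hypothetical stand-ins, not code from `distributed`, and the sketch does not reproduce `gen_test` itself.

```python
import asyncio


async def hanging_start() -> None:
    """Stand-in for a cluster start-up that never completes."""
    await asyncio.Event().wait()


async def quick_start() -> None:
    """Stand-in for a cluster start-up that completes promptly."""
    await asyncio.sleep(0)


async def start_with_deadline(start, timeout: float) -> bool:
    """Return True if start() finishes within `timeout` seconds, else False."""
    try:
        await asyncio.wait_for(start(), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising.
        return False
    return True


if __name__ == "__main__":
    print(asyncio.run(start_with_deadline(hanging_start, 0.05)))
    print(asyncio.run(start_with_deadline(quick_start, 0.05)))
```

A tight inner deadline makes a hang surface within moments while debugging, but it also bounds how slow a healthy start-up may be, which an outer, longer test timeout would tolerate.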

crusaderky · 2023-12-14T17:28:46Z

Fix attempted in #8413.

Do not rely on logging for subprocess cluster

2c49f6b

hendrikmakait requested review from jacobtomlinson and fjetter as code owners December 12, 2023 09:37

fjetter approved these changes Dec 13, 2023

hendrikmakait merged commit f2e1e51 into dask:main Dec 13, 2023
21 of 35 checks passed

hendrikmakait deleted the logless-subprocess branch December 13, 2023 13:48

hendrikmakait mentioned this pull request Dec 14, 2023

test_subprocess_cluster_does_not_depend_on_logging is broken on Python 3.12 #8408

Closed

crusaderky reviewed Dec 14, 2023

crusaderky mentioned this pull request Dec 14, 2023

Fix flaky test_subprocess_cluster_does_not_depend_on_logging #8413

Closed

hendrikmakait mentioned this pull request Mar 27, 2024

SubprocessCluster and SSHCluster hang indefinitely if distributed log level is WARN or higher #8393

Open
