Run scheduler of `SubprocessCluster` in subprocess #7727

hendrikmakait · 2023-03-30T14:58:21Z

In #7431, we added the SubprocessCluster, which runs a local cluster where all workers run in subprocesses. This PR also moves the scheduler to a subprocess.

Tests added / passed
Passes pre-commit run --all-files

github-actions · 2023-03-30T16:18:32Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

26 files ±0, 26 suites ±0, duration 14h 21m 40s ⏱️ (+1h 13m 4s)
3 547 tests (+2): 3 433 ✔️ passed (-2), 106 💤 skipped (+1), 7 ❌ failed (+2), 1 🔥 errored (+1)
44 872 runs (+11): 42 623 ✔️ passed (-110), 2 240 💤 skipped (+118), 8 ❌ failed (+2), 1 🔥 errored (+1)

For more details on these failures and errors, see this check.

Results for commit d07f83b. ± Comparison against base commit 78a926d.

This pull request skips 1 test.

distributed.protocol.tests.test_protocol ‑ test_large_messages

♻️ This comment has been updated with latest results.

gjoseph92 · 2023-04-05T16:14:23Z

distributed/deploy/subprocess.py

+            line = (await self.process.stderr.readline()).decode()
+            if not line.strip():
+                raise RuntimeError("Scheduler failed to start")
+            logger.info(line.strip())


Suggested change

-            logger.info(line.strip())
+            sys.stderr.write(line)

We're not redirecting stderr from the workers; seems like output from the scheduler should be treated the same? Subtle, but logging could be configured differently from plain stderr—including to prefix some additional information which could be confusing—so just forwarding to stderr seems more consistent with the worker subprocesses.

See #7727 (comment) for the general discrepancy between the generally configured log level and the one required to retrieve the address.
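To make the difference discussed in this thread concrete, here is a minimal sketch that is not taken from the PR. The logger name `sketch.subprocess`, the format string, and both helper functions are hypothetical. It shows that a line passed through `logging` can be prefixed or dropped depending on configuration, while a plain stream write forwards it unchanged.

```python
import io
import logging


def emit_via_logging(line: str, logger_level: int) -> str:
    """Send a line through a logger and return what reaches the stream."""
    stream = io.StringIO()
    # Hypothetical logger name and format, chosen only for illustration.
    logger = logging.getLogger("sketch.subprocess")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logger_level)
    # Same call as in the reviewed diff: the line is logged at INFO level.
    logger.info(line.strip())
    return stream.getvalue()


def emit_verbatim(line: str) -> str:
    """Write a line to a stream unchanged, as sys.stderr.write(line) would."""
    stream = io.StringIO()
    stream.write(line)
    return stream.getvalue()
```

With the logger at `logging.WARNING`, `emit_via_logging` returns an empty string because the INFO record is filtered out. At `logging.INFO` the line comes back with the formatter's prefix. `emit_verbatim` always returns the line as the subprocess wrote it.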

gjoseph92 · 2023-04-05T16:16:14Z

distributed/deploy/subprocess.py

+        while True:
+            line = (await self.process.stderr.readline()).decode()
+            if not line.strip():
+                raise RuntimeError("Scheduler failed to start")
+            logger.info(line.strip())
+            if "Scheduler at" in line:
+                self.address = line.split("Scheduler at:")[1].strip()
+                break
+        logger.debug(line)


Once the `Scheduler at:` message appears, further logs from the scheduler will be swallowed, right? I think it would be nicer UX to keep forwarding scheduler stderr the whole time, just like stderr from workers will be visible. I'd even wonder if that pipe could fill up in rare cases, blocking writes on the scheduler side.

That's a little more work though, since you'd probably need to start a Task in this start method that forwards stderr, and clean it up in close.

One issue we face here is that the default log level is WARN, but we need to enable INFO logs on the scheduler to be able to retrieve the scheduler address. I've borrowed this pattern from the SSHCluster implementation, which seemed "good enough". I'm not keen to dive into more work on this for now, in particular since it would also entail that we only write logs with the applicable minimum log level.

Yeah, that makes sense.
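For reference, the forwarding approach suggested in this thread (which the PR did not adopt) could look roughly like the sketch below. It is not code from `distributed`: the class `ForwardingProcess`, its `out` parameter, and the `_forward` helper are hypothetical names. Unlike the reviewed diff, it treats only end-of-file, not a blank line, as a failed start.

```python
import asyncio
import contextlib
import sys


class ForwardingProcess:
    """Hypothetical wrapper, not the class in distributed/deploy/subprocess.py."""

    def __init__(self, cmd):
        self.cmd = cmd
        self.process = None
        self.address = None
        self._forward_task = None

    async def start(self, out=sys.stderr):
        self.process = await asyncio.create_subprocess_exec(
            *self.cmd, stderr=asyncio.subprocess.PIPE
        )
        # Forward every line verbatim until the address line shows up.
        while True:
            line = (await self.process.stderr.readline()).decode()
            if not line:  # EOF: the process exited before announcing an address
                raise RuntimeError("Scheduler failed to start")
            out.write(line)
            if "Scheduler at:" in line:
                self.address = line.split("Scheduler at:")[1].strip()
                break
        # Keep draining the pipe in the background so later output stays
        # visible and the pipe buffer cannot fill up.
        self._forward_task = asyncio.create_task(self._forward(out))

    async def _forward(self, out):
        while True:
            line = await self.process.stderr.readline()
            if not line:  # EOF once the subprocess closes its stderr
                break
            out.write(line.decode())

    async def close(self):
        if self.process.returncode is None:
            with contextlib.suppress(ProcessLookupError):
                self.process.terminate()
        await self.process.wait()
        if self._forward_task is not None:
            # The task ends on its own when the pipe reaches EOF.
            await self._forward_task
```

The background task is created in `start` and awaited in `close`, which mirrors the "start a Task, clean it up in close" idea from the review. Awaiting rather than cancelling the task lets any output still buffered in the pipe be forwarded before shutdown, under the assumption that no other process keeps the pipe's write end open.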

hendrikmakait added 4 commits March 30, 2023 13:58

Minor cleanup

d5fae93

Subprocess scheduler

d16932a

log level

05c48e9

Docs

3ceff54

hendrikmakait requested review from jacobtomlinson and fjetter as code owners March 30, 2023 14:58

hendrikmakait added 3 commits March 30, 2023 17:05

Parameters

0b3eec1

Minor

9bfb2f4

PIPE

19f3b47

hendrikmakait self-assigned this Apr 3, 2023

gjoseph92 reviewed Apr 5, 2023

hendrikmakait and others added 2 commits April 5, 2023 19:59

Update distributed/deploy/subprocess.py

ea7056b

Co-authored-by: Gabe Joseph <gjoseph92@gmail.com>

cmd

d07f83b

gjoseph92 approved these changes Apr 5, 2023

hendrikmakait merged commit e72c309 into dask:main Apr 6, 2023
24 of 33 checks passed
