Run scheduler of SubprocessCluster in subprocess
#7727
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
26 files ±0, 26 suites ±0, 14h 21m 40s ⏱️ +1h 13m 4s
For more details on these failures and errors, see this check.
Results for commit d07f83b. ± Comparison against base commit 78a926d.
This pull request skips 1 test.
♻️ This comment has been updated with latest results.
line = (await self.process.stderr.readline()).decode()
if not line.strip():
    raise RuntimeError("Scheduler failed to start")
logger.info(line.strip())
Suggested change:
- logger.info(line.strip())
+ sys.stderr.write(line)
We're not redirecting stderr from the workers, so shouldn't output from the scheduler be treated the same way? It's subtle, but logging could be configured differently from plain stderr (including prefixing additional information, which could be confusing), so simply forwarding to stderr seems more consistent with the worker subprocesses.
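Applied to the address-discovery loop, the suggestion amounts to writing each scheduler line to sys.stderr unchanged instead of re-emitting it through logger.info. Below is a minimal, self-contained sketch of that loop, not the PR's code: read_scheduler_address and main are hypothetical names, and a short child Python process that prints a made-up "Scheduler at:" line stands in for the real scheduler. The sketch also checks for "Scheduler at:" (with the colon) before splitting on it, so the split cannot raise an IndexError.

```python
import asyncio
import sys


async def read_scheduler_address(process: asyncio.subprocess.Process) -> str:
    """Read the child's stderr until the "Scheduler at:" line appears.

    Every line read is forwarded verbatim to our own stderr instead of being
    re-emitted through the logging module.
    """
    assert process.stderr is not None
    while True:
        line = (await process.stderr.readline()).decode()
        if not line.strip():
            # EOF (or a blank line) before any address was printed
            raise RuntimeError("Scheduler failed to start")
        sys.stderr.write(line)
        if "Scheduler at:" in line:
            return line.split("Scheduler at:")[1].strip()


async def main() -> str:
    # Hypothetical stand-in for the scheduler: a child Python process that
    # writes one fake startup line to stderr. The address is made up.
    fake_scheduler = (
        "import sys; "
        "sys.stderr.write('INFO - Scheduler at: tcp://127.0.0.1:8786\\n')"
    )
    process = await asyncio.create_subprocess_exec(
        sys.executable, "-c", fake_scheduler,
        stderr=asyncio.subprocess.PIPE,
    )
    address = await read_scheduler_address(process)
    await process.wait()
    return address


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Like the code under review, this stops reading once the address is found, so any later scheduler output is never read.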
See #7727 (comment) for the discrepancy between the default log level and the level required to retrieve the scheduler address.
while True:
    line = (await self.process.stderr.readline()).decode()
    if not line.strip():
        raise RuntimeError("Scheduler failed to start")
    logger.info(line.strip())
    if "Scheduler at" in line:
        self.address = line.split("Scheduler at:")[1].strip()
        break
logger.debug(line)
Once the "Scheduler at:" message appears, further logs from the scheduler will be swallowed, right? I think it would be nicer UX to keep forwarding the scheduler's stderr the whole time, just as stderr from the workers is visible. I'd even wonder whether that pipe could fill up in rare cases, blocking writes on the scheduler side.
That's a little more work, though, since you'd probably need to start a Task in this start method that forwards stderr, and clean it up in close.
One issue we face here is that the default log level is WARN, but we need to enable INFO logs on the scheduler to be able to retrieve the scheduler address. I've borrowed this pattern from the SSHCluster implementation, which seemed "good enough". I'm not keen to dive into more work on this for now, particularly since doing it properly would also mean only forwarding logs that meet the applicable minimum log level.
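The author's last point could be handled roughly as follows. The scheduler keeps logging at INFO so that the address line is emitted, but the parent forwards only lines at or above its own configured level. This is a sketch under an explicit assumption: it expects scheduler log lines in the " - "-separated format "%(asctime)s - %(name)s - %(levelname)s - %(message)s". parse_level and should_forward are hypothetical helpers, and the sample log lines are made up.

```python
from __future__ import annotations

import logging

# Assumption: scheduler log lines look like
# "%(asctime)s - %(name)s - %(levelname)s - %(message)s".
SEPARATOR = " - "


def parse_level(line: str) -> int | None:
    """Return the numeric logging level of a forwarded line, or None if it has none."""
    parts = line.split(SEPARATOR, 3)
    if len(parts) < 4:
        return None
    # getLevelName maps a registered name such as "INFO" or "WARN" to its
    # number and returns a string like "Level FOO" for anything else.
    level = logging.getLevelName(parts[2].strip())
    return level if isinstance(level, int) else None


def should_forward(line: str, min_level: int) -> bool:
    """Forward a line only if it meets min_level; unparseable lines always pass."""
    level = parse_level(line)
    return level is None or level >= min_level


if __name__ == "__main__":
    sample_lines = [  # made-up scheduler output
        "2023-04-01 12:00:00,000 - distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:8786",
        "2023-04-01 12:00:01,000 - distributed.scheduler - WARNING - made-up warning",
        "Traceback (most recent call last):",
    ]
    for line in sample_lines:
        print(should_forward(line, logging.WARNING), line)
```

In the start loop, the address check would still run on every line. Only the write to stderr would be gated by should_forward(line, level), where level is whatever minimum the parent process is configured to show. Lines without a recognizable level, such as tracebacks, are always forwarded so that errors are not hidden.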
Yeah, that makes sense.
Co-authored-by: Gabe Joseph <gjoseph92@gmail.com>
In #7431, we added the SubprocessCluster, which runs a local cluster where all workers run in subprocesses. This PR also moves the scheduler to a subprocess.