Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay connecting to the remote executor until executorInit #18794

Closed
wants to merge 1 commit into from

Conversation

jmmv
Copy link
Contributor

@jmmv jmmv commented Jun 27, 2023

This change moves the majority of the code in the beforeCommand hook of the RemoteModule to executorInit. This makes the former validate the configuration flags only, and postpones any attempts to connect to the remote executor until it's actually needed.

The reason for delaying the connection to the executor is to ensure that commands that do /not/ require it can run through completion without errors nor pauses.

To make the diff of this change small, I've tried to leave the code where it was by wrapping it in a new function that gets called from executorInit. The indirection is not necessary for anything else though.

Fixes #18760.

@jmmv jmmv requested a review from a team as a code owner June 27, 2023 21:08
@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Jun 27, 2023
@coeuvre
Copy link
Member

coeuvre commented Jun 28, 2023

I like the way you made the diff small, so clean! Please fix failing tests and we are ready to go.

@jmmv jmmv force-pushed the delayed-remote-init branch 3 times, most recently from ae0a0b3 to 615420d Compare June 28, 2023 22:40
@meisterT meisterT added awaiting-user-response Awaiting a response from the author and removed awaiting-review PR is awaiting review from an assigned reviewer labels Jul 13, 2023
@jmmv
Copy link
Contributor Author

jmmv commented Jul 14, 2023

Oops, turns out that while the original "simple" change mostly worked... it boke various tests. I was able to fix some a couple of weeks ago but encountered others that were trickier, and lost the chance to continue looking.

Let me move this to a draft PR in the meantime until I have a chance to address the failures.

... but feel free to take it over if you wish :P

@jmmv jmmv marked this pull request as draft July 14, 2023 16:34
@jmmv jmmv marked this pull request as ready for review August 18, 2023 00:34
@github-actions github-actions bot added the awaiting-review PR is awaiting review from an assigned reviewer label Aug 18, 2023
@jmmv
Copy link
Contributor Author

jmmv commented Aug 18, 2023

Alright. I finally had a chance to get back to this. It was slightly more complicated than anticipated due to all of the side-effects that the RemoteModule lifecycle has... but all tests are passing now.

This change moves the majority of the code in the beforeCommand hook
of the RemoteModule to executorInit.  This makes the former validate
the configuration flags only, and postpones any attempts to connect
to the remote executor until it's actually needed.

The reason for delaying the connection to the executor is to ensure
that commands that do /not/ require it can run through completion
without errors nor pauses.

To make the diff of this change small, I've tried to leave the code
where it was by wrapping it in a new function that gets called from
executorInit.  The indirection is not necessary for anything else
though.

Fixes bazelbuild#18760.
Copy link
Member

@coeuvre coeuvre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@coeuvre coeuvre added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer awaiting-user-response Awaiting a response from the author labels Aug 22, 2023
@coeuvre
Copy link
Member

coeuvre commented Aug 28, 2023

Somehow we need to prioritize #18607, i.e. we want to make handshake parallel to analyze phase. I need to hold the merge of this PR until I have a clear solution for both #18607 and #18760. Sorry.

@coeuvre coeuvre removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Aug 28, 2023
@jmmv
Copy link
Contributor Author

jmmv commented Aug 28, 2023

I don't have the full picture, but I think that to fix #18607, you'll have to start by resolving the same issues that this PR addresses, so it may be beneficial to get it in. RemoteModule initialization is too complex due to its side-effects and hidden dependencies on global state, so initializing it in the background is going to hit the same problems I found. In other words: this PR already resolves the issue of delaying initialization until later but still happens synchronously. If you want initialization to happen in the background, wrapping the new executorInit logic into a future should be easier than trying to deal with what existed before.

Separate question: is there a plan to fix this all for Bazel 7, possibly due to Skymeld? The reason I ask is because this patch is intrusive, so I'm debating whether it makes sense for me to maintain it as an internal patch. If there is a clear path towards deleting it altogether (because it will be integrated "as is" into Bazel or because it will be rewritten in some other way), then the risk is low. But if not... I'll face trouble updating it later on :P

@coeuvre
Copy link
Member

coeuvre commented Aug 28, 2023

I don't have the full picture, but I think that to fix #18607, you'll have to start by resolving the same issues that this PR addresses, so it may be beneficial to get it in. RemoteModule initialization is too complex due to its side-effects and hidden dependencies on global state, so initializing it in the background is going to hit the same problems I found. In other words: this PR already resolves the issue of delaying initialization until later but still happens synchronously. If you want initialization to happen in the background, wrapping the new executorInit logic into a future should be easier than trying to deal with what existed before.

I agree the initialization is too complex especially when other modules rely on the initialization result. What this PR didn't address are:

  1. Properly initialize remote module for BEP uploader and remote downloader.
  2. While we can delay / move the initialization code as is, other modules still call / dependent it in beforeCommand. So it doesn't resolve the problem fundamentally.

The solution I am looking for is probably store a future of channel in RemoteCache / other similar places while it can still correctly propagate the initialization errors.

Separate question: is there a plan to fix this all for Bazel 7, possibly due to Skymeld? The reason I ask is because this patch is intrusive, so I'm debating whether it makes sense for me to maintain it as an internal patch. If there is a clear path towards deleting it altogether (because it will be integrated "as is" into Bazel or because it will be rewritten in some other way), then the risk is low. But if not... I'll face trouble updating it later on :P

I am working on the fix now. I expect to resolve this within this week unless something changed.

@coeuvre
Copy link
Member

coeuvre commented Sep 7, 2023

Closing, in favor of #18607.

@coeuvre coeuvre closed this Sep 7, 2023
@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remote exec connection should be established later
3 participants