Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unclear documentation on how workers treat tool inputs and non-tool inputs #10498

Open
endobson opened this issue Dec 29, 2019 · 14 comments
Open
Assignees
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Documentation Documentation improvements that cannot be directly linked to other team labels team-Local-Exec Issues and PRs for the Execution (Local) team type: documentation (cleanup)

Comments

@endobson
Copy link
Contributor

ATTENTION! Please read and follow:

  • if this is a question about how to build / test / query / deploy using Bazel, or a discussion starter, send it to bazel-discuss@googlegroups.com
  • if this is a bug or feature request, fill the form below as best as you can.

Description of the problem / feature request:

With a custom worker, each action can reuse an existing worker process or start a new one. It seems that the current logic doesn't allow specifying what inputs are needed by the worker before it is able to accept requests and what is needed only for handling individual requests. Thus while if code that is used by the worker at startup is changed it means that the old worker processes cannot be safely reused, bazel currently assumes that worker processes can be reused.

Inputs vs tools is not a sufficient distinction here as that is currently for host vs target configuration and a rule may not need the worker restarted if a tool is changed.

This leads to bugs that are able to be fixed by a bazel clean, (which seems to shutdown old workers).

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I don't have one. You can try taking, endobson/minimal-racket@3ad8f44 and mucking with the tools to make it fail to compile and then revert the changes. You should still see issues after reverting the changes, but before running bazel clean.

What operating system are you running Bazel on?

Mac OS

What's the output of bazel info release?

release 2.0.0

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

This question really needs to be reworded as only applying if bazel is a git version, and that you want this run in the development version of the bazel directory.

Have you found anything relevant by searching the web?

Yes that the documentation for this feature (workers) is non-existent and has been so for years, and fixing this has been a P1 for over 6 months. #7997

@irengrig irengrig added team-Local-Exec Issues and PRs for the Execution (Local) team untriaged labels Jan 2, 2020
@rupertks
Copy link
Contributor

rupertks commented Jan 2, 2020

@jmmv

@jmmv
Copy link
Contributor

jmmv commented Jan 9, 2020

Could you provide some example with specific kinds of inputs you are talking about? After a quick look, I cannot understand the problem you are describing. Thanks.

@endobson
Copy link
Contributor Author

endobson commented Jan 9, 2020

I linked the code that is for running the workers, endobson/minimal-racket@3ad8f44. The worker executable reads some of the inputs before it starts accepting requests. These are the libraries it uses in its implementation, as the language runtime I'm using doesn't produce a single native executable. If these change then the worker should produce different results, but that will only happen if the worker process gets restarted, which it doesn't (as that is the point of persistent workers).

This means that using an existing worker versus using bazel clean to shut down the workers produce different results.

@endobson
Copy link
Contributor Author

endobson commented Jan 9, 2020

Maybe here is a better question: Some time when rerunning an action with a persistent worker the worker will need to be restarted due to the worker's code changing. Which inputs correspond to that versus just running the action on the existing persistent worker process?

@jmmv
Copy link
Contributor

jmmv commented Jan 9, 2020

Ah, I see what you mean now. So you are talking about the dependencies of the worker program itself. How do these change during the build though? I'm surprised that's possible.

@endobson
Copy link
Contributor Author

endobson commented Jan 9, 2020

They don't change during regular operation. But when first writing the worker program and debugging issues with it, such changes were very common.

@jmmv
Copy link
Contributor

jmmv commented Jan 9, 2020

Ah, I see!

But... it's pretty much impossible for Bazel to know what the dependencies of an arbitrary binary are, and having a way to manually specify them seems pretty error prone. We could monitor for changes to the binary itself (and we probably should if we don't already), but that doesn't help if dynamic libraries change for example.

Do you have a specific proposal other than the simplistic suggestion above?

@endobson
Copy link
Contributor Author

endobson commented Jan 9, 2020

I want documentation telling me how this is supposed to work, and fairly big warnings on how this can cause inconsistent builds, mechanisms to detect it/avoid it.

For example would having all the dependencies as part of the runfiles of the target used as 'executable' have solved this? Thats doable for me, and seems like a fairly easy to explain rule.

@jmmv jmmv added P3 We're not considering working on this, but happy to review a PR. (No assignee) type: documentation (cleanup) and removed more data needed untriaged labels Jan 16, 2020
@meisterT
Copy link
Member

cc @larsrc-google

@larsrc-google
Copy link
Contributor

Documentation for persistent workers is now available at https://docs.bazel.build/versions/master/persistent-workers.html, though it doesn't address this particular problem.

I don't think adding the dependencies as runfiles would help at the moment. The tools are by and large treated as black boxes - they could be executables created by a third party that doesn't want to give you their source. They are not expected to be Bazel targets. However, if the WorkerKey changes, new workers are created. So if you can somehow update the arguments or environment on recompile, that would get picked up automatically.

Alternatively, when working on the workers themselves, use the --worker_quit_after_build flag.

@larsrc-google
Copy link
Contributor

Reading through this again, it seems to be a bigger issue than just workers. If I understand correctly, your action runs a binary that loads some dynamic library, but this library is not declared as part of the runfiles (or some of its further dependencies aren't declared). But that would mean that your build, regardless of use of workers, is not properly hermetic, since a change that Bazel is not informed of can change what the result is.

@endobson
Copy link
Contributor Author

endobson commented Apr 8, 2021

All of my tools are intended to be hermetic and I use bazel to generate them. The underlying racket vm/runtime is an external repository in my WORKSPACE and I'm using toolchains to find them for the rules (https://github.com/endobson/minimal-racket/blob/master/WORKSPACE).

What I think you are referring to as the dynamic library is the bazel-tools/racket-compiler library that I use here: https://github.com/endobson/minimal-racket/blob/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f/racket.bzl#L93

That library is intended to be declared as a dependency of the action here: https://github.com/endobson/minimal-racket/blob/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f/racket.bzl#L93 through the toolchain https://github.com/endobson/minimal-racket/blob/3c95378d2c5379225fd8a233988773d97c9c4fc4/BUILD#L29.

IIRC when originally filing this issue I observed that changes to (https://github.com/endobson/minimal-racket/blob/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f/build_rules/racket-compiler.rkt) would make bazel rebuild my resulting code if I wasn't using workers but would not do the same if I was.

@larsrc-google
Copy link
Contributor

After chatting, we found that the four lines from https://github.com/endobson/minimal-racket/blob/3ad8f443f517b1ece4a1e7bbd1b8ee2b37b7919f/racket.bzl#L108 should go into tools instead of inputs. @endobson will try this out.

@endobson
Copy link
Contributor Author

endobson commented Apr 9, 2021

Haven't yet got to seeing if tools versus inputs will solve my particular problem, but I'm fairly confident we correctly identified the issue. But I spent some more time thinking about it and I think the original point that the distinction between deps and tools is not sufficient for workers solely given their definitions. But it might be made sufficient with documentation on how workers treats them.

I'm going to say that the environment/input that the worker is executed in/on is split into two parts, the components that are to be reused/cached across requests (e.g. worker startup flag, interpreted code) and those that will be different between requests (e.g. filename to compile, file contents of file to compiles). These may take the form of flags or files in the filesystem. It is important for correctness that the worker to be restarted every time one of its reusable inputs changes/for it not to treat the inputs that bazel thinks are changeable as reusable. Thus the specification of what bazel thinks is which is very important for a tool author to know. And I cannot find that in the public documentation:

The internal implementation seems to use only the tools in computing the hash:

This seems like a very reasonable decision because tools correspond to those built for the execution platform where as inputs correspond to things built for the target platform. But the difference between those fields is the platform for which they are built, and I think that if persistent workers are going to use tools for as part of the worker key then documenting that is the right solution. That doesn't let a worker mark target platform dependencies as ones where it wants to be rebuilt, but I cannot find a good use case for that and it can be handled by the worker restarting itself internally when the hash changes on a new WorkRequest.

@larsrc-google larsrc-google self-assigned this Apr 9, 2021
@keertk keertk added the team-Documentation Documentation improvements that cannot be directly linked to other team labels label Jan 10, 2023
@larsrc-google larsrc-google changed the title Workers allow inconsistent builds Unclear documentation on how workers treat tool inputs and non-tool inputs Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Documentation Documentation improvements that cannot be directly linked to other team labels team-Local-Exec Issues and PRs for the Execution (Local) team type: documentation (cleanup)
Projects
None yet
Development

No branches or pull requests

8 participants