-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Symlinks inside of TreeArtifacts fail with RBE and BwoB #22504
Comments
cc @tjgq |
I think that might be the problem here. Building without the bytes uses a virtual filesystem to compute the metadata for action outputs, and that virtual filesystem only has access to the action's declared inputs, for both hermeticity and efficiency reasons. (If, on the other hand, you can repro this even when the symlink can be resolved given only access to the action's inputs, I'd consider it a bug.) |
(To further clarify: if I understand the scenario correctly, your build is already non-hermetic; without the input dependency, nothing guarantees that the output pointed to by the symlink is produced first, or even that it's produced at all. Building without the bytes only makes the problem easier to detect.) |
Sorry, i was not arguing if this is bug or not, i am simply trying to understand it and see if it's me who's doing something really horrible or it's a Bazel issue.
I agree, that's indeed the culprit, and i understand why one might consider it not a bug. It's just odd given the action succeeds with other spawn strategies, even with
It is produced because the files which symlinks point to are in the Also since we are talking about hermeticity, do you think it's more appropriate to put these unused inputs into Am i doing something really weird here?, because i don't fully understand what's fundamentally wrong here. FWIW, we have already tested this feature with a few large codebases and it seems to work. |
I agree that it's unfortunate that the behavior depends on the contents of the output tree. We could maybe do a better job of flagging these symlinks as non-hermetic (assuming it can be done with a reasonable amount of overhead).
It's an overlay filesystem that falls back to the actual filesystem; in a purely "without the bytes" build, there's nothing to fall back to. (It should probably only fall back for files that have been determined to be in the input set; this would make the behavior consistent, but is tricky to implement in an efficient manner.)
This is simply not how Bazel works; the outputs of an action must be available when the action finishes, not later. Among other reasons why, Bazel needs to hash the tree artifact contents to be able to determine whether and how to run followup actions that depend on it.
It's different because
It won't help;
I worry that you might be trying too hard to optimize things by creating symlinks (and running into non-intuitive problems as a result). Would it perhaps be acceptable to copy files around instead, and rely on the various caching mechanisms built into Bazel to keep your builds fast? |
Thanks @tjgq
Note that they are often huge. PyTorch with cuda is multiple GB by itself and some clients need to build that into multiple OCI images. But I think it's simply unavoidable given your analysis here |
Description of the bug:
With rules_oci 2.x we implemented a feature where we create symlinks inside of a tree-artifact that points out of the tree-artifact (#20891) to files created by other actions and this works fine with
RBE
andLocal
execution.Note that the files that symlinks point to are not in the inputs of the action that creates the symlink, and i don't think it matters much.
However, when the same action run on RBE with
--remote_download_minimal
flag, it fails with the message below.Here's the Bazel slack channel threads which might be helpful: https://aspect-build.slack.com/archives/C04281DTLH0/p1716397067812139
Which category does this issue belong to?
No response
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
1- clone https://github.com/bazel-contrib/rules_oci/tree/2.x
2- run bazel test //examples/... --remote_download_minimal on RBE
Which operating system are you running Bazel on?
Darwin
What is the output of
bazel info release
?7.1.1
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.No response
What's the output of
git remote get-url origin; git rev-parse HEAD
?No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
I don't think is a regression.
Have you found anything relevant by searching the web?
Probably related #15454
Any other information, logs, or outputs that you want to share?
The text was updated successfully, but these errors were encountered: