Mark "Remote Output Service..." as approved. #311
IIRC, we never had an agreement on how Bazel should interact with FUSE and since we are going to enable BwoB by default for 7.0, maybe this can be dropped. But I would like to hear from Ed. |
Yep, I wouldn't merge this before I hear from @EdSchouten (unless I don't hear from him for a few days, for the show must go on!) |
When I initially implemented bb_clientd and the accompanying patchset for Bazel, getting this all to work was my highest priority project. I had a full quarter to make it work. Unfortunately, that quarter is long finished. Nowadays my time is limited to just addressing any of the lingering issues we have with it (e.g., bazelbuild/bazel#17968), as in addition to bb_clientd I also need to:
What also doesn't help is that I only work on Java projects rarely, so the amount of guidance I'd need to make it upstreamable is large. This means that without more active participation from someone on the Bazel team, the window of opportunity has sort of slipped. This is a bit of a shame, as bb_clientd works great at this point. All of our builds use it. It literally saves us a terabyte of disk space on each of our CI workers.
This feature also no longer really aligns with how the Bazel team envisions things, meaning the probability of getting this landed is even smaller. The focus has shifted towards perfecting Builds without the Bytes. Though typical users may find this sufficient, I'm pretty sure that there is a silent minority of large/intensive users of Bazel that tend to disagree. Needing to constantly switch between
https://github.com/aspect-build/bazel/commits/6.0.0_aspect
One big limitation of Bazel, even with Remote Output Service, is that only bazel-out/ is virtualized. Most of the time/bandwidth in our builds is nowadays wasted on dealing with externals (e.g., needlessly fetching SDKs onto the client that are only used remotely, SHA summing full copies of SDKs every time Bazel restarts). The OutputService API doesn't help with that, nor does the Remote Asset API. So in that sense the Remote Output Service proposal only addresses Bazel's performance bottlenecks partially. I really hope the Bazel team is going to provide a vision at some point on how they attempt to tackle this in the grand scheme of things.
The question is where this specific proposal should go from here. As mentioned previously, I currently don't have the time to drag this across the finish line. If anyone else wants to take this over, I'd fully support it. If not, then there is only one logical solution: drop it, as @lberki suggested. |
At the very least, the involvement of aspect.dev and the activity on #12823 indicate that there is significant need in the community for functionality like this, so I don't want to summarily pull the plug. I have a few questions to determine what should happen:
The reason why I'm asking these is that given the demand and the limitations of BwoB, I'm amenable to merging this, but I'm also worried about potential interactions with the rest of Bazel (e.g. BwoB, tree artifacts, whether this works on Linux/Mac OS/Windows) and the safe thing to do is not to merge it. |
Hey @lberki,
In my opinion we should eventually aim to get to a point where all communication is purely done through gRPC. The reason being that RPCs make it easy to do batching of requests, which cuts down on the number of context switches between Bazel and bb_clientd. gRPC also has richer error semantics than POSIX, of course. It would significantly reduce the pressure on the kernel's inode/directory cache. Right now it's basically a hybrid model, where Bazel uses regular POSIX file operations for the things it can do efficiently (e.g., We also have bazelbuild/bazel#14878, which moves the creation of runfiles directories from POSIX file system calls to gRPC.
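The batching argument can be illustrated with a small Python sketch. Everything in it is hypothetical: `FakeOutputService`, `stat`, and `batch_stat` are illustrative names, not the bb_clientd or Bazel API, and each method call simply stands in for one round trip between Bazel and the daemon.

```python
class FakeOutputService:
    """Hypothetical stand-in for an output service daemon."""

    def __init__(self, sizes):
        self._sizes = sizes      # path -> file size in bytes
        self.round_trips = 0     # incremented once per simulated request

    def stat(self, path):
        # One request per file, comparable to one system call per file.
        self.round_trips += 1
        return self._sizes.get(path)

    def batch_stat(self, paths):
        # One request that carries many paths at once.
        self.round_trips += 1
        return [self._sizes.get(p) for p in paths]


paths = [f"bazel-out/k8-opt/bin/lib{i}.a" for i in range(1000)]
sizes = {p: 4096 for p in paths}

one_by_one = FakeOutputService(sizes)
individual = [one_by_one.stat(p) for p in paths]

batched = FakeOutputService(sizes)
together = batched.batch_stat(paths)

print(one_by_one.round_trips, batched.round_trips)
```

Both approaches return the same metadata, but the per-file variant needs 1000 round trips where the batched variant needs one.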
Even though the protocol does provide some flexibility here, bb_clientd currently creates a single mount on startup that can be used by n copies of Bazel running on the same system. Bazel is thus not responsible for creating/attaching the mount. This model is pretty flexible, as it allows you to do things like:
I get that. To comment specifically on the topics you raised:
|
Ack. I looked at the change and it's not too invasive at first glance, although it would be strange to check in code that accesses an interface which doesn't have any implementation in the Bazel source tree. Who should I contact from aspect.dev to ask them if they would be willing to contribute the labor of merging this patch? |
I'm the original author of the Aspect CLI. I'm actively looking into it. I'll post back as soon as I have an answer. |
@EdSchouten and others: one thing I am concerned about is ongoing maintenance. If we add this functionality to Bazel, people will come to us and report bugs or ask for features in relation to it and we currently don't have the bandwidth or use-case to staff it. What is the level of commitment from your side to help out with maintaining the functionality? |
Since this proposal was created, a few things have changed. The proposal lists these three downsides of BwoB:
This should be reduced significantly by bazelbuild/bazel#17120
This should have been addressed by bazelbuild/bazel#13604
This is not fully addressed; however, we merged bazelbuild/bazel#15638 a while back. I would argue that in most cases users don't need access to additional artifacts after the build. In theory, we could either add another command to bazel that allows downloading intermediate artifacts or change the behavior of In summary, I am wondering what the additional value of this is, also given that not all remote execution service providers offer this functionality. |
The "remote output service" in google3 is ObjFS, right? If I understand correctly this issue is about how other large companies need the same thing. I do see clients who have this need, in addition to Ed's employer. Funding (a.k.a. "commit to maintain") has come up here. It feels to me that the reason we're lacking funding for this project is that the implementations diverged. It's hard to convince Google managers that you should spend time maintaining a thing Google doesn't need, but Google's solution for this problem isn't available to others.
|
Some of our developer builds are indeed on BwoB and not using ObjFs.
What drives this need mostly? |
The scenario we run into is exactly what @EdSchouten wrote:
At the time you run a build, you may not know yet which files from A specific case of wanting to read arbitrarily from |
Right, my suggestion above was to improve Bazel's UX to give users a way to download the files they need instead of everything. Improving this would help everyone instead of just users of remote execution services that support fuse.
So, you are requesting a way to access the metadata that Bazel has in RAM, right? To make it explicit, I'm undecided what to do with this proposal and want to make sure we have discussed potential alternative ways to achieve the same. |
Hi there! Let me try to respond to all of your messages in one go.
We are strongly dependent on having this feature working, as we are using it intensively. I therefore think it's also in our interest to ensure it keeps working reliably. So with regards to bugs we can surely help out. With regards to adding new features to it, I can't make any commitments to that, as it would also depend on whether those features are realistic and to a certain extent align with our use case.
No, this is different as far as I know. The ActionInputMap is specific to BwoB bookkeeping, whereas the PR you linked was about reducing memory usage while computing Merkle trees of actions when doing cache lookups.
The point is that users don't always know what they need. For example, in our case it's not uncommon for people to launch GDB/LLDB sessions against some kind of executable that they built. How would a user know which files under bazel-bin/ need to be present to do that?
Using a FUSE file system keeps those kinds of use cases working as they otherwise would, without requiring that the user has to jump through additional hoops. |
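The on-demand behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how bb_clientd is implemented: `FakeCas` and `LazyOutputTree` are hypothetical names, and a dictionary stands in for the remote content-addressable storage.

```python
import hashlib


class FakeCas:
    """Hypothetical stand-in for a remote content-addressable store."""

    def __init__(self):
        self._blobs = {}
        self.downloads = 0

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        self.downloads += 1
        return self._blobs[digest]


class LazyOutputTree:
    """Maps output paths to digests; contents are fetched on first read."""

    def __init__(self, cas: FakeCas):
        self._cas = cas
        self._digests = {}
        self._cache = {}

    def register(self, path: str, digest: str) -> None:
        # When a remote action finishes, only metadata is recorded locally.
        self._digests[path] = digest

    def read(self, path: str) -> bytes:
        # When a tool (e.g. a debugger) opens the file, fetch it if needed.
        if path not in self._cache:
            self._cache[path] = self._cas.get(self._digests[path])
        return self._cache[path]


cas = FakeCas()
tree = LazyOutputTree(cas)
for name in ("app", "libfoo.so", "libbar.so", "unused.o"):
    tree.register(f"bazel-bin/{name}", cas.put(name.encode()))

# A debugging session touches only some outputs; nothing else is downloaded.
tree.read("bazel-bin/app")
tree.read("bazel-bin/libfoo.so")
tree.read("bazel-bin/app")  # served from the local cache
print(cas.downloads)
```

The user never has to list the files up front: every registered path is readable, and only the ones actually opened cost a download.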
Thanks for your reply, @EdSchouten!
I think this is the strongest argument in favor of this proposal. Do you have data on how often users actually access intermediate files? I am interested to learn about more scenarios from you and others to see whether we can improve the UX for BwoB - which is a useful thing to do even if it might not solve all the pain points listed in your proposal. My understanding is that your client daemon is Linux only, right? Do you plan to extend it to other platforms? I would also like to hear from other remote execution service owners (cc @bazelbuild/remote-execution and @brentleyjones, also cc @jmmv who commented on the PR) whether they would implement a similar daemon once this proposal is implemented - and whether they would like to see any changes to the proposal. |
This partially falls under "Work seamlessly with IDEs" in bazelbuild/bazel#6862. The new regex stuff helps, for sure, but it can be a pain to get right (I've fixed many bugs in rules_xcodeproj around this), and the requirements can change over time making this hard to keep right. Also, if a user doesn't need those files every session, they are unnecessarily downloading them, when instead it would be better to download them on demand. Also, if someone isn't using an IDE, or tooling that gets all of those flags right, then they are currently out of luck.
When I was testing |
The use case that I sketched is used by us on a near daily basis.
It also works on macOS, if you make sure you get the stars to align. bb_clientd supports both FUSE and NFSv4. This means that on macOS, you can either use it in combination with macFUSE or with the NFS client that's part of the kernel. Both approaches have their shortcomings, though:
Note that bb_clientd is part of the Buildbarn project, but not coupled to it. The goal is that you can also use it in combination with any other remote execution service. |
Can you elaborate more on this? I don't think ActionInputMap is specific to BwoB; it's used in all modes. Memory-wise, I believe BwoB should be the same compared to other modes. |
We felt the need to give the feedback that this feature is significant to Qualcomm for much the same reasons as have been brought up by others, and we would much prefer it be a mainline feature rather than maintained in several unofficial forks. The feature especially shines when building large monorepos with a wide net of targets (i.e. In such situations remote_download_all is unfeasible for obvious reasons, and listing all inputs possibly required by later steps in the pipeline/workflow is also unrealistic. We also feel the need to clarify that while bb-clientd is a good implementation of something that consumes the remote output service APIs, and most users of bb-clientd are likely to be using Buildbarn, the actual pull request is independent of bb-clientd, and bb-clientd itself is independent of Buildbarn. I.e., any REAPI-compatible environment would benefit from the improved traffic flow from using bb-clientd, and the remote output service can be consumed by things other than bb-clientd. It could for example be used to integrate with some other virtual filesystem. Without the remote output service, this information is lost inside of Bazel. |
From this thread, it looks like there is a lot of demand for the output tree to be on a FUSE file system of some sort. With that in mind, I'm fine with marking this design doc as approved and merging bazelbuild/bazel#12823 as long as @EdSchouten (or someone else, but he is the most obvious candidate) can commit to bringing it up to the standards of the code base of Bazel and putting it behind an experimental flag. WDYT? |
Friendly ping! I'd like to decide one way or the other; to reiterate: I'm fine with merging #12823 as long as someone can commit to rebasing it and bringing it up to our quality standards. |
My take: It seems like we need a new sponsor for the proposal to move forward. Let's put a deadline (1 month?) on this proposal to find a new sponsor to (a) rebase + fix up the PR and (b) help maintain it. I could ask during the https://github.com/bazelbuild/remote-apis/ meeting to see if any of the BuildStream/BuildBox maintainers are willing to pick it up. There has been some similar interest expressed in https://groups.google.com/g/remote-execution-apis/c/qOSWWwBLPzo and a similar proposal, https://docs.google.com/document/d/1SYialcjncU-hEWMoaxvEoNHZezUf5jG9DR-i67ZhWVw/edit#, which could take the proto API design down a slightly different route. I don't think dropping the proposal needs to be a "final" state. A new sponsor could put forward a smaller proposal referencing the dropped proposal to continue the work. It's not like the code in the PR is getting deleted. |
Yeah, I don't think dropping a proposal needs to mark the "final death" of said proposal, but it's a pretty strong signal. I wanted to keep this open for a while because @EdSchouten has invested a lot of energy into it and IIUC they maintain a fork of Bazel with that patch anyway, so I thought it would be a net win for them to not have to maintain their patch at the cost of some upfront investment. |
I'll keep this open for a while (say, a month, assuming I actually manage to remember the deadline) to signal that we are looking for volunteers and if there are none, I'll mark it as "dropped". |
The associated pull request bazelbuild/bazel#12823 still seems to see some activity, but Ed and Chi are not responsive, so I suppose this is the right thing to do.
d028031 to 864aaaa
At the last moment, @stagnation and @Gormo stepped up. So I'll update this pull request to say "move to approved" and will ask @coeuvre to review it since he's the most likely candidate for the eventual code review. |