Sandbox slowness on OSX #8230
Comments
Note that I also see sandbox slowness on Linux, though it is much better than on Mac: ~15s with linux-sandbox versus ~3s with --strategy=SassCompiler=local on my Lenovo P920. |
cc @jmmv |
Discussed with @dslomov this morning; I think we should disable the sandbox on Mac by default. It has dubious benefits for keeping your build hermetic while causing lots of broken (non-hermetic) and slow builds. |
That's a bold strategy. Because without sandboxing, all actions will run in the same, shared If you think that this doesn't matter for your particular use case or if you have other mitigations for this, then feel free to go ahead with the plan, but in general this is not recommended. If your sass build steps are that slow, it can only mean that they have a huge number of input files. Is that the case?
Please give concrete examples of broken builds due to sandboxing. If builds break due to sandboxing, it's almost always because the rules or the BUILD files don't declare all dependencies. |
I have also noticed that sandboxing is generally slower on macOS than on Linux. Does anyone actually have an explanation of why that is? Is it due to macOS itself, or is the sandbox on macOS not as optimized? Also @fenghaolw have you tested the same build with https://github.com/bazelbuild/sandboxfs? Apparently there is a chance it can make sandboxed builds on macOS quite a bit faster. Would be curious to see some comparisons with that. |
We tested sandboxfs with our iOS build without much luck bazelbuild/sandboxfs#76 |
There are two problems regarding sandboxing slowness:
There is this other idea floating around of changing the way sandboxing works: instead of preventing operations, we just log them all and then verify, after the action completes, that whatever the action accessed was allowed. This would avoid the symlinks completely and also bypass sandbox-exec. I think this is how Windows sandboxing will work. But... I don't think there is a way to do this kind of tracing on macOS out of the box that doesn't require either root or disabling SIP. It's worth investigating though. |
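A minimal sketch of the log-then-verify idea above, in Python. It assumes some tracer (hypothetical, not shown here) has already recorded the paths an action touched; `find_undeclared_accesses` and its parameters are illustrative names, not Bazel APIs.

```python
from pathlib import PurePosixPath


def find_undeclared_accesses(accessed, declared_inputs, allowed_roots=()):
    """Return accessed paths that are neither declared inputs nor under an allowed root.

    accessed: paths recorded by a (hypothetical) tracer while the action ran.
    declared_inputs: paths the action declared as its inputs.
    allowed_roots: directories that are always permitted, e.g. a temp dir.
    """
    declared = {PurePosixPath(p) for p in declared_inputs}
    roots = [PurePosixPath(r) for r in allowed_roots]
    violations = []
    for raw in accessed:
        path = PurePosixPath(raw)
        if path in declared:
            continue
        # A path is allowed if it is an allowed root or lives below one.
        if any(root == path or root in path.parents for root in roots):
            continue
        violations.append(raw)
    return violations
```

The check runs only after the action completes, so nothing is prevented up front; an action with a non-empty result would be reported as failed rather than blocked.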
@keith Yeah I saw that, very interesting. I have just done some quick tests on my 6 year old macbook Pro and for a tiny JS build at least (with already a fair amount of inputs due to npm modules) my build time halved from roughly 6s to 3s. Without sandbox it is 2s. That does look pretty good to me but I do wonder for what type of builds and what type of setup/configuration that differs. Edit: Another slightly larger JS build of only 1 action went down from |
Some tests I ran with this reproduction:
And corresponding observations:
|
@jmmv I have also seen improvements with the async delete and I reported the crash here: #7527 (comment) |
Thanks. I have a fix for that crash when enabling asynchronous deletions but I'm still having trouble with one test... plus in benchmarking the change, I'm consistently getting worse results in large builds. Need to look into that too. Separately, I've been looking into the old claim that
where Therefore I conclude that the major cost of sandboxing in macOS continues to be the symlinks tree and handling files. |
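To get a rough feel for the symlink-tree cost on a given machine, a micro-benchmark along these lines may help. This is only a sketch: `time_symlink_tree` is a made-up helper, it uses a flat directory of empty files, and it does not reproduce what Bazel's sandbox actually does.

```python
import os
import shutil
import tempfile
import time


def time_symlink_tree(num_files):
    """Create num_files empty inputs, symlink each into a fresh tree, then delete it.

    Returns (links_created, seconds_to_create, seconds_to_delete).
    """
    base = tempfile.mkdtemp(prefix="symlink_bench_")
    try:
        src = os.path.join(base, "inputs")
        sandbox = os.path.join(base, "sandbox")
        os.mkdir(src)
        for i in range(num_files):
            open(os.path.join(src, f"f{i}"), "w").close()

        # Time the creation of one symlink per input file.
        start = time.perf_counter()
        os.mkdir(sandbox)
        for i in range(num_files):
            os.symlink(os.path.join(src, f"f{i}"), os.path.join(sandbox, f"f{i}"))
        create_s = time.perf_counter() - start
        created = len(os.listdir(sandbox))

        # Time the teardown of the symlink tree.
        start = time.perf_counter()
        shutil.rmtree(sandbox)
        delete_s = time.perf_counter() - start
        return created, create_s, delete_s
    finally:
        shutil.rmtree(base, ignore_errors=True)
```

Running it with the same `num_files` on macOS and in a Linux container on the same hardware gives a crude comparison of the two filesystems for this workload.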
@alexeagle Since we already have a P1 bug for the Mac sandbox performance issue, I think we can continue our previous discussion here. The current problem is that the sandbox on Mac is very slow, especially for Angular projects, where there are a large number of input files. So we wonder what's the way forward. @jmmv Is there any current work on improving the Mac sandbox performance? From the above discussion, sandbox might be a way, how big could the improvement be? @alexeagle Did you ever try it before? If it's hard to improve the performance, should we disable the sandbox on Mac? Disabling the sandbox could make the build less hermetic, but @alexeagle also mentioned it was causing some broken builds that may not be the fault of the build rules? I'm also wondering how this would affect remote execution. Some issues may only be spotted when remote execution is enabled if we didn't have the sandbox in advance. If we should not disable the sandbox on Mac, we should provide a nicer way to disable the sandbox for a specific platform. That means, we only want to apply flags like
Then Bazel loads different flags according to the platform it's running on and users never have to pass |
Regardless of the performance improvements of the sandbox, I have needed platform-specific flags before, and it would indeed be very nice if they could just be defined centrally and the user would not have to pass in a |
I agree, #5055 is for tracking this feature request. It has been silent for a while, I'll follow up on this. |
For what it's worth, Swift users will have issues debugging with sandboxing enabled (see bazelbuild/tulsi#15) |
This trades off performance for correctness, and might not be wise because missing dependencies give non-hermetic behavior. See bazelbuild/bazel#8230 (comment) Also fix the default npm publish tag which cost me time in tonight's release
Could the slowness be due to notarization? https://sigpipe.macromates.com/2020/macos-catalina-slow-by-design/ It seems that since notarization was introduced, every script that runs for the first time fires a network request to Apple. Since Bazel uses a sandbox, potentially all scripts there are considered new by the system. I could not check this theory though. |
It might be worth parallelizing some of these things, especially once Loom makes it cheaper to use threads. Having a small threadpool for creating symlinks ought to help. But fundamentally, OS X is just much slower than Linux at handling lots of creation and deletion of small files. |
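The small-threadpool idea could be sketched like this in Python. This is an illustration only: `create_symlinks_parallel` and its `links` mapping are hypothetical, and whether threads help at all depends on how the OS serializes writes within a directory.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def create_symlinks_parallel(links, max_workers=4):
    """Create symlinks from a {link_path: target_path} mapping with a small thread pool.

    Parent directories are created up front on the calling thread, so the
    workers only issue symlink() calls. Returns the number of links created.
    """
    for parent in {os.path.dirname(link) for link in links}:
        if parent:
            os.makedirs(parent, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so an exception in any worker is raised here.
        list(pool.map(lambda item: os.symlink(item[1], item[0]), links.items()))
    return len(links)
```

Timing this against a plain loop over the same mapping is a quick way to see whether the extra threads buy anything on a particular filesystem.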
@tony-scio is your overhead of 2x with the flag |
@tony-scio in most cases |
@oquenchil I ran into this issue as well, posting in here to continue the discussion, but let me know if you'd prefer a separate issue. We are using On my (M1 mac) machine, the sh_test spends around 6-7 seconds in RepoMappingManifestAction (@fmeum fixed this already), and 6 seconds in sandbox.createFileSystem. Running the same test in a docker image on the same machine shows only 1 second in sandbox.createFileSystem. I have reuse_sandbox_directories enabled, but running the test several times in a row shows the same timing each time. Even stranger; I tried with |
Hi David, Just to make sure the slowness is coming from sandboxing, if you run that same target with Does your repro build with bazel at head? If not, what's the latest version that it works with? Thanks for the repro, I will take a look in the following days! |
Yes, I can repro with
Hi @oquenchil! I pushed up some more "minimal" examples that don't use the JS rules and manually create TreeArtifacts with a simple rule. Here's what I see when running OSX with Ubuntu docker image with linux-sandbox on the same machine (this is on 7.0.0-pre.20231011.2 not last_green because there's no arm64 binaries): For completeness, ubuntu docker image with It's interesting that the Linux sandboxed version is even faster than the unsandboxed OSX |
The Linux file system is much more optimized for the kinds of operations that happen in builds. |
@DavidZbarsky-at Hi David, I see there are |
@coeuvre yes there was a remote cache (though this multisecond gap is maybe another performance issue?). I see similar timing without it on the sandboxed tests, though it appears to fix the unattributed time in the |
@DavidZbarsky-at Thanks for confirming. Yes, it's another performance issue. Created #20555 to track. |
There is another performance issue: #20584 I will take care of this one. After it's checked in I will profile again and see what else we can do. |
@DavidZbarsky-at there are several things we can do to improve performance. I wanted to ask you a couple of questions though:
I will be able to take a look again on the 2nd of January but any additional information you can provide would be very helpful. Thanks! |
@oquenchil Awesome, that sandbox reuse one seems like it will be a big help for us. Thanks! Re: your questions:
Happy to poke around and provide whatever else would be helpful, just let me know! |
Reducing unnecessary dependencies is always good. But here's a thing I've been wanting to try but not been able to: With Loom virtual threads, could we do more parallelism in creating/destroying the sandbox? |
We are unfortunately not there yet with using JDK21 features (such as Loom) but @oquenchil looked at speeding up sandbox creation by a good plain old executor service yesterday (and that triggered his question above). |
I came across this doc/idea, not sure if it's feasible but it seems like it would solve this case (folders with hundreds + or thousands of files) quite nicely |
I had been considering |
The idea as I originally described it on that document doesn't quite work because when you symlink directories, the path That leaves us with macOS where we are at the point of simply optimizing the existing strategies instead of coming up with a new one entirely. As meisterT pointed out, me adding multi-threading is what prompted those questions. What I found was that with multi-threading the bottleneck is in the directory lock held by the kernel when writing a file. If most files are concentrated in a small handful of directories then multi-threading won't help much but looking at the raw data you provided it seems that's not the case, so it should cut down the time a bit. Because of where I think the bottleneck is I don't think it will matter whether we use Loom. For cleaning up an existing stash, I tried moving a tree out of the way then deleting asynchronously. This also gives a significant speed bump. I will check in those two optimizations then we can see where we are at. Without symlinking directories or bind mounting, we might hit a limit in how much more we can do on macOS for cold builds but we aren't there yet. |
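The "move the tree out of the way, then delete asynchronously" step can be sketched as follows. This is a Python illustration under stated assumptions, not Bazel's implementation: `delete_async` and `trash_root` are made-up names, and the rename is only cheap when both paths are on the same filesystem.

```python
import os
import shutil
import tempfile
import threading


def delete_async(tree, trash_root):
    """Rename `tree` into `trash_root`, then delete it on a background thread.

    After the rename returns, the original path is free for reuse while the
    slow recursive delete proceeds in the background. Returns the thread so
    the caller can join() it before exiting.
    """
    os.makedirs(trash_root, exist_ok=True)
    # mkdtemp gives a unique holder directory, so repeated calls never collide.
    holder = tempfile.mkdtemp(dir=trash_root)
    os.rename(tree, os.path.join(holder, "tree"))
    worker = threading.Thread(target=shutil.rmtree, args=(holder,))
    worker.start()
    return worker
```

The caller pays only for one rename on the critical path; the per-file unlink cost is still paid, just off the path that blocks the next action.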
Python builds also suffer from the performance hit due to so many files needing to be materialized. The basic issue is every Python program needs a runtime, and a Python runtime installation is a lot of files (2000 to 4000 files). This IO cost adds up. I think it was Tensorflow that contacted me asking for advice because their CI builds went from 45 minutes using the system python to 2+ hours using the downloaded runtimes, and all evidence pointed to all the extra IO work creating thousands of symlinks for every test. There are similar reports to rules_python from other users, too, so it's not just them. While we think we can partially avoid this by zipping up most of a runtime installation (we can't zip up all of it), that feels like a lateral move overall. There is still an O(2000 - 4000) cost paid to create the zip file, plus the cost of copying a decent-sized file around. That's still a lot more cost than creating a single symlink to a directory. Zipping stuff correctly can also be somewhat subtle (e.g., making sure you're avoiding nondeterminism bugs). Relatedly, for Bazel CI and Python, it currently gets to avoid this issue because it's using the system runtime. However, my 2nd or 3rd todo list item right now is to upgrade Bazel CI's rules_python version, which will make it use the downloaded runtimes; i.e. every Python build action or Python test it runs will incur a separate 2,000+ file copy. Relatedly, pypi Python dependencies also suffer from this problem, as they work mostly the same as the runtime. A repo rule downloads a zip file and extracts it. The program needs those files at runtime, so they all get copied over into runfiles. But all it really needs is a symlink'd directory pointing to the files. The footprint of third party dependencies is easily the size of the runtime itself or orders of magnitude larger. 
As an example, I have a program with 5 direct dependencies, and it turns into about 15,000 runtime files (technically ~7k, but they get duped because of the legacy-externals-runfiles behavior). In comparison, if directories were symlinked, it would be about 32 symlinks. Unfortunately, trying to zip those up isn't as easy, straightforward, or feasible. |
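The gap between per-file and per-directory symlinking can be measured for any extracted dependency tree with a sketch like this. `count_links` is a hypothetical helper, and it only counts; it does not address whether directory symlinks are safe for a given rule.

```python
import os


def count_links(root):
    """Compare per-file symlinking with symlinking each top-level entry.

    Returns (per_file, per_top_level): the number of files found anywhere
    under `root` versus the number of direct children of `root`.
    """
    per_file = sum(len(files) for _, _, files in os.walk(root))
    per_top_level = len(os.listdir(root))
    return per_file, per_top_level
```

Pointing it at an extracted runtime or a pypi dependency directory shows how many symlinks each strategy would have to create for that tree.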
That actually suggests an API for building an entire tree of symlinks. Then for MacOS, the implementation could spread the writing across multiple directories to get around the directory lock bottleneck. |
I talked about symlinking directories a while ago here. I sent that to bazel-discuss. The reason why symlinking directories doesn't work is because as soon as you have There are alternatives using bind mounts on Linux which we are currently looking at, but that won't solve the problem for macOS, which is what this issue is about. Regarding the directory lock bottleneck, that wouldn't solve it since each directory would still contain as many files as before. In any case it doesn't look like it's as much of a bottleneck in practice, at least for JS, since files are not grouped in a single directory. Not sure in the case of Python.
For the reason I mention above we wouldn't want to indiscriminately use the strategy of symlinking directories. However, if the rule author could guarantee that there would never be a Richard, please tell me how you envision such an API as passed from the rule or toolchain implementation and I can write a prototype. |
This hasn't been linked yet, and is probably worth a read for folk designing in this IO intensive space: https://gregoryszorc.com/blog/2018/10/29/global-kernel-locks-in-apfs/ Also, it might be worth testing concurrent symlink creation from separate processes, based on some of the notes in this |
Description of the problem / feature request:
Building has been extremely slow with the default darwin-sandbox.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
A mini repro could be found in https://github.com/alexeagle/rules_sass_repro
The repro contains 40 empty sass files
Running the sass compiler on them should be fast
bazel build :all takes ~60s on my mac
bazel build --strategy=SassCompiler=local :all takes ~4s
What operating system are you running Bazel on?
Mac OS 10.14.4
What's the output of bazel info release?
release 0.25.0
Have you found anything relevant by searching the web?
I found these issues: #902 and #1836 but they all seem obsolete.
JSON profile
According to https://docs.bazel.build/versions/master/skylark/performance.html#json-profile, I grabbed profiles for different strategies:
profiles.zip