New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
go_tool_binary: set GOMAXPROCS to 1 #3536
go_tool_binary: set GOMAXPROCS to 1 #3536
Conversation
7646fe8
to
8de2570
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tyler-french Could you test this at Uber? We should also include it in the patch release.
@sluongng Could you copy these lines over to the definitions of the failing pipeline runs? Bazel CI changed again a few hours ago. |
I would not mind rebasing on top of origin/master. I will leave the CI fixing to a separate PR? |
On darwin, in repositories with many transitions configured such as rules_go.git, the action GoToolchainBinaryBuild could be run multiple times in parallel by Bazel. Each of these actions would execute a command like this in a sandbox: ``` $GO build -o $OUTPUT -trimpath \ go/tools/builders/ar.go \ go/tools/builders/asm.go \ go/tools/builders/builder.go \ go/tools/builders/cgo2.go \ go/tools/builders/compilepkg.go \ go/tools/builders/cover.go \ go/tools/builders/edit.go \ go/tools/builders/embedcfg.go \ go/tools/builders/env.go \ go/tools/builders/filter.go \ go/tools/builders/filter_buildid.go \ go/tools/builders/flags.go \ go/tools/builders/generate_nogo_main.go \ go/tools/builders/generate_test_main.go \ go/tools/builders/importcfg.go \ go/tools/builders/link.go \ go/tools/builders/pack.go \ go/tools/builders/read.go \ go/tools/builders/replicate.go \ go/tools/builders/stdlib.go \ go/tools/builders/stdliblist.go \ go/tools/builders/path.go ``` Then, each `go build` process would execute multiple `go tools compile` command up to the number of max Go routines it is set to. The default max number of Go routines for each `go build` process is the number of core on the machine executing the action. On a local build, this means that we could have up to n x n number of goroutines running in parallel when combining the asynchronousity of both Bazel and `go build`. After upgrading to Go 1.20.x, we observed that such parallelism could get a local MacOS machine run into a deadlock on the OS-level, halting spawn of new processes such as `git config`, or new Chrome tab browser. Typical sampling of a `go build` process stuck in deadlock would result in the following call graphs: ``` 2430 Thread_44978 DispatchQueue_1: com.apple.main-thread (serial) + 2430 ??? (in <unknown binary>) [0x1358] + 2430 runtime.asmcgocall.abi0 (in go) + 124 [0x100b702ac] + 2430 runtime.syscall6.abi0 (in go) + 56 [0x100b71a98] + 2430 __wait4_nocancel (in libsystem_kernel.dylib) + 8 [0x1a17a04f4] 2430 Thread_44999 + 2430 runtime.asmcgocall.abi0 (in go) + 201 [0x100b702f9] + 2430 runtime.pthread_cond_timedwait_relative_np_trampoline.abi0 (in go) + 28 [0x100b717ec] + 2430 _pthread_cond_wait (in libsystem_pthread.dylib) + 1276 [0x1a17d45a0] + 2430 __psynch_cvwait (in libsystem_kernel.dylib) + 8 [0x1a1797710] 2430 Thread_45038 + 2430 runtime.kevent_trampoline.abi0 (in go) + 40 [0x100b71518] + 2430 kevent (in libsystem_kernel.dylib) + 8 [0x1a179a060] ``` We could not reproduce outside of Bazel (using bash & to parallelize `go build`). We also could not reproduce this issue with Go 1.19 releases. Let's apply a rudimentry workaround by limitting the concurrency of `go build` process by setting it's max Go routines to 1. This should make this and future race conditions happen a lot less on all platforms. By doing this, we rely more on Bazel and less on Go runtime to saturate our CPU resources. This change should not affect remote execution as the default ResourceSet for each action is set at 1 CPU and 250MB RAM.
8de2570
to
d4bce7f
Compare
Tested at Uber |
On darwin, in repositories with many transitions configured such as rules_go.git, the action GoToolchainBinaryBuild could be run multiple times in parallel by Bazel. Each of these actions would execute a command like this in a sandbox: ``` $GO build -o $OUTPUT -trimpath \ go/tools/builders/ar.go \ go/tools/builders/asm.go \ go/tools/builders/builder.go \ go/tools/builders/cgo2.go \ go/tools/builders/compilepkg.go \ go/tools/builders/cover.go \ go/tools/builders/edit.go \ go/tools/builders/embedcfg.go \ go/tools/builders/env.go \ go/tools/builders/filter.go \ go/tools/builders/filter_buildid.go \ go/tools/builders/flags.go \ go/tools/builders/generate_nogo_main.go \ go/tools/builders/generate_test_main.go \ go/tools/builders/importcfg.go \ go/tools/builders/link.go \ go/tools/builders/pack.go \ go/tools/builders/read.go \ go/tools/builders/replicate.go \ go/tools/builders/stdlib.go \ go/tools/builders/stdliblist.go \ go/tools/builders/path.go ``` Then, each `go build` process would execute multiple `go tools compile` command up to the number of max Go routines it is set to. The default max number of Go routines for each `go build` process is the number of core on the machine executing the action. On a local build, this means that we could have up to n x n number of goroutines running in parallel when combining the asynchronousity of both Bazel and `go build`. After upgrading to Go 1.20.x, we observed that such parallelism could get a local MacOS machine run into a deadlock on the OS-level, halting spawn of new processes such as `git config`, or new Chrome tab browser. Typical sampling of a `go build` process stuck in deadlock would result in the following call graphs: ``` 2430 Thread_44978 DispatchQueue_1: com.apple.main-thread (serial) + 2430 ??? (in <unknown binary>) [0x1358] + 2430 runtime.asmcgocall.abi0 (in go) + 124 [0x100b702ac] + 2430 runtime.syscall6.abi0 (in go) + 56 [0x100b71a98] + 2430 __wait4_nocancel (in libsystem_kernel.dylib) + 8 [0x1a17a04f4] 2430 Thread_44999 + 2430 runtime.asmcgocall.abi0 (in go) + 201 [0x100b702f9] + 2430 runtime.pthread_cond_timedwait_relative_np_trampoline.abi0 (in go) + 28 [0x100b717ec] + 2430 _pthread_cond_wait (in libsystem_pthread.dylib) + 1276 [0x1a17d45a0] + 2430 __psynch_cvwait (in libsystem_kernel.dylib) + 8 [0x1a1797710] 2430 Thread_45038 + 2430 runtime.kevent_trampoline.abi0 (in go) + 40 [0x100b71518] + 2430 kevent (in libsystem_kernel.dylib) + 8 [0x1a179a060] ``` We could not reproduce outside of Bazel (using bash & to parallelize `go build`). We also could not reproduce this issue with Go 1.19 releases. Let's apply a rudimentry workaround by limitting the concurrency of `go build` process by setting it's max Go routines to 1. This should make this and future race conditions happen a lot less on all platforms. By doing this, we rely more on Bazel and less on Go runtime to saturate our CPU resources. This change should not affect remote execution as the default ResourceSet for each action is set at 1 CPU and 250MB RAM.
On darwin, in repositories with many transitions configured such as rules_go.git, the action GoToolchainBinaryBuild could be run multiple times in parallel by Bazel. Each of these actions would execute a command like this in a sandbox: ``` $GO build -o $OUTPUT -trimpath \ go/tools/builders/ar.go \ go/tools/builders/asm.go \ go/tools/builders/builder.go \ go/tools/builders/cgo2.go \ go/tools/builders/compilepkg.go \ go/tools/builders/cover.go \ go/tools/builders/edit.go \ go/tools/builders/embedcfg.go \ go/tools/builders/env.go \ go/tools/builders/filter.go \ go/tools/builders/filter_buildid.go \ go/tools/builders/flags.go \ go/tools/builders/generate_nogo_main.go \ go/tools/builders/generate_test_main.go \ go/tools/builders/importcfg.go \ go/tools/builders/link.go \ go/tools/builders/pack.go \ go/tools/builders/read.go \ go/tools/builders/replicate.go \ go/tools/builders/stdlib.go \ go/tools/builders/stdliblist.go \ go/tools/builders/path.go ``` Then, each `go build` process would execute multiple `go tools compile` command up to the number of max Go routines it is set to. The default max number of Go routines for each `go build` process is the number of core on the machine executing the action. On a local build, this means that we could have up to n x n number of goroutines running in parallel when combining the asynchronousity of both Bazel and `go build`. After upgrading to Go 1.20.x, we observed that such parallelism could get a local MacOS machine run into a deadlock on the OS-level, halting spawn of new processes such as `git config`, or new Chrome tab browser. Typical sampling of a `go build` process stuck in deadlock would result in the following call graphs: ``` 2430 Thread_44978 DispatchQueue_1: com.apple.main-thread (serial) + 2430 ??? (in <unknown binary>) [0x1358] + 2430 runtime.asmcgocall.abi0 (in go) + 124 [0x100b702ac] + 2430 runtime.syscall6.abi0 (in go) + 56 [0x100b71a98] + 2430 __wait4_nocancel (in libsystem_kernel.dylib) + 8 [0x1a17a04f4] 2430 Thread_44999 + 2430 runtime.asmcgocall.abi0 (in go) + 201 [0x100b702f9] + 2430 runtime.pthread_cond_timedwait_relative_np_trampoline.abi0 (in go) + 28 [0x100b717ec] + 2430 _pthread_cond_wait (in libsystem_pthread.dylib) + 1276 [0x1a17d45a0] + 2430 __psynch_cvwait (in libsystem_kernel.dylib) + 8 [0x1a1797710] 2430 Thread_45038 + 2430 runtime.kevent_trampoline.abi0 (in go) + 40 [0x100b71518] + 2430 kevent (in libsystem_kernel.dylib) + 8 [0x1a179a060] ``` We could not reproduce outside of Bazel (using bash & to parallelize `go build`). We also could not reproduce this issue with Go 1.19 releases. Let's apply a rudimentry workaround by limitting the concurrency of `go build` process by setting it's max Go routines to 1. This should make this and future race conditions happen a lot less on all platforms. By doing this, we rely more on Bazel and less on Go runtime to saturate our CPU resources. This change should not affect remote execution as the default ResourceSet for each action is set at 1 CPU and 250MB RAM.
@sluongng Are you still getting multiple builds on bazel 7? I have seen that it reduced number of builder builds from 2 to 1 for my toy example. If I understand correctly in theory only single builder should be needed as it does not need to be compiled against target configuration, but should be in exec configuration itself - i.e. single builder should be able to build for pure, race, target platform, etc. But not sure if resetting config is enough here, maybe @fmeum has more insights including why bazel 7 helped in some cases. |
Bazel 7 should reduce the number of |
What type of PR is this?
Bug fix
What does this PR do? Why is it needed?
On darwin, in repositories with many transitions configured such as
rules_go.git, the action GoToolchainBinaryBuild could be run multiple
times in parallel by Bazel.
Each of these actions would execute a command like this in a sandbox:
Then, each
go build
process would execute multiplego tools compile
command up to the number of max Go routines it is set to. The default
max number of Go routines for each
go build
process is the number ofcore on the machine executing the action.
On a local build, this means that we could have up to n x n number of
goroutines running in parallel when combining the asynchronousity of
both Bazel and
go build
.After upgrading to Go 1.20.x, we observed that such parallelism could
get a local MacOS machine run into a deadlock on the OS-level, halting
spawn of new processes such as
git config
, or new Chrome tab browser.Typical sampling of a
go build
process stuck in deadlock would resultin the following call graphs:
We could not reproduce outside of Bazel (using bash & to parallelize
go build
). We also could not reproduce this issue with Go 1.19 releases.Let's apply a rudimentry workaround by limitting the concurrency of
go build
process by setting it's max Go routines to 1. This should makethis and future race conditions happen a lot less on all platforms.
By doing this, we rely more on Bazel and less on Go runtime to saturate
our CPU resources.
This change should not affect remote execution as the default
ResourceSet for each action is set at 1 CPU and 250MB RAM.
Which issues(s) does this PR fix?
Related to golang/go#59657
Other notes for review