Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define thread QoS priorities on macOS #7446

Open
jmmv opened this Issue Feb 15, 2019 · 2 comments

Comments

Projects
None yet
1 participant
@jmmv
Copy link
Contributor

jmmv commented Feb 15, 2019

macOS has a concept of thread QoS service classes. Bazel at the moment uses the default class when run from the command line. The documentation strongly recommends defining QoS classes for proper operation instead of relying on the default one. See https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/PrioritizeWorkAtTheTaskLevel.html

We have found that the use of the default class causes two problems: first, Bazel can make the system unresponsive; and, second, Bazel can become very slow if it relies on system services that run at a lower service class (e.g. think of FUSE file systems).

I have been experimenting with this by explicitly declaring Bazel to run at the Utility level and this makes both problems go away. I think we should do this change, but then account for the many threads we run. In particular, it'd be nice if we defaulted to Utility but made the UI thread and the Bazel client higher priority to ensure snappier console output. (This latter detail is nice, but not a requirement in my opinion.) Of course, we must measure if this causes a performance regression.

I also tried setting Bazel to Background level. This made the system even more snappy but also prevented Bazel from fully utilizing all CPUs.

One last thing to note is that changing the QoS class of a program after it has started, even if it has just one thread, does not modify the class for the main thread. This means that any further spawned threads or subprocesses do not respect the class change and instead use whatever was set at the main thread level. The only way I've found so far of changing the class of the main level is by using posix_spawnattr_set_qos_class_np, which means we have to exec a process using posix_spawn. This makes the change slightly more complicated, but given that the Bazel client is already spawning the server, we can probably fit this in that code path.

@jmmv jmmv self-assigned this Feb 15, 2019

bazel-io pushed a commit that referenced this issue Feb 21, 2019

Clean up args conversion to an argv array.
This introduces a new CharPP class that implements the conversion of
C++ data structures to their corresponding char** representation
needed for execv-like calls.

As a side-effect of this change, we fix two issues: first, a memory
leak; and second, a theoretical const-correctness violation.

I am doing this change to provide room for implementing a similar
conversion function for envp arrays, which I will need to switch to
using posix_spawn.

Prerequisite cleanup to address
#7446.  While this does
fix an issue with a memory leak, this change should have no visible
effects.

RELNOTES: None.
PiperOrigin-RevId: 234968622

bazel-io pushed a commit that referenced this issue Feb 21, 2019

Make PrepareEnvironmentForJvm() explicitly inherit environment variab…
…les.

The map returned by PrepareEnvironmentForJvm() previously represented a
"delta" to apply to the current environment.  This was used to modify the
environment surrounding fork/execv calls because we could not pass an envp
to execv.

However, when using the resulting map to create the envp value of a call
like posix_spawn, we must be exhaustive and list all variables we expect
the subprocess to have.  And as we are going to be using posix_spawn, we
need to make this happen, so change PrepareEnvironmentForJvm() to make
the returned map exhaustive.

Prerequisite change to address
#7446 because of the need to
switch to posix_spawn.

RELNOTES: None.
PiperOrigin-RevId: 234971246

bazel-io pushed a commit that referenced this issue Feb 21, 2019

Use posix_spawn to start the Bazel server.
Introduce a new "daemonize" helper tool that takes the heavy-lifting
of starting the Bazel server process as a daemon.  This tool is pure
C and does not have any big dependencies like gRPC so we can ensure
it's thread-free.  As a result, the code in this tool doesn't have
to take a lot of care when running fork(), which results in easier
to read code overall.

Then, change the Bazel client to simply use posix_spawn to invoke
this helper tool, which in turn runs the Bazel server as a daemon.
The code in the Bazel client becomes simpler as well as we don't
have to worry about the implications of using fork() from a threaded
process.

The switch to posix_spawn is necessary to address
#7446 because the only way
to customize the QoS of a process in macOS is by using this API.
Such a change will come separately.

This change is intended to be a no-op.  However, this fixes a bug
introduced in unknown commit by which we violated the requirement of
not doing any memory management from a child process started with
fork().  (I don't think this bug ever manifested itself, but it was
there -- which is funny because this piece of code went to great
extents to not do the wrong thing... and a one-line change let
hell break loose.)

RELNOTES: None.
PiperOrigin-RevId: 234991943

@bazel-io bazel-io closed this in 0877340 Feb 21, 2019

@jmmv

This comment has been minimized.

Copy link
Contributor Author

jmmv commented Feb 22, 2019

I think we can consider this done for now. There are a couple of things that could be looked into to make this better, but I don't think they are critical:

  • The changes so far only modify the QoS of the server when started in the background, but have no effect on batch mode, the client, or whatever is run via bazel run.

  • Set QoS levels on a thread basis within the Bazel server. This would likely require some JNI glue and an audit of what each threadpool does and what it deserves. Or maybe just make sure that whatever thread is handling the UI is of a higher priority and leave everything else happen at the utility level.

irengrig added a commit to irengrig/bazel that referenced this issue Feb 26, 2019

Clean up args conversion to an argv array.
This introduces a new CharPP class that implements the conversion of
C++ data structures to their corresponding char** representation
needed for execv-like calls.

As a side-effect of this change, we fix two issues: first, a memory
leak; and second, a theoretical const-correctness violation.

I am doing this change to provide room for implementing a similar
conversion function for envp arrays, which I will need to switch to
using posix_spawn.

Prerequisite cleanup to address
bazelbuild#7446.  While this does
fix an issue with a memory leak, this change should have no visible
effects.

RELNOTES: None.
PiperOrigin-RevId: 234968622

irengrig added a commit to irengrig/bazel that referenced this issue Feb 26, 2019

Make PrepareEnvironmentForJvm() explicitly inherit environment variab…
…les.

The map returned by PrepareEnvironmentForJvm() previously represented a
"delta" to apply to the current environment.  This was used to modify the
environment surrounding fork/execv calls because we could not pass an envp
to execv.

However, when using the resulting map to create the envp value of a call
like posix_spawn, we must be exhaustive and list all variables we expect
the subprocess to have.  And as we are going to be using posix_spawn, we
need to make this happen, so change PrepareEnvironmentForJvm() to make
the returned map exhaustive.

Prerequisite change to address
bazelbuild#7446 because of the need to
switch to posix_spawn.

RELNOTES: None.
PiperOrigin-RevId: 234971246

irengrig added a commit to irengrig/bazel that referenced this issue Feb 26, 2019

Use posix_spawn to start the Bazel server.
Introduce a new "daemonize" helper tool that takes the heavy-lifting
of starting the Bazel server process as a daemon.  This tool is pure
C and does not have any big dependencies like gRPC so we can ensure
it's thread-free.  As a result, the code in this tool doesn't have
to take a lot of care when running fork(), which results in easier
to read code overall.

Then, change the Bazel client to simply use posix_spawn to invoke
this helper tool, which in turn runs the Bazel server as a daemon.
The code in the Bazel client becomes simpler as well as we don't
have to worry about the implications of using fork() from a threaded
process.

The switch to posix_spawn is necessary to address
bazelbuild#7446 because the only way
to customize the QoS of a process in macOS is by using this API.
Such a change will come separately.

This change is intended to be a no-op.  However, this fixes a bug
introduced in unknown commit by which we violated the requirement of
not doing any memory management from a child process started with
fork().  (I don't think this bug ever manifested itself, but it was
there -- which is funny because this piece of code went to great
extents to not do the wrong thing... and a one-line change let
hell break loose.)

RELNOTES: None.
PiperOrigin-RevId: 234991943

irengrig added a commit to irengrig/bazel that referenced this issue Feb 26, 2019

Lower Bazel's QoS service class to Utility on macOS.
Even though Bazel is an interactive tool, the operations it performs
are not time-critical and should not starve system services (started
by launchd, typically under the Utility class).  To mitigate this
problem, spawn the Bazel server under the Utility priority as well.

In internal tests at Google, we have seen that this vastly improves
build performance and overall system responsiveness when Bazel might
compete with system services such as openvpn or FUSE daemons.

Note that this is not a complete fix though.  We are still letting
the Bazel client run under the default class and the client still
uses execv in a couple of places.  First, in batch mode, and, second,
in "bazel run".  A possible fix for this would involve making the
Bazel client respawn itself under the right priority at the very
beginning but this has its own difficulties that should be looked
into separately.

Fixes bazelbuild#7446 for the most
part.

RELNOTES: None.
PiperOrigin-RevId: 235054167

@jmmv jmmv reopened this Mar 13, 2019

@jmmv

This comment has been minimized.

Copy link
Contributor Author

jmmv commented Mar 13, 2019

I had to roll back the default QoS change. I'm now adding support for this feature behind a flag and will evaluate behavior on different classes (Utility vs. User-initiated).

bazel-io pushed a commit that referenced this issue Mar 14, 2019

Allow setting the Blaze server's QoS class via a --macos_qos_class flag.
This adds a new flag to set the QoS class of the Bazel server and accepts
selecting any of the possible classes exposed by the system.  We will use
this flag to evaluate the behavior of Bazel under different classes because
it is not clear which one we should be using.  (We tried forcing the class
to be utility, but that caused regressions in some cases.)

This is essentially a retry of 0877340 but gated by a flag (which
is how it should have been done in the first place).

Note that the flag exists on all platforms to ensure that a bazelrc file
shared across different people works in all cases, but this flag is ignored
in non-macOS systems.

Addresses #7446.

RELNOTES: None.
PiperOrigin-RevId: 238440116
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.