Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] stop: disable task tracking in release builds, optimize task creation #52894

Conversation

nvanbenschoten
Copy link
Member

This commit optimizes the Stopper for task creation by ripping out the
existing heavyweight task tracking in production builds. I realized that
my biggest concern with most of the proposals (#52843 and #51566) being
floated to address #51544 was that they bought more into the inefficient
tracking in the Stopper, not that they were doing anything inherently
wrong themselves.

Before this change, creating a task acquired an exclusive mutex and then
wrote to a hashmap. At high levels of concurrency, this would have
become a performance chokepoint. After this change, the cost of
launching a Task is three atomic increments – one to acquire a read
lock, one to register with a WaitGroup, and one to release the read
lock. When no one is draining the Stopper, these are all wait-free
operations, which means that task creation becomes wait-free.

With a change like this, I would feel much more comfortable pushing on
Stopper tasks to solve #51544.


This isn't a real PR. I'm just throwing it up to see how people feel about it.

cc. @tbg @andreimatei

…eation

This commit optimizes the Stopper for task creation by ripping out the
existing heavyweight task tracking in production builds. I realized that
my biggest concern with most of the proposals (cockroachdb#52843 and cockroachdb#51566) being
floated to address cockroachdb#51544 was that they bought more into the inefficient
tracking in the Stopper, not that they were doing anything inherently
wrong themselves.

Before this change, creating a task acquired an exclusive mutex and then
wrote to a hashmap. At high levels of concurrency, this would have
become a performance chokepoint. After this change, the cost of
launching a Task is three atomic increments – one to acquire a read
lock, one to register with a WaitGroup, and one to release the read
lock. When no one is draining the Stopper, these are all wait-free
operations, which means that task creation becomes wait-free.

With a change like this, I would feel much more comfortable pushing on
Stopper tasks to solve cockroachdb#51544.
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about doing this, but at the same time I have dreams of more task registration. Namely I want an /rpcz page that lists all ongoing RPCs. So I guess I first want to see how much inserting into a map really hurts us. Currently that map in the stopper is just a regular map behind a mutex, so I 'd also want to see what whether sync.Map helps.
I was planning on experimenting myself.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@nvanbenschoten
Copy link
Member Author

Namely I want an /rpcz page that lists all ongoing RPCs.

Have you seen https://grpc.io/blog/a-short-introduction-to-channelz/? I'm going to play around with it when I get the chance. It's got a "z" at the end, so it must have a blessing from the right people.

My contention here is that the task tracking in the Stopper is already too heavyweight for general usage. It does too much and it doesn't do enough well. It sounds like we all want to be using Stopper tasks more, not less, which I agree with. But we aren't going to get there if we add more RPC-specific bloat to what should be a very simple draining mechanism.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you seen https://grpc.io/blog/a-short-introduction-to-channelz/? I'm going to play around with it when I get the chance. It's got a "z" at the end, so it must have a blessing from the right people.

That looks cool, but I can't tell if it tells you what RPCs are currently running on a server.

As far as this particular PR is concerned, it looks good to me and I wouldn't even bother with the "debug" implementation. When a test times out, we'll get stack traces for the tasks, which I think should be enough.
But I do want to reserve the right to re-introduce a syncmap update per rpc start/finish in the future if no benchmark regresses too much.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

tbg added a commit to tbg/cockroach that referenced this pull request Feb 1, 2021
We are likely going to invest more in the stopper-conferred
observability in the near future as part of initiatives such as cockroachdb#58164,
but the task tracking that has been a part of the stopper since near
its conception has not proven to be useful in practice, while at the
same time raising concern about stopper use in hot paths.

When shutting down a running server, we don't particularly care about
leaking goroutines (as the process will end anyway). In tests, we want
to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to
look at the stacks to find out why than to consult the task map.

Together, this left little reason to do anything more complicated than
what's left after this commit: we keep track of the running number of
tasks, and wait until this drops to zero.

With this change in, we should feel comfortable using the stopper
extensively and, for example, ensuring that any CRDB goroutine is
anchored in a Stopper task; this is the right approach for test flakes
such as in cockroachdb#51544 and makes sense for all of the reasons mentioned in
issue cockroachdb#58164 as well.

In a future change, we should make the Stopper more configurable and,
through this configurability, we could in principle bring a version of
the task map back (in debug builds) without backing it into the stopper,
though I don't anticipate that we'll want to.

Closes cockroachdb#52894.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Feb 2, 2021
We are likely going to invest more in the stopper-conferred
observability in the near future as part of initiatives such as cockroachdb#58164,
but the task tracking that has been a part of the stopper since near
its conception has not proven to be useful in practice, while at the
same time raising concern about stopper use in hot paths.

When shutting down a running server, we don't particularly care about
leaking goroutines (as the process will end anyway). In tests, we want
to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to
look at the stacks to find out why than to consult the task map.

Together, this left little reason to do anything more complicated than
what's left after this commit: we keep track of the running number of
tasks, and wait until this drops to zero.

With this change in, we should feel comfortable using the stopper
extensively and, for example, ensuring that any CRDB goroutine is
anchored in a Stopper task; this is the right approach for test flakes
such as in cockroachdb#51544 and makes sense for all of the reasons mentioned in
issue cockroachdb#58164 as well.

In a future change, we should make the Stopper more configurable and,
through this configurability, we could in principle bring a version of
the task map back (in debug builds) without backing it into the stopper,
though I don't anticipate that we'll want to.

Closes cockroachdb#52894.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Feb 3, 2021
We are likely going to invest more in the stopper-conferred
observability in the near future as part of initiatives such as cockroachdb#58164,
but the task tracking that has been a part of the stopper since near
its conception has not proven to be useful in practice, while at the
same time raising concern about stopper use in hot paths.

When shutting down a running server, we don't particularly care about
leaking goroutines (as the process will end anyway). In tests, we want
to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to
look at the stacks to find out why than to consult the task map.

Together, this left little reason to do anything more complicated than
what's left after this commit: we keep track of the running number of
tasks, and wait until this drops to zero.

With this change in, we should feel comfortable using the stopper
extensively and, for example, ensuring that any CRDB goroutine is
anchored in a Stopper task; this is the right approach for test flakes
such as in cockroachdb#51544 and makes sense for all of the reasons mentioned in
issue cockroachdb#58164 as well.

In a future change, we should make the Stopper more configurable and,
through this configurability, we could in principle bring a version of
the task map back (in debug builds) without backing it into the stopper,
though I don't anticipate that we'll want to.

Closes cockroachdb#52894.

Release note: None
craig bot pushed a commit that referenced this pull request Feb 3, 2021
59647: stop: rip out expensive task tracking r=knz a=tbg

First commit was put up for PR separately, ignore it here.

----

We are likely going to invest more in the stopper-conferred
observability in the near future as part of initiatives such as #58164,
but the task tracking that has been a part of the stopper since near
its conception has not proven to be useful in practice, while at the
same time raising concern about stopper use in hot paths.

When shutting down a running server, we don't particularly care about leaking
goroutines (as the process will end anyway). In tests, we want to ensure
goroutine hygiene, but if a test hangs during `Stop`, it is easier to look at
the stacks to find out why than to consult the task map.

Together, this left little reason to do anything more complicated than
what's left after this commit: we keep track of the running number of
tasks, and wait until this drops to zero.

With this change in, we should feel comfortable using the stopper
extensively and, for example, ensuring that any CRDB goroutine is
anchored in a Stopper task; this is the right approach for test flakes
such as in #51544 and makes sense for all of the reasons mentioned in
issue #58164 as well.

In a future change, we should make the Stopper more configurable and,
through this configurability, we could in principle bring a version of
the task map back (in debug builds) without backing it into the stopper,
though I don't anticipate that we'll want to.

Closes #52894.

Release note: None


59732: backupccl: add an owner column behind the WITH PRIVILEGES option r=pbardea a=Elliebababa

Previously, when users perform RESTORE, they are ignorant of the original owner.

This PR gives ownership data as a column behind privileges.

Resolves: #57906.

Release note: None.

59746: opt: switch checks to use CrdbTestBuild instead of RaceEnabled r=RaduBerinde a=RaduBerinde

The RaceEnabled flag is not very useful for checks; e.g. apparently
execbuilder tests aren't run routinely in race mode. These checks are
now "live" in any test build, using the crdb_test build tag.

Release note: None

59747: tree: correct StatementTag of ALTER TABLE ... LOCALITY r=ajstorm a=otan

Release note: None

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: elliebababa <ellie24.huang@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
@craig craig bot closed this in 05c5a74 Feb 3, 2021
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/stopperDrain branch February 6, 2021 01:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants