Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.2: debug: option to omit goroutine stacks by default from debug zip #110258

Merged

Conversation

abarganier
Copy link
Contributor

Backport:

Please see individual PRs for details.

/cc @cockroachdb/release


Release justification: low-risk (CLI) change to debug zip to enable reduction of performance impact for clusters with large numbers of goroutines.

@abarganier abarganier requested a review from a team September 8, 2023 14:53
@abarganier abarganier requested a review from a team as a code owner September 8, 2023 14:53
@abarganier abarganier requested review from Santamaura and removed request for a team September 8, 2023 14:53
@blathers-crl
Copy link

blathers-crl bot commented Sep 8, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Sep 8, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Recently, customers have complained about the performance
impact of debug zip. Specifically, the `/debug/stacks/<node_id>`
endpoint has been found to cause significant momentary spikes
in SQL Service Latency for the node serving the request.

This is because the endpoint uses `runtime.Stack()` from the
go runtime to service the request, which is a "stop-the-world"
operation. Naturally, this degrades performance "while the world
is stopped", which larger customers have found to be unacceptable.

This patch adds the `--include-goroutine-stacks` flag to debug zip,
which defaults to `true`. This gives the option to avoid fetching
goroutine stacks from `/debug/stacks/<node_id>`. This enables
customers to reduce the performance impact of taking a debug zip
bundle.

Release note (ops change): The `debug zip` command now has an
option to omit goroutine stack dumps by. This impacts the creation
of `nodes/*/stacks.txt` and `nodes/*/stacks_with_labels.txt` within
debug zip bundles. Users can opt to exclude these goroutine
stacks by using the `--include-goroutine-stacks=false` flag.

Note that fetching stack traces for all goroutines is a
"stop-the-world" operation, which can momentarily have negative
impacts on SQL service latency. Note that any periodic goroutine
dumps previously taken on the node will still be included in
`nodes/*/goroutines/*.txt.gz`, as these would have already been
generated and don't require any stop-the-world operations.
Copy link
Contributor

@Santamaura Santamaura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the line diffs that are different from the original PR, is it just testdata?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the line diffs that are different from the original PR, is it just testdata?

I usually pull up the diff of the original PR side-by-side with the diff from the backport PR to determine this.

In this instance, there were some package differences and flags present on v22.2 that aren't present anymore on master, so the code had to be tweaked to fit how things are on 22.2.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Contributor

@Santamaura Santamaura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier)

@abarganier
Copy link
Contributor Author

TFTR!

@abarganier abarganier merged commit d915750 into cockroachdb:release-22.2 Sep 8, 2023
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Label PR's that are backports to older release branches
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants