release-22.2: debug: option to omit goroutine stacks by default from debug zip #110258
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
Recently, customers have complained about the performance impact of debug zip. Specifically, the `/debug/stacks/<node_id>` endpoint has been found to cause significant momentary spikes in SQL service latency for the node serving the request. This is because the endpoint uses `runtime.Stack()` from the Go runtime to service the request, which is a "stop-the-world" operation. Naturally, this degrades performance "while the world is stopped", which larger customers have found to be unacceptable.

This patch adds the `--include-goroutine-stacks` flag to debug zip, which defaults to `true`. Setting the flag to `false` skips fetching goroutine stacks from `/debug/stacks/<node_id>`, which lets customers reduce the performance impact of taking a debug zip bundle.

Release note (ops change): The `debug zip` command now has an option to omit goroutine stack dumps. This impacts the creation of `nodes/*/stacks.txt` and `nodes/*/stacks_with_labels.txt` within debug zip bundles. Users can opt to exclude these goroutine stacks by using the `--include-goroutine-stacks=false` flag. Note that fetching stack traces for all goroutines is a "stop-the-world" operation, which can momentarily have negative impacts on SQL service latency. Any periodic goroutine dumps previously taken on the node will still be included in `nodes/*/goroutines/*.txt.gz`, as these have already been generated and don't require any stop-the-world operations.
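To illustrate the trade-off described above, here is a minimal Go sketch of gating an all-goroutine stack dump behind a boolean option. It is not the code from this patch: `allStacks` and `collectStacks` are hypothetical helpers, and only `runtime.Stack` is a real Go runtime call. Passing `true` as its second argument requests the stacks of every goroutine, which is the stop-the-world case.

```go
package main

import (
	"fmt"
	"runtime"
)

// allStacks returns the stack traces of every goroutine in the process.
// runtime.Stack with all=true stops the world while it gathers the traces.
func allStacks() []byte {
	buf := make([]byte, 1<<20)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return buf[:n]
		}
		// The buffer filled up, so the dump may be truncated; retry with more room.
		buf = make([]byte, 2*len(buf))
	}
}

// collectStacks is a hypothetical helper that mirrors the idea behind
// --include-goroutine-stacks: the stop-the-world cost is only paid when
// the caller asks for stacks.
func collectStacks(include bool) ([]byte, bool) {
	if !include {
		return nil, false
	}
	return allStacks(), true
}

func main() {
	for _, include := range []bool{true, false} {
		stacks, ok := collectStacks(include)
		fmt.Printf("include=%t collected=%t nonEmpty=%t\n", include, ok, len(stacks) > 0)
	}
}
```

With `include` set to `false`, `runtime.Stack` is never called, so the process never pauses for the dump. That is the behavior `--include-goroutine-stacks=false` is meant to give the node serving the request.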
Branch updated: a316df9 to ef29bc5
What are the line diffs that are different from the original PR, is it just testdata?
Reviewable status: complete! 0 of 0 LGTMs obtained
> What are the line diffs that are different from the original PR, is it just testdata?
I usually pull up the diff of the original PR side-by-side with the diff from the backport PR to determine this.
In this instance, there were some package differences and flags present on v22.2 that aren't present anymore on master, so the code had to be tweaked to fit how things are on 22.2.
Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier)
TFTR!
Backport:
Please see individual PRs for details.
/cc @cockroachdb/release
Release justification: low-risk (CLI) change to debug zip to enable reduction of performance impact for clusters with large numbers of goroutines.