perf(profiling): optimize prepare_sample_message #1952
Numbers from ddprof for default profile types and how much overhead prepare_sample_message has within
This was for a given endpoint in the symfony-demo under |
🚀
This function is called once per stack walk, so it's on the hot path but not as important to optimize as walking the stack itself. However, it was showing up more than I would have expected in my profiles when examining it with the native profiler. Unfortunately I didn't have granular feedback on what was slowing it down, so I applied some general optimization techniques:
1. Statically store the sample types as an array instead of creating them on demand.
2. Replace Cow<'static, str> with &'static str in ValueType. All usages were already static strings, and I don't see this changing.
3. Simplify the branching logic into slicing operations.
4. Move tag creation to an earlier point and do dumb copies of tags when needed instead of branching logic.
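Points 1 to 3 above could look roughly like the sketch below. It is only an illustration under assumptions, not the code from this PR: the field names `name` and `unit`, the contents of `SAMPLE_TYPES`, and the helper `enabled_sample_types` are all hypothetical.

```rust
/// Hypothetical stand-in for the profiler's ValueType. Both fields are
/// plain `&'static str`, so the type is `Copy` and needs no allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ValueType {
    pub name: &'static str,
    pub unit: &'static str,
}

/// All sample types stored once, in a fixed order, instead of being
/// built on every call. The entries here are illustrative only.
static SAMPLE_TYPES: [ValueType; 3] = [
    ValueType { name: "sample", unit: "count" },
    ValueType { name: "wall-time", unit: "nanoseconds" },
    ValueType { name: "cpu-time", unit: "nanoseconds" },
];

/// Returns the enabled sample types as a prefix slice of the static
/// array. Assumes optional types sit at the end of the array, so that
/// enabling or disabling them only changes the slice length.
pub fn enabled_sample_types(cpu_time_enabled: bool) -> &'static [ValueType] {
    let len = if cpu_time_enabled { 3 } else { 2 };
    &SAMPLE_TYPES[..len]
}
```

Because the function returns a borrowed slice of static data, the hot path does no allocation or per-element construction; the only per-call work is computing the slice length.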
This avoids needing to copy them on every stack walk.
I added another optimization by switching tags from
It's a bigger win, but still overall a small win.
I made a small improvement to |
Initialization is simpler/faster, and it avoids a copy of the labels.
Description
This function is called once per stack walk, so it's on the hot path but not as important to optimize as walking the stack itself.
However, it was showing up more than I would have expected in my profiles when examining it with the native profiler. Unfortunately I didn't have granular feedback on what was slowing it down, so I did some general optimization techniques:
Use Arc<Vec<Tag>> for tags instead of copying them. This reduces the work on each stack sample to a simple atomic refcount increment. It also reduces the total number of copies.
... trace level diagnostics.
Aside from performance, it also adds ValueType::new to shorten consecutive constructions (which happen frequently with this type). Between this and the changes to remove Cow, the code is simplified.
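To make the Arc<Vec<Tag>> and ValueType::new points concrete, here is a minimal sketch under assumptions. The `Tag` and `SampleMessage` layouts, the `prepare_sample` helper, and the `ValueType` field names are hypothetical and are not taken from the PR's diff.

```rust
use std::sync::Arc;

/// Hypothetical tag type; the real profiler's Tag may differ.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Tag {
    pub key: &'static str,
    pub value: String,
}

/// Hypothetical per-sample message that shares its tags.
pub struct SampleMessage {
    pub tags: Arc<Vec<Tag>>,
    pub values: Vec<i64>,
}

/// Builds a message per stack sample. Cloning the Arc only bumps an
/// atomic reference count; the Vec<Tag> itself is not copied.
pub fn prepare_sample(shared_tags: &Arc<Vec<Tag>>, values: Vec<i64>) -> SampleMessage {
    SampleMessage {
        tags: Arc::clone(shared_tags),
        values,
    }
}

/// Hypothetical ValueType with a short constructor.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ValueType {
    pub name: &'static str,
    pub unit: &'static str,
}

impl ValueType {
    /// `const fn` so it can also be used in `static` initializers.
    pub const fn new(name: &'static str, unit: &'static str) -> Self {
        Self { name, unit }
    }
}

/// Consecutive constructions read as one line each.
pub static WALL_TYPES: [ValueType; 2] = [
    ValueType::new("sample", "count"),
    ValueType::new("wall-time", "nanoseconds"),
];
```

The trade-off of sharing tags this way is that they become immutable once shared; a sample that needs extra tags would have to carry them separately or build a new vector.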