
perf(profiling): optimize prepare_sample_message #1952

Merged: 6 commits into master on Mar 3, 2023

Conversation

morrisonlevi
Collaborator

@morrisonlevi morrisonlevi commented Mar 2, 2023

Description

This function is called once per stack walk, so it's on the hot path but not as important to optimize as walking the stack itself.

However, it was showing up in my profiles more than I expected when examining it with the native profiler. Unfortunately I didn't have granular feedback on what was slowing it down, so I applied some general optimization techniques:

  1. Statically store the sample types as an array instead of creating them on-demand.
  2. Replace Cow<'static, str> with &'static str in ValueType. All usages were already static strings, and I don't see this changing.
  3. Simplify the branching logic into slicing operations.
  4. Move tag creation to rinit, and share the tags with Arc<Vec<Tag>> instead of copying them. This reduces the per-sample work to a simple atomic refcount increment, and it also cuts down the total number of copies.
  5. Initialize the sample's labels to the right size, and stop copying them unconditionally just for trace-level diagnostics.
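A minimal sketch of optimization 5, sizing the labels vector up front instead of letting it grow (the `Label` type and `message_labels` signature here are hypothetical stand-ins, not the crate's real API):

```rust
// Hypothetical Label stand-in for the profiler's real label type.
#[derive(Debug)]
struct Label {
    key: &'static str,
    str_value: String,
}

// The number of labels is known before filling the vector, so allocate
// once at the final size rather than growing incrementally.
fn message_labels(n_known: usize) -> Vec<Label> {
    let mut labels = Vec::with_capacity(n_known);
    labels.push(Label {
        key: "thread id",
        str_value: "12345".to_string(),
    });
    labels
}

fn main() {
    let labels = message_labels(1);
    // with_capacity guarantees at least the requested capacity.
    assert!(labels.capacity() >= 1);
    assert_eq!(labels.len(), 1);
}
```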

Aside from performance, this PR adds ValueType::new to shorten consecutive constructions, which happen frequently with this type. Between this and the removal of Cow, the code is simplified:

    // before
    ValueType {
        r#type: Cow::Borrowed("sample"),
        unit: Cow::Borrowed("count"),
    },
    ValueType {
        r#type: Cow::Borrowed("wall-time"),
        unit: Cow::Borrowed("nanoseconds"),
    },

    // after
    ValueType::new("sample", "count"),
    ValueType::new("wall-time", "nanoseconds"),
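A sketch of what the post-change ValueType could look like. The field names come from the diff excerpt above; making `new` a `const fn` is an assumption, but it is what would let the sample types live in a static array (optimization 1) instead of being rebuilt on every call:

```rust
// Sketch only: field names from the PR; const-ness of `new` is assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ValueType {
    pub r#type: &'static str,
    pub unit: &'static str,
}

impl ValueType {
    pub const fn new(r#type: &'static str, unit: &'static str) -> Self {
        Self { r#type, unit }
    }
}

// Built once at compile time instead of on every prepare_sample_message call.
pub static SAMPLE_TYPES: [ValueType; 2] = [
    ValueType::new("sample", "count"),
    ValueType::new("wall-time", "nanoseconds"),
];

fn main() {
    assert_eq!(SAMPLE_TYPES[0], ValueType::new("sample", "count"));
    assert_eq!(SAMPLE_TYPES[1].unit, "nanoseconds");
}
```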

Readiness checklist

  • Changelog has been added to the release document.
  • Tests added for this feature/bug.

Reviewer checklist

  • Appropriate labels assigned.
  • Milestone is set.

@morrisonlevi added the Overhead (relates to latency, CPU, or memory overhead) and profiling (relates to the Continuous Profiler) labels Mar 2, 2023
@morrisonlevi
Collaborator Author

Numbers from ddprof for default profile types and how much overhead prepare_sample_message has within collect_time:

  • Old: 12/101 = 0.1188
  • New: 9/93 = 0.0968

This was for a given endpoint in the symfony-demo under ab load for 560-ish seconds. The win is small, but real.

@morrisonlevi morrisonlevi marked this pull request as ready for review March 2, 2023 07:20
@morrisonlevi morrisonlevi requested a review from a team as a code owner March 2, 2023 07:20
Member

@realFlowControl realFlowControl left a comment

🚀

This function is called once per stack walk, so it's on the hot path
but not as important to optimize as walking the stack itself.

However, it was showing up more than I would have expected in my
profiles when examining it with the native profiler. Unfortunately I
didn't have granular feedback on what was slowing it down, so I did
some general optimization techniques:

 1. Statically store the sample types as an array instead of creating
    them on-demand.
 2. Replace Cow<'static, str> with &'static str in ValueType. All
    usages were already static strings, and I don't see this changing.
 3. Simplify the branching logic into slicing operations.
 4. Move tag creation to an earlier point and do dumb-copies of tags
    when needed instead of branching logic.
@morrisonlevi morrisonlevi changed the base branch from levi/run-time-cache to master March 2, 2023 17:53
@morrisonlevi morrisonlevi added this to the 0.85.0 milestone Mar 2, 2023
This avoids needing to copy them on every stack walk.
@morrisonlevi
Collaborator Author

I added another optimization: switching tags from Vec<Tag> to Arc<Vec<Tag>> and sharing them for every message (they don't change within a request). Stats:

  • Old: 5/46 = 0.1087
  • New (cumulative): 2/32 = 0.0625

It's a bigger improvement, though still a small win overall.
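A minimal Rust sketch of the Arc-sharing idea (the `Tag` type and `shared_tags` helper here are hypothetical stand-ins for the crate's real types and rinit wiring):

```rust
use std::sync::Arc;

// Hypothetical Tag stand-in for the profiler's real Tag type.
#[derive(Debug)]
pub struct Tag {
    pub key: &'static str,
    pub value: String,
}

// Built once per request (e.g. in rinit) and shared afterwards;
// the tags don't change within a request.
pub fn shared_tags() -> Arc<Vec<Tag>> {
    Arc::new(vec![Tag {
        key: "service",
        value: "symfony-demo".to_string(),
    }])
}

fn main() {
    let tags = shared_tags();
    // Per-sample cost: an atomic refcount increment, not a Vec deep copy.
    let per_sample = Arc::clone(&tags);
    assert_eq!(Arc::strong_count(&tags), 2);
    assert_eq!(per_sample[0].key, "service");
}
```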

@morrisonlevi
Collaborator Author

morrisonlevi commented Mar 2, 2023

I made a small improvement to message_labels. It's not worth measuring; I just noticed it while driving by (it's the code right above prepare_sample_message). The labels are no longer copied either, which required adjusting the log messages. Please review!

Initialization is simpler/faster, and it avoids a copy of the labels.
@realFlowControl realFlowControl merged commit 31c6605 into master Mar 3, 2023
@realFlowControl realFlowControl deleted the levi/prepare-sample-message branch March 3, 2023 11:11
@bwoebi bwoebi mentioned this pull request Mar 14, 2023