Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiler: ensure that CPU profile records profiler work #1485

Merged
merged 6 commits into from
Sep 30, 2022

Conversation

nsrip-dd
Copy link
Contributor

Our profiler library does post-processing of profiles, such as computing
deltas for heap, block, and mutex profiles, which has non-zero CPU
overhead. Right now there's no guarantee that that work will be captured
in the CPU profile. Add synchronization so that the CPU profiler will
only be stopped after all other profiles are completed.

Our profiler library does post-processing of profiles, such as computing
deltas for heap, block, and mutex profiles, which has non-zero CPU
overhead. Right now there's no guarantee that that work will be captured
in the CPU profile. Add synchronization so that the CPU profiler will
only be stopped after all other profiles are completed.
@nsrip-dd nsrip-dd added this to the 1.43.0 milestone Sep 28, 2022
@@ -159,8 +168,9 @@ var profileTypes = map[ProfileType]profileType{
},
}

func collectGenericProfile(name string, delta *pprofutils.Delta) func(p *profiler) ([]byte, error) {
func collectGenericProfile(name string, pt ProfileType, delta *pprofutils.Delta) func(p *profiler) ([]byte, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I tried just making collectGenericProfile just take a ProfileType argument and drop the less type-safe name argument. However, this creates a initialization cycle between the profileTypes map and the collectGenericProfile function, since we still want to get the profile name for doing the lookup, and getting the name requires accessing profileTypes.

Not sure how much refactoring to do in this PR. I could give ProfileType a name() string method that just does a switch statement, and remove Name from profileType? Or leave it kind of awkward for now and refactor in a subsequent PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave it awkward for now, it's ok. We'll come back to it.

@nsrip-dd
Copy link
Contributor Author

nsrip-dd commented Sep 28, 2022

As-is, this PR breaks CPUDuration by forcing the CPU profiler to wait until all the other profiles have finished. We could fix it by starting the CPU profile at profile period - cpu duration seconds, so the CPU profile covers the tail end of the period rather than the beginning, but still captures the other profiles' work.

EDIT: changed the CPU profiler to wait until the end of the profiling period.

If the CPU profile duration is less than the profiling period, make sure
that the CPU profile happens at the end rather than the beginning of
the profile period. That way, we respect the shorter CPU duration but
still capture the profiler library work.
@nsrip-dd nsrip-dd marked this pull request as ready for review September 28, 2022 14:47
@nsrip-dd nsrip-dd requested a review from a team as a code owner September 28, 2022 14:47
felixge
felixge previously approved these changes Sep 29, 2022
Copy link
Member

@felixge felixge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I think a wait group would simplify the implementation quite a bit.

How have you tested this? I know it's difficult to write automated tests for this, but IIRC we've had some regression on this concurrency stuff in the past (7dfa5e8, 85c05b2). So at least we should do some manual and/or rel env testing to confirm that this works as intended and captures the delta profiling CPU work.

profiler/profiler.go Show resolved Hide resolved
@nsrip-dd
Copy link
Contributor Author

How have you tested this?

I tested this manually. I ran a test app I've been using for other things. I had to increase the CPU profile frequency to pick up delta profiling since it was pretty fast for this app, but it shows up in the CPU profile with this PR applied:

Type: cpu
Time: Sep 29, 2022 at 7:53am (EDT)
Duration: 15.12s, Total samples = 4.79s (31.71%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) focus=deltaProfile
(pprof) top -cum
Active filters:
   focus=deltaProfile
Showing nodes accounting for 1ms, 0.021% of 4794ms total
Showing top 10 nodes out of 31
      flat  flat%   sum%        cum   cum%
         0     0%     0%        5ms   0.1%  gopkg.in/DataDog/dd-trace-go.v1/profiler.(*profiler).collect.func1
         0     0%     0%        5ms   0.1%  gopkg.in/DataDog/dd-trace-go.v1/profiler.(*profiler).deltaProfile
         0     0%     0%        5ms   0.1%  gopkg.in/DataDog/dd-trace-go.v1/profiler.(*profiler).runProfile
         0     0%     0%        5ms   0.1%  gopkg.in/DataDog/dd-trace-go.v1/profiler.collectGenericProfile.func1
         0     0%     0%        4ms 0.083%  github.com/google/pprof/profile.(*profileMerger).mapSample
         0     0%     0%        4ms 0.083%  github.com/google/pprof/profile.Merge
         0     0%     0%        4ms 0.083%  gopkg.in/DataDog/dd-trace-go.v1/profiler/internal/pprofutils.Delta.Convert
         0     0%     0%        2ms 0.042%  github.com/google/pprof/profile.(*Sample).key
       1ms 0.021% 0.021%        2ms 0.042%  runtime.mallocgc
         0     0% 0.021%        1ms 0.021%  fmt.Sprintf

felixge
felixge previously approved these changes Sep 29, 2022
Copy link
Member

@felixge felixge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but see comments.

@@ -159,8 +168,9 @@ var profileTypes = map[ProfileType]profileType{
},
}

func collectGenericProfile(name string, delta *pprofutils.Delta) func(p *profiler) ([]byte, error) {
func collectGenericProfile(name string, pt ProfileType, delta *pprofutils.Delta) func(p *profiler) ([]byte, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave it awkward for now, it's ok. We'll come back to it.

profiler/profiler.go Show resolved Hide resolved
Copy link
Member

@felixge felixge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚢

@nsrip-dd nsrip-dd merged commit 0775735 into main Sep 30, 2022
@nsrip-dd nsrip-dd deleted the nick.ripley/profile-collection-order branch September 30, 2022 12:55
nsrip-dd added a commit that referenced this pull request Nov 17, 2022
nsrip-dd added a commit that referenced this pull request Nov 21, 2022
* profiler: add the Name field back for the CPU profile

Accidentally removed by #1485. Also make the log output test more specific, which would have caught this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants