
Fast Delta Profiling #1511

Merged
merged 41 commits into from Nov 11, 2022

Conversation

pmbauer
Member

@pmbauer pmbauer commented Oct 11, 2022

Description

The pprof protobuf structure, when hydrated in memory as Go structs, has significant memory and allocation overhead. The structure has multiple nested, self-referencing sub-structs that cause GC overhead to the point of dominating CPU resources for some services.

We need to keep this structure in memory to compute delta profiles between profile cycles, which makes our profiling client a cause of performance regressions in some cases.

This pull request adds a new delta profile implementation - fastdelta - that uses a streaming pprof parser and minimal heap allocations, so the entire hydrated pprof structure no longer needs to be kept in memory between profile cycles.
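Conceptually, a delta profile is a per-stack subtraction of the previous cycle's cumulative values from the current one. A minimal illustrative sketch of that idea (the sample type and delta helper here are hypothetical, not the dd-trace-go API):

```go
package main

import "fmt"

// sample is a simplified stand-in for a pprof sample: a call-stack
// identity mapped to a cumulative value.
type sample struct {
	stack string
	value int64
}

// delta subtracts the previous cycle's cumulative values from the
// current ones, keeping only samples whose values changed.
func delta(prev, curr []sample) []sample {
	prevByStack := make(map[string]int64, len(prev))
	for _, s := range prev {
		prevByStack[s.stack] = s.value
	}
	var out []sample
	for _, s := range curr {
		if d := s.value - prevByStack[s.stack]; d != 0 {
			out = append(out, sample{stack: s.stack, value: d})
		}
	}
	return out
}

func main() {
	prev := []sample{{"main;alloc", 10}}
	curr := []sample{{"main;alloc", 15}, {"main;other", 3}}
	fmt.Println(delta(prev, curr)) // [{main;alloc 5} {main;other 3}]
}
```

The old implementation computed this by hydrating both profiles as full Go structs; fastdelta streams the current profile instead, retaining only compact per-stack state between cycles.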

Testing Done

  • tested in the reliability environment with DD_PROFILING_DELTA_METHOD=comparing; no fidelity failures
  • tested on production load with DD_PROFILING_DELTA_METHOD=fastdelta
  • similar results to the synthetic benchmark (below) in the tested production service and in the reliability environment

Deploy Plan

  1. one dd-trace-go release with both delta profile implementations: the old algorithm as the default, the new algorithm enabled with the DD_PROFILING_DELTA_METHOD env var, so we can continue testing internally.
  2. in release N+1, make fastdelta the default and keep the old algorithm available via an env var fallback in case of issues.
  3. at some point, we may remove the old algorithm entirely.
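Opting in during stage 1 would look like the following (the env var name and values come from this PR; the echo is just for illustration):

```shell
# Stage 1 (opt-in): select the new implementation explicitly.
export DD_PROFILING_DELTA_METHOD=fastdelta
echo "delta method: ${DD_PROFILING_DELTA_METHOD}"

# Alternatively, run both implementations side by side and compare outputs:
#   export DD_PROFILING_DELTA_METHOD=comparing
```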

Results

On synthetic benchmarks

  • ~4x improvement in wall time
  • ~10x improvement in memory overhead
  • ~20x improvement in allocations
name                                  old time/op    new time/op    delta
FastDelta/testdata/heap.pprof-10        2.09ms ± 1%    0.52ms ± 0%  -75.00%  (p=0.008 n=5+5)
FastDelta/testdata/big-heap.pprof-10    31.6ms ± 1%     7.0ms ± 0%  -77.93%  (p=0.008 n=5+5)

name                                  old alloc/op   new alloc/op   delta
FastDelta/testdata/heap.pprof-10        2.98MB ± 0%    0.23MB ± 0%  -92.44%  (p=0.008 n=5+5)
FastDelta/testdata/big-heap.pprof-10    35.4MB ± 0%     1.7MB ± 0%  -95.18%  (p=0.008 n=5+5)

name                                  old allocs/op  new allocs/op  delta
FastDelta/testdata/heap.pprof-10         41.2k ± 0%      1.3k ± 0%  -96.74%  (p=0.008 n=5+5)
FastDelta/testdata/big-heap.pprof-10      524k ± 0%       14k ± 0%  -97.25%  (p=0.008 n=5+5)

Change Summary

  • vendors fastdelta
  • fix for duration_nanos tracking
  • fixes for over-fit test fixtures
  • test coverage for the 3 delta implementations (fastdelta, pprofile, comparing)
  • comparing telemetry

pmbauer and others added 10 commits October 7, 2022 15:51
- vendor fastdelta library from DataDog/fastdelta@358911a
- anonymized big-heap.pprof, updated corresponding tests

Co-authored-by: Nick Ripley <nick.ripley@datadoghq.com>
- controls which delta method the profiler uses
- defaults to pprofile (pprofile.Scale+pprofile.Merge)
- other options: fastdelta, comparing
- added test coverage for all options
somehow lost goimports-on-save in my local dev; fixed
@pmbauer pmbauer modified the milestones: v1.44.0, 1.43.0 Oct 11, 2022
pmbauer and others added 2 commits October 11, 2022 15:38
name                                  old time/op    new time/op    delta
FastDelta/testdata/heap.pprof-12         924µs ± 2%     827µs ± 3%  -10.53%  (p=0.008 n=5+5)
FastDelta/testdata/big-heap.pprof-12    12.2ms ± 4%    10.9ms ± 4%  -10.29%  (p=0.008 n=5+5)

name                                  old alloc/op   new alloc/op   delta
FastDelta/testdata/heap.pprof-12         289kB ± 0%     226kB ± 0%  -22.00%  (p=0.008 n=5+5)
FastDelta/testdata/big-heap.pprof-12    2.44MB ± 0%    1.71MB ± 0%  -29.77%  (p=0.008 n=5+5)

name                                  old allocs/op  new allocs/op  delta
FastDelta/testdata/heap.pprof-12         3.33k ± 0%     1.34k ± 0%  -59.66%  (p=0.008 n=5+5)
FastDelta/testdata/big-heap.pprof-12     37.2k ± 0%     14.5k ± 0%  -61.10%  (p=0.008 n=5+5)
@Julio-Guerra Julio-Guerra modified the milestones: 1.43.0, v1.44.0 Oct 18, 2022
@pmbauer pmbauer marked this pull request as ready for review October 18, 2022 19:58
@pmbauer pmbauer requested a review from a team as a code owner October 18, 2022 19:58
@pmbauer pmbauer requested a review from a team October 18, 2022 19:58
@pmbauer pmbauer requested a review from a team as a code owner October 18, 2022 19:58
Contributor

@nsrip-dd nsrip-dd left a comment

Thanks Paul!

  • There is one breaking change around the StatsdClient interface that definitely needs to be addressed. I'm wondering if we should keep the timing stuff, or just the old/new consistency check (so we can still run the implementations side-by-side just to make sure they agree content-wise)
  • I've also left some comments more related to the way the code is structured. If those suggestions are too much for this PR, especially related to the nesting of the different pass functions, we can address them in a subsequent PR.

// f builds a profile with a single sample which has labels in the given
// order. We build the profile ourselves because we can control the
// precise binary encoding of the profile.
f := func(labels ...string) []byte {
Contributor

I realize I wrote this test, but I'm wondering if there's a nicer way to test the hashing consistency than hand-writing a pprof to get the encoding we want. We could perhaps construct a minimal DeltaComputer with enough state populated to feed some inputs to readSample, and test that directly. On the other hand, I kind of like this more black-box style test that only calls DeltaComputer's "public" methods... At the very least, I'll leave this comment here as a breadcrumb; maybe we come back to this in a subsequent PR.

@pmbauer pmbauer changed the title Feature fast delta profiling Fast Delta Profiling Oct 19, 2022
@nsrip-dd nsrip-dd requested a review from felixge October 20, 2022 19:03
nsrip-dd and others added 2 commits October 24, 2022 09:12
Add minimal fuzz testing, and address several issues surfaced by the
fuzzer:

	* Bound check string table indices
	* Bound check when accessing the list of included strings
	* Bound check the location index
	* Check that packed varints are valid, and not too large
	* Check the number of values per sample

Any of these issues could have been identified _without_ the fuzzer by
thinking about bounds for every access, but the fuzzer surfaced all of
them quickly and we now have repeatable bad inputs to test against.
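The bound checks listed above can be sketched in isolation; the helpers below are hypothetical illustrations of the pattern, not the actual fastdelta code:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// lookupString bound-checks a string table access before indexing,
// as the fuzzer-motivated fixes do (table contents are illustrative).
func lookupString(table []string, idx uint64) (string, error) {
	if idx >= uint64(len(table)) {
		return "", errors.New("string table index out of range")
	}
	return table[idx], nil
}

// readPackedUvarints decodes a packed field of varints, rejecting
// truncated or oversized encodings ("packed varints are valid, and
// not too large").
func readPackedUvarints(data []byte) ([]uint64, error) {
	var out []uint64
	for len(data) > 0 {
		v, n := binary.Uvarint(data)
		if n <= 0 { // n == 0: truncated; n < 0: overflows 64 bits
			return nil, errors.New("invalid varint in packed field")
		}
		out = append(out, v)
		data = data[n:]
	}
	return out, nil
}

func main() {
	table := []string{"", "samples", "count"}
	fmt.Println(lookupString(table, 1))
	fmt.Println(lookupString(table, 99))
	fmt.Println(readPackedUvarints([]byte{0x05, 0x96, 0x01})) // varints 5 and 150
}
```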

The motivation for this is to have some assurance that the fast
delta computer won't fail dramatically if the underlying input changes
in some unexpected way, or is invalid due to a Go runtime bug.

Also, add a failsafe for crashes/errors to reset the state.
If delta profile computation fails, then the DeltaComputer's internal
state will be in some indeterminate form. Subsequent calls to Delta
given the bad state will return junk. The DeltaComputer should basically
be reset to a fresh start.
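The failsafe pattern described in this commit, recovering from a panic, reporting an error, and resetting state before reuse, can be sketched as follows (all names here are illustrative, not the actual DeltaComputer API):

```go
package main

import "fmt"

// computer is a hypothetical stand-in for a stateful delta computer.
type computer struct {
	state    map[string]int64
	poisoned bool
}

func (c *computer) reset() {
	c.state = make(map[string]int64)
	c.poisoned = false
}

// Delta wraps the real computation: after a panic or error the internal
// state is indeterminate, so it is marked poisoned and reset before the
// next use.
func (c *computer) Delta(input []byte) (err error) {
	if c.poisoned || c.state == nil {
		c.reset()
	}
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("delta computation panicked: %v", r)
		}
		if err != nil {
			c.poisoned = true
		}
	}()
	return c.compute(input)
}

// compute stands in for the streaming parse; it panics on bad input.
func (c *computer) compute(input []byte) error {
	if len(input) == 0 {
		panic("invalid input")
	}
	c.state["bytes"] += int64(len(input))
	return nil
}

func main() {
	c := &computer{}
	fmt.Println(c.Delta(nil))         // recovered panic reported as error
	fmt.Println(c.Delta([]byte("x"))) // state was reset, succeeds
}
```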
@pmbauer
Member Author

pmbauer commented Oct 26, 2022

@nsrip-dd your review still says 'changes requested'. Is there anything else left to go in before getting a review from @felixge?

nsrip-dd
nsrip-dd previously approved these changes Oct 26, 2022
@nsrip-dd
Contributor

Nope, all good for me! IMO these changes are OK to merge as-is especially since this is off-by-default behind an environment variable. I'll check in with Felix tomorrow.

Member

@felixge felixge left a comment

I was hoping to completely review this today, but it's a lot of code 😅. That being said, I made some good progress. Most of my suggestions can be applied by merging this PR: #1550

@@ -208,6 +211,7 @@ func defaultConfig() (*config, error) {
uploadTimeout: DefaultUploadTimeout,
maxGoroutinesWait: 1000, // arbitrary value, should limit STW to ~30ms
deltaProfiles: internal.BoolEnv("DD_PROFILING_DELTA", true),
deltaMethod: os.Getenv("DD_PROFILING_DELTA_METHOD"),
Member

TODO: Assuming we're ready to truly release this, change this to default to "fastdelta" before merging.

Member Author

In Paris, we agreed to release this behind a flag, disabled by default. I still think we should release this in stages.

  • 1.44.0 with both implementations, old algorithm as default, new algorithm enabled with an env var. This is so we can continue testing internally. dd-go apps are not going to go to prod with a non-release dd-trace-go. Customers that need it can opt-in.
  • 1.4?.0 still include both algorithms, make fastdelta the default, enable old algorithm with an env var fallback.

return nil, fmt.Errorf("error flushing gzip writer: %v", err)
}
return fdp.buf.Bytes(), nil
}
Member

NIT: Why is the gzip stuff handled here rather than inside the fastdelta.DeltaComputer implementation? The old implementation also dealt with it inside of pprofile.ParseData() and deltaProf.Write(), so it's a bit odd to operate at a different level of abstraction here.

Member Author

@pmbauer pmbauer Oct 28, 2022

This was something I asked for when fastdelta was its own library.
Compression is an orthogonal concern to the process of computing deltas.

Member

Ok, I don't feel strongly about it.


// insert associates the given address, mapping ID, and function IDs with the
// given location ID
func (l *locationIndex) insert(id, address, mappingID uint64, functionIDs []uint64) {
Member

NIT: It'd be nice to make this method public in order to indicate that, unlike e.g. insertFast and insertSlow, this method is supposed to be used by outside callers. The same applies to the reset, get, and getMeta methods of this struct.

I'd say it's generally a good pattern for struct methods (and fields) to be lowercase if they are only used by the struct itself, and uppercase if they are used in other places of the code. Of course this has no meaning to the compiler inside the same package, but it helps human readers and also simplifies the process of moving code to different packages over time.

Suggested change
func (l *locationIndex) insert(id, address, mappingID uint64, functionIDs []uint64) {
func (l *locationIndex) Insert(id, address, mappingID uint64, functionIDs []uint64) {

Member Author

it used to live in fd.go and I don't remember why it was moved out; @nsrip-dd, do you remember? I don't think we'd ever expose this to outside callers.

Contributor

I moved it to another file because I felt it was a somewhat trickier component that I'd want to look at on its own. As for exposing it to outside callers, Felix's point is that having the methods we actually intend to use capitalized is a reminder for us of what the intended interface is for this type. If this did live in another package, Insert, Get, and GetMeta would be the methods we actually expose. The rest would be implementation details. That's also reflected in the current tests, which only access locationIndex through those methods.

Member Author

SGTM; thanks

Co-authored-by: Felix Geisendörfer <felix@datadoghq.com>
1. NIT: use bytes.HasPrefix
2. NIT: Uppercase public locationIndex methods
3. location_index: simplify and -15% allocs

name                                  old time/op    new time/op    delta
FastDelta/testdata/heap.pprof-12         921µs ± 0%     907µs ± 1%   -1.52%  (p=0.000 n=10+10)
FastDelta/testdata/big-heap.pprof-12    11.8ms ± 1%    11.6ms ± 1%   -1.73%  (p=0.000 n=10+8)

name                                  old alloc/op   new alloc/op   delta
FastDelta/testdata/heap.pprof-12         201kB ± 0%     199kB ± 0%   -0.99%  (p=0.000 n=10+10)
FastDelta/testdata/big-heap.pprof-12    1.77MB ± 0%    1.80MB ± 0%   +1.93%  (p=0.000 n=10+10)

name                                  old allocs/op  new allocs/op  delta
FastDelta/testdata/heap.pprof-12         1.39k ± 0%     1.17k ± 0%  -16.15%  (p=0.002 n=8+10)
FastDelta/testdata/big-heap.pprof-12     14.6k ± 0%     12.5k ± 0%  -14.09%  (p=0.000 n=10+10)
Member

@felixge felixge left a comment

Might have found a bug, PTAL.

}
return true, nil
})
sort.Sort(byHash(dc.hashes))
Member

🐞 dc.hashes contains hashes of labels and locations. By sorting it, samples with the same set of location IDs in different orders will produce the same final hash. I think that's wrong.

Contributor

That makes sense to me. My intention for sorting was to consistently order the labels, but you're right that the call stack shouldn't be touched. I'll push a fix

Contributor

Also FWIW, technically with more than 2 locations, the location IDs are packed and we hash them in one go here: https://github.com/DataDog/dd-trace-go/pull/1511/files/2b52f5429d013cd2a03f37d58ecf42d78fc21fbd#diff-5a42c40b2af7c9a2c14a5f80489bffbe90130cfae4795cfc311c2fa768fee0e5R615

So I think this should only affect samples with exactly 2 locations

Contributor

See #1558

nsrip-dd and others added 2 commits November 1, 2022 18:06
Order matters for hashing the addresses in a call stack. If the sample
has repeated single-value location IDs (rather than packed), we
currently hash them one at a time, append the hash to the hashes
slice, and then sort it. But this discards the order in which the
addresses appear.

To be fully robust, we should collect all the addresses first, and then
hash them after we've visited every location.
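The distinction behind this fix can be illustrated with a toy hash: location IDs are hashed in their original order (call-stack order matters), while labels are sorted first (label order does not). This is a hypothetical sketch, not the fastdelta implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// sampleHash hashes a sample identity: location IDs in order, labels
// sorted so that equivalent label sets hash equally.
func sampleHash(locs []uint64, labels []string) uint64 {
	h := fnv.New64a()
	// Hash locations in order: stacks [1,2] and [2,1] must differ.
	for _, id := range locs {
		var b [8]byte
		for i := 0; i < 8; i++ {
			b[i] = byte(id >> (8 * i))
		}
		h.Write(b[:])
	}
	// Sort a copy of the labels: label order is not significant.
	sorted := append([]string(nil), labels...)
	sort.Strings(sorted)
	for _, l := range sorted {
		h.Write([]byte(l))
	}
	return h.Sum64()
}

func main() {
	a := sampleHash([]uint64{1, 2}, []string{"x", "y"})
	b := sampleHash([]uint64{2, 1}, []string{"x", "y"})
	c := sampleHash([]uint64{1, 2}, []string{"y", "x"})
	fmt.Println(a != b, a == c) // true true
}
```

Sorting everything (as the buggy version effectively did for single-value location IDs) would make a and b collide.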
fastdelta: reduce allocs by up to 90%

Remove allocation from sort.Sort call (big win) and also only call it if needed
(small win).

Also simplify the hashing a bit, no need to hash the location list and
then re-hash it again, it can all go into the same hash right away.

$ benchstat before.txt after.txt
name                                  old time/op    new time/op    delta
FastDelta/testdata/heap.pprof-12         923µs ± 1%     833µs ± 1%   -9.75%  (p=0.000 n=9+10)
FastDelta/testdata/big-heap.pprof-12    11.6ms ± 1%    10.5ms ± 2%   -9.32%  (p=0.000 n=9+9)

name                                  old alloc/op   new alloc/op   delta
FastDelta/testdata/heap.pprof-12         199kB ± 0%     175kB ± 0%  -12.02%  (p=0.000 n=10+10)
FastDelta/testdata/big-heap.pprof-12    1.80MB ± 0%    1.53MB ± 0%  -15.11%  (p=0.000 n=10+10)

name                                  old allocs/op  new allocs/op  delta
FastDelta/testdata/heap.pprof-12         1.17k ± 0%     0.18k ± 0%  -84.86%  (p=0.000 n=8+10)
FastDelta/testdata/big-heap.pprof-12     12.5k ± 0%      1.2k ± 0%  -90.61%  (p=0.000 n=10+10)

Co-authored-by: Nick Ripley <97066770+nsrip-dd@users.noreply.github.com>
Member

@felixge felixge left a comment

🐞 Maybe I'm missing something, but it seems like the entire stringsTable implementation assumes that string table ID references remain stable between profiles? I can see how that might be a reasonable assumption for the profiles currently emitted by the Go runtime, but I'm worried that it will be very difficult to adapt our implementation if this invariant gets broken in the future.

err = dc.Delta(after, buf)
if err != nil {
b.Fatal(err)
}
Member

⚠️ This benchmark has a problem. It doesn't measure any delta computation. The only samples that are retained in the after profile are those that have inuse_* counts. I took a closer look at how that works out for the test files we have:

profile         before_samples  after_samples  %
heap.pprof      497             5              1%
big-heap.pprof  5676            686            12%

I don't think we'll encounter many cases like heap.pprof in the real world where only 1% of samples are retained from the initial profile. big-heap.pprof retaining 12% is probably more realistic, but again, I'm still a little worried that we're not actually benchmarking any delta computation here. The benchmark just removes all samples that have inuse_* counts of 0.

Unfortunately, I think it will be difficult to construct a better benchmark. We'd probably have to construct b.N derivatives of our input profile with monotonically increasing counts before starting the benchmark itself. It can be done, but it would cause a lot of memory usage and increase the startup time of the benchmarks significantly.

So maybe we should just document the limitation of the benchmarks to avoid accidentally overfitting our implementation to the somewhat unrealistic workload?

Member Author

@pmbauer pmbauer Nov 2, 2022

I agree. We could re-work this and even incorporate b.N into the test input. I don't think that would be too difficult.

It doesn't measure any delta computation.

Do you mean the one actual delta in the bench results in no non-zero samples for the value types configured? (before and after are identical, yeah we def don't need them both). The table you have is only counting the non-alloc_* samples.

Member Author

As there is only one Delta computation, the current benchmark is under-representing the gains for processes that last longer than 2 profiling cycles.

Member

As there is only one Delta computation, the current benchmark is under-representing the gains for processes that last longer than 2 profiling cycles.

Yes, that's another problem with this benchmark! I think it'd be better if it created a single DeltaComputer and then applied pprofs to it in the loop, rather than creating a new computer in each iteration.

Member Author

Agreed. This bench could use some more work. I'll have some updates later today.
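The benchmark shape suggested above, one long-lived computer fed monotonically increasing inputs across iterations, might be sketched like this (the computer type below is a stand-in for fastdelta.DeltaComputer, not the actual API):

```go
package main

import (
	"fmt"
	"testing"
)

// computer retains the previous cycle's cumulative values between
// calls, as a long-lived delta computer would.
type computer struct{ prev map[int]int64 }

// Delta returns the per-cycle increments and updates retained state.
func (c *computer) Delta(curr map[int]int64) map[int]int64 {
	out := make(map[int]int64)
	for k, v := range curr {
		if d := v - c.prev[k]; d != 0 {
			out[k] = d
		}
	}
	c.prev = curr
	return out
}

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		// One computer, created outside the loop and reused, so the
		// benchmark measures steady-state deltas rather than setup.
		c := &computer{prev: map[int]int64{}}
		for i := 0; i < b.N; i++ {
			// Monotonically increasing counts simulate successive
			// profile cycles.
			c.Delta(map[int]int64{1: int64(10 * (i + 1))})
		}
	})
	fmt.Println(res.N > 0) // true
}
```

Creating the computer inside the loop would instead measure the first-delta (setup) cost on every iteration, which is the problem being discussed.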

return false, err
}
dc.keepLocations(dc.scratchIDs)
return true, nil
Member

🐞 Since there is no includeStringIndexFields call here, samples with labels that go through this code path may have string table references that don't exist.

Member Author

@pmbauer pmbauer Nov 2, 2022

good catch
In practice, we throw away the first delta and the comparingDeltaProfiler didn't catch this one, but will fix.

Member

good catch
In practice, we throw away the first delta and the comparingDeltaProfiler didn't catch this one, but will fix.

Maybe hold off on that fix. I have a branch with some more refactorings where I can easily fix this as well.

@pmbauer
Member Author

pmbauer commented Nov 2, 2022

🐞 Maybe I'm missing something, but it seems like the entire stringsTable implementation assumes that string table ID references remain stable between profiles?

The stringTable is not used between profiles and is reset at the start of each delta. Unless I'm misunderstanding something, we don't assume the IDs are stable between profiles. The strings that get written out to the delta pprof will have the same offsets as the input pprof passed to Delta, but there is no assumed carry-over from one delta pprof to the next.

@felixge
Member

felixge commented Nov 3, 2022

The strings that get written out to the delta pprof will have the same offsets as the input pprof passed to Delta, but there is no assumed carry-over from one delta pprof to the next.

💡 Ah, got it. For some reason I thought that there might be references into the string table from the previous profile. But I can see now that this is clearly not the case. All references are from the current profile, so it doesn't matter if the same string had a different index in the previous profile. ✅

nsrip-dd and others added 4 commits November 7, 2022 17:59
Add a test that repeatedly generates real heap profiles, each time
creating more garbage, and checking the fidelity of the delta against
the original implementation.

The garbage is created using mutual recursion so that there are many new
unique call stacks for each iteration. This is inspired by seeing
sorting in a profile where left and right recursive merging leads to an
exponential number of unique call stacks.

The test is too resource intensive to run on CI. Use an env var to
enable it, so we can run it locally.
Comparison of old pprof implementation vs fastdelta:

name                                  old time/op    new time/op     delta
Delta/heap.pprof/setup-10               2.69ms ± 1%     0.92ms ± 0%   -65.88%  (p=0.000 n=10+8)
Delta/heap.pprof/steady-state-10        2.05ms ± 0%     0.38ms ± 0%   -81.53%  (p=0.000 n=10+10)
Delta/big-heap.pprof/setup-10           32.1ms ± 1%     10.7ms ± 0%   -66.56%  (p=0.000 n=9+8)
Delta/big-heap.pprof/steady-state-10    26.5ms ± 2%      4.7ms ± 0%   -82.36%  (p=0.000 n=10+9)

name                                  old speed      new speed       delta
Delta/heap.pprof/setup-10             9.82MB/s ± 1%  28.78MB/s ± 0%  +193.05%  (p=0.000 n=10+8)
Delta/heap.pprof/steady-state-10      12.9MB/s ± 0%   69.6MB/s ± 0%  +441.42%  (p=0.000 n=10+10)
Delta/big-heap.pprof/setup-10         9.68MB/s ± 1%  28.96MB/s ± 0%  +199.07%  (p=0.000 n=9+8)
Delta/big-heap.pprof/steady-state-10  11.7MB/s ± 2%   66.5MB/s ± 0%  +466.96%  (p=0.000 n=10+9)
name                                  old alloc/op             new alloc/op             delta
Delta/heap.pprof/setup-10                         2.99MB ± 0%              0.24MB ± 0%   -91.94%  (p=0.000 n=10+10)
Delta/heap.pprof/steady-state-10                  2.38MB ± 0%              0.00MB       -100.00%  (p=0.000 n=10+10)
Delta/big-heap.pprof/setup-10                     36.1MB ± 0%               1.7MB ± 0%   -95.26%  (p=0.000 n=10+10)
Delta/big-heap.pprof/steady-state-10              28.7MB ± 0%               0.0MB       -100.00%  (p=0.000 n=9+10)

name                                  old allocs/op            new allocs/op            delta
Delta/heap.pprof/setup-10                          41.2k ± 0%                0.1k ± 0%   -99.67%  (p=0.000 n=10+10)
Delta/heap.pprof/steady-state-10                   34.3k ± 0%                0.0k       -100.00%  (p=0.000 n=10+10)
Delta/big-heap.pprof/setup-10                       527k ± 0%                  0k ± 0%   -99.93%  (p=0.000 n=10+9)
Delta/big-heap.pprof/steady-state-10                450k ± 0%                  0k       -100.00%  (p=0.000 n=9+10)

name                                  old heap-inuse-alloc/op  new heap-inuse-alloc/op  delta
Delta/heap.pprof/steady-state-10                   406kB ± 0%               140kB ± 0%   -65.44%  (p=0.000 n=9+10)
Delta/big-heap.pprof/steady-state-10              4.58MB ± 0%              0.81MB ± 1%   -82.23%  (p=0.000 n=9+10)

Co-authored-by: Nick Ripley <nick.ripley@datadoghq.com>
@felixge felixge self-requested a review November 11, 2022 22:36
@felixge felixge merged commit f7fee57 into main Nov 11, 2022
@felixge felixge deleted the feature-fast-delta-profiling branch November 11, 2022 22:40
@pmbauer pmbauer mentioned this pull request Dec 8, 2022
5 tasks
@nsrip-dd nsrip-dd mentioned this pull request Mar 15, 2023
3 tasks
nsrip-dd added a commit that referenced this pull request Mar 15, 2023
The PprofDiff function was added to our public API unintentionally as
part of #1511. The function is only meant for internal use. This was
missed during code review, likely due to the size of the change and the
fact that most of our attention was devoted to making sure the core
parts of the change were implemented correctly.

This is an exception to our normal v1 compatibility guarantees. It is
very unlikely that any of our users depend on this function. It serves
no purpose in the context of the profiler's public API, and anybody who
might have a reason to use this function would probably have already
found a way to do what they want using the
github.com/google/pprof/profile package.
nsrip-dd added a commit that referenced this pull request Mar 16, 2023
The PprofDiff function was added to our public API unintentionally as
part of #1511. The function is only meant for internal use. This was
missed during code review, likely due to the size of the change and the
fact that most of our attention was devoted to making sure the core
parts of the change were implemented correctly.

Mark the function as deprecated, to be removed in a later release.
This should hopefully give users a chance to notice that it's going away
and stop using it before it's removed. pkg.go.dev will show that the
function is deprecated, and some IDEs and linters will flag use of
deprecated functions.

This is an exception to our normal v1 compatibility guarantees. It is
very unlikely that any of our users depend on this function. It serves
no purpose in the context of the profiler's public API, and anybody who
might have a reason to use this function would probably have already
found a way to do what they want using the
github.com/google/pprof/profile package.