fastdelta: reduce allocs by up to 90% #1559

Conversation

felixge (Member) commented Nov 1, 2022

Avoid calling sort.Sort, which allocates due to the slice-to-interface conversion.
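For context, the allocation comes from converting the concrete slice value into the sort.Interface argument. A minimal sketch of the pattern, using a simplified byHash stand-in for the byHash type in fd.go (the real one sorts dc.hashes):

package main

import "sort"

// byHash is a simplified stand-in for the sortable slice type in fd.go.
type byHash []uint64

func (b byHash) Len() int           { return len(b) }
func (b byHash) Less(i, j int) bool { return b[i] < b[j] }
func (b byHash) Swap(i, j int)      { b[i], b[j] = b[j], b[i] }

func main() {
	hashes := []uint64{3, 1, 2}
	// Converting the slice to sort.Interface copies its slice header to the
	// heap (runtime.convTslice), so every call pays an extra allocation.
	sort.Sort(byHash(hashes))
}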

Also simplify the hashing a bit: there's no need to hash the location list and then re-hash the result; it can all go into the same hash right away.
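Roughly, the hashing change has this shape (a hedged sketch with hypothetical names and hash choice, not the actual fd.go code):

package sketch

import (
	"crypto/sha256" // hypothetical hash choice for the sketch
	"encoding/binary"
	"hash"
)

// hashLocationsTwoPass sketches the old shape: hash the location IDs into an
// intermediate digest, then feed that digest into the per-sample hash.
func hashLocationsTwoPass(sampleHash hash.Hash, locationIDs []uint64) {
	locHash := sha256.New()
	var buf [8]byte
	for _, id := range locationIDs {
		binary.LittleEndian.PutUint64(buf[:], id)
		locHash.Write(buf[:])
	}
	sampleHash.Write(locHash.Sum(nil))
}

// hashLocationsOnePass sketches the simplification: write the location IDs
// into the per-sample hash directly, skipping the intermediate digest.
func hashLocationsOnePass(sampleHash hash.Hash, locationIDs []uint64) {
	var buf [8]byte
	for _, id := range locationIDs {
		binary.LittleEndian.PutUint64(buf[:], id)
		sampleHash.Write(buf[:])
	}
}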

$ benchstat before.txt after.txt
name                                  old time/op    new time/op    delta
FastDelta/testdata/heap.pprof-12         923µs ± 1%     833µs ± 1%   -9.75%  (p=0.000 n=9+10)
FastDelta/testdata/big-heap.pprof-12    11.6ms ± 1%    10.5ms ± 2%   -9.32%  (p=0.000 n=9+9)

name                                  old alloc/op   new alloc/op   delta
FastDelta/testdata/heap.pprof-12         199kB ± 0%     175kB ± 0%  -12.02%  (p=0.000 n=10+10)
FastDelta/testdata/big-heap.pprof-12    1.80MB ± 0%    1.53MB ± 0%  -15.11%  (p=0.000 n=10+10)

name                                  old allocs/op  new allocs/op  delta
FastDelta/testdata/heap.pprof-12         1.17k ± 0%     0.18k ± 0%  -84.86%  (p=0.000 n=8+10)
FastDelta/testdata/big-heap.pprof-12     12.5k ± 0%      1.2k ± 0%  -90.61%  (p=0.000 n=10+10)

felixge requested a review from a team as a code owner on November 1, 2022, 17:22
nsrip-dd (Contributor) left a comment


Nice! I'm a little surprised, because I thought it was sort.Slice that caused the allocations, which we got rid of with #1512. 🤷

nsrip-dd (Contributor) commented Nov 1, 2022

We can get a similar benefit by making dc.hashes have type byHash and then removing the conversion when we sort.

diff --git a/profiler/internal/fastdelta/fd.go b/profiler/internal/fastdelta/fd.go
index afdd7ba6..6f6780f0 100644
--- a/profiler/internal/fastdelta/fd.go
+++ b/profiler/internal/fastdelta/fd.go
@@ -107,7 +107,7 @@ type DeltaComputer struct {
        // saves some heap allocations
        scratch    [128]byte
        scratchIDs []uint64
-       hashes     []Hash
+       hashes     byHash
 
        // include* is for pruning the delta output, populated on merge pass
        includeMapping  map[uint64]struct{}
@@ -694,7 +694,7 @@ func (dc *DeltaComputer) readSample(v []byte, h hash.Hash, hash *Hash) (value sa
                }
                return true, nil
        })
-       sort.Sort(byHash(dc.hashes))
+       sort.Sort(&dc.hashes)
 
        h.Reset()
        for _, sub := range dc.hashes {
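For reference, sort.Sort(&dc.hashes) type-checks because byHash's Len/Less/Swap use value receivers (the old sort.Sort(byHash(dc.hashes)) call wouldn't compile otherwise), and the method set of *byHash includes those methods. Compile-time assertions along these lines would confirm it:

var _ sort.Interface = byHash(nil)    // the value type satisfies the interface
var _ sort.Interface = (*byHash)(nil) // so does the pointer: *T's method set includes T's value-receiver methods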

felixge (Member, Author) commented Nov 1, 2022

We can get a similar benefit by making dc.hashes have type byHash and then removing the conversion when we sort.

Fascinating. That was actually the first thing I tried, but I did sort.Sort(dc.hashes) instead of sort.Sort(&dc.hashes).

Seems like my version compiled to:

        sort.Sort(dc.hashes)
  0x122f87d             488b8338010000          MOVQ 0x138(BX), AX                                                                                      
  0x122f884             488b9340010000          MOVQ 0x140(BX), DX                                                                                      
  0x122f88b             488b8b48010000          MOVQ 0x148(BX), CX                                                                                      
  0x122f892             4889d3                  MOVQ DX, BX                                                                                             
  0x122f895             e886caddff              CALL runtime.convTslice(SB)                                                                             
  0x122f89a             4889c3                  MOVQ AX, BX                                                                                             
  0x122f89d             488d052c210e00          LEAQ go.itab.gopkg.in/DataDog/dd-trace-go.v1/profiler/internal/fastdelta.byHash,sort.Interface(SB), AX  
  0x122f8a4             e897c3e5ff              CALL sort.Sort(SB) 

Whereas yours compiles to:

        sort.Sort(&dc.hashes)
  0x122f87d             488d8b38010000          LEAQ 0x138(BX), CX                                                                                      
  0x122f884             488d05751f0e00          LEAQ go.itab.*gopkg.in/DataDog/dd-trace-go.v1/profiler/internal/fastdelta.byHash,sort.Interface(SB), AX 
  0x122f88b             4889cb                  MOVQ CX, BX                                                                                             
  0x122f88e             e8adc3e5ff              CALL sort.Sort(SB)

I'm not sure I fully understand why this is happening. If you do, please share.

Either way, I'll update this to use your version.

nsrip-dd (Contributor) commented Nov 1, 2022

VS Code actually tab-completed sort.Sort(dc.h) to sort.Sort(&dc.hashes) for me. I'm digging into why that is, and why it works... (it doesn't always do that)

Co-authored-by: Nick Ripley <97066770+nsrip-dd@users.noreply.github.com>
nsrip-dd (Contributor) commented Nov 1, 2022

I think the reason this works is that by passing the address of dc.hashes to sort.Sort, we're using the byHash that already exists (edit: and probably already went on the heap). If we pass dc.hashes by value, we're copying it (the slice header) and the new copy (of the slice header) needs to go on the heap.

felixge (Member, Author) commented Nov 1, 2022

I think the reason this works is that by passing the address of dc.hashes to sort.Sort, we're using the byHash that already exists. If we pass dc.hashes by value, we're copying it (the slice header) and the new copy (of the slice header) needs to go on the heap.

Yeah, that's what I figured from looking at runtime.convTslice as well. What I don't understand is whether this is a language spec thing or a compiler quirk. I mean, I know that an interface internally needs a pointer to the value it's holding. But why does it need to make a copy of the slice header and use a pointer to that, instead of a pointer to the slice header itself? I can't think of a language semantic reason for that 🤔

felixge merged commit e8bfed7 into feature-fast-delta-profiling on Nov 1, 2022
felixge deleted the felix.geisendoerfer/feature-fast-delta-profiling-2 branch on November 1, 2022, 18:14
felixge (Member, Author) commented Nov 3, 2022

But why does it need to make a copy of the slice header and use a pointer to that instead of a pointer to the slice header itself?

I think I understand it now. There is no language spec reason for this. It's simply the heap escape analysis not being powerful enough to understand that the copy of the slice header doesn't have to escape to the heap.
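One way to check this locally is to build a small reproduction with the compiler's escape-analysis diagnostics (go build -gcflags=-m) and compare the two call forms. A sketch, reusing the hypothetical byHash/deltaComputer stand-ins rather than the real fd.go types; exact diagnostics vary by Go version, but the value form reports the converted slice header escaping to the heap, while the pointer form adds no per-call copy:

package main

import "sort"

type byHash []uint64

func (b byHash) Len() int           { return len(b) }
func (b byHash) Less(i, j int) bool { return b[i] < b[j] }
func (b byHash) Swap(i, j int)      { b[i], b[j] = b[j], b[i] }

// deltaComputer is a hypothetical stand-in for DeltaComputer in fd.go.
type deltaComputer struct{ hashes byHash }

func sortByValue(dc *deltaComputer) {
	sort.Sort(dc.hashes) // copies the slice header into the interface; the copy escapes
}

func sortByPointer(dc *deltaComputer) {
	sort.Sort(&dc.hashes) // points at the existing field; no new slice header is allocated
}

func main() {
	dc := &deltaComputer{hashes: byHash{3, 1, 2}}
	sortByValue(dc)
	sortByPointer(dc)
}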
