distsqlrun: optimize merge joiner memory account usage #30924

Merged
merged 1 commit into from Oct 9, 2018

5 participants
@changangela
Collaborator

changangela commented Oct 3, 2018

Fixes #30687.

Instead of opening and closing the memory account on every single row of the merge join, we can allocate a certain minimum block size on the memory account. This way, we are not constantly requesting and releasing memory.

Benchmark results for BenchmarkMergeJoiner:

name                           old time/op    new time/op    delta
MergeJoiner/InputSize=0-8        5.37µs ±24%    5.85µs ±32%      ~     (p=0.280 n=10+10)
MergeJoiner/InputSize=4-8        14.7µs ±27%     8.4µs ± 7%   -42.88%  (p=0.000 n=10+9)
MergeJoiner/InputSize=16-8       22.9µs ±20%    12.7µs ±24%   -44.44%  (p=0.000 n=10+9)
MergeJoiner/InputSize=256-8       210µs ± 7%      89µs ± 4%   -57.62%  (p=0.000 n=10+9)
MergeJoiner/InputSize=4096-8     3.84ms ±17%    1.31ms ± 4%   -65.80%  (p=0.000 n=10+9)
MergeJoiner/InputSize=65536-8    54.3ms ±12%    22.5ms ± 6%   -58.51%  (p=0.000 n=10+10)

name                           old alloc/op   new alloc/op   delta
MergeJoiner/InputSize=0-8        6.65kB ± 0%    6.66kB ± 0%    +0.24%  (p=0.000 n=10+10)
MergeJoiner/InputSize=4-8        9.72kB ± 0%    9.74kB ± 0%    +0.16%  (p=0.000 n=10+10)
MergeJoiner/InputSize=16-8       9.72kB ± 0%    9.74kB ± 0%    +0.16%  (p=0.000 n=10+10)
MergeJoiner/InputSize=256-8      32.8kB ± 0%    32.8kB ± 0%    +0.05%  (p=0.000 n=10+10)
MergeJoiner/InputSize=4096-8      401kB ± 0%     401kB ± 0%    +0.00%  (p=0.000 n=10+10)
MergeJoiner/InputSize=65536-8    6.30MB ± 0%    6.30MB ± 0%    +0.00%  (p=0.000 n=9+10)

name                           old allocs/op  new allocs/op  delta
MergeJoiner/InputSize=0-8          15.0 ± 0%      15.0 ± 0%      ~     (all equal)
MergeJoiner/InputSize=4-8          17.0 ± 0%      17.0 ± 0%      ~     (all equal)
MergeJoiner/InputSize=16-8         17.0 ± 0%      17.0 ± 0%      ~     (all equal)
MergeJoiner/InputSize=256-8        47.0 ± 0%      47.0 ± 0%      ~     (all equal)
MergeJoiner/InputSize=4096-8        527 ± 0%       527 ± 0%      ~     (all equal)
MergeJoiner/InputSize=65536-8     8.21k ± 0%     8.21k ± 0%      ~     (all equal)

name                           old speed      new speed      delta
MergeJoiner/InputSize=4-8      4.42MB/s ±22%  7.65MB/s ± 7%   +72.86%  (p=0.000 n=10+9)
MergeJoiner/InputSize=16-8     11.4MB/s ±23%  20.3MB/s ±20%   +77.91%  (p=0.000 n=10+9)
MergeJoiner/InputSize=256-8    19.5MB/s ± 7%  46.0MB/s ± 4%  +135.56%  (p=0.000 n=10+9)
MergeJoiner/InputSize=4096-8   17.3MB/s ±19%  49.9MB/s ± 4%  +187.63%  (p=0.000 n=10+9)
MergeJoiner/InputSize=65536-8  19.4MB/s ±11%  46.6MB/s ± 6%  +140.58%  (p=0.000 n=10+10)

@changangela changangela requested a review from jordanlewis Oct 3, 2018

@changangela changangela requested review from cockroachdb/distsql-prs as code owners Oct 3, 2018

@cockroach-teamcity

Member

cockroach-teamcity commented Oct 3, 2018

This change is Reviewable

@changangela

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/stream_group_accumulator.go, line 120 at r1 (raw file):

			ret := s.curGroup[:n:n]
			s.curGroup = s.curGroup[:0]
			s.memAcc.ResizeTo(evalCtx.Context, int64(0))

Instead of resizing the memAcc to 0, we can achieve a significant improvement in speed if we reserve a minimum size for the MergeJoiner, i.e., we grow the account to merge_joiner_min_size at the start, and then s.memAcc.ResizeTo(evalCtx.Context, int64(merge_joiner_min_size)) here. Thoughts?

@jordanlewis

Member

jordanlewis commented Oct 3, 2018

Instead of resizing the memAcc to 0, we can achieve a significant improvement in speed if we reserve a minimum size for the MergeJoiner, i.e., we grow the account to merge_joiner_min_size at the start, and then s.memAcc.ResizeTo(evalCtx.Context, int64(merge_joiner_min_size)) here. Thoughts?

That does seem good, but why is that such an improvement?

@changangela

Since the MergeJoiner is constantly re-requesting blocks of memory, a lot of the overhead can be avoided if we "allocate" some memory right at the beginning of the merge join and hold onto it until the end. I've updated the code to have the streamGroupAccumulator allocate a single block of memory right at the start of the process, and updated the benchmark above.
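To make the reasoning above concrete, here is a minimal sketch of why holding a block for the lifetime of the join cuts down traffic to the monitor. It is not the code from this PR: `monitor`, `account`, `floor`, and `run` are hypothetical names, and the toy monitor only counts requests instead of tracking a budget.

```go
package main

import "fmt"

// monitor is a toy stand-in for a shared memory monitor. It only counts how
// many reserve/release requests reach it.
type monitor struct {
	requests int
}

func (m *monitor) reserve(n int64) { m.requests++ }
func (m *monitor) release(n int64) { m.requests++ }

// account is a toy stand-in for a per-processor memory account.
type account struct {
	mon       *monitor
	allocated int64 // bytes currently held from the monitor
	floor     int64 // bytes kept until close, even when usage drops to zero
}

// resizeTo adjusts the account to hold max(n, floor) bytes and only talks to
// the monitor when the held amount actually changes.
func (a *account) resizeTo(n int64) {
	target := n
	if target < a.floor {
		target = a.floor
	}
	switch {
	case target > a.allocated:
		a.mon.reserve(target - a.allocated)
	case target < a.allocated:
		a.mon.release(a.allocated - target)
	}
	a.allocated = target
}

// close returns everything the account still holds.
func (a *account) close() {
	if a.allocated > 0 {
		a.mon.release(a.allocated)
	}
	a.allocated = 0
}

// run simulates `groups` single-row groups of rowSize bytes. After each group
// is emitted the account is resized to zero, as in the r1 code under review.
// It returns the number of requests that reached the monitor.
func run(floor int64, groups int, rowSize int64) int {
	m := &monitor{}
	a := &account{mon: m, floor: floor}
	a.resizeTo(0) // picks up the floor, if any
	for i := 0; i < groups; i++ {
		a.resizeTo(rowSize)
		a.resizeTo(0)
	}
	a.close()
	return m.requests
}

func main() {
	fmt.Println("no floor:     ", run(0, 1000, 64))
	fmt.Println("one-row floor:", run(64, 1000, 64))
}
```

In this toy model the version without a floor makes two monitor requests per group, while the version with a floor makes two requests in total. The real benchmark deltas depend on much more than request counts, so treat this only as an illustration of the idea.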

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@changangela changangela requested a review from cockroachdb/sql-rest-prs as a code owner Oct 4, 2018

@jordanlewis jordanlewis requested a review from solongordon Oct 4, 2018

@solongordon

The general approach makes sense to me. Some questions about implementation details.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/stream_group_accumulator.go, line 120 at r1 (raw file):

Previously, changangela (Angela Chang) wrote…

Instead of resizing the memAcc to 0, we can achieve a significant improvement in speed if we reserve a minimum size for the MergeJoiner, i.e., we grow the account to merge_joiner_min_size at the start, and then s.memAcc.ResizeTo(evalCtx.Context, int64(merge_joiner_min_size)) here. Thoughts?

Could this same thing be achieved at the BytesMonitor level by reserving minAllocated up front?


pkg/sql/distsqlrun/stream_group_accumulator.go, line 63 at r2 (raw file):

func (s *streamGroupAccumulator) start(ctx context.Context) {
	s.src.Start(ctx)
	s.memAcc.AllocateMinimum(ctx, 1)

Could you explain the choice of 1 here? I'm wondering if this should be related to row size. Otherwise if you have rows larger than poolAllocationSize, your optimization will never kick in.


pkg/util/mon/bytes_usage.go, line 452 at r2 (raw file):

	// decreases as used increases (and vice-versa).
	reserved     int64
	minAllocated int64

Could use a comment here.


pkg/util/mon/bytes_usage.go, line 481 at r2 (raw file):

}

// AllocateMinimum allocates a minimum allocated size (reserved + used) for the

SetMinAllocated might be clearer.


pkg/util/mon/bytes_usage.go, line 484 at r2 (raw file):

// account. The account would not be able to Shrink() unless it has surpassed
// this minimum allocated value.
func (b *BoundAccount) AllocateMinimum(ctx context.Context, blocks int64) {

I wonder if blocks is the best interface here. It's hard to know what the right blocks value is without knowing what the pool allocation size is. Maybe better to just specify it in bytes?


pkg/util/mon/bytes_usage.go, line 486 at r2 (raw file):

func (b *BoundAccount) AllocateMinimum(ctx context.Context, blocks int64) {
	if blocks >= 0 {
		b.minAllocated = blocks * DefaultPoolAllocationSize

Should this be b.mon.poolAllocationSize?


pkg/util/mon/bytes_usage.go, line 501 at r2 (raw file):

	b.used = 0
	b.reserved = 0
	b.minAllocated = 0

I'm not sure if it makes sense to zero this out in Clear since it's more of a configuration setting than a state variable. I guess it depends on how Clear is generally used.

@changangela

Thanks @solongordon for the review!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/stream_group_accumulator.go, line 120 at r1 (raw file):

Previously, solongordon wrote…

Could this same thing be achieved at the BytesMonitor level by reserving minAllocated up front?

Hmm, the reason I thought it was best to put it into the BoundAccount instead of (*BytesMonitor).releaseBytes was that function calls are quite expensive in this part of the code (i.e., one extra level of function call decreases the benchmark speed by ~5%), and so it might take a little longer before Shrink decides to stop shrinking. Is this what you had in mind?


pkg/sql/distsqlrun/stream_group_accumulator.go, line 63 at r2 (raw file):

Previously, solongordon wrote…

Could you explain the choice of 1 here? I'm wondering if this should be related to row size. Otherwise if you have rows larger than poolAllocationSize, your optimization will never kick in.

Yeah, I was actually just investigating that because using 1 optimized the joiner significantly. It turns out that in Shrink(), we were doing a (vacuously true) check on:

	if b.reserved >= b.mon.poolAllocationSize {
		b.mon.releaseBytes(ctx, b.reserved-b.mon.poolAllocationSize)
		b.reserved = b.mon.poolAllocationSize
	}

And a very significant amount of overhead was due to the >=. If we change it to >, then the block size of 1 becomes irrelevant. I think it makes sense for minAllocated to be related to row size for the merge joiner though, will work on that.
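A small sketch of one way to read the `>=` versus `>` observation: when `reserved` already equals the pool allocation size, the `>=` form still calls into the monitor to release zero bytes. That interpretation is an assumption on my part, and the types below (`monitor`, `account`, `trim`, `countReleases`) are hypothetical stand-ins, not the real `BytesMonitor` or `BoundAccount`.

```go
package main

import "fmt"

// poolAllocationSize is an assumed chunk size for this sketch.
const poolAllocationSize = 10240

// monitor counts release calls and how many of them released nothing.
type monitor struct {
	releaseCalls  int
	zeroByteCalls int
}

func (m *monitor) releaseBytes(sz int64) {
	m.releaseCalls++
	if sz == 0 {
		m.zeroByteCalls++
	}
}

type account struct {
	mon      *monitor
	reserved int64
}

// trim mimics the check quoted above. With strict == false it uses >=,
// with strict == true it uses >.
func (a *account) trim(strict bool) {
	over := a.reserved >= poolAllocationSize
	if strict {
		over = a.reserved > poolAllocationSize
	}
	if over {
		a.mon.releaseBytes(a.reserved - poolAllocationSize)
		a.reserved = poolAllocationSize
	}
}

// countReleases trims an account that sits exactly at one pool chunk
// `shrinks` times and reports how often the monitor was called.
func countReleases(strict bool, shrinks int) (calls, zero int) {
	m := &monitor{}
	a := &account{mon: m, reserved: poolAllocationSize}
	for i := 0; i < shrinks; i++ {
		a.trim(strict)
	}
	return m.releaseCalls, m.zeroByteCalls
}

func main() {
	calls, zero := countReleases(false, 1000)
	fmt.Printf(">= : %d release calls, %d of them for zero bytes\n", calls, zero)
	calls, zero = countReleases(true, 1000)
	fmt.Printf(">  : %d release calls, %d of them for zero bytes\n", calls, zero)
}
```

In this model every call made under `>=` is a no-op release, which is consistent with the speedup reported here, though the sketch does not measure the actual cost of a monitor call.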


pkg/util/mon/bytes_usage.go, line 452 at r2 (raw file):

Previously, solongordon wrote…

Could use a comment here.

Done.


pkg/util/mon/bytes_usage.go, line 481 at r2 (raw file):

Previously, solongordon wrote…

SetMinAllocated might be clearer.

Done.


pkg/util/mon/bytes_usage.go, line 484 at r2 (raw file):

Previously, solongordon wrote…

I wonder if blocks is the best interface here. It's hard to know what the right blocks value is without knowing what the pool allocation size is. Maybe better to just specify it in bytes?

I think it might be important to enforce that the minAllocated is some multiple of poolAllocationSize because of the way Shrink() is related to poolAllocationSize. For example, if the client chooses 20479 (poolAllocationSize * 2 - 1) bytes, then this is logically equivalent to using 10240 (poolAllocationSize) which might be confusing. What do you think?


pkg/util/mon/bytes_usage.go, line 486 at r2 (raw file):

Previously, solongordon wrote…

Should this be b.mon.poolAllocationSize?

Done. I'm not too sure what the difference is :P


pkg/util/mon/bytes_usage.go, line 501 at r2 (raw file):

Previously, solongordon wrote…

I'm not sure if it makes sense to zero this out in Clear since it's more of a configuration setting than a state variable. I guess it depends on how Clear is generally used.

Done. Yeah that makes sense.

@solongordon

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/stream_group_accumulator.go, line 120 at r1 (raw file):

Previously, changangela (Angela Chang) wrote…

Hmm, the reason I thought it was best to put it into the BoundAccount instead of (*BytesMonitor).releaseBytes was that function calls are quite expensive in this part of the code (i.e., one extra level of function call decreases the benchmark speed by ~5%), and so it might take a little longer before Shrink decides to stop shrinking. Is this what you had in mind?

Sorry, I meant to say BoundAccount not BytesMonitor. I was just thinking that for your suggestion above, you could implement that in BoundAccount rather than the MergeJoiner itself and keep things simpler. Though perhaps you are trying to avoid the function calls to Grow?


pkg/util/mon/bytes_usage.go, line 484 at r2 (raw file):

Previously, changangela (Angela Chang) wrote…

I think it might be important to enforce that the minAllocated is some multiple of poolAllocationSize because of the way Shrink() is related to poolAllocationSize. For example, if the client chooses 20479 (poolAllocationSize * 2 - 1) bytes, then this is logically equivalent to using 10240 (poolAllocationSize) which might be confusing. What do you think?

Yeah, I see your point, though of the two I think it's more confusing to specify blocks without knowing what the block size is. And the concept of blocks is an implementation detail of the monitor so it feels weird to expose it. I'm not sure there's much harm in reserving more than specified, since that's already the case in calls to Grow.
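For the bytes-versus-blocks question, a rounding helper lets the caller pass bytes while the account still ends up holding a whole number of pool chunks. The sketch below is a guess at what such a helper could look like. `roundUp` is a hypothetical name, and the 10240-byte chunk is taken from the example figures earlier in this thread.

```go
package main

import "fmt"

// roundUp returns the smallest multiple of chunk that is >= sz.
// It assumes sz >= 0 and chunk > 0.
func roundUp(sz, chunk int64) int64 {
	chunks := (sz + chunk - 1) / chunk
	return chunks * chunk
}

func main() {
	const chunk = 10240 // pool allocation size used in the example above
	for _, sz := range []int64{0, 1, 10240, 20479} {
		fmt.Printf("requested %5d bytes -> held %5d bytes\n", sz, roundUp(sz, chunk))
	}
}
```

With round-up semantics a request for 20479 bytes holds 20480, so the caller never gets less than it asked for, and a request for 0 stays at 0.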


pkg/util/mon/bytes_usage.go, line 486 at r2 (raw file):

Previously, changangela (Angela Chang) wrote…

Done. I'm not too sure what the difference is :P

Looks like the default value can be overridden via the increment parameter in MakeMonitor.

@solongordon

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/stream_group_accumulator.go, line 120 at r1 (raw file):

Previously, solongordon wrote…

Sorry, I meant to say BoundAccount not BytesMonitor. I was just thinking that for your suggestion above, you could implement that in BoundAccount rather than the MergeJoiner itself and keep things simpler. Though perhaps you are trying to avoid the function calls to Grow?

Oops, didn't realize the original comment referred to what was already implemented. Disregard.

@changangela

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/stream_group_accumulator.go, line 63 at r2 (raw file):

Previously, changangela (Angela Chang) wrote…

Yeah, I was actually just investigating that because using 1 optimized the joiner significantly. It turns out that in Shrink(), we were doing a (vacuously true) check on:

	if b.reserved >= b.mon.poolAllocationSize {
		b.mon.releaseBytes(ctx, b.reserved-b.mon.poolAllocationSize)
		b.reserved = b.mon.poolAllocationSize
	}

And a very significant amount of overhead was due to the >=. If we change it to >, then the block size of 1 becomes irrelevant. I think it makes sense for minAllocated to be related to row size for the merge joiner though, will work on that.

Done.


pkg/util/mon/bytes_usage.go, line 484 at r2 (raw file):

Previously, solongordon (Solon) wrote…

Yeah, I see your point, though of the two I think it's more confusing to specify blocks without knowing what the block size is. And the concept of blocks is an implementation detail of the monitor so it feels weird to expose it. I'm not sure there's much harm in reserving more than specified, since that's already the case in calls to Grow.

Done.

@changangela

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/util/mon/bytes_usage.go, line 501 at r2 (raw file):

Previously, changangela (Angela Chang) wrote…

Done. Yeah that makes sense.

Since we're now reserving the minAllocated memory right on the call to SetMinAllocated, I think it makes sense to reset minAllocated when used + reserved = 0.

@solongordon

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/stream_group_accumulator.go, line 126 at r3 (raw file):

			s.curGroup = s.curGroup[:0]

			if totalSize > s.memAcc.Allocated() {

Hm, I'm worried this approach is too greedy now. If I understand this right, you now have a BoundAccount which only ever reserves more bytes without releasing any until it is closed. That means if there are many small groups with one large one, we'll reserve the large group size until the merge join is complete.

Maybe there is a middle ground, like setting minAllocated to the size of one row?
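The concern can be illustrated with a toy simulation that tracks how many bytes stay held after each group is emitted, first when the account never releases and then when it falls back to a one-row floor. This is a sketch under simplified assumptions (fixed row size, no pool rounding). `heldAfterEachGroup` and its parameters are hypothetical names.

```go
package main

import "fmt"

// heldAfterEachGroup returns the bytes still held by the account after each
// group has been emitted. groups lists the number of rows per group.
// With keepPeak the account never gives memory back until it is closed.
// Otherwise it drops back to a floor of one row after every group.
func heldAfterEachGroup(groups []int64, rowSize int64, keepPeak bool) []int64 {
	held := make([]int64, 0, len(groups))
	var allocated int64
	for _, rows := range groups {
		need := rows * rowSize
		if need > allocated {
			allocated = need // grow to fit the current group
		}
		if !keepPeak {
			allocated = rowSize // group emitted: keep only the one-row floor
		}
		held = append(held, allocated)
	}
	return held
}

func main() {
	groups := []int64{2, 2, 100, 2} // many small groups and one large one
	fmt.Println("never release:", heldAfterEachGroup(groups, 64, true))
	fmt.Println("one-row floor:", heldAfterEachGroup(groups, 64, false))
}
```

In the never-release variant the account is still holding the large group's bytes after the final small group, which is the greedy behavior described above.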

@changangela

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/stream_group_accumulator.go, line 126 at r3 (raw file):

Previously, solongordon (Solon) wrote…

Hm, I'm worried this approach is too greedy now. If I understand this right, you now have a BoundAccount which only ever reserves more bytes without releasing any until it is closed. That means if there are many small groups with one large one, we'll reserve the large group size until the merge join is complete.

Maybe there is a middle ground, like setting minAllocated to the size of one row?

Done.

@changangela

Fixed SetMinAllocated to release bytes if the new minAllocated shrinks below the current used. Also added some tests for the bytes_usage functions.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@solongordon

Nice, a few last comments but this is looking good.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/stream_group_accumulator.go, line 59 at r4 (raw file):

		ordering:        ordering,
		memAcc:          memMonitor.MakeBoundAccount(),
		minAllocatedSet: false,

Nit: This line isn't really necessary since false is the zero value for bools. No harm having it though.


pkg/util/mon/bytes_usage.go, line 491 at r4 (raw file):

	if size >= 0 {
		b.minAllocated = b.mon.roundSize(size)
	}

You probably want to set minAllocated = 0 if size == 0, right?


pkg/util/mon/bytes_usage.go, line 508 at r4 (raw file):

		b.mon.releaseBytes(ctx, released)
		b.reserved -= released
	}

The logic above looks good to me after some staring, though it would be nice to add some basic unit tests to cover the different cases here, explicitly checking that the b.reserved value is as expected after SetMinAllocated is called.

@changangela

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/util/mon/bytes_usage.go, line 491 at r4 (raw file):

Previously, solongordon (Solon) wrote…

You probably want to set minAllocated = 0 if size == 0, right?

Yeah, mon.roundSize should take care of that properly.


pkg/util/mon/bytes_usage.go, line 508 at r4 (raw file):

Previously, solongordon (Solon) wrote…

The logic above looks good to me after some staring, though it would be nice to add some basic unit tests to cover the different cases here, explicitly checking that the b.reserved value is as expected after SetMinAllocated is called.

I've added some tests to cover the different logic paths in bytes_usage_test.

distsqlrun: optimize merge joiner memory account usage
Instead of opening and closing the memory account on every single row of the
merge join, we can allocate a certain minimum block size on the memory account.
This way, we are not constantly requesting and releasing memory.

Release note: None
@changangela

Updated the benchmark results in the comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@solongordon

:lgtm:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@changangela

Collaborator

changangela commented Oct 9, 2018

bors r+

craig bot pushed a commit that referenced this pull request Oct 9, 2018

Merge #30924
30924: distsqlrun: optimize merge joiner memory account usage r=changangela a=changangela

Co-authored-by: changangela <angelachang27@gmail.com>
@craig

craig bot commented Oct 9, 2018

Build succeeded

@craig craig bot merged commit 279c2d8 into cockroachdb:master Oct 9, 2018

3 checks passed

GitHub CI (Cockroach): TeamCity build finished
bors: Build succeeded
license/cla: Contributor License Agreement is signed.
@changangela

Collaborator

changangela commented Oct 9, 2018

@knz we are thinking of backporting this commit, would you be able to take a look just in case we missed something?

@knz

Member

knz commented Oct 9, 2018

will look tomorrow

@knz

Friends, there are several things here that need some additional work before I can even start to evaluate it for a backport.

I appreciate this is still early steps for Angela so please assume kindness in my words, I promise we'll take this easy and make it an opportunity to learn. Cheers!

The changes I am strongly suggesting (let me be honest, it's "requesting" at this point) can be brought in a separate PR. Ping me when that's created, I'll continue the discussion there.

Reviewed 2 of 5 files at r2, 1 of 4 files at r3, 3 of 3 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/util/mon/bytes_usage.go, line 453 at r5 (raw file):

	reserved int64
	// minAllocated is a minimum allocated bytes size that the account reach
	// before being able to release bytes.

This comment is:

  1. unclear about what this is trying to achieve (I don't understand the sentence)

  2. is not in the right place: the comment above (for BoundAccount) and the comment above that (at the start of the file) must thoroughly document the "big picture" and the protocol to use these data structures. This PR is making a rather major change in the protocol, and it is therefore absolutely required to update the module doc to reflect this change. The extended commentS will need exampleS to

    a) justify this change
    b) show how to use the new interface

  3. the commentS you are going to use must describe more-or-less axiomatically the mathematical relationship between reserved, used, minAllocated, the size requested to the monitor so far. I don't see this relationship stated/specified anywhere, so I can't evaluate whether your code preserves the required invariants.

  4. in general - this single one-liner here is increasing the size of a pretty common data structure by 25%. I'd like to see alloc benchmarks beyond the single-use test you've added, for example a couple of the standard SQL benchmarks, to properly confirm this addition is amortized throughout CockroachDB.
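As an illustration of what an explicitly stated relationship could look like, here is one possible set of invariants written as a checker. These are assumptions, not the module's documented contract. They are inferred only from the description of Allocated as "reserved + used" and from minAllocated being reserved when it is set. `accountState`, `requested`, and `checkInvariants` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// accountState is a hypothetical snapshot of an account's counters.
// requested is the net number of bytes obtained from the monitor so far.
type accountState struct {
	used         int64
	reserved     int64
	minAllocated int64
	requested    int64
}

// checkInvariants verifies one candidate set of invariants:
//  1. all counters are non-negative
//  2. used + reserved equals what has been requested from the monitor
//  3. used + reserved never drops below minAllocated
func checkInvariants(s accountState) error {
	if s.used < 0 || s.reserved < 0 || s.minAllocated < 0 {
		return errors.New("used, reserved and minAllocated must be non-negative")
	}
	allocated := s.used + s.reserved
	if allocated != s.requested {
		return fmt.Errorf("used+reserved = %d, but %d bytes were requested from the monitor",
			allocated, s.requested)
	}
	if allocated < s.minAllocated {
		return fmt.Errorf("used+reserved = %d is below minAllocated = %d",
			allocated, s.minAllocated)
	}
	return nil
}

func main() {
	ok := accountState{used: 100, reserved: 10140, minAllocated: 10240, requested: 10240}
	bad := accountState{used: 100, reserved: 0, minAllocated: 10240, requested: 100}
	fmt.Println("ok: ", checkInvariants(ok))
	fmt.Println("bad:", checkInvariants(bad))
}
```

A checker like this could be called from unit tests after each operation. Whether these are the right invariants is exactly the question the review asks the author to answer in the module documentation.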


pkg/util/mon/bytes_usage.go, line 475 at r5 (raw file):

// Allocated returns the total number of bytes which this account is using or
// reserving

period missing.


pkg/util/mon/bytes_usage.go, line 486 at r5 (raw file):

// SetMinAllocated allocates a minimum Allocated size (reserved + used) for the
// account. The account would not be able to Shrink() unless it has surpassed

"The account would not be able to ..."

  1. what happens if one tries?

  2. why is this restriction put in place? (see my comment above)


pkg/util/mon/bytes_usage.go, line 490 at r5 (raw file):

func (b *BoundAccount) SetMinAllocated(ctx context.Context, size int64) error {
	if size < 0 {
		panic(fmt.Sprintf("%s: cannot set bound account min allocated to a negative value",

Don't use panic in functions that return error. Instead use pgerror.NewErrorf(pgerror.CodeInternalError, "programming error: ...")
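A minimal sketch of the shape being asked for: the function reports the bad argument through its error return instead of panicking. It uses only the standard library. In CockroachDB the error would be built with the pgerror helper named above, which is not reproduced here. The `account` type and `setMinAllocated` below are hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// errNegativeMin marks the programming error so callers can test for it.
var errNegativeMin = errors.New("programming error: cannot set min allocated to a negative value")

// account is a toy stand-in for a bound account.
type account struct {
	name         string
	minAllocated int64
}

// setMinAllocated validates its argument and returns an error rather than
// panicking. On error the account is left unchanged.
func (a *account) setMinAllocated(size int64) error {
	if size < 0 {
		return fmt.Errorf("%s: %w (got %d)", a.name, errNegativeMin, size)
	}
	a.minAllocated = size
	return nil
}

func main() {
	a := &account{name: "merge-joiner"}
	if err := a.setMinAllocated(-1); err != nil {
		fmt.Println("rejected:", err)
	}
	if err := a.setMinAllocated(10240); err == nil {
		fmt.Println("minAllocated is now", a.minAllocated)
	}
}
```

Returning the error keeps the failure on the same path as every other failure of the function, so the caller handles it in one place.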


pkg/util/mon/bytes_usage.go, line 493 at r5 (raw file):

			b.mon.name))
	}
	b.minAllocated = b.mon.roundSize(size)

You may have noticed that every function larger than ~10 LOCs in this file has comments in the body of the function to explain what is going on.

Aside (not specific to this code, but to give you insight into the culture): the purpose of exhaustive comments (and exhaustive commit messages, and PR descriptions) is not to just be nice and polite. The purpose is to establish a baseline about your own expectations (as a human) about what you (as a human) intend your code to do. Then the reviewer can compare your stated intent with the actual realization, and properly recognize any divergences as bugs.

The problem we're trying to avoid this way is when the code appears correct on its face but is not actually what you intended. Unless you tell us what you intend these bugs would survive review.

So yeah before I can review all this work I need a lot more explanation (in writing! in the code, commit message and PR description) about:

  • what you're intending to do here, why you're intending it
  • what you are doing and how you expect it to work
  • how life is going to be different for users of this module from now on

pkg/util/mon/bytes_usage.go, line 661 at r5 (raw file):

// reserveBytes().
func (mm *BytesMonitor) releaseBytes(ctx context.Context, sz int64) {
	if sz == 0 {

Please justify this conditional. When this code was designed, the sz = 0 case was sufficiently uncommon.

When you add a conditional on a hot path, you're making every call of that method pay the price of this conditional, even when the condition is rarely taken. The benefit when the condition is true must be so large that it properly outweighs/amortizes the cost paid on every other call. Is that the case here?

@changangela

Collaborator

changangela commented Oct 10, 2018

Thanks @knz for the review! I am working on the changes and will create another PR.

changangela added a commit to changangela/cockroach that referenced this pull request Oct 10, 2018

Revert "Merge cockroachdb#30924"
This reverts commit 472d36f, reversing
changes made to 6b912fa.

Release note: None

craig bot pushed a commit that referenced this pull request Oct 11, 2018

Merge #31216 #31219
31216: sql: add merge joiner benchmark and bytes usage optimization r=changangela a=changangela

Reverted #30924 for now to compare different approaches (discussed in #31191). This PR is mainly for adding some merge joiner benchmarks as well as a small change in `BoundAccount.Shrink()` that significantly improves the merge joiner performance. This way, we can safely backport this change.

MergeJoinerBenchmark against `release-2.1`:

```
name                           old time/op    new time/op    delta
MergeJoiner/InputSize=0-8        4.31µs ±10%    4.33µs ±15%      ~     (p=0.968 n=10+9)
MergeJoiner/InputSize=4-8        8.34µs ± 3%    8.10µs ± 4%      ~     (p=0.074 n=8+9)
MergeJoiner/InputSize=16-8       16.3µs ± 2%    11.1µs ± 5%   -31.87%  (p=0.000 n=9+9)
MergeJoiner/InputSize=256-8       190µs ± 2%      85µs ± 2%   -55.04%  (p=0.000 n=10+8)
MergeJoiner/InputSize=4096-8     2.96ms ± 2%    1.28ms ± 2%   -56.86%  (p=0.000 n=10+10)
MergeJoiner/InputSize=65536-8    49.0ms ± 5%    20.6ms ± 1%   -57.88%  (p=0.000 n=10+10)

name                           old alloc/op   new alloc/op   delta
MergeJoiner/InputSize=0-8        6.42kB ± 0%    6.65kB ± 0%    +3.49%  (p=0.000 n=10+10)
MergeJoiner/InputSize=4-8        9.50kB ± 0%    9.72kB ± 0%    +2.36%  (p=0.000 n=10+10)
MergeJoiner/InputSize=16-8       9.50kB ± 0%    9.72kB ± 0%    +2.36%  (p=0.000 n=10+10)
MergeJoiner/InputSize=256-8      32.5kB ± 0%    32.8kB ± 0%    +0.69%  (p=0.000 n=10+10)
MergeJoiner/InputSize=4096-8      401kB ± 0%     401kB ± 0%    +0.06%  (p=0.000 n=9+10)
MergeJoiner/InputSize=65536-8    6.30MB ± 0%    6.30MB ± 0%    +0.00%  (p=0.000 n=9+10)

name                           old allocs/op  new allocs/op  delta
MergeJoiner/InputSize=0-8          14.0 ± 0%      15.0 ± 0%    +7.14%  (p=0.000 n=10+10)
MergeJoiner/InputSize=4-8          16.0 ± 0%      17.0 ± 0%    +6.25%  (p=0.000 n=10+10)
MergeJoiner/InputSize=16-8         16.0 ± 0%      17.0 ± 0%    +6.25%  (p=0.000 n=10+10)
MergeJoiner/InputSize=256-8        46.0 ± 0%      47.0 ± 0%    +2.17%  (p=0.000 n=10+10)
MergeJoiner/InputSize=4096-8        526 ± 0%       527 ± 0%    +0.19%  (p=0.000 n=10+10)
MergeJoiner/InputSize=65536-8     8.21k ± 0%     8.21k ± 0%    +0.01%  (p=0.000 n=10+10)

name                           old speed      new speed      delta
MergeJoiner/InputSize=4-8      7.67MB/s ± 3%  7.91MB/s ± 4%      ~     (p=0.070 n=8+9)
MergeJoiner/InputSize=16-8     15.7MB/s ± 2%  23.0MB/s ± 5%   +46.89%  (p=0.000 n=9+9)
MergeJoiner/InputSize=256-8    21.6MB/s ± 2%  48.0MB/s ± 2%  +122.41%  (p=0.000 n=10+8)
MergeJoiner/InputSize=4096-8   22.1MB/s ± 2%  51.3MB/s ± 2%  +131.81%  (p=0.000 n=10+10)
MergeJoiner/InputSize=65536-8  21.4MB/s ± 5%  50.8MB/s ± 1%  +137.16%  (p=0.000 n=10+10)
```

MergeJoinerBenchmark compared with `master` (`master` already has this exact optimization, we want to ensure that the performance did not deteriorate)
```
name                                          old time/op    new time/op    delta
MergeJoiner/InputSize=0-8                       4.61µs ± 9%    4.45µs ± 5%     ~     (p=0.060 n=10+10)
MergeJoiner/InputSize=4-8                       8.34µs ± 6%    8.00µs ±10%   -4.09%  (p=0.037 n=9+10)
MergeJoiner/InputSize=16-8                      11.6µs ± 4%    11.4µs ± 5%     ~     (p=0.123 n=10+10)
MergeJoiner/InputSize=256-8                     88.3µs ± 3%    89.8µs ± 6%     ~     (p=0.258 n=9+9)
MergeJoiner/InputSize=4096-8                    1.33ms ± 4%    1.27ms ± 5%   -4.67%  (p=0.001 n=9+10)
MergeJoiner/InputSize=65536-8                   22.4ms ± 6%    21.3ms ±10%     ~     (p=0.052 n=10+10)
MergeJoiner/OneSideRepeatInputSize=0-8          4.57µs ±15%    4.38µs ± 6%     ~     (p=0.353 n=10+10)
MergeJoiner/OneSideRepeatInputSize=4-8          7.71µs ±10%    7.71µs ± 4%     ~     (p=0.549 n=10+9)
MergeJoiner/OneSideRepeatInputSize=16-8         11.8µs ±29%    10.5µs ± 6%  -10.77%  (p=0.043 n=10+10)
MergeJoiner/OneSideRepeatInputSize=256-8        82.8µs ± 4%    80.3µs ± 5%   -2.93%  (p=0.004 n=10+10)
MergeJoiner/OneSideRepeatInputSize=4096-8       1.25ms ± 2%    1.54ms ±13%  +23.09%  (p=0.000 n=9+10)
MergeJoiner/OneSideRepeatInputSize=65536-8      24.2ms ± 3%    26.7ms ± 9%  +10.08%  (p=0.000 n=10+9)
MergeJoiner/BothSidesRepeatInputSize=0-8        4.60µs ±10%    4.36µs ±10%     ~     (p=0.063 n=10+10)
MergeJoiner/BothSidesRepeatInputSize=4-8        7.13µs ± 4%    7.62µs ±17%   +6.84%  (p=0.005 n=9+10)
MergeJoiner/BothSidesRepeatInputSize=16-8       8.66µs ±14%    8.24µs ± 3%     ~     (p=0.549 n=10+9)
MergeJoiner/BothSidesRepeatInputSize=256-8      22.1µs ± 5%    23.2µs ± 9%   +5.06%  (p=0.004 n=9+10)
MergeJoiner/BothSidesRepeatInputSize=4096-8      219µs ± 4%     240µs ±25%     ~     (p=0.065 n=9+10)
MergeJoiner/BothSidesRepeatInputSize=65536-8    1.19ms ± 3%    1.17ms ± 1%   -2.18%  (p=0.001 n=10+9)

name                                          old alloc/op   new alloc/op   delta
MergeJoiner/InputSize=0-8                       6.66kB ± 0%    6.65kB ± 0%   -0.24%  (p=0.000 n=10+10)
MergeJoiner/InputSize=4-8                       9.74kB ± 0%    9.72kB ± 0%   -0.16%  (p=0.000 n=10+10)
MergeJoiner/InputSize=16-8                      9.74kB ± 0%    9.72kB ± 0%   -0.16%  (p=0.000 n=10+10)
MergeJoiner/InputSize=256-8                     32.8kB ± 0%    32.8kB ± 0%   -0.05%  (p=0.000 n=10+10)
MergeJoiner/InputSize=4096-8                     401kB ± 0%     401kB ± 0%   -0.00%  (p=0.000 n=10+10)
MergeJoiner/InputSize=65536-8                   6.30MB ± 0%    6.30MB ± 0%   -0.00%  (p=0.000 n=10+10)
MergeJoiner/OneSideRepeatInputSize=0-8          6.66kB ± 0%    6.65kB ± 0%   -0.24%  (p=0.000 n=10+10)
MergeJoiner/OneSideRepeatInputSize=4-8          9.74kB ± 0%    9.72kB ± 0%   -0.16%  (p=0.000 n=10+10)
MergeJoiner/OneSideRepeatInputSize=16-8         9.74kB ± 0%    9.72kB ± 0%   -0.16%  (p=0.000 n=10+10)
MergeJoiner/OneSideRepeatInputSize=256-8        42.0kB ± 0%    42.0kB ± 0%   -0.04%  (p=0.000 n=10+10)
MergeJoiner/OneSideRepeatInputSize=4096-8        751kB ± 0%     751kB ± 0%   -0.00%  (p=0.000 n=10+10)
MergeJoiner/OneSideRepeatInputSize=65536-8      15.5MB ± 0%    15.5MB ± 0%   -0.00%  (p=0.000 n=10+10)
MergeJoiner/BothSidesRepeatInputSize=0-8        6.66kB ± 0%    6.65kB ± 0%   -0.24%  (p=0.000 n=10+10)
MergeJoiner/BothSidesRepeatInputSize=4-8        9.74kB ± 0%    9.72kB ± 0%   -0.16%  (p=0.000 n=10+10)
MergeJoiner/BothSidesRepeatInputSize=16-8       9.74kB ± 0%    9.72kB ± 0%   -0.16%  (p=0.000 n=10+10)
MergeJoiner/BothSidesRepeatInputSize=256-8      9.74kB ± 0%    9.72kB ± 0%   -0.16%  (p=0.000 n=10+10)
MergeJoiner/BothSidesRepeatInputSize=4096-8     14.3kB ± 0%    14.3kB ± 0%   -0.11%  (p=0.000 n=10+10)
MergeJoiner/BothSidesRepeatInputSize=65536-8    38.9kB ± 0%    38.9kB ± 0%   -0.04%  (p=0.000 n=10+10)

name                                          old allocs/op  new allocs/op  delta
MergeJoiner/InputSize=0-8                         15.0 ± 0%      15.0 ± 0%     ~     (all equal)
MergeJoiner/InputSize=4-8                         17.0 ± 0%      17.0 ± 0%     ~     (all equal)
MergeJoiner/InputSize=16-8                        17.0 ± 0%      17.0 ± 0%     ~     (all equal)
MergeJoiner/InputSize=256-8                       47.0 ± 0%      47.0 ± 0%     ~     (all equal)
MergeJoiner/InputSize=4096-8                       527 ± 0%       527 ± 0%     ~     (all equal)
MergeJoiner/InputSize=65536-8                    8.21k ± 0%     8.21k ± 0%     ~     (all equal)
MergeJoiner/OneSideRepeatInputSize=0-8            15.0 ± 0%      15.0 ± 0%     ~     (all equal)
MergeJoiner/OneSideRepeatInputSize=4-8            17.0 ± 0%      17.0 ± 0%     ~     (all equal)
MergeJoiner/OneSideRepeatInputSize=16-8           17.0 ± 0%      17.0 ± 0%     ~     (all equal)
MergeJoiner/OneSideRepeatInputSize=256-8          49.0 ± 0%      49.0 ± 0%     ~     (all equal)
MergeJoiner/OneSideRepeatInputSize=4096-8          536 ± 0%       536 ± 0%     ~     (all equal)
MergeJoiner/OneSideRepeatInputSize=65536-8       8.23k ± 0%     8.23k ± 0%     ~     (all equal)
MergeJoiner/BothSidesRepeatInputSize=0-8          15.0 ± 0%      15.0 ± 0%     ~     (all equal)
MergeJoiner/BothSidesRepeatInputSize=4-8          17.0 ± 0%      17.0 ± 0%     ~     (all equal)
MergeJoiner/BothSidesRepeatInputSize=16-8         17.0 ± 0%      17.0 ± 0%     ~     (all equal)
MergeJoiner/BothSidesRepeatInputSize=256-8        17.0 ± 0%      17.0 ± 0%     ~     (all equal)
MergeJoiner/BothSidesRepeatInputSize=4096-8       23.0 ± 0%      23.0 ± 0%     ~     (all equal)
MergeJoiner/BothSidesRepeatInputSize=65536-8      49.0 ± 0%      49.0 ± 0%     ~     (all equal)

name                                          old speed      new speed      delta
MergeJoiner/InputSize=4-8                     7.68MB/s ± 6%  8.02MB/s ± 9%   +4.43%  (p=0.037 n=9+10)
MergeJoiner/InputSize=16-8                    22.2MB/s ± 4%  22.6MB/s ± 5%     ~     (p=0.123 n=10+10)
MergeJoiner/InputSize=256-8                   46.4MB/s ± 3%  45.6MB/s ± 5%     ~     (p=0.231 n=9+9)
MergeJoiner/InputSize=4096-8                  49.2MB/s ± 4%  51.6MB/s ± 5%   +4.93%  (p=0.001 n=9+10)
MergeJoiner/InputSize=65536-8                 46.8MB/s ± 6%  49.3MB/s ± 9%     ~     (p=0.052 n=10+10)
MergeJoiner/OneSideRepeatInputSize=4-8        8.32MB/s ± 9%  8.31MB/s ± 4%     ~     (p=0.549 n=10+9)
MergeJoiner/OneSideRepeatInputSize=16-8       22.1MB/s ±24%  24.3MB/s ± 6%  +10.32%  (p=0.037 n=10+10)
MergeJoiner/OneSideRepeatInputSize=256-8      49.5MB/s ± 4%  51.0MB/s ± 5%   +3.03%  (p=0.003 n=10+10)
MergeJoiner/OneSideRepeatInputSize=4096-8     52.5MB/s ± 2%  42.8MB/s ±12%  -18.43%  (p=0.000 n=9+10)
MergeJoiner/OneSideRepeatInputSize=65536-8    43.3MB/s ± 3%  39.4MB/s ± 9%   -8.94%  (p=0.000 n=10+9)
MergeJoiner/BothSidesRepeatInputSize=4-8      8.98MB/s ± 4%  8.45MB/s ±15%   -5.96%  (p=0.005 n=9+10)
MergeJoiner/BothSidesRepeatInputSize=16-8     29.7MB/s ±13%  31.1MB/s ± 3%     ~     (p=0.549 n=10+9)
MergeJoiner/BothSidesRepeatInputSize=256-8     186MB/s ± 5%   177MB/s ± 8%   -4.73%  (p=0.004 n=9+10)
MergeJoiner/BothSidesRepeatInputSize=4096-8    299MB/s ± 3%   276MB/s ±21%     ~     (p=0.065 n=9+10)
MergeJoiner/BothSidesRepeatInputSize=65536-8   878MB/s ± 3%   898MB/s ± 1%   +2.22%  (p=0.001 n=10+9)
```

31219: kubernetes: Update request-cert image version to include recent fix r=a-robinson a=a-robinson

See cockroachdb/k8s#14

Release note: None

Co-authored-by: changangela <angelachang27@gmail.com>
Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>