
storage/bulk: use slab buffer for kv pairs #36743

Merged
merged 3 commits into cockroachdb:master from dt:slab on Apr 16, 2019

Conversation

@dt
Member

commented Apr 11, 2019

Currently the BufferingAdder uses a slice of kvPair to buffer key/value
pairs, where each element has a key and a value byte slice. These two
slices mean the overhead per buffered pair is 48 bytes.

This overhead is actually a significant fraction -- or even a multiple --
of the total size of the buffered data in some cases, particularly for
index-encoded data, where values are often nil.

This adds a new slab-backed buffer, replacing the key and value slices
in each entry with key and value uint64s, each of which indexes into a
single slab: each uint64 contains the offset into the slab and the length
of the key/value, packed together into the high and low bits.

This buffer adds only 16 bytes of overhead per k/v pair stored, compared
to the 48 bytes of keeping individual slices, and is still sortable by
sorting the slice of offset/len pairs.

The packed offset and len impose a maximum size on those fields -- in
our case the total buffer cannot be more than 32GiB and the length of any
single key or value cannot exceed 512MiB.
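
For illustration, a minimal sketch of this packing scheme -- the 35/29-bit
split is inferred from the 32GiB/512MiB limits above, and these names are
hypothetical rather than the PR's actual code:

const (
	lenBits = 29             // low 29 bits: length (max 512MiB - 1)
	lenMask = 1<<lenBits - 1 // high 35 bits: slab offset (max 32GiB - 1)
)

// pack encodes a slab offset and a length into a single uint64.
func pack(offset, length int) uint64 {
	return uint64(offset)<<lenBits | uint64(length)
}

// unpack recovers the offset and length from a packed uint64.
func unpack(span uint64) (offset, length int) {
	return int(span >> lenBits), int(span & lenMask)
}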

The batcher already needs to track the last key it added and takes sorted
input, making it easy to check for duplicates there.

While I'm here: given that it takes sorted input, we don't need to compare
each key to the start and end keys to decide whether to update them -- if
start is not set, this is the first key and thus, by definition (given the
sorted input), the start key, and the most recent key is similarly, by
definition, the end key.
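
A minimal sketch of that batcher-side bookkeeping, with hypothetical names
(the real logic lives in sst_batcher.go):

import (
	"bytes"
	"fmt"
)

type batcher struct {
	startKey, endKey, lastKey []byte
}

// add assumes keys arrive in sorted order, so duplicates are always
// adjacent and a single last-key comparison detects them.
func (b *batcher) add(key []byte) error {
	if b.lastKey != nil && bytes.Equal(key, b.lastKey) {
		return fmt.Errorf("duplicate key: %q", key)
	}
	if b.startKey == nil {
		// The first key of sorted input is, by definition, the start key.
		b.startKey = append([]byte(nil), key...)
	}
	// The most recent key of sorted input is, by definition, the end key.
	b.endKey = append(b.endKey[:0], key...)
	b.lastKey = append(b.lastKey[:0], key...)
	return nil
}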

This uses the slab-backed key/value buffer to store the buffered kvs
instead of individual slices, cutting the per-pair overhead from 48 bytes
to 16.

Release note (performance improvement): improved memory efficiency of
bulk ingestion and index backfill buffering.

dt added some commits Apr 4, 2019

storage/bulk: add slab-backed kv buffer
Currently the BufferingAdder uses a slice of kvPair to buffer key/value
pairs, where each element has a key and a value byte slice. These two
slices mean the overhead per buffered pair is 48 bytes.

This overhead is actually a significant fraction -- or even a multiple --
of the total size of the buffered data in some cases, particularly for
index-encoded data, where values are often nil.

This adds a new slab-backed buffer, replacing the key and value slices
in each entry with key and value uint64s, each of which indexes into a
single slab: each uint64 contains the offset into the slab and the length
of the key/value, packed together into the high and low bits.

This buffer adds only 16 bytes of overhead per k/v pair stored, compared
to the 48 bytes of keeping individual slices, and is still sortable by
sorting the slice of offset/len pairs.

The packed offset and len impose a maximum size on those fields -- in
our case the total buffer cannot be more than 32GiB and the length of any
single key or value cannot exceed 512MiB.

Release note: none.
storage/bulk: push duplicate skipping/errors to batcher
The batcher already needs to track the last key it added and takes sorted
input, making it easy to check for duplicates there.

While I'm here: given that it takes sorted input, we don't need to compare
each key to the start and end keys to decide whether to update them -- if
start is not set, this is the first key and thus, by definition (given the
sorted input), the start key, and the most recent key is similarly, by
definition, the end key.

Release note: none.

@dt dt requested review from jordanlewis, lucy-zhang and vivekmenezes Apr 11, 2019

@dt dt requested a review from cockroachdb/core-prs as a code owner Apr 11, 2019

@cockroach-teamcity

Member

commented Apr 11, 2019

This change is Reviewable

@dt dt changed the title from Slab to storage/bulk: use slab buffer for kv pairs Apr 11, 2019

@vivekmenezes
Collaborator

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @jordanlewis, @lucy-zhang, and @vivekmenezes)


pkg/storage/bulk/kv_buf.go, line 29 at r3 (raw file):

	keySpan uint64
	valSpan uint64
}

I believe you need three fields (location, key-length, value-length): (36, 28, 28 bits) < 12 bytes. I'd argue that the key length can be smaller but I see no space savings from that. So perhaps

type kVBufEntry struct {
	keySpan   uint64
	valLength uint32
}

@dt
Member Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @lucy-zhang, and @vivekmenezes)


pkg/storage/bulk/kv_buf.go, line 29 at r3 (raw file):

Previously, vivekmenezes wrote…

I believe you need three fields (location, key-length, value-length): (36, 28, 28 bits) < 12 bytes. I'd argue that the key length can be smaller but I see no space savings from that. So perhaps

type kVBufEntry struct {
	keySpan   uint64
	valLength uint32
}

Yeah, I thought about that, but I don't know if it's quite that easy: AFAIK, the compiler pads the struct up to a multiple of the alignment of its largest field -- so if we shrink val to 4 bytes while key is still 8, I think the compiler pads us back up to 16.

We could potentially do 3 x uint32 -- even pretty simply, if we just wanted to trade away maximum buffer size -- but I figured I'd start simple; we can pursue further optimizations if this actually yields a significant return and indicates this is a good place to invest more effort.
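
For what it's worth, the padding point is easy to check directly; a small standalone sketch (not code from this PR):

package main

import (
	"fmt"
	"unsafe"
)

type mixed struct { // the proposed layout: 8 + 4 = 12 logical bytes
	keySpan   uint64
	valLength uint32
}

type packed struct { // the PR's layout: 8 + 8 = 16 bytes
	keySpan uint64
	valSpan uint64
}

func main() {
	// On 64-bit platforms both print 16: mixed is padded up to a multiple
	// of uint64's 8-byte alignment, so the uint32 saves nothing.
	fmt.Println(unsafe.Sizeof(mixed{}), unsafe.Sizeof(packed{}))
}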

@vivekmenezes

Collaborator

commented Apr 11, 2019

Thanks, will take a look. I see it's not compiling.

@dt dt force-pushed the dt:slab branch 2 times, most recently from 3781255 to c489cbd Apr 11, 2019

@vivekmenezes
Collaborator

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @jordanlewis, and @lucy-zhang)


pkg/storage/bulk/buffering_adder.go, line 104 at r4 (raw file):

// Flush flushes any buffered kvs to the batcher.
func (b *BufferingAdder) Flush(ctx context.Context) error {
	if len(b.curBuf.entries) == 0 {

b.curBuf.Len()


pkg/storage/bulk/kv_buf.go, line 52 at r4 (raw file):

)

func (b *kvBuf) append(k, v []byte) {

I think we should have this method return an error and not panic within this code.


pkg/storage/bulk/kv_buf.go, line 97 at r4 (raw file):

// Less implements sort.Interface.
func (b *kvBuf) Less(i, j int) bool {
	return bytes.Compare(b.read(b.entries[i].keySpan), b.read(b.entries[j].keySpan)) < 0

return b.Key(i).Less(b.Key(j))


pkg/storage/bulk/sst_batcher.go, line 90 at r4 (raw file):

		err.Value = append(err.Value, value...)
		return err
	}

return storagebase.DuplicateKeyError{Key: key.key, Value: value}

storage/bulk: use slab-backed buffer in BufferingAdder
This uses the slab-backed key/value buffer to store the buffered kvs
instead of individual slices, cutting the per-pair overhead from 48 bytes
to 16.

Release note (performance improvement): improved memory efficiency of
bulk ingestion and index backfill buffering.

@dt dt force-pushed the dt:slab branch from c489cbd to aa72538 Apr 15, 2019

@dt
Member Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @lucy-zhang, and @vivekmenezes)


pkg/storage/bulk/buffering_adder.go, line 104 at r4 (raw file):

Previously, vivekmenezes wrote…

b.curBuf.Len()

Done.


pkg/storage/bulk/kv_buf.go, line 52 at r4 (raw file):

Previously, vivekmenezes wrote…

I think we should have this method return an error and not panic within this code.

Done.
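
For reference, a sketch of roughly what an error-returning append can look like; the constants follow the 35/29-bit split described above, and everything beyond the kvBuf name and append signature quoted earlier is an assumption rather than the PR's actual code:

import "fmt"

const (
	lenBits = 29
	maxLen  = 1<<lenBits - 1      // no single key or value may exceed 512MiB - 1
	maxOff  = 1<<(64-lenBits) - 1 // the slab itself is capped at 32GiB - 1
)

type kvBufEntry struct {
	keySpan, valSpan uint64
}

type kvBuf struct {
	slab    []byte
	entries []kvBufEntry
}

func (b *kvBuf) append(k, v []byte) error {
	if len(k) > maxLen || len(v) > maxLen {
		return fmt.Errorf("key or value exceeds maximum length %d", maxLen)
	}
	if len(b.slab)+len(k)+len(v) > maxOff {
		return fmt.Errorf("buffer exceeds maximum size %d", maxOff)
	}
	var e kvBufEntry
	e.keySpan = uint64(len(b.slab))<<lenBits | uint64(len(k))
	b.slab = append(b.slab, k...)
	e.valSpan = uint64(len(b.slab))<<lenBits | uint64(len(v))
	b.slab = append(b.slab, v...)
	b.entries = append(b.entries, e)
	return nil
}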


pkg/storage/bulk/kv_buf.go, line 97 at r4 (raw file):

Previously, vivekmenezes wrote…

return b.Key(i).Less(b.Key(j))

I was thinking it wouldn't inline those, and this gets called a lot during Sort... but I guess we can wait to see how long the sort takes in profiling before messing with it.


pkg/storage/bulk/sst_batcher.go, line 90 at r4 (raw file):

Previously, vivekmenezes wrote…

return storagebase.DuplicateKeyError{Key: key.key, Value: value}

We can't use the slices directly, so we need the usual append-copy pattern, and it gets pretty long for one line, so I broke it up like this.
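
Spelled out, the pattern is roughly (a sketch; storagebase.DuplicateKeyError is from the quoted code, the surrounding names are assumed):

// key and value alias a buffer that gets reused, so the error must copy
// the bytes rather than retain the slices.
err := storagebase.DuplicateKeyError{}
err.Key = append(err.Key, key...)
err.Value = append(err.Value, value...)
return err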

@vivekmenezes
Collaborator

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @lucy-zhang)

@dt

Member Author

commented Apr 16, 2019

bors r+

craig bot pushed a commit that referenced this pull request Apr 16, 2019

Merge #36743
36743: storage/bulk: use slab buffer for kv pairs r=dt a=dt

Co-authored-by: David Taylor <tinystatemachine@gmail.com>
@craig


commented Apr 16, 2019

Build succeeded

@craig craig bot merged commit aa72538 into cockroachdb:master Apr 16, 2019

3 checks passed

GitHub CI (Cockroach) -- TeamCity build finished
bors -- Build succeeded
license/cla -- Contributor License Agreement is signed

@dt dt deleted the dt:slab branch Apr 16, 2019
