
[breaking/format] Add key-offset index to the end of SST table #881

Merged

merged 46 commits into master from ibrahim/footer-protobuf on Jul 5, 2019

Conversation

@jarifibrahim
Member

commented Jun 20, 2019

The original PR #816 got closed because I deleted the release/v2.1.0 branch (we didn't need it), and GitHub won't let me reopen that PR or change the target branch.

This is a breaking change. Older versions of badger data will not work after this change.

This commit adds the key-offset index, used for searching the blocks in a table, to the end of the SST file. The length of the index is stored in the last 4 bytes of the file.

Earlier, we built the index at runtime by reading the keys stored at the offsets specified in the footer of the file. With this change, we no longer build the index; we read it directly from the file.

Benchmarks

The following test was run on a badger DB with 300 million sorted key-value pairs (DB size 57GB).

  1. With the new footer (the proposed changes in this PR)
go test -run=^$ github.com/dgraph-io/badger -bench ^BenchmarkDBOpen$ -benchdir="/home/ibrahim/partition/300-million-sorted-footer" -v
badger 2019/05/28 21:31:20 INFO: 92 tables out of 155 opened in 3.005s
badger 2019/05/28 21:31:22 INFO: All 155 tables opened in 5.132s
badger 2019/05/28 21:31:22 INFO: Replaying file id: 299 at offset: 116366000
badger 2019/05/28 21:31:22 INFO: Replay took: 355.887µs
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/badger
BenchmarkDBOpen-8   	       1	5228268412 ns/op
PASS
ok  	github.com/dgraph-io/badger	5.291s
  2. With the existing implementation
go test -run=^$ github.com/dgraph-io/badger -bench ^BenchmarkDBOpen$ -benchdir="/home/ibrahim/partition/300-million-sorted" -v
badger 2019/05/28 21:41:46 INFO: 17 tables out of 155 opened in 3.337s
badger 2019/05/28 21:41:49 INFO: 34 tables out of 155 opened in 6.099s
badger 2019/05/28 21:41:52 INFO: 55 tables out of 155 opened in 9.267s
badger 2019/05/28 21:41:55 INFO: 72 tables out of 155 opened in 12.164s
badger 2019/05/28 21:41:58 INFO: 91 tables out of 155 opened in 15.165s
badger 2019/05/28 21:42:01 INFO: 109 tables out of 155 opened in 18.159s
badger 2019/05/28 21:42:04 INFO: 126 tables out of 155 opened in 21.105s
badger 2019/05/28 21:42:07 INFO: 145 tables out of 155 opened in 24.072s
badger 2019/05/28 21:42:09 INFO: All 155 tables opened in 25.695s
badger 2019/05/28 21:42:09 INFO: Replaying file id: 299 at offset: 116366000
badger 2019/05/28 21:42:09 INFO: Replay took: 100.651µs
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/badger
BenchmarkDBOpen-8   	       1	25776272170 ns/op
PASS
ok  	github.com/dgraph-io/badger	25.858s

With the new footer, opening the DB drops from ~25.8s to ~5.2s, roughly a 5x speedup. The following code was used for benchmarking (also added to the db2_test.go file):

// The following benchmark test is supposed to be run against a badger directory with some data.
// Use badger fill to create data if it doesn't exist.

// Badger dir to be used for performing the db.Open benchmark.
var benchDir = flag.String("benchdir", "", "Set to the badger directory to benchmark db.Open against.")

func BenchmarkDBOpen(b *testing.B) {
	if *benchDir == "" {
		b.Skip("Please set -benchdir to badger directory")
	}
	dir := *benchDir
	opt := DefaultOptions
	opt.Dir = dir
	opt.ValueDir = dir
	opt.ReadOnly = true
	for i := 0; i < b.N; i++ {
		db, err := Open(opt)
		require.NoError(b, err)
		require.NoError(b, db.Close())
	}
}


jarifibrahim added some commits May 17, 2019

Add key-offset index to the end of SST table
This commit adds the key-offset index used for searching the blocks in
a table to the end of the SST file. The length of the index is stored in
the last 4 bytes of the file.
@martinmr
Member

left a comment

Reviewed 14 of 14 files at r1.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, and @manishrjain)


db2_test.go, line 105 at r1 (raw file):

var manual = flag.Bool("manual", false, "Set when manually running some tests.")

// Badger dir to be used for performing db.Open benchmark

minor: period at end of comments.


table/builder.go, line 160 at r1 (raw file):

	b.addHelper([]byte{}, y.ValueStruct{})
	y.AssertTrue(b.buf.Len() < math.MaxUint32)
	// Add key to the block index

minor: period at the end.


table/builder.go, line 233 at r1 (raw file):

	}

	// Write bloom filter.

why is the bloom filter being written before the block index now?


table/builder.go, line 242 at r1 (raw file):

	y.AssertTrue(n < math.MaxUint32)
	// Write index size

nit: period at the end.


table/builder.go, line 253 at r1 (raw file):

func (b *Builder) writeChecksum(data []byte) {
	// Build checksum for the index

nit: period at the end


table/builder.go, line 267 at r1 (raw file):

	}

	// Write checksum to the file

minor: period at the end.


table/builder.go, line 274 at r1 (raw file):

	y.AssertTrue(n < math.MaxUint32)
	// Write checksum size

minor: period at the end.


table/builder_test.go, line 25 at r1 (raw file):

	"github.com/dgraph-io/badger/options"
	"github.com/dgraph-io/badger/y"

@campoy is making a change to move to v2: #882

If that is submitted before your change, merge and make sure you update the references. So your imports here would become:

	"github.com/dgraph-io/badger/v2/options"
	"github.com/dgraph-io/badger/v2/y"

Same for other badger imports you added.


table/builder_test.go, line 36 at r1 (raw file):

		require.Len(t, table.blockIndex, 1)
	})
	t.Run("multiple keys", func(t *testing.T) {

If these tests are independent, just have them in separate methods.


table/table.go, line 223 at r1 (raw file):

	readPos := t.tableSize

	// Read checksum len from the last 4 bytes

minor: period at the end


table/table.go, line 228 at r1 (raw file):

	checksumLen := binary.BigEndian.Uint32(buf)

	// Read checksum

minor: period at the end


table/table.go, line 236 at r1 (raw file):

	}

	// Read index size from the footer

minor: period at the end


table/table.go, line 240 at r1 (raw file):

	buf = t.readNoFail(readPos, 4)
	indexLen := int(binary.BigEndian.Uint32(buf))
	// Read index

minor: period at the end


table/table.go, line 265 at r1 (raw file):

	ko := t.blockIndex[idx]
	blk := block{
		offset: int(ko.Offset),

I see a lot of conversion between int types in this file. Is there a way to avoid that?

@jarifibrahim
Member Author

left a comment

Reviewable status: 3 of 14 files reviewed, 4 unresolved discussions (waiting on @ashish-goswami, @campoy, @manishrjain, and @martinmr)


table/builder.go, line 233 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

why is the bloom filter being written before the block index now?

We're writing the bloom filter along with the index; they're part of the same protobuf.
I'll reword the comment so that it makes more sense.


table/builder_test.go, line 25 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

@campoy is making a change to move to v2: #882

If that is submitted before your change, merge and make sure you update the references. So your imports here would become:

	"github.com/dgraph-io/badger/v2/options"
	"github.com/dgraph-io/badger/v2/y"

Same for other badger imports you added.

Yes. I've changed it. Thanks!


table/builder_test.go, line 36 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

If these tests are independent, just have them in separate methods.

Both tests relate to validating the table index, so this looks like a logical grouping to me.
Keeping them together also makes the tests more readable.


table/table.go, line 265 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I see a lot of conversion between int types in this file. Is there a way to avoid that?

Not really. t.read(..) takes ints, while the protobuf fields are int32/int64. No matter which type we use, we'll have to convert it to int.

One thing we could do is change the parameter type of t.read(..) from int to int64. The function internally converts int to int64 anyway, so we could then use int64 everywhere.

@manishrjain
Member

left a comment

:lgtm_strong: Got a couple of comments.

Reviewed 5 of 11 files at r2.
Reviewable status: 8 of 14 files reviewed, 5 unresolved discussions (waiting on @ashish-goswami, @campoy, @jarifibrahim, and @martinmr)


table/builder.go, line 191 at r1 (raw file):

// ReachedCapacity returns true if we... roughly (?) reached capacity?
func (b *Builder) ReachedCapacity(cap int64) bool {
	estimateSz := b.buf.Len() + 8 /* empty header */ + 4 /* Index length */ +

Move each + to a new line and use // instead of /*.


table/table.go, line 265 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Not really. t.read(..) needs ints and protobuf has int16, int32, int64. No matter which type we use, we'll have to convert it to int.

One thing we can do is change the param type of t.read(..) from int to int64. The function internally converts int to int64 and then we can use int64 everywhere.

That sounds like a good idea.

@jarifibrahim
Member Author

left a comment

Reviewable status: 7 of 14 files reviewed, 5 unresolved discussions (waiting on @ashish-goswami, @campoy, @manishrjain, and @martinmr)


table/builder.go, line 191 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Move each + to a new line and use // instead of /*.

Done.


table/table.go, line 265 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

That sounds like a good idea.

I'll defer the variable type change to a separate PR. The int => int64 change requires changes in multiple places in the code.

jarifibrahim added some commits Jun 24, 2019

Stream Writer: Stop oracle before starting a new one
If we do not stop the existing one, its goroutines will keep running in
the background.
Close all opened DBs
When an opened DB is not closed, all its running goroutines
(watermark, compaction, etc.) are not stopped and keep running in
the background.
@jarifibrahim

Member Author

commented Jun 24, 2019

I had to push ec7de22 and 127ff52 to reduce error logs in the build output. Without those commits, the PR build was failing and the errors were not visible, because Travis has a limit on the number of characters in the logs. See https://travis-ci.org/dgraph-io/badger/jobs/549661427 for an example of a failed build.

@martinmr
Member

left a comment

:lgtm:

Reviewed 6 of 11 files at r2, 3 of 3 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ashish-goswami and @manishrjain)

@jarifibrahim jarifibrahim merged commit 61c492d into master Jul 5, 2019

4 of 8 checks passed

continuous-integration/appveyor/branch AppVeyor build failed
Details
continuous-integration/appveyor/pr AppVeyor build failed
Details
coverage/coveralls Coverage decreased (-0.5%) to 79.142%
Details
code-review/reviewable 12 files, 1 discussion left (ashish-goswami, manishrjain, martinmr)
Details
GolangCI No issues found!
Details
Unit Tests (badger) TeamCity build finished
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details

@campoy campoy deleted the ibrahim/footer-protobuf branch Jul 10, 2019
