Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support compression in Badger #1013

Merged
merged 24 commits into from Sep 30, 2019

Conversation

@jarifibrahim
Copy link
Member

jarifibrahim commented Aug 27, 2019

This PR adds support for compression of SST files.

Todo:

  • Support for changing compression algorithm.
  • Store compression information in the manifest file.
  • Mechanism to migrate uncompressed data to compressed data This is no longer needed since the compression information is stored in a protobuf in manifest file and protobufs are backward compatible.

Fixes #792


This change is Reviewable

Copy link

pullrequest bot left a comment

A review job has been created and sent to the PullRequest network.


@jarifibrahim you can click here to see the review status or cancel the code review job.

table/builder.go Outdated Show resolved Hide resolved
Copy link
Member

manishrjain left a comment

Let me know when this is ready for review.

Reviewable status: 0 of 15 files reviewed, 1 unresolved discussion (waiting on @jarifibrahim)

Copy link

pullrequest bot left a comment

Due to inactivity, PullRequest has cancelled this review job. You can reactivate the code review job from the PullRequest dashboard

manifest.go Outdated Show resolved Hide resolved
@coveralls

This comment has been minimized.

Copy link

coveralls commented Sep 25, 2019

Coverage Status

Coverage decreased (-0.4%) to 77.487% when pulling 8fa09d4 on ibrahim/compression into e3b5652 on master.

@jarifibrahim jarifibrahim marked this pull request as ready for review Sep 26, 2019
@jarifibrahim jarifibrahim requested review from ashish-goswami and dgraph-io/team as code owners Sep 26, 2019
@jarifibrahim jarifibrahim force-pushed the ibrahim/compression branch from b242229 to a554b7e Sep 26, 2019
Copy link
Member Author

jarifibrahim left a comment

Reviewable status: 0 of 18 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami and @jarifibrahim)


appveyor.yml, line 20 at r6 (raw file):

# scripts that run after cloning repository
install:
  - set PATH=%GOPATH%\bin;c:\go\bin;c:\msys64\mingw64\bin;%PATH%

The zstd library requires GCC and this change enables GCC on windows build.

Copy link
Contributor

ashish-goswami left a comment

Looks like PR is not complete yet. Will review again.

Reviewable status: 0 of 18 files reviewed, 8 unresolved discussions (waiting on @ashish-goswami and @jarifibrahim)


appveyor.yml, line 20 at r6 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

The zstd library requires GCC and this change enables GCC on windows build.

May be add this comment to file itself.


manifest.go, line 31 at r6 (raw file):

	"github.com/dgraph-io/badger/options"

merge with existing imports.


table/builder.go, line 28 at r6 (raw file):

	"github.com/golang/snappy"
	"github.com/pkg/errors"

merge all external import paths.


table/builder.go, line 158 at r6 (raw file):

	b.writeChecksum(blockBuf)

	// Compress the block

nit: period


table/builder.go, line 165 at r6 (raw file):

		blockBuf, err = b.compressData(b.buf.Bytes()[b.baseOffset:])
		y.Check(err)
		// Truncate already written data

nit: period.


table/builder.go, line 167 at r6 (raw file):

		// Truncate already written data
		b.buf.Truncate(int(b.baseOffset))
		// Write compressed data

nit: period.

Copy link
Member Author

jarifibrahim left a comment

Dismissed @ashish-goswami and @golangcibot from 4 discussions.
Reviewable status: 0 of 18 files reviewed, all discussions resolved


manifest.go, line 435 at r4 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 104 characters (from lll)

Done.


manifest.go, line 31 at r6 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

merge with existing imports.

Done.


table/builder.go, line 305 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

printf: Errorf format %s has arg ctype of wrong type github.com/dgraph-io/badger/options.CompressionType (from govet)

Done.


table/builder.go, line 28 at r6 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

merge all external import paths.

Done.

Copy link
Contributor

ashish-goswami left a comment

Add compression libraries in go.mod and go.sum files.

Reviewable status: 0 of 18 files reviewed, 9 unresolved discussions (waiting on @jarifibrahim)


options.go, line 117 at r7 (raw file):

		CompactL0OnClose:        true,
		KeepL0InMemory:          true,
		Compression:             options.ZSTDCompression,

We have ZSTDCompression as default. Will it work for windows by default? If not, we can mention it somewhere.


options.go, line 135 at r7 (raw file):

}

// BuildTableOptions ...

Please complete the comment.


options/options.go, line 50 at r7 (raw file):

type CompressionType uint32

// The following constants should always be kept in sync with CompressionType protobuf

nit: period.


options/options.go, line 54 at r7 (raw file):

	// NoCompression mode indicates that a block is not compressed.
	NoCompression CompressionType = 0
	// SnappyCompression mode indicates that a block is not compressed.

correct comment.


options/options.go, line 56 at r7 (raw file):

	// SnappyCompression mode indicates that a block is not compressed.
	SnappyCompression CompressionType = 1
	// ZSTDCompression mode indicates that a block is not compressed.

correct comment.


table/builder.go, line 342 at r7 (raw file):

}

// compressData compresses the given data

nit: period.


table/builder.go, line 352 at r7 (raw file):

		return zstd.Compress(nil, data)
	}
	return nil, errors.New("Unsupported compression type")

We can define this err at the top. can be used while checking errors.


table/table.go, line 65 at r7 (raw file):

	DataKey *pb.DataKey

	// Compression ...

Please complete the comment.


table/table.go, line 99 at r7 (raw file):

}

// CompressionType ..

complete the comment.

Copy link
Member

manishrjain left a comment

:lgtm: Got a few comments.

Reviewed 1 of 2 files at r3, 2 of 14 files at r4, 9 of 11 files at r5, 1 of 3 files at r6, 2 of 4 files at r7, 5 of 5 files at r8.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @jarifibrahim)


db.go, line 914 at r8 (raw file):

		return y.Wrapf(err, "failed to get datakey in db.handleFlushTask")
	}
	bopts := BuildTableOptions(db.opt)

Does this need to be public?


levels.go, line 652 at r8 (raw file):

	for _, table := range newTables {
		changes = append(changes,
			newCreateChange(table.ID(), cd.nextLevel.level, table.KeyID(), table.CompressionType()))

Just send table in newCreateChange, instead of sending 4 different arguments.


options/options.go, line 52 at r8 (raw file):

const (
	// NoCompression mode indicates that a block is not compressed.
	NoCompression CompressionType = 0

Go towards smaller names. Makes the code simpler to read. You already are calling it a "CompressionType", then the actual names can be smaller.

None


options/options.go, line 54 at r8 (raw file):

	NoCompression CompressionType = 0
	// SnappyCompression mode indicates that a block is compressed using Snappy algorithm.
	SnappyCompression CompressionType = 1

Snappy


options/options.go, line 56 at r8 (raw file):

	SnappyCompression CompressionType = 1
	// ZSTDCompression mode indicates that a block is compressed using ZSTD algorithm.
	ZSTDCompression CompressionType = 2

ZSTD


table/table.go, line 368 at r8 (raw file):

	}

	if t.opt.Compression != options.NoCompression {

This if isn't required? Because t.decompressData already does this check.


table/table_test.go, line 855 at r8 (raw file):

func getTableForBenchmarks(b *testing.B, count int) *Table {
	rand.Seed(time.Now().Unix())
	opts := Options{Compression: options.ZSTDCompression, BlockSize: 4 * 1024, BloomFalsePositive: 0.01}

100 chars.

Copy link
Member Author

jarifibrahim left a comment

Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @ashish-goswami and @manishrjain)


db.go, line 914 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Does this need to be public?

No. I've changed it now.


levels.go, line 652 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Just send table in newCreateChange, instead of sending 4 different arguments.

newCreateChange is called by table manifest and table also. I can't change newCreateChange without refactoring a lot of code.


options.go, line 117 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

We have ZSTDCompression as default. Will it work for windows by default? If not, we can mention it somewhere.

ZSTD works on windows. The windows build seems to be working fine.


options.go, line 135 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

Please complete the comment.

Done.


options/options.go, line 50 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

nit: period.

The comment was outdate. Removed it.


options/options.go, line 54 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

correct comment.

Done.


options/options.go, line 56 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

correct comment.

Done.


options/options.go, line 52 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Go towards smaller names. Makes the code simpler to read. You already are calling it a "CompressionType", then the actual names can be smaller.

None

Done.


options/options.go, line 54 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Snappy

Done.


options/options.go, line 56 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

ZSTD

Done.


table/builder.go, line 352 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

We can define this err at the top. can be used while checking errors.

I don't think it will be reused anywhere. I'll prefer to keep it as it is.


table/table.go, line 65 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

Please complete the comment.

Done.


table/table.go, line 99 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

complete the comment.

Done.


table/table.go, line 368 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This if isn't required? Because t.decompressData already does this check.

Done.


table/table_test.go, line 855 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

100 chars.

Done.

@jarifibrahim

This comment has been minimized.

Copy link
Member Author

jarifibrahim commented Sep 27, 2019

The Travis build has been failing because coveralls is returning 500 error code. The tests are working fine.

Copy link
Member Author

jarifibrahim left a comment

Reviewable status: 9 of 20 files reviewed, 17 unresolved discussions (waiting on @ashish-goswami and @manishrjain)


table/table.go, line 261 at r10 (raw file):

	if !it2.Valid() {
		return errors.Wrapf(it2.err, "failed to initialize biggest for table %s", t.Filename())
	}

This was changed so that we can return an appropriate error when incorrect checksum algorithm was used.


table/table_test.go, line 592 at r10 (raw file):

		{"k2", "a2"},
	}, opts)
	f2 := buildTable(t, [][]string{{"l1", "b1"}}, opts)

This was added because the changes in table.go requires a table to be non-empty.


table/table_test.go, line 636 at r10 (raw file):

func TestMergingIteratorTakeTwo(t *testing.T) {
	opts := getTestTableOptions()
	f1 := buildTable(t, [][]string{{"l1", "b1"}}, opts)

This was added because the changes in table.go requires a table to be non-empty.

Copy link
Contributor

ashish-goswami left a comment

:lgtm:

Reviewable status: 9 of 20 files reviewed, 10 unresolved discussions (waiting on @manishrjain)

@jarifibrahim jarifibrahim merged commit e7d0a7b into master Sep 30, 2019
7 of 9 checks passed
7 of 9 checks passed
coverage/coveralls Coverage decreased (-0.4%) to 77.487%
Details
code-review/reviewable 11 files, 10 discussions left (manishrjain)
Details
GolangCI No issues found!
Details
Unit Tests (badger) TeamCity build finished
Details
continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details
@jarifibrahim jarifibrahim deleted the ibrahim/compression branch Sep 30, 2019
danielmai added a commit that referenced this pull request Nov 13, 2019
Rename the builder method WithCompressionType added in #1013 to
WithCompression to be consistent with the option struct field
named Compression.

Update the godoc for WithCompression with the default compression
algorithm used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.