Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support checksum verification for values read from vlog #1052

Merged
merged 5 commits into from Oct 4, 2019

Conversation

@jarifibrahim
Copy link
Member

commented Sep 24, 2019

The values are written with its checksum but we were not validating it
for every read. This commit enables checksum validation for every read.

If VerifyValueChecksum is set to true, the checksum will be calculated for
each entry read from the value log. This option has no effect if the
value is stored completely in SST.

Fixes #1049


This change is Reviewable

The values are written with its checksum but we were not validating it
for every read. This commit enables checksum validation for every read.
If "VerifyValueChecksum" is set to true, checksum will be calculated for
each entry read from the value log. This option has no effect if the
value is stored completely in SST.
@jarifibrahim jarifibrahim requested review from ashish-goswami, manishrjain and dgraph-io/team as code owners Sep 24, 2019
Copy link

left a comment

A review job has been created and sent to the PullRequest network.


@jarifibrahim you can click here to see the review status or cancel the code review job.

value.go Outdated Show resolved Hide resolved
Copy link
Member

left a comment

LGTM

@coveralls

This comment has been minimized.

Copy link

commented Sep 24, 2019

Coverage Status

Coverage increased (+0.5%) to 78.089% when pulling 1eee262 on ibrahim/vlog-checksum into e7d0a7b on master.

Copy link

left a comment

Tests and adds the code for verifying checksums as described


Reviewed with ❤️ by PullRequest

@nordicdyno

This comment has been minimized.

Copy link

commented Sep 25, 2019

Is any hope this fix would land in 1.6.x branch?

@afiskon

This comment has been minimized.

Copy link

commented Sep 25, 2019

// Feel free to modify these to suit your needs with the WithX methods.
func DefaultOptions(path string) Options {
	return Options{
// ...
		VerifyValueChecksum:     false,

As I mentioned in #1049 it's worth considering making VerifyValueChecksum true by default.

Copy link
Contributor

left a comment

We can add this option in existing tests also, so that it will get tested more. :lgtm:

Reviewed 2 of 3 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @golangcibot and @manishrjain)

@jarifibrahim

This comment has been minimized.

Copy link
Member Author

commented Sep 30, 2019

Is any hope this fix would land in 1.6.x branch?

We should be able to cherry-pick this into the 1.6.x branch.

@jarifibrahim

This comment has been minimized.

Copy link
Member Author

commented Sep 30, 2019

// Feel free to modify these to suit your needs with the WithX methods.
func DefaultOptions(path string) Options {
	return Options{
// ...
		VerifyValueChecksum:     false,

As I mentioned in #1049 it's worth considering making VerifyValueChecksum true by default.

I am not very sure if the default behavior should be to verify the checksum of every entry read from value log. This would significantly affect the performance. Keeping it to false allows us to keep the existing behavior (and performance).

The chances of getting corrupted data somewhere in the middle of a value log file are anyway pretty slim. If there was some corruption, it might have affected SSTs too. In that case, Badger would recognize corruption when DB is opened. We perform checksum verification for every SST file when we open the DB. So I am not sure if we should keep it enabled by default.

@afiskon

This comment has been minimized.

Copy link

commented Oct 3, 2019

I am not very sure if the default behavior should be to verify the checksum of every entry read from value log. This would significantly affect the performance. Keeping it to false allows us to keep the existing behavior (and performance). The chances of getting corrupted data somewhere in the middle of a value log file are anyway pretty slim.

I would argue that the default behavior of the storage engine should be to provide consistency, not to look good on benchmarks. Users interested in performance can change the default value if they like. Users interested in consistency may not even know that by default Badger sometimes can loose data, even if chances are pretty slim.

Copy link
Member

left a comment

:lgtm:

Reviewed 1 of 3 files at r2, 1 of 1 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @golangcibot)

Copy link
Member Author

left a comment

Dismissed @golangcibot from a discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved


value.go, line 1170 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Fixed.

Done.

@jarifibrahim jarifibrahim merged commit 0ef91a8 into master Oct 4, 2019
9 checks passed
9 checks passed
GolangCI No issues found!
Details
Unit Tests (badger) TeamCity build finished
Details
code-review/reviewable 3 files reviewed
Details
continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.5%) to 78.089%
Details
license/cla Contributor License Agreement is signed.
Details
@jarifibrahim jarifibrahim deleted the ibrahim/vlog-checksum branch Oct 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.