Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data corruption (shard.go) #119

Merged
merged 7 commits into from
Feb 18, 2019
Merged

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Feb 15, 2019

This PR contains a testcase which demonstrates corruption (intermittently). Sometimes leading to panic, and sometimes leading to data corruption of values.

I thought that fixing the datarace suggested in #117 would solve it, but it seems I was wrong about that, there's some other underlying bug. I'll push a commit on top if I find it.

@coveralls
Copy link

coveralls commented Feb 15, 2019

Coverage Status

Coverage decreased (-0.6%) to 89.7% when pulling 97377c6 on holiman:corruption into 84a0ff3 on allegro:master.

@holiman
Copy link
Contributor Author

holiman commented Feb 15, 2019

I pushed a commit on top which fixes a datarace in shard.go, where a slice was not copied into a new buffer before the lock was released. That fixes most of the corruptions, but not all failures.

@holiman
Copy link
Contributor Author

holiman commented Feb 15, 2019

Before the fix in del (7a2f9a2), running with ntests 800000, I got test failures 6 times out of 10.

With the last fix, it passes 10 out of 10 runs.

Since I now do the read twice in del, I added a convenience-method for the first read; since we're not actually interested in the data, it just checks errors and returns those instead. Might shave a nanosecond :)

@holiman holiman changed the title test: failing testcase to demonstrate #117 Fix data corruption (shard.go) Feb 15, 2019
Copy link
Collaborator

@siennathesane siennathesane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the detailed test to make sure there are no cache corruptions, I think this PR is great. It could use a little bit more, but I'm onboard with this. :)

bigcache_test.go Outdated Show resolved Hide resolved
@@ -162,6 +162,11 @@ func (q *BytesQueue) Get(index int) ([]byte, error) {
return data, err
}

// CheckGet checks if an entry can be read from index
func (q *BytesQueue) CheckGet(index int) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for this.

@@ -177,6 +182,23 @@ func (e *queueError) Error() string {
return e.message
}

// peekCheckErr is identical to peek, but does not actually return any data
func (q *BytesQueue) peekCheckErr(index int) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs a test. :)

@holiman
Copy link
Contributor Author

holiman commented Feb 16, 2019

I addressed the concerns.

@holiman holiman mentioned this pull request Feb 16, 2019
Copy link
Collaborator

@siennathesane siennathesane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@siennathesane
Copy link
Collaborator

So that's weird. I merged it in the UI, it didn't merge the branch, and then it just kicked off another travis build, but then travis errored out:

$HOME/gopath/bin/goveralls -coverprofile=gover.coverprofile -service travis-ci
Bad response status from coveralls: 422
{"message":"Couldn't find a repository matching this job.","error":true}
The command "$HOME/gopath/bin/goveralls -coverprofile=gover.coverprofile -service travis-ci" exited with 1.

@cristaloleg can you take a look when you get a chance?

@cristaloleg cristaloleg merged commit e24eb22 into allegro:master Feb 18, 2019
@cristaloleg
Copy link
Collaborator

Thank you @holiman

u5surf pushed a commit to u5surf/bigcache that referenced this pull request Jun 4, 2019
This PR contains a testcase which demonstrates corruption (intermittently). Sometimes leading to `panic`, and sometimes leading to data corruption of values. 

I thought that fixing the datarace suggested in allegro#117 would solve it, but it seems I was wrong about that, there's some other underlying bug. I'll push a commit on top if I find it.
flisky pushed a commit to flisky/bigcache that referenced this pull request May 7, 2020
This PR contains a testcase which demonstrates corruption (intermittently). Sometimes leading to `panic`, and sometimes leading to data corruption of values. 

I thought that fixing the datarace suggested in allegro#117 would solve it, but it seems I was wrong about that, there's some other underlying bug. I'll push a commit on top if I find it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants