Possible datarace #117
I'd be happy to take a look at any fuzzing examples you have, but currently we don't have any data races that I'm aware of. The way it works is: every key gets a hash, and the entry is then routed to one of the shards (just optimised byte arrays). When we go to get a key, the key is rehashed and we then read the contents out of the shard it resides in, at that index. The last I checked, the only times we need to lock are when we need to write to the shard. We've spent the cycles going through and optimising the locking strategy as best we could at each step. The goal is to lock as little as possible, for as short a time as possible, since the aim is pure performance, and to allow reads while writing is locked whenever possible. If you think there's a more efficient locking strategy, I'd be happy to review any PRs you come up with. :)
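As a rough illustration of that hash-and-route flow, here is a minimal sketch, assuming an FNV-1a style hasher and a power-of-two shard count; none of the names below are bigcache's actual internals:

```go
package main

import "fmt"

// fnv1a64 is a stand-in for the cache's default 64-bit hasher.
func fnv1a64(key string) uint64 {
	var h uint64 = 14695981039346656037 // FNV offset basis
	for i := 0; i < len(key); i++ {
		h ^= uint64(key[i])
		h *= 1099511628211 // FNV prime
	}
	return h
}

// shardIndex hashes the key and masks it down to one of n shards,
// where n is assumed to be a power of two.
func shardIndex(key string, n uint64) uint64 {
	return fnv1a64(key) & (n - 1)
}

func main() {
	// The same key always lands on the same shard; a Get simply rehashes
	// the key and reads from that shard's byte array.
	fmt.Println(shardIndex("thekey42", 8))
}
```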
I tried reproing something; the result wasn't really what I expected (a crash instead of an error):

```go
package bigcache

import (
	"bytes"
	"fmt"
	"math/rand"
	"sync"
	"testing"
	"time"
)

func TestCacheDelRandomly(t *testing.T) {
	c := Config{
		Shards:             1,
		LifeWindow:         time.Second,
		CleanWindow:        0,
		MaxEntriesInWindow: 10,
		MaxEntrySize:       10,
		Verbose:            true,
		Hasher:             newDefaultHasher(),
		HardMaxCacheSize:   100,
		Logger:             DefaultLogger(),
	}
	cache, _ := NewBigCache(c)

	var wg sync.WaitGroup
	wg.Add(3)
	var ntest = 100000

	// Deleter: hammers a small key space so keys are constantly removed.
	go func() {
		for i := 0; i < ntest; i++ {
			r := uint8(rand.Int())
			key := fmt.Sprintf("thekey%d", r)
			cache.Delete(key)
		}
		wg.Done()
	}()

	// Writer: sets the same keys with values derived from the key.
	go func() {
		for i := 0; i < ntest; i++ {
			r := uint8(rand.Int())
			key := fmt.Sprintf("thekey%d", r)
			val := []byte(fmt.Sprintf("%x%x%x%x%x%x%x", r, r, r, r, r, r, r))
			cache.Set(key, val)
		}
		wg.Done()
	}()

	// Reader: any successful Get must return the value derived from the key.
	go func() {
		for i := 0; i < ntest; i++ {
			r := uint8(rand.Int())
			key := fmt.Sprintf("thekey%d", r)
			val := []byte(fmt.Sprintf("%x%x%x%x%x%x%x", r, r, r, r, r, r, r))
			if got, err := cache.Get(key); err == nil && !bytes.Equal(got, val) {
				fmt.Printf("ERR: Got %s -> %s (exp %s)\n", key, got, val)
			}
		}
		wg.Done()
	}()

	wg.Wait()
}
```
A printout in the
and
Looks like
More context, printing out the data surrounding the
One of the entries, namely this one (number 138 /
has obviously overwritten the location where it looks for
If I make the lock around |
This PR contains a testcase which demonstrates corruption (intermittently), sometimes leading to a `panic` and sometimes to data corruption of values. I thought that fixing the data race suggested in #117 would solve it, but it seems I was wrong about that; there's some other underlying bug. I'll push a commit on top if I find it.
I was looking into `shard.go` and saw the pattern below, which looks racy to me. I haven't gone through all the steps to fuzz or repro it, but thought I'd post it here to get your thoughts about it.

Possibly problematic code:
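Roughly, the shape of the pattern in question; this is a minimal, self-contained sketch with illustrative types and field names, not the verbatim `shard.go` code, and only `wrappedEntry` and `resetKeyFromEntry` correspond to names discussed below:

```go
package sketch

import "sync"

// Illustrative shard; only the locking pattern matters here.
type shard struct {
	lock    sync.RWMutex
	hashmap map[uint64]uint32
	entries []byte
}

func (s *shard) del(hashedKey uint64) {
	// First section: read-locked lookup. Several deleters can run this
	// concurrently and capture the same wrappedEntry slice.
	s.lock.RLock()
	index, ok := s.hashmap[hashedKey]
	if !ok {
		s.lock.RUnlock()
		return
	}
	wrappedEntry := s.entries[index : index+8] // stand-in for entries.Get(index)
	s.lock.RUnlock()

	// Second section: write-locked mutation. By the time this lock is
	// acquired, the bytes behind wrappedEntry may already have been reset
	// or reused by another goroutine, so this write can corrupt them.
	s.lock.Lock()
	delete(s.hashmap, hashedKey)
	for i := range wrappedEntry { // stand-in for resetKeyFromEntry(wrappedEntry)
		wrappedEntry[i] = 0
	}
	s.lock.Unlock()
}
```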
Multiple readers may enter the first section in parallel, each getting an identical `wrappedEntry` for an entry to delete. They will then enter the write-locked section sequentially, and `resetKeyFromEntry` will write destructively on the data. It's possible that the sequential deletion of the same element is fine, but even so it might be that another write reuses that data, and the next of these sequential deleters will then overwrite it.

The proper pattern should be to either wrap everything within the write lock, or redo the first check once the write lock is obtained, as in the sketch below.
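For the second option, this is roughly what the re-check under the write lock could look like, using the same illustrative types as the sketch above; it is not an actual patch for bigcache:

```go
func (s *shard) delChecked(hashedKey uint64) {
	// Take the write lock up front and redo the lookup under it, so the
	// entry cannot be deleted or reused between the check and the reset.
	s.lock.Lock()
	defer s.lock.Unlock()

	index, ok := s.hashmap[hashedKey]
	if !ok {
		return
	}
	wrappedEntry := s.entries[index : index+8] // stand-in for entries.Get(index)

	delete(s.hashmap, hashedKey)
	for i := range wrappedEntry { // stand-in for resetKeyFromEntry(wrappedEntry)
		wrappedEntry[i] = 0
	}
}
```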
I'm not fully versed in the codebase, so apologies if I've misunderstood something.