Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctrie data race #122

Closed
newhook opened this issue Dec 24, 2015 · 9 comments · Fixed by #123
Closed

ctrie data race #122

newhook opened this issue Dec 24, 2015 · 9 comments · Fixed by #123

Comments

@newhook
Copy link

newhook commented Dec 24, 2015

I just got a report of data race in the ctrie impl on committed.

func (c *Ctrie) rdcssRoot(old *iNode, expected *mainNode, nv *iNode) bool {
    desc := &iNode{
        rdcss: &rdcssDescriptor{
            old:      old,
            expected: expected,
            nv:       nv,
        },
    }
    if c.casRoot(old, desc) {
        c.rdcssComplete(false)
        return desc.rdcss.committed
    }
    return false
}
func (c *Ctrie) rdcssComplete(abort bool) *iNode {
...
            if oldeMain == exp {
            // Commit the RDCSS.
            if c.casRoot(r, nv) {
                desc.committed = true
                return nv
Read by goroutine 9:
  customerio/chgbuf/ctrie.(*Ctrie).rdcssRoot()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:881 +0x1e7
  customerio/chgbuf/ctrie.(*Ctrie).Snapshot()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:318 +0xb7
...
Previous write by goroutine 6:
  customerio/chgbuf/ctrie.(*Ctrie).rdcssComplete()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:912 +0x1a1
  customerio/chgbuf/ctrie.(*Ctrie).rdcssReadRoot()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:863 +0x78
  customerio/chgbuf/ctrie.(*Ctrie).readRoot()
      /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:855 +0x33
  customerio/chgbuf/ctrie.(*Ctrie).ReadOnlySnapshot()
...
@tylertreat
Copy link
Contributor

Thanks, will need to take a closer look, but that probably needs to be an
atomic.

On Thu, Dec 24, 2015, 11:59 AM Matthew Newhook notifications@github.com
wrote:

I just got a report of data race in the ctrie impl on committed.

func (c *Ctrie) rdcssRoot(old *iNode, expected *mainNode, nv *iNode) bool {
desc := &iNode{
rdcss: &rdcssDescriptor{
old: old,
expected: expected,
nv: nv,
},
}
if c.casRoot(old, desc) {
c.rdcssComplete(false)
return desc.rdcss.committed
}
return false
}

func (c *Ctrie) rdcssComplete(abort bool) *iNode {
...
if oldeMain == exp {
// Commit the RDCSS.
if c.casRoot(r, nv) {
desc.committed = true
return nv

Read by goroutine 9:
customerio/chgbuf/ctrie.(_Ctrie).rdcssRoot()
/Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:881 +0x1e7
customerio/chgbuf/ctrie.(_Ctrie).Snapshot()
/Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:318 +0xb7
...
Previous write by goroutine 6:
customerio/chgbuf/ctrie.(_Ctrie).rdcssComplete()
/Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:912 +0x1a1
customerio/chgbuf/ctrie.(_Ctrie).rdcssReadRoot()
/Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:863 +0x78
customerio/chgbuf/ctrie.(_Ctrie).readRoot()
/Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:855 +0x33
customerio/chgbuf/ctrie.(_Ctrie).ReadOnlySnapshot()
...


Reply to this email directly or view it on GitHub
#122.

@newhook
Copy link
Author

newhook commented Dec 24, 2015

-               return desc.rdcss.committed
+        return atomic.LoadInt32(&desc.rdcss.committed) == 1
        }
        return false
 }
@@ -909,7 +909,7 @@ func (c *Ctrie) rdcssComplete(abort bool) *iNode {
                if oldeMain == exp {
                        // Commit the RDCSS.
                        if c.casRoot(r, nv) {
-                               desc.committed = true
+                atomic.StoreInt32(&desc.committed, 1)
                                return nv

I made that fix locally, but I'm still getting very occasional crashes in a highly concurrent test case.

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x19146b]

goroutine 5 [running]:
testing.tRunner.func1(0xc8200a8000)
    /private/var/folders/q8/bf_4b1ts2zj0l7b0p1dv36lr0000gp/T/workdir/go/src/testing/testing.go:450 +0x1f5
customerio/chgbuf/ctrie.(*Ctrie).ilookup(0xc82039cd00, 0xc8203b3ee0, 0xc8203994d0, 0x0, 0x0, 0x3f9990, 0x0, 0x0, 0x8)
    /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:540 +0x7b
customerio/chgbuf/ctrie.(*Ctrie).lookup(0xc82039cd00, 0xc8203994d0, 0x0, 0x0, 0x340ca71c)
    /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:430 +0x9d
customerio/chgbuf/ctrie.(*Ctrie).Lookup(0xc82039cd00, 0xc8203b6090, 0x1, 0x8, 0x0, 0x0, 0x1)
    /Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:303 +0x111
customerio/chgbuf.(*data).child(0xc82039cd40, 0xc8203b6090, 0x1, 0x8, 0x256a00, 0x2fdde8)
    /Users/matthew/src/services/src/customerio/chgbuf/data.go:17 +0x68
customerio/chgbuf.(*bucketData).child(0xc82039cd20, 0xc8203b6090, 0x1, 0x8, 0x2fe980, 0x39a00)

@tylertreat
Copy link
Contributor

If you have a test which can reproduce semi reliably that would be great.

On Thu, Dec 24, 2015, 3:15 PM Matthew Newhook notifications@github.com
wrote:

  •           return desc.rdcss.committed
    
  •    return atomic.LoadInt32(&desc.rdcss.committed) == 1
    }
    return false
    
    }
    @@ -909,7 +909,7 @@ func (c *Ctrie) rdcssComplete(abort bool) *iNode {
    if oldeMain == exp {
    // Commit the RDCSS.
    if c.casRoot(r, nv) {
  •                           desc.committed = true
    
  •            atomic.StoreInt32(&desc.committed, 1)
                            return nv
    

I made that fix locally, but I'm still getting very occasional crashes in
a highly concurrent test case.

customerio/chgbuf/ctrie.(_Ctrie).ilookup(0xc82039cd00, 0xc8203b3ee0, 0xc8203994d0, 0x0, 0x0, 0x3f9990, 0x0, 0x0, 0x8)
/Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:540 +0x7b
customerio/chgbuf/ctrie.(_Ctrie).lookup(0xc82039cd00, 0xc8203994d0, 0x0, 0x0, 0x340ca71c)
/Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:430 +0x9d
customerio/chgbuf/ctrie.(_Ctrie).Lookup(0xc82039cd00, 0xc8203b6090, 0x1, 0x8, 0x0, 0x0, 0x1)
/Users/matthew/src/services/src/customerio/chgbuf/ctrie/ctrie.go:303 +0x111
customerio/chgbuf.(_data).child(0xc82039cd40, 0xc8203b6090, 0x1, 0x8, 0x256a00, 0x2fdde8)
/Users/matthew/src/services/src/customerio/chgbuf/data.go:17 +0x68
customerio/chgbuf.(*bucketData).child(0xc82039cd20, 0xc8203b6090, 0x1, 0x8, 0x2fe980, 0x39a00)


Reply to this email directly or view it on GitHub
#122 (comment)
.

@newhook
Copy link
Author

newhook commented Dec 25, 2015

The test is part of something much larger. I'll try and reproduce in something small.

@tylertreat
Copy link
Contributor

I haven't hit the nil pointer issue, but there definitely appears to be an issue with snapshotting (don't know if it's related to your issue or not). I ran the following program which gives the correct output unless the two goroutines are commented out and the wg is adjusted accordingly.

package main

import (
    "fmt"
    "strconv"
    "sync"

    "github.com/Workiva/go-datastructures/trie/ctrie"
)

func main() {
    trie := ctrie.New(nil)
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        for i := 0; i < 1000000; i++ {
            trie.Insert([]byte(strconv.Itoa(i)), i)
        }
        wg.Done()
    }()
    go func() {
        for i := 0; i < 1000000; i++ {
            trie.Lookup([]byte(strconv.Itoa(i)))
        }
        wg.Done()
    }()
    //go func() {
    //  for i := 0; i < 1000000; i++ {
    //      trie.Snapshot()
    //  }
    //  wg.Done()
    //}()
    //go func() {
    //  for i := 0; i < 1000000; i++ {
    //      trie.ReadOnlySnapshot()
    //  }
    //  wg.Done()
    //}()
    wg.Wait()

    i := 1
    for _ = range trie.Iterator(nil) {
        fmt.Println(i)
        i++
    }
    fmt.Println("size", trie.Size())
}

@tylertreat
Copy link
Contributor

I have a branch with a few fixes, but snapshotting is still broken: https://github.com/tylertreat/go-datastructures/tree/fixes

Simpler test:

package main

import (
    "fmt"
    "strconv"
    "sync"

    "github.com/Workiva/go-datastructures/trie/ctrie"
)

func main() {
    trie := ctrie.New(nil)
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        for i := 0; i < 7; i++ {
            trie.Insert([]byte(strconv.Itoa(i)), i)
        }
        wg.Done()
    }()
    go func() {
        for i := 0; i < 7; i++ {
            trie.Snapshot()
        }
        wg.Done()
    }()
    wg.Wait()
    fmt.Println("size", trie.Size())
}

Expected size is 7, but if you run it enough times you should see values less than 7 occasionally.

@tylertreat
Copy link
Contributor

Issue appears to be with GCAS. I made an incorrect assumption from Java (where new Object() != new Object()). In Go, this is not quite the case: https://play.golang.org/p/BBde4cX1GC. As a result, GCAS always succeeds, despite the root generation changing during a snapshot.

@tylertreat
Copy link
Contributor

I believe I have it fixed with this: https://github.com/Workiva/go-datastructures/compare/master...tylertreat:fixes?expand=1. I haven't been able to reproduce after applying those fixes.

@newhook can you confirm this branch fixes your problem?

@newhook
Copy link
Author

newhook commented Dec 27, 2015

Awesome! I'll check on Tuesday :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants