Do not panic if database is closed #437
Conversation
|
Review status: 0 of 5 files reviewed at latest revision, 1 unresolved discussion.

db.go, line 78 at r1 (raw file):
Instead of using a chan as a flag, use atomics: https://golang.org/pkg/sync/atomic/#StoreInt32

Comments from Reviewable |
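For context, the atomic-flag approach being suggested might look like the following minimal sketch. The type, method, and error names here are simplified stand-ins for illustration, not badger's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// ErrClosed is a hypothetical sentinel error for this sketch.
var ErrClosed = errors.New("db: already closed")

// DB is a toy stand-in for badger's DB type.
type DB struct {
	closed uint32 // read and written only via sync/atomic
}

// Close marks the DB closed with a single atomic store.
func (db *DB) Close() { atomic.StoreUint32(&db.closed, 1) }

// isClosed is a cheap, lock-free check callable from any goroutine.
func (db *DB) isClosed() bool { return atomic.LoadUint32(&db.closed) == 1 }

// Update rejects new work once the DB is closed instead of panicking.
func (db *DB) Update(fn func() error) error {
	if db.isClosed() {
		return ErrClosed
	}
	return fn()
}

func main() {
	db := &DB{}
	fmt.Println(db.Update(func() error { return nil })) // <nil>
	db.Close()
	fmt.Println(db.Update(func() error { return nil })) // db: already closed
}
```

An atomic load compiles to a plain memory read on most architectures, which is why reviewers expected the per-operation cost of this check to be negligible.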
|
Yes... @adelowo, why did you choose a channel instead of, say, an atomic uint32? Is there a reason behind it that we should care about? |
|
I'll update it to make use of atomics... @kataras The reasoning was that an empty struct should consume no storage. |
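The channel-as-flag pattern being described relies on struct{} occupying zero bytes and on a closed channel being observable by any number of goroutines. A minimal sketch of that approach (names hypothetical, not badger's actual code):

```go
package main

import "fmt"

// DB uses a chan struct{} as a close flag: struct{} occupies zero
// bytes, and closing the channel makes the flag visible to every
// goroutine that selects on it.
type DB struct {
	closer chan struct{}
}

func NewDB() *DB { return &DB{closer: make(chan struct{})} }

// Close broadcasts "closed" by closing the channel.
func (db *DB) Close() { close(db.closer) }

// isClosed does a non-blocking receive: it succeeds only once
// the channel has been closed.
func (db *DB) isClosed() bool {
	select {
	case <-db.closer:
		return true
	default:
		return false
	}
}

func main() {
	db := NewDB()
	fmt.Println(db.isClosed()) // false
	db.Close()
	fmt.Println(db.isClosed()) // true
}
```

The select in isClosed is heavier than a single atomic load, which is presumably why the review pushed toward sync/atomic for a flag that is only ever checked, never waited on.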
|
Can you take a look at this now, @manishrjain? Edit: I probably should have used Reviewable instead. |
|
Review status: 0 of 5 files reviewed at latest revision, 4 unresolved discussions.

db.go, line 341 at r2 (raw file):
Use Store instead of Add. Also, move this right after checking line 669 in 8e23f14.

errors.go, line 27 at r2 (raw file):
Full stop at end.

transaction.go, line 551 at r2 (raw file):
Instead of doing this both here and in View, do it within NewTransaction. We'd still have to deal with the case where the DB is closed while a pending transaction is still live. That txn would try to read and write, and still cause panics. |
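A sketch of centralizing the check in a NewTransaction-style constructor, and of the race the second part of the comment warns about. All names are simplified for illustration; badger's real NewTransaction has a different signature:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var ErrClosed = errors.New("db: already closed")

type DB struct{ closed uint32 }

func (db *DB) Close()         { atomic.StoreUint32(&db.closed, 1) }
func (db *DB) isClosed() bool { return atomic.LoadUint32(&db.closed) == 1 }

type Txn struct{ db *DB }

// NewTransaction is the single choke point: if View and Update both
// go through it, the closed check lives in one place.
func (db *DB) NewTransaction() (*Txn, error) {
	if db.isClosed() {
		return nil, ErrClosed
	}
	return &Txn{db: db}, nil
}

func main() {
	db := &DB{}
	txn, _ := db.NewTransaction() // created while the DB was still open
	db.Close()
	// The check in NewTransaction cannot help here: txn is already
	// live, and its reads and writes would still hit closed internals.
	fmt.Println(txn != nil) // true
	_, err := db.NewTransaction()
	fmt.Println(errors.Is(err, ErrClosed)) // true
}
```

This is exactly the gap the review identifies: a one-time check at transaction creation says nothing about transactions that outlive Close.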
|
Review status: 0 of 5 files reviewed at latest revision, 4 unresolved discussions.

transaction.go, line 551 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote… |
|
Review status: 0 of 5 files reviewed at latest revision, 4 unresolved discussions.

errors.go, line 27 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Done... updated the comment anyway. |
|
Review status: 0 of 5 files reviewed at latest revision, 4 unresolved discussions.

transaction.go, line 551 at r2 (raw file):
Previously, adelowo (Lanre Adelowo) wrote…
Can you also respond to the second part of that comment, dealing with the case where there's a pending txn still live? |
|
Review status: 0 of 5 files reviewed at latest revision, 3 unresolved discussions; some commit checks failed. |
|
In general, some sort of before-and-after read benchmarking would be reassuring.

Review status: 0 of 5 files reviewed at latest revision, 3 unresolved discussions; some commit checks failed. |
|
I'll get those in place.

Review status: 0 of 5 files reviewed at latest revision, 3 unresolved discussions; some commit checks failed. |
|
I ran some benchmark tests. Here is the benchmark code (…):
goos: darwin
goarch: amd64
pkg: github.com/dgraph-io/badger
BenchmarkDB_Oops-8 10000 178647 ns/op 9044 B/op 59 allocs/op
PASS
ok github.com/dgraph-io/badger 10.982s
goos: darwin
goarch: amd64
pkg: github.com/dgraph-io/badger
BenchmarkDB_Oops-8 10000 205644 ns/op 9042 B/op 59 allocs/op
PASS
ok github.com/dgraph-io/badger 12.341s
I have also used pprof to measure (…).
(pprof) list Update
Total: 560ms
ROUTINE ======================== github.com/dgraph-io/badger.(*DB).Update in /Users/lanreadelowo/go/src/github.com/dgraph-io/badger/transaction.go
0 70ms (flat, cum) 12.50% of Total
. . 564:
. . 565: if err := fn(txn); err != nil {
. . 566: return err
. . 567: }
. . 568:
. 70ms 569: return txn.Commit(nil)
. . 570:}
. . 571:
. . 572:func (db *DB) isClosed() bool {
. . 573: return atomic.LoadUint32(&db.closed) > 0
. . 574:}
(pprof) list Commit
Total: 560ms
ROUTINE ======================== github.com/dgraph-io/badger.(*Txn).Commit in /Users/lanreadelowo/go/src/github.com/dgraph-io/badger/transaction.go
0 70ms (flat, cum) 12.50% of Total
. . 464: Value: []byte(strconv.FormatUint(commitTs, 10)),
. . 465: meta: bitFinTxn,
. . 466: }
. . 467: entries = append(entries, e)
. . 468:
. 70ms 469: req, err := txn.db.sendToWriteCh(entries)
. . 470: state.writeLock.Unlock()
. . 471: if err != nil {
. . 472: return err
. . 473: }
. . 474:
(pprof)
For reads:
goos: darwin
goarch: amd64
pkg: github.com/dgraph-io/badger
BenchmarkDB_Damn-8 2000000 760 ns/op 504 B/op 10 allocs/op
PASS
ok github.com/dgraph-io/badger 14.714s
goos: darwin
goarch: amd64
pkg: github.com/dgraph-io/badger
BenchmarkDB_Damn-8 2000000 788 ns/op 504 B/op 10 allocs/op
PASS
ok github.com/dgraph-io/badger 14.340s |
|
Can you add the profile for reads as well? Also include topN. For the update, I'm seeing a 15% degradation. Any idea where that's coming from, if not from atomic? |
|
Hi @manishrjain, nothing points to atomic. It would be nice if someone could help verify this. pprof output (…):
(pprof) top20 -cum
Showing nodes accounting for 1.21s, 34.28% of 3.53s total
Dropped 78 nodes (cum <= 0.02s)
Showing top 20 nodes out of 112
flat flat% sum% cum cum%
0 0% 0% 2.82s 79.89% testing.(*B).launch
0 0% 0% 2.82s 79.89% testing.(*B).runN
0 0% 0% 2.81s 79.60% github.com/dgraph-io/badger.BenchmarkDB_Damn
0.01s 0.28% 0.28% 2.75s 77.90% github.com/dgraph-io/badger.(*DB).Update
0.03s 0.85% 1.13% 1.85s 52.41% github.com/dgraph-io/badger.BenchmarkDB_Damn.func3
0.02s 0.57% 1.70% 1.70s 48.16% github.com/dgraph-io/badger.(*Txn).Get
0.30s 8.50% 10.20% 1.62s 45.89% runtime.mallocgc
0.06s 1.70% 11.90% 1.09s 30.88% github.com/dgraph-io/badger.(*DB).get
0.02s 0.57% 12.46% 1.05s 29.75% runtime.newobject
0 0% 12.46% 1.02s 28.90% runtime.systemstack
0 0% 12.46% 0.74s 20.96% runtime.gcStart
0 0% 12.46% 0.73s 20.68% runtime.gcStart.func2
0.73s 20.68% 33.14% 0.73s 20.68% runtime.kevent
0 0% 33.14% 0.73s 20.68% runtime.netpoll
0 0% 33.14% 0.73s 20.68% runtime.startTheWorldWithSema
0 0% 33.14% 0.62s 17.56% runtime.schedule
0 0% 33.14% 0.57s 16.15% runtime.mcall
0 0% 33.14% 0.57s 16.15% runtime.park_m
0 0% 33.14% 0.56s 15.86% github.com/dgraph-io/badger.(*DB).NewTransaction (inline)
0.04s 1.13% 34.28% 0.54s 15.30% github.com/dgraph-io/badger/skl.(*Skiplist).Get
(pprof) top20
Showing nodes accounting for 2.74s, 77.62% of 3.53s total
Dropped 78 nodes (cum <= 0.02s)
Showing top 20 nodes out of 112
flat flat% sum% cum cum%
0.73s 20.68% 20.68% 0.73s 20.68% runtime.kevent
0.43s 12.18% 32.86% 0.43s 12.18% runtime.mach_semaphore_wait
0.30s 8.50% 41.36% 1.62s 45.89% runtime.mallocgc
0.18s 5.10% 46.46% 0.20s 5.67% runtime.heapBitsSetType
0.15s 4.25% 50.71% 0.15s 4.25% runtime.memclrNoHeapPointers
0.14s 3.97% 54.67% 0.14s 3.97% runtime.mach_semaphore_signal
0.12s 3.40% 58.07% 0.21s 5.95% runtime.deferreturn
0.08s 2.27% 60.34% 0.38s 10.76% github.com/dgraph-io/badger/y.ParseKey
0.08s 2.27% 62.61% 0.08s 2.27% runtime.nextFreeFast (inline)
0.06s 1.70% 64.31% 1.09s 30.88% github.com/dgraph-io/badger.(*DB).get
0.06s 1.70% 66.01% 0.36s 10.20% github.com/dgraph-io/badger.(*DB).getMemTables
0.06s 1.70% 67.71% 0.06s 1.70% runtime.acquirem (inline)
0.06s 1.70% 69.41% 0.12s 3.40% runtime.deferproc
0.05s 1.42% 70.82% 0.07s 1.98% runtime.freedefer
0.05s 1.42% 72.24% 0.05s 1.42% syscall.Syscall
0.04s 1.13% 73.37% 0.19s 5.38% github.com/dgraph-io/badger.(*Txn).Commit
0.04s 1.13% 74.50% 0.54s 15.30% github.com/dgraph-io/badger/skl.(*Skiplist).Get
0.04s 1.13% 75.64% 0.05s 1.42% runtime.newdefer
0.04s 1.13% 76.77% 0.05s 1.42% runtime.scanobject
0.03s 0.85% 77.62% 0.07s 1.98% github.com/dgraph-io/badger.(*oracle).decrRef
(pprof) top50
Showing nodes accounting for 3.35s, 94.90% of 3.53s total
Dropped 78 nodes (cum <= 0.02s)
Showing top 50 nodes out of 112
flat flat% sum% cum cum%
0.73s 20.68% 20.68% 0.73s 20.68% runtime.kevent
0.43s 12.18% 32.86% 0.43s 12.18% runtime.mach_semaphore_wait
0.30s 8.50% 41.36% 1.62s 45.89% runtime.mallocgc
0.18s 5.10% 46.46% 0.20s 5.67% runtime.heapBitsSetType
0.15s 4.25% 50.71% 0.15s 4.25% runtime.memclrNoHeapPointers
0.14s 3.97% 54.67% 0.14s 3.97% runtime.mach_semaphore_signal
0.12s 3.40% 58.07% 0.21s 5.95% runtime.deferreturn
0.08s 2.27% 60.34% 0.38s 10.76% github.com/dgraph-io/badger/y.ParseKey
0.08s 2.27% 62.61% 0.08s 2.27% runtime.nextFreeFast (inline)
0.06s 1.70% 64.31% 1.09s 30.88% github.com/dgraph-io/badger.(*DB).get
0.06s 1.70% 66.01% 0.36s 10.20% github.com/dgraph-io/badger.(*DB).getMemTables
0.06s 1.70% 67.71% 0.06s 1.70% runtime.acquirem (inline)
0.06s 1.70% 69.41% 0.12s 3.40% runtime.deferproc
0.05s 1.42% 70.82% 0.07s 1.98% runtime.freedefer
0.05s 1.42% 72.24% 0.05s 1.42% syscall.Syscall
0.04s 1.13% 73.37% 0.19s 5.38% github.com/dgraph-io/badger.(*Txn).Commit
0.04s 1.13% 74.50% 0.54s 15.30% github.com/dgraph-io/badger/skl.(*Skiplist).Get
0.04s 1.13% 75.64% 0.05s 1.42% runtime.newdefer
0.04s 1.13% 76.77% 0.05s 1.42% runtime.scanobject
0.03s 0.85% 77.62% 0.07s 1.98% github.com/dgraph-io/badger.(*oracle).decrRef
0.03s 0.85% 78.47% 1.85s 52.41% github.com/dgraph-io/badger.BenchmarkDB_Damn.func3
0.03s 0.85% 79.32% 0.03s 0.85% github.com/dgraph-io/badger/skl.(*Skiplist).DecrRef (inline)
0.03s 0.85% 80.17% 0.03s 0.85% github.com/dgraph-io/badger/y.(*ValueStruct).Decode
0.03s 0.85% 81.02% 0.05s 1.42% github.com/dgraph-io/badger/y.CompareKeys
0.03s 0.85% 81.87% 0.30s 8.50% runtime.convT2Eslice
0.03s 0.85% 82.72% 0.03s 0.85% runtime.jmpdefer
0.03s 0.85% 83.57% 0.03s 0.85% runtime.mach_semaphore_timedwait
0.03s 0.85% 84.42% 0.03s 0.85% runtime.usleep
0.03s 0.85% 85.27% 0.04s 1.13% runtime.wbBufFlush1
0.02s 0.57% 85.84% 1.70s 48.16% github.com/dgraph-io/badger.(*Txn).Get
0.02s 0.57% 86.40% 0.02s 0.57% github.com/dgraph-io/badger/y.ParseTs (inline)
0.02s 0.57% 86.97% 0.40s 11.33% github.com/dgraph-io/badger/y.SameKey
0.02s 0.57% 87.54% 0.03s 0.85% github.com/dgryski/go-farm.Fingerprint64
0.02s 0.57% 88.10% 0.02s 0.57% runtime.cmpbody
0.02s 0.57% 88.67% 0.08s 2.27% runtime.growslice
0.02s 0.57% 89.24% 0.02s 0.57% runtime.heapBitsForAddr (inline)
0.02s 0.57% 89.80% 0.02s 0.57% runtime.heapBitsForObject
0.02s 0.57% 90.37% 1.05s 29.75% runtime.newobject
0.02s 0.57% 90.93% 0.11s 3.12% runtime.rawbyteslice
0.02s 0.57% 91.50% 0.02s 0.57% runtime.releasem (inline)
0.02s 0.57% 92.07% 0.02s 0.57% sync.(*Mutex).Lock
0.02s 0.57% 92.63% 0.02s 0.57% sync.(*Mutex).Unlock
0.02s 0.57% 93.20% 0.02s 0.57% sync.(*RWMutex).RLock
0.01s 0.28% 93.48% 2.75s 77.90% github.com/dgraph-io/badger.(*DB).Update
0.01s 0.28% 93.77% 0.08s 2.27% github.com/dgraph-io/badger.(*Txn).Discard
0.01s 0.28% 94.05% 0.10s 2.83% github.com/dgraph-io/badger/y.KeyWithTs
0.01s 0.28% 94.33% 0.05s 1.42% runtime.gcDrain
0.01s 0.28% 94.62% 0.02s 0.57% runtime.lock
0.01s 0.28% 94.90% 0.20s 5.67% runtime.makemap_small
0 0% 94.90% 0.02s 0.57% github.com/dgraph-io/badger.(*DB).Close
(pprof) list Get
Total: 3.53s
ROUTINE ======================== github.com/dgraph-io/badger.(*Txn).Get in /Users/lanreadelowo/go/src/github.com/dgraph-io/badger/transaction.go
20ms 1.70s (flat, cum) 48.16% of Total
. . 331: return nil, ErrEmptyKey
. . 332: } else if txn.discarded {
. . 333: return nil, ErrDiscardedTxn
. . 334: }
. . 335:
10ms 380ms 336: item = new(Item)
. . 337: if txn.update {
. 10ms 338: if e, has := txn.pendingWrites[string(key)]; has && bytes.Equal(key, e.Key) {
. . 339: if isDeletedOrExpired(e.meta, e.ExpiresAt) {
. . 340: return nil, ErrKeyNotFound
. . 341: }
. . 342: // Fulfill from cache.
. . 343: item.meta = e.meta
. . 344: item.val = e.Value
. . 345: item.userMeta = e.UserMeta
. . 346: item.key = key
. . 347: item.status = prefetched
. . 348: item.version = txn.readTs
. . 349: // We probably don't need to set db on item here.
. . 350: return item, nil
. . 351: }
. . 352: // Only track reads if this is update txn. No need to track read if txn serviced it
. . 353: // internally.
. 30ms 354: fp := farm.Fingerprint64(key)
10ms 90ms 355: txn.reads = append(txn.reads, fp)
. . 356: }
. . 357:
. 100ms 358: seek := y.KeyWithTs(key, txn.readTs)
. 1.09s 359: vs, err := txn.db.get(seek)
. . 360: if err != nil {
. . 361: return nil, errors.Wrapf(err, "DB::Get key: %q", key)
. . 362: }
. . 363: if vs.Value == nil && vs.Meta == 0 {
. . 364: return nil, ErrKeyNotFound
ROUTINE ======================== github.com/dgraph-io/badger/skl.(*Skiplist).Get in /Users/lanreadelowo/go/src/github.com/dgraph-io/badger/skl/skl.go
40ms 540ms (flat, cum) 15.30% of Total
. . 379:}
. . 380:
. . 381:// Get gets the value associated with the key. It returns a valid value if it finds equal or earlier
. . 382:// version of the same key.
. . 383:func (s *Skiplist) Get(key []byte) y.ValueStruct {
10ms 70ms 384: n, _ := s.findNear(key, false, true) // findGreaterOrEqual.
. . 385: if n == nil {
. . 386: return y.ValueStruct{}
. . 387: }
. . 388:
. . 389: nextKey := s.arena.getKey(n.keyOffset, n.keySize)
20ms 420ms 390: if !y.SameKey(key, nextKey) {
. . 391: return y.ValueStruct{}
. . 392: }
. . 393:
. . 394: valOffset, valSize := n.getValueOffset()
10ms 40ms 395: vs := s.arena.getVal(valOffset, valSize)
. 10ms 396: vs.Version = y.ParseTs(nextKey)
. . 397: return vs
. . 398:}
. . 399:
. . 400:// NewIterator returns a skiplist iterator. You have to Close() the iterator.
. . 401:func (s *Skiplist) NewIterator() *Iterator {
(pprof)
|
|
Thanks for this. But the code which is calling the new |
|
That is all that is being shown... I'm not too familiar with pprof; can you help verify that? Anyway, I generated an image of the entire profile. |
|
Hey guys, is there anything the IPFS team can do to help move this one along? |
|
I want to get a sense of how much slowness this change brings to concurrent reads. If your team (@whyrusleeping) can help benchmark that, that'd be good; the potential performance hit of this approach is the main concern here. The profile image doesn't have any relevant code from Badger; it seems like it wasn't generated correctly. |
|
@schomatis you game for some benchmarking? |
|
Sure. |
|
@manishrjain Before going through with the benchmark tests, let me make some remarks that I would appreciate your feedback on:
|
|
Hey @schomatis ,
|
I was referring (and maybe you weren't, and I misunderstood the review comment) to the case where an operation like
I meant that this is not thread-safe. Maybe it's not meant to be, but if it is, the atomic operations do not guarantee that: two different goroutines can call …

In case thread-safety wasn't the goal, the atomic operations can be discarded; they have a performance penalty without achieving their goal.

From my perspective, the central issue is that clearly certain operations cannot run in parallel in the DB (e.g., …). That's what (in my opinion) is worth evaluating. As you correctly worry, going down this road will have (not negligible) performance implications. Some users would appreciate it, as they won't have to check for that themselves since it will be handled by Badger (avoiding some unpleasant panics); other users won't, as the performance degradation may be undesirable (and maybe they already have an underlying synchronization mechanism in the code that consumes the API, so Badger's own synchronization mechanism would seem redundant).

I think this PR goes in the right direction but falls short in defining what Badger is willing to do and what is left to the user (and that should be documented in the API). |
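One way to make Close genuinely mutually exclusive with in-flight operations, in the spirit of this comment, is a reader/writer lock rather than a bare atomic flag. This is a sketch under the assumption that such coordination is wanted at all; it is not badger's actual design, and all names are illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrClosed = errors.New("db: already closed")

type DB struct {
	mu     sync.RWMutex
	closed bool
}

// Get holds a read lock for its whole duration, so Close cannot
// complete while any Get is in flight. Many Gets can still run
// concurrently with each other.
func (db *DB) Get(key string) error {
	db.mu.RLock()
	defer db.mu.RUnlock()
	if db.closed {
		return ErrClosed
	}
	// ... the actual read would happen here, strictly before Close ...
	return nil
}

// Close takes the write lock, which waits for all in-flight readers
// to finish before marking the DB closed.
func (db *DB) Close() {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.closed = true
}

func main() {
	db := &DB{}
	fmt.Println(db.Get("k")) // <nil>
	db.Close()
	fmt.Println(db.Get("k")) // db: already closed
}
```

The cost is an RLock/RUnlock pair on every read, which is exactly the kind of overhead on concurrent reads that the benchmarking discussion above is probing.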
|
I agree with your assessment: if we try to solve this problem with more code, we'd need to consider all the various edge cases to do it right. All of these would add more complexity and come with some performance impact. I think this issue is purely in the domain of clear documentation instead: we clearly tell users not to access the DB after closing it, and point out in the FAQ the exact error they might receive if that happens. That'd be the right division between what Badger can do and what the user should do at their end. If there are no objections, my take is to close this PR and fix the documentation instead. |
|
One thing that might help, and have minimal impact on performance, would be to have an atomic reference counter for operations that may panic if the DB is closed. Then we could at least check whether things may panic before calling close (and use that to wait for cleanup/finish-up on the user side). |
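The reference-counter idea floated here could be sketched with sync.WaitGroup acting as the counter. This is a hypothetical illustration of the suggestion, not a design that was merged, and it deliberately shows the known weakness:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var ErrClosed = errors.New("db: already closed")

type DB struct {
	ops    sync.WaitGroup // counts in-flight operations
	closed uint32
}

// begin registers an operation, refusing once Close has started.
// Note the check-then-Add window: a real version would need to
// recheck under a lock to be fully race-free, which is why this
// likely carries a similar cost to the PR's approach.
func (db *DB) begin() error {
	if atomic.LoadUint32(&db.closed) == 1 {
		return ErrClosed
	}
	db.ops.Add(1)
	return nil
}

func (db *DB) end() { db.ops.Done() }

func (db *DB) Update(fn func() error) error {
	if err := db.begin(); err != nil {
		return err
	}
	defer db.end()
	return fn()
}

// Close flips the flag, then waits for every in-flight operation
// to drain before releasing resources.
func (db *DB) Close() {
	atomic.StoreUint32(&db.closed, 1)
	db.ops.Wait()
	// ... release mmaps, file handles, etc. ...
}

func main() {
	db := &DB{}
	fmt.Println(db.Update(func() error { return nil })) // <nil>
	db.Close()
	fmt.Println(db.Update(func() error { return nil })) // db: already closed
}
```

This matches the follow-up observation in the thread: the counter has to be touched on every operation, so its perf impact is probably comparable to the PR's per-operation check.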
|
Then again, that will probably have the same perf impact as this PR. |
|
@manishrjain I agree that this PR could be closed and, in its place, another issue could be opened to fully discuss how to handle this. Merging this PR as-is would solve some of the problems mentioned here, and its cost would probably be a very low performance impact, but it carries the danger of giving the impression that Badger's API is thread-safe, which might confuse the design of use cases for consumers of the API. |
|
Agreed. Thanks for the detailed write-up and thorough analysis of the issue, @schomatis. Thanks, @adelowo, for the PR and for following through with all the requested reviews. Unfortunately, we can't accept this PR, but there are many other issues in Badger I could use your help with. Thanks again! |
|
Thanks for your input, @schomatis. I agree with your statement on thread safety. |
Fixes #420 and #374