Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segmentation fault in vlog.Read #1150

Merged
merged 6 commits into from
Dec 10, 2019
Merged

Conversation

jarifibrahim
Copy link
Contributor

@jarifibrahim jarifibrahim commented Dec 9, 2019

Fixes #1136 and #1131

The problem

In both the reported issues the segmentation fault would occur after vlog.DropAll (or if the log file was deleted). In case of dropAll, here's how we delete files

badger/value.go

Lines 739 to 757 in 8b99eb4

// We don't want to block dropAll on any pending transactions. So, don't worry about iterator
// count.
var count int
deleteAll := func() error {
vlog.filesLock.Lock()
defer vlog.filesLock.Unlock()
for _, lf := range vlog.filesMap {
if err := vlog.deleteLogFile(lf); err != nil {
return err
}
count++
}
vlog.filesMap = make(map[uint32]*logFile)
return nil
}
if err := deleteAll(); err != nil {
return count, err
}

At the same time, there could be a read on the same vlog file. Here's the code that reads from the file

badger/value.go

Lines 1411 to 1419 in 8b99eb4

buf, lf, err := vlog.readValueBytes(vp, s)
// log file is locked so, decide whether to lock immediately or let the caller to
// unlock it, after caller uses it.
cb := vlog.getUnlockCallback(lf)
if err != nil {
return nil, cb, err
}
if vlog.opt.VerifyValueChecksum {

The problem occurs if we try to access the buf returned by readValueBytes. There is a window in which if we try to access the buf, we get a segmentation fault.

The following sequence of events would cause the segmentation fault.

  1. vlog.Read is called and the control is at line 1411

    badger/value.go

    Lines 1411 to 1419 in 8b99eb4

    buf, lf, err := vlog.readValueBytes(vp, s)
    // log file is locked so, decide whether to lock immediately or let the caller to
    // unlock it, after caller uses it.
    cb := vlog.getUnlockCallback(lf)
    if err != nil {
    return nil, cb, err
    }
    if vlog.opt.VerifyValueChecksum {
  2. vlog.DropAll is called at the same time. vlog.DropAll completes while the control is at line 1411 in vlog.Read. This essentially means all the log files are deleted and a new vlog0 file is created. This file has a fmap (mmap buffer) of 2 GB and the file size is 0 bytes (since we haven't written anything)
  3. Now, the execution continues on vlog.Read and header.Decode (which is were the buf accessed for the first time) crashes with Segmentation fault ( SIGBUS error). As mentioned in comment Fix value log dropAll race condition #1140 (comment), the SIGBUS occurs when we try to access a mmap buffer which is beyond the actual file size.

With the following changes, I was able to reproduce the segmentation fault. Notice that we stop the execution at line 1411 in vlog.Read until DropAll completes.

diff --git a/managed_db_test.go b/managed_db_test.go
index 5a25174..c4e27ba 100644
--- a/managed_db_test.go
+++ b/managed_db_test.go
@@ -209,7 +209,10 @@ func TestDropAllWithPendingTxn(t *testing.T) {
 	wg.Add(1)
 	go func() {
 		defer wg.Done()
-		itr := txn.NewIterator(DefaultIteratorOptions)
+		d := DefaultIteratorOptions
+		// This is necessary otherwise we'll run into a deadlock in this test.
+		d.PrefetchValues = false
+		itr := txn.NewIterator(d)
 		defer itr.Close()
 
 		var keys []string
@@ -246,6 +249,7 @@ func TestDropAllWithPendingTxn(t *testing.T) {
 		defer wg.Done()
 		time.Sleep(2 * time.Second)
 		require.NoError(t, db.DropAll())
+		db.vlog.ch <- struct{}{}
 	}()
 	wg.Wait()
 }
diff --git a/value.go b/value.go
index b28b2b1..a63a519 100644
--- a/value.go
+++ b/value.go
@@ -780,6 +780,7 @@ type lfDiscardStats struct {
 
 type valueLog struct {
 	dirPath string
+	ch      chan struct{}
 	elog    trace.EventLog
 
 	// guards our view of which files exist, which to be deleted, how many active iterators
@@ -1004,6 +1005,7 @@ func (vlog *valueLog) replayLog(lf *logFile, offset uint32, replayFn logEntry) e
 func (vlog *valueLog) open(db *DB, ptr valuePointer, replayFn logEntry) error {
 	vlog.opt = db.opt
 	vlog.db = db
+	vlog.ch = make(chan struct{}, 1)
 	// We don't need to open any vlog files or collect stats for GC if DB is opened
 	// in InMemory mode. InMemory mode doesn't create any files/directories on disk.
 	if vlog.opt.InMemory {
@@ -1413,6 +1415,7 @@ func (vlog *valueLog) Read(vp valuePointer, s *y.Slice) ([]byte, func(), error)
 			"Invalid value pointer offset: %d greater than current offset: %d",
 			vp.Offset, vlog.woffset())
 	}
+	<-vlog.ch
 	buf, lf, err := vlog.readValueBytes(vp, s)
 	// log file is locked so, decide whether to lock immediately or let the caller to
 	// unlock it, after caller uses it.

This change is Reviewable

@coveralls
Copy link

coveralls commented Dec 9, 2019

Coverage Status

Coverage decreased (-7.9%) to 69.891% when pulling 5ccc431 on ibrahim/header-decode-fix into 8b99eb4 on master.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ashish-goswami)

@jarifibrahim jarifibrahim merged commit 7c5e6d7 into master Dec 10, 2019
@jarifibrahim jarifibrahim deleted the ibrahim/header-decode-fix branch December 10, 2019 15:22
jarifibrahim pushed a commit that referenced this pull request Mar 12, 2020
This commit fixes the segmentation fault in header.Decode that
would happen if we tried to read from a file that was mmapped
but didn't actually have any data in it. This commit fixes the
issue by checking the read is within the actual file size.

See #1150 for all
the details.

Fixes #1136 and
#1131

(cherry picked from commit 7c5e6d7)
manishrjain pushed a commit to outcaste-io/outserv that referenced this pull request Jul 6, 2022
This commit fixes the segmentation fault in header.Decode that
would happen if we tried to read from a file that was mmapped
but didn't actually have any data in it. This commit fixes the
issue by checking the read is within the actual file size.
    
See dgraph-io/badger#1150 for all
the details.

Fixes dgraph-io/badger#1136 and
dgraph-io/badger#1131
acodereviewersbestfriend52 added a commit to acodereviewersbestfriend52/badger that referenced this pull request Aug 3, 2024
This commit fixes the segmentation fault in header.Decode that
would happen if we tried to read from a file that was mmapped
but didn't actually have any data in it. This commit fixes the
issue by checking the read is within the actual file size.
    
See dgraph-io/badger#1150 for all
the details.

Fixes dgraph-io/badger#1136 and
dgraph-io/badger#1131
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Seg fault in header.Decode
3 participants