Using badger.IteratorOptions.Prefix can hide data if certain keys have been written #992

@jtacoma

What version of Go are you using (go version)?

$ go version
go version go1.11.6 linux/amd64

What version of Badger are you using?

I have tried with v1.6.0 as well as latest master (7a7dd17 at time of writing).

Does this issue reproduce with the latest master?

Yes.

What are the hardware specifications of the machine (RAM, OS, Disk)?

16 GiB RAM
Linux machine 4.19.0-5-amd64
400 GiB SSD

What did you do?

I wrote two items to a badger database in a single transaction. The first item has the key []byte{0}. I then closed the database, re-opened it, and tried to find the second item using a prefix iterator, but could not find it. Here is a failing test that shows the issue:

package main

import (
	"io/ioutil"
	"os"
	"testing"

	"github.com/dgraph-io/badger"
)

func TestBadger(t *testing.T) {
	dir, err := ioutil.TempDir("", "test-badger")
	if err != nil {
		t.Fatal(err)
	}
	defer os.RemoveAll(dir)
	func() {
		db, err := badger.Open(badger.DefaultOptions(dir))
		if err != nil {
			t.Fatal(err)
		}
		defer db.Close()
		txn := db.NewTransaction(true)
		defer txn.Discard()

		// Setting a value for key []byte{0} is necessary to trigger the bug.
		if err := txn.Set([]byte{0}, nil); err != nil {
			t.Fatal(err)
		}

		// This is the item we're looking for later. It will remain available
		// within the scope of this *badger.DB connection, but becomes partially
		// unavailable to later connections.
		if err := txn.Set(testKey(), []byte("42")); err != nil {
			t.Fatal(err)
		} else if err := txn.Commit(); err != nil {
			t.Fatal(err)
		}

		// Verify that the item we're looking for is available. CountAll returns
		// no error, so only the count is checked.
		if got := CountAll(db); got != 1 {
			t.Fatalf("want 1, got %v", got)
		}
	}()
	func() {
		db, err := badger.Open(badger.DefaultOptions(dir))
		if err != nil {
			t.Fatal(err)
		}
		defer db.Close()

		if got := CountAll(db); got != 1 {
			// The test fails here. CountAll is the same function used to verify the
			// availability of the item we're looking for in the previous *badger.DB
			// connection, but this time CountAll(db) returns zero.
			t.Fatalf("want 1, got %v", got)
		}
	}()
}

func testKey() []byte { return []byte{0, 0, 1} }

func CountAll(db *badger.DB) int {
	txn := db.NewTransaction(false)
	defer txn.Discard()
	opts := badger.DefaultIteratorOptions

	// The following line must be included for the bug to occur. This is probably
	// an important hint, as it identifies a more specific code path.
	opts.Prefix = testKey()

	iter := txn.NewIterator(opts)
	defer iter.Close()
	count := 0
	for iter.Seek(testKey()); iter.Valid(); iter.Next() {
		count++
	}
	return count
}

What did you expect to see?

I expect data that I wrote in an earlier badger.DB connection, and that was available within the scope of that connection, to be available in the same way in a later badger.DB connection to the same database.
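
To state that expectation independently of badger, here is a minimal reference model using only the standard library. The name countWithPrefix is hypothetical and the code does not use or imitate badger's internals; it only sketches what a seek-then-scan over byte-ordered keys should count for a given prefix.

```go
package main

import (
	"bytes"
	"sort"
)

// countWithPrefix is a hypothetical reference model, not badger code. It
// returns how many of the given keys start with prefix, visiting them the
// way an ordered key-value store would: seek to the first key >= prefix,
// then advance while the current key still carries the prefix.
func countWithPrefix(keys [][]byte, prefix []byte) int {
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([][]byte, len(keys))
	copy(sorted, keys)
	sort.Slice(sorted, func(i, j int) bool {
		return bytes.Compare(sorted[i], sorted[j]) < 0
	})

	// Seek: index of the first key that is >= prefix in byte-wise order.
	start := sort.Search(len(sorted), func(i int) bool {
		return bytes.Compare(sorted[i], prefix) >= 0
	})

	// Scan: count keys until one no longer has the prefix.
	count := 0
	for i := start; i < len(sorted) && bytes.HasPrefix(sorted[i], prefix); i++ {
		count++
	}
	return count
}
```

With the two keys from the test, []byte{0} and []byte{0, 0, 1}, and the prefix []byte{0, 0, 1}, this model counts exactly one key, which is what CountAll should report in both connections.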

What did you see instead?

The data I wrote is no longer available. I've found two necessary conditions for this bug: first, the earlier DB connection must include writing to the key []byte{0}; second, the data is only unavailable in a separate DB connection when badger.IteratorOptions.Prefix is set.

This can also happen if []byte{0, 0} is used instead of []byte{0}. However, []byte{0, 0, 0} does not trigger the bug. Since []byte{1} also does not trigger the bug, I've started using that as a workaround, but this is temporary: after writing enough data, the same bug seems to be happening again.
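
One pattern in these observations, offered as a possible hint and not as a diagnosis: the keys that trigger the bug ([]byte{0} and []byte{0, 0}) are both proper prefixes of testKey(), while the keys that do not ([]byte{0, 0, 0} and []byte{1}) are not. The helper below, with the hypothetical name isProperPrefixOf, just makes that relationship checkable. It does not explain the later recurrence with []byte{1}, so the correlation may be incomplete.

```go
package main

import "bytes"

// isProperPrefixOf is a hypothetical helper, not part of badger. It reports
// whether candidate is a prefix of key and strictly shorter than key.
func isProperPrefixOf(candidate, key []byte) bool {
	return len(candidate) < len(key) && bytes.HasPrefix(key, candidate)
}
```

Calling isProperPrefixOf(k, []byte{0, 0, 1}) for each of the four keys above returns true for the two triggering keys and false for the other two.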

Labels

area/data-loss: Issues related to data loss or corruption.
kind/bug: Something is broken.
status/needs-attention: This issue needs more eyes on it; more investigation might be required before accepting or rejecting it.
