Spending up to 50% of the time in lmdb.(*Txn).bytes while benchmarking Txn.Get #118

@deepakjois

I am benchmarking lmdb-go, BoltDB and Badger, using Go benchmarking tools.

In my setup, BoltDB outperforms lmdb-go by close to an order of magnitude (about 7.6x in the run shown further down) for random reads from the key-value store. This is a bit surprising, and I am trying to find out why.

I did some CPU profiling and I noticed this:

$ go tool pprof badger-bench.test  lmdb.out
Entering interactive mode (type "help" for commands)
Showing top 10 nodes out of 44 (cum >= 10ms)
      flat  flat%   sum%        cum   cum%
     120ms 48.00% 48.00%      120ms 48.00%  github.com/bmatsuo/lmdb-go/lmdb.(*Txn).bytes
      80ms 32.00% 80.00%       80ms 32.00%  runtime.cgocall
      10ms  4.00% 84.00%       10ms  4.00%  fmt.(*pp).printArg
      10ms  4.00% 88.00%       10ms  4.00%  github.com/dgraph-io/badger/y.AssertTruef
      10ms  4.00% 92.00%       10ms  4.00%  runtime.(*gcWork).put
      10ms  4.00% 96.00%       10ms  4.00%  runtime.adjustframe
      10ms  4.00%   100%       20ms  8.00%  runtime.greyobject
         0     0%   100%       10ms  4.00%  fmt.(*pp).doPrintf
         0     0%   100%       10ms  4.00%  fmt.Sprintf
         0     0%   100%       10ms  4.00%  github.com/bmatsuo/lmdb-go/lmdb.(*Env).Close

The program spends close to 50% of its time (48% flat in the profile above) in lmdb.(*Txn).bytes (from here). Is this expected? Can it be improved?

Here is the benchmarking code (view in context):

runRandomReadBenchmark(b, "lmdb", func(c *hitCounter, pb *testing.PB) {
	err := lmdbEnv.View(func(txn *lmdb.Txn) error {
		txn.RawRead = true
		for pb.Next() {
			key := newKey() // Generates a random key
			v, err := txn.Get(lmdbDBI, key)
			if lmdb.IsNotFound(err) {
				c.notFound++
				continue
			} else if err != nil {
				c.errored++
				continue
			}
			y.AssertTruef(len(v) == *flagValueSize, "Assertion failed. value size is %d, expected %d", len(v), *flagValueSize)
			c.found++
		}
		return nil
	})
	if err != nil {
		y.Check(err)
	}
})

The body of the for pb.Next() {…} loop runs many times, and the benchmark reports the average time per iteration. Here are the results for LMDB and Bolt in a simple run:

ubuntu@ip-172-31-39-80:~/go/src/github.com/dgraph-io/badger-bench$ go test -v --bench BenchmarkReadRandomLmdb --keys_mil 5 --valsz 16384 --dir "/mnt/data/16kb"
BenchmarkReadRandomLmdb/read-randomlmdb-128                10000            129638 ns/op
--- BENCH: BenchmarkReadRandomLmdb
        bench_test.go:104: lmdb: 6370 keys had valid values.
        bench_test.go:105: lmdb: 3630 keys had no values
        bench_test.go:106: lmdb: 0 keys had errors
        bench_test.go:107: lmdb: 10000 total keys looked at
        bench_test.go:108: lmdb: hit rate : 0.64
PASS
ok      github.com/dgraph-io/badger-bench       1.362s


ubuntu@ip-172-31-39-80:~/go/src/github.com/dgraph-io/badger-bench$ go test -v --bench BenchmarkReadRandomBolt --keys_mil 5 --valsz 16384 --dir "/mnt/data/16kb"
BenchmarkReadRandomBolt/read-randombolt-128               100000             17122 ns/op
--- BENCH: BenchmarkReadRandomBolt
        bench_test.go:104: bolt: 63722 keys had valid values.
        bench_test.go:105: bolt: 36278 keys had no values
        bench_test.go:106: bolt: 0 keys had errors
        bench_test.go:107: bolt: 100000 total keys looked at
        bench_test.go:108: bolt: hit rate : 0.64
PASS

Bolt comes in at 17122 ns/op, which is roughly 7.6 times faster than lmdb-go’s 129638 ns/op.
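
For anyone who wants to see how a pb.Next() loop turns into an ns/op figure and a hit rate, here is a minimal, self-contained sketch. It is not the badger-bench code: it assumes an in-memory map in place of a real store, and hitCounter, buildStore and runRandomRead are hypothetical names used only for illustration. It relies on testing.Benchmark and b.RunParallel from the standard library.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
	"testing"
)

// hitCounter is a hypothetical stand-in for the counter in the benchmark above.
type hitCounter struct {
	found, notFound int64
}

// hitRate returns found / (found + notFound), or 0 if nothing was looked up.
func (c *hitCounter) hitRate() float64 {
	total := c.found + c.notFound
	if total == 0 {
		return 0
	}
	return float64(c.found) / float64(total)
}

// buildStore fills an in-memory map with keys 0..n-1, each holding valueSize bytes.
func buildStore(n uint32, valueSize int) map[uint32][]byte {
	store := make(map[uint32][]byte, n)
	for k := uint32(0); k < n; k++ {
		store[k] = make([]byte, valueSize)
	}
	return store
}

// runRandomRead looks up random keys in [0, keySpace) and counts hits and
// misses. keySpace must be non-zero. The map is only read, so concurrent
// access from the RunParallel goroutines is safe.
func runRandomRead(store map[uint32][]byte, keySpace uint32) (testing.BenchmarkResult, *hitCounter) {
	total := &hitCounter{}
	res := testing.Benchmark(func(b *testing.B) {
		// The framework calls this function several times with a growing
		// b.N; reset the counters so they describe the final run only.
		atomic.StoreInt64(&total.found, 0)
		atomic.StoreInt64(&total.notFound, 0)
		b.RunParallel(func(pb *testing.PB) {
			// Each goroutine gets its own generator and local counter.
			rng := rand.New(rand.NewSource(rand.Int63()))
			var local hitCounter
			for pb.Next() { // b.N iterations are shared across goroutines
				key := rng.Uint32() % keySpace
				if _, ok := store[key]; ok {
					local.found++
				} else {
					local.notFound++
				}
			}
			atomic.AddInt64(&total.found, local.found)
			atomic.AddInt64(&total.notFound, local.notFound)
		})
	})
	return res, total
}

func main() {
	testing.Init() // registers testing flags when running outside "go test"
	store := buildStore(1000, 16)
	res, c := runRandomRead(store, 2000) // only half of the key space exists
	fmt.Printf("%d iterations, %d ns/op, hit rate %.2f\n",
		res.N, res.NsPerOp(), c.hitRate())
}
```

The reported ns/op is the elapsed wall-clock time of the final run divided by its iteration count, so anything done inside the loop (key generation, assertions, the lookup itself) is included in the figure.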

Additional Details

The benchmarks run on a dedicated AWS i3.large instance, which provides 450GB of NVMe SSD storage, 2 virtual cores, and 15.25GB of RAM.

  • Data size on disk: 61GB
  • No. of Keys in DB: 5 million
  • Value Size: 16KB (constant)
  • Key Size: 22B
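
As a rough back-of-the-envelope check on these figures (not a diagnosis), the sketch below compares the on-disk data size with the available RAM. It assumes, optimistically, that all of the RAM could be used to cache data; cacheableFraction is a hypothetical helper introduced only for this illustration.

```go
package main

import "fmt"

// cacheableFraction returns an upper bound on the share of the dataset that
// could be held in RAM at once, assuming all RAM were available for caching.
func cacheableFraction(dataGB, ramGB float64) float64 {
	if dataGB <= ramGB {
		return 1
	}
	return ramGB / dataGB
}

func main() {
	const dataGB, ramGB = 61.0, 15.25 // figures from the setup listed above
	fmt.Printf("data/RAM ratio: %.1f\n", dataGB/ramGB)
	fmt.Printf("upper bound on cached fraction: %.2f\n", cacheableFraction(dataGB, ramGB))
}
```

With these numbers the data is about 4 times larger than RAM, so at most about a quarter of it can be in memory at any time.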

Please ask if you require any other details about the benchmarking setup.
