Description
I am benchmarking lmdb-go, BoltDB and Badger using Go's built-in benchmarking tools.
In my setup, BoltDB outperforms lmdb-go by roughly an order of magnitude on random reads from the key-value store. This is surprising, and I am trying to find out why.
I did some CPU profiling and noticed this:
$ go tool pprof badger-bench.test lmdb.out
Entering interactive mode (type "help" for commands)
Showing top 10 nodes out of 44 (cum >= 10ms)
flat flat% sum% cum cum%
120ms 48.00% 48.00% 120ms 48.00% github.com/bmatsuo/lmdb-go/lmdb.(*Txn).bytes
80ms 32.00% 80.00% 80ms 32.00% runtime.cgocall
10ms 4.00% 84.00% 10ms 4.00% fmt.(*pp).printArg
10ms 4.00% 88.00% 10ms 4.00% github.com/dgraph-io/badger/y.AssertTruef
10ms 4.00% 92.00% 10ms 4.00% runtime.(*gcWork).put
10ms 4.00% 96.00% 10ms 4.00% runtime.adjustframe
10ms 4.00% 100% 20ms 8.00% runtime.greyobject
0 0% 100% 10ms 4.00% fmt.(*pp).doPrintf
0 0% 100% 10ms 4.00% fmt.Sprintf
0 0% 100% 10ms 4.00% github.com/bmatsuo/lmdb-go/lmdb.(*Env).Close
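As a sanity check on these percentages: pprof's flat% is a function's own sampled time divided by the total sampled time, so the 120ms / 48% row implies only about 250ms of samples in total, roughly 25 samples at the Go CPU profiler's default rate of 100 Hz. A minimal sketch of that arithmetic (the helper name totalFromFlat is hypothetical, not part of pprof):

```go
package main

import "fmt"

// defaultProfileHz is the default sampling rate of Go's CPU profiler
// (runtime/pprof takes about 100 samples per second).
const defaultProfileHz = 100

// totalFromFlat recovers the total sampled time (ms) from one row of
// pprof's top output, using flat% = flat / total * 100.
func totalFromFlat(flatMs, flatPct float64) float64 {
	return flatMs / (flatPct / 100)
}

func main() {
	// lmdb.(*Txn).bytes row: 120ms flat, 48.00% flat.
	total := totalFromFlat(120, 48)
	// runtime.cgocall row: 80ms flat, 32.00% flat; should give the same total.
	check := totalFromFlat(80, 32)
	// Convert ms of samples into a sample count at the default rate.
	samples := total / 1000 * defaultProfileHz
	fmt.Printf("total sampled: %.0fms (check: %.0fms), about %.0f samples\n", total, check, samples)
}
```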
The program spends close to 50% of its CPU time (48% flat) in lmdb.(*Txn).bytes. Is this expected? Can it be improved?
Here is the benchmarking code:
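For context on what that function does: in lmdb-go, Txn.bytes converts an LMDB value into a Go []byte, and the Txn.RawRead field chooses between returning a slice that points directly into LMDB's memory-mapped pages and returning a fresh copy. The standalone sketch below only imitates those two behaviours on an ordinary byte slice standing in for the memory map; readAlias and readCopy are hypothetical names, not lmdb-go functions. Assuming RawRead behaves as documented, the benchmark (which sets RawRead = true) should be on the no-copy path, so a 16 KB copy per read is not an obvious explanation for the profile.

```go
package main

import "fmt"

// valueSize matches the benchmark's 16 KB values (--valsz 16384).
const valueSize = 16384

// readAlias imitates a RawRead=true read: the returned slice points into
// the backing store, so nothing is copied or allocated.
func readAlias(store []byte, off, n int) []byte {
	return store[off : off+n : off+n]
}

// readCopy imitates a RawRead=false read: the value is copied into a new
// Go-managed slice, costing an n-byte allocation plus a copy per read.
func readCopy(store []byte, off, n int) []byte {
	out := make([]byte, n)
	copy(out, store[off:off+n])
	return out
}

func main() {
	store := make([]byte, 2*valueSize) // stand-in for LMDB's memory map
	a := readAlias(store, valueSize, valueSize)
	c := readCopy(store, valueSize, valueSize)

	store[valueSize] = 42   // mutate the "mapped" page
	fmt.Println(a[0], c[0]) // the alias sees the write, the copy does not
}
```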
runRandomReadBenchmark(b, "lmdb", func(c *hitCounter, pb *testing.PB) {
	// A single read-only transaction is reused for every iteration of this pb loop.
	err := lmdbEnv.View(func(txn *lmdb.Txn) error {
		// RawRead: Get returns slices that point directly into LMDB's
		// memory map instead of copying the value.
		txn.RawRead = true
		for pb.Next() {
			key := newKey() // Generates a random key
			v, err := txn.Get(lmdbDBI, key)
			if lmdb.IsNotFound(err) {
				c.notFound++
				continue
			} else if err != nil {
				c.errored++
				continue
			}
			y.AssertTruef(len(v) == *flagValueSize, "Assertion failed. value size is %d, expected %d", len(v), *flagValueSize)
			c.found++
		}
		return nil
	})
	if err != nil {
		y.Check(err)
	}
})
The body of the for pb.Next() {…} loop runs many times, and the benchmark reports the average time per iteration (ns/op). Here are the results for LMDB and Bolt from a simple run:
ubuntu@ip-172-31-39-80:~/go/src/github.com/dgraph-io/badger-bench$ go test -v --bench BenchmarkReadRandomLmdb --keys_mil 5 --valsz 16384 --dir "/mnt/data/16kb"
BenchmarkReadRandomLmdb/read-randomlmdb-128 10000 129638 ns/op
--- BENCH: BenchmarkReadRandomLmdb
bench_test.go:104: lmdb: 6370 keys had valid values.
bench_test.go:105: lmdb: 3630 keys had no values
bench_test.go:106: lmdb: 0 keys had errors
bench_test.go:107: lmdb: 10000 total keys looked at
bench_test.go:108: lmdb: hit rate : 0.64
PASS
ok github.com/dgraph-io/badger-bench 1.362s
ubuntu@ip-172-31-39-80:~/go/src/github.com/dgraph-io/badger-bench$ go test -v --bench BenchmarkReadRandomBolt --keys_mil 5 --valsz 16384 --dir "/mnt/data/16kb"
BenchmarkReadRandomBolt/read-randombolt-128 100000 17122 ns/op
--- BENCH: BenchmarkReadRandomBolt
bench_test.go:104: bolt: 63722 keys had valid values.
bench_test.go:105: bolt: 36278 keys had no values
bench_test.go:106: bolt: 0 keys had errors
bench_test.go:107: bolt: 100000 total keys looked at
bench_test.go:108: bolt: hit rate : 0.64
PASS
Bolt comes in at 17122 ns/op, roughly 7.6x faster than lmdb-go's 129638 ns/op.
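Multiplying each ns/op figure by its iteration count gives how long each timed loop ran: about 1.30s for LMDB and 1.71s for Bolt. Both are a little over one second, which is consistent with Go growing b.N until a benchmark runs for at least the default 1s -benchtime, and would explain why the slower LMDB run stopped at 10000 iterations. The same figures give the speedup and the hit rates printed in the logs. A small sketch of the arithmetic (the constant names are ours, the values are copied from the runs above):

```go
package main

import "fmt"

// Figures copied from the two benchmark runs above.
const (
	lmdbNsPerOp = 129638
	lmdbN       = 10000
	lmdbFound   = 6370
	boltNsPerOp = 17122
	boltN       = 100000
	boltFound   = 63722
)

// hitRate is the fraction of looked-up keys that had a value.
func hitRate(found, total int) float64 { return float64(found) / float64(total) }

func main() {
	// ns/op is the measured wall time divided by b.N, so multiplying back
	// recovers roughly how long the timed loop ran.
	fmt.Printf("lmdb timed loop: %.2fs\n", float64(lmdbNsPerOp*lmdbN)/1e9)
	fmt.Printf("bolt timed loop: %.2fs\n", float64(boltNsPerOp*boltN)/1e9)
	fmt.Printf("bolt speedup: %.1fx\n", float64(lmdbNsPerOp)/boltNsPerOp)
	fmt.Printf("hit rates: lmdb %.2f, bolt %.2f\n",
		hitRate(lmdbFound, lmdbN), hitRate(boltFound, boltN))
}
```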
Additional Details
Benchmarks are run on a dedicated Amazon AWS i3.large instance, which provides 450GB of NVMe SSD storage, 2 virtual cores and 15.25GB of RAM.
- Data size on disk: 61GB
- No. of Keys in DB: 5 million
- Value Size: 16KB (constant)
- Key Size: 22B
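Because the data is about four times larger than memory, a quick back-of-the-envelope check: at most a quarter of the 61GB can sit in the page cache at once, so, assuming keys are drawn uniformly at random, most successful lookups in either store presumably have to fetch their value from the SSD. This is a rough sketch only; cacheableFraction is a hypothetical helper, and it ignores memory used by the OS and the benchmark process.

```go
package main

import "fmt"

// Figures from the setup details above.
const (
	ramGB  = 15.25 // i3.large memory
	dataGB = 61.0  // data size on disk
)

// cacheableFraction is an upper bound on the share of the data that fits
// in the page cache; it ignores memory used by the OS and the process.
func cacheableFraction(ram, data float64) float64 {
	if ram >= data {
		return 1
	}
	return ram / data
}

func main() {
	f := cacheableFraction(ramGB, dataGB)
	fmt.Printf("at most %.0f%% of the data can be cached\n", f*100)
	// Assumption: uniformly random keys, so roughly the uncached share of
	// value reads would have to go to the SSD.
	fmt.Printf("value reads likely to miss the cache: at least ~%.0f%%\n", (1-f)*100)
}
```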
Please ask if you require any other details about the benchmarking setup.