This repository has been archived by the owner on Nov 22, 2018. It is now read-only.

BTree size too big #12

Open
cznic opened this issue Oct 22, 2016 · 10 comments

@cznic
Owner

cznic commented Oct 22, 2016

Track separately:


In #3 @deuter0n wrote:

How much overhead does each btree node take? I tried to run an unrealistic case: saving 1M empty values, and the db size is ~60MB.

for _, i := range rand.Perm(1<<20) {
    k := make([]byte, 4)
    binary.BigEndian.PutUint32(k, uint32(i))
    if e = btree.Set(k, nil); e != nil {
        log.Fatal(e)
    }
}

Investigate whether it's a storage space (leak) bug or whether the implementation is just crappy. Fix/improve it if possible w/o breaking backward compatibility with any existing DBs.

@ghost

ghost commented Oct 25, 2016

Additional info: the same workload is 11MB in sqlite and 14MB in leveldb, both with default settings.
Thanks for the work Jan, I decided to go with another DB for the moment ;)

@cznic
Owner Author

cznic commented Oct 25, 2016

I'm really sorry I haven't yet had time to investigate the issue.

In any case, thanks again for your contributions. Also, I will add you as a collaborator now.

@cznic cznic self-assigned this Oct 25, 2016
@cznic cznic added the question label Oct 25, 2016
cznic pushed a commit that referenced this issue Oct 25, 2016
@cznic
Owner Author

cznic commented Oct 25, 2016

@deuter0n

I have some datapoints for you. Using d5b271c, on a bit old HW:

$ go test -v -run Issue12 -timeout 1h
=== RUN   TestIssue12
TestIssue12: Warning, run with -timeout at least 1h
--- PASS: TestIssue12 (1197.13s)
    all_test.go:106: Using compression (recommended): false
    all_test.go:122: 1048576 keys in 8m58.523277614s, 1947.132173 keys/s, 513.575µs s/key
    all_test.go:129: File size: 62589728, 59.690216 bytes/key
    all_test.go:136: 
        lldb.AllocStats{
        · Handles: 2486,
        · TotalAtoms: 3911851,
        · AllocBytes: 39917757,
        · AllocAtoms: 2499458,
        · Relocations: 2373,
        · FreeAtoms: 1412393,
            ... snip
        }
    all_test.go:106: Using compression (recommended): true
    all_test.go:122: 1048576 keys in 10m58.374390217s, 1592.674344 keys/s, 627.874µs s/key
    all_test.go:129: File size: 11666448, 11.125992 bytes/key
    all_test.go:136: 
        lldb.AllocStats{
        · Handles: 2486,
        · Compression: 2485,
        · TotalAtoms: 729146,
        · AllocBytes: 39917757,
        · AllocAtoms: 451371,
        · Relocations: 2233,
        · FreeAtoms: 277775,
            ... snip
        }
PASS
ok      github.com/cznic/lldb   1197.151s
$

The observation w/o compression confirms your original report - the file size is ~60MB.

However, when using compression, as recommended in the documentation:

It's recommended to use compression. For example the BTrees implementation assumes compression is used. Using compression may cause a slowdown in some cases while it may as well cause a speedup.

the file size shrinks to ~11MB.

@ghost

ghost commented Oct 26, 2016

In my case I have to disable compression since it can weaken encryption. Compression depends on the data itself and changes the length of the output. That means it can leak information about the data through the length, bypassing the encryption.
The reduction to 11MB is, I guess, caused by easily compressible zeroes, in other words, bad space utilization in the btree implementation.

@cznic
Owner Author

cznic commented Oct 26, 2016

The reduction to 11MB is, I guess, caused by easily compressible zeroes, in other words, bad space utilization in the btree implementation.

It's complicated. For BTree items of fixed size it would be easier. For variable length items one has to compromise. Concatenating all the items in the BTree page is not an option b/c the maximum allocation size is capped at about 64kB.

Too small a fixed item portion degrades performance b/c on access it may be necessary to reach for the overflowed part. Too big a fixed item portion degrades space utilization.

The value chosen is 19 bytes. I have benchmarked the value for various data types. Unfortunately, K/V items of total size 4 bytes are far below the good-on-average value. The value is also tuned to accommodate two encoded scalars, each of size 8 bytes. That's what typically occurs in a SQL index.
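As an illustration of that tradeoff, here is just the arithmetic, not lldb's actual layout code, assuming only the 19-byte fixed per-item portion mentioned above:

```go
package main

import "fmt"

// inPageWaste models the in-page padding for one item, given a fixed
// per-item slot of `slot` bytes (19 in the discussion above). Items
// larger than the slot overflow to a separately allocated block, so
// they waste nothing in the page itself.
func inPageWaste(itemSize, slot int) int {
	if itemSize >= slot {
		return 0 // slot fully used; the remainder overflows
	}
	return slot - itemSize // padding wasted inside the page
}

func main() {
	const slot = 19
	for _, size := range []int{4, 8, 16, 19, 64} {
		fmt.Printf("item %2d B: wasted in page %2d B, overflows: %v\n",
			size, inPageWaste(size, slot), size > slot)
	}
}
```

With a 4-byte key and nil value, roughly 15 of the 19 slot bytes are padding before any page or allocator overhead is even counted, which points in the same direction as the ~60 bytes/key measured in the uncompressed run above.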

FYI: The data layout is described here.

@cznic
Owner Author

cznic commented Oct 26, 2016

@deuter0n

One more question, if you don't mind. Are the sizes of the keys and/or the values of the BTree items that you want to use in your app known in advance, i.e. are they fixed (per tree)?

@ghost

ghost commented Oct 27, 2016

So I guess it's a matter of tradeoff. By having immutable handles:

  • simpler read/write
  • simpler index update
  • no in-place vacuum
  • internal page fragmentation

Are the keys / values fixed?

Some are fixed→fixed, some var→var. Any suggestion?

I actually did the testcase above due to IndexSeek() confusion. So ...

  1. k = path, e.g. "a/b", "a/c", "b/b", "b/c/d",
    v = variable blobs
  2. Tried "get biggest key in a/" using IndexSeek(collate=bytes.Compare, c=negate(bytes.Compare)) Expecting "a/c", didn't work, saw the TODO, or maybe I misunderstood.
  3. Tried creating separate btree index, i.e. get "a/c" from btree2, then get actual v from btree1.
  4. Found btree2 too big

@cznic
Owner Author

cznic commented Oct 28, 2016

So I guess it's a matter of tradeoff.

True.

Some are fixed→fixed, some var→var. Any suggestion ?

I can imagine adding, say, CreateFixedBtree(keySize, valueSize int, ...). It would actually be useful for QL as well.

  1. k = path, e.g. "a/b", "a/c", "b/b", "b/c/d", v = variable blobs
  2. Tried "get biggest key in a/" using IndexSeek(collate=bytes.Compare, c=negate(bytes.Compare)) Expecting "a/c", didn't work, saw the TODO, or maybe I misunderstood.

I think that to find the biggest key prefixed with "a/" it would be necessary to Seek (not IndexSeek) to the first key that collates lexically after the prefix, in this case "b", and then do Prev on the iterator.
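The Seek-then-Prev idea can be sketched independently of the lldb API. Below is a hypothetical stand-alone version over a sorted string slice; the seek target is the smallest string that sorts after every key carrying the prefix (for "a/" that is "a0", since '/'+1 == '0'):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixSuccessor returns the smallest string that sorts after every
// string with the given prefix. An empty result means no upper bound
// exists (the prefix is all 0xFF bytes).
func prefixSuccessor(prefix string) string {
	b := []byte(prefix)
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] != 0xFF {
			b[i]++
			return string(b[:i+1])
		}
	}
	return ""
}

// biggestWithPrefix finds the largest key with the given prefix in a
// sorted slice: seek to the first key >= the prefix's successor, then
// step back one position - the Seek-then-Prev technique from above.
func biggestWithPrefix(sorted []string, prefix string) (string, bool) {
	succ := prefixSuccessor(prefix)
	i := len(sorted)
	if succ != "" {
		i = sort.SearchStrings(sorted, succ)
	}
	if i == 0 || !strings.HasPrefix(sorted[i-1], prefix) {
		return "", false
	}
	return sorted[i-1], true
}

func main() {
	keys := []string{"a/b", "a/c", "b/b", "b/c/d"}
	k, ok := biggestWithPrefix(keys, "a/")
	fmt.Println(k, ok) // a/c true
}
```

Against a real BTree the sort.SearchStrings call would be replaced by a Seek on the tree's enumerator and the index step-back by Prev, but the bound computation is the same.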

The IndexSeek is meant to support range queries on multidimensional data (composite indexes), but the multidimensional index handling code is not yet implemented, hence no test either.

Also, wrt the concerns about encryption and information leaking: thinking more about this, it seems to me no information can leak this way unless the attacker knows where the blocks reside. But that cannot be inferred w/o decrypting the DB in the first place. I am probably missing something.

@ghost

ghost commented Oct 31, 2016

No need to decrypt. In block/stream cipher, attacker can just diff the old vs new db to know where the block resides. See also CRIME attack.

IMO compression should be left to the higher-level application code, not to storage. Different applications demand different compression/security levels. There's also a general distaste in the crypto community toward mixing compression and crypto. The resulting effect is just unpredictable.

@cznic
Owner Author

cznic commented Oct 31, 2016

No need to decrypt. In block/stream cipher, attacker can just diff the old vs new db to know where the block resides. See also CRIME attack.

Interesting. I thought about the diff, but I assumed that would require access to the encrypting machine, in which case encryption cannot help anyhow. I forgot to consider that the encrypting machine can be manipulated into producing the diff-revealing data.

IMO compression should be left to the higher-level application code, not to storage. Different applications demand different compression/security levels. There's also a general distaste in the crypto community toward mixing compression and crypto. The resulting effect is just unpredictable.

#TIL 😄
