This repository has been archived by the owner on Nov 22, 2018. It is now read-only.

BTree size too big #12

Open
cznic opened this issue Oct 22, 2016 · 10 comments

@cznic
Owner

cznic commented Oct 22, 2016

Track separately:


In #3 @deuter0n wrote:

How much overhead does each btree node take? I tried to run an unrealistic case: saving 1M empty values, and the db size is ~60MB.

for _, i := range rand.Perm(1<<20) {
    k := make([]byte, 4)
    binary.BigEndian.PutUint32(k, uint32(i))
    if e = btree.Set(k, nil); e != nil {
        log.Fatal(e)
    }
}

Investigate whether it's a storage space (leak) bug or whether the implementation is just crappy. Fix/improve it if possible w/o breaking backward compatibility with any existing DBs.

@ghost

ghost commented Oct 25, 2016

Additional info: the same workload is 11MB in sqlite and 14MB in leveldb, both with default settings.
Thanks for the work Jan, I decided to go with another DB for the moment ;)

@cznic
Owner Author

cznic commented Oct 25, 2016

I'm really sorry I haven't yet had time to investigate the issue.

In any case, thanks again for your contributions. Also, I will add you as a collaborator now.

@cznic cznic self-assigned this Oct 25, 2016
@cznic cznic added the question label Oct 25, 2016
cznic pushed a commit that referenced this issue Oct 25, 2016
@cznic
Owner Author

cznic commented Oct 25, 2016

@deuter0n

I have some datapoints for you. Using d5b271c, on a bit old HW:

$ go test -v -run Issue12 -timeout 1h
=== RUN   TestIssue12
TestIssue12: Warning, run with -timeout at least 1h
--- PASS: TestIssue12 (1197.13s)
    all_test.go:106: Using compression (recommended): false
    all_test.go:122: 1048576 keys in 8m58.523277614s, 1947.132173 keys/s, 513.575µs s/key
    all_test.go:129: File size: 62589728, 59.690216 bytes/key
    all_test.go:136: 
        lldb.AllocStats{
        · Handles: 2486,
        · TotalAtoms: 3911851,
        · AllocBytes: 39917757,
        · AllocAtoms: 2499458,
        · Relocations: 2373,
        · FreeAtoms: 1412393,
            ... snip
        }
    all_test.go:106: Using compression (recommended): true
    all_test.go:122: 1048576 keys in 10m58.374390217s, 1592.674344 keys/s, 627.874µs s/key
    all_test.go:129: File size: 11666448, 11.125992 bytes/key
    all_test.go:136: 
        lldb.AllocStats{
        · Handles: 2486,
        · Compression: 2485,
        · TotalAtoms: 729146,
        · AllocBytes: 39917757,
        · AllocAtoms: 451371,
        · Relocations: 2233,
        · FreeAtoms: 277775,
            ... snip
        }
PASS
ok      github.com/cznic/lldb   1197.151s
$

The observation w/o compression confirms your original report - the file size is ~60MB.

However, when using compression, as recommended in the documentation:

It's recommended to use compression. For example the BTrees implementation assumes compression is used. Using compression may cause a slowdown in some cases while it may as well cause a speedup.

the file size shrinks to ~11MB.

@ghost

ghost commented Oct 26, 2016

In my case I have to disable compression since it can weaken encryption. Compression depends on the data itself and changes the length of the output. That means it can leak information about the data through the length, bypassing the encryption.
The reduction to 11MB is, I guess, caused by easily compressible zeroes, in other words, bad space utilization in the btree implementation.

@cznic
Owner Author

cznic commented Oct 26, 2016

The reduction to 11MB is, I guess, caused by easily compressible zeroes, in other words, bad space utilization in the btree implementation.

It's complicated. For BTree items of fixed size it would be easier. For variable length items one has to compromise. Concatenating all the items in the BTree page is not an option b/c the maximum allocation size is capped at about 64kB.

Too small a fixed item portion degrades performance b/c on access it may be necessary to reach for the overflowed part. Too big a fixed item portion degrades space utilization.

The value chosen is 19 bytes. I have benchmarked the value for various data types. Unfortunately, K/V items of total size 4 bytes are far below the good-on-average value. The value is also tuned to accommodate two encoded scalars, each of size 8 bytes. That's what typically occurs in a SQL index.
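As an illustration of that tradeoff, here is just the arithmetic, not lldb's actual layout code, assuming only the 19-byte fixed per-item portion mentioned above:

```go
package main

import "fmt"

// inPageWaste models the in-page padding for one item, given a fixed
// per-item slot of `slot` bytes (19 in the discussion above). Items
// larger than the slot overflow to a separately allocated block, so
// they waste nothing in the page itself.
func inPageWaste(itemSize, slot int) int {
	if itemSize >= slot {
		return 0 // slot fully used; the remainder overflows
	}
	return slot - itemSize // padding wasted inside the page
}

func main() {
	const slot = 19
	for _, size := range []int{4, 8, 16, 19, 64} {
		fmt.Printf("item %2d B: wasted in page %2d B, overflows: %v\n",
			size, inPageWaste(size, slot), size > slot)
	}
}
```

With a 4-byte key and nil value, roughly 15 of the 19 slot bytes are padding before any page or allocator overhead is even counted, which points in the same direction as the ~60 bytes/key measured in the uncompressed run above.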

FYI: The data layout is described here.

@cznic
Owner Author

cznic commented Oct 26, 2016

@deuter0n

One more question, if you don't mind. Are the sizes of the keys and/or the values of the BTree items that you want to use in your app known in advance, i.e. are they fixed (per tree)?

@ghost

ghost commented Oct 27, 2016

So I guess it's a matter of tradeoff. By having immutable handles:

  • simpler read/write
  • simpler index update
  • no in-place vacuum
  • internal page fragmentation

Are the keys / values fixed?

Some are fixed→fixed, some var→var. Any suggestion?

I actually did the testcase above due to IndexSeek() confusion. So ...

  1. k = path, e.g. "a/b", "a/c", "b/b", "b/c/d",
    v = variable blobs
  2. Tried "get biggest key in a/" using IndexSeek(collate=bytes.Compare, c=negate(bytes.Compare)) Expecting "a/c", didn't work, saw the TODO, or maybe I misunderstood.
  3. Tried creating separate btree index, i.e. get "a/c" from btree2, then get actual v from btree1.
  4. Found btree2 too big

@cznic
Owner Author

cznic commented Oct 28, 2016

So I guess it's a matter of tradeoff.

True.

Some are fixed→fixed, some var→var. Any suggestion ?

I can imagine adding, say, CreateFixedBtree(keySize, valueSize int, ...). It would actually be useful for QL as well.

  1. k = path, e.g. "a/b", "a/c", "b/b", "b/c/d", v = variable blobs
  2. Tried "get biggest key in a/" using IndexSeek(collate=bytes.Compare, c=negate(bytes.Compare)) Expecting "a/c", didn't work, saw the TODO, or maybe I misunderstood.

I think that to find the biggest key prefixed with "a/" it would be necessary to Seek (not IndexSeek) to the first key that collates lexically after the prefix, in this case "b", and then do Prev on the iterator.
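The Seek-then-Prev idea can be sketched independently of the lldb API. Below is a hypothetical stand-alone version over a sorted string slice; the seek target is the smallest string that sorts after every key carrying the prefix (for "a/" that is "a0", since '/'+1 == '0'):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixSuccessor returns the smallest string that sorts after every
// string with the given prefix. An empty result means no upper bound
// exists (the prefix is all 0xFF bytes).
func prefixSuccessor(prefix string) string {
	b := []byte(prefix)
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] != 0xFF {
			b[i]++
			return string(b[:i+1])
		}
	}
	return ""
}

// biggestWithPrefix finds the largest key with the given prefix in a
// sorted slice: seek to the first key >= the prefix's successor, then
// step back one position - the Seek-then-Prev technique from above.
func biggestWithPrefix(sorted []string, prefix string) (string, bool) {
	succ := prefixSuccessor(prefix)
	i := len(sorted)
	if succ != "" {
		i = sort.SearchStrings(sorted, succ)
	}
	if i == 0 || !strings.HasPrefix(sorted[i-1], prefix) {
		return "", false
	}
	return sorted[i-1], true
}

func main() {
	keys := []string{"a/b", "a/c", "b/b", "b/c/d"}
	k, ok := biggestWithPrefix(keys, "a/")
	fmt.Println(k, ok) // a/c true
}
```

Against a real BTree the sort.SearchStrings call would be replaced by a Seek on the tree's enumerator and the index step-back by Prev, but the bound computation is the same.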

The IndexSeek is meant to support range queries on multidimensional data (composite indexes), but the multidimensional index handling code is not yet implemented, hence no test either.

Also, wrt the concerns about encryption and information leaking: thinking more about this, it seems to me no information can leak this way unless the attacker knows where the blocks reside. But that cannot be inferred w/o decrypting the DB in the first place. I am probably missing something.

@ghost

ghost commented Oct 31, 2016

No need to decrypt. In block/stream cipher, attacker can just diff the old vs new db to know where the block resides. See also CRIME attack.

IMO compression should be left to the higher-level application code, not to storage. Different applications demand different compression/security levels. There's also a general distaste in the crypto community toward mixing compression and crypto. The resulting effect is just unpredictable.

@cznic
Owner Author

cznic commented Oct 31, 2016

No need to decrypt. In block/stream cipher, attacker can just diff the old vs new db to know where the block resides. See also CRIME attack.

Interesting. I thought about the diff, but I assumed that would require access to the encrypting machine, in which case encryption cannot help anyhow. I forgot to consider that the encrypting machine can be manipulated into producing the diff-revealing data.

IMO compression should be left to the higher-level application code, not to storage. Different applications demand different compression/security levels. There's also a general distaste in the crypto community toward mixing compression and crypto. The resulting effect is just unpredictable.

#TIL 😄
