BTree size too big #12
Additional info: the same workload takes 11MB in SQLite and 14MB in LevelDB, both with default settings.
I'm really sorry I haven't had time to investigate the issue yet. In any case, thanks again for your contributions. Also, I will add you as a collaborator now.
I have some data points for you. Using d5b271c, on somewhat old hardware:
Without compression, the result confirms your original report: the file size is ~60MB. However, when using compression, as recommended in the documentation, the file size shrinks to ~11MB.
In my case, I have to disable compression because it can weaken encryption. Compressed output depends on the data itself, so its length varies with the content. That means it can leak information about the plaintext through the ciphertext length, bypassing the encryption.
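To make the length argument concrete, here is a minimal Go sketch (not lldb code) that compresses two plaintexts of identical length with `compress/flate` and then encrypts them with AES-GCM. The all-zero key and nonce and the `compress`, `sealedLen` and `makeInputs` helpers are hypothetical and exist only for this illustration. GCM adds a fixed 16-byte tag, so the ciphertext length tracks the compressed length, which in turn tracks how redundant the plaintext was.

```go
package main

import (
	"bytes"
	"compress/flate"
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// compress returns data compressed with DEFLATE.
func compress(data []byte) []byte {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestCompression)
	if err != nil {
		panic(err)
	}
	if _, err := w.Write(data); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// sealedLen compresses data, encrypts it with AES-256-GCM and returns the
// ciphertext length. The all-zero key and nonce are for illustration only;
// real code must use a secret key and never reuse a nonce.
func sealedLen(data []byte) int {
	block, err := aes.NewCipher(make([]byte, 32))
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, gcm.NonceSize())
	return len(gcm.Seal(nil, nonce, compress(data), nil))
}

// makeInputs returns two 64-byte plaintexts: one highly repetitive and one
// made of 64 distinct byte values.
func makeInputs() (repetitive, varied []byte) {
	repetitive = bytes.Repeat([]byte("A"), 64)
	varied = make([]byte, 64)
	for i := range varied {
		varied[i] = byte(i * 37 % 251)
	}
	return repetitive, varied
}

func main() {
	repetitive, varied := makeInputs()
	// Same plaintext length, different ciphertext lengths: the length
	// reveals how compressible (i.e. how redundant) the plaintext was.
	fmt.Println("repetitive:", sealedLen(repetitive))
	fmt.Println("varied:    ", sealedLen(varied))
}
```

Encryption hides the bytes but not the length, so anything that makes the length content-dependent (compression here) reopens a side channel.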
It's complicated. For BTree items of fixed size it would be easier; for variable-length items one has to compromise. Concatenating all the items in a BTree page is not an option because the maximum allocation size is capped at about 64kB. Too small a fixed item portion degrades performance because an access may need to reach for the overflowed part. Too big a fixed item portion degrades space utilization. The value chosen is 19 bytes. I have benchmarked it for various data types. Unfortunately, K/V items with a total size of 4 bytes are far below the good-on-average value. The value is also tuned to accommodate two encoded scalars of 8 bytes each, which is what typically occurs in a SQL index. FYI: The data layout is described here.
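A rough model of the tradeoff above, as a Go sketch. It assumes (this is not taken from lldb's source) that each item reserves one fixed 19-byte field for the key and another for the value, that each field starts with a 1-byte length/tag, and that oversized content keeps a short inline prefix plus a 7-byte handle to an overflow allocation. The constants and the `layout` / `pageUtilization` helpers are hypothetical. Under these assumptions a 2-byte key plus a 2-byte value puts 4 payload bytes into a 38-byte slot, before counting page headers or partially filled pages.

```go
package main

import "fmt"

// Hypothetical constants for illustration. fieldSize mirrors the 19-byte
// value quoted in this thread; the other two are assumptions.
const (
	fieldSize  = 19 // fixed bytes reserved in the page for a key or a value
	lenSize    = 1  // assumed length/tag byte at the start of each field
	handleSize = 7  // assumed size of a handle pointing to overflow storage
)

// inlineCap is the largest content stored entirely inside one field.
const inlineCap = fieldSize - lenSize

// field describes how one key or value is laid out under this model.
type field struct {
	inlineBytes   int // payload bytes kept in the page
	overflowBytes int // payload bytes spilled to a separate allocation
}

// layout splits content of length n into inline and overflow parts.
func layout(n int) field {
	if n <= inlineCap {
		return field{inlineBytes: n}
	}
	prefix := fieldSize - lenSize - handleSize // inline prefix before the handle
	return field{inlineBytes: prefix, overflowBytes: n - prefix}
}

// pageUtilization is the fraction of an item's in-page footprint
// (two fixed fields) that holds actual payload.
func pageUtilization(keyLen, valLen int) float64 {
	k, v := layout(keyLen), layout(valLen)
	return float64(k.inlineBytes+v.inlineBytes) / float64(2*fieldSize)
}

func main() {
	// 2-byte key + 2-byte value: 4 payload bytes in a 38-byte item.
	fmt.Printf("%.3f\n", pageUtilization(2, 2)) // 0.105
	// Two 8-byte scalars, each with an assumed 1-byte tag (18 bytes), fit inline.
	fmt.Printf("%.3f\n", pageUtilization(18, 18)) // 0.947
	// A 100-byte value spills; only an 11-byte prefix stays in the page.
	fmt.Printf("%+v\n", layout(100)) // {inlineBytes:11 overflowBytes:89}
}
```

Shrinking `fieldSize` would help tiny items but push the 18-byte SQL-index case into the overflow path, which is the performance cost described above.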
One more question, if you don't mind. Are the sizes of the keys and/or the values of the BTree items that you want to use in your app known in advance, i.e., are they fixed (per tree)?
So I guess it's a matter of tradeoffs. By having immutable handles:
Some trees are fixed→fixed, some are var→var. Any suggestions? I actually wrote the test case above because of confusion about IndexSeek(). So ...
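Seek-style lookups usually position at the first key greater than or equal to the target, which is easy to trip over when what you want is the biggest key with a given prefix. The Go sketch below shows the general technique on an in-memory sorted slice: find the first key that sorts after every key carrying the prefix, then step back one. It does not use lldb's BTree or IndexSeek() API; `lastWithPrefix` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// lastWithPrefix returns the index of the largest key in the sorted slice
// keys that starts with prefix, or -1 if no key has that prefix.
func lastWithPrefix(keys [][]byte, prefix []byte) int {
	// In sorted order, keys with the prefix form one contiguous run. Find the
	// first key after that run: it lacks the prefix and sorts above it.
	i := sort.Search(len(keys), func(i int) bool {
		k := keys[i]
		if bytes.HasPrefix(k, prefix) {
			return false
		}
		return bytes.Compare(k, prefix) > 0
	})
	// Step back one position; it is a match only if it carries the prefix.
	if i > 0 && bytes.HasPrefix(keys[i-1], prefix) {
		return i - 1
	}
	return -1
}

func main() {
	keys := [][]byte{
		[]byte("a"), []byte("ab"), []byte("abc"),
		[]byte("abz"), []byte("ac"), []byte("b"),
	}
	fmt.Println(lastWithPrefix(keys, []byte("ab"))) // 3 ("abz")
	fmt.Println(lastWithPrefix(keys, []byte("b")))  // 5
	fmt.Println(lastWithPrefix(keys, []byte("x")))  // -1
}
```

With a cursor API that only offers "seek to first key >= target", the same idea needs a target that sorts just past the prefix run, e.g. the prefix with its last non-0xFF byte incremented.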
True.
I can imagine adding, say
I think that to find the biggest key prefixed with The
Also, regarding the concerns about encryption and information leaking: thinking more about this, it seems to me that no information can leak this way unless the attacker knows where the blocks reside. But that cannot be inferred without decrypting the DB in the first place. I am probably missing something.
No need to decrypt. With a block or stream cipher, an attacker can simply diff the old and new DB files to learn where a block resides. See also the CRIME attack. IMO, compression should be left to higher-level application code, not to the storage layer. Different applications demand different compression/security levels. There's also a general distaste in the crypto community for mixing compression and encryption. The resulting effect is just unpredictable.
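A small Go sketch of the diff argument. It assumes a toy scheme (hypothetical, not lldb's) where each 64-byte storage block is encrypted with AES-CTR using an IV derived only from the block index, so an unchanged block re-encrypts to the same ciphertext. Any position-deterministic per-block scheme behaves this way. `encryptFile` and `changedBlocks` are illustrative helpers; note that `changedBlocks` never sees the key.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

const blockSize = 64 // hypothetical storage block size in bytes

// encryptFile encrypts each storage block separately with AES-CTR, using an
// IV derived only from the block index. The scheme is deterministic per
// position; reusing keystreams like this is itself insecure and is only a
// model of "unchanged block -> unchanged ciphertext".
func encryptFile(key, plain []byte) []byte {
	c, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	out := make([]byte, len(plain))
	for off := 0; off < len(plain); off += blockSize {
		end := off + blockSize
		if end > len(plain) {
			end = len(plain)
		}
		iv := make([]byte, aes.BlockSize)
		binary.BigEndian.PutUint64(iv[8:], uint64(off/blockSize))
		cipher.NewCTR(c, iv).XORKeyStream(out[off:end], plain[off:end])
	}
	return out
}

// changedBlocks compares two ciphertexts block by block and returns the
// indices of the blocks that differ. It needs no key.
func changedBlocks(oldCT, newCT []byte) []int {
	var changed []int
	n := len(oldCT)
	if len(newCT) < n {
		n = len(newCT)
	}
	for off := 0; off < n; off += blockSize {
		end := off + blockSize
		if end > n {
			end = n
		}
		if !bytes.Equal(oldCT[off:end], newCT[off:end]) {
			changed = append(changed, off/blockSize)
		}
	}
	return changed
}

func main() {
	key := make([]byte, 16) // demo key; a real key would be secret and random
	oldDB := bytes.Repeat([]byte("record"), 64) // 384 bytes = 6 blocks
	newDB := append([]byte(nil), oldDB...)
	newDB[200] ^= 1 // modify one byte inside block 3 (bytes 192..255)
	fmt.Println(changedBlocks(encryptFile(key, oldDB), encryptFile(key, newDB))) // [3]
}
```

Once the attacker can trigger writes and watch which blocks change, adding compression makes the changed block's size depend on its content as well, which is the CRIME-style leak.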
Interesting. I thought about the diff, but I assumed it would require access to the encrypting machine, in which case encryption cannot help anyway. I forgot to consider that the encrypting machine can be manipulated into producing the diff-revealing data.
#TIL 😄
Track separately:
In #3 @deuter0n wrote:
Investigate whether it's a storage space (leak) bug or whether the implementation is just crappy. Fix/improve it iff possible without breaking backward compatibility with any existing DBs.