Quickdump size improvements #2788
There are a few ways in which the size of quickdump files could be reduced.
At the moment every string length is stored as a full 8-byte integer.
To show how much space is wasted, think of a KeySet with 100 Keys all of which have the metadata
Markers for short strings
This comes with only a minor performance hit during reading and writing. However, as soon as we hit 256 bytes of string length (which might occur for key values, but is otherwise unlikely), we will be wasting bytes again.
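A minimal sketch of what such a marker scheme could look like (the marker values and the function name are hypothetical, not the actual quickdump format):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker values; the real quickdump format may differ. */
#define MARKER_SHORT 0x01 /* length fits in 1 byte  */
#define MARKER_LONG 0x02  /* length needs 8 bytes   */

/* Writes a length prefix into buf and returns the number of bytes used:
 * 2 bytes for strings shorter than 256 bytes, 9 bytes otherwise. */
static size_t writeLength (uint8_t * buf, uint64_t len)
{
	if (len < 256)
	{
		buf[0] = MARKER_SHORT;
		buf[1] = (uint8_t) len;
		return 2;
	}
	buf[0] = MARKER_LONG;
	memcpy (buf + 1, &len, sizeof (len)); /* fixed 8 bytes, native byte order */
	return 9;
}
```

So a short string costs 2 bytes of length prefix instead of 8, at the price of one branch per string.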
Even more markers
The above solution could be extended. At the moment the marker bytes are expressive (
Similar performance impact to the above. Never wasting bytes. Reduced expressiveness (not that important in a binary format). Reduced possibilities for future markers (only 64 instead of 256 possible markers).
Variable Length Integer Encoding
We could also simply always use a variable length integer encoding for string sizes. For example LEB128 or PrefixVarint from https://github.com/stoklund/varint.
This allows quick encoding and decoding and also makes it possible to move a KeySet from one parent to another. However, since KeySets are supposed to be hierarchical, we waste a lot of space by repeating the same prefix over and over again.
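As an illustration, unsigned LEB128 (the encoding mentioned above, also used by DWARF and WebAssembly) fits in a few lines; this is a generic sketch, not quickdump code:

```c
#include <stdint.h>
#include <stddef.h>

/* Encodes value as unsigned LEB128 into out; returns the bytes written.
 * Each byte carries 7 payload bits; the high bit marks a continuation. */
static size_t leb128Encode (uint64_t value, uint8_t * out)
{
	size_t n = 0;
	do
	{
		uint8_t byte = value & 0x7F;
		value >>= 7;
		if (value != 0) byte |= 0x80;
		out[n++] = byte;
	} while (value != 0);
	return n;
}

/* Decodes an unsigned LEB128 value from in; stores the bytes read in *nread. */
static uint64_t leb128Decode (const uint8_t * in, size_t * nread)
{
	uint64_t value = 0;
	size_t n = 0;
	unsigned shift = 0;
	uint8_t byte;
	do
	{
		byte = in[n++];
		value |= (uint64_t) (byte & 0x7F) << shift;
		shift += 7;
	} while (byte & 0x80);
	*nread = n;
	return value;
}
```

With this, any length below 128 costs a single byte, lengths below 16384 cost two bytes, and so on.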
For example this key set:
With the parent key
You can easily see all the repetition. Instead we could use this:
Here, by default, key names are appended to the previous key name instead of the parent key name. Additionally, there are special markers for the case when the next key is not below the previous one.
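A sketch of how the shared prefix between consecutive key names could be computed on the writing side (the function name is hypothetical; only whole name parts separated by `/` are shared, so a suffix always starts at a part boundary):

```c
#include <stddef.h>

/* Returns the length of the longest common prefix of the two key names
 * that ends on a '/' boundary, so only whole name parts are shared.
 * The writer would then store only cur + commonPrefix (prev, cur). */
static size_t commonPrefix (const char * prev, const char * cur)
{
	size_t i = 0;
	size_t lastSlash = 0;
	while (prev[i] != '\0' && prev[i] == cur[i])
	{
		if (prev[i] == '/') lastSlash = i + 1;
		++i;
	}
	/* prev is exhausted and cur continues at a part boundary (or ends too) */
	if (prev[i] == '\0' && (cur[i] == '/' || cur[i] == '\0')) return i;
	return lastSlash;
}
```

For `system/a/b` followed by `system/a/c` this yields 9, so only the single byte `c` (plus a small length prefix) would need to be stored.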
This is quite easy to implement in the reading part of
Thank you for the detailed discussion. Yes, I agree: quickdump is meant to be fast, not small.
Trying to make it small now will probably make it slow, without much guarantee that it ends up small enough.
So either we compress the whole quickdump as-is (if that is good enough) or we create a new format that is as small as possible.
For communication with specload, it might make sense to send some header, so that specload can use different formats (e.g. different compressions).
In any case: do we really need this, as we agreed to have compiler-options to completely remove the spec?
I can't really know, because I don't have a full picture of the tradeoffs involved. I'd first benchmark the other improvements and variants.
Maybe I shouldn't be involving myself in this discussion, but:
Generally speaking I kind of doubt that. Often reading a small file from storage is so much faster that you can easily expand it while the big version is still being read.
Also, if you are actually concerned about speed, wouldn't you store the keys in some more efficient representation than raw strings internally anyway to make lookups/compares faster (while probably saving RAM at the same time because of reduced duplication)? The same internal representation might also provide what you need to write the keys out in compact form to storage quickly.
You are always welcome to join discussions.
It was not a general statement but one about quickdump and the suggestions above. Both quickdump as it is now and a new plugin with the suggestions above make sense, depending on the available memory, IO speed and so on.
The lookup currently uses either bsearch with memcmp or, if available, a hash lookup. Both obviously need the whole key name to find the correct key. I doubt that someone can beat this performance-wise without immense effort (comparable to what was needed to bring bsearch to where it is now).
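For illustration, a bsearch-based lookup over sorted key names might look like the following simplified sketch (using `strcmp` instead of `memcmp` with stored sizes; this is not Elektra's actual `ksLookup`):

```c
#include <stdlib.h>
#include <string.h>

/* Comparison function for bsearch over an array of C strings,
 * mirroring a lookup by full key name. */
static int compareNames (const void * key, const void * element)
{
	return strcmp ((const char *) key, *(const char **) element);
}

/* Looks up name in the sorted array names; returns its index or -1. */
static long lookupName (const char * name, const char ** names, size_t count)
{
	const char ** found = bsearch (name, names, count, sizeof (const char *), compareNames);
	return found == NULL ? -1 : (long) (found - names);
}
```

Each of the O(log n) probes compares against the full name, which is where the O(m*log n) estimate further down comes from.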
I know. But as I know nothing about the code in question and can only remark in very general terms, there is a high risk this just adds noise.
Fair enough. Intuitively I'd expect a tree-based data structure to give much faster lookups. But maybe the average key set just isn't big enough for this to actually pay off.
If you have already sorted the keys for bsearch, prefix elimination should be fairly cheap though ...
Why so? It is also a binary search (log n), but the data is much more spread out in RAM. As said before, with huge investments in performance improvements you might get the same lookup time. And you need to implement your own memory management, because otherwise trees need lots of mallocs.
What would be very interesting is an alternative implementation of Elektra which completely works without mallocs.
I often have large key sets, e.g. 35216 keys is normal. But then you want a hash map for fast lookups.
I didn't prove it, but I think the complexity of bsearch is O(m*log(n)) while a tree search should be doable in O(m+log(n)), where m is the typical length of a key. The point of a tree structure, of course, would be that the memory footprint of the whole key set is smaller than storing all keys in full length, thus making RAM access more localized instead of less. If this doesn't hold, then the tree indeed is a bad idea.
Yes, and also pointers (on 64-bit systems) would be a bit large. The nodes would need to be addressed by array indices instead.
That seems like a hard challenge. But only implementing the most important core data structures without mallocs might be doable. (Actually I'm not sure what the benefit would be, why it is interesting to you.)
I'm not familiar with the tradeoffs of hash maps.
Not to go too much off topic here, but a tree-like structure might actually help a lot with spec stuff. You could easily look up globbing expressions at runtime with a lot less processing. The spec plugin would also become much more efficient, since you could create a tree of the spec and one of the actual configuration and then try to match those two trees.
We would still have to store the full key names. Otherwise you would have to traverse the tree every time you need the full key name for something. But like I said above, at least in some situations a tree structure could improve performance or even enable new features.
AFAIK the hash map implementation that Elektra uses provides essentially constant lookup times and also preserves ordering. On the other hand, there is significant effort involved in calculating the hash map in the first place. @markus2330 please correct me if I am wrong.
Anyway for the case of quickdump:
The first improvement I suggested (the option for 1-byte string sizes) would have basically no impact on reading. It would just be an addition
To show how much space this alone could save, here is a calculation for the
The whole spec has 454 keys. The longest keyname used is 64 bytes. As these are spec keys none of those have values. All keys together have 2557 metakeys, the longest of which is 32 bytes long. Of the corresponding metavalues 73 are longer than 255 bytes.
For any string no longer than 255 bytes (including the empty ones) we would save 7 bytes. In total this makes
The longest string in the whole spec (one of the descriptions) was 1020 bytes, so the second and third suggestions would save another
Using an actual variable length integer encoding would definitely have a performance impact that has to be benchmarked. However, considering that WebAssembly and various other projects (e.g. Protobuf) use such encodings while maintaining performance, this should be doable as well.
Actually, @chrysn came up with the requirement and there seems to be a market for systems which do not support mallocs. It is currently not listed in the public topics for students.
It is definitely not prohibited to transform the KeySet into other, more suitable data structures. Keys are actually designed so that they can be inserted into other data structures while still being members of KeySets. E.g. the mounting logic is implemented with a trie, because the KeySet is not suitable for suffix lookups. For parsers/generators it also makes sense to have a data structure which represents the file format. (For the hosts file we use an array in the order of the hosts entries. And maybe @sanssecours implemented new ones? Ini and yajl clearly show the limitations of exclusively using KeySets.)
According to @mpranj the pointer correction is quite cheap.
Yes, because of that it only pays off for larger key sets and/or many lookups. The hash map is also stored in mmap, so the buildup does not need to pay off during a single execution of a binary. (This is not benchmarked, though.)
Of course it is doable but it is a project on its own. And we first should check if we can use one of the existing implementations.