A new instance of a sparse Sketch requires around 118 bytes of memory on amd64.
The scenario I would like to optimize for is storing many sketches that contain few elements. In my case I have a more-or-less Zipfian distribution of set cardinalities (a few large ones and a long tail of small ones).
118 bytes is considerably more than, for example, Redis, which uses 21 bytes for an HLL containing a single element. See my gist on that.
Looking over the code today and previously, I'm considering a few optimizations. @seiflotfy I'd like to get your opinion on some of them before I get my hands dirty.
Remove tmpSet. This would already reduce the memory footprint by half. As far as I understand the code, it's used to "cache" multiple elements inserted into the sparse representation of the sketch. I believe the rationale here was to reduce the time spent on extending and traversing the compressed list. The lack of this cache can be mitigated, for example, by providing an InsertMany function instead.
Remove the 2 pointers to compressedList and registers. Both of those types represent []uint8 slices with some additional metadata. That metadata could be rolled into the data structure. That's 16 bytes saved on pointers. The tradeoff here is code readability.
Don't precompute alpha and m. That saves 12 bytes. The tradeoff here is a few CPU cycles.
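For illustration, here is a minimal sketch of what a batch insert could look like. It assumes the sparse representation can be modelled as a sorted, duplicate-free slice of encoded hashes; the `sparseList` type and this `InsertMany` signature are hypothetical and not the library's API, and the real compressedList encoding is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// sparseList is a hypothetical stand-in for the sparse representation:
// a sorted, duplicate-free slice of encoded hashes. The library's
// compressedList uses its own encoding, which is omitted here.
type sparseList []uint32

// InsertMany merges a whole batch in a single pass, so the existing list
// is traversed once per batch instead of once per inserted element.
func (s sparseList) InsertMany(batch []uint32) sparseList {
	if len(batch) == 0 {
		return s
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]uint32(nil), batch...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	merged := make(sparseList, 0, len(s)+len(sorted))
	i, j := 0, 0
	for i < len(s) || j < len(sorted) {
		var next uint32
		switch {
		case j == len(sorted) || (i < len(s) && s[i] <= sorted[j]):
			next = s[i]
			i++
		default:
			next = sorted[j]
			j++
		}
		// The output is sorted, so a duplicate can only equal the last element.
		if n := len(merged); n == 0 || merged[n-1] != next {
			merged = append(merged, next)
		}
	}
	return merged
}

func main() {
	var s sparseList
	s = s.InsertMany([]uint32{42, 7, 42})
	s = s.InsertMany([]uint32{19, 7, 3})
	fmt.Println(s) // [3 7 19 42]
}
```

The point of the sketch is only that batching moves the amortization from a per-sketch cache to the call site, so no extra state has to live in every sketch.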
The above memory optimizations would amount to savings of 75 bytes, bringing the size of an empty sketch down to 43 bytes.
type Sketch struct {
	sparse bool     // 1 byte - bringing it back since there is no *compressedList to match against
	p      uint8    // 1 byte
	b      uint8    // 1 byte
	regs   []uint32 // 24 bytes
	nz     uint32   // 4 bytes
	count  uint32   // 4 bytes
	last   uint32   // 4 bytes
	// 43 bytes total
}
What are your thoughts?
I love the ideas... I'd like to have them implemented one at a time... Wanna open 3 issues and work on those? We can then have parallel yet separate conversations on each.