Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce size of new sparse sketch #26

Closed
maciej opened this issue Nov 9, 2019 · 2 comments
Closed

Reduce size of new sparse sketch #26

maciej opened this issue Nov 9, 2019 · 2 comments

Comments

@maciej
Copy link
Contributor

maciej commented Nov 9, 2019

A new instance of a sparse Sketch requires around 118 bytes of memory on amd64.

type Sketch struct {
	p          uint8                // 1 byte
	b          uint8                // 1 byte
	m          uint32               // 4 bytes
	alpha      float64              // 8 bytes
	tmpSet     set					// 56 bytes
	sparseList *compressedList      // 40 bytes  (8 + 2 * 4 + 24 (pointer + 2 x uint32 + slice))
	regs       *registers           // 8 bytes (nil pointer)
	                                // 118 bytes total
}

The scenario I would like to optimize for is storing many sketches that contain few elements. In my case I have a more-or-less zipfian distribution of set cardinalities (few large ones and a long tail of small ones).
118 bytes is considerably more than, for example, Redis which uses 21 bytes for a HLL containing a single element. See my gist on that.

Looking over the code today and previously I'm considering a few optimizations. @seiflotfy I'd like to get you opinion on some of them before I get my hands dirty.

  1. Remove tmpSet. This would already reduce the memory footprint by half. As far as I understand the code it's used to "cache" multiple elements inserted into a the sparse representation of the sketch. I believe the rationale here was to reduce the time spent on extending and traversing the compressed list. Lack of this cache can be mitigated for example by providing a InsertMany function instead.

  2. Remove 2 pointers to compressedList and registers. Both those types represent []uint8 slices with some additional metadata. That metadata could be rolled into the data structure. That's a 16 bytes saved for pointers. The tradeoff here is code readability.

  3. Don't precompute alpha and m. 12 bytes saving. A few CPU cycles are a tradeoff here.

The above memory optimizations would amount to savings of 75 bytes bringing down the size of an empty sketch to 43 bytes.

type Sketch struct {
	sparse bool     // 1 byte - bringing it back since there is no *compressedList to match against
	p      uint8    // 1 byte
	b      uint8    // 1 byte
	regs   []uint32 // 24 bytes
	nz     uint32   // 4 bytes
	count  uint32   // 4 bytes
	last   uint32   // 4 bytes
	                // 43 bytes total
}

What are your thoughts?

@seiflotfy
Copy link
Member

I love the ideas... I'd like to have each one of those implemented at a time... Wanna open 3 issues and work on those... we can then have parallel yet separate conversations on all

@seiflotfy
Copy link
Member

I think this is covered now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants