Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

u64 support #38

Closed
bvinc opened this issue Mar 6, 2017 · 4 comments
Closed

u64 support #38

bvinc opened this issue Mar 6, 2017 · 4 comments

Comments

@bvinc
Copy link
Contributor

bvinc commented Mar 6, 2017

Am I correct in seeing that u64 bitmaps have been removed? If so, why? The commit message said that u64 never worked. Why didn't it work?

Can this be a feature request for u64 roaring bitmaps? Or are they gone forever?

@Nemo157
Copy link
Collaborator

Nemo157 commented Mar 6, 2017

Normal 32-bit Roaring bitmaps split the 32-bit space into 16,384 containers; each containing either up to 4096 16-bit integers if in array format, or a 8kB 16-bit bitmap if in bitmap format. (and have a paper backing up the design with tests)

My naïve extension to 64bit was to extend the key and value size of the containers to 32bits and keep the same guarantee that at the switchover point the size of the integer list is the same as the size of the full bitmap in the container. This meant the 64-bit space was divided into ~4 billion (2^32) containers; each containing either up to ~134 million (2^27) 32-bit integers if in array format, or a 512MB 32-bit bitmap if in bitmap format. (and do not have a paper backing up the design with tests)

Those numbers are a bit large for current memory sizes. There's probably a few workloads that the scheme would work for, but in most cases I think a sorted list and binary search would work better as it doesn't have the overhead of checking for the bitmap format you're unlikely to reach.


In terms of adding a 64-bit version I would definitely be for it, there's some information on better ways to implement it at RoaringBitmap/RoaringBitmap#109 and RoaringBitmap/CRoaring#1, and the C++ layer of CRoaring has an implementation of the TreeMap method described in http://r-libre.teluq.ca/930/1/Roaring64bits.pdf. I just skimmed the paper via Google Translate and it should be pretty simple to implement that method using std::collections::BTreeMap.

@Nemo157
Copy link
Collaborator

Nemo157 commented Mar 6, 2017

Judging by how little coding I've been doing at home lately I doubt I'll get round to working on this myself anytime soon, if anyone feels like working on it I would be happy to offer whatever help is needed.

@bvinc
Copy link
Contributor Author

bvinc commented Mar 8, 2017

I think I can handle it. I can't read that paper, but I read the C++ code.

inherent.rs has been pretty easy so far. Let me know if you have any comments.

https://github.com/bvinc/roaring-rs/blob/roaring64/src/roaring64/inherent.rs

@Nemo157
Copy link
Collaborator

Nemo157 commented Mar 8, 2017

Looks great :) only suggestion I have is maybe call it RoaringTreemap or something similar so it's obvious what algorithm is being used. That way if any of the other 64bit algorithms get implemented it's easier for users to choose the one that works best for their dataset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants