Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incrementing CMS above 32-bit integer max rolls over to 0 without a warning #295

Open
nathan-contino opened this issue Apr 28, 2021 · 6 comments

Comments

@nathan-contino
Copy link

Hi! I was using the RedisBloom module for a project recently and I noticed that incrementing Count-Min Sketches above the 32-bit integer max of 4,294,967,291 rolls over to 0 without any kind of warning. Interestingly enough, CMS.INCRBY accepts arguments of greater than the integer max (I can CMS.INCRBY 4,294,967,292, for instance). I feel like there should be a warning here, or perhaps the key value should remain locked at 4,294,967,291 to avoid violating the CMS constraint that Count-Min Sketches should never undercount a value.

Additionally, it seems you can CMS.INCRBY negative values, which also violates the "never undercount a value" constraint (a conflict between two keys, one of which is INCRBY a negative value, can result in an undercount of a key). Is this intended behavior, or should CMS reject negative INCRBY arguments?

@banker
Copy link
Contributor

banker commented Apr 30, 2021

cc @ashtul

@ashtul
Copy link
Contributor

ashtul commented May 2, 2021

@rafie @K-Jo @gkorland
How should we handle this? Extending the bucket to use 64 bits will double the filter size and won't benefit most users.

  • Should we make this a compilation flag?
  • If we issue a warning, then what? Block the filter?

@nathan-contino, do you have any suggestions about behavior that would make sense?

@gkorland
Copy link
Contributor

gkorland commented May 2, 2021

@ashtul can we make it configurable on CMS creation?

@nathan-contino
Copy link
Author

Behaviour-wise, I would (personally speaking) think that when a CMS bucket reaches integer max, we should transition to something like Integer.MAX_VALUE in Java and lock the bucket at that value instead of just rolling over to 0, which is maybe a bit counterintuitive since you can increment and end up with a value less than what you originally had. That kind of counterintuitive behaviour is exactly what we want to avoid since it'll cause footgun edge cases that can cause weird bugs for folks using CMS in production.

Based on my knowlege of CMS, I don't think that you should be able to increment by a negative number (since it destroys underlying guarantees of the data structure regarding no undercounts). So when somebody tries to INCRBY a negative number, we should probably refuse to apply the decrement and return an error value.

I think a configuration option for 32 bit/64 bit is a good idea as well. At the very least, we should document that CMS in RedisBloom can only contain values up to a certain value so folks are aware of the limitation (which is likely low enough that somebody will hit it eventually).

@pushshift
Copy link

I would default the build to 32 bits since that will probably cover 95% of use cases. For the other 5% where someone absolutely needs 64 bit size, I agree a configuration option would be a nice compromise. If it isn't a huge refactor to support 64 bit, I'd definitely put that on the feature list.

@ashtul
Copy link
Contributor

ashtul commented May 3, 2021

@nathan-contino The negative issue is addressed in PR #303. Thank you for it.

As for the size issue, bitwise operations can be used to make the stamp size configurable as @gkorland suggested. I might even be able to make the filter expand dynamically. The issue, in this case, is - expansion could freeze the server on large filters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants