Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start validation #423

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

bearrito
Copy link

@bearrito bearrito commented May 1, 2024

@lemire

  1. Starts work for Implement roaring_bitmap_internal_validate #400
  2. I read the original paper on Roaring as well as the one adding run containers, so I think I understand the very basics of the invariants to be checked. I'm sure there are other approaches.
  3. If the overall approach looks good, I'm planning on looking at more ways to corrupt the containers via code eg changing container sizes, breaking sort orders.
  4. I looked at the serialization methods, and have a rough understanding of how that works. Do you think it's worthwhile to try to corrupt a serialized byte stream and see if it's detected when de-serialized. I think that would be moderately difficult for me to do but might be a good test.

runcontainer.go Show resolved Hide resolved
bitmapcontainer.go Outdated Show resolved Hide resolved
runcontainer.go Outdated Show resolved Hide resolved
roaringarray.go Show resolved Hide resolved
@lemire
Copy link
Member

lemire commented May 1, 2024

It looks like the tests fail. Can you have a look?

This is quite good overall. See my minor comments.

Very useful contribution,

@lemire
Copy link
Member

lemire commented May 1, 2024

I looked at the serialization methods, and have a rough understanding of how that works. Do you think it's worthwhile to try to corrupt a serialized byte stream and see if it's detected when de-serialized. I think that would be moderately difficult for me to do but might be a good test.

It could be interesting to try to deserialize garbage content and check that you do get errors.

@bearrito
Copy link
Author

bearrito commented May 2, 2024

Could you help me understand this invariant

If a run container has more than 4096 distinct values, then it must have no more than 2047 runs, otherwise the number
of runs must be less than half the number of distinct values

What is distinct values here?
Does this mean if the sum of the runlength for all intervals is greater than 4096 the len(iv) <= 2047?

@bearrito bearrito changed the title draft: Start validation Start validation May 2, 2024
@bearrito
Copy link
Author

@lemire Anything else on this? I'm going on holiday and am closing out my open pr's

arraycontainer.go Outdated Show resolved Hide resolved
@lemire
Copy link
Member

lemire commented May 23, 2024

@bearrito If you recommend merging this, I will!!!

@bearrito
Copy link
Author

@lemire Can you comment on my question above on the run-length invariant. I think I can get one more set of tests of that if I understand better.

After that, I'll feel good merging.

@lemire
Copy link
Member

lemire commented May 24, 2024

@bearrito

a container has between 1 and 2^16 distinct values (in the set 0,1...,2^16-1)
if number of distinct values in the container >= 2048 then
    check that the number of runs is no more than 2047
    (otherwise you could use a bitset container)
else
    check that the number of runs < (number of distinct values) / 2
    (otherwise you could use an array container)

@lemire
Copy link
Member

lemire commented May 24, 2024

Counting the number of distinct values in a run container is basically just the sum of the lengths of the runs.... (remember that the runs are non-overlapping)

@bearrito
Copy link
Author

Added in distinct value sum/run cardinality logic.

I don't think I can add anymore test pressure on this.

@lemire
Copy link
Member

lemire commented May 25, 2024

Running tests. I expect to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants