Skip to content

Improve bloom filter test#5329

Merged
rdblue merged 5 commits intoapache:masterfrom
huaxingao:bf_test
Jul 26, 2022
Merged

Improve bloom filter test#5329
rdblue merged 5 commits intoapache:masterfrom
huaxingao:bf_test

Conversation

@huaxingao
Copy link
Contributor

I saw bloom filter test Assert.assertFalse("Should not read: ...", shouldRead) failed with false positive.

This PR is to make the bloom filter test less flaky by taking consideration of fpp.

@szehon-ho
Copy link
Member

While this makes sense, I wonder if its even worth to assert that number of false positives is less than some number? Each test asserting there's no false negatives may be good enough?

I assume that there is still a failure chance, and it may not really be suitable for unit test to be run each time? Curious what other parquet or other projects do. Also wondering what others think.

@huaxingao
Copy link
Contributor Author

I think besides asserting no false negative, we probably still need to have a negative test for each of the data types?

@szehon-ho
Copy link
Member

szehon-ho commented Jul 23, 2022

Yea it doenst seem great to have a chance of random failures in build tests but could go either way on it. Also curious with your buffer what is the estimated chance of failure ? (like is it astronomically small )

Wonder any thoughts from @kbendick @rdblue @RussellSpitzer

@rdblue
Copy link
Contributor

rdblue commented Jul 25, 2022

Rather than adding code for false positives, wouldn't it be better to make the test use a consistent random seed that doesn't have a false positive? You'd need to replace UUID.randomUUID with a UUID built from random longs instead, but I think that would be better.

@huaxingao
Copy link
Contributor Author

@rdblue Thanks for the suggestion! Done.

@huaxingao huaxingao changed the title Improve bloom filter test for false positive case Improve bloom filter test Jul 26, 2022
@rdblue rdblue merged commit b8dc8c4 into apache:master Jul 26, 2022
@rdblue
Copy link
Contributor

rdblue commented Jul 26, 2022

Thanks, @huaxingao!

@huaxingao
Copy link
Contributor Author

Thank you very much! @rdblue @szehon-ho

@huaxingao huaxingao deleted the bf_test branch July 26, 2022 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants