Improve bloom filter test by huaxingao · Pull Request #5329 · apache/iceberg

huaxingao · 2022-07-21T18:41:16Z

I saw the bloom filter test assertion Assert.assertFalse("Should not read: ...", shouldRead) fail because of a false positive.

This PR makes the bloom filter test less flaky by taking the false positive probability (fpp) into account.
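One way to "take fpp into account" is to give the negative assertions a false-positive budget instead of requiring zero hits. The sketch below is hypothetical and not the PR's code. It assumes each value that was never inserted is reported as possibly present independently with probability fpp, so the count is Binomial(n, fpp). It then uses a normal approximation and allows the mean plus a chosen number of standard deviations.

```java
// Hypothetical helper (not from the PR): a false-positive budget for a bloom filter test.
final class FppBudget {
  private FppBudget() {}

  /**
   * Allowed number of false positives when probing {@code absentValues} values that were never
   * inserted into a filter built with false positive probability {@code fpp}. Uses a normal
   * approximation to Binomial(absentValues, fpp): the mean plus {@code sigmas} standard deviations.
   */
  static int allowedFalsePositives(int absentValues, double fpp, double sigmas) {
    double mean = absentValues * fpp;
    double stdDev = Math.sqrt(absentValues * fpp * (1.0 - fpp));
    return (int) Math.ceil(mean + sigmas * stdDev);
  }
}
```

For example, allowedFalsePositives(1000, 0.01, 3.0) returns 20: a mean of 10 plus about 9.4. As the discussion below points out, any finite budget still leaves a nonzero chance of failure.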

szehon-ho · 2022-07-22T00:35:20Z

While this makes sense, I wonder if it's even worth asserting that the number of false positives is below some threshold. Maybe asserting in each test that there are no false negatives is good enough?

I assume there is still a chance of failure, so it may not really be suitable for a unit test that runs on every build? I'm curious what Parquet or other projects do, and also what others think.

huaxingao · 2022-07-22T15:56:27Z

Besides asserting that there are no false negatives, I think we probably still need a negative test for each data type?
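The sketch below makes the two kinds of assertion concrete. It uses a self-contained toy filter, not Parquet's BlockSplitBloomFilter, and all class and method names are hypothetical. The positive check can never flake, because an inserted value sets every bit it later probes, so the false-negative count is always zero. The negative check depends on the data and on fpp, and that dependence is the flakiness under discussion.

```java
import java.util.BitSet;

// Toy Bloom filter for illustration only; this is NOT Parquet's BlockSplitBloomFilter.
final class ToyBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  ToyBloomFilter(int numBits, int numHashes) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  // SplitMix64 finalizer, used here as a cheap 64-bit hash mixer.
  private static long mix(long z) {
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  // Double hashing: the i-th bit position is h1 + i * h2 (mod numBits).
  private int index(long value, int i) {
    long h1 = mix(value);
    long h2 = mix(h1);
    return (int) Math.floorMod(h1 + i * h2, (long) numBits);
  }

  void add(long value) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(index(value, i));
    }
  }

  // false means "definitely absent"; true means "possibly present".
  boolean mightContain(long value) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(index(value, i))) {
        return false;
      }
    }
    return true;
  }

  // Positive check: always 0, because every inserted value set all the bits it probes.
  static int countFalseNegatives(ToyBloomFilter filter, long[] inserted) {
    int misses = 0;
    for (long v : inserted) {
      if (!filter.mightContain(v)) {
        misses++;
      }
    }
    return misses;
  }

  // Negative check: values never inserted that the filter still reports as possibly present.
  static int countFalsePositives(ToyBloomFilter filter, long[] absent) {
    int hits = 0;
    for (long v : absent) {
      if (filter.mightContain(v)) {
        hits++;
      }
    }
    return hits;
  }
}
```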

szehon-ho · 2022-07-23T06:35:21Z

Yeah, it doesn't seem great to have a chance of random failures in build tests, but I could go either way on it. I'm also curious: with your buffer, what is the estimated chance of failure? (Is it astronomically small?)
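Under the same independence assumption, the chance that a budgeted assertion fails is the binomial tail probability P(X > k) for X ~ Binomial(n, fpp). The hypothetical helper below computes that tail exactly rather than by approximation. It sums the tail terms directly, which avoids the cancellation that 1 - P(X <= k) suffers when the tail is tiny. It also assumes n is small enough that (1 - p)^n does not underflow.

```java
// Hypothetical helper (not from the PR): chance that a budgeted false-positive assertion fails.
final class FalsePositiveTail {
  private FalsePositiveTail() {}

  /**
   * Returns P(X > k) for X ~ Binomial(n, p): the probability that more than k of n absent values
   * are reported as possibly present when each one independently has probability p.
   */
  static double probabilityMoreThan(int n, double p, int k) {
    if (n < 0 || k < 0) {
      throw new IllegalArgumentException("n and k must be non-negative");
    }
    if (p <= 0.0) {
      return 0.0; // X is always 0
    }
    if (p >= 1.0) {
      return k < n ? 1.0 : 0.0; // X is always n
    }
    double pmf = Math.pow(1.0 - p, n); // P(X = 0)
    double ratio = p / (1.0 - p);
    double tail = 0.0;
    for (int j = 0; j <= n; j++) {
      if (j > k) {
        tail += pmf;
      }
      // Advance from P(X = j) to P(X = j + 1) = P(X = j) * (n - j) / (j + 1) * p / (1 - p).
      pmf = pmf * (n - j) / (j + 1) * ratio;
    }
    return tail;
  }
}
```

For example, probabilityMoreThan(1000, 0.01, 20) estimates how often a test that probes 1,000 absent values with a budget of 20 false positives would fail. A real filter's per-value false positive rate can differ from its configured fpp, so this estimate is only approximate.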

Any thoughts from @kbendick @rdblue @RussellSpitzer?

rdblue · 2022-07-25T18:16:45Z

Rather than adding code for false positives, wouldn't it be better to make the test use a consistent random seed that doesn't have a false positive? You'd need to replace UUID.randomUUID with a UUID built from random longs instead, but I think that would be better.
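The sketch below shows this suggestion, assuming java.util.Random as the seeded source and the standard UUID(long mostSigBits, long leastSigBits) constructor. The class name and the seed value 42 are hypothetical, because the thread does not say which seed the PR settled on.

```java
import java.util.Random;
import java.util.UUID;

// Hypothetical helper: deterministic "random" UUIDs for test data, built from a seeded Random.
final class SeededUuids {
  private SeededUuids() {}

  static UUID[] generate(long seed, int count) {
    Random random = new Random(seed); // same seed -> same sequence of longs on every run
    UUID[] uuids = new UUID[count];
    for (int i = 0; i < count; i++) {
      // Version/variant bits are not set, so these are not RFC 4122 version-4 UUIDs;
      // that is fine for filter test data.
      uuids[i] = new UUID(random.nextLong(), random.nextLong());
    }
    return uuids;
  }
}
```

With a fixed seed, the test data, and therefore any false positives, are the same on every run. The test either always passes or always fails, so a seed can be chosen once that produces no false positives.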

huaxingao · 2022-07-25T22:23:58Z

@rdblue Thanks for the suggestion! Done.

rdblue · 2022-07-26T17:20:52Z

Thanks, @huaxingao!

huaxingao · 2022-07-26T17:25:58Z

Thank you very much! @rdblue @szehon-ho

Improve bloom filter test for false positive case

7b2b97f

github-actions bot added the parquet label Jul 21, 2022

fix test failure

da8b74c

huaxingao added 2 commits July 21, 2022 22:25

fix test failure

3dfbdc7

remove unused import

78bb7ff

address comments

4584245

huaxingao changed the title from "Improve bloom filter test for false positive case" to "Improve bloom filter test" Jul 26, 2022

rdblue approved these changes Jul 26, 2022


rdblue merged commit b8dc8c4 into apache:master Jul 26, 2022

huaxingao deleted the bf_test branch July 26, 2022 17:26

rdblue merged 5 commits into apache:master from huaxingao:bf_test


3 participants
