PARQUET-507: Reduce the runtime of rle-test #37
|
If we randomly generate the input data, we can also print the corresponding input on failure. |
|
I'll see if I can make each iteration a function of a single randomly generated seed, and print that on failure, so in the event of a random failure it will be reproducible. Then we can trim the runtime down to a few hundred ms or less |
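A minimal sketch of the seed-per-iteration idea described above, not the actual code in rle-test.cc. The toy run-length codec and the names `MakeInput`, `ToyEncode`, `ToyDecode`, `RoundTripsForSeed`, and `RunRandomIterations` are hypothetical stand-ins. The point is only that every iteration derives all of its input from one seed, so printing that seed is enough to replay a failure.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

// (value, run length) pairs produced by the toy codec below.
using Runs = std::vector<std::pair<int, int>>;

// Hypothetical input generator: all randomness comes from `seed`, so the
// same seed yields the same values within a given build.
std::vector<int> MakeInput(std::uint32_t seed, int num_values) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> bit(0, 1);
  std::vector<int> values;
  values.reserve(static_cast<std::size_t>(num_values));
  for (int i = 0; i < num_values; ++i) values.push_back(bit(gen));
  return values;
}

// Toy run-length encoder standing in for the encoder under test.
Runs ToyEncode(const std::vector<int>& values) {
  Runs runs;
  for (int v : values) {
    if (!runs.empty() && runs.back().first == v) {
      ++runs.back().second;
    } else {
      runs.emplace_back(v, 1);
    }
  }
  return runs;
}

// Toy decoder: expands each run back into repeated values.
std::vector<int> ToyDecode(const Runs& runs) {
  std::vector<int> values;
  for (const auto& run : runs) {
    values.insert(values.end(), static_cast<std::size_t>(run.second), run.first);
  }
  return values;
}

// One test iteration, fully determined by `seed`.
bool RoundTripsForSeed(std::uint32_t seed) {
  const std::vector<int> input = MakeInput(seed, 1000);
  return ToyDecode(ToyEncode(input)) == input;
}

// Draws a fresh seed from the system entropy source for each iteration and
// prints the seed of the first failing iteration so it can be replayed.
bool RunRandomIterations(int niters) {
  std::random_device rd;
  for (int i = 0; i < niters; ++i) {
    const std::uint32_t seed = rd();
    if (!RoundTripsForSeed(seed)) {
      std::cerr << "Round trip failed with seed: " << seed << std::endl;
      return false;
    }
  }
  return true;
}
```

To reproduce a reported failure, one would call `RoundTripsForSeed` with the printed seed instead of drawing a new one.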
src/parquet/util/rle-test.cc
Outdated
Since iters < niters = 500, this condition is always false.
Note: I didn't write this code. I'll clean it up further per the comments (and reduce the runtime further).
This is only a performance-improving suggestion. ;)
|
@asandryh @julienledem I further shortened the runtime (the whole test suite takes < 500ms for me locally) and am now using device entropy to generate PRNG seeds. On failure the seed is printed. Feel free to merge when the build is green. |
src/parquet/util/rle-test.cc
Outdated
// prng setup
std::random_device rd;
std::uniform_int_distribution<int> dist(1, std::numeric_limits<int>::max());
A small optimization: std::uniform_int_distribution<int> dist(1,20);
and change line 366 to int group_size = dist(gen);
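A rough sketch of what this suggestion amounts to, assuming the group size is meant to fall in the range 1 to 20. The surrounding code at line 366 is not shown in this thread, so the helper `RandomGroupSizes` and its parameters are hypothetical. It draws each group size straight from a bounded distribution rather than sampling the full `int` range and narrowing it afterwards.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: splits `total` values into groups whose sizes are
// drawn directly from [1, max_group_size]. The last group is clamped so the
// sizes add up to exactly `total`.
std::vector<int> RandomGroupSizes(std::uint32_t seed, int total,
                                  int max_group_size = 20) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(1, max_group_size);
  std::vector<int> sizes;
  int remaining = total;
  while (remaining > 0) {
    const int group_size = std::min(dist(gen), remaining);
    sizes.push_back(group_size);
    remaining -= group_size;
  }
  return sizes;
}
```

Because the distribution already has the desired bounds, no modulo or rescaling step is needed at the call site.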
|
lgtm |
|
Done, thanks |
|
+1 |
|
thank you! |
I twiddled this a bit to cut the runtime in half. I'd like to reduce it further but am looking for feedback -- my preference would be to use system entropy (std::random_device) to seed the PRNG and print the seed on failure. That way we could run far fewer tests (e.g. only 50 or 100 or so) and would occasionally run into flakiness or a failure if we refactor and break something internally. Thoughts?