-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add extra case_when benchmarks #18097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pepijnve -- I ran these benchmarks locally and it looks good
I also did a brief profile and it looks like it is measuring what we expect:

|
||
fn criterion_benchmark(c: &mut Criterion) { | ||
// create input data | ||
fn make_batch(row_count: usize, column_count: usize) -> RecordBatch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth a comment saying that this column could is always 3 or more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a description and assertion
)); | ||
let mut columns: Vec<ArrayRef> = vec![c1, c2, c3]; | ||
for _ in 3..column_count { | ||
columns.push(Arc::new(Int32Array::from_value(0, row_count))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think it would be useful to use different values for the columns? Maybe since it is a benchmark (not correctness test) all zeros will be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values themselves don't really matter for the benchmark. Maybe it's safer to not use identical values just to be sure that the array never gets REE encoded.
); | ||
|
||
// No expression, when/then/else, column reference values | ||
c.bench_function( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are nice labels:
Benchmarking case_when 8192x3: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 EN...: Warming up for 3.0000 s
|
||
// Many when/then branches where all are effectively reachable | ||
c.bench_function(format!("case_when {}x{}: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END", batch.num_rows(), batch.num_columns()).as_str(), |b| { | ||
let when_thens = (0..batch.num_rows() as i32).map(|i| (make_x_cmp_y(&c1, Operator::Eq, i), lit(i))).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is a lot of when_thens!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intentionally so. This is a torture test benchmark to really stress the code.
The first 'all reachable' one is really a worst case scenario test case. This is intended to be able to measure improvements in the processing that's being done in each loop iteration. Filtering, scattering, etc.
The second 'few reachable' one is intended to measure the short circuiting behaviour.
Thank you @pepijnve |
## Which issue does this PR close? - Followup to #18097 ## Rationale for this change The last benchmark was incorrectly essentially indentical to the second to last one. The actual predicate was using `=` instead of `<`. ## What changes are included in this PR? - Adjust the operator in the case predicates to `<` - Adds two additional benchmarks covering `case x when ...` ## Are these changes tested? Verified with debugger. ## Are there any user-facing changes? No
Which issue does this PR close?
None
Rationale for this change
More microbenchmarks make it easier to asses the performance impact of
CaseExpr
implementation changes.What changes are included in this PR?
Add microbenchmarks for
case
expressions that are a bit more representative for real world queries.Are these changes tested?
n/a
Are there any user-facing changes?
no