@andygrove commented Jul 19, 2024

Which issue does this PR close?

N/A

Rationale for this change

case_when: scalar or scalar
                        time:   [5.6794 µs 5.7119 µs 5.7566 µs]
                        change: [-70.724% -70.393% -70.042%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Add a fast path for a specific usage of CASE expression

Are these changes tested?

  • Existing tests
  • Added slt tests

Are there any user-facing changes?

@github-actions (bot) added the physical-expr (Changes to the physical-expr crates) and sqllogictest (SQL Logic Tests (.slt)) labels Jul 19, 2024
@andygrove added the performance (Make DataFusion faster) label Jul 19, 2024
@andygrove changed the title from "Optimize CASE expression for usage where then and else values are literals" to "feat: Optimize CASE expression for usage where then and else values are literals" Jul 19, 2024
@andygrove requested review from alamb and comphead July 19, 2024 17:43
.unwrap_or_else(|_| Arc::clone(e));
let else_ = Scalar::new(expr.evaluate(batch)?.into_array(1)?);

Ok(ColumnarValue::Array(zip(&when_value, &then_value, &else_)?))

Contributor

if the input is ColumnarValue::Scalar shouldn't the output also be a ColumnarValue::Scalar (rather than a ColumnarValue::Array?)

Member Author

No, the output will be an array containing values based on two scalar arguments.

SELECT CASE WHEN a > 2 THEN 'even' ELSE 'odd' END FROM foo
----
odd
even
even
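
To illustrate why the result is an array even though THEN and ELSE are scalars, here is a minimal, dependency-free sketch. It is not the code from this PR: `case_scalar_or_scalar` and `when_a_gt_2` are hypothetical helpers, plain `Vec`s stand in for Arrow arrays, and the column `a` is assumed to hold `[1, 3, 5]` with no NULLs.

```rust
/// Hypothetical stand-in for the fast path: pick `then_value` where the
/// mask is true and `else_value` elsewhere, producing one value per row.
fn case_scalar_or_scalar<T: Clone>(mask: &[bool], then_value: &T, else_value: &T) -> Vec<T> {
    mask.iter()
        .map(|&m| if m { then_value.clone() } else { else_value.clone() })
        .collect()
}

/// Evaluate the WHEN condition `a > 2` for every row
/// (assumption: `a` is a non-null integer column).
fn when_a_gt_2(a: &[i64]) -> Vec<bool> {
    a.iter().map(|&v| v > 2).collect()
}
```

Calling `case_scalar_or_scalar(&when_a_gt_2(&[1, 3, 5]), &"even", &"odd")` produces one string per input row, because the WHEN condition depends on the column even though both branch values are constants.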

Member Author

I wonder if it would make sense (in a separate PR) to produce a dictionary array in this case, since it will only ever contain two distinct values? 🤔
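
A rough sketch of that idea, under stated assumptions: `TwoValueDictionary` and `case_as_dictionary` are hypothetical names and do not correspond to any Arrow or DataFusion type. The two branch values are stored once, and each row carries only a small key.

```rust
/// Hypothetical dictionary-encoded result: the two distinct values are
/// stored once, and each row holds a one-byte key into `values`.
#[derive(Debug, PartialEq)]
struct TwoValueDictionary<T> {
    /// Index 0 is the THEN value, index 1 is the ELSE value.
    values: [T; 2],
    keys: Vec<u8>,
}

/// Build the dictionary form from a boolean mask and the two scalars.
fn case_as_dictionary<T>(mask: &[bool], then_value: T, else_value: T) -> TwoValueDictionary<T> {
    TwoValueDictionary {
        values: [then_value, else_value],
        keys: mask.iter().map(|&m| if m { 0 } else { 1 }).collect(),
    }
}

impl<T: Clone> TwoValueDictionary<T> {
    /// Expand back to one full value per row.
    fn decode(&self) -> Vec<T> {
        self.keys
            .iter()
            .map(|&k| self.values[k as usize].clone())
            .collect()
    }
}
```

With string values, this representation avoids copying the same two strings into every row; whether that pays off in practice would need to be measured.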

4
6

# scalar or scalar (string)

Contributor

Could you also add a test where both arguments are scalars (like CASE WHEN 1 > 2 THEN 'true' ELSE 'false' END)?

Member Author

Added


# scalar or scalar (string)
query T
SELECT CASE WHEN a > 2 THEN 'even' ELSE 'odd' END FROM foo

Contributor

It looks like NULL handling in this specialized implementation is not tested; we could add a (NULL, NULL) row to the foo table.

Member Author

Good point. I have added this.
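
For reference, the behavior such a test should pin down is the standard SQL rule that a WHEN condition evaluating to NULL is not true, so the row takes the ELSE branch. The sketch below models that rule with `Option` values; `when_a_gt_2_nullable` and `case_with_nulls` are hypothetical helpers, not the functions used in this PR.

```rust
/// Evaluate `a > 2` over a nullable column; a NULL input gives a NULL condition.
fn when_a_gt_2_nullable(a: &[Option<i64>]) -> Vec<Option<bool>> {
    a.iter().map(|v| v.map(|x| x > 2)).collect()
}

/// Only a condition that is exactly TRUE selects THEN;
/// FALSE and NULL both select ELSE.
fn case_with_nulls<T: Clone>(
    cond: &[Option<bool>],
    then_value: &T,
    else_value: &T,
) -> Vec<T> {
    cond.iter()
        .map(|c| {
            if *c == Some(true) {
                then_value.clone()
            } else {
                else_value.clone()
            }
        })
        .collect()
}
```

For a column holding 1, 3, 5 and NULL, the NULL row yields the ELSE value, the same as the row where `a > 2` is false.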

@andygrove merged commit b6e55d7 into apache:main Jul 22, 2024
@andygrove deleted the case-scalar-sclar branch July 22, 2024 15:51
Lordworms pushed a commit to Lordworms/arrow-datafusion that referenced this pull request Jul 23, 2024
…re literals (apache#11553)

* Optimize CASE expression for usage where then and else values are literals

* add slt test

* add more test cases