Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arithmetic operations on timestamps fail when now() is used as the LHS argument #6155

Closed
Tracked by #5753
kyle-mccarthy opened this issue Apr 28, 2023 · 5 comments · Fixed by #6196 or #6433
Closed
Tracked by #5753
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@kyle-mccarthy
Copy link
Contributor

kyle-mccarthy commented Apr 28, 2023

Describe the bug

When performing an arithmetic operation on timestamps, now() causes an error when it is used as the LHS argument. It works as expected when it is used at the RHS argument.

Internal("If RHS of the operation is an array, then LHS also must be")

To Reproduce

The following will return an error and then panic on unwrap.

let ctx = SessionContext::new();

let df = ctx
    .sql(
        r#"
WITH cte AS (
    SELECT now() AS dt 
)
SELECT now() - cte.dt FROM cte
"#,
    )
    .await
    .unwrap();

Expected behavior

The operation should be successfully evaluated.

Additional context

The following will run without issue. Note that now() is the RHS arg in this sample.

let ctx = SessionContext::new();

let df = ctx
    .sql(
        r#"
WITH cte AS (
    SELECT now() AS dt 
)
SELECT cte.dt - now() FROM cte
"#,
    )
    .await
    .unwrap();
@kyle-mccarthy kyle-mccarthy added the bug Something isn't working label Apr 28, 2023
@Weijun-H
Copy link
Contributor

Hi @kyle-mccarthy, I'm unable to reproduce the error you mentioned on my machine. Could you please provide me with more information about the error so that I can better understand and address the issue?

@alamb
Copy link
Contributor

alamb commented May 1, 2023

This query generates an error for me using datafusion-cli (using the latest master branch):

alamb@MacBook-Pro-8:~$ datafusion-cli
DataFusion CLI v23.0.0
❯ WITH cte AS (
    SELECT now() AS dt
)
SELECT now() - cte.dt FROM cte
;
Internal error: If RHS of the operation is an array, then LHS also must be. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

@alamb
Copy link
Contributor

alamb commented May 1, 2023

I think this just means the arithmetic kernels for timestamps need to handle different combinations of ColumnarValue::Scalar and ColumnarValue::Array

@alamb alamb added the good first issue Good for newcomers label May 1, 2023
@kyle-mccarthy
Copy link
Contributor Author

kyle-mccarthy commented May 2, 2023

@Weijun-H hey, thanks for looking into it. The code samples are really similiar, so I wanted to confirm that you ran the first one/repro and not the last one/additional context. For reference, the sample below is the one I am having an issue with.

let ctx = SessionContext::new();

let df = ctx
    .sql(
        r#"
WITH cte AS (
    SELECT now() AS dt 
)
SELECT now() - cte.dt FROM cte
"#,
    )
    .await
    .unwrap();

Also @alamb I think that you are right. I did take a look at solving this and hoped that it would be as simple as swapping the tuples values this match arm, but I think some amount of refactoring may be required.

Currently, all the temporal compute related functions that involve scalars (1 2 3 4 etc...) in the physical-expr crate expect an array as the LHS and a scalar for the RHS. AFAIK, this makes sense for every operation expect for when both the LHS and RHS data types are timestamps (since it doesn't make sense to try and subtract timestamps from intervals).

My initial inclination is to create an enum (TemporalValue) that has variants for the ArrayRef and ScalarValue and then update all the logic to accept the TemporalValue enum. Using an enum instead would require a decent amount of changes to the binary.rs and kernal_arrows.rs files in the expressions module.

Is there a better way to do this? It looks like most compute related thing expect that the LHS is the array and that the RHS is the scalar. I figure that this is because most math operations where order matter can be rewritten?

Footnotes

  1. https://github.com/apache/arrow-datafusion/blob/6e8f91b41a6ec6a2680357b95b2489d87af33571/datafusion/physical-expr/src/expressions/binary.rs#L1331-L1335

  2. https://github.com/apache/arrow-datafusion/blob/6e8f91b41a6ec6a2680357b95b2489d87af33571/datafusion/physical-expr/src/expressions/binary/kernels_arrow.rs#L316-L319

  3. https://github.com/apache/arrow-datafusion/blob/6e8f91b41a6ec6a2680357b95b2489d87af33571/datafusion/physical-expr/src/expressions/binary/kernels_arrow.rs#L381-L384

  4. https://github.com/apache/arrow-datafusion/blob/6e8f91b41a6ec6a2680357b95b2489d87af33571/datafusion/physical-expr/src/expressions/binary.rs#L1348

@alamb
Copy link
Contributor

alamb commented May 2, 2023

My initial inclination is to create an enum (TemporalValue) that has variants for the ArrayRef and ScalarValue and then update all the logic to accept the TemporalValue enum. Using an enum instead would require a decent amount of changes to the binary.rs and kernal_arrows.rs files in the expressions module.

I wonder if we can use the existing ColumnarValue rather than a new enum? https://docs.rs/datafusion/latest/datafusion/physical_plan/enum.ColumnarValue.html ?

Is there a better way to do this? It looks like most compute related thing expect that the LHS is the array and that the RHS is the scalar. I figure that this is because most math operations where order matter can be rewritten?

One way might be to use the https://docs.rs/datafusion/latest/datafusion/physical_plan/enum.ColumnarValue.html#method.into_array function to make the columnar values into arrays (even though this is not super efficient to turn a single value into an array).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
3 participants