Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order of Interval Addition Should Affect Final Output #11055

Open
vbarua opened this issue Jun 21, 2024 · 6 comments
Open

Order of Interval Addition Should Affect Final Output #11055

vbarua opened this issue Jun 21, 2024 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@vbarua
Copy link

vbarua commented Jun 21, 2024

Describe the bug

In various engines, the order in which intervals are added to dates can affect the final value. This is especially noticeable with leap years.

Datafusion appears to constant fold these intervals, which throws away the operation order.

> EXPLAIN SELECT
         DATE '2019-02-28' + INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB,
         DATE '2019-02-28' + INTERVAL '1 DAY' + INTERVAL '1 YEAR' AS MAR;
+---------------+----------------------------------------------------------------------+
| plan_type     | plan                                                                 |
+---------------+----------------------------------------------------------------------+
| logical_plan  | Projection: Date32("2020-02-29") AS feb, Date32("2020-02-29") AS mar |
|               |   EmptyRelation                                                      |
| physical_plan | ProjectionExec: expr=[2020-02-29 as feb, 2020-02-29 as mar]          |
|               |   PlaceholderRowExec                                                 |
|               |                                                                      |
+---------------+----------------------------------------------------------------------+

To Reproduce

Testing via datafusion-cli

> SELECT
  DATE '2019-02-28' + INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB,
  DATE '2019-02-28' + INTERVAL '1 DAY' + INTERVAL '1 YEAR' AS MAR;
+------------+------------+
| feb        | mar        |
+------------+------------+
| 2020-02-29 | 2020-02-29 |
+------------+------------+

Expected behavior

Due to leap year shenanigans, adding the year before the day results in a different date than adding the day before the year.

Trino emits

trino> SELECT
    ->   DATE '2019-02-28' + INTERVAL '1' YEAR + INTERVAL '1' DAY AS FEB,
    ->   DATE '2019-02-28' + INTERVAL '1' DAY + INTERVAL '1' YEAR AS MAR;
    FEB     |    MAR     
------------+------------
 2020-02-29 | 2020-03-01 
(1 row)

as does Postgres and Snowflake (based on their documentation for interval types where this example came from)

Additional context

No response

@vbarua vbarua added the bug Something isn't working label Jun 21, 2024
@alamb
Copy link
Contributor

alamb commented Jun 21, 2024

🤔 maybe we could turn of constant folding for adding intervals

@Lordworms
Copy link
Contributor

take

@Lordworms
Copy link
Contributor

Lordworms commented Jun 23, 2024

This is caused by the adding ordering since the sqlparser would always generate a AST like
image
by only disabling constant folding for Interval would not help here since when we evaluate the PhysicalExpr, it would still do the calculation of the later two components first. I think a change in sqlparser would help? (like we generate the BinaryOP AST with first two components as left leaf and the last one as right leaf) not sure the best way to do it.

@alamb
Copy link
Contributor

alamb commented Jun 23, 2024

I agree with @Lordworms the root cause is related to the precidence rules in sqlparser which control if expressions like

  DATE '2019-02-28' + INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB,

are parsed like this

  ((DATE '2019-02-28' + INTERVAL '1 YEAR') + INTERVAL '1 DAY' AS FEB),

Or like

  (DATE '2019-02-28' + (INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB)),

Users who want to control the order of expression evaluation can use ( and ) to explicitly control the evaluation order.

@Lordworms
Copy link
Contributor

I agree with @Lordworms the root cause is related to the precidence rules in sqlparser which control if expressions like

  DATE '2019-02-28' + INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB,

are parsed like this

  ((DATE '2019-02-28' + INTERVAL '1 YEAR') + INTERVAL '1 DAY' AS FEB),

Or like

  (DATE '2019-02-28' + (INTERVAL '1 YEAR' + INTERVAL '1 DAY' AS FEB)),

Users who want to control the order of expression evaluation can use ( and ) to explicitly control the evaluation order.

Thanks so much. Should I fix it in sqlparser?

@alamb
Copy link
Contributor

alamb commented Jun 23, 2024

Thanks so much. Should I fix it in sqlparser?

I think that is probably the best way -- though it is likely pretty tricky (likely related to operator precidence). I think the correct first step would be to do some research and see how such arithmetic is parsed by other databases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants