Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predicate still has cast when comparing Timestamp(Nano, None) to a timestamp literal, so can't be pushed down or used for pruning #3938

Closed
alamb opened this issue Oct 24, 2022 · 1 comment · Fixed by #4039 or #4148
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Oct 24, 2022

Describe the bug
Comparing a Timestamp(Nanosecond, None) column to a timestamp literal is important for IOx and can be used to potentially prune significant amounts of data and pushed down to scans.

Specifically, to get the last hours of data you can use a predicate like:

col_ts_nano_none < (now() - interval '1 hour')";

Which should be evaluated to something like

test.col_ts_nano_none < TimestampNanosecond(1666612093000000000, Some(\"UTC\")

However, today DataFusion can't get rid of the cast:

CAST(test.col_ts_nano_none AS Timestamp(Nanosecond, Some(\"UTC\"))) < TimestampNanosecond(1666612093000000000, Some(\"UTC\")

To Reproduce
Add this test to the optimizer-integration test:

#[test]
fn timestamp_nano_ts_none_predicates() -> Result<()> {
    let sql = "SELECT col_int32
        FROM test
        WHERE col_ts_nano_none < (now() - interval '1 hour')";
    let plan = test_sql(sql)?;
    // a scan should have the now()... predicate folded to a single
    // constant and compared to the column without a cast so it can be
    // pushed down / pruned
    let expected = "Projection: test.col_int32\n  Filter: test.col_ts_nano_utc < TimestampNanosecond(1666612093000000000, Some(\"UTC\"))\
                    \n    TableScan: test projection=[col_int32, col_ts_nano_none]";
    assert_eq!(expected, format!("{:?}", plan));
    Ok(())
}

It will fail like:

---- timestamp_nano_ts_none_predicates stdout ----
thread 'timestamp_nano_ts_none_predicates' panicked at 'assertion failed: `(left == right)`
  left: `"Projection: test.col_int32\n  Filter: test.col_ts_nano_utc < TimestampNanosecond(1666612093000000000, Some(\"UTC\"))\n    TableScan: test projection=[col_int32, col_ts_nano_none]"`,
 right: `"Projection: test.col_int32\n  Filter: CAST(test.col_ts_nano_none AS Timestamp(Nanosecond, Some(\"UTC\"))) < TimestampNanosecond(1666612093000000000, Some(\"UTC\"))\n    TableScan: test projection=[col_int32, col_ts_nano_none]"`', datafusion/optimizer/tests/integration-test.rs:242:5
stack backtrace:

Because the predicate still contains the cast

 CAST(test.col_ts_nano_none AS Timestamp(Nanosecond, Some(\"UTC\"))) < TimestampNanosecond(1666612093000000000, Some(\"UTC\"))

Expected behavior
Test should pass (though the actual timestamp value may need to be adjusted)

Specifically, the filter should be something like:

test.col_ts_nano_utc < TimestampNanosecond(1666612093000000000, Some(\"UTC\"))\

Additional context
See related IOx issue here: https://github.com/influxdata/influxdb_iox/issues/5875

It is not clear to me if the better fix would be to provide Timestamp(Nanos, UTC) rather than Timestamp(Nanos, None) in this case. However, DataFusion could definitely do a better job with unwrapping the cast

@alamb
Copy link
Contributor Author

alamb commented Nov 8, 2022

It turns out I accidentally marked this as completed when it is still broken

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant