Describe the bug
A custom table provider for a ParquetSource with a trivial catalog and an Int64 column yields an error of the following form when a SQL query has a filter with a literal limit on that column:
assertion `left == right` failed: Simplified expression should have the same data type as the original
left: Null
right: Int64
The error does not occur with datafusion-python 52, nor when running the query purely in a Rust SessionContext. The backtrace for the above error also shows it coming from datafusion-ffi code.
This may of course not be a bug, but rather some bad practice that the version 52 set of crates tolerates and that is now invalid. An MRE of this custom table provider can be found in the public repo https://github.com/jwimberl/datafusion_python_53_int64filter_repro, which contains
- a non-working datafusion53 version (in branch main)
- a baseline working version (in branch datafusion52)
and a canned dummy dataset. The README.md of this repo has more details.
To Reproduce
In the main branch, build the py_repro_provider crate with maturin develop and run python repro.py. This loads the dummy dataset as a table dummy_table and runs two queries:
- SELECT * FROM dummy_table LIMIT 1, which is successful
- SELECT * FROM dummy_table LIMIT 5, which panics
Its output should be something like
Successful query:
a b
0 0 42
Unsuccesful query:
thread '<unnamed>' panicked at /home/jwimberley/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-53.1.0/src/simplifier/mod.rs:76:17:
assertion `left == right` failed: Simplified expression should have the same data type as the original
left: Null
right: Int64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
followed by backtrace information.
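For readers who want to drive the two queries themselves, here is a minimal sketch of such a loop. It is not the repro.py from the repo. The helper name run_queries and the labels are made up for illustration. It assumes ctx is a datafusion SessionContext on which dummy_table has already been registered through the custom provider, and it relies only on ctx.sql(...) and DataFrame.to_pandas().

```python
# Hypothetical driver sketch; NOT the actual repro.py from the repository.
# Assumption: `ctx` is a datafusion.SessionContext with `dummy_table`
# already registered via the custom table provider.

QUERIES = [
    ("LIMIT 1", "SELECT * FROM dummy_table LIMIT 1"),
    ("LIMIT 5", "SELECT * FROM dummy_table LIMIT 5"),
]


def run_queries(ctx, queries=QUERIES):
    """Run each query and return a list of (label, ok, detail) tuples.

    `detail` is the pandas DataFrame on success, or the exception on failure.
    """
    outcomes = []
    for label, sql in queries:
        try:
            # DataFrames are lazy, so planning/execution errors can show up
            # either in sql() or when the result is materialised.
            frame = ctx.sql(sql).to_pandas()
            outcomes.append((label, True, frame))
        except Exception as exc:
            outcomes.append((label, False, exc))
    return outcomes
```

A Rust panic does not necessarily arrive as an ordinary Python Exception. If it aborts the process or is raised as a BaseException subclass, the except branch above is never reached and the traceback is printed as shown in the output above.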
Expected behavior
In the datafusion52 branch, build the py_repro_provider crate with maturin develop and run python repro.py. It runs the same two queries, and its output should be
Successful query:
a b
0 0 42
Also successful query:
a b
0 0 42
1 1 42
2 2 42
3 3 42
4 4 42
Additional context
In either the main branch or datafusion52 branch, the Rust code for the table provider is in the directory repro_provider, and there is a corresponding cargo test that runs SELECT * FROM dummy_table WHERE a < 5. Without the FFI layer, this is successful with both datafusion 52 and 53.