
[C++][Dataset] Filter expressions should not require exact type match #23357

@asfimport

Description


It's not trivial for users to ensure that scalars have exactly the same type as the fields they are compared against in Expressions. For one, FieldExpressions don't carry a type, so when I construct field_ref("col1") > scalar(42), I don't know what type col1 has and can't make scalar(42) match it. Even if the field type were available, I couldn't tell what type the scalar should have when the expression is (field_ref("col1") + field_ref("col2")) > scalar(42).
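To illustrate the problem, here is a minimal sketch (not Arrow's actual classes) of why the type is unknown at construction time: the field reference stores only a name, and a type appears only once the reference is resolved against a schema. `ColumnKind`, `FieldRef`, `Schema` and `ResolveKind` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified type tag; NOT Arrow's type system.
enum class ColumnKind { Int32, Int64, Float64 };

// A field reference carries only a column name, no type.
struct FieldRef {
  std::string name;
};

// Hypothetical schema: column name -> type tag.
using Schema = std::map<std::string, ColumnKind>;

// Resolves the type of `ref` against `schema`.
// Returns false (leaving *out untouched) if the column is not in the schema.
bool ResolveKind(const FieldRef& ref, const Schema& schema, ColumnKind* out) {
  auto it = schema.find(ref.name);
  if (it == schema.end()) return false;
  *out = it->second;
  return true;
}

// The same reference resolves to different types under different schemas,
// so a scalar built alongside it cannot know which type to match.
void CheckSameRefDifferentSchemas() {
  const FieldRef col1{"col1"};
  const Schema narrow{{"col1", ColumnKind::Int32}};
  const Schema wide{{"col1", ColumnKind::Float64}};

  ColumnKind kind;
  assert(ResolveKind(col1, narrow, &kind) && kind == ColumnKind::Int32);
  assert(ResolveKind(col1, wide, &kind) && kind == ColumnKind::Float64);
}
```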

We should allow CompareExpressions to cast their inputs as necessary. Casting should be supported among integer types, among floating point types, and between integers and floats. Likewise among date/timestamp types; and when a string scalar is compared against a date/timestamp column, the string should probably be parsed as a datetime. We also need to think about DictionaryTypes (though in practice this is moot until we have comparison kernels that work on strings).
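The casting rules proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Arrow's API: `Kind` and `CommonComparisonKind` are hypothetical names, the type list is deliberately tiny, and the specific promotion targets (Int64, Float64, TimestampMs) are choices made for the example rather than anything this issue settles. DictionaryTypes are left out.

```cpp
#include <cassert>

// Hypothetical, simplified type tag; NOT Arrow's Type::type.
enum class Kind { Int32, Int64, Float32, Float64, Date32, TimestampMs, String };

inline bool IsInteger(Kind k) { return k == Kind::Int32 || k == Kind::Int64; }
inline bool IsFloat(Kind k) { return k == Kind::Float32 || k == Kind::Float64; }
inline bool IsTemporal(Kind k) {
  return k == Kind::Date32 || k == Kind::TimestampMs;
}

// Picks the type both operands would be cast to before comparing.
// Returns false if this sketch defines no implicit cast for the pair.
bool CommonComparisonKind(Kind field, Kind scalar, Kind* out) {
  if (field == scalar) {
    *out = field;  // exact match: nothing to cast
  } else if (IsInteger(field) && IsInteger(scalar)) {
    *out = Kind::Int64;  // widen to the larger integer
  } else if ((IsInteger(field) || IsFloat(field)) &&
             (IsInteger(scalar) || IsFloat(scalar))) {
    // float/float or integer/float (assumed target; lossy for large Int64)
    *out = Kind::Float64;
  } else if (IsTemporal(field) && IsTemporal(scalar)) {
    *out = Kind::TimestampMs;  // compare at the finer resolution
  } else if (IsTemporal(field) && scalar == Kind::String) {
    *out = field;  // the string scalar would be parsed as the column's type
  } else {
    return false;
  }
  return true;
}

// A few pairs exercised against the rules above.
void CheckExamples() {
  Kind out;
  assert(CommonComparisonKind(Kind::Int32, Kind::Int64, &out) &&
         out == Kind::Int64);
  assert(CommonComparisonKind(Kind::Float32, Kind::Int32, &out) &&
         out == Kind::Float64);
  assert(CommonComparisonKind(Kind::Date32, Kind::String, &out) &&
         out == Kind::Date32);
  assert(!CommonComparisonKind(Kind::Int32, Kind::String, &out));
}
```

With a rule like this, field_ref("col1") > scalar(42) would not need the scalar to be constructed with col1's exact type; the cast would be inserted once the field type is known.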

@fsaintjacques @bkietz

Reporter: Neal Richardson / @nealrichardson
Assignee: Ben Kietzman / @bkietz

PRs and other links:

Note: This issue was originally created as ARROW-7047. Please see the migration documentation for further details.
