Open
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
Hello community,
I have been thinking about adding Postgres-style JSON operators for nested data structures, mostly for `Struct` and `List`. These operators include:
- `->`, `->>`, `#>>`: Access a data field by index or key
- `@>`, `<@`, `?`, `?|`, `?&`: Containment testing
- `||`, `-`, `#-`: Data structure manipulation
- `@?`, `@@`: Predicate testing
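To make the access operators concrete, here is a minimal sketch of the Postgres behavior of `->` and `->>` on a toy value type. The names `Value`, `Key`, `arrow`, and `arrow_text` are hypothetical and exist only for this illustration; they are not arrow-rs or datafusion APIs, and the text rendering of nested values is a placeholder.

```rust
use std::collections::BTreeMap;

/// Toy JSON-like value (hypothetical); a stand-in for Struct/List data.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Text(String),
    List(Vec<Value>),
    Struct(BTreeMap<String, Value>),
}

/// Right-hand operand of `->` / `->>`: a field name or a list index.
enum Key<'a> {
    Field(&'a str),
    Index(i64),
}

/// `->`: returns the nested value itself; a missing key or index yields Null.
fn arrow(value: &Value, key: &Key) -> Value {
    match (value, key) {
        (Value::Struct(fields), Key::Field(name)) => {
            fields.get(*name).cloned().unwrap_or(Value::Null)
        }
        (Value::List(items), Key::Index(i)) => {
            // Postgres counts negative indexes from the end of the array.
            let len = items.len() as i64;
            let pos = if *i < 0 { len + *i } else { *i };
            if pos >= 0 && pos < len {
                items[pos as usize].clone()
            } else {
                Value::Null
            }
        }
        // Key kind does not match the value kind (e.g. index into a struct).
        _ => Value::Null,
    }
}

/// `->>`: same lookup as `->`, but the result is rendered as text.
fn arrow_text(value: &Value, key: &Key) -> Option<String> {
    match arrow(value, key) {
        Value::Null => None,
        Value::Text(s) => Some(s),
        Value::Int(n) => Some(n.to_string()),
        // Placeholder rendering for nested values, not Postgres' JSON text.
        other => Some(format!("{:?}", other)),
    }
}
```

The point of the sketch is that `->` keeps the nested value's own type, while `->>` always yields text (or NULL), which is what makes `->>` easier to type statically.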
Describe the solution you'd like
I just want to make sure I'm heading in the right direction.
- I assume we won't have a built-in `json` type in datafusion, so these operators will be implemented directly on `Struct`, `List`, and other `json`-like primitives, following Postgres' semantics. I noticed that VARIANT is coming to arrow/datafusion. Will we have a new `DataType` for `VARIANT`? If so, it would be a good option for the input and return types of these operators.
- At the moment, we don't support operators between a nested data structure and a primitive. If the left input is nested, we assume the right array is nested too and apply the comparison operators recursively. I will need to change this behavior.

  datafusion/datafusion/physical-expr/src/expressions/binary.rs, lines 254 to 259 in 531af8e:

  ```rust
  if left_data_type.is_nested() {
      if !left_data_type.equals_datatype(&right_data_type) {
          return internal_err!("Cannot evaluate binary expression because of type mismatch: left {}, right {} ", left_data_type, right_data_type);
      }
      return apply_cmp_for_nested(self.op, &lhs, &rhs);
  }
  ```
- Some of the operators may produce dynamically typed results if the right input is an array. For example, if the right array of `->` is `["a", "b", "c"]`, the result set is expected to contain 3 different data types, which breaks our type system. So for these `->` operators, I'm going to support the scalar version only.
- I also expect to need a less strict check at `let result_type = self.data_type(input_schema)?;`, because these `->` operators will create dynamic return types.
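The dynamic-type concern can be illustrated with a small sketch. It assumes a toy `ToyType` enum standing in for Arrow's `DataType`; `arrow_type_scalar` and `arrow_types_per_row` are hypothetical helpers written for this example, not existing datafusion functions.

```rust
/// Toy stand-in for Arrow's `DataType` (hypothetical); only what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
    Boolean,
    Struct(Vec<(String, ToyType)>),
}

/// Result type of `input -> key` when `key` is a single scalar.
/// The key is known at planning time, so one type can be declared
/// for the whole output column.
fn arrow_type_scalar(input: &ToyType, key: &str) -> Option<ToyType> {
    match input {
        ToyType::Struct(fields) => fields
            .iter()
            .find(|(name, _)| name.as_str() == key)
            .map(|(_, ty)| ty.clone()),
        // Non-struct inputs have no named fields to look up.
        _ => None,
    }
}

/// Result types when the right side is an array of keys, one per row.
/// Each row may resolve to a different type, so no single column type fits.
fn arrow_types_per_row(input: &ToyType, keys: &[&str]) -> Vec<Option<ToyType>> {
    keys.iter().map(|k| arrow_type_scalar(input, k)).collect()
}
```

With a struct whose fields `a`, `b`, and `c` have three different types, the scalar form resolves to exactly one type, while the per-row form with keys `["a", "b", "c"]` yields three, which is why restricting `->` to a scalar right-hand side keeps the output column well typed.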
Some of the kernels are going to be implemented in arrow-rs first and then integrated into datafusion.
Let me know whether these changes make sense and align with any previous plans. I will then start sending pull requests to both repos.
Describe alternatives you've considered
No response
Additional context
No response