[CALCITE-6636] Support CNF condition of Arrow ArrowAdapter#4848
caicancai wants to merge 5 commits into apache:main
| * <a href="https://issues.apache.org/jira/browse/CALCITE/issues/CALCITE-6293"> | ||
| * [CALCITE-6293] Support OR condition in Arrow adapter</a> is fixed. */ | ||
| public static final boolean CALCITE_6293_FIXED = false; | ||
| public static final boolean CALCITE_6293_FIXED = true; |
Maybe these can be completely removed?
| } | ||
| String plan = "PLAN=ArrowToEnumerableConverter\n" | ||
| + " ArrowProject(intField=[$0], stringField=[$1])\n" | ||
| + " ArrowFilter(condition=[SEARCH($0, Sarg[0, 1, 2])])\n" |
Arrow/Gandiva does not support the SEARCH operator; I have fully expanded the SEARCH operator.
| List<List<String>> translateMatch(RexNode condition) { | ||
| // Expand SEARCH nodes and convert to CNF | ||
| final RexNode expanded = RexUtil.expandSearch(rexBuilder, null, condition); | ||
| final RexNode cnf = RexUtil.toCnf(rexBuilder, expanded); |
This could be much larger than the original condition.
You should add some tests with deeper conditions (multiple nested levels of parens).
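To make the size concern concrete, here is a minimal, self-contained sketch that does not use Calcite at all. `CnfGrowthSketch`, `dnfToCnf`, and `pairs` are hypothetical names introduced for illustration. It distributes OR over AND by brute force for a condition of the shape `(a0 AND b0) OR (a1 AND b1) OR ...` and shows that n such groups produce 2^n CNF clauses. A real converter may simplify or cap this, so treat it only as an illustration of the worst case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration, not Calcite code: brute-force DNF to CNF conversion. */
final class CnfGrowthSketch {
  private CnfGrowthSketch() {}

  /** Converts an OR of AND-groups into an AND of OR-clauses by taking
   * one literal from every AND-group (a cartesian product). */
  static List<List<String>> dnfToCnf(List<List<String>> dnf) {
    List<List<String>> clauses = new ArrayList<>();
    clauses.add(new ArrayList<>()); // start with a single empty clause
    for (List<String> conjunction : dnf) {
      List<List<String>> next = new ArrayList<>();
      for (List<String> clause : clauses) {
        for (String literal : conjunction) {
          List<String> extended = new ArrayList<>(clause);
          extended.add(literal);
          next.add(extended);
        }
      }
      clauses = next;
    }
    return clauses;
  }

  /** Builds (a0 AND b0) OR (a1 AND b1) OR ... with n groups. */
  static List<List<String>> pairs(int n) {
    List<List<String>> dnf = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      dnf.add(Arrays.asList("a" + i, "b" + i));
    }
    return dnf;
  }
}
```

With this sketch, `CnfGrowthSketch.dnfToCnf(CnfGrowthSketch.pairs(10))` has 1024 clauses of 10 literals each, starting from an input with only 20 literals. Tests with several nested levels of parentheses would exercise this kind of growth.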
| return builder.build(); | ||
| } | ||
|
|
||
| private TreeNode parseSingleCondition(String condition) { |
I don't really know the grammar for these conditions, so I cannot tell whether the spaces are where you expect them. Is this documented someplace?
What happens if you have a comparison with a string containing a space, for example?
I've added instructions, but it doesn't seem to solve the empty string problem yet. I need to think about other solutions. 🤔
I have now changed it to a structured token format instead of string parsing:
- unary: [fieldName, operator]
- binary: [fieldName, operator, value, type]
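As a rough sketch of why the token format sidesteps the whitespace question, the following self-contained example reads a condition by position rather than by splitting a string. `ConditionTokenSketch`, `describe`, and `naiveSplitCount` are hypothetical names, and the operator name `equal` and type name `string` are assumptions made for illustration; only the two token shapes come from the comment above.

```java
import java.util.List;

/** Hypothetical illustration, not the PR's code: reads tokens by position. */
final class ConditionTokenSketch {
  private ConditionTokenSketch() {}

  /** Describes a unary [fieldName, operator] or
   * binary [fieldName, operator, value, type] token list. */
  static String describe(List<String> tokens) {
    switch (tokens.size()) {
    case 2:
      return "field=" + tokens.get(0) + " op=" + tokens.get(1);
    case 4:
      return "field=" + tokens.get(0) + " op=" + tokens.get(1)
          + " value=[" + tokens.get(2) + "] type=" + tokens.get(3);
    default:
      throw new IllegalArgumentException(
          "Expected 2 or 4 tokens, got " + tokens.size());
    }
  }

  /** Counts the parts a space-separated string would be split into,
   * to contrast with the positional token list. */
  static int naiveSplitCount(String condition) {
    return condition.split(" ").length;
  }
}
```

A value such as `hello world` stays in one token in the list form, whereas splitting the string `stringField equal hello world string` on spaces yields five parts instead of four.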
| /** Adds new predicates. | ||
| * | ||
| * @param predicates Predicates | ||
| * @param predicates Predicates in CNF form (outer list is AND, inner list is OR) |
I see 3 lists, can you explain all of them?
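One possible reading of the three levels, offered as an assumption rather than as the PR's documented contract: the outer list is the AND of clauses, each middle list is the OR of conditions within a clause, and each innermost list is the token list of a single condition. The sketch below renders such a structure as text. `CnfPredicateSketch`, `render`, and `example` are hypothetical names, and the operator and type names other than `isnull` are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of a three-level predicate structure. */
final class CnfPredicateSketch {
  private CnfPredicateSketch() {}

  /** Renders AND-of-OR-of-token-lists as a readable string. */
  static String render(List<List<List<String>>> predicates) {
    List<String> conjuncts = new ArrayList<>();
    for (List<List<String>> orGroup : predicates) {       // outer list: AND
      List<String> disjuncts = new ArrayList<>();
      for (List<String> tokens : orGroup) {               // middle list: OR
        disjuncts.add(String.join(" ", tokens));          // inner list: one condition
      }
      conjuncts.add("(" + String.join(" OR ", disjuncts) + ")");
    }
    return String.join(" AND ", conjuncts);
  }

  /** (intField = 1 OR intField = 2) AND stringField IS NULL, as tokens. */
  static List<List<List<String>>> example() {
    List<List<String>> intClause = Arrays.asList(
        Arrays.asList("intField", "equal", "1", "integer"),
        Arrays.asList("intField", "equal", "2", "integer"));
    List<List<String>> nullClause = Arrays.asList(
        Arrays.asList("stringField", "isnull"));
    return Arrays.asList(intClause, nullClause);
  }
}
```

Under these assumptions, `CnfPredicateSketch.render(CnfPredicateSketch.example())` returns `(intField equal 1 integer OR intField equal 2 integer) AND (stringField isnull)`.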
| * e.g. {@code ["intField", "isnull"]}</li> | ||
| * </ul> | ||
| * | ||
| * <p>Using structured tokens avoids string splitting and safely supports |
There is no need to justify this choice.
In general, you should use higher-level representations as much as possible.
| } else { | ||
| throw new UnsupportedOperationException("Unsupported disjunctive condition " + condition); | ||
| /** The maximum number of nodes allowed during CNF conversion. | ||
| * If exceeded, the original expression is returned unchanged. */ |
What happens to the translation in that case? Does it fail?
| return builder.build(); | ||
| } | ||
|
|
||
| /** Parses a single condition into a Gandiva {@link TreeNode}. |
It seems to me that it would be better to define a new class for this data structure, perhaps UnaryOrBinaryCondition, instead of using a List.
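A minimal sketch of what such a class might look like, under the assumption that a condition carries a field name, an operator, and for binary conditions a value and its type. The class name follows the suggestion above; the field names, factory methods, and `toString` format are hypothetical and not taken from the PR.

```java
import java.util.Objects;

/** Hypothetical sketch of a typed condition, replacing a raw List of tokens. */
final class UnaryOrBinaryCondition {
  final String fieldName;
  final String operator;
  final String value;     // null for unary conditions
  final String valueType; // null for unary conditions

  private UnaryOrBinaryCondition(String fieldName, String operator,
      String value, String valueType) {
    this.fieldName = Objects.requireNonNull(fieldName, "fieldName");
    this.operator = Objects.requireNonNull(operator, "operator");
    this.value = value;
    this.valueType = valueType;
  }

  /** Creates a condition such as "intField isnull". */
  static UnaryOrBinaryCondition unary(String fieldName, String operator) {
    return new UnaryOrBinaryCondition(fieldName, operator, null, null);
  }

  /** Creates a condition that compares a field with a literal value. */
  static UnaryOrBinaryCondition binary(String fieldName, String operator,
      String value, String valueType) {
    return new UnaryOrBinaryCondition(fieldName, operator,
        Objects.requireNonNull(value, "value"),
        Objects.requireNonNull(valueType, "valueType"));
  }

  boolean isUnary() {
    return value == null;
  }

  @Override public String toString() {
    return isUnary()
        ? fieldName + " " + operator
        : fieldName + " " + operator + " '" + value + "' (" + valueType + ")";
  }
}
```

With named fields, the consumer no longer depends on list positions or sizes, and a value containing spaces, such as `hello world`, is just a field of the object.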
|



https://issues.apache.org/jira/browse/CALCITE-6636