[CALCITE-6636] Support CNF condition of Arrow ArrowAdapter #4848

Open
caicancai wants to merge 5 commits into apache:main from caicancai:6636

caicancai (Member) commented Mar 26, 2026

caicancai marked this pull request as draft March 26, 2026 14:35
caicancai marked this pull request as ready for review March 26, 2026 14:41
* <a href="https://issues.apache.org/jira/browse/CALCITE/issues/CALCITE-6293">
* [CALCITE-6293] Support OR condition in Arrow adapter</a> is fixed. */
- public static final boolean CALCITE_6293_FIXED = false;
+ public static final boolean CALCITE_6293_FIXED = true;
Contributor

Maybe these can be completely removed?

Member Author

sure, done

}
String plan = "PLAN=ArrowToEnumerableConverter\n"
+ " ArrowProject(intField=[$0], stringField=[$1])\n"
+ " ArrowFilter(condition=[SEARCH($0, Sarg[0, 1, 2])])\n"
Contributor

Arrow supports search?

Member Author

Arrow/Gandiva does not support the SEARCH operator; I have fully expanded the SEARCH operator.
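
To make the expansion concrete: a Sarg over discrete points such as `Sarg[0, 1, 2]` is logically equivalent to a disjunction of equality comparisons on the same field. The sketch below illustrates that equivalence with plain strings only. It is not Calcite's `RexUtil.expandSearch`, the class and method names are hypothetical, and it ignores the range-valued Sargs that a real expansion also has to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration: expands a point-valued search set into an OR of equalities. */
final class SearchExpansionSketch {

  /** Returns, for example, "OR(=($0, 0), =($0, 1))" for field "$0" and points [0, 1]. */
  static String expandPoints(String field, List<Integer> points) {
    if (points.isEmpty()) {
      throw new IllegalArgumentException("at least one point is required");
    }
    List<String> terms = new ArrayList<>();
    for (int p : points) {
      terms.add("=(" + field + ", " + p + ")");
    }
    // A single point needs no OR wrapper.
    return terms.size() == 1 ? terms.get(0) : "OR(" + String.join(", ", terms) + ")";
  }

  public static void main(String[] args) {
    System.out.println(expandPoints("$0", Arrays.asList(0, 1, 2)));
  }
}
```

Each resulting equality is a comparison of the kind the thread describes as pushable to Gandiva, which is why the expansion matters for the adapter.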

List<List<String>> translateMatch(RexNode condition) {
// Expand SEARCH nodes and convert to CNF
final RexNode expanded = RexUtil.expandSearch(rexBuilder, null, condition);
final RexNode cnf = RexUtil.toCnf(rexBuilder, expanded);
Contributor

This could be much larger than the original condition.
You should add some tests with deeper conditions (multiple nested levels of parens).
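
The size concern can be seen with a small stand-alone model. The sketch below assumes a condition shaped like `(a1 AND b1) OR (a2 AND b2) OR ...` and distributes OR over AND by hand. It is not `RexUtil.toCnf`, and all names are hypothetical. With n two-literal disjuncts, the resulting CNF has 2^n clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone model of CNF growth; not Calcite's RexUtil.toCnf. */
final class CnfBlowupSketch {

  /** Builds (a1 AND b1) OR ... OR (an AND bn) as a list of AND-groups. */
  static List<List<String>> pairs(int n) {
    List<List<String>> orOfAnds = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
      orOfAnds.add(Arrays.asList("a" + i, "b" + i));
    }
    return orOfAnds;
  }

  /** Distributes OR over AND. Input: OR of AND-groups. Output: AND of OR-clauses. */
  static List<List<String>> toCnf(List<List<String>> orOfAnds) {
    List<List<String>> clauses = new ArrayList<>();
    clauses.add(new ArrayList<String>()); // start with one empty clause
    for (List<String> andGroup : orOfAnds) {
      List<List<String>> next = new ArrayList<>();
      for (List<String> clause : clauses) {
        for (String literal : andGroup) {
          // Every existing clause is extended by every literal of the group.
          List<String> extended = new ArrayList<>(clause);
          extended.add(literal);
          next.add(extended);
        }
      }
      clauses = next;
    }
    return clauses;
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 10; n++) {
      System.out.println(n + " disjuncts -> " + toCnf(pairs(n)).size() + " CNF clauses");
    }
  }
}
```

Tests with several nested levels of parentheses, as requested above, would exercise exactly this kind of growth.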

return builder.build();
}

private TreeNode parseSingleCondition(String condition) {
Contributor

I don't really know the grammar for these conditions, so I cannot tell whether the spaces are where you expect them. Is this documented someplace?

What happens if you have a comparison with a string containing a space, for example?

Member Author

I've added instructions, but it doesn't seem to solve the empty string problem yet. I need to think about other solutions. 🤔

Member Author (caicancai, Mar 27, 2026)

I have now changed it to a structured token format instead of string parsing:

  • unary: [fieldName, operator]
  • binary: [fieldName, operator, value, type]
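
A minimal sketch of why the token format sidesteps the whitespace problem raised earlier in this thread. The operator name `equal` and the type name `string` are assumptions made for illustration (only `isnull` appears in the quoted Javadoc), and the helper names are hypothetical rather than taken from the PR.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of structured tokens versus a space-separated string. */
final class ConditionTokensSketch {

  /** Unary condition: [fieldName, operator]. */
  static List<String> unary(String fieldName, String operator) {
    return Arrays.asList(fieldName, operator);
  }

  /** Binary condition: [fieldName, operator, value, type]. */
  static List<String> binary(String fieldName, String operator, String value, String type) {
    return Arrays.asList(fieldName, operator, value, type);
  }

  /** The older approach, modelled here as a naive split on spaces. */
  static String[] splitFlat(String condition) {
    return condition.split(" ");
  }

  public static void main(String[] args) {
    // A value containing a space breaks positional parsing of a flat string:
    // five pieces come back where four were intended.
    System.out.println(splitFlat("stringField equal hello world string").length);

    // With tokens, the value keeps its position and its content.
    List<String> tokens = binary("stringField", "equal", "hello world", "string");
    System.out.println(tokens.get(2));

    System.out.println(unary("intField", "isnull"));
  }
}
```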

/** Adds new predicates.
*
- * @param predicates Predicates
+ * @param predicates Predicates in CNF form (outer list is AND, inner list is OR)
Contributor

I see 3 lists, can you explain all of them?
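
One plausible reading of the three levels, pieced together from the Javadoc line above and the token format described earlier in this thread: the outer list is a conjunction, each middle list is a disjunction, and each innermost list holds the tokens of one condition. The sketch below renders such a structure as text. It is an assumption about the shape, not code from the PR, and the operator and type names other than `isnull` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of a three-level predicate structure. */
final class CnfPredicateListsSketch {

  /** Outer list: AND. Middle list: OR. Inner list: tokens of a single condition. */
  static String render(List<List<List<String>>> cnf) {
    List<String> conjuncts = new ArrayList<>();
    for (List<List<String>> disjunction : cnf) {
      List<String> terms = new ArrayList<>();
      for (List<String> tokens : disjunction) {
        terms.add(String.join(" ", tokens));
      }
      conjuncts.add("(" + String.join(" OR ", terms) + ")");
    }
    return String.join(" AND ", conjuncts);
  }

  public static void main(String[] args) {
    List<List<List<String>>> cnf = Arrays.asList(
        Arrays.asList(
            Arrays.asList("intField", "equal", "1", "integer"),
            Arrays.asList("intField", "equal", "2", "integer")),
        Arrays.asList(
            Arrays.asList("stringField", "isnull")));
    System.out.println(render(cnf));
  }
}
```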

* e.g. {@code ["intField", "isnull"]}</li>
* </ul>
*
* <p>Using structured tokens avoids string splitting and safely supports
Contributor

There is no need to justify this choice.
In general, you should use higher-level representations as much as possible.

} else {
throw new UnsupportedOperationException("Unsupported disjunctive condition " + condition);
/** The maximum number of nodes allowed during CNF conversion.
* If exceeded, the original expression is returned unchanged. */
Contributor

What happens to the translation in that case? Does it fail?
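
The thread does not say how the PR handles this case. As one possible design, a bounded conversion could report "limit exceeded" explicitly, so the caller decides whether to reject the filter or leave it unpushed, instead of silently receiving a non-CNF expression. The sketch below is a hypothetical stand-alone model: it counts clauses rather than expression nodes, and it does not use Calcite's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical model of a size-bounded CNF conversion with an explicit failure signal. */
final class BoundedCnfSketch {

  /** Builds (a1 AND b1) OR ... OR (an AND bn) as a list of AND-groups. */
  static List<List<String>> pairs(int n) {
    List<List<String>> orOfAnds = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
      orOfAnds.add(Arrays.asList("a" + i, "b" + i));
    }
    return orOfAnds;
  }

  /** Returns the CNF clauses, or empty if more than maxClauses would be produced. */
  static Optional<List<List<String>>> toCnfBounded(
      List<List<String>> orOfAnds, int maxClauses) {
    List<List<String>> clauses = new ArrayList<>();
    clauses.add(new ArrayList<String>());
    for (List<String> andGroup : orOfAnds) {
      // Check the size of the next step before building it.
      if ((long) clauses.size() * andGroup.size() > maxClauses) {
        return Optional.empty();
      }
      List<List<String>> next = new ArrayList<>();
      for (List<String> clause : clauses) {
        for (String literal : andGroup) {
          List<String> extended = new ArrayList<>(clause);
          extended.add(literal);
          next.add(extended);
        }
      }
      clauses = next;
    }
    return Optional.of(clauses);
  }

  public static void main(String[] args) {
    System.out.println(toCnfBounded(pairs(3), 8).isPresent()); // within the limit
    System.out.println(toCnfBounded(pairs(4), 8).isPresent()); // over the limit
  }
}
```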

return builder.build();
}

/** Parses a single condition into a Gandiva {@link TreeNode}.
Contributor

It looks to me like it would be better to define a new class for this data structure, perhaps UnaryOrBinaryCondition, instead of using a List.
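
A minimal sketch of what such a class might look like. Only the class name comes from the suggestion above. The fields mirror the token layout described earlier in the thread, and the factory methods, nullability convention, and `toString` format are assumptions rather than the PR's actual design.

```java
import java.util.Objects;

/** Hypothetical value class replacing the [fieldName, operator, value, type] token list. */
final class UnaryOrBinaryCondition {
  final String fieldName;
  final String operator;
  final String value; // null for unary conditions
  final String type;  // null for unary conditions

  private UnaryOrBinaryCondition(String fieldName, String operator,
      String value, String type) {
    this.fieldName = Objects.requireNonNull(fieldName, "fieldName");
    this.operator = Objects.requireNonNull(operator, "operator");
    this.value = value;
    this.type = type;
  }

  /** Creates a unary condition such as "intField isnull". */
  static UnaryOrBinaryCondition unary(String fieldName, String operator) {
    return new UnaryOrBinaryCondition(fieldName, operator, null, null);
  }

  /** Creates a binary condition; value and type are both required. */
  static UnaryOrBinaryCondition binary(String fieldName, String operator,
      String value, String type) {
    return new UnaryOrBinaryCondition(fieldName, operator,
        Objects.requireNonNull(value, "value"),
        Objects.requireNonNull(type, "type"));
  }

  boolean isUnary() {
    return value == null;
  }

  @Override public String toString() {
    return isUnary()
        ? fieldName + " " + operator
        : fieldName + " " + operator + " '" + value + "' (" + type + ")";
  }

  public static void main(String[] args) {
    System.out.println(unary("intField", "isnull"));
    System.out.println(binary("stringField", "equal", "hello world", "string"));
  }
}
```

Compared with a `List<String>`, named fields remove the need to remember positional indexes and let the compiler catch a missing value or type.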
