[FLINK-16024][Connector][JDBC] Support FilterPushdown #20140
Conversation
```java
 * @return {@link JdbcFilterPushdownVisitor}
 */
default JdbcFilterPushdownVisitor getFilterPushdownVisitor() {
```
This should be dialect-specific so that it can be specialized to a specific SQL syntax.
I noticed we removed all default implementations from this interface; should I do the same for this method?
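To illustrate the dialect-specific idea discussed here, a minimal, self-contained sketch. Stub types stand in for the real Flink interfaces such as `JdbcDialect`; the class names, the `visitorFor` factory, and the quoting rules are only illustrative assumptions, not Flink's actual API.

```java
import java.util.Optional;

// Stand-in for the visitor contract (illustrative, not Flink's interface).
interface FilterVisitor {
    // Returns the pushed-down predicate, or empty if unsupported.
    Optional<String> render(String column, String op, Object literal);
}

// A generic ANSI-style visitor used as the fallback.
class AnsiFilterVisitor implements FilterVisitor {
    public Optional<String> render(String column, String op, Object literal) {
        return Optional.of("\"" + column + "\" " + op + " " + literal);
    }
}

// A dialect overrides the visitor to specialize quoting/syntax.
class MySqlFilterVisitor extends AnsiFilterVisitor {
    @Override
    public Optional<String> render(String column, String op, Object literal) {
        // MySQL quotes identifiers with backticks rather than double quotes.
        return Optional.of("`" + column + "` " + op + " " + literal);
    }
}

public class DialectVisitorSketch {
    // Analogue of a dialect-specific getFilterPushdownVisitor():
    // each dialect supplies its own visitor instance.
    public static FilterVisitor visitorFor(String dialect) {
        return "mysql".equals(dialect) ? new MySqlFilterVisitor() : new AnsiFilterVisitor();
    }

    public static void main(String[] args) {
        System.out.println(visitorFor("mysql").render("id", ">", 5).get());
        System.out.println(visitorFor("derby").render("id", ">", 5).get());
    }
}
```

With this shape, adding a dialect means overriding only the rendering rules, not the traversal.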
```diff
 /** A {@link DynamicTableSource} for JDBC. */
 @Internal
 public class JdbcDynamicTableSource
         implements ScanTableSource,
                 LookupTableSource,
                 SupportsProjectionPushDown,
-                SupportsLimitPushDown {
+                SupportsLimitPushDown,
+                SupportsFilterPushDown {
```
Support Filter Pushdown in JDBC source
```java
 * then be handled in Flink runtime.
 */
@Override
public Result applyFilters(List<ResolvedExpression> filters) {
```
Core implementation of this change:
We traverse the ResolvedExpression and produce a String if we know how to push it down to the SQL database, returning Optional.empty() if we cannot handle it.
The unhandled expressions will be kept as Flink SQL and run in the job.
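The accept/retain split described above can be sketched as follows. This is a self-contained toy: plain strings stand in for Flink's `ResolvedExpression` and `Result` types, and the pushdown rule (a simple-comparison regex) is a hypothetical placeholder, not the real visitor logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ApplyFiltersSketch {
    // Stand-in for the expression-to-SQL translation; the real code walks
    // a ResolvedExpression tree with a visitor.
    public static Optional<String> toSql(String expr) {
        // Hypothetical rule: we only know how to push down simple comparisons.
        return expr.matches("\\w+ (=|>|<) \\w+") ? Optional.of(expr) : Optional.empty();
    }

    // Analogue of SupportsFilterPushDown#applyFilters: split filters into
    // those pushed to the database and those kept for the Flink runtime.
    public static List<List<String>> applyFilters(List<String> filters) {
        List<String> accepted = new ArrayList<>();
        List<String> remaining = new ArrayList<>();
        for (String f : filters) {
            Optional<String> sql = toSql(f);
            if (sql.isPresent()) {
                accepted.add(sql.get());      // pushed into the JDBC query
            } else {
                remaining.add(f);             // evaluated by Flink after the scan
            }
        }
        return List.of(accepted, remaining);
    }

    public static void main(String[] args) {
        System.out.println(applyFilters(List.of("a > 1", "myUdf(a)")));
    }
}
```

A filter that cannot be translated is never lost; it simply stays in the `remaining` list and runs in the Flink job, which matches the behavior described above.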
@qingwei91 Thanks for the PR. Can you make sure that your PR title and especially your commit messages conform to the Code Contribution guide? https://flink.apache.org/contributing/contribute-code.html For example, your commit message should start with
force-pushed from 3672733 to 16a7978
@flinkbot run azure
1 similar comment
@flinkbot run azure
force-pushed from 8ed32ca to 6c97da0
@flinkbot run azure
Thanks @qingwei91 for driving this PR.
```diff
@@ -53,7 +52,6 @@
  * Visitor to walk a Expression AST. Produces a String that can be used to pushdown the filter,
  * return Optional.empty() if we cannot pushdown the filter.
  */
-@Public
```
Changing a public interface is not allowed. Please check FLIP-196 (Source API stability guarantees) for more information.
Hi @JingGe, thanks for the comment.
This is a file I newly added, and in my latest commit, I marked it as PublicEvolving: 6c97da0
Do you mean we cannot expose new API?
For instance, I added this here: https://github.com/apache/flink/pull/20140/files#diff-ae60653ffe2ac890a3c1b01da41405bcc4e6913949176c36edc009df5090c38fR157, which adds a new method and a new return type to JdbcDialect, which is PublicEvolving, because filter pushdown might differ across JDBC dialects. Is this approach a problem?
Hi @JingGe did you get a chance to check on this?
> Do you mean we cannot expose new API?

You can definitely add a new API to the master branch, but the reason why it's problematic here is that you've changed the status of the API in your different commits. That means that when a reviewer is checking individual commits, it shows that you've removed @Public. In a later commit, you've again added @PublicEvolving. It's better to squash your commits in this case to avoid this type of confusion.
Hi @qingwei91, could you please explain why a public API is required? Does @Internal work too? Thanks.
Hi @JingGe, I believe it has to be PublicEvolving because this class is returned by a public method in JdbcDialect, which is a PublicEvolving class. Here's an example where it's failing if I mark it as Internal: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=38493&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461
I will sort out the commit, sorry for the confusion
I've squashed the commits now and we have a green build.
Can one of you review this again? Thank you!
force-pushed from 57590d6 to d08882b
@qingwei91 I can't review this since this is not my expertise. I would like to ask @hailin0 to help with a review due to #20304. Otherwise I would like to ask @leonardBang if he can help out.
@qingwei91 Thanks for your contribution. For the current design, I see you use
Another more general way to handle this is to use
Hi @libenchao, thanks for the review! Thanks for pointing out the flaw, you're right. 🙇 On your recommended approach, I think you're pointing out a fundamental issue with this PR: SQL statement generation has to be dialect-specific, and my trying to provide a default implementation might be a lost cause here. If we cannot go down the prepared-statement route, I can think of 2 ideas:
I think option 1 is less code, but probably more fiddly and easier to break. Option 2 will likely be more code, but the separation is cleaner and less likely to break. Let me know your thoughts. 😄
@qingwei91 Currently
Hi @libenchao, oh I see, I didn't spot that. Thanks for pointing it out; this will be a larger change than I expected, but I will give it a go.
I think we can tackle this incrementally? I believe IN is supported out of the box, because Flink compiles IN into multiple X = Y conditions chained together by OR; I never looked into BETWEEN though.
I think ultimately we need to allow dialect-specific behavior. Right now I designed it such that the query generator (i.e. JdbcFilterPushdownVisitor) is part of JdbcDialect, so each dialect can provide its own instance to deal with it. Do you think this design is okay, or is there a better way?
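The IN rewrite mentioned above (IN compiled into equality predicates chained by OR) can be sketched like this. This is self-contained toy code showing the shape of the rewrite, not the planner's actual implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

public class InExpansionSketch {
    // Sketch of the rewrite: `col IN (v1, v2, ...)` becomes a chain of
    // equality predicates joined by OR, which a pushdown visitor that
    // already handles `=` and OR can translate with no extra work.
    public static String expandIn(String column, List<String> values) {
        return values.stream()
                .map(v -> column + " = " + v)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(expandIn("id", List.of("1", "2", "3")));
    }
}
```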
Not exactly, we have a threshold (default 4): Line 47 in 208f08b
I agree, we can start from some common functions, such as
Wow, thanks, it works for me~
force-pushed from cf84956 to f5516a5
Implement Filter Pushdown for JDBC connector source using expression visitor pattern.
force-pushed from db4b708 to cd71a47
@flinkbot run azure
Hi @libenchao, thank you very much for your review 👍 I've addressed all of your concerns. On the TablePlanTest, do you mind checking whether that's how it's supposed to work? I don't think I understand the internals well enough to judge.
One small tip: do not rebase or force-push before the reviewer asks you to, because this makes it hard for the reviewer to do an incremental review.
```xml
</Resource>
<Resource name="optimized exec plan">
<![CDATA[
Calc(select=[id, time_col, real_col], where=[((time_col <> '11:11:11.000111') OR (double_col >= -1000.23))])
```
Why does the condition still exist in the Calc?
I figured it out: it's because the column type TIME wasn't supported. I've added support here: 74fe5ee
Thanks for calling it out.
Sorry, my bad, I was advised to squash in a previous PR, but of course that should only be done right before merging.
@flinkbot run azure
```java
LogicalType tpe = litExp.getOutputDataType().getLogicalType();
Class<?> typeCs = tpe.getClass();

if (SUPPORTED_DATA_TYPES.contains(typeCs)) {
```
How about changing this to something like:

```java
switch (tpe.getTypeRoot()) {
    case INTEGER: ...
    case VARCHAR: ...
    ...
    default:
        return Optional.empty();
}
```
force-pushed from d9b1457 to dd54aed
@flinkbot run azure
force-pushed from dd54aed to 55d5227
@flinkbot run azure
@qingwei91 Thanks for the update. Please do not squash or force-push unless you must or the reviewer asks you to. (I went through all the code again and left several minor comments.)
```java
@Override
public Optional<ParameterizedPredicate> visit(FieldReferenceExpression fieldReference) {
    String predicateStr = (this.quoteIdentifierFunction.apply(fieldReference.toString()));
```
useless `(` `)`
```java
public class JdbcFilterPushdownPreparedStatementVisitor
        extends ExpressionDefaultVisitor<Optional<ParameterizedPredicate>> {

    private Function<String, String> quoteIdentifierFunction;
```
final
Sorry, in the last commit I thought to bundle all the changes I've made since your last review into 55d5227, so that you could review just that without noise. I won't do it again. I will go through the comments and fix them.
Add test for IS NULL and IS NOT NULL
Hi @libenchao, this is the new commit I added to address your comments. I also added support for IS NULL and IS NOT NULL, as these 2 are quite common.
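The IS NULL / IS NOT NULL support mentioned here boils down to rendering a unary predicate: unlike binary comparisons, these take no parameter, so the operand is simply followed by the keyword. A minimal stand-alone sketch (the function-name strings are illustrative assumptions, not the actual visitor code):

```java
import java.util.Optional;

public class NullPredicateSketch {
    // Sketch of rendering the unary null checks: no placeholder parameter
    // is needed, the operand is rendered followed by the SQL keyword.
    public static Optional<String> renderUnary(String function, String operand) {
        switch (function) {
            case "isNull":    return Optional.of(operand + " IS NULL");
            case "isNotNull": return Optional.of(operand + " IS NOT NULL");
            default:          return Optional.empty(); // not pushed down
        }
    }

    public static void main(String[] args) {
        System.out.println(renderUnary("isNull", "time_col").get());
    }
}
```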
> I also added support for IS NULL and IS NOT NULL, as these 2 are quite common

This is great!
The PR LGTM with only one minor comment; after fixing that, I'll merge this.
…/flink/connector/jdbc/table/JdbcFilterPushdownPreparedStatementVisitor.java Prefer primitive Co-authored-by: Benchao Li <libenchao@gmail.com>
LGTM, merging. @qingwei91 Thanks for your contribution and for your patience and consistency during the review.
```java
}

@Override
public Optional<ParameterizedPredicate> visit(CallExpression call) {
```
@qingwei91 - Is there any reason why LIKE was not added here? It seems to be as easy as the other binary operators below.
I will gladly add LIKE, but I just want to ensure I didn't overlook something tricky.
Hi @grzegorz8, I think we simply missed it, thanks for noticing this.
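Treating LIKE as just another binary operator, as suggested, might look like this sketch. It is self-contained toy code; the function-name strings and the operator table are assumptions for illustration, not Flink's actual BuiltInFunctionDefinitions mapping.

```java
import java.util.Optional;

public class LikeOperatorSketch {
    // Sketch: map the function name to its SQL symbol and render "lhs op rhs".
    // LIKE fits the same mold as =, >, < etc., which is why adding it is easy.
    public static Optional<String> renderBinary(String function, String lhs, String rhs) {
        String op;
        switch (function) {
            case "equals":      op = "=";    break;
            case "greaterThan": op = ">";    break;
            case "like":        op = "LIKE"; break; // the missing case discussed above
            default: return Optional.empty();       // not pushed down
        }
        return Optional.of(lhs + " " + op + " " + rhs);
    }

    public static void main(String[] args) {
        System.out.println(renderBinary("like", "name", "'abc%'").get());
    }
}
```

The only subtlety to watch for in a real implementation is that LIKE semantics (wildcards, escape characters, case sensitivity) can differ per dialect, which is exactly why the visitor is dialect-owned.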
What is the purpose of the change
Implement Filter Pushdown in the JDBC connector source. This is an optimization that avoids scanning the whole table into Flink when we only need a subset of the table.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes)
Documentation