New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-8993] [SQL] MongoDB predicate push down. #10417
[BEAM-8993] [SQL] MongoDB predicate push down. #10417
Conversation
R: @TheNeuralBit |
Run sql postcommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. I have a few architectural suggestions, the only thing I'm really firm on is that you shouldn't do the partitioning in the constructor, and if MongoDbFilter
is its own class it should be package-private.
supported.add(node); | ||
} else { | ||
unsupported.add(node); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't do work in a constructor. I think this constructor should accept a supported and an unsupported List
and maybe be private. There could be a static initializer that will accept a single list and partition it into supported and unsupported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, created a static initializer.
Thanks for pointing this out!
ImmutableList.of( | ||
Pair.of("select * from TEST where unused1=100", true), | ||
Pair.of("select * from TEST where unused1 in (100, 200)", true), | ||
Pair.of("select * from TEST where b", true), | ||
Pair.of("select * from TEST where not b", true), | ||
// Nested conjunction and disjunction is supported as long as child operations are | ||
// supported. | ||
Pair.of( | ||
"select * from TEST where unused1>100 and unused1<=200 and id<>1 and (name='two' or id=2)", | ||
true), | ||
// RegEx matching push-down is not implemented at the moment. | ||
Pair.of("select * from TEST where name like 'o%e'", false), | ||
// Complex operations, which modify a field before a comparison are not supported. | ||
Pair.of("select * from TEST where unused1+10=110", false), | ||
// Since unused2 is of type `short`, it will be cast to int32, making this a complex | ||
// operation. | ||
Pair.of("select * from TEST where unused2=200", false), | ||
// Operations involving more than one column are not supported yet. | ||
Pair.of("select * from TEST where unused1=unused2 and id=2", false), | ||
Pair.of("select * from TEST where unused1+unused2=10", false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might consider making this a parameterized test instead, up to you though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, using parameterized tests here makes things a little cleaner and more straight forward.
import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlKind; | ||
import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName; | ||
|
||
public class MongoDbFilter implements BeamSqlTableFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might consider if you can make this an inner class of MongoDbTable
, since its only used in that class' constructFilter
method. If that's too big of a change, I think it should at least be package-private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved MongoDbFilter
.
MongoDbTable
feels a little convoluted, but not too bad, I guess.
What do you think?
@TheNeuralBit |
Run java presubmit |
Run sql postcommit |
It looks like a test failure is due to a flake:
|
Run sql postcommit |
Run JavaBeamZetaSQL PreCommit |
Run SQL PreCommit |
import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; | ||
|
||
import com.google.common.collect.ImmutableList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New precommit failure is indicating you should use vendored guava here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to vendored import. Thanks!
928f90a
to
576a06a
Compare
No tests running. |
test this please |
Run precommits |
Run SQL PreCommit |
BeamRelNode beamRelNode = sqlEnv.parseQuery(query); | ||
assertThat(beamRelNode, instanceOf(BeamCalcRel.class)); | ||
MongoDbFilter filter = | ||
MongoDbFilter.create(((BeamCalcRel) beamRelNode).getProgram().split().right); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't necessarily need to be changed now, but I was envisioning that this would be tested through the public API, MongoDbTable#createFilter
, and that MongoDbFilter could be private.
Run SQL PreCommit |
MongoDbFilter
class, implementingBeamSqlTableFilter
.MongoDbTable#buildIOReader
RexNodes
.FindQuery
.Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs