New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-8508] [SQL] Standalone filter push down #9943
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two related concerns, otherwise LGTM.
// When project push-down is supported. | ||
if ((options == PushDownOptions.PROJECT || options == PushDownOptions.BOTH) | ||
&& !fieldNames.isEmpty()) { | ||
// When project push-down is supported or field reordering is needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting. I wouldn't expect an IO to support field reordering unless it also supports push-down. Can you explain what is going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem I was trying to fix by modifying that if
statement is a scenario when an IO only supports Filter push-down and predicate can be completely pushed-down to IO layer, all fields are selected, so there is no need to preserve the Calc
to perform project, but selected fields are not in the random order.
In the scenario described above an IO does need to reorder fields (either with a Select
or by not dropping a Calc
).
I agree that the current if
statement is incorrect and it should look more like what it used to, but with an additional check. Not quite sure how to check that the IORel
is not followed by a CalcRel
and that the fields are not selected in the order they are present in the original schema, but I'm working on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to not drop the Calc
when fields are selected in a different order for now.
if (isProjectRenameOnlyProgram(program) | ||
&& tableFilter.getNotSupported().isEmpty() | ||
&& (beamSqlTable.supportsProjects() | ||
|| calc.getRowType().getFieldCount() == calcInputRowType.getFieldCount())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to verify the order as well as the count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided not to drop the Calc
when project push-down is not supported.
It might make sense to allow IOs communicate to the Rule that they support field reordering to decided whether a Calc should be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working on checking the order, do not merge yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this check into a separate method to keep things readable.
Updated the check to compare a list of projected field names to the list passed to a Calc
as input.
ef60183
to
e70aa72
Compare
if (isProjectRenameOnlyProgram(program) && tableFilter.getNotSupported().isEmpty()) { | ||
// And | ||
// 3. And IO supports project push-down OR all fields are projected by a Calc. | ||
if (isProjectRenameOnlyProgram(program, beamSqlTable.supportsProjects()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another case when Calc
should not be dropped is when an IO does not support field reordering and fields are not projected in the same order they are listed in the IO Schema.
Did not get cough by tests, because TestTable providers uses Select, which does field reordering for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be resolved now, IOs can specify whether they support field reordering. When reordering in not supported and fields are projected in a different order (from Schema) - Calc should not be dropped.
1b23074
to
7586096
Compare
…ost model for BeamCalcRel.
…roject push-down is not supported
dd6cfbc
to
7586096
Compare
7586096
to
e79bcf8
Compare
...l/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java
Outdated
Show resolved
Hide resolved
|
dcfdb8a
to
987148a
Compare
@@ -138,12 +138,7 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { | |||
builder.withSelectedFields(fieldNames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be:
builder = builder.withSelectedFields(fieldNames);
Will fix in a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine as it is. Builder normally stores the changes and just returns itself as a convenience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked about this more, builder
isn't actually a builder.
@Override | ||
public BeamCostModel beamComputeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) { | ||
return super.beamComputeSelfCost(planner, mq) | ||
.multiplyBy((double) 1 / (getRowType().getFieldCount() + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine for now if tests pass, but this has a very outsized impact on the cost. I think the impact needs to be less than 1.0, such that two IOs with different row counts still have the correct relative cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
BeamCalcRel#beamComputeSelfCost
method to favor programs with smaller predicates.Project
,Filter
,ProjectAndFilter
push-downs.Rel
s more than once.BeamPushDownIOSourceRel
.Other changes:
TestTableProviderWithFilterPushDown
toTestTableProviderWithFilterAndProjectPushDown
, since it tests when both are supported.TestTableProviderWithFilterPushDown
to test when only filter push-down is supported.Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).R: @apilloud
cc: @amaliujia
[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.