[BEAM-8508] [SQL] Standalone filter push down #9943

Merged: 6 commits merged into apache:master on Nov 7, 2019

@11moon11 (Contributor) commented Oct 30, 2019

  • Update the push-down rule to perform predicate push-down for IOs that do not support project push-down.
  • Update the BeamCalcRel#beamComputeSelfCost method to favor programs with smaller predicates.
  • Fix a bug where the Calc was dropped when a field is selected more than once; the Calc must be kept because IOs cannot duplicate projected fields. Also add test cases to ensure the expected behavior of Project, Filter, and ProjectAndFilter push-downs.
  • PushDownRule should not be applied to IO Rels more than once.
  • Create a BeamPushDownIOSourceRel.
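The first change, performing predicate push-down for IOs that do not support project push-down, can be pictured as a small planning step. This is a hypothetical sketch, not the PR's PushDownRule: the class, the plan method, the helper methods, and the NONE and FILTER values are assumptions modeled on the PushDownOptions.PROJECT and PushDownOptions.BOTH values that appear in the review below.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: which push-downs are requested, given IO capabilities. */
final class PushDownPlanSketch {
  // Modeled on PushDownOptions.PROJECT / BOTH from the review;
  // NONE, FILTER and the helper methods are assumptions.
  enum PushDownOptions {
    NONE, PROJECT, FILTER, BOTH;

    boolean supportsProject() {
      return this == PROJECT || this == BOTH;
    }

    boolean supportsFilter() {
      return this == FILTER || this == BOTH;
    }
  }

  /** Lists the push-downs applied; whatever is not pushed down stays in the Calc. */
  static List<String> plan(PushDownOptions options, boolean hasPredicate, boolean hasProjection) {
    List<String> applied = new ArrayList<>();
    // Filter push-down is considered on its own, even without project push-down.
    if (options.supportsFilter() && hasPredicate) {
      applied.add("filter");
    }
    if (options.supportsProject() && hasProjection) {
      applied.add("project");
    }
    return applied;
  }

  public static void main(String[] args) {
    System.out.println(plan(PushDownOptions.FILTER, true, true)); // [filter]
    System.out.println(plan(PushDownOptions.BOTH, true, true)); // [filter, project]
    System.out.println(plan(PushDownOptions.NONE, true, true)); // []
  }
}
```

With a filter-only IO, the predicate still reaches the IO while the projection remains the Calc's job.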

Other changes:

  • Rename TestTableProviderWithFilterPushDown to TestTableProviderWithFilterAndProjectPushDown, since it tests the case where both are supported.
  • Create a new file, TestTableProviderWithFilterPushDown, to test the case where only filter push-down is supported.

R: @apilloud
cc: @amaliujia


@11moon11 changed the title from [BEAM-8508] [SQL] [WIP] Standalone filter push down to [BEAM-8508] [SQL] Standalone filter push down on Oct 30, 2019
@apilloud self-requested a review on October 31, 2019 17:38
@apilloud (Member) left a comment:

Two related concerns, otherwise LGTM.

// When project push-down is supported.
if ((options == PushDownOptions.PROJECT || options == PushDownOptions.BOTH)
&& !fieldNames.isEmpty()) {
// When project push-down is supported or field reordering is needed.
@apilloud (Member):

This is interesting. I wouldn't expect an IO to support field reordering unless it also supports push-down. Can you explain what is going on here?

@11moon11 (Contributor Author):

The problem I was trying to fix by modifying that if statement is the following scenario: an IO supports only filter push-down, the predicate can be pushed down to the IO layer completely, and all fields are selected, so there is no need to keep the Calc to perform a projection; however, the fields are selected in a different order from the schema.
In that scenario the IO does need to reorder fields (either with a Select or by not dropping the Calc).

I agree that the current if statement is incorrect and that it should look more like it used to, but with an additional check. I'm not quite sure yet how to check that the IORel is not followed by a CalcRel and that the fields are not selected in the order they appear in the original schema, but I'm working on it.

@11moon11 (Contributor Author):

For now, I decided not to drop the Calc when fields are selected in a different order.

if (isProjectRenameOnlyProgram(program)
&& tableFilter.getNotSupported().isEmpty()
&& (beamSqlTable.supportsProjects()
|| calc.getRowType().getFieldCount() == calcInputRowType.getFieldCount())) {
@apilloud (Member):

I think this needs to verify the order as well as the count.
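To illustrate why a count check alone is not enough: a projection can have exactly as many fields as the Calc input while reordering or duplicating them. The helper names and field names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration: an equal field count does not imply the same fields in the same order. */
final class FieldOrderCheckSketch {
  /** Count-only comparison, like the getFieldCount() check in the snippet above. */
  static boolean sameCount(List<String> projected, List<String> input) {
    return projected.size() == input.size();
  }

  /** Stricter comparison: the same field names in the same order. */
  static boolean sameFieldsInOrder(List<String> projected, List<String> input) {
    return projected.equals(input);
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("id", "name", "unit");
    List<String> reordered = Arrays.asList("name", "id", "unit"); // SELECT name, id, unit
    List<String> duplicated = Arrays.asList("id", "id", "name"); // SELECT id, id, name
    System.out.println(sameCount(reordered, input) + " " + sameFieldsInOrder(reordered, input)); // true false
    System.out.println(sameCount(duplicated, input) + " " + sameFieldsInOrder(duplicated, input)); // true false
  }
}
```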

@11moon11 (Contributor Author):

I decided not to drop the Calc when project push-down is not supported.
It might make sense to let IOs tell the rule whether they support field reordering, so that the rule can decide whether a Calc should be dropped.

@11moon11 (Contributor Author):

Still working on checking the order; do not merge yet.

@11moon11 (Contributor Author):

Moved this check into a separate method to keep things readable.
Updated the check to compare the list of projected field names with the list of field names the Calc receives as input.

if (isProjectRenameOnlyProgram(program) && tableFilter.getNotSupported().isEmpty()) {
// And
// 3. And IO supports project push-down OR all fields are projected by a Calc.
if (isProjectRenameOnlyProgram(program, beamSqlTable.supportsProjects())
@11moon11 (Contributor Author):

Another case in which the Calc should not be dropped is when an IO does not support field reordering and fields are not projected in the same order they are listed in the IO schema.
This was not caught by tests, because the TestTable providers use Select, which does the field reordering for us.

@11moon11 (Contributor Author):

This should be resolved now: IOs can specify whether they support field reordering. When reordering is not supported and fields are projected in a different order from the schema, the Calc should not be dropped.
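The conditions for dropping the Calc discussed in this thread (a project/rename-only program, no unsupported filters, project push-down or all fields projected, reordering support or schema order, and no duplicated fields) can be combined into a single helper. This is a hypothetical sketch: the method names, and the simplification of comparing fields by input name while ignoring renames, are assumptions; the real rule presumably works on Calcite programs and row types instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical sketch of the "can the Calc be dropped?" decision from this thread. */
final class DropCalcSketch {
  static boolean canDropCalc(
      boolean projectRenameOnly, // the Calc only projects and/or renames fields
      boolean allFiltersPushedDown, // no predicate is left unsupported by the IO
      boolean ioSupportsProject,
      boolean ioSupportsReordering,
      List<String> projected, // input fields referenced by the Calc, in output order
      List<String> schema) { // fields produced by the IO, in schema order
    if (!projectRenameOnly || !allFiltersPushedDown) {
      return false;
    }
    // IOs cannot duplicate projected fields, so a repeated field keeps the Calc.
    if (new HashSet<>(projected).size() != projected.size()) {
      return false;
    }
    // Without project push-down, the Calc must keep every field of the IO.
    if (!ioSupportsProject && !new HashSet<>(projected).equals(new HashSet<>(schema))) {
      return false;
    }
    // Without reordering support, the fields must appear in schema order.
    return ioSupportsReordering || isInSchemaOrder(projected, schema);
  }

  /** True if projected is a subsequence of schema, i.e. it keeps schema order. */
  static boolean isInSchemaOrder(List<String> projected, List<String> schema) {
    int next = 0;
    for (String field : schema) {
      if (next < projected.size() && projected.get(next).equals(field)) {
        next++;
      }
    }
    return next == projected.size();
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("id", "name", "unit");
    // Filter-only IO, all fields in schema order: the Calc can be dropped.
    System.out.println(canDropCalc(true, true, false, false, schema, schema)); // true
    // Same IO, all fields but reordered: the Calc is kept.
    System.out.println(
        canDropCalc(true, true, false, false, Arrays.asList("name", "id", "unit"), schema)); // false
    // IO with project push-down and reordering support: a reordered subset is fine.
    System.out.println(
        canDropCalc(true, true, true, true, Arrays.asList("unit", "id"), schema)); // true
  }
}
```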

@11moon11 (Contributor Author) commented Nov 6, 2019

PushDownRule should not be applied to the same BeamIOSourceRel more than once.
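One way to ensure the rule fires at most once per scan is to represent the rewritten scan with its own subclass and skip inputs that already have that type. The PR does add a BeamPushDownIOSourceRel, but whether the rule checks for it like this is an assumption; the classes below are hypothetical stand-ins, not Beam or Calcite types.

```java
/** Hypothetical sketch: skip a scan that the push-down rule has already rewritten. */
final class PushDownOnceSketch {
  /** Stand-in for an IO source rel. */
  static class IOSourceRel {
    final String table;

    IOSourceRel(String table) {
      this.table = table;
    }
  }

  /** Stand-in for the rel produced by push-down (cf. BeamPushDownIOSourceRel). */
  static final class PushDownIOSourceRel extends IOSourceRel {
    final String pushedFilter;

    PushDownIOSourceRel(String table, String pushedFilter) {
      super(table);
      this.pushedFilter = pushedFilter;
    }
  }

  /** Applies the rewrite at most once per scan. */
  static IOSourceRel applyPushDown(IOSourceRel scan, String filter) {
    if (scan instanceof PushDownIOSourceRel) {
      // Already rewritten: return the scan unchanged instead of pushing down again.
      return scan;
    }
    return new PushDownIOSourceRel(scan.table, filter);
  }

  public static void main(String[] args) {
    IOSourceRel once = applyPushDown(new IOSourceRel("t"), "id > 5");
    IOSourceRel twice = applyPushDown(once, "id > 5");
    System.out.println(once == twice); // true: the second application is a no-op
  }
}
```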

@@ -138,12 +138,7 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) {
builder.withSelectedFields(fieldNames);
@11moon11 (Contributor Author):

Should be:
builder = builder.withSelectedFields(fieldNames);
Will fix in a different PR.

@apilloud (Member):

I think this is fine as it is. Builder normally stores the changes and just returns itself as a convenience.

@apilloud (Member):

We talked about this more: builder isn't actually a builder, so withSelectedFields does not modify it in place.
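The difference between a real builder and an immutable object with a with-method can be shown with a hypothetical config class. ReaderConfig and its field are invented for illustration; only the withSelectedFields name comes from the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable config: its with-method returns a new object. */
final class ImmutableWithSketch {
  static final class ReaderConfig {
    final List<String> selectedFields;

    ReaderConfig(List<String> selectedFields) {
      this.selectedFields = Collections.unmodifiableList(new ArrayList<>(selectedFields));
    }

    ReaderConfig withSelectedFields(List<String> fields) {
      // Returns a new config; this instance is left unchanged.
      return new ReaderConfig(fields);
    }
  }

  public static void main(String[] args) {
    List<String> fieldNames = Arrays.asList("id", "name");
    ReaderConfig builder = new ReaderConfig(Collections.<String>emptyList());

    builder.withSelectedFields(fieldNames); // Bug: the returned config is discarded.
    System.out.println(builder.selectedFields); // []

    builder = builder.withSelectedFields(fieldNames); // Fix: keep the returned config.
    System.out.println(builder.selectedFields); // [id, name]
  }
}
```

With a mutating builder that returns this, both forms would behave the same; with an immutable object, only the assignment keeps the selected fields.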

@Override
public BeamCostModel beamComputeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
return super.beamComputeSelfCost(planner, mq)
.multiplyBy((double) 1 / (getRowType().getFieldCount() + 1));
@apilloud (Member):

This seems fine for now if tests pass, but it has an outsized impact on the cost. I think the impact needs to be less than 1.0, so that two IOs with different row counts still have the correct relative cost.
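A toy comparison makes the concern concrete. It assumes the base cost equals the row count, which simplifies the real cost model. It also includes a hypothetical bounded variant that adds a term below 1.0. That variant is one possible reading of the comment above, not what the PR implements.

```java
/** Toy cost comparison; assumes base cost equals the row count, a simplification. */
final class CostScalingSketch {
  /** Scaling by 1 / (fieldCount + 1), as multiplyBy does in the snippet above. */
  static double scaled(double rows, int fieldCount) {
    return rows / (fieldCount + 1);
  }

  /** Hypothetical bounded alternative: the added term is below 1.0 when fieldCount >= 1. */
  static double bounded(double rows, int fieldCount) {
    return rows + 1.0 / (fieldCount + 1);
  }

  public static void main(String[] args) {
    // IO A: 100 rows, 1 field. IO B: 300 rows, 9 fields.
    // Scaled costs are 50 and 30, so B (three times the rows) looks cheaper.
    System.out.println(scaled(300, 9) < scaled(100, 1)); // true
    // Bounded costs are 100.5 and about 300.1, so A stays cheaper.
    System.out.println(bounded(100, 1) < bounded(300, 9)); // true
  }
}
```

With an additive term below 1.0, the field count can only break ties between IOs whose row counts differ by less than one row, so it cannot reverse their relative cost.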

@apilloud (Member) left a comment:

LGTM

@apilloud merged commit a48662d into apache:master on Nov 7, 2019
2 participants