[BEAM-6995] Beam basic aggregation rule only when not windowed #9703

11moon11
Contributor

@11moon11 11moon11 commented Oct 1, 2019

The Beam basic aggregation rule should not be applied to Calc, Project, or Filter nodes when they or their parents use windowed functions.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@11moon11
Contributor Author

11moon11 commented Oct 1, 2019

R: @apilloud

@11moon11 11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch 2 times, most recently from b86e8a6 to 8d581ab Compare October 1, 2019 19:41
@apilloud
Member

apilloud commented Oct 1, 2019

Run Direct Runner Nexmark Tests

Member

@apilloud apilloud left a comment

LGTM

@@ -40,15 +50,21 @@

  public BeamBasicAggregationRule(
      Class<? extends Aggregate> aggregateClass, RelBuilderFactory relBuilderFactory) {
-   super(operand(aggregateClass, operand(TableScan.class, any())), relBuilderFactory, null);
+   super(operand(aggregateClass, operand(AbstractRelNode.class, any())), relBuilderFactory, null);
Member

Looking at examples of this in Calcite, I think RelNode is preferable to AbstractRelNode.

Contributor Author

@11moon11 11moon11 Oct 1, 2019

You are correct, updated match condition to use RelNode instead.

RelNode newTableScan = tableScan.copy(tableScan.getTraitSet(), tableScan.getInputs());
if (relNode instanceof Project || relNode instanceof Calc || relNode instanceof Filter) {
if (isWindowed(relNode) || hasWindowedParents(relNode)) {
return;
Member

Probably worth adding a comment here that this case is expected to be handled by BeamAggregationRule.

Contributor Author

Added a comment.
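
For readers following the thread, the early return discussed above can be sketched roughly as below. This is only an illustration under stated assumptions: `PlanNode`, `BasicAggregationGuard`, and `basicRuleApplies` are hypothetical stand-ins and not the Calcite or Beam classes, and treating "parents" as a simple parent chain is an assumption about what `hasWindowedParents` inspects.

```java
/** Hypothetical stand-in for a plan node; this is not Calcite's RelNode. */
final class PlanNode {
  enum Kind { TABLE_SCAN, PROJECT, CALC, FILTER }

  final Kind kind;
  final boolean windowed; // assumed flag: the node uses a windowing function
  PlanNode parent;        // null for the root of the plan

  PlanNode(Kind kind, boolean windowed) {
    this.kind = kind;
    this.windowed = windowed;
  }

  /** Registers this node as the parent of the given input and returns this node. */
  PlanNode over(PlanNode input) {
    input.parent = this;
    return this;
  }
}

/** Hypothetical helper mirroring the shape of the guard in the diff. */
final class BasicAggregationGuard {
  /** Returns true if any ancestor of the node is windowed (assumed meaning of "parents"). */
  static boolean hasWindowedParents(PlanNode node) {
    for (PlanNode p = node.parent; p != null; p = p.parent) {
      if (p.windowed) {
        return true;
      }
    }
    return false;
  }

  /** Returns false when the windowed case should be left to another rule. */
  static boolean basicRuleApplies(PlanNode input) {
    boolean calcLike =
        input.kind == PlanNode.Kind.PROJECT
            || input.kind == PlanNode.Kind.CALC
            || input.kind == PlanNode.Kind.FILTER;
    if (calcLike && (input.windowed || hasWindowedParents(input))) {
      // Per the review comment, this case is expected to be handled by BeamAggregationRule.
      return false;
    }
    return true;
  }
}
```

In this sketch only Project, Calc, and Filter inputs are subject to the window check, matching the `instanceof` test in the diff; other inputs fall through unchanged.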

@11moon11 11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch 3 times, most recently from fd5025b to 7a56f35 Compare October 1, 2019 21:16
@apilloud
Member

apilloud commented Oct 1, 2019

Run Direct Runner Nexmark Tests

@11moon11 11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from 7a56f35 to 60b06c5 Compare October 2, 2019 17:56
@11moon11
Contributor Author

11moon11 commented Oct 2, 2019

Run Direct Runner Nexmark Tests

@apilloud
Member

apilloud commented Oct 3, 2019

Can you do a git pull origin && git rebase origin/master on this?

@11moon11 11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from 60b06c5 to 8687d90 Compare October 3, 2019 22:33
@amaliujia
Contributor

cc: @amaliujia

@11moon11 11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from 8687d90 to f569da9 Compare October 3, 2019 22:35
@@ -701,7 +700,6 @@ public void testSupportsAggregationWithoutProjection() throws Exception {
   }

   @Test
-  @Ignore("https://issues.apache.org/jira/browse/BEAM-8317")
   public void testSupportsAggregationWithFilterWithoutProjection() throws Exception {
Contributor

@11moon11 @apilloud

What I really want to propose is that when we add new test cases with SQL queries, we run each test for both dialects unless there is a query syntax mismatch.

Which planner is used is controlled by https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L28.

I am looking for a way to use @RunWith(Parameterized.class) so that it is easy to run tests for both dialects with an annotation.

Contributor Author

@11moon11 11moon11 Oct 3, 2019

Member

Cool. I agree this is a good idea, but we should hold off on doing this until #9737 is in. We will need to relocate common tests into another package.

Contributor

@amaliujia amaliujia Oct 7, 2019

@apilloud

I don't agree, though. At the very least, a duplicate test can be created that runs for ZetaSQL only, and then we can migrate later. It could be a new test file associated with the ZetaSQL dialect.

Contributor Author

I tried adding pipeline.getOptions().as(BeamSqlPipelineOptions.class).setPlannerName("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl"); to the test to see whether it would work with the ZetaSQL planner, but I get a class-not-found exception.
I assume this is because ZetaSQL is not in the build file. When I try to add it as a dependency, I get a circular dependency error, probably because ZetaSQL depends on BeamSQL?
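
As a rough illustration of why a planner name can fail this way, the sketch below resolves a class by name with `Class.forName`, which throws `ClassNotFoundException` when the class is not on the classpath. It assumes the planner is looked up reflectively from the configured name, which this thread does not confirm; `PlannerLookup`, `describe`, and `example.NoSuchPlanner` are hypothetical names used only for the example.

```java
/** Hypothetical helper; not part of Beam. */
final class PlannerLookup {
  /** Reports whether a class with the given fully qualified name is loadable. */
  static String describe(String plannerClassName) {
    try {
      // Class.forName fails if the module providing the class is not a dependency.
      Class<?> plannerClass = Class.forName(plannerClassName);
      return "found " + plannerClass.getName();
    } catch (ClassNotFoundException e) {
      return "missing " + plannerClassName;
    }
  }
}
```

Calling `PlannerLookup.describe("example.NoSuchPlanner")` takes the `catch` branch, which is the same kind of failure a test would hit if the module that provides the named planner were absent from its build dependencies.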

windowFn,
windowFieldIndex);

this.rowType = rowType;
Contributor

Where is this rowType used?

Contributor

The RelNode type is already inferred from the input nodes. Usually, when you need it, you can call the getRowType() function to get it rather than saving it as a class member.

Contributor Author

There is a JIRA issue, https://jira.apache.org/jira/browse/BEAM-7609: when running queries with "SELECT DISTINCT + JOIN", the resulting fields are not assigned proper names.
Even though this change does not solve that particular issue, preserving the rowType should not hurt (previously it was ignored and set to null).

Contributor Author

@11moon11 11moon11 Oct 4, 2019

The RelNode type is already inferred from the input nodes. Usually, when you need it, you can call the getRowType() function to get it rather than saving it as a class member.

I see. In that case I will remove this constructor. Do you think adding deriveRowType(); to the original constructor makes sense, or would it be redundant?

Contributor

I think deriveRowType() can be called when it's needed?

Contributor Author

Agreed, reverted BeamAggregationRel back.
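
The "derive it when needed" idea from this thread can be sketched as a getter that computes the row type on first access and caches it, instead of a constructor that stores it eagerly. The class below is a hypothetical toy, not Calcite's or Beam's implementation: `LazyRowTypeNode`, the `Supplier`-based derivation, the `derivationCount` helper, and the use of field-name lists as a "row type" are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical node that derives its row type lazily; not a Calcite class. */
final class LazyRowTypeNode {
  private final Supplier<List<String>> deriveRowType; // stands in for deriving from inputs
  private List<String> rowType;                        // null until first requested
  private int derivations = 0;

  LazyRowTypeNode(Supplier<List<String>> deriveRowType) {
    this.deriveRowType = deriveRowType;
  }

  /** Derives the row type on first use and returns the cached value afterwards. */
  List<String> getRowType() {
    if (rowType == null) {
      rowType = deriveRowType.get();
      derivations++;
    }
    return rowType;
  }

  /** Number of times the derivation actually ran (for demonstration only). */
  int derivationCount() {
    return derivations;
  }

  public static void main(String[] args) {
    LazyRowTypeNode node = new LazyRowTypeNode(() -> Arrays.asList("f_int", "cnt"));
    System.out.println(node.getRowType());
    System.out.println(node.getRowType());
    System.out.println(node.derivationCount());
  }
}
```

With this shape, a constructor does not need to call the derivation itself, because the first call to the getter does so.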

@11moon11 11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from f569da9 to 09356af Compare October 7, 2019 16:17
@11moon11 11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from 09356af to eff9eb1 Compare October 7, 2019 16:18
@11moon11
Contributor Author

11moon11 commented Oct 7, 2019

Run JavaBeamZetaSQL PreCommit

@apilloud
Member

apilloud commented Oct 7, 2019

LGTM

@apilloud apilloud merged commit 5d83b15 into apache:master Oct 7, 2019