[BEAM-6995] Beam basic aggregation rule only when not windowed #9703

11moon11 · 2019-10-01T18:41:26Z

BeamBasicAggregationRule should not be applied to Calc, Project, or Filter nodes when they or their parents use windowed functions.


11moon11 · 2019-10-01T18:43:11Z

R: @apilloud

apilloud · 2019-10-01T20:54:18Z

Run Direct Runner Nexmark Tests

apilloud

LGTM

apilloud · 2019-10-01T21:00:12Z

...sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamBasicAggregationRule.java

@@ -40,15 +50,21 @@

  public BeamBasicAggregationRule(
      Class<? extends Aggregate> aggregateClass, RelBuilderFactory relBuilderFactory) {
-    super(operand(aggregateClass, operand(TableScan.class, any())), relBuilderFactory, null);
+    super(operand(aggregateClass, operand(AbstractRelNode.class, any())), relBuilderFactory, null);


Looking at examples of this in Calcite, I think RelNode is preferable to AbstractRelNode.

You are correct, updated match condition to use RelNode instead.
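
A minimal sketch of why the operand class matters, using hypothetical stand-in types rather than Calcite's real classes. An operand declared on the interface accepts every implementation, while one declared on an abstract base class would skip any node that implements the interface directly. Whether such a node exists in a real plan is not something this thread states; the names below are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for RelNode / AbstractRelNode; these are not Calcite's classes.
interface SketchNode {}

abstract class AbstractSketchNode implements SketchNode {}

// Extends the abstract base, so both operand styles accept it.
class BaseBackedNode extends AbstractSketchNode {}

// Implements the interface directly, bypassing the abstract base.
class DirectNode implements SketchNode {}

final class OperandMatchSketch {
  // Mimics a rule operand that filters candidate nodes by class.
  static boolean matches(Class<? extends SketchNode> operandClass, SketchNode candidate) {
    return operandClass.isInstance(candidate);
  }

  public static void main(String[] args) {
    List<SketchNode> nodes = Arrays.asList(new BaseBackedNode(), new DirectNode());
    for (SketchNode node : nodes) {
      System.out.println(
          node.getClass().getSimpleName()
              + " interfaceOperand=" + matches(SketchNode.class, node)
              + " abstractOperand=" + matches(AbstractSketchNode.class, node));
    }
  }
}
```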

apilloud · 2019-10-01T21:01:03Z

...sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamBasicAggregationRule.java

-    RelNode newTableScan = tableScan.copy(tableScan.getTraitSet(), tableScan.getInputs());
+    if (relNode instanceof Project || relNode instanceof Calc || relNode instanceof Filter) {
+      if (isWindowed(relNode) || hasWindowedParents(relNode)) {
+        return;


Probably worth adding a comment here that this case is expected to be handled by BeamAggregationRule.

Added a comment.
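
For readers following the guard in the diff above, here is a self-contained sketch of the control flow. It does not use Calcite: SketchPlanNode, its fields, and basicRuleApplies are hypothetical, and modeling hasWindowedParents as a scan over the node's direct inputs is an assumption, since the helper's body is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plan node used only for this sketch; not a Calcite RelNode.
final class SketchPlanNode {
  final String kind;                 // "Project", "Calc", "Filter", "TableScan", ...
  final boolean usesWindowFn;        // true if this node itself uses a windowing function
  final List<SketchPlanNode> inputs; // nodes this node reads from

  SketchPlanNode(String kind, boolean usesWindowFn, SketchPlanNode... inputs) {
    this.kind = kind;
    this.usesWindowFn = usesWindowFn;
    this.inputs = Arrays.asList(inputs);
  }
}

final class BasicAggregationGuardSketch {
  static boolean isWindowed(SketchPlanNode node) {
    return node.usesWindowFn;
  }

  // Assumption: the related nodes to inspect are modeled as the direct inputs.
  static boolean hasWindowedParents(SketchPlanNode node) {
    for (SketchPlanNode input : node.inputs) {
      if (isWindowed(input)) {
        return true;
      }
    }
    return false;
  }

  // Returns true when the basic (non-windowed) rule should handle the aggregate's input.
  static boolean basicRuleApplies(SketchPlanNode aggregateInput) {
    boolean guardedKind =
        aggregateInput.kind.equals("Project")
            || aggregateInput.kind.equals("Calc")
            || aggregateInput.kind.equals("Filter");
    if (guardedKind && (isWindowed(aggregateInput) || hasWindowedParents(aggregateInput))) {
      // Windowed case: expected to be handled by BeamAggregationRule instead.
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    SketchPlanNode scan = new SketchPlanNode("TableScan", false);
    SketchPlanNode windowedProject = new SketchPlanNode("Project", true, scan);
    SketchPlanNode filter = new SketchPlanNode("Filter", false, windowedProject);
    System.out.println("scan: " + basicRuleApplies(scan));
    System.out.println("windowed project: " + basicRuleApplies(windowedProject));
    System.out.println("filter over windowed project: " + basicRuleApplies(filter));
  }
}
```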

apilloud · 2019-10-01T21:38:57Z

Run Direct Runner Nexmark Tests

11moon11 · 2019-10-02T20:29:15Z

Run Direct Runner Nexmark Tests

apilloud · 2019-10-03T22:15:33Z

Can you do a git pull origin && git rebase origin/master on this?

amaliujia · 2019-10-03T22:34:19Z

cc: @amaliujia

amaliujia · 2019-10-03T22:55:02Z

...tensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java

@@ -701,7 +700,6 @@ public void testSupportsAggregationWithoutProjection() throws Exception {
  }

  @Test
-  @Ignore("https://issues.apache.org/jira/browse/BEAM-8317")
  public void testSupportsAggregationWithFilterWithoutProjection() throws Exception {


@11moon11 @apilloud

What I really want to propose is that, when we add new test cases with SQL queries, we run them against both dialects unless there is a query syntax mismatch.

Which planner is used is controlled by https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L28.

I am looking for a way to use @RunWith(Parameterized.class) so that it is easy to run tests for both dialects with an annotation.

Found a useful reference link with examples: https://github.com/Pragmatists/JUnitParams/blob/master/src/test/java/junitparams/usage/SamplesOfUsageTest.java
And this one: https://github.com/junit-team/junit4/wiki/parameterized-tests
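
The proposal above can be illustrated without any test framework. The sketch below expands each SQL test case into one run per planner and skips the combinations marked as a syntax mismatch, which is roughly what a parameterized JUnit runner would do with a parameter list. The planner labels, case names, queries, and the skip marking are all hypothetical and chosen purely for illustration; in Beam the planner would be selected through BeamSqlPipelineOptions rather than a short label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One SQL test case plus the planners whose syntax it does not fit (hypothetical model).
final class SqlCaseSketch {
  final String name;
  final String sql;
  final Set<String> skippedPlanners;

  SqlCaseSketch(String name, String sql, String... skippedPlanners) {
    this.name = name;
    this.sql = sql;
    this.skippedPlanners = new HashSet<>(Arrays.asList(skippedPlanners));
  }
}

final class DualPlannerSketch {
  // Hypothetical short labels standing in for the two planners.
  static final List<String> PLANNERS = Arrays.asList("calcite", "zetasql");

  // Expands every case into one run per planner, skipping declared syntax mismatches.
  static List<String> plannedRuns(List<SqlCaseSketch> cases) {
    List<String> runs = new ArrayList<>();
    for (SqlCaseSketch sqlCase : cases) {
      for (String planner : PLANNERS) {
        if (!sqlCase.skippedPlanners.contains(planner)) {
          runs.add(sqlCase.name + "[" + planner + "]");
        }
      }
    }
    return runs;
  }

  public static void main(String[] args) {
    List<SqlCaseSketch> cases =
        Arrays.asList(
            new SqlCaseSketch(
                "plainAggregate", "SELECT f_int, COUNT(*) FROM PCOLLECTION GROUP BY f_int"),
            // Marked as skipped for one planner purely to show the mechanism.
            new SqlCaseSketch("dialectSpecific", "SELECT 1", "zetasql"));
    for (String run : plannedRuns(cases)) {
      System.out.println(run);
    }
  }
}
```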

Cool. I agree this is a good idea, but we should hold off on doing this until #9737 is in. We will need to relocate common tests into another package.

@apilloud

I don't agree, though. At the very least, a duplicate test can be created that runs for ZetaSQL only, and we can migrate it later. It could be a new test file associated with the ZetaSQL dialect.

I tried adding pipeline.getOptions().as(BeamSqlPipelineOptions.class).setPlannerName("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl"); to the test to see whether it would work with the ZetaSQL planner, but I get a class-not-found exception.
I assume this is because ZetaSQL is not in the build file. After attempting to add the dependency, I get the following error: Circular dependency, probably because ZetaSQL depends on BeamSQL?
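
A small sketch of the failure mode described above, assuming the planner is resolved from its class name at run time (this thread does not show how Beam performs the lookup). A class name that is not on the classpath cannot be loaded, regardless of how the name was configured. PlannerLookupSketch and isLoadable are hypothetical helpers.

```java
// Hypothetical helper showing what a by-name class lookup does when a module is missing.
final class PlannerLookupSketch {
  // Returns true if the named class can be found by this class's class loader.
  static boolean isLoadable(String className) {
    try {
      // 'false' avoids running static initializers; only the lookup matters here.
      Class.forName(className, false, PlannerLookupSketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Name taken from the thread; the result depends on whether that module is on the classpath.
    String planner = "org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl";
    System.out.println("java.lang.String loadable: " + isLoadable("java.lang.String"));
    System.out.println(planner + " loadable: " + isLoadable(planner));
  }
}
```

A check like this can tell a missing test dependency apart from a misspelled planner name before the pipeline is run.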

amaliujia · 2019-10-04T17:59:39Z

...nsions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java

+        windowFn,
+        windowFieldIndex);
+
+    this.rowType = rowType;


Where is this rowType used?

The RelNode type is already inferred from the input nodes. Usually, when you need it, you can call getRowType() to get it rather than saving it as a class member.

There is a JIRA issue, https://jira.apache.org/jira/browse/BEAM-7609: when running queries with "SELECT DISTINCT + JOIN", the resulting fields are not assigned proper names.
Even though it does not solve this particular issue, preserving the rowType should not hurt (previously it was simply ignored and set to null).

The RelNode type is already inferred from the input nodes. Usually, when you need it, you can call getRowType() to get it rather than saving it as a class member.

I see, in that case I will remove this constructor. Do you think adding deriveRowType(); to the original constructor makes sense, or would it be redundant?

I think deriveRowType() can be called when it is needed.

Agreed, reverted BeamAggregationRel back.
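
The point settled in this thread, deriving the row type on demand instead of storing it through a constructor, can be sketched with a toy model. The classes below are hypothetical and only imitate the pattern of a getRowType() accessor that calls deriveRowType() the first time it is needed; they are not Calcite or Beam classes, and a row type is reduced to a list of field names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical base node: the row type is derived lazily and cached on first access.
abstract class SketchRel {
  private List<String> rowType;

  final List<String> getRowType() {
    if (rowType == null) {
      rowType = deriveRowType();
    }
    return rowType;
  }

  protected abstract List<String> deriveRowType();
}

// Leaf node whose row type is its declared field names.
final class SketchScan extends SketchRel {
  private final List<String> fields;

  SketchScan(List<String> fields) {
    this.fields = fields;
  }

  @Override
  protected List<String> deriveRowType() {
    return fields;
  }
}

// Aggregate node: its row type is inferred from the input, so no constructor argument is needed.
final class SketchAggregate extends SketchRel {
  private final SketchRel input;
  private final int groupFieldIndex;
  private final String aggregateName;

  SketchAggregate(SketchRel input, int groupFieldIndex, String aggregateName) {
    this.input = input;
    this.groupFieldIndex = groupFieldIndex;
    this.aggregateName = aggregateName;
  }

  @Override
  protected List<String> deriveRowType() {
    return Arrays.asList(input.getRowType().get(groupFieldIndex), aggregateName);
  }
}

final class RowTypeSketch {
  public static void main(String[] args) {
    SketchRel scan = new SketchScan(Arrays.asList("f_int", "f_long"));
    SketchRel aggregate = new SketchAggregate(scan, 0, "cnt");
    System.out.println(aggregate.getRowType());
  }
}
```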


11moon11 · 2019-10-07T16:25:41Z

Run JavaBeamZetaSQL PreCommit

apilloud · 2019-10-07T21:40:31Z

LGTM

11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch 2 times, most recently from b86e8a6 to 8d581ab · October 1, 2019 19:41

apilloud approved these changes · Oct 1, 2019

11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch 3 times, most recently from fd5025b to 7a56f35 · October 1, 2019 21:16

11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from 7a56f35 to 60b06c5 · October 2, 2019 17:56

11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from 60b06c5 to 8687d90 · October 3, 2019 22:33

11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from 8687d90 to f569da9 · October 3, 2019 22:35

amaliujia reviewed · Oct 3, 2019

amaliujia reviewed · Oct 4, 2019

11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from f569da9 to 09356af · October 7, 2019 16:17

Commit eff9eb1: [BEAM-6995] Modify BeamBasicAggregationRule to only be applied when windowing is not used

11moon11 force-pushed the BeamBasicAggregationRule_OnlyWhenNotWindowed branch from 09356af to eff9eb1 · October 7, 2019 16:18

apilloud merged commit 5d83b15 into apache:master Oct 7, 2019
