New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-6995] Beam basic aggregation rule only when not windowed #9703
[BEAM-6995] Beam basic aggregation rule only when not windowed #9703
Conversation
R: @apilloud |
b86e8a6
to
8d581ab
Compare
Run Direct Runner Nexmark Tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -40,15 +50,21 @@ | |||
|
|||
public BeamBasicAggregationRule( | |||
Class<? extends Aggregate> aggregateClass, RelBuilderFactory relBuilderFactory) { | |||
super(operand(aggregateClass, operand(TableScan.class, any())), relBuilderFactory, null); | |||
super(operand(aggregateClass, operand(AbstractRelNode.class, any())), relBuilderFactory, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at examples of this in Calcite, I think RelNode
is preferable to AbstractRelNode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, updated match condition to use RelNode instead.
RelNode newTableScan = tableScan.copy(tableScan.getTraitSet(), tableScan.getInputs()); | ||
if (relNode instanceof Project || relNode instanceof Calc || relNode instanceof Filter) { | ||
if (isWindowed(relNode) || hasWindowedParents(relNode)) { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth adding a comment here that this case is expected to be handled by BeamAggregationRule
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
fd5025b
to
7a56f35
Compare
Run Direct Runner Nexmark Tests |
7a56f35
to
60b06c5
Compare
Run Direct Runner Nexmark Tests |
can you do a |
60b06c5
to
8687d90
Compare
cc: @amaliujia |
8687d90
to
f569da9
Compare
@@ -701,7 +700,6 @@ public void testSupportsAggregationWithoutProjection() throws Exception { | |||
} | |||
|
|||
@Test | |||
@Ignore("https://issues.apache.org/jira/browse/BEAM-8317") | |||
public void testSupportsAggregationWithFilterWithoutProjection() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I really want to propose is when we add new test cases with SQL queries, run the test for both dialects unless there is a query syntax mismatch.
Using which planner is controlled by https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L28.
I am looking for a way of @RunWith(Parameterized.class)
so it's easy to run tests for both dialect by an annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a useful reference link with examples: https://github.com/Pragmatists/JUnitParams/blob/master/src/test/java/junitparams/usage/SamplesOfUsageTest.java
And this one: https://github.com/junit-team/junit4/wiki/parameterized-tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I agree this is a good idea, but we should hold off on doing this until #9737 is in. We will need to relocate common tests into another package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree though. At least a duplicate test can be created but run for ZetaSQL only and then we can have a migration. It could be a new test file associated with ZetaSQL dialect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding pipeline.getOptions().as(BeamSqlPipelineOptions.class).setPlannerName("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl");
to the test to see if it will work using ZetaSqlPlanner
, but I get Class not found exception
.
I assume it is because ZetaSQL is not in the build file. After attempting to add a dependency there is the following error: Circular dependency
, probably because ZetaSQL depends on BeamSQL?
windowFn, | ||
windowFieldIndex); | ||
|
||
this.rowType = rowType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this rowType used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RelNode type is already inferred from input nodes. Usually when you need to use it, you can use getRowType() function to get it than save it as a class member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a JIRA issue: https://jira.apache.org/jira/browse/BEAM-7609, when running queries with "SELECT DISTINCT + JOIN", resulting field names are not assigned proper name.
Even though it does not solve this particular issue, preserving the rowType should not hurt (where previously it would just ignore it and set it to null).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RelNode type is already inferred from input nodes. Usually when you need to use it, you can use getRowType() function to get it than save it as a class member.
I see, in that case I will remove this constructor. Do you think adding deriveRowType();
to the original constructor makes sense or it would be redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think deriveRowType()
can be called when it's needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, reverted BeamAggregationRel back.
f569da9
to
09356af
Compare
…indowing is not used
09356af
to
eff9eb1
Compare
Run JavaBeamZetaSQL PreCommit |
LGTM |
Beam basic aggregation rule should not be applied on Calc, Project, and Filter when their parents/they utilize windowed functions.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.