Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE][TEST ONLY] Add once-policy rule check #22060

Closed
wants to merge 4 commits into from

Conversation

maryannxue
Copy link
Contributor

What changes were proposed in this pull request?

Rules like HandleNullInputsForUDF (https://issues.apache.org/jira/browse/SPARK-24891) do not stabilize (can apply new changes to a plan indefinitely) and can cause problems like SQL cache mismatching.
Ideally, all rules whether in a once-policy batch or a fixed-point-policy batch should stabilize after the number of runs specified. Once-policy should be considered a performance improvement, a assumption that the rule can stabilize after just one run rather than an assumption that the rule won't be applied more than once. Those once-policy rules should be able to run fine with fixed-point policy rule as well.
Currently we already have a check for fixed-point and throws an exception if maximum number of runs is reached and the plan is still changing. Here, in this PR, a similar check is added for once-policy and throws an exception if the plan changes between the first run and the second run of a once-policy rule.

From this test result, we can find out which of the analysis rules break this check so we can fix them later.

How was this patch tested?

N/A

@SparkQA
Copy link

SparkQA commented Aug 9, 2018

Test build #94513 has finished for PR 22060 at commit 3236568.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

AliasViewChild is the only rule? Can we whitelist it first? It sounds like many tests are skipped.

@maryannxue
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 10, 2018

Test build #94540 has finished for PR 22060 at commit 3236568.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

It seems that the testing is finished. :)
If you don't mind, shall we close this test PR, @maryannxue ?

@dongjoon-hyun
Copy link
Member

Gentle ping, @maryannxue .

@dongjoon-hyun
Copy link
Member

Ping, @maryannxue .

@gatorsmile
Copy link
Member

@maropu Are you willing to take this over?

@maropu
Copy link
Member

maropu commented Sep 28, 2018

Yea, I can next week (I'm now in Canada and I'm going back to Japan now...)

@maryannxue
Copy link
Contributor Author

Sorry for the late reply. The purpose of this is to find out the rules that violate the once-policy assumption and also tests that can reproduce the issues. I think we should eventually turn this check on after we've fixed all those rules and extend this check to optimizer too.

@maropu
Copy link
Member

maropu commented Oct 4, 2018

@maryannxue ah, I see. Do you still keep working on this?

@maryannxue
Copy link
Contributor Author

retest this please

@maryannxue
Copy link
Contributor Author

@maropu I'll follow up on this. I started the test again and I'll keep track of "which rules violate the assumption" and "which tests can reproduce the violation" in this PR.

@SparkQA
Copy link

SparkQA commented Oct 4, 2018

Test build #96940 has finished for PR 22060 at commit 3236568.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 5, 2018

Test build #97002 has finished for PR 22060 at commit 7986a43.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 5, 2018

Test build #97009 has finished for PR 22060 at commit d595a0c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 6, 2018

Test build #97023 has finished for PR 22060 at commit 7fc1d11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97840 has started for PR 22060 at commit 7fc1d11.

@SparkQA
Copy link

SparkQA commented Oct 22, 2018

Test build #97875 has started for PR 22060 at commit 7fc1d11.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

asfgit pushed a commit that referenced this pull request Oct 31, 2018
…o compare attributes

## What changes were proposed in this pull request?

When we compare attributes, in general, we should always refer to semantic equality, as the default `equal` method can return false when there are "cosmetic" differences between them, but still they are the same thing; at least we have to consider them so when analyzing/optimizing queries.

The PR focuses on the usage and comparison of the `output` of a `LogicalPlan`, which is a `Seq[Attribute]` in `AliasViewChild`. In this case, using equality implicitly fails to check the semantic equality. This results in the operator failing to stabilize.

## How was this patch tested?

running the tests with the patch provided by maryannxue in #22060

Closes #22713 from mgaido91/SPARK-25691.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@HyukjinKwon
Copy link
Member

hey @maryannxue, where are we here? Let's close this if it's going to be inactive a couple of weeks.

@maryannxue
Copy link
Contributor Author

Thank you for reminding me, @HyukjinKwon! And thanks to @mgaido91's contribution, this has been fixed already.

@maryannxue maryannxue closed this Nov 11, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…o compare attributes

## What changes were proposed in this pull request?

When we compare attributes, in general, we should always refer to semantic equality, as the default `equal` method can return false when there are "cosmetic" differences between them, but still they are the same thing; at least we have to consider them so when analyzing/optimizing queries.

The PR focuses on the usage and comparison of the `output` of a `LogicalPlan`, which is a `Seq[Attribute]` in `AliasViewChild`. In this case, using equality implicitly fails to check the semantic equality. This results in the operator failing to stabilize.

## How was this patch tested?

running the tests with the patch provided by maryannxue in apache#22060

Closes apache#22713 from mgaido91/SPARK-25691.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants