[DO NOT MERGE][TEST ONLY] Add once-policy rule check #22060
Test build #94513 has finished for PR 22060 at commit
retest this please
Test build #94540 has finished for PR 22060 at commit
It seems that the testing is finished. :)
Gentle ping, @maryannxue .
Ping, @maryannxue .
@maropu Are you willing to take this over?
Yea, I can next week (I'm now in Canada and I'm going back to Japan now...)
Sorry for the late reply. The purpose of this PR is to find the rules that violate the once-policy assumption, as well as tests that reproduce the issues. I think we should eventually turn this check on after we've fixed all those rules, and extend it to the optimizer too.
@maryannxue ah, I see. Are you still working on this?
retest this please
@maropu I'll follow up on this. I started the test again and I'll keep track of "which rules violate the assumption" and "which tests can reproduce the violation" in this PR.
Test build #96940 has finished for PR 22060 at commit
Test build #97002 has finished for PR 22060 at commit
Test build #97009 has finished for PR 22060 at commit
Test build #97023 has finished for PR 22060 at commit
Test build #97840 has started for PR 22060 at commit
Test build #97875 has started for PR 22060 at commit
Merged build finished. Test FAILed. |
…o compare attributes

## What changes were proposed in this pull request?

When we compare attributes, we should in general always rely on semantic equality, because the default `equals` method can return false when two attributes differ only "cosmetically" even though they refer to the same thing; at least, we have to treat them as the same when analyzing/optimizing queries.

The PR focuses on the usage and comparison of the `output` of a `LogicalPlan`, which is a `Seq[Attribute]`, in `AliasViewChild`. Here, comparing the outputs with plain equality does not check semantic equality, which prevents the operator from stabilizing.

## How was this patch tested?

Running the tests with the patch provided by maryannxue in #22060.

Closes #22713 from mgaido91/SPARK-25691.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
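To illustrate the distinction the commit message draws, here is a minimal, self-contained Scala sketch. It does not use Catalyst's classes: `Attr`, `exprId`, `qualifier`, `semanticEquals` and `sameOutput` below are hypothetical stand-ins, assuming (loosely in the spirit of Catalyst's `Expression.semanticEquals`) that two attribute references denote the same column exactly when their expression IDs match, while name spelling and qualifier are "cosmetic".

```scala
// Hypothetical, simplified model of a resolved column reference (not Spark's Attribute).
// `exprId` identifies the column; `name` and `qualifier` are treated as cosmetic.
final case class Attr(name: String, exprId: Long, qualifier: Seq[String] = Nil) {
  // Two attributes refer to the same column if their IDs match,
  // regardless of name spelling or qualifier.
  def semanticEquals(other: Attr): Boolean = exprId == other.exprId
}

object AttrSeqs {
  // Element-wise semantic comparison of two plan outputs.
  def sameOutput(a: Seq[Attr], b: Seq[Attr]): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => x.semanticEquals(y) }
}

// Same columns (same IDs), but with a qualifier and different letter case on one side.
val viewOutput  = Seq(Attr("id", 1L), Attr("value", 2L))
val childOutput = Seq(Attr("id", 1L, Seq("t")), Attr("VALUE", 2L))
```

With these values, `viewOutput == childOutput` is false, because case-class equality compares every field, while `AttrSeqs.sameOutput(viewOutput, childOutput)` is true. A rule that decides "the plan still needs rewriting" with the first comparison would keep firing on a plan that is already correct, which is the kind of non-stabilization described above.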
hey @maryannxue, where are we here? Let's close this if it's going to stay inactive for a couple of weeks.
Thank you for reminding me, @HyukjinKwon! Thanks to @mgaido91's contribution, this has already been fixed.
What changes were proposed in this pull request?
Rules like `HandleNullInputsForUDF` (https://issues.apache.org/jira/browse/SPARK-24891) do not stabilize (they can keep applying new changes to a plan indefinitely) and can cause problems like SQL cache mismatches. Ideally, every rule, whether in a once-policy batch or a fixed-point-policy batch, should stabilize within the specified number of runs. The once policy should be considered a performance optimization, i.e. an assumption that the rule stabilizes after just one run, rather than an assumption that the rule will never be applied more than once. Once-policy rules should therefore also run correctly under a fixed-point policy.
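The difference between a rule that stabilizes and one that does not can be sketched with a toy expression tree. This is a hypothetical illustration loosely inspired by the kind of null-guarding `HandleNullInputsForUDF` performs; it is not that rule's actual code, and `Expr`, `NullIf`, `naive` and `guarded` are invented names.

```scala
// Hypothetical expression tree, loosely inspired by (not copied from) HandleNullInputsForUDF.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class IsNull(child: Expr) extends Expr
// Evaluates to null when `check` is true, otherwise to `value`.
final case class NullIf(check: Expr, value: Expr) extends Expr
final case class Udf(name: String, input: Expr) extends Expr

object NullGuardRules {
  // Does not stabilize: it also wraps UDFs that an earlier run already guarded,
  // so every application nests one more NullIf around the call.
  def naive(e: Expr): Expr = e match {
    case u @ Udf(_, in) => NullIf(IsNull(in), u)
    case NullIf(c, v)   => NullIf(naive(c), naive(v))
    case IsNull(c)      => IsNull(naive(c))
    case other          => other
  }

  // Stabilizes: an already-guarded UDF is left untouched, so a second run is a no-op.
  // (For brevity, neither rule recurses into UDF inputs.)
  def guarded(e: Expr): Expr = e match {
    case g @ NullIf(IsNull(in), Udf(_, arg)) if in == arg => g
    case u @ Udf(_, in) => NullIf(IsNull(in), u)
    case NullIf(c, v)   => NullIf(guarded(c), guarded(v))
    case IsNull(c)      => IsNull(guarded(c))
    case other          => other
  }
}
```

For `Udf("f", Col("x"))`, applying `naive` twice produces a doubly nested guard, so a fixed-point batch would only stop at its iteration limit. Applying `guarded` twice returns the same tree as applying it once, so it behaves identically under a once policy and a fixed-point policy.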
We already have a check for the fixed-point policy that throws an exception if the maximum number of runs is reached and the plan is still changing. This PR adds a similar check for the once policy: it throws an exception if the plan changes between the first and the second run of a once-policy rule.
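A minimal sketch of how both checks could sit side by side in a rule executor follows. It is a hypothetical, simplified model and not Spark's `RuleExecutor`: `Strategy`, `Once`, `FixedPoint`, `Batch` and `Executor.run` are illustrative names. The once-policy check is done at batch granularity (re-run the whole batch and compare), which is one plausible reading of the description. This sketch simply throws in both cases; it does not model any distinction between test and production behavior.

```scala
// Hypothetical, simplified model of a rule executor; not Spark's RuleExecutor.
sealed trait Strategy
case object Once extends Strategy
final case class FixedPoint(maxIterations: Int) extends Strategy

// A named group of rules sharing one execution strategy; P is the plan type.
final case class Batch[P](name: String, strategy: Strategy, rules: Seq[P => P])

object Executor {
  def run[P](plan: P, batches: Seq[Batch[P]]): P =
    batches.foldLeft(plan) { (current, batch) =>
      // Apply every rule of the batch once, in order.
      def runOnce(p: P): P = batch.rules.foldLeft(p)((acc, rule) => rule(acc))

      batch.strategy match {
        case Once =>
          val first = runOnce(current)
          // Once-policy check: re-running the batch must leave the plan unchanged.
          if (runOnce(first) != first)
            throw new IllegalStateException(
              s"Once-policy batch '${batch.name}' changed the plan on its second run")
          first
        case FixedPoint(max) =>
          var prev = current
          var next = runOnce(prev)
          var iterations = 1
          // Keep iterating until the plan stops changing or the limit is hit.
          while (next != prev && iterations < max) {
            prev = next
            next = runOnce(prev)
            iterations += 1
          }
          // Fixed-point check: the plan is still changing after `max` runs.
          if (next != prev)
            throw new IllegalStateException(
              s"Fixed-point batch '${batch.name}' did not converge within $max iterations")
          next
      }
    }
}
```

For example, with integers standing in for plans, a once-policy batch containing `(x: Int) => math.max(x, 10)` passes the check, while one containing `(x: Int) => x + 1` throws, because its second run still changes the "plan".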
From the test results, we can find out which analysis rules break this check so that we can fix them later.
How was this patch tested?
N/A