-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-27633][SQL] Remove redundant aliases in NestedColumnAliasing #24525
Conversation
Test build #105128 has finished for PR 24525 at commit
|
Test build #105129 has finished for PR 24525 at commit
|
Test build #105136 has finished for PR 24525 at commit
|
Retest this please. |
Test build #105267 has finished for PR 24525 at commit
|
ping @dongjoon-hyun |
@viirya . Actually, I've been investigating this PR multiple times from the first day. But I couldn't make me sure about this approach. One big thing is I'm interested in this PR and testing this. Could you test this more seriously once more, too? |
Thanks for review @dongjoon-hyun I understood your concerns. Thanks for test, also. Can you share few cases that you think might be problematic, if you find any? In case here, the intent is to see the child of a nested field accessor is already presented. Do you think it is more robust, if comparing them exactly equally, not semantically? |
val nestedRelation = LocalRelation('a.struct('b.struct('c.int, | ||
'd.struct('f.int, 'g.int), 'e.int))) | ||
|
||
val first = GetStructField('a, 0, Some("b")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use expression DSL to write tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems only Add
have DSL? Don't see DSL for GetStructField
. I replaced Add
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. We have getField
.
Test build #105622 has finished for PR 24525 at commit
|
Test build #105649 has finished for PR 24525 at commit
|
@cloud-fan @dongjoon-hyun Please let me know if you have more comments or thoughts on this? thanks. |
@viirya . I has been not sure about this PR due to the old reason, but I don't have a counter example you asked. Sorry about that. If the other committers think okay, I will not be against this PR. Thanks for working on nested column improvement, @viirya . cc @cloud-fan , @dbtsai , @gatorsmile |
Test build #108795 has finished for PR 24525 at commit
|
retest this please. |
@dongjoon-hyun I see. Thanks for the comment. Do you more agree on this if changing semanticEquals to exact comparing? Can it resolve your concern? |
It's been just my personal concern, @viirya . You may not need to revise your PR~ Let's wait and get some advice from the seniors. |
Test build #108845 has finished for PR 24525 at commit
|
cc @cloud-fan Do you have more thoughts on this change? |
case a @ GetArrayStructFields(child, _, _, _, _) => | ||
nestedFields.forall(f => f == a || child.find(_.semanticEquals(f)).isEmpty) | ||
case n @ GetStructField(child, _, _) => | ||
nestedFields.forall(f => f == n || child.find(_.semanticEquals(f)).isEmpty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we merge the two entry into one? e.g.,
case e @ (_: GetStructField | _: GetArrayStructFields) =>
nestedFields.forall(f => f == e || e.children.head.find(_.semanticEquals(f)).isEmpty)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it seems the added test below passes without f == n
? If so, can you add a test for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu Thanks. Will address two points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think f == a condition is not necessary now. It should be left here by previous commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the check! Ur, one more comment: can we avoid the loop of length(nestedFields)^2
? in: https://github.com/apache/spark/pull/24525/files#diff-43334bab9616cc53e8797b9afa9fc7aaR129
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for reply late.
I thought about this. We can check for a nested field accessor in other accessors which have less ExtractValue. For example, we don't need to check if a.b
is redundant by looking at a.b.c
, a.b.c.d
, etc.
However to do this, we first need to collect the info about number of ExtractValue. Is it worth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, I see. Yea, I think its ok as it is. thanks for the check.
Test build #109372 has finished for PR 24525 at commit
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
retest this please |
Test build #119144 has finished for PR 24525 at commit
|
retest this please |
@HyukjinKwon Thanks for reopening this. I think I need to sync it up with latest change. |
Test build #119199 has finished for PR 24525 at commit
|
Test build #119206 has finished for PR 24525 at commit
|
retest this please |
1 similar comment
retest this please |
Test build #119213 has finished for PR 24525 at commit
|
ping @cloud-fan |
@viirya, can you rebase this please? I merged a couple of your PRs and seems it caused the conflicts. |
@HyukjinKwon rebased, thanks. |
Test build #123950 has finished for PR 24525 at commit
|
retest this please |
Test build #123974 has finished for PR 24525 at commit
|
...alyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
Outdated
Show resolved
Hide resolved
Test build #124009 has finished for PR 24525 at commit
|
Merged to master. |
What changes were proposed in this pull request?
In NestedColumnAliasing rule, we create aliases for nested field access in project list. We considered that top level parent field and nested fields under it were both accessed. In the case, we don't create the aliases because they are redundant.
There is another case, where a nested parent field and nested fields under it were both accessed, which we don't consider now. We don't need to create aliases in this case too.
How was this patch tested?
Added test.