[SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField #24599
Test build #105381 has finished for PR 24599.
@@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules.RuleExecutor
-import org.apache.spark.sql.types.{StringType, StructType}
+import org.apache.spark.sql.types.{StringType, StructField, StructType}

 class NestedColumnAliasingSuite extends SchemaPruningTest {
There are still many usages of `GetStructField` in this test suite. Maybe make a minor PR to rewrite them.
Retest this please.
Test build #106314 has finished for PR 24599.
@viirya. Since this is an important improvement, could you add a benchmark case to `NestedSchemaPruningBenchmark`? Also, please enumerate some newly supported examples explicitly, instead of just "more nested field cases", in the PR description (at least).
@dongjoon-hyun Thanks for looking into this. I will add the benchmark case. The PR title and description were updated.
Test build #106347 has finished for PR 24599.
Test build #106353 has finished for PR 24599.
Test build #106366 has finished for PR 24599.
Test build #106383 has finished for PR 24599.
Test build #106384 has finished for PR 24599.
On the `master` branch, it seems that there is a regression only in ORC (v1). This PR (@viirya) is irrelevant to that. I verified that Parquet and ORC v2 are consistent on the `master` branch.
cc @gatorsmile
Hi, @viirya. I made a benchmark result PR to you. Could you review and merge that?
@@ -51,7 +55,7 @@ abstract class NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
 withTempPath { dir =>
 val path = dir.getCanonicalPath

-Seq(1, 2).foreach { i =>
+Seq(1, 2, 3).foreach { i =>
Thanks!
EC2 result
Thanks @dongjoon-hyun! Merged the benchmark results now.
+1, LGTM. Thank you so much, @viirya.
The last commit is a `.txt`-file-only update of the benchmark results.
Merged to master.
Test build #106398 has finished for PR 24599.
…d cases including GetArrayStructField

## What changes were proposed in this pull request?

`NestedColumnAliasing` rule covers `GetStructField` only, currently. It means that some nested field extraction expressions aren't pruned. For example, if only accessing a nested field in an array of struct (`GetArrayStructFields`), this column isn't pruned.

This patch extends the rule to cover general nested field cases, including `GetArrayStructFields`.

## How was this patch tested?

Added tests.

Closes apache#24599 from viirya/nested-pruning-extract-value.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
## What changes were proposed in this pull request?

The `NestedColumnAliasing` rule currently covers only `GetStructField`, so some nested field extraction expressions aren't pruned. For example, when a query accesses only a nested field in an array of structs (`GetArrayStructFields`), the column isn't pruned.

This patch extends the rule to cover general nested field cases, including `GetArrayStructFields`.

## How was this patch tested?

Added tests.
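
To make the idea of pruning through an array of structs concrete, here is a toy model in plain Scala. It is not Spark's `NestedColumnAliasing` implementation and does not use Catalyst classes: `DataType`, `StructT`, `ArrayT`, `ToyPruning.prune`, and the sample `schema` are all hypothetical names introduced only for illustration. The sketch assumes that a field path into `array<struct<...>>` selects that field from every element, which is the access pattern `GetArrayStructFields` represents.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's Catalyst classes.
sealed trait DataType
case object StringT extends DataType
case object IntT extends DataType
final case class StructT(fields: List[(String, DataType)]) extends DataType
final case class ArrayT(element: DataType) extends DataType

object ToyPruning {
  // Hypothetical sample schema: struct<id: int, friends: array<struct<first, last>>>
  val schema: DataType = StructT(List(
    "id" -> IntT,
    "friends" -> ArrayT(StructT(List("first" -> StringT, "last" -> StringT)))
  ))

  /** Keep only the parts of `dt` reachable through the requested field paths.
    * An empty path means "the whole value is needed".
    * Arrays are transparent: a path into array<struct> applies to each element.
    */
  def prune(dt: DataType, paths: Set[List[String]]): DataType = dt match {
    case _ if paths.contains(Nil) => dt // whole value requested, nothing to prune
    case StructT(fields) =>
      // Group the remaining paths by their first field name.
      val byHead = paths.groupBy(_.head).map { case (h, ps) => h -> ps.map(_.tail) }
      StructT(fields.collect {
        case (name, t) if byHead.contains(name) => name -> prune(t, byHead(name))
      })
    case ArrayT(elem) => ArrayT(prune(elem, paths)) // descend into the element type
    case leaf => leaf
  }
}
```

In this toy model, `ToyPruning.prune(ToyPruning.schema, Set(List("friends", "first")))` keeps only `friends` with its `first` field and drops both `id` and `last`, while a request for `List("friends")` keeps the full element struct.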