[SPARK-54454][SQL] Enable variant shredding and variant logical type annotation configs by default #53164
base: master
Conversation
+1 for Apache Spark 4.2.0. Thank you, @harshmotw-db. Please make the CI happy.
```diff
  .version("4.1.0")
  .booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
```
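For context, the diff above flips the default value of a `SQLConf` entry. Spark SQL configs are declared with the `buildConf` builder pattern; the fragment below is a hedged sketch of what such a definition looks like. The config key, doc string, and val name here are illustrative placeholders, not the exact entry changed in this PR.

```scala
// Illustrative SQLConf entry (the kind declared inside
// org.apache.spark.sql.internal.SQLConf). The key and doc text are
// placeholders, not the actual config touched by this PR.
val VARIANT_SHREDDING_EXAMPLE =
  buildConf("spark.sql.example.variantShredding.enabled") // hypothetical key
    .doc("When true, write variant columns in shredded form.")
    .version("4.1.0")        // release in which the config was introduced
    .booleanConf
    .createWithDefault(true) // this PR flips the default from false to true
```

Users who need the old behavior can set the corresponding config back to `false` at the session level.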
I think we should at least enable this config in 4.1, to conform to the Parquet spec. If many people start to use the variant type with Spark 4.1, then the entire ecosystem may be forced to support the Spark-specific variant type in Parquet.
@harshmotw-db can we open a separate PR for this config? Also cc @dongjoon-hyun
Yes, in order to re-evaluate the scope, please make working PRs which pass all the CIs successfully, @harshmotw-db and @cloud-fan.
This is because a release manager cannot make any decision (not only for branch-4.1, but also for master) based on a broken proposal.
@dongjoon-hyun @cloud-fan CI should be happy now. Only one test failure was a real issue -
Resolved review threads (now outdated):
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala (1 thread)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (4 threads)
dongjoon-hyun left a comment
@harshmotw-db. Please spin off all other code changes from this PR because it's very misleading. Technically, the config change PR should focus on only the flag switch.
In short:
1. Make this PR have only the `SQLConf.scala` change.
2. Make a new PR containing all other changes (`Cast.scala`, `ResolveDefaultColumnsUtil.scala`, `*Suite.scala`).
We had better proceed with (2) first.
@dongjoon-hyun @cloud-fan This PR has the non-config changes.
Commit message of the follow-up PR (#53224), as merged and cherry-picked:

### What changes were proposed in this pull request?
[This PR](#53164) enables shredding and variant logical type annotation configs by default. However, some test suites assume the old behavior. This PR fixes those tests to also work with the new default configs. This PR also fixes a bug we discovered in the previous PR where variant default resolution would fail when pushVariantIntoScan was enabled.

### Why are the changes needed?
To fix the bug.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #53224 from harshmotw-db/harshmotw-db/shredding_fixes.

Lead-authored-by: Harsh Motwani <harsh.motwani@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d36bd62)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@harshmotw-db let's update this PR to be config flipping only.
+1 for @cloud-fan's advice.
Force-pushed from 1c8d539 to 23e6a1d.
@dongjoon-hyun I had missed two changes in the other PR that I have now put in this PR. Once that PR is merged, this PR can simply contain the flag flips. Currently this PR also contains those minor test changes. If you are okay with merging this PR with those changes, you can go ahead.
What changes were proposed in this pull request?
This PR enables, by default, annotating variant data with the Parquet variant logical type, and enables shredded writes and reads by default.
Why are the changes needed?
Does this PR introduce any user-facing change?
Yes, variant data written by Spark will be annotated with the variant logical type, and variant shredding will be enabled by default.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.