Revert "[SPARK-41498] Propagate metadata through Union" #40371
Closed
cloud-fan wants to merge 1 commit into apache:master
This reverts commit 827ca9b.
Also cc @xinrong-meng; this should be included in 3.4.0.
dongjoon-hyun (Member) approved these changes on Mar 10, 2023 and left a comment:
+1, LGTM. I agree with @cloud-fan.
Merged to master/3.4.
dongjoon-hyun pushed a commit that referenced this pull request Mar 10, 2023
This reverts commit 827ca9b.

### What changes were proposed in this pull request?
After more thinking, it's a bit fragile to propagate metadata columns through Union. We have added quite a few new fields to the file source `_metadata` metadata column, such as `row_index`, `block_start`, etc. Some are Parquet-only. The same thing may happen in other data sources as well. If one day one table under Union adds a new metadata column (or adds a new field if the metadata column is a struct type), but the other tables under Union do not have this new column, then Union can't propagate metadata columns and the query will suddenly fail to analyze.

To be future-proof, let's revert this support.

### Why are the changes needed?
To make the analysis behavior more robust.

### Does this PR introduce _any_ user-facing change?
Yes, but propagating metadata columns through Union is not released yet.

### How was this patch tested?
N/A

Closes #40371 from cloud-fan/revert.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 164db5b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
This reverts commit 827ca9b.

### What changes were proposed in this pull request?
After more thinking, it's a bit fragile to propagate metadata columns through Union. We have added quite a few new fields to the file source `_metadata` metadata column, such as `row_index`, `block_start`, etc. Some are Parquet-only. The same thing may happen in other data sources as well. If one day one table under Union adds a new metadata column (or adds a new field if the metadata column is a struct type), but the other tables under Union do not have this new column, then Union can't propagate metadata columns and the query will suddenly fail to analyze.

To be future-proof, let's revert this support.
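
The fragility described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's analyzer code: the function `propagatable_metadata`, the dict-based schemas, and the field types are all hypothetical stand-ins chosen for illustration. It assumes a metadata column can pass through Union only when every child exposes it with an identical schema.

```python
# Toy model only: plain dicts stand in for each Union child's metadata
# columns. None of these names come from Spark's analyzer.

def propagatable_metadata(children):
    """Return the metadata columns that every child exposes with an
    identical schema; anything else cannot pass through the Union."""
    if not children:
        return {}
    first, rest = children[0], children[1:]
    return {
        name: schema
        for name, schema in first.items()
        if all(child.get(name) == schema for child in rest)
    }

# Two file-based tables whose `_metadata` struct has the same fields.
# Field names and types here are illustrative assumptions.
table_a = {"_metadata": {"file_path": "string", "file_name": "string"}}
table_b = {"_metadata": {"file_path": "string", "file_name": "string"}}

# Both children agree, so a query selecting `_metadata` above the Union
# could resolve in this model.
before = propagatable_metadata([table_a, table_b])

# Later, one source gains a new field (the PR mentions `row_index`).
table_a_upgraded = {
    "_metadata": {
        "file_path": "string",
        "file_name": "string",
        "row_index": "long",
    }
}

# The struct schemas no longer match, so nothing can be propagated and
# the same query would stop resolving.
after = propagatable_metadata([table_a_upgraded, table_b])

print(before)
print(after)
```

In this model the query breaks without any change on the user's side, which is the "suddenly fail to analyze" scenario the revert is meant to avoid.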
### Why are the changes needed?
To make the analysis behavior more robust.

### Does this PR introduce _any_ user-facing change?
Yes, but propagating metadata columns through Union is not released yet.

### How was this patch tested?
N/A