Skip to content

Revert "[SPARK-41498] Propagate metadata through Union"#40371

Closed
cloud-fan wants to merge 1 commit intoapache:masterfrom
cloud-fan:revert
Closed

Revert "[SPARK-41498] Propagate metadata through Union"#40371
cloud-fan wants to merge 1 commit intoapache:masterfrom
cloud-fan:revert

Conversation

@cloud-fan
Copy link
Copy Markdown
Contributor

This reverts commit 827ca9b.

What changes were proposed in this pull request?

After more thinking, it's a bit fragile to propagate metadata columns through Union. We have added quite some new fields in the file source _metadata metadata column such as row_index, block_start, etc. Some are parquet only. The same thing may happen in other data sources as well. If one day one table under Union adds a new metadata column (or add a new field if the metadata column is a struct type), but other tables under Union do not have this new column, then Union can't propagate metadata columns and the query will suddenly fail to analyze.

To be future-proof, let's revert this support.

Why are the changes needed?

to make the analysis behavior more robust.

Does this PR introduce any user-facing change?

Yes, but propagating metadata columns through Union is not released yet.

How was this patch tested?

N/A

@github-actions github-actions Bot added the SQL label Mar 10, 2023
@cloud-fan
Copy link
Copy Markdown
Contributor Author

cc @fred-db @bart-samwel

@cloud-fan
Copy link
Copy Markdown
Contributor Author

also cc @xinrong-meng , this should be included in 3.4.0

Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. I agree with @cloud-fan .

@dongjoon-hyun
Copy link
Copy Markdown
Member

Merged to master/3.4.

dongjoon-hyun pushed a commit that referenced this pull request Mar 10, 2023
This reverts commit 827ca9b.

### What changes were proposed in this pull request?

After more thinking, it's a bit fragile to propagate metadata columns through Union. We have added quite some new fields in the file source `_metadata` metadata column such as `row_index`, `block_start`, etc. Some are parquet only. The same thing may happen in other data sources as well. If one day one table under Union adds a new metadata column (or add a new field if the metadata column is a struct type), but other tables under Union do not have this new column, then Union can't propagate metadata columns and the query will suddenly fail to analyze.

To be future-proof, let's revert this support.

### Why are the changes needed?

to make the analysis behavior more robust.

### Does this PR introduce _any_ user-facing change?

Yes, but propagating metadata columns through Union is not released yet.

### How was this patch tested?

N/A

Closes #40371 from cloud-fan/revert.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 164db5b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
This reverts commit 827ca9b.

### What changes were proposed in this pull request?

After more thinking, it's a bit fragile to propagate metadata columns through Union. We have added quite some new fields in the file source `_metadata` metadata column such as `row_index`, `block_start`, etc. Some are parquet only. The same thing may happen in other data sources as well. If one day one table under Union adds a new metadata column (or add a new field if the metadata column is a struct type), but other tables under Union do not have this new column, then Union can't propagate metadata columns and the query will suddenly fail to analyze.

To be future-proof, let's revert this support.

### Why are the changes needed?

to make the analysis behavior more robust.

### Does this PR introduce _any_ user-facing change?

Yes, but propagating metadata columns through Union is not released yet.

### How was this patch tested?

N/A

Closes apache#40371 from cloud-fan/revert.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 164db5b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants