Spark: Add `__metadata_col` metadata in metadata columns when doing Schema Conversion by singhpk234 · Pull Request #5075 · apache/iceberg

singhpk234 · 2022-06-17T11:18:30Z

About the change :

Presently, when doing schema conversion, we set metadata for all the columns, but we should add `__metadata_col` to the metadata of metadata columns. Spark can use this key to check whether a column is a metadata column and drop it if required, as it does here:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala#L221-L225

Note: the MetadataAttribute extractor uses the above key in the attribute metadata to determine whether an attribute is a metadata attribute.
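
To illustrate the idea, here is a minimal, dependency-free sketch and not the code from this PR. The class `MetadataColSketch`, its `Column` type, and the helpers `convert`, `isMetadataColumn`, and `dropMetadataColumns` are all hypothetical stand-ins; only the key name `__metadata_col` comes from the description above. The sketch assumes that a `true` value stored under the key marks a metadata column; the exact check Spark performs is not stated here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MetadataColSketch {
  // Key named in the PR; everything else in this class is hypothetical.
  static final String METADATA_COL_ATTR_KEY = "__metadata_col";

  // Hypothetical stand-in for a converted column with per-column metadata.
  static final class Column {
    final String name;
    final Map<String, Boolean> metadata;

    Column(String name, Map<String, Boolean> metadata) {
      this.name = name;
      this.metadata = metadata;
    }
  }

  // Conversion step: only columns listed in metadataFieldNames get the marker key.
  static List<Column> convert(List<String> fieldNames, Set<String> metadataFieldNames) {
    List<Column> columns = new ArrayList<>();
    for (String name : fieldNames) {
      Map<String, Boolean> metadata = new HashMap<>();
      if (metadataFieldNames.contains(name)) {
        metadata.put(METADATA_COL_ATTR_KEY, true);
      }
      columns.add(new Column(name, metadata));
    }
    return columns;
  }

  // Consumer side (assumption): a column is a metadata column if the key maps to true.
  static boolean isMetadataColumn(Column column) {
    return Boolean.TRUE.equals(column.metadata.get(METADATA_COL_ATTR_KEY));
  }

  // Returns the names of the columns that remain after dropping metadata columns.
  static List<String> dropMetadataColumns(List<Column> columns) {
    List<String> kept = new ArrayList<>();
    for (Column column : columns) {
      if (!isMetadataColumn(column)) {
        kept.add(column.name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // "_file" is used here purely as an illustrative metadata column name.
    List<Column> columns = convert(List.of("id", "data", "_file"), Set.of("_file"));
    System.out.println(dropMetadataColumns(columns)); // prints [id, data]
  }
}
```

Because the marker is attached during conversion, the consumer can separate data columns from metadata columns without knowing their names in advance.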

This PR includes:
(i) A fix for the above
(ii) A fix for an existing minor typo in TestSparkSchemaUtil

Testing Done

Added a unit test for the same.

cc @rdblue @aokolnychyi @jackye1995 @RussellSpitzer

spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java

rdblue · 2022-06-28T02:07:51Z

Thanks, @singhpk234!

Add metadata in metadata columns when doing conversion from iceberg t…

c1c0c17

…o spark schema

github-actions bot added core spark labels Jun 17, 2022

singhpk234 mentioned this pull request Jun 17, 2022

Spark: Support Spark 3.3 #5056

Closed

rdblue reviewed Jun 24, 2022

spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java (outdated)

address review feedback - round1

bb1ee1e

rdblue approved these changes Jun 28, 2022

rdblue merged commit 7d6bbc4 into apache:master Jun 28, 2022

singhpk234 mentioned this pull request Jun 28, 2022

Spark: Add Spark 3.2 copy on top of Spark 3.3 #5094

Merged

namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022

Spark: Add __metadata_col for metadata columns when converting types (a…

ee89e31

…pache#5075) Co-authored-by: Prashant Singh <psinghvk@amazon.com>

namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022

Spark: Add __metadata_col for metadata columns when converting types (a…

c882d11

…pache#5075) Co-authored-by: Prashant Singh <psinghvk@amazon.com>

rdblue merged 2 commits into apache:master from singhpk234:fix/meta_column_attribute_metadata
