Spark: support read of partition metadata column when table is over 1k #10547

dramaticlly · 2024-06-21T00:58:49Z

Adds support for:
SELECT *, _partition FROM iceberg.foo.bar
when the foo.bar table has more than 1000 columns defined.

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java

spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkMetadataColumns.java

dramaticlly · 2024-06-28T18:55:17Z

Looks like the PRB failed due to the JUnit cleanup of the temp directory, as mentioned in #10569.

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java

szehon-ho

Looks good to me, thanks @dramaticlly

szehon-ho · 2024-07-05T16:49:23Z

Merged, thanks @dramaticlly

pan3793 · 2024-07-10T04:24:55Z

... when foo.bar table has over 1000 columns defined

@dramaticlly can you clarify why it was not supported before? Where does the restriction come from? I don't see the magic number 1000 in the codebase.

dramaticlly · 2024-07-10T04:31:44Z

> ... when foo.bar table has over 1000 columns defined
>
> @dramaticlly can you clarify why it was not supported before? where does the restriction come from? I don't see the magic number 1000 in the codebase

@pan3793 This only fixes the scenario of selecting all fields together with the partition metadata column on an Iceberg table with more than 1000 columns. The unit test reproduces the problem when the fix is missing. As for the reasoning, the 1000 comes from the default starting field ID assigned to the inner partition struct of an Iceberg table; a more detailed analysis can be found in #9923.
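To make the collision described above concrete, here is a minimal standalone sketch in plain Java. It does not use the Iceberg API and is not the code from this PR. The class name, the helper methods, and the reassignment strategy are all hypothetical, and it assumes data columns get IDs 1..N while the inner fields of the partition struct get IDs starting at 1000.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the SparkScanBuilder implementation.
public class PartitionIdCollisionSketch {
  // Assumption: inner fields of the partition struct get IDs starting at 1000.
  static final int PARTITION_ID_START = 1000;

  // Assumption: data columns are assigned IDs 1..columnCount.
  static Set<Integer> columnIds(int columnCount) {
    Set<Integer> ids = new TreeSet<>();
    for (int id = 1; id <= columnCount; id++) {
      ids.add(id);
    }
    return ids;
  }

  // IDs of the partition struct's inner fields: 1000, 1001, ...
  static Set<Integer> partitionFieldIds(int partitionFieldCount) {
    Set<Integer> ids = new TreeSet<>();
    for (int i = 0; i < partitionFieldCount; i++) {
      ids.add(PARTITION_ID_START + i);
    }
    return ids;
  }

  // Partition field IDs that are also used by a data column.
  static Set<Integer> collisions(Set<Integer> columnIds, Set<Integer> partitionIds) {
    Set<Integer> result = new TreeSet<>(partitionIds);
    result.retainAll(columnIds);
    return result;
  }

  // Hypothetical remedy: map each colliding partition ID to a fresh ID
  // above every ID already in use. Non-colliding IDs are left untouched.
  static Map<Integer, Integer> reassign(Set<Integer> columnIds, Set<Integer> partitionIds) {
    Set<Integer> used = new HashSet<>(columnIds);
    used.addAll(partitionIds);
    int next = used.stream().max(Integer::compare).orElse(0) + 1;
    Map<Integer, Integer> mapping = new LinkedHashMap<>();
    for (int id : new TreeSet<>(partitionIds)) {
      if (columnIds.contains(id)) {
        mapping.put(id, next++);
      }
    }
    return mapping;
  }

  public static void main(String[] args) {
    Set<Integer> columns = columnIds(1001);       // table with more than 1000 columns
    Set<Integer> partition = partitionFieldIds(2); // two partition fields
    System.out.println("collisions: " + collisions(columns, partition));
    System.out.println("reassigned: " + reassign(columns, partition));
  }
}
```

With fewer than 1000 columns the two ID ranges never overlap, which is consistent with the problem only showing up on very wide tables.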

Spark: support read of partition metadata column when table is over 1k columns

84f9342

github-actions bot added the spark label Jun 21, 2024

Maintain forward compatibility

a1243d1

dramaticlly closed this Jun 21, 2024

dramaticlly reopened this Jun 21, 2024

dramaticlly mentioned this pull request Jun 21, 2024

Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns #10020

Merged

szehon-ho reviewed Jun 27, 2024

Address review feedback

62e1f15

Refactor idsToReassign

a95769b

szehon-ho reviewed Jun 29, 2024

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java (outdated, resolved)

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java (resolved)

Leverage Sets.difference

d0dbe96

szehon-ho reviewed Jul 1, 2024

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java (outdated, resolved)

Only deduplicate if _partition was explicitly requested

aa895cd

szehon-ho reviewed Jul 2, 2024

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java (outdated, resolved)

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java (outdated, resolved)

Only index _partition inner struct

452f297

szehon-ho reviewed Jul 2, 2024

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java (outdated, resolved)

Combine partition field lookup

bb851a6

szehon-ho reviewed Jul 3, 2024

spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java (resolved)

Style fix with optional.get

c33448e

szehon-ho approved these changes Jul 3, 2024

szehon-ho merged commit 6223708 into apache:main Jul 5, 2024
35 checks passed

dramaticlly deleted the partitionAndOver1kColumns branch July 5, 2024 17:07

dramaticlly mentioned this pull request Jul 5, 2024

Spark 3.3/3.4: support read of partition metadata column when table is over 1k #10641

Merged

jasonf20 pushed a commit to jasonf20/iceberg that referenced this pull request Aug 4, 2024

Spark 3.5: Support read of partition metadata column when table has over 1k columns (apache#10547)

ee8f17d
