Spark 3.3: Re-Enable TwoLevel Parquet List UT by singhpk234 · Pull Request #5179 · apache/iceberg

singhpk234 · 2022-07-02T03:47:59Z

About the changes

Addresses #5094 (comment)
Spark was writing a 3-level list rather than the 2-level list that the UT expects.

On debugging this further, I found the cause: the schema is passed via spark.read().schema(sparkSchema).json, and as of Spark 3.3, Spark by default does not respect the nullability declared in a schema passed this way (ref. this).

Since the nullability is not respected by default (every field is treated as nullable), the Parquet writer writes a 3-level list even though writeLegacyParquetFormat is true. CodePointer
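The writer's choice between the two layouts can be sketched as a small model. This is a minimal illustration, assuming my reading of Spark's legacy Parquet schema conversion is correct: with the legacy format enabled, an array with non-nullable elements becomes a 2-level list, while nullable elements force an intermediate repeated group. The class name, method name, and text output are hypothetical. The field names (`bag`, `array`, `list`, `element`) follow the conventions I believe Spark and Parquet use, and the outer `optional` assumes the array column itself is nullable.

```java
public class ListLayoutSketch {

  /**
   * Simplified model of how a Spark array column maps to a Parquet list.
   * Not Spark code: the method and its text output exist only for illustration.
   */
  public static String parquetList(
      String name, String elementType, boolean containsNull, boolean writeLegacyFormat) {
    if (writeLegacyFormat && !containsNull) {
      // Legacy 2-level list: the repeated field holds the elements directly.
      return "optional group " + name + " (LIST) {\n"
          + "  repeated " + elementType + " array;\n"
          + "}";
    }
    if (writeLegacyFormat) {
      // Legacy format with nullable elements: an intermediate repeated group
      // is needed so each element can be null, which gives three levels.
      return "optional group " + name + " (LIST) {\n"
          + "  repeated group bag {\n"
          + "    optional " + elementType + " array;\n"
          + "  }\n"
          + "}";
    }
    // Standard 3-level list used when the legacy format is off.
    String repetition = containsNull ? "optional" : "required";
    return "optional group " + name + " (LIST) {\n"
        + "  repeated group list {\n"
        + "    " + repetition + " " + elementType + " element;\n"
        + "  }\n"
        + "}";
  }

  public static void main(String[] args) {
    // Elements declared non-nullable and honoured: 2-level list.
    System.out.println(parquetList("arr", "int32", false, true));
    // Elements forced to nullable (the Spark 3.3 default here): 3-level list.
    System.out.println(parquetList("arr", "int32", true, true));
  }
}
```

In this model the only input that changes between the passing and the failing test run is `containsNull`, which matches the diagnosis: the legacy flag stayed on, but the element nullability was silently widened.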

This PR adds the conf that makes Spark respect the provided nullability, which preserves the existing behaviour.
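For readers who want the settings spelled out, here is a hedged sketch of the session conf involved. The key names are assumptions based on my understanding of the Spark documentation and are not quoted from this PR's diff: `spark.sql.parquet.writeLegacyFormat` for the legacy writer, and `spark.sql.legacy.respectNullabilityInTextDatasetConversion` for the Spark 3.3 nullability behaviour. The class and helper names are hypothetical, and `effectiveContainsNull` only models the behaviour described above; it calls no Spark code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoLevelListTestConf {

  // Assumed conf keys; verify them against the Spark version in use.
  public static final String WRITE_LEGACY_FORMAT = "spark.sql.parquet.writeLegacyFormat";
  public static final String RESPECT_NULLABILITY =
      "spark.sql.legacy.respectNullabilityInTextDatasetConversion";

  /** Conf entries a test session would need in order to produce a 2-level list. */
  public static Map<String, String> sessionConf() {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put(WRITE_LEGACY_FORMAT, "true");
    conf.put(RESPECT_NULLABILITY, "true");
    return conf;
  }

  /**
   * Models the element nullability seen by the writer after
   * spark.read().schema(...).json(...): unless the conf asks Spark to respect
   * the declared nullability, the elements are treated as nullable.
   */
  public static boolean effectiveContainsNull(
      boolean declaredContainsNull, Map<String, String> conf) {
    boolean respect = Boolean.parseBoolean(conf.getOrDefault(RESPECT_NULLABILITY, "false"));
    return respect ? declaredContainsNull : true;
  }

  public static void main(String[] args) {
    Map<String, String> defaults = new LinkedHashMap<>();
    System.out.println(effectiveContainsNull(false, defaults));      // nullability widened
    System.out.println(effectiveContainsNull(false, sessionConf())); // nullability preserved
  }
}
```

In a real test, each entry would be passed to `SparkSession.builder().config(key, value)` before the session is created.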

P.S.: A good long-term fix would be to stop specifying the schema this way in our tests and test utils; I can pick this up in a follow-up.

Testing Done

Re-enabled the UT, which was ignored during the version upgrade.

cc @rdblue

rdblue · 2022-07-03T20:10:47Z

Thanks, @singhpk234! Looks great.

Co-authored-by: Prashant Singh <psinghvk@amazon.com>

github-actions bot added the spark label Jul 2, 2022

singhpk234 changed the title ~~Spark 3.3: Re-Enable TwoLevel List in Parquet UT~~ Spark 3.3: Re-Enable TwoLevel Parquet List UT Jul 2, 2022

ReEnable TwoLevel parquet list UT

8fdb98f

singhpk234 force-pushed the fix/re-enable-two-level-list-ut branch from 27ed9cb to 8fdb98f Compare July 2, 2022 04:24

rdblue approved these changes Jul 3, 2022

rdblue merged commit 36d0b91 into apache:master Jul 3, 2022

namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022

Spark 3.3: Re-enable 2-level Parquet list test (apache#5179)

6321c64

Co-authored-by: Prashant Singh <psinghvk@amazon.com>

namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022

Spark 3.3: Re-enable 2-level Parquet list test (apache#5179)

5f4f58a

Co-authored-by: Prashant Singh <psinghvk@amazon.com>
