Core: Optimize manifest evaluation for super wide tables #9147

irshadcc · 2023-11-24T13:25:42Z

During the snapshot commit process in MergingSnapshotProducer, Iceberg tries to merge manifest files to improve planning performance. For operations such as overwrite, MergingSnapshotProducer finds the manifest files that match deleteExpression and rewrites them, filtering out the deleted manifest entries.

https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L330

While finding the manifests that match deleteExpression, we found that a new Schema is created every time the expression needs to be bound. This has proven very expensive when there is a large number of manifest files (~25,000) for a super wide table (~35,000 columns).

https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/NamedReference.java#L41

To optimise manifest evaluation, we can add an asSchema() method to the StructType class so that a new Schema is not created every time the filter needs to be bound.
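To illustrate the idea, here is a minimal sketch of lazily building a schema once per struct and reusing it on every bind. It does not use Iceberg's classes: `SketchSchema`, `SketchStruct`, `findId`, and the `constructions` counter are hypothetical stand-ins, and the real `Schema` and `StructType` may differ in how they build and hold the cached object. The sketch is also not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema. Building it indexes every column by name,
// which is the costly step when a table has tens of thousands of columns.
final class SketchSchema {
  // Counts how many schemas have been built, for demonstration only.
  static int constructions = 0;

  private final Map<String, Integer> idsByName = new HashMap<>();

  SketchSchema(List<String> columnNames) {
    constructions++;
    for (int i = 0; i < columnNames.size(); i++) {
      idsByName.put(columnNames.get(i), i + 1);
    }
  }

  // Returns the 1-based id of the named column, or null if it is unknown.
  Integer findId(String name) {
    return idsByName.get(name);
  }
}

// Hypothetical stand-in for a struct type that caches its schema view.
final class SketchStruct {
  private final List<String> columnNames;
  private SketchSchema schema; // built on first use, then reused

  SketchStruct(List<String> columnNames) {
    this.columnNames = Collections.unmodifiableList(new ArrayList<>(columnNames));
  }

  // Builds the schema only on the first call; later calls return the same instance.
  SketchSchema asSchema() {
    if (schema == null) {
      schema = new SketchSchema(columnNames);
    }
    return schema;
  }
}

class AsSchemaSketch {
  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < 35_000; i++) {
      names.add("col_" + i);
    }
    SketchStruct struct = new SketchStruct(names);

    // Simulate resolving the same column reference once per manifest.
    for (int manifest = 0; manifest < 25_000; manifest++) {
      struct.asSchema().findId("col_42");
    }
    System.out.println("schemas built: " + SketchSchema.constructions);
  }
}
```

Without the cached field, each of the simulated binds would rebuild the name index over all columns, which is the cost pattern described above.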

irshadcc · 2023-11-24T13:29:14Z

Resolves issue #9118

irshadcc · 2023-11-30T10:44:13Z

@rdblue I would really appreciate it if you could take a look.

api/src/main/java/org/apache/iceberg/expressions/NamedReference.java

api/src/main/java/org/apache/iceberg/types/Types.java

Fokko

Thanks for raising this @irshadcc, this looks good to me. I've left two small comments, could you take a peek at those? Thanks for fixing this! 🙌

irshadcc · 2023-12-07T15:32:35Z

> Thanks for raising this @irshadcc, this looks good to me. I've left two small comments, could you take a peek at those? Thanks for fixing this! 🙌

I've added the Javadoc and removed the empty line.

irshadcc · 2023-12-09T13:00:33Z

@Fokko Can we merge this PR?

irshadcc · 2023-12-24T04:22:53Z

kind ping @Fokko

Fokko · 2023-12-24T14:07:33Z

Hey @irshadcc Sorry for the long wait here, and thanks for pinging me. Let's get this in 🚀

irshadcc added 2 commits November 24, 2023 07:59

Core: Optimise manifest evaluation

79bef36

Fix codestyles and import errors

25947b7

github-actions bot added the API label Nov 24, 2023

irshadcc mentioned this pull request Nov 24, 2023

Core : Optimise manifest evaluation for tables with large number of columns (30x faster) #9118

Closed

Fokko reviewed Dec 1, 2023

api/src/main/java/org/apache/iceberg/expressions/NamedReference.java (outdated, resolved)

Fokko reviewed Dec 1, 2023

api/src/main/java/org/apache/iceberg/types/Types.java (resolved)

Fokko approved these changes Dec 1, 2023

Remove space and add Java doc for toSchema()

829b073

irshadcc requested a review from Fokko December 9, 2023 13:00

Add java doc

05badc4

irshadcc force-pushed the core/optimise_manifest_evaluation branch from c10ff54 to 05badc4 Compare December 16, 2023 09:02

Fokko merged commit 197b61e into apache:main Dec 25, 2023
42 checks passed

lisirrx pushed a commit to lisirrx/iceberg that referenced this pull request Jan 4, 2024

Core: Optimize manifest evaluation for super wide tables (apache#9147)

f68c51c

geruh pushed a commit to geruh/iceberg that referenced this pull request Jan 26, 2024

Core: Optimize manifest evaluation for super wide tables (apache#9147)

87a6369

devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024

Core: Optimize manifest evaluation for super wide tables (apache#9147)

cbfb365
