Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Optimize manifest evaluation for super wide tables #9147

Merged
merged 4 commits into from
Dec 25, 2023

Conversation

irshadcc
Copy link
Contributor

@irshadcc irshadcc commented Nov 24, 2023

During the snapshot commit process of MergingSnapshotProducer, Iceberg tries to merge the manifest files to increase the planning performance. During operations like overwrite, MergingSnapshotProducer finds the matching manifest files by deleteExpression and rewrites the manifests by filtering out the deleted manifest entries.

https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L330

While finding the matching manifests by deleteExpression, we found that the Schema is created every time the expression needs to be bound. This has proven very expensive when there are large number of manifest files (~25,000 manifest files) for a super wide table (~35,000 columns).

https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/NamedReference.java#L41

For optimising the manifest evaluation, we can add a method called asSchema() in the StructType class to avoid creating the new Schema every time the filter needs to be bound.

@github-actions github-actions bot added the API label Nov 24, 2023
@irshadcc
Copy link
Contributor Author

Resolves issue #9118

@irshadcc
Copy link
Contributor Author

@rdblue I would really appreciate if you could take a look

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this @irshadcc, this looks good to me. I've left two small comments, could you take a peek at those? Thanks for fixing this! 🙌

@irshadcc
Copy link
Contributor Author

irshadcc commented Dec 7, 2023

Thanks for raising this @irshadcc, this looks good to me. I've left two small comments, could you take a peek at those? Thanks for fixing this! 🙌

I've added the Javadoc and removed the empty line.

@irshadcc
Copy link
Contributor Author

irshadcc commented Dec 9, 2023

@Fokko Can we merge this PR ?

@irshadcc irshadcc requested a review from Fokko December 9, 2023 13:00
@irshadcc irshadcc force-pushed the core/optimise_manifest_evaluation branch from c10ff54 to 05badc4 Compare December 16, 2023 09:02
@irshadcc
Copy link
Contributor Author

kind ping @Fokko

@Fokko
Copy link
Contributor

Fokko commented Dec 24, 2023

Hey @irshadcc Sorry for the long wait here, and thanks for pinging me. Let's get this in 🚀

@Fokko Fokko merged commit 197b61e into apache:main Dec 25, 2023
42 checks passed
lisirrx pushed a commit to lisirrx/iceberg that referenced this pull request Jan 4, 2024
geruh pushed a commit to geruh/iceberg that referenced this pull request Jan 26, 2024
devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants