fix: HUDI- Skip partition path extraction for non partitioned tables #17482
Open
PavithranRick wants to merge 2 commits into apache:master from
Describe the issue this Pull Request addresses
The current write-path logic performs partition path resolution and partition-switch checks for every record, even when the table is non-partitioned.
For large datasets (e.g., terabyte scale), this results in significant overhead because:
- The canWrite() check is invoked for every record through HoodieRowCreateHandle → BulkInsertDataInternalWriterHelper.

This PR optimizes the write path by completely bypassing partition path lookup and write-handle switching for non-partitioned tables.
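To make the overhead concrete, here is a minimal sketch of the per-record pattern and the proposed bypass. It is not Hudi's actual code: the class `BulkInsertWriterSketch`, its fields, and the row layout (`row[0]` is a key, `row[1]` is a partition value) are hypothetical names chosen for illustration, and the counters only stand in for the real cost of extracting a partition path and switching write handles.

```java
import java.util.function.Function;

/** Hypothetical sketch of a bulk-insert writer helper; not Hudi's actual classes. */
final class BulkInsertWriterSketch {
    private final boolean partitioned;
    private final Function<String[], String> partitionPathExtractor;
    private String currentPartitionPath = null; // null means no handle opened yet

    int partitionLookups = 0; // number of per-record partition path extractions
    int handleSwitches = 0;   // number of times a write handle was opened or switched
    int recordsWritten = 0;

    BulkInsertWriterSketch(boolean partitioned, Function<String[], String> extractor) {
        this.partitioned = partitioned;
        this.partitionPathExtractor = extractor;
    }

    void write(String[] row) {
        if (partitioned) {
            // Partitioned table: resolve the partition path for every record
            // and switch handles whenever it differs from the previous one.
            partitionLookups++;
            String partitionPath = partitionPathExtractor.apply(row);
            if (!partitionPath.equals(currentPartitionPath)) {
                handleSwitches++;
                currentPartitionPath = partitionPath;
            }
        } else if (currentPartitionPath == null) {
            // Non-partitioned table: open a single handle on the empty
            // partition path once, then never look at the row again.
            handleSwitches++;
            currentPartitionPath = "";
        }
        recordsWritten++;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"k1", "2024-01-01"}, {"k2", "2024-01-01"}, {"k3", "2024-01-02"}
        };
        for (boolean isPartitioned : new boolean[] {true, false}) {
            BulkInsertWriterSketch writer =
                new BulkInsertWriterSketch(isPartitioned, row -> row[1]);
            for (String[] row : rows) {
                writer.write(row);
            }
            System.out.println("partitioned=" + isPartitioned
                + " lookups=" + writer.partitionLookups
                + " switches=" + writer.handleSwitches);
        }
    }
}
```

In this sketch the non-partitioned branch does a constant amount of work per record regardless of row content, which is the effect the PR describes for non-partitioned tables.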
Summary and Changelog
This PR introduces the following improvements:
- Uses HoodieTable/HoodieTableMetaClient to determine if the table is partitioned.
- Skips partition path extraction and write-handle switching in canWrite(), BaseCreateHandle and HoodieRowCreateHandle for unpartitioned tables.

These changes significantly reduce CPU cost for bulk insert workloads on non-partitioned tables.
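One way the "is this table partitioned" decision could be computed once and then reused on the hot path is sketched below. This is an assumption-laden illustration, not the PR's implementation: `PartitionedTableCheckSketch` and `isPartitioned` are hypothetical names, and the property key used for the partition field list is assumed here and should be verified against the Hudi version in use.

```java
import java.util.Properties;

/** Hypothetical sketch; the PR's actual check lives in HoodieTable/HoodieTableMetaClient. */
final class PartitionedTableCheckSketch {
    // Assumed key for the table's partition field list; verify before relying on it.
    static final String PARTITION_FIELDS_KEY = "hoodie.table.partition.fields";

    /** Treats a table as partitioned only if a non-blank partition field list is set. */
    static boolean isPartitioned(Properties tableProps) {
        String fields = tableProps.getProperty(PARTITION_FIELDS_KEY);
        return fields != null && !fields.trim().isEmpty();
    }

    public static void main(String[] args) {
        Properties partitionedProps = new Properties();
        partitionedProps.setProperty(PARTITION_FIELDS_KEY, "region,dt");

        Properties flatProps = new Properties(); // no partition fields configured

        // Evaluate once per writer, then keep the result in a final field
        // so the per-record path only reads a boolean.
        final boolean partitionedTable = isPartitioned(partitionedProps);
        final boolean flatTable = isPartitioned(flatProps);
        System.out.println(partitionedTable + " " + flatTable);
    }
}
```

Caching the result in a final field at construction time keeps the per-record branch to a single boolean test, rather than re-reading table configuration for each row.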
Impact
Risk Level
Low
Documentation Update
None.
This PR does not introduce new configs or change user-facing behavior.
Contributor's checklist