fix: HUDI- Skip partition path extraction for non partitioned tables #17482

Open
PavithranRick wants to merge 2 commits into apache:master from PavithranRick:pavi-partition-canwrite

@PavithranRick
Contributor

Describe the issue this Pull Request addresses

Current write-path logic performs partition path resolution and partition-switch checks for every record, even when the table is non-partitioned.

For large datasets (e.g., terabyte scale), this results in significant overhead because:

  • Fetching the partition path requires materializing/deserializing the full binary record, which is expensive.
  • The canWrite() check on HoodieRowCreateHandle is invoked for every record through BulkInsertDataInternalWriterHelper.
  • For non-partitioned tables, this logic is unnecessary because the write handle never needs to switch.

This PR optimizes the write path by completely bypassing partition path lookup and write-handle switching for non-partitioned tables.
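
The per-record behavior described above can be sketched as follows. This is a minimal, self-contained model and not Hudi's implementation: `SketchBulkInsertWriter`, its counters, and the `partitionPathOf` function are hypothetical names introduced for illustration, and file-size-based handle rollover is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical stand-in for a bulk-insert writer helper; not Hudi's actual class. */
final class SketchBulkInsertWriter<R> {
    private final boolean partitioned;                 // decided once, before any record is written
    private final Function<R, String> partitionPathOf; // assumed to be the expensive per-record call
    private String lastPartitionPath;                  // partition of the currently open handle
    private boolean handleOpen;

    int extractions;   // how many times a partition path was materialized
    int handlesOpened; // how many write handles were created
    final List<R> written = new ArrayList<>();

    SketchBulkInsertWriter(boolean partitioned, Function<R, String> partitionPathOf) {
        this.partitioned = partitioned;
        this.partitionPathOf = partitionPathOf;
    }

    void write(R record) {
        if (!partitioned) {
            // Fast path: a single handle serves the whole table, so the
            // partition path is never extracted and no switch check runs.
            if (!handleOpen) {
                openHandle("");
            }
            written.add(record);
            return;
        }
        // Partitioned path: extract the partition and switch handles on change.
        extractions++;
        String path = partitionPathOf.apply(record);
        if (!handleOpen || !Objects.equals(lastPartitionPath, path)) {
            openHandle(path);
        }
        written.add(record);
    }

    private void openHandle(String path) {
        lastPartitionPath = path;
        handleOpen = true;
        handlesOpened++;
    }
}
```

In this model, a writer built with `partitioned = false` never calls `partitionPathOf`, so the cost of materializing the record is avoided entirely, while a partitioned writer behaves as before.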


Summary and Changelog

This PR introduces the following improvements:

  • Add early detection using HoodieTable / HoodieTableMetaClient to determine whether the table is partitioned.
  • If the table is non-partitioned, skip:
    • partition path extraction,
    • partition switch checks in canWrite(),
    • any record-level partition materialization.
  • Allow records to be streamed directly without triggering heavy byte-array deserialization.
  • Prevent unnecessary overhead in BaseCreateHandle and HoodieRowCreateHandle for non-partitioned tables.
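
One way the early detection could work is to inspect the table's configured partition fields once, before any record is written. The sketch below is an assumption for illustration only: `PartitionDetectionSketch` and `isPartitioned` are hypothetical, and the property key used here is assumed rather than taken from this PR's diff.

```java
import java.util.Properties;

/** Hypothetical helper; not part of this PR or of Hudi's API. */
final class PartitionDetectionSketch {
    // Assumed table property that lists partition columns, comma-separated.
    static final String PARTITION_FIELDS_KEY = "hoodie.table.partition.fields";

    private PartitionDetectionSketch() {
    }

    /** Returns true only when at least one partition field is configured. */
    static boolean isPartitioned(Properties tableProps) {
        String fields = tableProps.getProperty(PARTITION_FIELDS_KEY);
        return fields != null && !fields.trim().isEmpty();
    }
}
```

Evaluating this once per writer and caching the boolean keeps the check off the per-record path.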

These changes significantly reduce CPU cost for bulk insert workloads on non-partitioned tables.


Impact

  • Performance: Major improvement for non-partitioned tables, especially for binary-encoded record formats.
  • Behavior: No functional change for partitioned tables.
  • API: No user-facing API changes; logic is internal.

Risk Level

Low

  • Optimization path is only taken when the table is explicitly detected as non-partitioned.
  • Behavior for partitioned tables remains unchanged.
  • Write-path tests mitigate regression risk.

Documentation Update

None.
This PR does not introduce new configs or change user-facing behavior.


Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

PavithranRick changed the title from "feature: HUDI- Skip partition path extraction for non partitioned tables" to "bug: HUDI- Skip partition path extraction for non partitioned tables" on Dec 4, 2025
PavithranRick changed the title from "bug: HUDI- Skip partition path extraction for non partitioned tables" to "fix: HUDI- Skip partition path extraction for non partitioned tables" on Dec 4, 2025
github-actions bot added the "size:S" label (PR with lines of changes in (10, 100]) on Dec 4, 2025
@hudi-bot
Collaborator

hudi-bot commented Dec 4, 2025

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
