fix: HUDI- Skip partition path extraction for non partitioned tables by PavithranRick · Pull Request #17482 · apache/hudi

PavithranRick · 2025-12-04T04:18:39Z

Describe the issue this Pull Request addresses

Current write-path logic performs partition path resolution and partition-switch checks for every record, even when the table is non-partitioned.

For large datasets (e.g., terabyte scale), this results in significant overhead because:

- Fetching the partition path requires materializing/deserializing the full binary record, which is expensive.
- The canWrite() check is invoked for every record through HoodieRowCreateHandle → BulkInsertDataInternalWriterHelper.
- For non-partitioned tables, this logic is unnecessary because the write handle never needs to switch.

This PR optimizes the write path by completely bypassing partition path lookup and write-handle switching for non-partitioned tables.

Summary and Changelog

This PR introduces the following improvements:

- Add early detection using HoodieTable / HoodieTableMetaClient to determine if the table is partitioned.
- If the table is unpartitioned, skip:
  - partition path extraction,
  - partition switch checks in canWrite(),
  - any record-level partition materialization.
- Allow records to be streamed directly without triggering heavy byte-array deserialization.
- Prevent unnecessary overhead in BaseCreateHandle and HoodieRowCreateHandle for unpartitioned tables.

These changes significantly reduce CPU cost for bulk insert workloads on non-partitioned tables.
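
The idea in this changelog can be illustrated with a minimal, self-contained sketch. It is not the code in this PR and does not use Hudi classes: `PartitionSkipSketch`, `Writer`, `Handle`, and the `extractions` counter are hypothetical names, records are plain strings, and the empty string is assumed to stand for the single partition path of a non-partitioned table. The partitioned flag is decided once at construction, and the per-record path consults it before touching the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; none of these types are Hudi APIs. */
class PartitionSkipSketch {

    /** Stand-in for a write handle bound to one partition path. */
    static final class Handle {
        final String partitionPath;
        final List<String> rows = new ArrayList<>();

        Handle(String partitionPath) {
            this.partitionPath = partitionPath;
        }
    }

    /** Stand-in for a bulk-insert writer helper. */
    static final class Writer {
        // Decided once up front (assumption: derived from table metadata).
        private final boolean partitioned;
        // Stand-in for the expensive step that materializes the record.
        private final Function<String, String> partitionExtractor;
        final List<Handle> handles = new ArrayList<>();
        int extractions = 0; // counts how often the expensive step ran
        private Handle current;

        Writer(boolean partitioned, Function<String, String> partitionExtractor) {
            this.partitioned = partitioned;
            this.partitionExtractor = partitionExtractor;
        }

        void write(String record) {
            if (!partitioned) {
                // Fast path: one handle for the whole table, no extraction,
                // no partition-switch check.
                if (current == null) {
                    current = open("");
                }
                current.rows.add(record);
                return;
            }
            // Partitioned path: extract the path and switch handles on change.
            extractions++;
            String path = partitionExtractor.apply(record);
            if (current == null || !current.partitionPath.equals(path)) {
                current = open(path);
            }
            current.rows.add(record);
        }

        private Handle open(String path) {
            Handle handle = new Handle(path);
            handles.add(handle);
            return handle;
        }
    }
}
```

In this sketch the partitioned branch behaves exactly as it would without the flag, which mirrors the claim that partitioned tables see no functional change.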

Impact

Performance: Major improvement for non-partitioned tables, especially for binary-encoded record formats.
Behavior: No functional change for partitioned tables.
API: No user-facing API changes; logic is internal.

Risk Level

Low

Optimization path is only taken when the table is explicitly detected as non-partitioned.
Behavior for partitioned tables remains unchanged.
Write-path tests mitigate regression risk.

Documentation Update

None.
This PR does not introduce new configs or change user-facing behavior.

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

hudi-bot · 2025-12-04T06:35:56Z

CI report:

3b0a3ca Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

HUDI-9765 - Skip partition path extraction for non partitioned tables

877702e

PavithranRick changed the title ~~feature: HUDI- Skip partition path extraction for non partitioned tables~~ bug: HUDI- Skip partition path extraction for non partitioned tables Dec 4, 2025

PavithranRick changed the title ~~bug: HUDI- Skip partition path extraction for non partitioned tables~~ fix: HUDI- Skip partition path extraction for non partitioned tables Dec 4, 2025

HUDI-9765 - Skip partition path extraction for non partitioned tables

3b0a3ca

github-actions bot added the size:S PR with lines of changes in (10, 100] label Dec 4, 2025

PavithranRick wants to merge 2 commits into apache:master from PavithranRick:pavi-partition-canwrite
