[HUDI-8371] Fix Column Stats Record Key using full partition path #18315

Open
vamsikarnika wants to merge 2 commits into apache:release-0.14.2-prep from vamsikarnika:fix_col_stats_corruption

vamsikarnika (Collaborator) commented Mar 12, 2026

Describe the issue this Pull Request addresses

When the column stats index is enabled on a table that already has the FILES metadata partition initialized, listAllPartitionsFromMDT is used to bootstrap the column stats partition. The method was passing the absolute partition path (e.g., hdfs://host/table/partition1) as the first argument to DirectoryInfo instead of the relative path (e.g., partition1). As a result, the column stats index was keyed on the wrong paths, and column stats lookups during data skipping returned empty or incorrect results.
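To illustrate the failure mode, here is a minimal sketch of why a key built from the absolute path is never found by a lookup that uses the relative path. ColStatsKeySketch, its string-concatenated key, the column name price, and the file name f1.parquet are all hypothetical; Hudi's actual record key encoding is different, and the sketch assumes readers look up stats by relative partition path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a column stats index.
// This is NOT Hudi's real key encoding; it only shows the mismatch.
class ColStatsKeySketch {
    // Builds an illustrative key from column name, partition path, and file name.
    static String key(String column, String partitionPath, String fileName) {
        return column + "|" + partitionPath + "|" + fileName;
    }

    // Writes one stats entry using whatever partition path the writer was handed.
    static Map<String, String> buildIndex(String partitionPathUsedByWriter) {
        Map<String, String> index = new HashMap<>();
        index.put(key("price", partitionPathUsedByWriter, "f1.parquet"), "min=1,max=9");
        return index;
    }

    // Assumption: the reader side builds its key from the relative partition path.
    static String lookup(Map<String, String> index, String relativePartitionPath) {
        return index.get(key("price", relativePartitionPath, "f1.parquet"));
    }

    public static void main(String[] args) {
        // Index keyed on the absolute path: the lookup misses.
        System.out.println(lookup(buildIndex("hdfs://host/table/partition1"), "partition1"));
        // Index keyed on the relative path: the lookup hits.
        System.out.println(lookup(buildIndex("partition1"), "partition1"));
    }
}
```

In this sketch the miss is silent: the lookup simply returns nothing, which matches the symptom described above.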

Summary and Changelog

Fix: In HoodieBackedTableMetadataWriter.listAllPartitionsFromMDT, compute the relative partition path using FSUtils.getRelativePartitionPath(basePath, absolutePath) before constructing each DirectoryInfo, instead of passing the absolute map key directly.

Changes:

  • HoodieBackedTableMetadataWriter.java: Fixed listAllPartitionsFromMDT to use relative partition paths when constructing DirectoryInfo entries.
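The shape of the fix can be sketched as follows: convert each absolute map key to a path relative to the table base path before using it. This is a self-contained illustration under stated assumptions, not the code in this PR. relativePartitionPath is a hypothetical stand-in for FSUtils.getRelativePartitionPath that uses plain string handling, and the Map of file lists stands in for the partition listing from which the DirectoryInfo entries are built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RelativePartitionPathSketch {
    // Hypothetical stand-in for FSUtils.getRelativePartitionPath (string handling only).
    static String relativePartitionPath(String basePath, String absolutePartitionPath) {
        // Drop a trailing slash so "hdfs://host/table/" and "hdfs://host/table" behave the same.
        String base = basePath.endsWith("/")
            ? basePath.substring(0, basePath.length() - 1)
            : basePath;
        if (absolutePartitionPath.equals(base)) {
            return ""; // choice made for this sketch: base path itself maps to an empty partition path
        }
        if (!absolutePartitionPath.startsWith(base + "/")) {
            throw new IllegalArgumentException(absolutePartitionPath + " is not under " + basePath);
        }
        return absolutePartitionPath.substring(base.length() + 1);
    }

    // Re-keys a listing from absolute partition paths to relative partition paths.
    static Map<String, List<String>> toRelativeKeys(
            String basePath, Map<String, List<String>> filesByAbsolutePath) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : filesByAbsolutePath.entrySet()) {
            result.put(relativePartitionPath(basePath, e.getKey()), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byAbsolute = new LinkedHashMap<>();
        byAbsolute.put("hdfs://host/table/partition1", Arrays.asList("f1.parquet"));
        byAbsolute.put("hdfs://host/table/2024/01", Arrays.asList("f2.parquet", "f3.parquet"));
        System.out.println(toRelativeKeys("hdfs://host/table", byAbsolute).keySet());
    }
}
```

Throwing on a path outside the base path is a choice made for the sketch so that a bad key fails loudly instead of producing another silently wrong index entry.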

Impact

No public API or config changes. Users who enable column stats on an existing table (i.e., the FILES partition was already initialized but the column stats partition was not) will now get a correctly populated column stats index, so data skipping works as expected instead of silently returning no stats.

Risk Level

Low

Documentation Update

NA

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

github-actions bot added the size:M (PR with lines of changes in (100, 300]) label Mar 12, 2026
hudi-bot (Collaborator) commented

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

linliu-code (Collaborator) left a comment

LGTM
