Fix Inconsistent key encoding for bootstrapping column stats in 0.x #18376

@vamsikarnika

Bug Description

What happened:
When column stats is enabled on a 0.x-branch table that already contains file partitions, the key encoding used during bootstrapping considers the entire partition path, while readers consider only the partition value. The keys written at bootstrap therefore do not match the keys that readers look up.

This does not affect data consistency or correctness, but it can reduce the effectiveness of column stats pruning.
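To illustrate why a key mismatch costs pruning but not correctness, here is a minimal, self-contained sketch. It is not Hudi's actual implementation: the hash function (truncated SHA-256), the key layout, the helper names, and the example partition strings are all assumptions chosen only to show the effect of encoding different partition strings on the write and read sides.

```python
import base64
import hashlib


def _hash_id(value: str) -> str:
    """Stand-in for an index ID: base64 of a truncated SHA-256 digest.

    Assumption: the real hash differs; only determinism matters here.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()[:8]
    return base64.b64encode(digest).decode("ascii")


def col_stats_key(column: str, partition: str, file_name: str) -> str:
    """Concatenate per-component IDs into one lookup key (hypothetical layout)."""
    return _hash_id(column) + _hash_id(partition) + _hash_id(file_name)


# Hypothetical inputs: what each side passes as the "partition" component.
writer_partition = "/data/tbl/2024-01-01"  # entire partition path (bootstrap)
reader_partition = "2024-01-01"            # partition value only (read path)

# Index as written during bootstrapping.
index = {
    col_stats_key("price", writer_partition, "f1.parquet"): {"min": 1, "max": 9}
}

# The reader builds its key from the partition value and finds no entry.
lookup_key = col_stats_key("price", reader_partition, "f1.parquet")
stats = index.get(lookup_key)

# With no stats available the file cannot be skipped, so it is simply read:
# results stay correct, but the pruning opportunity is lost.
must_read_file = stats is None
```

In this sketch the stats for `f1.parquet` exist in the index, but under a key the reader never computes, so the lookup falls through and the file is scanned.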

What you expected:

Steps to reproduce:

  1. Create a Hudi table with metadata table enabled, but column stats disabled
  2. After a commit, enable column stats
  3. Inspect the column stats keys written during bootstrapping: they are encoded from the entire partition path and differ from the expected keys, which use the partition value.
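The steps above can be sketched as two sets of write options, one per commit. This is a hedged outline rather than a verified reproduction: the table name and the field names `id`, `dt`, and `ts` are hypothetical, and the config keys are the ones I understand 0.14/0.15 to use for the metadata table and the column stats index.

```python
def hudi_write_options(table_name: str, column_stats: bool) -> dict:
    """Build Hudi write options with the metadata table always enabled.

    Field names (id, dt, ts) are placeholders for an example schema.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": "id",
        "hoodie.datasource.write.partitionpath.field": "dt",
        "hoodie.datasource.write.precombine.field": "ts",
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.index.column.stats.enable": str(column_stats).lower(),
    }


# Step 1: first commit, metadata table on, column stats off.
step1 = hudi_write_options("col_stats_repro", column_stats=False)

# Step 2: later commit with column stats on, which bootstraps the index
# over the partitions and files that already exist.
step2 = hudi_write_options("col_stats_repro", column_stats=True)

# With a SparkSession and a Hudi bundle on the classpath, each step would be
# a write along the lines of:
#   df.write.format("hudi").options(**step1).mode("append").save(base_path)
```

Only the column stats flag changes between the two commits, which isolates the bootstrapping path as the source of the keys under inspection in step 3.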

Environment

Hudi version: 0.14, 0.15
Query engine: (Spark/Flink/Trino etc)
Relevant configs:

Logs and Stack Trace

No response

Labels: type:bug
