Fix reading partition path field on metadata bootstrap table #17358

@hudi-bot

Description

When strict data validation is added to testMetadataBootstrapMORPartitionedInlineCompactionOn, it reveals that reading the partition path field fails (returns null) for some update records.


Comments

10/Jan/25 00:12;yihua;The test was added in #12490. Right now the validation excludes the partition column; when that column is included, the validation fails.

 
```scala
def assertDfEquals(df1: DataFrame, df2: DataFrame): Unit = {
  assertEquals(df1.count, df2.count)
  // TODO(HUDI-8723): fix reading partition path field on metadata bootstrap table
  assertEquals(0, df1.drop(partitionColName).except(df2.drop(partitionColName)).count)
  assertEquals(0, df2.drop(partitionColName).except(df1.drop(partitionColName)).count)
}
```
;;;


10/Jan/25 00:26;daviszhang;So we can remove the .drop(partitionColName) calls in the validation function you mentioned. I ran all tests in the test suite and they are all green. Assigned back to you.;;;


28/Jan/25 01:05;yihua;This is still an issue when reading the partition column value from a bootstrapped file slice (merging skeleton and data files) using only the file group reader. Deferring this ticket to the 1.0.2 release.;;;
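
For reference, the strict form of the validation discussed in this thread could look roughly like the sketch below. It is not the code in the Hudi test suite: the object and method names (`StrictBootstrapValidation`, `countNullPartitionValues`, `assertDfEqualsStrict`) are hypothetical, `partitionColName` is passed as a parameter rather than read from the test class, and Scala's built-in `assert` stands in for JUnit's `assertEquals` so the snippet only depends on Spark. It keeps the partition column in both directions of the comparison and first reports how many rows came back with a null partition value, which is the symptom described in this issue.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper object; names are illustrative, not part of Hudi.
object StrictBootstrapValidation {

  // Counts rows whose partition column was read back as null.
  def countNullPartitionValues(df: DataFrame, partitionColName: String): Long =
    df.filter(col(partitionColName).isNull).count()

  // Strict comparison: the partition column is NOT dropped before the
  // set difference, so a null partition value on either side is caught.
  def assertDfEqualsStrict(expected: DataFrame,
                           actual: DataFrame,
                           partitionColName: String): Unit = {
    assert(expected.count() == actual.count(), "row counts differ")

    val nullRows = countNullPartitionValues(actual, partitionColName)
    assert(nullRows == 0, s"$nullRows rows have a null '$partitionColName'")

    // Dataset.except compares rows by column position, so both
    // DataFrames are assumed to share the same column order.
    assert(expected.except(actual).count() == 0, "rows missing from actual")
    assert(actual.except(expected).count() == 0, "unexpected rows in actual")
  }
}
```

Running the null-count check before the set difference makes a failure point directly at the partition column instead of surfacing only as a generic row mismatch.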
