Description
When strict data validation is added to testMetadataBootstrapMORPartitionedInlineCompactionOn, it reveals that reading the partition path field fails (returns null) for some update records.
JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-8837
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9108
- Fix version(s):
- 1.1.0
Comments
10/Jan/25 00:12;yihua;The test was added in #12490. The validation currently excludes the partition column; when that column is included, the validation fails.
{code:java}
def assertDfEquals(df1: DataFrame, df2: DataFrame): Unit = {
  assertEquals(df1.count, df2.count)
  // TODO(HUDI-8723): fix reading partition path field on metadata bootstrap table
  assertEquals(0, df1.drop(partitionColName).except(df2.drop(partitionColName)).count)
  assertEquals(0, df2.drop(partitionColName).except(df1.drop(partitionColName)).count)
}
{code}
;;;
10/Jan/25 00:26;daviszhang;So we can remove the .drop(partitionColName) calls in the validation function you mentioned. I ran all tests in the test suite and they are all green. Assigned back to you.;;;
28/Jan/25 01:05;yihua;This is still an issue when reading the partition column value from a bootstrapped file slice (merging skeleton and data files) using only the file group reader. Deferring this ticket to the 1.0.2 release.;;;
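
For anyone reproducing the failure described in the comments above, the following is a minimal sketch of a stricter comparison that keeps the partition column and also counts null partition values. The helper name `partitionAwareDiff` and its return shape are hypothetical and are not part of the Hudi test suite; only standard Spark `DataFrame` calls (`filter`, `except`, `count`) are used. Note that `except` has set semantics, so duplicate rows are not detected by this check.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper (not from the Hudi code base): compares two DataFrames
// WITHOUT dropping the partition column, and reports how many rows in `actual`
// have a null partition value.
// Returns (nullPartitionRows, rowsMissingFromActual, rowsUnexpectedInActual).
def partitionAwareDiff(expected: DataFrame,
                       actual: DataFrame,
                       partitionColName: String): (Long, Long, Long) = {
  // Rows whose partition path field was read back as null.
  val nullPartitionRows = actual.filter(col(partitionColName).isNull).count()
  // Distinct rows present in `expected` but absent from `actual`.
  val missing = expected.except(actual).count()
  // Distinct rows present in `actual` but absent from `expected`.
  val unexpected = actual.except(expected).count()
  (nullPartitionRows, missing, unexpected)
}
```

A fully passing strict validation would correspond to a result of `(0, 0, 0)`; a non-zero first element points at the null partition path symptom reported in this ticket.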