[HUDI-5394] Fix tests for RowCustomColumnsSortPartitioner by xushiyan · Pull Request #8741 · apache/hudi

xushiyan · 2023-05-17T14:37:27Z

Change Logs

Fix the test to treat RowCustomColumnsSortPartitioner as a global sort partitioner.

This blocks #8445

Impact

NA

Risk level

Low.

Documentation Update

NA

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

xushiyan · 2023-05-17T14:59:56Z

...test/java/org/apache/hudi/execution/bulkinsert/TestBulkInsertInternalPartitionerForRows.java

                                                boolean populateMetaFields) {
-    Dataset<Row> records1 = generateTestRecords();
-    Dataset<Row> records2 = generateTestRecords();
+    Dataset<Row> records = generateTestRecords();


Not sure why the existing test case runs the same logic twice, once with records1 and once with records2. @boneanxs, any thoughts?

testCustomColumnSortPartitionerWithRows was copied from testBulkInsertInternalPartitioner. Looking at org.apache.hudi.execution.bulkinsert.TestBulkInsertInternalPartitioner#testBulkInsertInternalPartitioner:177, that test actually generates two record sets with different numbers of unions:

JavaRDD<HoodieRecord> records1 = generateTestRecordsForBulkInsert(jsc);
JavaRDD<HoodieRecord> records2 = generateTripleTestRecordsForBulkInsert(jsc);

So I think this is a mistake, and a single record set unioned twice should be enough. (Perhaps the different union counts in the original test were meant to exercise different partitions?)
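To make the union-count point concrete, here is a plain-Java sketch (no Spark) of why the Row test's second run added nothing. The helper unionTimes and the copy counts 2 and 3 are hypothetical stand-ins, not Hudi's generateTestRecordsForBulkInsert or generateTripleTestRecordsForBulkInsert; the only assumption taken from the comment above is that the RDD test's two fixtures differ in how many times the base set is unioned, while the Row test built both fixtures the same way.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionFixtureSketch {

  // Hypothetical stand-in for a test-data generator: returns `copies`
  // concatenated copies of `base`, mimicking a dataset unioned with itself.
  static <T> List<T> unionTimes(List<T> base, int copies) {
    List<T> out = new ArrayList<>(base.size() * copies);
    for (int i = 0; i < copies; i++) {
      out.addAll(base);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> base = List.of("k1", "k2", "k3");
    // RDD-style test: two fixtures with different union counts (assumed 2x and 3x).
    List<String> records1 = unionTimes(base, 2);
    List<String> records2 = unionTimes(base, 3);
    // Row-style test before the fix: both fixtures came from the same generator,
    // so the second run exercised exactly the same input as the first.
    List<String> rows1 = unionTimes(base, 2);
    List<String> rows2 = unionTimes(base, 2);
    System.out.println(records1.size() + " vs " + records2.size()); // 6 vs 9
    System.out.println(rows1.equals(rows2)); // true
  }
}
```

Because the two Row fixtures are equal, running the assertions twice only duplicates work, which is why collapsing them into a single records dataset loses no coverage.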

There is no change to any code in the write path, so why does the test run successfully on Spark 3.1 or 2.4?

The test passing on Spark 2.4 is only a coincidence. The existing test logic asserts exactly 2 RDD partitions after repartitioning by the partitioner. With Spark 2.4's sort and coalesce, the result happens to be 2 partitions, so the test passes while treating the partitioner as a local (non-global) one. The correct expectation is that the partitioner performs a global sort, so the resulting number of partitions should be 2 or fewer, which is what Spark 3 gives us.
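A sketch of why "2 or fewer" is the safer expectation for a global sort. One plausible way a global sort ends up with fewer partitions than requested is range partitioning over a small key domain; the plain-Java code below models that under a simplifying assumption (never more ranges than distinct keys). It is not Spark's RangePartitioner or Hudi's test code, and rangePartition and partitionCountOk are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class GlobalSortPartitionCountSketch {

  // Simplified range partitioning (an assumption): at most `requested` ranges,
  // but never more ranges than distinct keys, so fewer partitions can result.
  static List<List<Integer>> rangePartition(int[] keys, int requested) {
    TreeSet<Integer> distinctSet = new TreeSet<>();
    for (int k : keys) {
      distinctSet.add(k);
    }
    List<Integer> distinct = new ArrayList<>(distinctSet);
    int n = Math.max(1, Math.min(requested, distinct.size()));
    // Inclusive upper bound of every range except the last (strictly increasing).
    int[] bounds = new int[n - 1];
    for (int i = 0; i < n - 1; i++) {
      bounds[i] = distinct.get((i + 1) * distinct.size() / n - 1);
    }
    List<List<Integer>> parts = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      parts.add(new ArrayList<>());
    }
    for (int k : keys) {
      int idx = Arrays.binarySearch(bounds, k);
      parts.get(idx >= 0 ? idx : -idx - 1).add(k); // first range whose bound >= k
    }
    for (List<Integer> part : parts) {
      part.sort(null); // natural order within each range
    }
    return parts;
  }

  // Global sort: any count from 1 up to the requested number is acceptable.
  // Non-global: the old expectation described in the thread (exactly `requested`).
  static boolean partitionCountOk(int actual, int requested, boolean globallySorted) {
    return globallySorted ? actual >= 1 && actual <= requested : actual == requested;
  }

  public static void main(String[] args) {
    // Only one distinct key, so the range-based sort yields a single partition.
    List<List<Integer>> parts = rangePartition(new int[] {7, 7, 7, 7}, 2);
    System.out.println(parts.size());                             // 1
    System.out.println(partitionCountOk(parts.size(), 2, true));  // true
    System.out.println(partitionCountOk(parts.size(), 2, false)); // false
  }
}
```

An exact-count assertion would reject the single-partition result above even though the data is correctly and globally sorted, which is the kind of false failure the thread describes under Spark 3.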

hudi-bot · 2023-05-17T22:49:52Z

CI report:

4cd14cd Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

xushiyan · 2023-05-18T03:52:37Z

...test/java/org/apache/hudi/execution/bulkinsert/TestBulkInsertInternalPartitionerForRows.java

-        records2, true, false, true, generateExpectedPartitionNumRecords(records2), Option.of(comparator), true);
+        records, true, true, true, generateExpectedPartitionNumRecords(records), Option.of(comparator), true);


The existing test treated the partitioner as non-global and hence failed this test scenario under Spark 3.2.
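In the diff above, the middle boolean argument flips from false to true; per this comment, that tells the shared test helper to treat the partitioner as global. The helper's parameter name and internals are not shown in this thread, so the following is only a hypothetical sketch (isSorted is an illustrative name) of what a local versus a global sort check distinguishes.

```java
import java.util.Comparator;
import java.util.List;

public class SortCheckSketch {

  // Hypothetical verifier: every partition must be sorted by `cmp`; when
  // `globallySorted` is true, the last record of each non-empty partition must
  // also not exceed the first record of the next non-empty partition.
  static <T> boolean isSorted(List<List<T>> partitions, Comparator<? super T> cmp,
                              boolean globallySorted) {
    T prevLast = null;
    boolean havePrev = false;
    for (List<T> part : partitions) {
      for (int i = 1; i < part.size(); i++) {
        if (cmp.compare(part.get(i - 1), part.get(i)) > 0) {
          return false; // out of order within a partition
        }
      }
      if (part.isEmpty()) {
        continue;
      }
      if (globallySorted && havePrev && cmp.compare(prevLast, part.get(0)) > 0) {
        return false; // out of order across a partition boundary
      }
      prevLast = part.get(part.size() - 1);
      havePrev = true;
    }
    return true;
  }

  public static void main(String[] args) {
    Comparator<Integer> cmp = Integer::compare;
    // Each partition is sorted locally, but the two key ranges overlap.
    List<List<Integer>> locallySorted = List.of(List.of(1, 5), List.of(2, 6));
    System.out.println(isSorted(locallySorted, cmp, false)); // true
    System.out.println(isSorted(locallySorted, cmp, true));  // false
  }
}
```

The global check is strictly stronger, so switching the flag to true tightens the ordering assertion even as the partition-count expectation is relaxed.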

boneanxs

Sorry for the mistake. +1 for this

boneanxs · 2023-05-18T04:13:56Z

testCustomColumnSortPartitionerWithRows was copied from testBulkInsertInternalPartitioner. Looking at org.apache.hudi.execution.bulkinsert.TestBulkInsertInternalPartitioner#testBulkInsertInternalPartitioner:177, that test actually generates two record sets with different numbers of unions:

JavaRDD<HoodieRecord> records1 = generateTestRecordsForBulkInsert(jsc);
JavaRDD<HoodieRecord> records2 = generateTripleTestRecordsForBulkInsert(jsc);

So I think this is a mistake, and a single record set unioned twice should be enough. (Perhaps the different union counts in the original test were meant to exercise different partitions?)

danny0405

+1

[HUDI-5394] Fix tests for RowCustomColumnsSortPartitioner

4cd14cd

xushiyan commented May 17, 2023

xushiyan requested a review from yihua May 17, 2023 17:15

xushiyan commented May 18, 2023

boneanxs approved these changes May 18, 2023

xushiyan requested review from danny0405 and removed request for yihua May 18, 2023 07:27

danny0405 approved these changes May 18, 2023

xushiyan merged commit 9ef7bd8 into apache:master May 18, 2023

xushiyan deleted the HUDI-5394-fix-partitioner-ut branch May 18, 2023 11:08

xushiyan added the priority:high label (Significant impact; potential bugs) May 18, 2023

xushiyan merged 1 commit into apache:master from xushiyan:HUDI-5394-fix-partitioner-ut


4 participants