[HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords #8300
@danny0405 Can you help review it? Thanks!
...ient/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java
Thanks for the contribution, I have reviewed and attached a patch which is based on the latest master:
@danny0405 I think it would be more reasonable to use a list: put each field value in the list, and then compare the two lists item by item, using each item's own comparison function.
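To illustrate the suggestion above, here is a minimal sketch of item-by-item list comparison versus comparing a joined string key. `ColumnListCompare`, `compareColumns`, and `joinedKey` are hypothetical names used only for illustration; they are not part of Hudi, and the sketch assumes both rows have the same number of non-null, mutually comparable values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hudi.
final class ColumnListCompare {

  // Compares two rows of sort-column values element by element.
  // Assumes the values are non-null and mutually comparable per position.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static int compareColumns(List<? extends Comparable> left, List<? extends Comparable> right) {
    int n = Math.min(left.size(), right.size());
    for (int i = 0; i < n; i++) {
      int c = left.get(i).compareTo(right.get(i));
      if (c != 0) {
        return c;
      }
    }
    // All shared positions are equal: the shorter row sorts first.
    return Integer.compare(left.size(), right.size());
  }

  // Joins the values with commas, mimicking a string-based sort key.
  static String joinedKey(List<?> row) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < row.size(); i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(row.get(i));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Integer> a = Arrays.asList(2, 13);
    List<Integer> b = Arrays.asList(10, 1);
    // String keys compare character by character: "2,13" sorts after "10,1".
    System.out.println(joinedKey(a).compareTo(joinedKey(b)) > 0);
    // Item-by-item comparison uses the integer order: (2, 13) sorts before (10, 1).
    System.out.println(compareColumns(a, b) < 0);
  }
}
```

Both lines print `true`, which shows the two approaches disagreeing on the order of the same pair of rows.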
return StringUtils.objToString(recordValue);
}
Object[] columnValues = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
return FlatLists.ofComparableArray(columnValues);
Do NULLs still obey the null_first order? Could nulls throw an exception here?
done
I'm not asking you to throw an exception for nulls here. The original code has NULL_FIRST semantics, which means a null is always greater than any non-null value.
No exception will be thrown here, it's null_first.
The default behavior is null_last. The original comment is wrong: the code returned an empty string for nulls, and an empty string should always be smaller than any non-empty string.
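As a generic illustration of the null-ordering terms used in this thread, the sketch below uses the JDK's `Comparator.nullsFirst` and `Comparator.nullsLast`. It does not show how `FlatLists.ComparableList` treats nulls, which is not stated here; `NullOrderingDemo` and `sorted` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; uses only JDK comparators, not Hudi code.
final class NullOrderingDemo {

  // Nulls sort before every non-null value.
  static final Comparator<String> NULLS_FIRST =
      Comparator.nullsFirst(Comparator.<String>naturalOrder());

  // Nulls sort after every non-null value.
  static final Comparator<String> NULLS_LAST =
      Comparator.nullsLast(Comparator.<String>naturalOrder());

  // Returns a sorted copy so the input list is left untouched.
  static List<String> sorted(List<String> values, Comparator<String> order) {
    List<String> copy = new ArrayList<>(values);
    copy.sort(order);
    return copy;
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("b", null, "a");
    System.out.println(sorted(values, NULLS_FIRST)); // [null, a, b]
    System.out.println(sorted(values, NULLS_LAST));  // [a, b, null]
    // Mapping null to "" places it before any non-empty string.
    System.out.println("".compareTo("a") < 0);       // true
  }
}
```

Replacing nulls with an empty string, as the original code is described as doing, therefore behaves like nulls-first for non-empty values, but it cannot distinguish a null from a real empty string.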
We should fix JavaCustomColumnsSortPartitioner too.
return values1.toString().compareTo(values2.toString());
FlatLists.ComparableList<Comparable> cmp1 = FlatLists.ofComparableArray(
HoodieAvroUtils.getRecordColumnValues((HoodieAvroRecord) o1, sortColumnNames, schema, consistentLogicalTimestampEnabled)
);
cmp1 -> values1, cmp2 -> values2
done
}, (o1, o2) -> {
FlatLists.ComparableList obj1 = FlatLists.ofComparableArray(o1.toArray());
FlatLists.ComparableList obj2 = FlatLists.ofComparableArray(o2.toArray());
return obj1.compareTo(obj2);
obj1 -> values1, obj2 -> values2
done
…pache#8300) The concatenation of multiple field strings could introduce ambiguity; e.g., the comparison of '21,3' and '2,13' does not return the correct order. This patch fixes the comparison by using a ComparableList.
Change Logs
When multiple sort fields are specified, the return value is actually an array, but the code assigns it to a plain Object:
Object recordValue = record.getColumnValues(...)
It is then converted into a string:
StringUtils.objToString(recordValue)
The resulting string identifies the array object itself (effectively its address) rather than the field values it contains, so records are sorted incorrectly.
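The effect described above can be reproduced with plain JDK calls. This sketch assumes that `StringUtils.objToString` ends up calling `toString()` on the array, which is not shown here; `ArrayToStringDemo` and its methods are hypothetical names for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; uses only JDK calls, not Hudi's StringUtils.
final class ArrayToStringDemo {

  // Default string form of an array: type name plus identity hash, not contents.
  static String identityString(Object[] values) {
    return String.valueOf((Object) values);
  }

  // Content-based string form of the same array.
  static String contentString(Object[] values) {
    return Arrays.toString(values);
  }

  public static void main(String[] args) {
    Object[] row = new Object[] {"a", 1};
    // Prints something like "[Ljava.lang.Object;@1b6d3586"; the suffix varies per run.
    System.out.println(identityString(row));
    // Prints "[a, 1]".
    System.out.println(contentString(row));
  }
}
```

Sorting by the first form orders records by an arbitrary per-object value, which matches the sorting error this PR reports.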
Impact
Modify the sort function.
Risk level (write none, low medium or high below)
low.
Documentation Update
none
Contributor's checklist