[HUDI-5031] Fix MERGE INTO creates empty partition files when source table has partitions but target table does not #6983
...ent/hudi-client-common/src/main/java/org/apache/hudi/execution/CopyOnWriteInsertHandler.java
@weimingdiit I think only
@YannByron Yes, it is.
58464a6 to f5a8b04
if (handle == null) {
// If the records are sorted, this means that we encounter a new partition path
// and the records for the previous partition path are all written,
// so we can safely closely existing open handle to reduce memory footprint.
if (areRecordsSorted) {
closeOpenHandles();
if (handle == null || !handle.canWrite(payload.record)) {
if (handle == null) {
Why change this logic flow? Please revert unnecessary changes.
Without changing this logic flow, some code would be duplicated.
In my first version, I checked for an ignored record first and left the rest of the code unchanged, but then every record has to be checked, which is unnecessary.
In my second version, I moved the ignored-record check inside the if conditions, but it then had to appear twice, once in if (handle == null) {...} and once in if (!handle.canWrite(payload.record)) {...}, and writeHandleFactory.create() was also called twice, which is duplication.
So I changed the logic flow.
@xushiyan How about this? If you are worried about introducing other hidden problems, I can change the code to revert the logic flow and return early instead.
insertPayload.getPartitionPath(), idPrefix, taskContextSupplier);
insertPayload.getPartitionPath(), idPrefix, taskContextSupplier);
Unnecessary change.
Option<IndexedRecord> insertRecord = payload.insertValue;
// just skip the ignored record,do not make partitions on fs
if (insertRecord.isPresent() && insertRecord.get().equals(IGNORE_RECORD)) {
return;
}
If you want to return based on these conditions, why not return early?
The check comes after the handle check so that not every record has to be checked. It runs only when the if condition is met, so most records skip it.
@xushiyan How about that?
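To make the trade-off in this thread concrete, here is a minimal, self-contained sketch of the "check only when a new handle would be opened" approach. It is not the Hudi implementation: IgnoredRecordSketch, Rec, Handle and Consumer are hypothetical stand-ins, the IGNORE_RECORD sentinel is modeled as a plain string, and the sketch assumes that an already open handle drops ignored records itself when writing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class IgnoredRecordSketch {
    // Hypothetical stand-in for the IGNORE_RECORD sentinel.
    static final String IGNORE_RECORD = "__ignore__";

    // Hypothetical record: a partition path plus an optional insert value.
    static final class Rec {
        final String partitionPath;
        final Optional<String> insertValue;

        Rec(String partitionPath, String value) {
            this.partitionPath = partitionPath;
            this.insertValue = Optional.ofNullable(value);
        }
    }

    static boolean isIgnored(Rec r) {
        return r.insertValue.isPresent() && r.insertValue.get().equals(IGNORE_RECORD);
    }

    // Hypothetical write handle. Constructing one stands in for
    // creating a partition directory and file on the file system.
    static final class Handle {
        final String partitionPath;
        final List<String> written = new ArrayList<>();

        Handle(String partitionPath) {
            this.partitionPath = partitionPath;
        }

        boolean canWrite(Rec r) {
            return partitionPath.equals(r.partitionPath);
        }

        void write(Rec r) {
            // Assumption: an open handle drops ignored records on its own.
            r.insertValue.filter(v -> !v.equals(IGNORE_RECORD)).ifPresent(written::add);
        }
    }

    static final class Consumer {
        final List<Handle> created = new ArrayList<>();
        private Handle handle;

        void consume(Rec r) {
            if (handle == null || !handle.canWrite(r)) {
                // Slow path only: skip ignored records before opening a new
                // handle, so no partition is created just for them.
                if (isIgnored(r)) {
                    return;
                }
                handle = new Handle(r.partitionPath);
                created.add(handle);
            }
            handle.write(r);
        }
    }

    public static void main(String[] args) {
        Consumer consumer = new Consumer();
        consumer.consume(new Rec("p1", IGNORE_RECORD)); // no handle opened for "p1"
        consumer.consume(new Rec("", "a"));             // unpartitioned target
        System.out.println("handles created: " + consumer.created.size());
    }
}
```

Under these assumptions, the ignored-record check runs only for records that would trigger handle creation, which is the point made above. An early return at the top of consume would give the same result at the cost of checking every record.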
f5a8b04 to 5d6e723
5d6e723 to 862be37
@xushiyan @yihua @XuQianJin-Stars Hi, please review this PR. Thanks!
48ef299 to 139a4b2
139a4b2 to d1694b0
d1694b0 to d2f4ce7
@weimingdiit FYI: I have rebased and made slight changes. I will help take this to completion.
…table has partitions but target table does not
…rce table has partitions and the target table does not
Change Logs
Hudi MERGE INTO creates empty partition files when the source table has partitions and the target table does not.
Impact
No impact for users; this is an internal code refactoring.
Risk level (write none, low, medium or high below)
Medium; see the unit tests for details.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist