
[HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal #2096

Closed
Karl-WangSK wants to merge 0 commits into apache:master from Karl-WangSK:HUDI-1284

Conversation

@Karl-WangSK
Contributor

What is the purpose of the pull request

preCombine all HoodieRecords and update all fields (those that are not the default value) according to orderingVal.

Brief change log

When more than one HoodieRecord has the same HoodieKey, this function combines all fields that are not the default value
before attempting to insert/upsert (if combining is turned on in HoodieClientConfig).
Example 1:

In preCombine (suppose the default value is null):
id   name   age    money   ts
1    Karl   null   30      0.0   (orderingVal=1)
1    null   18     40      0.0   (orderingVal=2)
After:
id   name   age    money   ts
1    Karl   18     40      0.0
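
The example above can be sketched in plain Java. This is only an illustration of the merge rule, not the code in this PR: `PreCombineSketch`, `Row`, and this `preCombine` signature are hypothetical names, field values are held in a `Map` instead of an Avro record, and it assumes that when two orderingVal values tie, the second argument is treated as the newer record.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class PreCombineSketch {

    /** Hypothetical stand-in for a record payload: field values plus an ordering value. */
    static final class Row {
        final Map<String, Object> fields;
        final long orderingVal;

        Row(Map<String, Object> fields, long orderingVal) {
            this.fields = fields;
            this.orderingVal = orderingVal;
        }
    }

    /**
     * Merges two rows that share a key. Starts from the row with the lower
     * orderingVal and overwrites each field with the value from the row with
     * the higher orderingVal, unless that value equals defaultValue.
     * Assumption: on a tie, b is treated as the newer row.
     */
    static Row preCombine(Row a, Row b, Object defaultValue) {
        Row older = a.orderingVal <= b.orderingVal ? a : b;
        Row newer = (older == a) ? b : a;
        Map<String, Object> merged = new LinkedHashMap<>(older.fields);
        for (Map.Entry<String, Object> e : newer.fields.entrySet()) {
            // Skip fields where the newer row only carries the default value.
            if (!Objects.equals(e.getValue(), defaultValue)) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return new Row(merged, newer.orderingVal);
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("id", 1);
        first.put("name", "Karl");
        first.put("age", null);
        first.put("money", 30);
        first.put("ts", 0.0);

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("id", 1);
        second.put("name", null);
        second.put("age", 18);
        second.put("money", 40);
        second.put("ts", 0.0);

        Row merged = preCombine(new Row(first, 1L), new Row(second, 2L), null);
        // prints {id=1, name=Karl, age=18, money=40, ts=0.0}
        System.out.println(merged.fields);
    }
}
```

Folding this pairwise merge over all records with the same key would give one combined record per key, which matches the "After" row in the example.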

Verify this pull request

Added one test in TestOverwriteWithLatestAvroPayload to verify the change

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@Karl-WangSK
Contributor Author

@vinothchandar Hi, can you look at this PR when you are free?

@vinothchandar vinothchandar self-assigned this Sep 21, 2020
@vinothchandar
Member

@Karl-WangSK Will do!

@Karl-WangSK Karl-WangSK changed the title [HUDI-1284] preCombine all HoodieRecords and update all fields(which is not DefaultValue) according to orderingVal [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal Sep 22, 2020
@Karl-WangSK
Contributor Author

@leesf

Comment on lines +193 to +205
    HoodieWriteMetadata result = table.upsert(jsc, instantTime, records, null);
    if (result.getIndexLookupDuration().isPresent()) {
      metrics.updateIndexMetrics(LOOKUP_STR, result.getIndexLookupDuration().get().toMillis());
    }
    return postWrite(result, instantTime, table);
  }

  public JavaRDD<WriteStatus> upsert(JavaRDD<HoodieRecord<T>> records, final String instantTime, String schema) {
    HoodieTable<T> table = getTableAndInitCtx(WriteOperationType.UPSERT, instantTime);
    table.validateUpsertSchema();
    setOperationType(WriteOperationType.UPSERT);
    this.asyncCleanerService = AsyncCleanerService.startAsyncCleaningIfEnabled(this, instantTime);
    HoodieWriteMetadata result = table.upsert(jsc, instantTime, records, schema);
Contributor

@leesf leesf Sep 23, 2020

this is an unrelated change?


/**
* Upserts the given prepared records into the Hoodie table, at the supplied instantTime.
* Upserts the given prepared records into the Hoodie table,/TestHoodieClientOnCopyOnWriteStorage at the supplied instantTime.
Contributor

ditto

@leesf
Contributor

leesf commented Sep 23, 2020

@Karl-WangSK Thanks for your contribution. I see there are lots of unrelated changes; please rebase onto the latest master branch. Thanks.


3 participants