Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-1884] MergeInto Support Partial Update For COW #3154

Merged
merged 1 commit into from
Jul 17, 2021

Conversation

pengzhiwei2018
Copy link

What is the purpose of the pull request

MergeInto Support Partial Update For COW.

Brief change log

  • Complete the missing UPDATE ACTION fields with the target table fields. e.g.

merge into h0 using s0 on h0.id = s0.id when matched then update set price = s0.price, ts = s0.ts;

we complete the missing fields to the update action, like this:

merge into h0 using s0 on h0.id = s0.id when matched then update set id = h0.id, name = h0.name, price = s0.price, ts = s0.ts;

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@hudi-bot
Copy link

hudi-bot commented Jun 25, 2021

CI report:

  • 63cf96134656dbb41eaab4e03e747776599255b6 UNKNOWN
  • 82b4b8bef8d143a14d445ae35fa5ec8fbbd5889f UNKNOWN
  • 5c55583 Azure: FAILURE
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run travis re-run the last Travis build
  • @hudi-bot run azure re-run the last Azure build

@codecov-commenter
Copy link

codecov-commenter commented Jun 25, 2021

Codecov Report

Merging #3154 (5c55583) into master (c8aaf00) will decrease coverage by 0.00%.
The diff coverage is 62.85%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3154      +/-   ##
============================================
- Coverage     47.82%   47.81%   -0.01%     
+ Complexity     5568     5566       -2     
============================================
  Files           936      936              
  Lines         41629    41655      +26     
  Branches       4188     4195       +7     
============================================
+ Hits          19907    19916       +9     
- Misses        19954    19968      +14     
- Partials       1768     1771       +3     
Flag Coverage Δ
hudicli 39.97% <ø> (ø)
hudiclient 34.51% <ø> (ø)
hudicommon 48.67% <ø> (-0.02%) ⬇️
hudiflink 59.61% <ø> (ø)
hudihadoopmr 52.02% <ø> (ø)
hudisparkdatasource 67.27% <62.85%> (-0.18%) ⬇️
hudisync 55.97% <ø> (ø)
huditimelineservice 64.07% <ø> (ø)
hudiutilities 59.26% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...sql/hudi/command/MergeIntoHoodieTableCommand.scala 70.04% <0.00%> (+0.70%) ⬆️
...a/org/apache/spark/sql/adapter/Spark2Adapter.scala 0.00% <ø> (ø)
...a/org/apache/spark/sql/adapter/Spark3Adapter.scala 0.00% <ø> (ø)
...ala/org/apache/spark/sql/hudi/HoodieSqlUtils.scala 66.12% <40.00%> (-2.30%) ⬇️
...pache/spark/sql/hudi/analysis/HoodieAnalysis.scala 63.42% <71.42%> (+1.68%) ⬆️
...in/scala/org/apache/hudi/HoodieStreamingSink.scala 28.00% <0.00%> (-6.40%) ⬇️
...e/hudi/common/table/log/HoodieLogFormatWriter.java 78.12% <0.00%> (-1.57%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8aaf00...5c55583. Read the comment docs.

// e.g. If the update action missing 'id' attribute, we fill a "id = target.id" to the update action.
val newAssignments = removeMetaFields(target.output)
.map(attr => {
// TODO support partial update for MOR.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for partial update, do we need another payload like this? #2666?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we can support partial update for cow table by the ExpressionPayload added in the sql support.

@vinothchandar
Copy link
Member

@umehrot2 @zhedoubushishi @rmpifer can one of you please review this?

@wangxianghu wangxianghu added the spark Issues related to spark label Jun 29, 2021
@pengzhiwei2018 pengzhiwei2018 force-pushed the dev_part_update branch 3 times, most recently from 63cf961 to a9cc94f Compare July 6, 2021 02:21
@pengzhiwei2018
Copy link
Author

@umehrot2 @xiarixiaoyao can you take a look at this PR?

@@ -102,7 +103,7 @@ case class HoodieResolveReferences(sparkSession: SparkSession) extends Rule[Logi

def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
// Resolve merge into
case MergeIntoTable(target, source, mergeCondition, matchedActions, notMatchedActions)
case mergeInto@MergeIntoTable(target, source, mergeCondition, matchedActions, notMatchedActions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better keep style like spark。 case mergeInto @ MergeIntoTable(target, source, mergeCondition, matchedActions, notMatchedActions)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will fix this format.

s" for MOR table. Currently we cannot support partial update for MOR," +
s" please complete all the target fields just like '...update set id = s0.id, name = s0.name ....'")
}
if (preCombineField.isDefined && preCombineField.get.equalsIgnoreCase(attr.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why should we specify value for the preCombineField in merge-into update action。 maybe we can delete this judgment logic

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the table has defined preCombineField, we must know the mapping of preCombineField in target table to the source field. So we must specify the value for preCombineField, or not, we cannot read the preCombineField field from the source.

@xiarixiaoyao
Copy link
Contributor

LGTM

@pengzhiwei2018 pengzhiwei2018 force-pushed the dev_part_update branch 4 times, most recently from f551185 to f5f0fd3 Compare July 16, 2021 03:49
@pengzhiwei2018
Copy link
Author

Hi @n3nash can you take a at the test case TestHoodieDeltaStreamer.testUpsertsCOWContinuousModeWithMultipleWriters and TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict. It often crash and block the ci.
Here is the CI logs:
https://api.travis-ci.com/v3/job/524829045/log.txt
https://api.travis-ci.com/v3/job/524829044/log.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spark Issues related to spark
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants