[HUDI-770] Organize upsert/insert API implementation under a single package #1495

Merged on Apr 13, 2020 (1 commit)

Contributor

@bvaradar bvaradar commented Apr 8, 2020

[HUDI-770] Organize upsert/insert API implementation under a single package

  1. Created ActionExecutor for performing upsert/insert.

There is inherent complexity with supporting auto commit vs non-auto commit. Once that is resolved, we will be able to remove much of the code in HoodieWriteClient.

@vinothchandar vinothchandar self-assigned this Apr 8, 2020

codecov-io commented Apr 8, 2020

Codecov Report

Merging #1495 into master will decrease coverage by 0.07%.
The diff coverage is 81.48%.

@@             Coverage Diff              @@
##             master    #1495      +/-   ##
============================================
- Coverage     72.25%   72.17%   -0.08%     
  Complexity      289      289              
============================================
  Files           338      365      +27     
  Lines         15946    16228     +282     
  Branches       1624     1632       +8     
============================================
+ Hits          11521    11712     +191     
- Misses         3697     3783      +86     
- Partials        728      733       +5     
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...c/main/java/org/apache/hudi/table/HoodieTable.java | 79.64% <ø> (ø) | 0.00 <0.00> (ø) |
| ...g/apache/hudi/table/action/BaseActionExecutor.java | 100.00% <ø> (ø) | 0.00 <0.00> (ø) |
| ...ltacommit/BulkInsertDeltaCommitActionExecutor.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...it/BulkInsertPreppedDeltaCommitActionExecutor.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...commit/InsertPreppedDeltaCommitActionExecutor.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...commit/UpsertPreppedDeltaCommitActionExecutor.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| .../org/apache/hudi/table/HoodieMergeOnReadTable.java | 73.33% <22.22%> (-9.80%) | 0.00 <0.00> (ø) |
| .../action/commit/BulkInsertCommitActionExecutor.java | 55.55% <55.55%> (ø) | 0.00 <0.00> (?) |
| .../commit/BulkInsertPreppedCommitActionExecutor.java | 55.55% <55.55%> (ø) | 0.00 <0.00> (?) |
| .../apache/hudi/table/action/commit/DeleteHelper.java | 64.28% <64.28%> (ø) | 0.00 <0.00> (?) |

... and 60 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c0f96e0...8a3b612.

Member

@vinothchandar vinothchandar left a comment

Few suggestions!

} else {
return table.getInsertPartitioner(profile, jsc);
HoodieTable<T> hoodieTable) {
CommitActionResult result = hoodieTable.ingest(jsc, preppedRecords, instantTime, getOperationType());
Member

IMO we should just mirror the same APIs (upsert, insert) on the table. ingest is confusing, since it also implies that we are reading from some source, which we are not.

Contributor Author

Done

return true;
}

void doPostCommitAndEmitCommitMetrics(String instantTime, HoodieCommitMetadata metadata,
Member

break this into two methods? one to do the post commit and one to emit?

Contributor Author

Done
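
A minimal sketch of the split suggested in this thread, for illustration only. The names `PostCommitSketch`, `postCommit`, and `emitCommitMetrics` are hypothetical, and plain lists stand in for the real post-commit work and metrics sink; this is not the code that landed in the PR.

```java
import java.util.ArrayList;
import java.util.List;

final class PostCommitSketch {
  // Hypothetical stand-ins: real code would run post-commit housekeeping
  // and publish metrics instead of appending to lists.
  final List<String> postCommitLog = new ArrayList<>();
  final List<String> emittedMetrics = new ArrayList<>();

  /** Post-commit housekeeping only. */
  void postCommit(String instantTime) {
    postCommitLog.add("post-commit:" + instantTime);
  }

  /** Metrics emission only. */
  void emitCommitMetrics(String instantTime, long durationMs) {
    emittedMetrics.add(instantTime + ":" + durationMs + "ms");
  }

  /** Callers that still want both steps can chain the two methods. */
  void postCommitAndEmitMetrics(String instantTime, long durationMs) {
    postCommit(instantTime);
    emitCommitMetrics(instantTime, durationMs);
  }
}
```

Keeping the two steps separate lets a caller run the housekeeping without emitting metrics, or test each step on its own.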

} else {
return table.getInsertPartitioner(profile, jsc);
HoodieTable<T> hoodieTable) {
CommitActionResult result = hoodieTable.ingest(jsc, preppedRecords, instantTime, getOperationType());
Member

and can we return HoodieCommitMetadata instead? I guess the issue is we have to pass this to the caller (deltastreamer or datasource) to decide whether to commit or not?

Contributor Author

Yes, supporting externally triggered commits is the reason why we have CommitActionResult.
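
To illustrate why a result object is returned rather than final commit metadata, here is a small sketch under stated assumptions. `WriteResultSketch`, `WriteResult`, `write`, and `commit` are hypothetical names, and write statuses are plain strings rather than a `JavaRDD<WriteStatus>`. With auto commit off, the caller inspects the result and decides whether to commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class WriteResultSketch {
  /** Hypothetical write result; commit metadata stays empty until a commit happens. */
  static final class WriteResult {
    final List<String> writeStatuses;
    Optional<String> commitMetadata = Optional.empty();

    WriteResult(List<String> writeStatuses) {
      this.writeStatuses = writeStatuses;
    }

    boolean isCommitted() {
      return commitMetadata.isPresent();
    }
  }

  private final boolean autoCommit;
  final List<String> committedInstants = new ArrayList<>();

  WriteResultSketch(boolean autoCommit) {
    this.autoCommit = autoCommit;
  }

  /** Performs the write; commits immediately only when auto commit is enabled. */
  WriteResult write(String instantTime, List<String> records) {
    List<String> statuses = new ArrayList<>();
    for (String record : records) {
      statuses.add("written:" + record);
    }
    WriteResult result = new WriteResult(statuses);
    if (autoCommit) {
      commit(instantTime, result);
    }
    return result;
  }

  /** Externally triggered commit; a no-op if the result is already committed. */
  void commit(String instantTime, WriteResult result) {
    if (result.isCommitted()) {
      return;
    }
    result.commitMetadata = Optional.of("commit@" + instantTime);
    committedInstants.add(instantTime);
  }
}
```

In this sketch, a caller with auto commit off would call `write(...)`, check `writeStatuses`, and then call `commit(...)` itself.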


import scala.Tuple2;

public abstract class AbstractBaseCommitActionExecutor<T extends HoodieRecordPayload<T>>
Member

rename to just BaseCommitActionExecutor?

Contributor Author

Done

public class CommitActionResult {

private JavaRDD<WriteStatus> writeStatuses;
private Duration indexUpdateDuration;
Member

if tagging the index is outside this action, then updating should be moved out as well?

Contributor Author

This is fixed as part of creating separate executors for each operation.


WriteOperationType(String value) {
WriteOperationType(String value, boolean isUpsert) {
Member

Isn't this really debt? Can we get rid of this boolean and just replace it with a helper method that takes in the operation type and returns true/false?

Contributor Author

Done
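
The suggestion in this thread could look roughly like the sketch below. It is illustrative only: the wrapper class `WriteOperationSketch`, the helper name `isUpsert`, and the particular constants and string values shown are assumptions, not the enum as it exists in the PR.

```java
final class WriteOperationSketch {
  /** Illustrative subset of operation types; the constructor keeps a single String argument. */
  enum WriteOperationType {
    INSERT("insert"),
    UPSERT("upsert"),
    BULK_INSERT("bulk_insert"),
    UPSERT_PREPPED("upsert_prepped");

    private final String value;

    WriteOperationType(String value) {
      this.value = value;
    }

    String value() {
      return value;
    }
  }

  /** Static helper that replaces a per-constant boolean flag. */
  static boolean isUpsert(WriteOperationType type) {
    return type == WriteOperationType.UPSERT
        || type == WriteOperationType.UPSERT_PREPPED;
  }
}
```

With a helper like this, the classification lives in one place and each enum constant no longer carries extra state.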

@@ -69,4 +71,8 @@ public static WriteOperationType fromValue(String value) {
throw new HoodieException("Invalid value of Type.");
}
}

public boolean isUpsert() {
Member

Like this method, can be just a static method that returns true/false

Contributor Author

Done

/**
* Contains metadata, write-statuses and latency times corresponding to a commit/delta-commit action.
*/
public class CommitActionResult {
Member

Ideally we would be able to return HoodieCommitMetadata, but since we can't, maybe still rename this to HoodieWriteMetadata to stay consistent with the other objects we are returning now (RollbackMetadata, RestoreMetadata, etc.)?

Contributor Author

Done

@bvaradar bvaradar force-pushed the hudi-408 branch 2 times, most recently from 5359fd0 to ce8cc69 Compare April 12, 2020 23:05
@bvaradar
Contributor Author

@vinothchandar : Addressed review comments. Please take a look when you get a chance.

Member

@vinothchandar vinothchandar left a comment

Left some comments on naming. Overall, I felt we could organize for more reuse, but this already improves things substantially. So please land after you go over the comments and CI is happy.

@bvaradar
Contributor Author

@vinothchandar : Addressed comments. Will merge once the CI passes.

@bvaradar bvaradar merged commit 17bf930 into apache:master Apr 13, 2020