[HUDI-332] Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata #1157
Conversation
@hddong can you explain why we add the operation type to HoodieCommitMetadata? Thanks.
@hddong : Thanks for taking this task. Have added a few comments. Looks to be going in the correct direction.
@@ -171,7 +172,7 @@ public static SparkConf registerClasses(SparkConf conf) {
    JavaRDD<HoodieRecord<T>> taggedRecords = index.tagLocation(dedupedRecords, jsc, table);
    metrics.updateIndexMetrics(LOOKUP_STR, metrics.getDurationInMs(indexTimer == null ? 0L : indexTimer.stop()));
    indexTimer = null;
-   return upsertRecordsInternal(taggedRecords, commitTime, table, true);
+   return upsertRecordsInternal(taggedRecords, commitTime, table, true, Type.UPSERT);
There is an enum OperationType in HoodieWriteClient. It is more fine-grained in that it is able to distinguish between PREPPED and non-PREPPED versions of operations. Can we use that enum instead of Type? You can move it to a separate enum class in the package org.apache.hudi.common.model and name it WriteOperationType.
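A minimal sketch of what the suggested shared enum could look like. The constant set (including the PREPPED variants) and the helper methods are assumptions based on this comment, not the merged Hudi source:

```java
// Illustrative sketch of the suggested WriteOperationType enum; constant
// names and helpers are assumptions, not the exact Hudi code.
enum WriteOperationType {
  INSERT("insert"),
  INSERT_PREPPED("insert_prepped"),
  UPSERT("upsert"),
  UPSERT_PREPPED("upsert_prepped"),
  BULK_INSERT("bulk_insert"),
  BULK_INSERT_PREPPED("bulk_insert_prepped"),
  DELETE("delete");

  private final String value;

  WriteOperationType(String value) {
    this.value = value;
  }

  // Serialized form stored in the commit metadata.
  public String value() {
    return value;
  }

  // Resolve a metadata string back to the enum constant.
  public static WriteOperationType fromValue(String value) {
    for (WriteOperationType t : values()) {
      if (t.value.equals(value)) {
        return t;
      }
    }
    throw new IllegalArgumentException("Unknown operation type: " + value);
  }
}
```

Keeping the serialized form as a plain string (rather than the enum's Java name) makes it easy to round-trip through JSON and Avro metadata.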
@@ -106,6 +150,14 @@ public void setCompacted(Boolean compacted) {
    return filePaths;
  }

+ public void setOperateType(Type type) {
We also need to add the enum type to the avro schema hudi-common/src/main/avro/HoodieCommitMetadata.avsc
For archiving, we use the avro class org.apache.hudi.avro.model.HoodieCommitMetadata instead of org.apache.hudi.common.model.HoodieCommitMetadata
HoodieCommitArchiveLog.commitMetadataConverter needs to change to copy the operation types to avro objects.
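The copy step the converter needs can be sketched with stand-in classes. The real metadata classes live in org.apache.hudi.common.model and org.apache.hudi.avro.model; everything below is illustrative, not the actual Hudi API:

```java
// Stand-in for org.apache.hudi.common.model.HoodieCommitMetadata (json side).
class JsonCommitMetadata {
  private String operationType;
  public void setOperationType(String operationType) { this.operationType = operationType; }
  public String getOperationType() { return operationType; }
}

// Stand-in for org.apache.hudi.avro.model.HoodieCommitMetadata (archive side).
class AvroCommitMetadata {
  private String operationType;
  public void setOperationType(String operationType) { this.operationType = operationType; }
  public String getOperationType() { return operationType; }
}

class CommitMetadataConverterSketch {
  // Mirrors what HoodieCommitArchiveLog.commitMetadataConverter must now do:
  // carry the operation type over onto the avro object used for archiving.
  static AvroCommitMetadata convert(JsonCommitMetadata src) {
    AvroCommitMetadata dst = new AvroCommitMetadata();
    dst.setOperationType(src.getOperationType()); // the newly required field copy
    return dst;
  }
}
```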
@@ -46,4 +47,25 @@ public void testPerfStatPresenceInHoodieMetadata() throws Exception {
    Assert.assertTrue(metadata.getTotalScanTime() == 0);
    Assert.assertTrue(metadata.getTotalLogFilesCompacted() > 0);
  }

+ @Test
+ public void testCompatibilityWithoutOperateType() throws Exception {
Thanks for adding the compatibility tests.
@bvaradar thanks for your review and suggestions, I will modify later.
@bvaradar please re-review this.
@hddong : Just finished the review with some comments. Thanks for the contribution.
@@ -129,6 +129,11 @@
    }],
    "default": null
  },
+ {
+   "name":"operateType",
nit: operationType
@@ -106,6 +108,14 @@ public void setCompacted(Boolean compacted) {
    return filePaths;
  }

+ public void setOperateType(WriteOperationType type) {
rename to operationType along with getters/setters.
  val INSERT_OPERATION_OPT_VAL = "insert"
  val UPSERT_OPERATION_OPT_VAL = "upsert"
  val DELETE_OPERATION_OPT_VAL = "delete"
  val BULK_INSERT_OPERATION_OPT_VAL = WriteOperationType.BULK_INSERT.toString
Let us not change these configuration values as it would cause backwards compatibility issues.
@@ -129,6 +129,11 @@
    }],
    "default": null
  },
+ {
+   "name":"operateType",
+   "type":["null","string"],
Can we use an enum instead of the string type?
  {
    "name":"operateType",
    "type":["null","string"],
    "default": null
Also, can you confirm if the operation type is stored in the avro objects when archiving?
@@ -510,21 +515,21 @@ private Partitioner getPartitioner(HoodieTable table, boolean isUpsert, Workload
  /**
   * Commit changes performed at the given commitTime marker.
   */
- public boolean commit(String commitTime, JavaRDD<WriteStatus> writeStatuses) {
-   return commit(commitTime, writeStatuses, Option.empty());
+ public boolean commit(String commitTime, JavaRDD<WriteStatus> writeStatuses, WriteOperationType operationType) {
As only one hudi write operation is outstanding at a time, can you cache the last operation type in an instance variable within the HoodieWriteClient object so that users don't need to explicitly pass it in this commit() call?
@bvaradar Operation type is stored in the avro objects when archiving, but there is an error here; it throws …
@bvaradar I found it is caused by AVRO-1676 and fixed in Avro 1.8.0. So can I roll back …
IMO, …
@hddong : I am ok with the string type due to the enum deep copying issue. Once you address other comments, we can merge.
@@ -98,7 +99,7 @@
  private final transient HoodieMetrics metrics;
  private final transient HoodieCleanClient<T> cleanClient;
  private transient Timer.Context compactionTimer;

+ private transient WriteOperationType operationType;
Can you move this to AbstractHoodieWriteClient and use setters/getters in HoodieWriteClient?
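A sketch of the suggested shape, with the operation type cached on the abstract client so commit() no longer needs an explicit parameter. Class and method names here are illustrative stand-ins, not Hudi's exact signatures:

```java
// Operation kinds, mirroring WriteOperationType (illustrative subset).
enum OperationKind { INSERT, UPSERT, BULK_INSERT, DELETE }

// Stand-in for AbstractHoodieWriteClient holding the cached operation type.
abstract class AbstractWriteClientSketch {
  private transient OperationKind operationType;

  protected void setOperationType(OperationKind operationType) {
    this.operationType = operationType;
  }

  protected OperationKind getOperationType() {
    return operationType;
  }
}

// Stand-in for HoodieWriteClient: each write API records its type up front,
// so commit() can read the cached value instead of taking a parameter.
class WriteClientSketch extends AbstractWriteClientSketch {
  public void upsert() {
    setOperationType(OperationKind.UPSERT);
    // ... tag records, partition, write ...
  }

  public OperationKind commit() {
    // Only one write operation is outstanding at a time, so the cached
    // value is the type of the operation being committed.
    return getOperationType();
  }
}
```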
@@ -492,6 +501,11 @@ protected void postCommit(HoodieCommitMetadata metadata, String instantTime,
    }
  }

+ @Override
+ protected void updateOperationType(HoodieCommitMetadata metadata) {
If we move the operationType instance to base-class, you no longer need this overridden method.
@@ -397,10 +402,81 @@ public void testArchiveCommitCompactionNoHole() throws IOException {
    timeline.containsInstant(new HoodieInstant(false, HoodieTimeline.COMMIT_ACTION, "107")));
  }

+ @Test
+ public void testArchiveCommitAndDeepCopy() throws IOException {
Is this method used to test the enum issue in Avro? If so, you can remove it.
Yes, it has been removed.
  (org.apache.hudi.avro.model.HoodieCommitMetadata) commitMetadataConverter.invoke(archiveLog, hoodieCommitMetadata);
  assertEquals(expectedCommitMetadata.getOperationType(), WriteOperationType.INSERT.toString());
} catch (NoSuchMethodException e) {
  e.printStackTrace();
Instead of e.printStackTrace() in all these catch blocks, can you throw these exceptions so the test fails?
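The pattern being requested, sketched with a hypothetical helper (names are illustrative): rethrow the checked reflection exception as unchecked so JUnit reports a failure instead of silently passing.

```java
class ReflectionTestSketch {
  // Hypothetical helper standing in for the test's reflective converter call.
  static String invokeConverter(boolean missingMethod) {
    try {
      if (missingMethod) {
        // Simulates clazz.getDeclaredMethod(...) not finding the method.
        throw new NoSuchMethodException("commitMetadataConverter");
      }
      return "insert";
    } catch (NoSuchMethodException e) {
      // Instead of e.printStackTrace(): propagate so the test fails loudly.
      throw new RuntimeException(e);
    }
  }
}
```

With e.printStackTrace(), an assertion inside the try block is simply skipped when reflection fails, and the test passes vacuously; rethrowing turns that into a visible test error.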
@bvaradar Thanks very much for your review; all of the comments have been addressed.
Have some concerns on backwards compatibility. We don't write commit metadata as avro, so maybe this works for now?
@@ -129,6 +129,11 @@
    }],
    "default": null
  },
+ {
Can we add this at the end? Isn't that needed for this to be backwards compatible?
Moved it to the end. IMO, it has no impact on compatibility.
Agree with Vinoth, Avro schema evolution expects new fields to be appended.
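For illustration, the appended entry would go after the existing entries in the record's field list, with a null default so readers of the new schema can still decode records written without the field. This is a fragment only, string-typed per the eventual decision, and assumes the surrounding HoodieCommitMetadata.avsc structure:

```json
{
  "name": "operationType",
  "type": ["null", "string"],
  "default": null
}
```

Avro resolves old data against a new schema by position and name; appending new optional fields with defaults is the evolution pattern that stays compatible in both directions.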
@vinothchandar It works for archiving now, not for writing commit metadata as avro.
@hddong : lgtm. Once you rebase this PR and resolve conflicts, we can merge.
@bvaradar @vinothchandar Thanks. Rebased this and there are no conflicts now.
@hddong : Did you forget to push the diff? Still seeing conflicts.
@bvaradar this is weird. On my side, this web page shows there are no conflicts; it's all passed and ready to merge.
@hddong : If possible, can you try running it again? Pull master, rebase and force-push.
(force-pushed f048357 to 68e61ab)
@bvaradar I tried what you suggested, but there were some other problems. I tried another way, rebasing locally, and it seems ok.
@bvaradar @vinothchandar please review this again.
Already using https://github.com/apache/incubator-hudi/blob/f27c7a16c6d437efaa83e50a7117b83e5201ac49/pom.xml#L96
@hmatu : Sorry for the long delay. Was offline for a while. I agree that operationType should be string in both avro and json structure to keep it simple. Once you make the final change and resolve merge conflict, I will merge this PR. |
(force-pushed 759cde9 to 68e61ab)
(force-pushed 68e61ab to 9ad174a)
@bvaradar As you said, operationType is kept as a string here, and I have resolved all conflicts.
Class<?> clazz = HoodieCommitArchiveLog.class;
try {
  Method commitMetadataConverter = clazz.getDeclaredMethod("commitMetadataConverter", HoodieCommitMetadata.class);
@hddong One final comment : Can you make commitMetadataConverter() in HoodieCommitArchiveLog with default access (instead of private). This way, you dont need to deal with reflection.
@bvaradar Changed it to public access, because TestHoodieCommitArchiveLog and HoodieCommitArchiveLog are not in the same package path.
One minor comment. Otherwise ready to merge.
Codecov Report
@@ Coverage Diff @@
## master #1157 +/- ##
===========================================
- Coverage 67.09% 66.99% -0.1%
Complexity 223 223
===========================================
Files 333 334 +1
Lines 16216 16269 +53
Branches 1659 1660 +1
===========================================
+ Hits 10880 10900 +20
- Misses 4598 4632 +34
+ Partials 738 737 -1
Continue to review full report at Codecov.
Looks great. Thanks for the patience @hddong
[HUDI-332] Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata (apache#1157)
What is the purpose of the pull request
Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata
Brief change log
Verify this pull request
This pull request is a trivial rework / code cleanup without any test coverage.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.