
[HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata #1157

Merged — 5 commits, merged Mar 3, 2020

Conversation

@hddong (Contributor) commented Dec 30, 2019

What is the purpose of the pull request

Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata

Brief change log

  • Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata

Verify this pull request

This pull request is a trivial rework / code cleanup without any test coverage.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@hddong hddong changed the title Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata Dec 30, 2019
@hmatu (Contributor) commented Dec 30, 2019

@hddong can you explain why we need to add the operation type to HoodieCommitMetadata? Thanks.

@bvaradar (Contributor) left a comment:

@hddong : Thanks for taking this task. I have added a few comments. Looks to be going in the right direction.

@@ -171,7 +172,7 @@ public static SparkConf registerClasses(SparkConf conf) {
JavaRDD<HoodieRecord<T>> taggedRecords = index.tagLocation(dedupedRecords, jsc, table);
metrics.updateIndexMetrics(LOOKUP_STR, metrics.getDurationInMs(indexTimer == null ? 0L : indexTimer.stop()));
indexTimer = null;
return upsertRecordsInternal(taggedRecords, commitTime, table, true);
return upsertRecordsInternal(taggedRecords, commitTime, table, true, Type.UPSERT);
Contributor:
There is an enum OperationType in HoodieWriteClient. It is more fine-grained in that it is able to distinguish between PREPPED and non-PREPPED versions of operations. Can we use that enum instead of Type? You can move it to a separate enum class in the package org.apache.hudi.common.model and name it WriteOperationType.
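
For illustration, a minimal sketch of what such a WriteOperationType enum could look like (the constants and string values here are assumptions based on the operations discussed in this PR, not the final code):

package org.apache.hudi.common.model;

/**
 * Hypothetical sketch of the proposed WriteOperationType enum.
 * Constant names and string values are illustrative assumptions.
 */
public enum WriteOperationType {
  INSERT("insert"),
  INSERT_PREPPED("insert_prepped"),
  UPSERT("upsert"),
  UPSERT_PREPPED("upsert_prepped"),
  BULK_INSERT("bulk_insert"),
  BULK_INSERT_PREPPED("bulk_insert_prepped"),
  DELETE("delete"),
  UNKNOWN("unknown");

  private final String value;

  WriteOperationType(String value) {
    this.value = value;
  }

  @Override
  public String toString() {
    return value;
  }

  // Look up an operation type from its string form, e.g. when reading older metadata.
  public static WriteOperationType fromValue(String value) {
    for (WriteOperationType type : values()) {
      if (type.value.equals(value)) {
        return type;
      }
    }
    return UNKNOWN;
  }
}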

@@ -106,6 +150,14 @@ public void setCompacted(Boolean compacted) {
return filePaths;
}

public void setOperateType(Type type) {
Contributor:
We also need to add the enum type to the avro schema hudi-common/src/main/avro/HoodieCommitMetadata.avsc

For archiving, we use the avro class org.apache.hudi.avro.model.HoodieCommitMetadata instead of org.apache.hudi.common.model.HoodieCommitMetadata

HoodieCommitArchiveLog.commitMetadataConverter needs to change to copy the operation type into the avro objects.
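
For illustration, a rough sketch of the kind of copy being asked for here; the variable and accessor names are assumptions mirroring the rest of the discussion, and the real converter may build the Avro object differently:

// Hypothetical fragment at the end of HoodieCommitArchiveLog.commitMetadataConverter(...),
// after the existing conversion has produced the avro-model object avroMetaData.
if (hoodieCommitMetadata.getOperationType() != null) {
  // Carry the operation type over so it survives archiving.
  avroMetaData.setOperationType(hoodieCommitMetadata.getOperationType().toString());
}
return avroMetaData;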

@@ -46,4 +47,25 @@ public void testPerfStatPresenceInHoodieMetadata() throws Exception {
Assert.assertTrue(metadata.getTotalScanTime() == 0);
Assert.assertTrue(metadata.getTotalLogFilesCompacted() > 0);
}

@Test
public void testCompatibilityWithoutOperateType() throws Exception {
Contributor:
Thanks for adding the compatibility tests.

@bvaradar bvaradar self-assigned this Dec 31, 2019
@hddong (Contributor, Author) commented Dec 31, 2019

@hddong can you explain why we need to add the operation type to HoodieCommitMetadata? Thanks.

@hmatu In this patch it has no special function other than tagging and tracing information.

@hddong (Contributor, Author) commented Dec 31, 2019

@bvaradar thanks for your review and suggestions; I will make the changes.

@hddong hddong requested a review from bvaradar January 2, 2020 09:49
@hddong (Contributor, Author) commented Jan 2, 2020

@bvaradar please re-review this.

@bvaradar (Contributor) left a comment:

@hddong : Just finished the review with some comments. Thanks for the contribution.

@@ -129,6 +129,11 @@
}],
"default": null
},
{
"name":"operateType",
Contributor:
nit: operationType

@@ -106,6 +108,14 @@ public void setCompacted(Boolean compacted) {
return filePaths;
}

public void setOperateType(WriteOperationType type) {
Contributor:
rename to operationType along with getters/setters.
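
For illustration, the renamed field and accessors on the JSON-backed HoodieCommitMetadata might look roughly like this (a sketch; the field type follows the WriteOperationType discussion above):

// Hypothetical fragment of org.apache.hudi.common.model.HoodieCommitMetadata after the rename.
private WriteOperationType operationType;

public WriteOperationType getOperationType() {
  return operationType;
}

public void setOperationType(WriteOperationType operationType) {
  this.operationType = operationType;
}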

val INSERT_OPERATION_OPT_VAL = "insert"
val UPSERT_OPERATION_OPT_VAL = "upsert"
val DELETE_OPERATION_OPT_VAL = "delete"
val BULK_INSERT_OPERATION_OPT_VAL = WriteOperationType.BULK_INSERT.toString
Contributor:
Let us not change these configuration values as it would cause backwards compatibility issues.

@@ -129,6 +129,11 @@
}],
"default": null
},
{
"name":"operateType",
"type":["null","string"],
Contributor:
Can we use an enum instead of the string type?

{
"name":"operateType",
"type":["null","string"],
"default": null
Contributor:
Also, can you confirm whether the operation type is stored in the avro objects when archiving?

@@ -510,21 +515,21 @@ private Partitioner getPartitioner(HoodieTable table, boolean isUpsert, Workload
/**
* Commit changes performed at the given commitTime marker.
*/
public boolean commit(String commitTime, JavaRDD<WriteStatus> writeStatuses) {
return commit(commitTime, writeStatuses, Option.empty());
public boolean commit(String commitTime, JavaRDD<WriteStatus> writeStatuses, WriteOperationType operationType) {
Contributor:
As only one Hudi write operation is outstanding at a time, can you cache the last operation type in an instance variable within the HoodieWriteClient object, so that users don't need to pass it explicitly in this commit() call?
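
A self-contained toy sketch of that caching pattern (class, method, and field names are invented for illustration and are not Hudi APIs):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration: the write client caches the operation type of the last write,
// so commit() can stamp it into the commit metadata without an extra parameter.
class ToyWriteClient {

  enum WriteOperationType { INSERT, UPSERT, BULK_INSERT, DELETE, UNKNOWN }

  private WriteOperationType lastOperationType = WriteOperationType.UNKNOWN;
  private final List<String> commitLog = new ArrayList<>();

  void upsert(List<String> records, String commitTime) {
    lastOperationType = WriteOperationType.UPSERT; // cached before the write runs
    // ... perform the actual write ...
    commit(commitTime);
  }

  void commit(String commitTime) {
    // Callers never pass the operation type; the cached value is used here.
    commitLog.add(commitTime + " -> " + lastOperationType);
  }

  public static void main(String[] args) {
    ToyWriteClient client = new ToyWriteClient();
    client.upsert(Arrays.asList("r1", "r2"), "20200107120000");
    client.commitLog.forEach(System.out::println); // prints: 20200107120000 -> UPSERT
  }
}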

@hddong (Contributor, Author) commented Jan 7, 2020

@bvaradar The operation type is stored in the avro objects when archiving, but there is an error here: it throws ClassCastException: org.apache.avro.generic.GenericData$EnumSymbol cannot be cast to org.apache.hudi.avro.model.WriteOperationType in the deepCopy block of 'show archived commit stats'. Can you give any suggestions when you're free?

@hddong (Contributor, Author) commented Jan 8, 2020

@bvaradar The operation type is stored in the avro objects when archiving, but there is an error here: it throws ClassCastException: org.apache.avro.generic.GenericData$EnumSymbol cannot be cast to org.apache.hudi.avro.model.WriteOperationType in the deepCopy block of 'show archived commit stats'. Can you give any suggestions when you're free?

@bvaradar I found it is caused by AVRO-1676, which was fixed in Avro 1.8.0. Can I roll back operationType to a String type? Upgrading Avro may cause other compatibility problems.

@hddong (Contributor, Author) commented Jan 10, 2020

@bvaradar Everything has been fixed, please review again. operationType is now a String, due to AVRO-1676.

@hmatu (Contributor) commented Jan 10, 2020

IMO, WriteOperationType duplicates OperationType; keeping only one of them would be better.

@bvaradar (Contributor) left a comment:

@hddong : I am ok with the string type due to the enum deep copying issue. Once you address other comments, we can merge.

@@ -98,7 +99,7 @@
private final transient HoodieMetrics metrics;
private final transient HoodieCleanClient<T> cleanClient;
private transient Timer.Context compactionTimer;

private transient WriteOperationType operationType;
Contributor:
Can you move this to AbstractHoodieWriteClient and use setters/getters in HoodieWriteClient?
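
A minimal, heavily abbreviated sketch of the suggested layout (class names here are stand-ins, not the real Hudi classes):

// The field and its accessors live in the abstract base class; concrete clients set it
// before each write, so no per-subclass updateOperationType override is needed.
abstract class AbstractWriteClientSketch {

  enum WriteOperationType { INSERT, UPSERT, BULK_INSERT, DELETE, UNKNOWN }

  private transient WriteOperationType operationType = WriteOperationType.UNKNOWN;

  protected WriteOperationType getOperationType() {
    return operationType;
  }

  protected void setOperationType(WriteOperationType operationType) {
    this.operationType = operationType;
  }
}

class WriteClientSketch extends AbstractWriteClientSketch {
  void upsert() {
    setOperationType(WriteOperationType.UPSERT);
    // ... tag locations, write, and commit as before ...
  }
}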

@@ -492,6 +501,11 @@ protected void postCommit(HoodieCommitMetadata metadata, String instantTime,
}
}

@Override
protected void updateOperationType(HoodieCommitMetadata metadata) {
Contributor:
If we move the operationType instance to base-class, you no longer need this overridden method.

@@ -397,10 +402,81 @@ public void testArchiveCommitCompactionNoHole() throws IOException {
timeline.containsInstant(new HoodieInstant(false, HoodieTimeline.COMMIT_ACTION, "107")));
}

@Test
public void testArchiveCommitAndDeepCopy() throws IOException {
Contributor:
Is this method used to test the enum issue in Avro? If so, you can remove it.

Contributor Author:
Is this method used to test the enum issue in Avro? If so, you can remove it.

Yes, it has been removed.

(org.apache.hudi.avro.model.HoodieCommitMetadata) commitMetadataConverter.invoke(archiveLog, hoodieCommitMetadata);
assertEquals(expectedCommitMetadata.getOperationType(), WriteOperationType.INSERT.toString());
} catch (NoSuchMethodException e) {
e.printStackTrace();
Contributor:
Instead of e.printStackTrace() in all these catch blocks, can you throw the exceptions so the test fails?
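
For example, a sketch of the suggested pattern (the test body is abbreviated; the point is that declaring throws Exception, or rethrowing, lets JUnit fail the test on any reflection error):

import java.lang.reflect.Method;

import org.apache.hudi.common.model.HoodieCommitMetadata;
import org.apache.hudi.table.HoodieCommitArchiveLog;
import org.junit.Test;

public class TestCommitMetadataConverterSketch {

  @Test
  public void testConvertCommitMetadata() throws Exception {
    // Any NoSuchMethodException / IllegalAccessException now fails the test
    // instead of being printed by e.printStackTrace() and silently ignored.
    Method commitMetadataConverter = HoodieCommitArchiveLog.class
        .getDeclaredMethod("commitMetadataConverter", HoodieCommitMetadata.class);
    commitMetadataConverter.setAccessible(true);
    // ... invoke the converter and assert on the result as in the real test ...
  }
}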

@hddong (Contributor, Author) commented Jan 10, 2020

@bvaradar Thanks very much for your review; all of the comments have been addressed.

@vinothchandar (Member) left a comment:

I have some concerns about backwards compatibility. We don't write commit metadata as Avro, so maybe this works for now?

@@ -129,6 +129,11 @@
}],
"default": null
},
{
Member:
Can we add this at the end? Isn't that needed for this to be backwards compatible?

Contributor Author:
Isn't that needed for this to be backwards compatible?

Moved it to the end. IMO, it has no impact on compatibility.

Contributor:
Agree with Vinoth, Avro schema evolution expects new fields to be appended.
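
As a side check, appending a nullable field with a null default is the standard backward-compatible Avro change; a small self-contained example along these lines (toy schemas, not the real HoodieCommitMetadata.avsc) can confirm it:

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class AppendFieldCompatibilityCheck {
  public static void main(String[] args) {
    // Old (writer) schema without the new field.
    Schema writer = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"CommitMetadataToy\",\"fields\":["
        + "{\"name\":\"totalRecords\",\"type\":\"long\"}]}");
    // New (reader) schema with operationType appended at the end, defaulting to null.
    Schema reader = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"CommitMetadataToy\",\"fields\":["
        + "{\"name\":\"totalRecords\",\"type\":\"long\"},"
        + "{\"name\":\"operationType\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

    // Old data stays readable with the new schema; the missing field resolves to its default.
    System.out.println(
        SchemaCompatibility.checkReaderWriterCompatibility(reader, writer).getType());
    // Prints: COMPATIBLE
  }
}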

@hddong (Contributor, Author) commented Jan 13, 2020

I have some concerns about backwards compatibility. We don't write commit metadata as Avro, so maybe this works for now?

@vinothchandar It is used for archiving now, not for writing commit metadata as Avro.

@vinothchandar (Member):
@hddong thanks! Avro works based on field positions, so reordering them was my concern. Thanks for addressing this. Over to @bvaradar.

@bvaradar (Contributor):

@hddong : lgtm. Once you rebase this PR and resolve conflicts, we can merge

@hddong (Contributor, Author) commented Jan 14, 2020

@bvaradar @vinothchandar Thanks. Rebased this and there are no conflicts now.

@bvaradar (Contributor):

@hddong : Did you forget to push the diff? Still seeing conflicts.

@hddong (Contributor, Author) commented Jan 16, 2020

@bvaradar that's weird. On my side, this web page shows there are no conflicts; everything has passed and it is ready to merge.

@bvaradar (Contributor):

[Screenshot: Screen Shot 2020-01-15 at 5 47 57 PM]

@bvaradar (Contributor):

@hddong : If possible, try to run it again: pull master, rebase, and force-push.

@hddong hddong force-pushed the hudi-332-add-operation-type branch 2 times, most recently from f048357 to 68e61ab Compare January 16, 2020 06:55
@hddong (Contributor, Author) commented Jan 16, 2020

@bvaradar I tried what you suggested, but there were some other problems. I tried another way and rebased locally; it seems OK now.

@hddong (Contributor, Author) commented Jan 31, 2020

@bvaradar @vinothchandar please review this again.

@hmatu (Contributor) commented Jan 31, 2020

@bvaradar The operation type is stored in the avro objects when archiving, but there is an error here: it throws ClassCastException: org.apache.avro.generic.GenericData$EnumSymbol cannot be cast to org.apache.hudi.avro.model.WriteOperationType in the deepCopy block of 'show archived commit stats'. Can you give any suggestions when you're free?

@bvaradar I found it is caused by AVRO-1676, which was fixed in Avro 1.8.0. Can I roll back operationType to a String type? Upgrading Avro may cause other compatibility problems.

We are already using the avro-1.8.2 dependency, so the String type is redundant now; an enum type is enough.

https://github.com/apache/incubator-hudi/blob/f27c7a16c6d437efaa83e50a7117b83e5201ac49/pom.xml#L96

@bvaradar (Contributor):

@hmatu : Sorry for the long delay; I was offline for a while. I agree that operationType should be a string in both the avro and json structures to keep it simple. Once you make the final change and resolve the merge conflict, I will merge this PR.

@hddong hddong force-pushed the hudi-332-add-operation-type branch from 759cde9 to 68e61ab Compare March 2, 2020 12:43
@hddong hddong force-pushed the hudi-332-add-operation-type branch from 68e61ab to 9ad174a Compare March 2, 2020 12:54
@hddong (Contributor, Author) commented Mar 2, 2020

@bvaradar As you said, operationType is kept as a string here, and I have resolved all conflicts.


Class<?> clazz = HoodieCommitArchiveLog.class;
try {
Method commitMetadataConverter = clazz.getDeclaredMethod("commitMetadataConverter", HoodieCommitMetadata.class);
Contributor:
@hddong One final comment: can you make commitMetadataConverter() in HoodieCommitArchiveLog have default access (instead of private)? This way, you don't need to deal with reflection.
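
For illustration, a sketch of the direct call this enables (names mirror the test snippet above; whether the method ends up package-private or public, the reflection and its catch blocks go away):

// Hypothetical fragment of the test once commitMetadataConverter(...) is directly accessible.
org.apache.hudi.avro.model.HoodieCommitMetadata expectedCommitMetadata =
    archiveLog.commitMetadataConverter(hoodieCommitMetadata);
assertEquals(WriteOperationType.INSERT.toString(), expectedCommitMetadata.getOperationType());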

Contributor Author:
@hddong One final comment: can you make commitMetadataConverter() in HoodieCommitArchiveLog have default access (instead of private)? This way, you don't need to deal with reflection.

@bvaradar Changed it to public access, because TestHoodieCommitArchiveLog and HoodieCommitArchiveLog are not in the same package.

@bvaradar (Contributor) left a comment:

One minor comment. Otherwise ready to merge.

@codecov-io commented:

Codecov Report

Merging #1157 into master will decrease coverage by 0.09%.
The diff coverage is 80%.


@@             Coverage Diff             @@
##             master    #1157     +/-   ##
===========================================
- Coverage     67.09%   66.99%   -0.1%     
  Complexity      223      223             
===========================================
  Files           333      334      +1     
  Lines         16216    16269     +53     
  Branches       1659     1660      +1     
===========================================
+ Hits          10880    10900     +20     
- Misses         4598     4632     +34     
+ Partials        738      737      -1
Impacted Files Coverage Δ Complexity Δ
.../org/apache/hudi/table/HoodieCommitArchiveLog.java 75% <ø> (ø) 0 <0> (ø) ⬇️
.../apache/hudi/client/AbstractHoodieWriteClient.java 70.89% <100%> (-4.24%) 0 <0> (ø)
...apache/hudi/common/model/HoodieCommitMetadata.java 54.58% <100%> (-3.5%) 0 <0> (ø)
...java/org/apache/hudi/client/HoodieWriteClient.java 69.77% <100%> (+0.54%) 0 <0> (ø) ⬇️
...g/apache/hudi/common/model/WriteOperationType.java 57.14% <57.14%> (ø) 0 <0> (?)
...n/java/org/apache/hudi/common/model/HoodieKey.java 94.44% <0%> (+5.55%) 0% <0%> (ø) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2d04014...e34024f.

@bvaradar (Contributor) left a comment:

Looks great. Thanks for the patience @hddong

@bvaradar bvaradar merged commit 8306205 into apache:master Mar 3, 2020
lyogev pushed a commit to YotpoLtd/incubator-hudi that referenced this pull request Mar 30, 2020
[HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata (apache#1157)