[HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset #1009
@vinothchandar @n3nash: Ready for review.
```java
    fromInstant.getFileName())));
// Use Write Once to create Target File
writeFileOnceInPath(new Path(metaClient.getMetaPath(), toInstant.getFileName()), data);
System.out.println("Create new file for toInstant ?" + new Path(metaClient.getMetaPath(), toInstant.getFileName()));
```
Same here.
@bvaradar left some comments. In general, I couldn't understand how existing tables will move from VERSION_0 metadata to VERSION_1. Is the new version only supported for new tables? If yes, what is the plan for existing tables? If not, what is the migration strategy for existing tables?
High-level approach looks fine. The fact that this did not need boiling the ocean is actually a testament that our code is in good shape :)
But I left a bunch of comments. I will do a closer pass of the rollback path in that context.
```diff
 return new HoodieDefaultTimeline(instants.stream().filter(instant -> {
-  return instant.isInflight() && (!instant.getAction().equals(HoodieTimeline.COMPACTION_ACTION));
+  return (!instant.isCompleted()) && (!instant.getAction().equals(HoodieTimeline.COMPACTION_ACTION));
```
Won't this return both inflight and requested instants?
Yes, that was the intention. One of the places where we use it is rolling back pending commits.
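To make the difference concrete, here is a minimal sketch using a simplified, hypothetical instant model (the Instant class and State enum below are illustrations, not Hudi's HoodieInstant). Filtering on isInflight() drops instants that are only requested, while filtering on !isCompleted() keeps both requested and inflight instants, which is what rolling back pending commits needs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PendingInstants {

  // Hypothetical, simplified instant states for this sketch.
  enum State { REQUESTED, INFLIGHT, COMPLETED }

  static final String COMPACTION_ACTION = "compaction";

  static final class Instant {
    final String timestamp;
    final String action;
    final State state;

    Instant(String timestamp, String action, State state) {
      this.timestamp = timestamp;
      this.action = action;
      this.state = state;
    }

    boolean isInflight() {
      return state == State.INFLIGHT;
    }

    boolean isCompleted() {
      return state == State.COMPLETED;
    }

    @Override
    public String toString() {
      return timestamp + "." + action + "." + state;
    }
  }

  // Old predicate: only INFLIGHT, non-compaction instants.
  static List<Instant> inflightOnly(List<Instant> instants) {
    return instants.stream()
        .filter(i -> i.isInflight() && !i.action.equals(COMPACTION_ACTION))
        .collect(Collectors.toList());
  }

  // New predicate: anything not yet COMPLETED (REQUESTED or INFLIGHT), excluding compaction.
  static List<Instant> pending(List<Instant> instants) {
    return instants.stream()
        .filter(i -> !i.isCompleted() && !i.action.equals(COMPACTION_ACTION))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Instant> timeline = Arrays.asList(
        new Instant("001", "commit", State.COMPLETED),
        new Instant("002", "commit", State.INFLIGHT),
        new Instant("003", "commit", State.REQUESTED),
        new Instant("004", COMPACTION_ACTION, State.REQUESTED));
    System.out.println(inflightOnly(timeline)); // [002.commit.INFLIGHT]
    System.out.println(pending(timeline)); // [002.commit.INFLIGHT, 003.commit.REQUESTED]
  }
}
```

With the old predicate, commit 003 (stuck in the requested state) would never be picked up for rollback.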
```java
HoodieActiveTimeline rawActiveTimeline = new HoodieActiveTimeline(metaClient, false);
Map<Pair<String, String>, List<HoodieInstant>> groupByTsAction = rawActiveTimeline.getInstants()
    .collect(Collectors.groupingBy(x -> Pair.of(x.getTimestamp(),
        x.getAction().equals(HoodieTimeline.COMPACTION_ACTION) ? HoodieTimeline.COMMIT_ACTION : x.getAction())));
```
Don't we have similar logic in the timeline class itself? Can we consolidate it there?
Refactored it a bit so that it can be used here.
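The grouping in the archival snippet can be illustrated with a minimal, self-contained sketch. Assumptions: the Instant class is a hypothetical stand-in for HoodieInstant, and java.util.AbstractMap.SimpleImmutableEntry replaces Hudi's Pair as the grouping key. As in the snippet's ternary, compaction is folded into commit so that all state files belonging to one instant end up in a single group.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByTsAction {

  static final String COMMIT_ACTION = "commit";
  static final String COMPACTION_ACTION = "compaction";

  // Hypothetical stand-in for HoodieInstant: one state file on the timeline.
  static final class Instant {
    final String timestamp;
    final String action;
    final String state;

    Instant(String timestamp, String action, String state) {
      this.timestamp = timestamp;
      this.action = action;
      this.state = state;
    }
  }

  // Fold compaction into commit so that an instant's state files share one grouping key.
  static String comparableAction(String action) {
    return COMPACTION_ACTION.equals(action) ? COMMIT_ACTION : action;
  }

  // Group every state file by (timestamp, comparable action).
  static Map<SimpleImmutableEntry<String, String>, List<Instant>> group(List<Instant> instants) {
    return instants.stream()
        .collect(Collectors.groupingBy(
            i -> new SimpleImmutableEntry<String, String>(i.timestamp, comparableAction(i.action))));
  }

  static SimpleImmutableEntry<String, String> key(String timestamp, String action) {
    return new SimpleImmutableEntry<>(timestamp, action);
  }

  public static void main(String[] args) {
    // Assumed for illustration: a finished compaction shows up under the commit action.
    List<Instant> raw = Arrays.asList(
        new Instant("001", COMMIT_ACTION, "requested"),
        new Instant("001", COMMIT_ACTION, "inflight"),
        new Instant("001", COMMIT_ACTION, "completed"),
        new Instant("002", COMPACTION_ACTION, "requested"),
        new Instant("002", COMPACTION_ACTION, "inflight"),
        new Instant("002", COMMIT_ACTION, "completed"));
    Map<SimpleImmutableEntry<String, String>, List<Instant>> groups = group(raw);
    System.out.println(groups.size()); // 2
    System.out.println(groups.get(key("002", COMMIT_ACTION)).size()); // 3
  }
}
```

Each group then holds every state file of one instant, so an archiver can treat them as a unit.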
@vinothchandar @n3nash: Redid the migration handling and addressed your comments. High-level changes since the previous review:
Mostly final cosmetic changes. One clarification: existing tables have to explicitly opt in to this, right?
You can merge once you do the final round and push again.
```diff
@@ -87,11 +90,14 @@ public HoodieTableMetaClient(Configuration conf, String basePath) throws Dataset
   }
 
   public HoodieTableMetaClient(Configuration conf, String basePath, boolean loadActiveTimelineOnLoad) {
-    this(conf, basePath, loadActiveTimelineOnLoad, ConsistencyGuardConfig.newBuilder().build());
+    this(conf, basePath, loadActiveTimelineOnLoad, ConsistencyGuardConfig.newBuilder().build(),
+        // Readers will use latest version
```
Even writers use HoodieTableMetaClient, right? Can you clarify this comment?
Agreed, this is misleading. What I meant was MetaClient readers (use cases that just list the .hoodie folder) as opposed to MetaClient writers (which perform action transitions in the .hoodie folder). Will remove this comment.
```java
})).map(HoodieInstant::new);

if (applyLayoutVersionFilters) {
  instantStream = TimelineLayout.getLayout(getTimelineLayoutVersion()).filterHoodieInstants(instantStream);
```
It seems applyLayoutVersionFilters is set selectively, depending on which HoodieActiveTimeline constructor is invoked? Would this be fragile? Thinking out loud: applying filters on V0 has no effect, since there is nothing to get rid of. The only thing that could go wrong is not filtering V1... hmmm
The key case here is archival, where we need all instants without filtering. Maybe introduce a couple of factory methods that instantiate HoodieActiveTimeline without filtering? HUDI-414
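The filtering question in this thread can be sketched as follows. The Instant class and the integer layout versions are hypothetical simplifications, and this is not Hudi's TimelineLayout implementation. Version 0 passes instants through unchanged (per the discussion, there is nothing to drop), while version 1 keeps only the most advanced state file per (timestamp, action). Archival would bypass this filter entirely, because it needs every state file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LayoutFilterSketch {

  // Declaration order doubles as progression order: REQUESTED < INFLIGHT < COMPLETED.
  enum State { REQUESTED, INFLIGHT, COMPLETED }

  static final class Instant {
    final String timestamp;
    final String action;
    final State state;

    Instant(String timestamp, String action, State state) {
      this.timestamp = timestamp;
      this.action = action;
      this.state = state;
    }

    @Override
    public String toString() {
      return timestamp + "." + action + "." + state;
    }
  }

  static String comparableAction(String action) {
    return "compaction".equals(action) ? "commit" : action;
  }

  // Hypothetical filter keyed on an integer layout version.
  static Stream<Instant> filter(int layoutVersion, Stream<Instant> instants) {
    if (layoutVersion == 0) {
      return instants; // V0: pass-through
    }
    // V1: several state files can coexist for one instant; keep the most advanced one.
    Map<String, Instant> latest = new LinkedHashMap<>();
    instants.forEach(i -> latest.merge(
        i.timestamp + "|" + comparableAction(i.action), i,
        (a, b) -> a.state.compareTo(b.state) >= 0 ? a : b));
    return latest.values().stream();
  }

  public static void main(String[] args) {
    List<Instant> raw = Arrays.asList(
        new Instant("001", "commit", State.REQUESTED),
        new Instant("001", "commit", State.INFLIGHT),
        new Instant("001", "commit", State.COMPLETED),
        new Instant("002", "commit", State.REQUESTED));
    System.out.println(filter(0, raw.stream()).count()); // 4
    System.out.println(filter(1, raw.stream()).collect(Collectors.toList()));
    // [001.commit.COMPLETED, 002.commit.REQUESTED]
  }
}
```

Keying the choice on the version, rather than on which constructor was called, is one way to make the "forgot to filter V1" failure mode harder to hit.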
Please make sure this is opt-in functionality, and please remember to squash commits (I tend to forget sometimes, so just pointing it out :))
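One way to read the opt-in requirement, as a minimal sketch: the writer looks up a layout version in the table's properties and treats a missing value as version 0, so tables created before the change keep rename-based transitions until the version is explicitly set. The property key below is an assumed name for illustration, not a confirmed Hudi configuration key.

```java
import java.util.Properties;

public class LayoutVersionOptIn {

  // Assumed property name for this sketch; not confirmed against Hudi's table config.
  static final String LAYOUT_VERSION_KEY = "hoodie.timeline.layout.version";

  // Missing property => version 0, i.e. the table predates the new layout.
  static int layoutVersion(Properties tableProps) {
    return Integer.parseInt(tableProps.getProperty(LAYOUT_VERSION_KEY, "0").trim());
  }

  // Only version 0 keeps the old rename-based state transitions.
  static boolean usesRenames(Properties tableProps) {
    return layoutVersion(tableProps) == 0;
  }

  public static void main(String[] args) {
    Properties existingTable = new Properties(); // no layout version recorded
    Properties optedIn = new Properties();
    optedIn.setProperty(LAYOUT_VERSION_KEY, "1");

    System.out.println(usesRenames(existingTable)); // true
    System.out.println(usesRenames(optedIn)); // false
  }
}
```

Defaulting to 0 rather than to the newest version is what makes the switch opt-in for existing tables; new tables would simply record the newer version when they are created.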
@bvaradar Feel free to merge when you feel this is ready.
[HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
With this PR, Hudi timeline management no longer uses renames to mark state transitions. As renames can be non-atomic in some cloud stores, this PR avoids them by creating a new, write-once file for each state instead of renaming the previous one.
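The difference between the two approaches can be sketched with java.nio.file on a local filesystem. This is an assumption made to keep the example self-contained (Hudi works through the Hadoop FileSystem API), and the state file names are illustrative rather than Hudi's exact naming. Instead of renaming one state file into the next, each transition creates a new file exactly once, and CREATE_NEW makes a second write to the same name fail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WriteOnceTransitions {

  // Old style (for contrast): the state change is a rename, which some object stores
  // implement as copy + delete rather than atomically.
  static void transitionByRename(Path metaDir, String from, String to) throws IOException {
    Files.move(metaDir.resolve(from), metaDir.resolve(to));
  }

  // New style: leave earlier state files alone and create the target file exactly once.
  // CREATE_NEW throws FileAlreadyExistsException if the target already exists.
  static void transitionByWriteOnce(Path metaDir, String to, byte[] data) throws IOException {
    Files.write(metaDir.resolve(to), data, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
  }

  // Walks one instant through requested -> inflight -> completed, then lists the directory.
  static List<String> runDemo() {
    try {
      Path metaDir = Files.createTempDirectory("timeline-sketch");
      byte[] plan = "plan".getBytes(StandardCharsets.UTF_8);
      transitionByWriteOnce(metaDir, "001.commit.requested", plan);
      transitionByWriteOnce(metaDir, "001.commit.inflight", plan);
      transitionByWriteOnce(metaDir, "001.commit", "metadata".getBytes(StandardCharsets.UTF_8));
      try (Stream<Path> files = Files.list(metaDir)) {
        return files.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns true if a second write to an existing state file is rejected.
  static boolean secondWriteRejected() {
    try {
      Path metaDir = Files.createTempDirectory("timeline-sketch");
      transitionByWriteOnce(metaDir, "001.commit", new byte[0]);
      try {
        transitionByWriteOnce(metaDir, "001.commit", new byte[0]);
        return false;
      } catch (FileAlreadyExistsException e) {
        return true;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(runDemo()); // [001.commit, 001.commit.inflight, 001.commit.requested]
    System.out.println(secondWriteRejected()); // true
  }
}
```

Earlier state files stay behind after each transition, which is why readers need layout-aware filtering and archival has to remove all of an instant's files together.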
Related Changes:
Introduce a new metadata layout version in the Hudi table properties, and use it to determine whether renames should be used while writing. Any existing table created before 0.5.1 will preserve the old semantics. New tables created with 0.5.1 or later will automatically avoid renames. Hudi query engine integrations should be able to handle both cases. We expect deployments to upgrade query engines first, before upgrading writers.
As the new format enforces write-once semantics, there is no longer any need to write the compaction and cleaner plans in both places (.hoodie and .hoodie/.aux). The code changes handle this.
Commits/DeltaCommits also follow requested -> inflight -> completed state transitions. Rolling back an instant in the "requested" state (a failure during index lookup) is trivial, as no side effects have happened yet.
Commit archiving handles the case where intermediate state files are also present.
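The requested -> inflight -> completed lifecycle and the rollback note above can be summarized in a small sketch. The State enum and both methods are hypothetical illustrations of the rules described in this commit message, not Hudi APIs.

```java
public class InstantLifecycle {

  enum State { REQUESTED, INFLIGHT, COMPLETED }

  // Only forward, single-step transitions are allowed.
  static boolean isValidTransition(State from, State to) {
    return (from == State.REQUESTED && to == State.INFLIGHT)
        || (from == State.INFLIGHT && to == State.COMPLETED);
  }

  // Does rolling back a pending instant in this state need to undo written data?
  // REQUESTED: the failure happened before any side effects (e.g. during index lookup),
  // so only the state file has to go. INFLIGHT: data may already have been written.
  static boolean rollbackNeedsDataCleanup(State pending) {
    if (pending == State.COMPLETED) {
      throw new IllegalArgumentException("completed instants are not pending");
    }
    return pending == State.INFLIGHT;
  }

  public static void main(String[] args) {
    System.out.println(isValidTransition(State.REQUESTED, State.INFLIGHT)); // true
    System.out.println(isValidTransition(State.REQUESTED, State.COMPLETED)); // false
    System.out.println(rollbackNeedsDataCleanup(State.REQUESTED)); // false
    System.out.println(rollbackNeedsDataCleanup(State.INFLIGHT)); // true
  }
}
```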