Skip to content

[HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset#1009

Merged
bvaradar merged 1 commit intoapache:masterfrom
bvaradar:vb_fix_rename
Dec 16, 2019
Merged

[HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset#1009
bvaradar merged 1 commit intoapache:masterfrom
bvaradar:vb_fix_rename

Conversation

@bvaradar
Copy link
Copy Markdown
Contributor

@bvaradar bvaradar commented Nov 11, 2019

[HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset

With this PR, Hudi Timeline management no longer uses rename to mark state transitions. As renames can be non-atomic in some cloud stores, this PR addresses this issue in a clean way.

Related Changes:

  1. Introduce new metadata layout version to Hudi table properties and use this to determine if renames should be used or not while writing. Any existing table created prior to 0.5.1 will preserve old semantics. Newer tables that are created after 0.5.1 will automatically avoid renames. Hudi Query Engine integration should be able to handle both cases. We expect the deployment to first upgrade query engines before upgrading writer

  2. As the new format enforces write once semantics, there is no longer any need to write compaction and cleaner plan in both places (.hoodie and .hoodie/.aux). Code changes handles this

  3. Commits/DeltaCommits also follow requested -> inflight -> completed state transitions. Rollback for "requested" state (failure during index lookup) is trivial as no side-effects happened.

  4. Commit Archiving handles the case of intermediate state files also being present

@bvaradar bvaradar added the status:in-progress Work in progress label Nov 11, 2019
@bvaradar bvaradar force-pushed the vb_fix_rename branch 3 times, most recently from 882710e to d2b87d7 Compare November 13, 2019 19:02
@bvaradar bvaradar removed the status:in-progress Work in progress label Nov 13, 2019
@bvaradar bvaradar changed the title [WIP] [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset Nov 13, 2019
@bvaradar
Copy link
Copy Markdown
Contributor Author

@vinothchandar @n3nash : Ready for review.

Comment thread hudi-client/src/main/java/org/apache/hudi/HoodieCleanClient.java Outdated
Comment thread hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java Outdated
Comment thread hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java Outdated
Comment thread hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java Outdated
Comment thread hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java Outdated
Comment thread hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java Outdated
Comment thread hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java Outdated
Comment thread hudi-client/src/test/java/org/apache/hudi/TestClientRollback.java Outdated
Comment thread hudi-client/src/test/java/org/apache/hudi/index/TestHbaseIndex.java Outdated
Comment thread hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java Outdated
Comment thread hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@n3nash
Copy link
Copy Markdown
Contributor

n3nash commented Nov 19, 2019

@bvaradar left some comments. In general, I couldn't understand how will existing tables move from VERSION_0 metadata to VERSION_1. Is the new version only supported for new tables ? If yes, what is the plan for the existing tables, if not, what is the migration strategy for existing tables ?

Copy link
Copy Markdown
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level approach looks fine. fact that this did not need boiling the ocean is a testament that our code is in good shape actually :)

But left a bunch of comments. Will do a closer pass of rollback path in that context.

Comment thread hudi-cli/src/main/java/org/apache/hudi/cli/commands/CompactionCommand.java Outdated
Comment thread hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java Outdated
Comment thread hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java Outdated
Comment thread hudi-common/src/main/java/org/apache/hudi/common/model/HoodieMetadataVersion.java Outdated
Comment thread hudi-common/src/main/java/org/apache/hudi/common/model/HoodieMetadataVersion.java Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wont this provide both inflight and requested?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was the intention. One of the places where we are using is rolling back pending commits

Comment thread hudi-client/src/test/java/org/apache/hudi/index/TestHbaseIndex.java Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont we have similar logic in timeline class itself? can we consolidate there?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored a bit to be used here.

@bvaradar bvaradar force-pushed the vb_fix_rename branch 5 times, most recently from 0535625 to 25ffe70 Compare December 5, 2019 18:27
@bvaradar
Copy link
Copy Markdown
Contributor Author

bvaradar commented Dec 5, 2019

@vinothchandar @n3nash : Redid the migration handling and addressed your comments

High-Level Changes since the previous review.

  1. Hudi WriteConfig can be used to control timeline layout upgrade. Default will be to start avoiding renames in writer as soon as we deploy the new hudi jar on existing datasets. (unit-tests added)
  2. Readers are assumed to upgrade before Writers
  3. Added missing actions to archiving. Now, Archiving logs all states of each action
  4. Changed some code in Cleaner Migration (cc @leesf - please take a look). If this is too painful to review, I will try to create a separate diff.

@leesf leesf self-assigned this Dec 8, 2019
Copy link
Copy Markdown
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly final cosmetic changes.. One clarification : existing tables have to explicitly opt-in for this.. right?
You can merge once you do the final round and push again

Comment thread hudi-cli/src/main/java/org/apache/hudi/cli/commands/DatasetsCommand.java Outdated
Comment thread hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java Outdated
Comment thread hudi-common/src/main/java/org/apache/hudi/common/model/TimelineLayoutVersion.java Outdated
Comment thread hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even writers use HoodieTableMetaClient right? can you clarify this comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this is misleading. What I meant was MetaClient Readers ( use-cases which just lists the .hoodie folder) as opposed to MetaClient Writer (performing action transitions in .hoodie folder). Will remove this comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the applyLayoutVersionFilters is set selectively using which HoodieActiveTimeline constructor is invoked? Would this be fragile.. Thinking out loud, applying filters on V0, has no effect since there are nothing to get rid off. Only thing that could do wrong is not filtering V1.. hmmm

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key case here is for archival where we need all instants without filtering. May be introduce couple of factory methods which instantiate HoodieActiveTimeline w/o filtering ? HUDI-414

Comment thread hudi-common/src/main/java/org/apache/hudi/common/table/TimelineLayout.java Outdated
@n3nash n3nash self-requested a review December 13, 2019 02:41
Copy link
Copy Markdown
Contributor

@n3nash n3nash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure this is an opt-in functionality and please remember to squash commits (I tend to forget sometime so just pointing it out :))

@vinothchandar
Copy link
Copy Markdown
Member

@bvaradar Feel free to merge when you feel this is ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants