Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-337] Fixes to ensure MOR incr pull provides consistent results #996

Merged

Conversation

n3nash
Copy link
Contributor

@n3nash n3nash commented Nov 5, 2019

  • Ensure that rollback instant is always created before the next commit instant.This especially affects IncrementalPull for MOR tables since we can end up pulling in
    log blocks for uncommitted data
  • Ensure that generated commit instants are 1 second apart

Consider the following example :

1.deltacommit
2.inflight
3.rollback <- because rollback was initiated after startCommit

Now, for the logReader, we pass the latestInstant allowed to be getCompletedInstants(COMMIT, DELTACOMMIT, ROLLBACK).
Now, since the latestInstant from the call above is "3", the inflight logblocks will be returned leading to invalid values.

@n3nash n3nash force-pushed the incremental_pull_fix_committime_order branch from cbcf293 to 450ff94 Compare November 5, 2019 21:53
String newCommitTime;
do {
newCommitTime = HoodieActiveTimeline.COMMIT_FORMATTER.format(new Date());
} while (oldVal == newCommitTime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be defensive, oldVal <= newCommitTime ? And sleep() to avoid busy-wait ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to avoid an extra sleep since in most cases we should be 1 second apart. wdyt ?

*/
public static String createNewCommitTime() {
return HoodieActiveTimeline.COMMIT_FORMATTER.format(new Date());
lastInstantTime.updateAndGet((oldVal) -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with this change, can we also make commit timestamp millisec granularity ? High level, don't see any backwards compatibility moving to lower granularity as long as all commit timestamps (for all actions) uses same granularity ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm open to doing this since (fyi that might cause regression for people who are listing hoodie directory and building tools on top but that's the side effect of directly accessing files)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n3nash : I am fine with revisiting this later. Would you mind opening a jira to decide if we need lower time granularity ?

@n3nash n3nash force-pushed the incremental_pull_fix_committime_order branch from 450ff94 to ab6aac1 Compare November 8, 2019 21:03
@vinothchandar
Copy link
Member

@n3nash do you have a JIRA for this.. can you please link to PR and title?

*/
public static String createNewCommitTime() {
return HoodieActiveTimeline.COMMIT_FORMATTER.format(new Date());
lastInstantTime.updateAndGet((oldVal) -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n3nash : I am fine with revisiting this later. Would you mind opening a jira to decide if we need lower time granularity ?

@bvaradar
Copy link
Contributor

@n3nash : Once you resolve the conflicts and the tests passes, we can merge the change

@n3nash n3nash changed the title Fixes to ensure MOR incr pull provides consistent results HUDI-337 : Fixes to ensure MOR incr pull provides consistent results Nov 16, 2019
@n3nash
Copy link
Contributor Author

n3nash commented Nov 16, 2019

Here is the Jira for time granularity : https://issues.apache.org/jira/browse/HUDI-338.

@bvaradar ready to be merged

@n3nash n3nash force-pushed the incremental_pull_fix_committime_order branch from ab6aac1 to 4a02a31 Compare November 16, 2019 11:12
@n3nash n3nash changed the title HUDI-337 : Fixes to ensure MOR incr pull provides consistent results [HUDI-337] : Fixes to ensure MOR incr pull provides consistent results Nov 16, 2019
@n3nash n3nash changed the title [HUDI-337] : Fixes to ensure MOR incr pull provides consistent results [HUDI-337] Fixes to ensure MOR incr pull provides consistent results Nov 16, 2019
@n3nash n3nash force-pushed the incremental_pull_fix_committime_order branch from 4a02a31 to 3c773dc Compare November 17, 2019 07:18
…it instant.

  This especially affects IncrementalPull for MOR tables since we can end up pulling in
  log blocks for uncommitted data
- Ensure that generated commit instants are 1 second apart
@n3nash n3nash force-pushed the incremental_pull_fix_committime_order branch from 3c773dc to 1a167d6 Compare November 17, 2019 10:32
@bvaradar bvaradar merged commit f82e589 into apache:master Nov 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants