New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-337] Fixes to ensure MOR incr pull provides consistent results #996
[HUDI-337] Fixes to ensure MOR incr pull provides consistent results #996
Conversation
cbcf293
to
450ff94
Compare
String newCommitTime; | ||
do { | ||
newCommitTime = HoodieActiveTimeline.COMMIT_FORMATTER.format(new Date()); | ||
} while (oldVal == newCommitTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be defensive, oldVal <= newCommitTime ? And sleep() to avoid busy-wait ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to avoid an extra sleep since in most cases we should be 1 second apart. wdyt ?
*/ | ||
public static String createNewCommitTime() { | ||
return HoodieActiveTimeline.COMMIT_FORMATTER.format(new Date()); | ||
lastInstantTime.updateAndGet((oldVal) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Along with this change, can we also make commit timestamp millisec granularity ? High level, don't see any backwards compatibility moving to lower granularity as long as all commit timestamps (for all actions) uses same granularity ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm open to doing this since (fyi that might cause regression for people who are listing hoodie directory and building tools on top but that's the side effect of directly accessing files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@n3nash : I am fine with revisiting this later. Would you mind opening a jira to decide if we need lower time granularity ?
450ff94
to
ab6aac1
Compare
@n3nash do you have a JIRA for this.. can you please link to PR and title? |
*/ | ||
public static String createNewCommitTime() { | ||
return HoodieActiveTimeline.COMMIT_FORMATTER.format(new Date()); | ||
lastInstantTime.updateAndGet((oldVal) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@n3nash : I am fine with revisiting this later. Would you mind opening a jira to decide if we need lower time granularity ?
@n3nash : Once you resolve the conflicts and the tests passes, we can merge the change |
Here is the Jira for time granularity : https://issues.apache.org/jira/browse/HUDI-338. @bvaradar ready to be merged |
ab6aac1
to
4a02a31
Compare
4a02a31
to
3c773dc
Compare
…it instant. This especially affects IncrementalPull for MOR tables since we can end up pulling in log blocks for uncommitted data - Ensure that generated commit instants are 1 second apart
3c773dc
to
1a167d6
Compare
log blocks for uncommitted data
Consider the following example :
1.deltacommit
2.inflight
3.rollback <- because rollback was initiated after startCommit
Now, for the logReader, we pass the latestInstant allowed to be getCompletedInstants(COMMIT, DELTACOMMIT, ROLLBACK).
Now, since the latestInstant from the call above is "3", the inflight logblocks will be returned leading to invalid values.