-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-1746] Added support for replace commits in commit showpartitions, commit show_write_stats, commit showfiles #2678
[HUDI-1746] Added support for replace commits in commit showpartitions, commit show_write_stats, commit showfiles #2678
Conversation
…ow_write_stats, commit showfiles
/* | ||
Checks whether a commit or replacecommit action exists in the timeline. | ||
* */ | ||
private Option<HoodieInstant> getCommitOrReplaceCommitInstant(HoodieTimeline timeline, String instantTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider changing signature to return Option of HoodieCommitMetadata and deserialize instant details inside this method. This would avoid repetition to get instant details in multiple places. You can also do additional validation. for example: for replace commit, deserialize using HoodieReplaceCommitMetadata class
if (!timeline.containsInstant(hoodieInstant)) { | ||
hoodieInstant = new HoodieInstant(false, HoodieTimeline.REPLACE_COMMIT_ACTION, instantTime); | ||
if (!timeline.containsInstant(hoodieInstant)) { | ||
return Option.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also include DELTA COMMIT here?
private static void createReplaceCommitFileWithMetadata(String basePath, String commitTime, Configuration configuration, | ||
String fileId1, String fileId2, Option<Integer> writes, | ||
Option<Integer> updates) throws Exception { | ||
List<String> commitFileNames = Arrays.asList(HoodieTimeline.makeCommitFileName(commitTime), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse replace commit generator from other places? HoodieTestTable for example?
|
||
HoodieInstant hoodieInstant = hoodieInstantOptional.get(); | ||
|
||
HoodieCommitMetadata meta = HoodieCommitMetadata.fromBytes(activeTimeline.getInstantDetails(hoodieInstant).get(), | ||
HoodieCommitMetadata.class); | ||
List<Comparable[]> rows = new ArrayList<>(); | ||
for (Map.Entry<String, List<HoodieWriteStat>> entry : meta.getPartitionToWriteStats().entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be nice to compute totalfFilesReplaced and show it in the table. It could be 0 for regular commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I will pick up in next PR along with showing extraMetadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good. Its important for debugging to at least show commit action type (commit vs deltacommit vs replacecommit) in the output. If possible, add that information now. If not, please ping me when you have next PR.
@jsbali Please file a jira ticket and add it to the heading of this PR |
Can you please create a jira and link the same |
Created the jira https://issues.apache.org/jira/browse/HUDI-1746. |
Codecov Report
@@ Coverage Diff @@
## master #2678 +/- ##
============================================
+ Coverage 51.99% 61.91% +9.91%
+ Complexity 3566 334 -3232
============================================
Files 465 54 -411
Lines 22187 1993 -20194
Branches 2360 235 -2125
============================================
- Hits 11537 1234 -10303
+ Misses 9649 638 -9011
+ Partials 1001 121 -880
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@satishkotha I have made the changes as requested. PTAL |
@jsbali looks like there are test failures? Can you please fix them? I can review after that. |
import java.util.HashMap; | ||
import java.util.List; | ||
import java.util.Map; | ||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you may want to change intellij editor settings. its generally discouraged to use '*' imports.
Option<HoodieInstant> hoodieInstant = Option.fromJavaOptional(instants.stream().filter(timeline::containsInstant).findAny()); | ||
|
||
if (hoodieInstant.isPresent()) { | ||
return Option.of(HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(hoodieInstant.get()).get(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the earlier implementation where we are actually parsing this as HoodieReplaceCommitMetadata for 'REPLACE_COMMIT'. That allows callers to print additional replace specific information.
new HoodieInstant(false, HoodieTimeline.REPLACE_COMMIT_ACTION, instantTime), | ||
new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, instantTime)); | ||
|
||
Option<HoodieInstant> hoodieInstant = Option.fromJavaOptional(instants.stream().filter(timeline::containsInstant).findAny()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: timeline.containsInstant is linear search in all active instants. Number of instants in timeline is expected to be small, so this may not be a big issue. if its not lot of work consider trim timeline to specified instant time using findInstantsBeforeOrEquals().getReverseOrderedInstants().findFirst()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so instead of 3*n we do a linear search and then do 3 comparisons. Is this what you meant?
Option<HoodieInstant> instant = Option.fromJavaOptional(timeline.findInstantsBeforeOrEquals(instantTime).getReverseOrderedInstants().findFirst());
if (instant.isPresent()) { Option<HoodieInstant> hoodieInstant = Option.fromJavaOptional(instants.stream().filter(i -> i.equals(instant.get())).findAny()); return hoodieInstant; }
|
||
HoodieInstant hoodieInstant = hoodieInstantOptional.get(); | ||
|
||
HoodieCommitMetadata meta = HoodieCommitMetadata.fromBytes(activeTimeline.getInstantDetails(hoodieInstant).get(), | ||
HoodieCommitMetadata.class); | ||
List<Comparable[]> rows = new ArrayList<>(); | ||
for (Map.Entry<String, List<HoodieWriteStat>> entry : meta.getPartitionToWriteStats().entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good. Its important for debugging to at least show commit action type (commit vs deltacommit vs replacecommit) in the output. If possible, add that information now. If not, please ping me when you have next PR.
import org.apache.hudi.common.testutils.HoodieTestTable; | ||
import org.apache.hudi.common.util.Option; | ||
|
||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: avoid * import
Codecov Report
@@ Coverage Diff @@
## master #2678 +/- ##
============================================
+ Coverage 51.99% 52.59% +0.59%
- Complexity 3566 3711 +145
============================================
Files 465 485 +20
Lines 22187 23244 +1057
Branches 2360 2467 +107
============================================
+ Hits 11537 12226 +689
- Misses 9649 9938 +289
- Partials 1001 1080 +79
Flags with carried forward coverage won't be shown. Click here to find out more. |
Tips
What is the purpose of the pull request
Add support for replace commit in hudi-cli
Brief change log
Currently hudi-cli doesn't support replace commits in the commit show* functions. This adds the foundation for that.
This PR still doesn't support the extraMetadata of the replace commit which will be added in subsequent PR's.
Verify this pull request
This PR is one part of adding replace commit support in hudi-cli.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.