[HUDI-1746] Added support for replace commits in commit showpartitions, commit show_write_stats, commit showfiles #2678

Merged
merged 3 commits into from
Apr 21, 2021

Conversation

jsbali
Contributor

@jsbali jsbali commented Mar 15, 2021

What is the purpose of the pull request

Add support for replace commits in hudi-cli.

Brief change log

Currently, hudi-cli doesn't support replace commits in the commit show* functions. This PR adds the foundation for that.
It does not yet support the extraMetadata of the replace commit; that will be added in subsequent PRs.

Verify this pull request

This PR is one part of adding replace commit support in hudi-cli.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

/**
 * Checks whether a commit or replacecommit action exists in the timeline.
 */
private Option<HoodieInstant> getCommitOrReplaceCommitInstant(HoodieTimeline timeline, String instantTime) {
Member

Consider changing the signature to return an Option of HoodieCommitMetadata and to deserialize the instant details inside this method. This would avoid repeating the instant-details lookup in multiple places. You can also do additional validation; for example, for a replace commit, deserialize using the HoodieReplaceCommitMetadata class.
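A minimal sketch of what the suggested signature could look like, under stated assumptions: Instant, CommitMetadata and ReplaceCommitMetadata are hypothetical stand-ins for HoodieInstant, HoodieCommitMetadata and HoodieReplaceCommitMetadata, the timeline is modeled as a plain map from instant to its serialized details, and real deserialization is stubbed out. The point is the shape: look the instant up once, then pick the metadata class from the action type so replace commits come back as the richer subclass.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-ins for HoodieInstant / HoodieCommitMetadata / HoodieReplaceCommitMetadata.
final class CommitMetadataLookup {
  record Instant(String action, String timestamp) {}

  static class CommitMetadata {
    final String details; // stands in for the deserialized JSON payload
    CommitMetadata(String details) { this.details = details; }
  }

  static final class ReplaceCommitMetadata extends CommitMetadata {
    ReplaceCommitMetadata(String details) { super(details); }
  }

  // Action types to try, in order (assumed: an instant time maps to at most one of them).
  private static final List<String> ACTIONS = List.of("commit", "replacecommit", "deltacommit");

  // Replace commits get the richer subclass; the other actions use the base class.
  private static final Map<String, Function<String, CommitMetadata>> PARSERS = Map.of(
      "commit", CommitMetadata::new,
      "deltacommit", CommitMetadata::new,
      "replacecommit", ReplaceCommitMetadata::new);

  /** Looks up the instant and deserializes its details in one place. */
  static Optional<CommitMetadata> getCommitMetadata(Map<Instant, String> timeline, String instantTime) {
    for (String action : ACTIONS) {
      String details = timeline.get(new Instant(action, instantTime));
      if (details != null) {
        return Optional.of(PARSERS.get(action).apply(details));
      }
    }
    return Optional.empty();
  }
}
```

Callers can then check the runtime type (or the action) before printing replace-specific fields, instead of re-reading the instant details themselves.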

if (!timeline.containsInstant(hoodieInstant)) {
hoodieInstant = new HoodieInstant(false, HoodieTimeline.REPLACE_COMMIT_ACTION, instantTime);
if (!timeline.containsInstant(hoodieInstant)) {
return Option.empty();
Member

should we also include DELTA COMMIT here?

private static void createReplaceCommitFileWithMetadata(String basePath, String commitTime, Configuration configuration,
String fileId1, String fileId2, Option<Integer> writes,
Option<Integer> updates) throws Exception {
List<String> commitFileNames = Arrays.asList(HoodieTimeline.makeCommitFileName(commitTime),
Member

Can we reuse a replace commit generator from elsewhere, for example HoodieTestTable?


HoodieInstant hoodieInstant = hoodieInstantOptional.get();

HoodieCommitMetadata meta = HoodieCommitMetadata.fromBytes(activeTimeline.getInstantDetails(hoodieInstant).get(),
HoodieCommitMetadata.class);
List<Comparable[]> rows = new ArrayList<>();
for (Map.Entry<String, List<HoodieWriteStat>> entry : meta.getPartitionToWriteStats().entrySet()) {
Member

It'd be nice to compute totalFilesReplaced and show it in the table. It could be 0 for regular commits.

Contributor Author

I will pick this up in the next PR, along with showing extraMetadata.

Member

Sounds good. It's important for debugging to at least show the commit action type (commit vs. deltacommit vs. replacecommit) in the output. If possible, add that information now; if not, please ping me when you have the next PR.
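A small sketch of how totalFilesReplaced and the action type could be surfaced together. It assumes the replace metadata exposes a partition-to-replaced-file-IDs map (modeled here as a plain Map<String, List<String>>); CommitSummary and its methods are hypothetical, not hudi-cli code. A regular commit or deltacommit simply passes an empty map, which yields 0, matching the suggestion above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of hudi-cli.
final class CommitSummary {
  /** Sums replaced file IDs across all partitions; an empty map (non-replace commit) gives 0. */
  static int totalFilesReplaced(Map<String, List<String>> partitionToReplaceFileIds) {
    return partitionToReplaceFileIds.values().stream().mapToInt(List::size).sum();
  }

  /** One-line header naming the action type, so commit, deltacommit and replacecommit are distinguishable. */
  static String header(String action, String instantTime,
                       Map<String, List<String>> partitionToReplaceFileIds) {
    return action + " " + instantTime
        + " (totalFilesReplaced=" + totalFilesReplaced(partitionToReplaceFileIds) + ")";
  }
}
```

Keeping the count as a derived value (rather than a stored field) means the same helper works for every action type without special-casing.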

@n3nash
Contributor

n3nash commented Mar 25, 2021

@jsbali Please file a JIRA ticket and add it to the title of this PR.

@nsivabalan
Contributor

Can you please create a JIRA and link it?

@nsivabalan nsivabalan added the priority:minor everything else; usability gaps; questions; feature reqs label Mar 30, 2021
@jsbali
Contributor Author

jsbali commented Mar 31, 2021

Created the JIRA: https://issues.apache.org/jira/browse/HUDI-1746

@nsivabalan nsivabalan changed the title Added support for replace commits in commit showpartitions, commit sh… [HUDI-1746] Added support for replace commits in commit showpartitions, commit sh… Apr 1, 2021
@nsivabalan nsivabalan changed the title [HUDI-1746] Added support for replace commits in commit showpartitions, commit sh… [HUDI-1746] Added support for replace commits in commit showpartitions, commit show_write_stats, commit showfiles Apr 1, 2021
@codecov-io

Codecov Report

Merging #2678 (3356a28) into master (e8e6708) will increase coverage by 9.91%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master    #2678      +/-   ##
============================================
+ Coverage     51.99%   61.91%   +9.91%     
+ Complexity     3566      334    -3232     
============================================
  Files           465       54     -411     
  Lines         22187     1993   -20194     
  Branches       2360      235    -2125     
============================================
- Hits          11537     1234   -10303     
+ Misses         9649      638    -9011     
+ Partials       1001      121     -880     
Flag Coverage Δ Complexity Δ
hudicli ? ?
hudiclient ? ?
hudicommon ? ?
hudiflink ? ?
hudihadoopmr ? ?
hudisparkdatasource ? ?
hudisync ? ?
huditimelineservice ? ?
hudiutilities 61.91% <ø> (-7.57%) 0.00 <ø> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ Complexity Δ
...ies/exception/HoodieSnapshotExporterException.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (-1.00%)
.../apache/hudi/utilities/HoodieSnapshotExporter.java 5.17% <0.00%> (-83.63%) 0.00% <0.00%> (-28.00%)
...hudi/utilities/schema/JdbcbasedSchemaProvider.java 0.00% <0.00%> (-72.23%) 0.00% <0.00%> (-2.00%)
...he/hudi/utilities/transform/AWSDmsTransformer.java 0.00% <0.00%> (-66.67%) 0.00% <0.00%> (-2.00%)
...in/java/org/apache/hudi/utilities/UtilHelpers.java 40.69% <0.00%> (-23.84%) 27.00% <0.00%> (-6.00%)
...ies/sources/helpers/DatePartitionPathSelector.java 54.68% <0.00%> (-1.57%) 13.00% <0.00%> (ø%)
...g/apache/hudi/utilities/schema/SchemaProvider.java 100.00% <0.00%> (ø) 3.00% <0.00%> (+1.00%)
...apache/hudi/utilities/sources/AvroKafkaSource.java 0.00% <0.00%> (ø) 0.00% <0.00%> (ø%)
...s/deltastreamer/HoodieMultiTableDeltaStreamer.java 78.39% <0.00%> (ø) 18.00% <0.00%> (ø%)
.../hadoop/realtime/AbstractRealtimeRecordReader.java
... and 414 more

@jsbali
Contributor Author

jsbali commented Apr 14, 2021

@satishkotha I have made the changes as requested. PTAL

@satishkotha
Member

@jsbali It looks like there are test failures. Can you please fix them? I can review after that.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.*;
Member

nit: you may want to change your IntelliJ editor settings; '*' imports are generally discouraged.

Option<HoodieInstant> hoodieInstant = Option.fromJavaOptional(instants.stream().filter(timeline::containsInstant).findAny());

if (hoodieInstant.isPresent()) {
return Option.of(HoodieCommitMetadata.fromBytes(timeline.getInstantDetails(hoodieInstant.get()).get(),
Member

I like the earlier implementation, where we actually parse this as HoodieReplaceCommitMetadata for 'REPLACE_COMMIT'. That allows callers to print additional replace-specific information.

new HoodieInstant(false, HoodieTimeline.REPLACE_COMMIT_ACTION, instantTime),
new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, instantTime));

Option<HoodieInstant> hoodieInstant = Option.fromJavaOptional(instants.stream().filter(timeline::containsInstant).findAny());
Member

minor: timeline.containsInstant is a linear search over all active instants. The number of instants in the timeline is expected to be small, so this may not be a big issue. If it's not a lot of work, consider trimming the timeline to the specified instant time using findInstantsBeforeOrEquals().getReverseOrderedInstants().findFirst().

Contributor Author

@jsbali jsbali Apr 19, 2021

OK, so instead of 3*n comparisons, we do one linear search and then 3 comparisons. Is this what you meant?
Option<HoodieInstant> instant = Option.fromJavaOptional(
    timeline.findInstantsBeforeOrEquals(instantTime).getReverseOrderedInstants().findFirst());
if (instant.isPresent()) {
  Option<HoodieInstant> hoodieInstant = Option.fromJavaOptional(
      instants.stream().filter(i -> i.equals(instant.get())).findAny());
  return hoodieInstant;
}
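The single-pass idea in this exchange can be sketched in a self-contained form. Instant is a hypothetical stand-in for HoodieInstant, the timeline is a plain List, and instant times are assumed to be fixed-width strings (so lexicographic order is chronological) that are unique across actions. It is still one linear scan; the saving is that the timeline is walked once instead of once per candidate action.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for HoodieInstant; not the Hudi class.
final class FloorInstantLookup {
  record Instant(String action, String timestamp) {}

  private static final Set<String> COMMIT_ACTIONS = Set.of("commit", "replacecommit", "deltacommit");

  /**
   * Takes the latest instant whose timestamp is <= instantTime (one pass), then accepts it
   * only if the timestamp matches exactly and the action is commit-like.
   * Assumes timestamps are unique across actions.
   */
  static Optional<Instant> find(List<Instant> timeline, String instantTime) {
    return timeline.stream()
        .filter(i -> i.timestamp().compareTo(instantTime) <= 0)
        .max(Comparator.comparing(Instant::timestamp))
        .filter(i -> i.timestamp().equals(instantTime) && COMMIT_ACTIONS.contains(i.action()));
  }
}
```

The final filter plays the role of the equality check against the candidate instants: if the latest instant at or before instantTime has an earlier timestamp or a non-commit action, the lookup returns empty.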


import org.apache.hudi.common.testutils.HoodieTestTable;
import org.apache.hudi.common.util.Option;

import java.util.*;
Member

nit: avoid * import

@vinothchandar vinothchandar added this to Ready For Review in PR Tracker Board Apr 15, 2021
@codecov-commenter

codecov-commenter commented Apr 19, 2021

Codecov Report

Merging #2678 (b654923) into master (e8e6708) will increase coverage by 0.59%.
The diff coverage is 62.50%.

@@             Coverage Diff              @@
##             master    #2678      +/-   ##
============================================
+ Coverage     51.99%   52.59%   +0.59%     
- Complexity     3566     3711     +145     
============================================
  Files           465      485      +20     
  Lines         22187    23244    +1057     
  Branches       2360     2467     +107     
============================================
+ Hits          11537    12226     +689     
- Misses         9649     9938     +289     
- Partials       1001     1080      +79     
Flag Coverage Δ Complexity Δ
hudicli 40.53% <62.50%> (+3.51%) 218.00 <3.00> (+23.00)
hudiclient ∅ <ø> (∅) 0.00 <ø> (ø)
hudicommon 50.66% <ø> (-0.89%) 1976.00 <ø> (+3.00) ⬇️
hudiflink 56.51% <ø> (+3.02%) 516.00 <ø> (+63.00)
hudihadoopmr 33.33% <ø> (-0.12%) 198.00 <ø> (+1.00) ⬇️
hudisparkdatasource 72.06% <ø> (+2.21%) 237.00 <ø> (+42.00)
hudisync 45.70% <ø> (-3.92%) 131.00 <ø> (+3.00) ⬇️
huditimelineservice 64.36% <ø> (ø) 62.00 <ø> (ø)
hudiutilities 69.79% <ø> (+0.30%) 373.00 <ø> (+10.00)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ Complexity Δ
...a/org/apache/hudi/cli/commands/CommitsCommand.java 54.83% <62.50%> (+1.33%) 18.00 <3.00> (+3.00)
.../java/org/apache/hudi/common/util/CommitUtils.java 40.47% <0.00%> (-31.53%) 6.00% <0.00%> (ø%)
...ache/hudi/common/fs/inline/InMemoryFileSystem.java 79.31% <0.00%> (-10.35%) 15.00% <0.00%> (-1.00%)
...hadoop/realtime/RealtimeCompactedRecordReader.java 64.06% <0.00%> (-8.67%) 13.00% <0.00%> (+1.00%) ⬇️
.../main/scala/org/apache/hudi/HoodieSparkUtils.scala 83.33% <0.00%> (-5.56%) 0.00% <0.00%> (ø%)
...src/main/scala/org/apache/hudi/DefaultSource.scala 78.78% <0.00%> (-5.36%) 31.00% <0.00%> (+14.00%) ⬇️
...main/scala/org/apache/hudi/HoodieWriterUtils.scala 83.33% <0.00%> (-5.24%) 0.00% <0.00%> (ø%)
...di/common/table/timeline/HoodieActiveTimeline.java 66.81% <0.00%> (-3.97%) 43.00% <0.00%> (ø%)
...g/apache/hudi/common/model/WriteOperationType.java 50.00% <0.00%> (-3.13%) 2.00% <0.00%> (ø%)
...n/java/org/apache/hudi/common/model/HoodieKey.java 41.66% <0.00%> (-2.78%) 7.00% <0.00%> (+1.00%) ⬇️
... and 141 more
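As a sanity check on the Codecov reports in this thread: in each report, hits + misses + partials equals lines, and the headline coverage figures match hits / lines truncated (not rounded) to two decimals. The truncation rule is inferred from these numbers, not taken from Codecov documentation; CoverageCheck is a hypothetical helper.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the headline numbers from the raw counts.
final class CoverageCheck {
  /** Coverage percentage truncated to two decimals, using integer arithmetic to avoid rounding. */
  static String coverage(long hits, long lines) {
    long hundredths = hits * 10_000L / lines; // integer division truncates toward zero
    return String.format(Locale.ROOT, "%d.%02d", hundredths / 100, hundredths % 100);
  }

  /** Every tracked line is a hit, a miss or a partial. */
  static boolean consistent(long hits, long misses, long partials, long lines) {
    return hits + misses + partials == lines;
  }
}
```

For example, 11537 / 22187 = 0.51998..., which the reports show as 51.99% rather than 52.00%.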

@vinothchandar vinothchandar moved this from Opened PRs to Ready for Review in PR Tracker Board Apr 19, 2021
@vinothchandar vinothchandar moved this from Ready for Review to Review in progress in PR Tracker Board Apr 19, 2021
@satishkotha satishkotha merged commit 4a34318 into apache:master Apr 21, 2021
PR Tracker Board automation moved this from Review in progress to Done Apr 21, 2021
Labels
priority:minor everything else; usability gaps; questions; feature reqs
6 participants