Remove empty avro files during compaction #2158
@htran1 Can you review?
        this.eventSubmitter.submit(CompactionSlaEventHelper.COMPACTION_RECORD_COUNT_EVENT, eventMetadataMap);
      }
    }
  }

  private boolean isFailedPath(Path path, List<TaskCompletionEvent> failedEvents) {
    return failedEvents.stream()
        .filter(event -> path.toString().contains(event.getTaskAttemptId().toString()))
We only care about one match, so limit(1) can be added. Also, the path separator should be placed before and after the attempt id to avoid an incorrect match when one attempt id is a prefix of another. For example, if attempt abc_10 passes, we don't want failed attempt abc_1 to match it.
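A minimal sketch of the delimiter-bounded match suggested above. It is not the code from this PR: it assumes paths and attempt ids are plain strings (the PR uses Hadoop's Path and TaskCompletionEvent), that the attempt id appears as its own path component, and that "/" is the separator. The class name FailedPathMatcher is hypothetical. It uses anyMatch, which stops at the first hit, in place of filter plus limit(1).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; names are not from the PR.
class FailedPathMatcher {

  // Returns true if the path contains a failed attempt id as a whole
  // path component. Wrapping the id in separators keeps "abc_1" from
  // matching a path that belongs to "abc_10".
  static boolean isFailedPath(String path, List<String> failedAttemptIds) {
    return failedAttemptIds.stream()
        .anyMatch(id -> path.contains("/" + id + "/"));
  }

  public static void main(String[] args) {
    List<String> failed = Arrays.asList("abc_1");
    // Path written by the successful attempt abc_10: not a failed path.
    System.out.println(isFailedPath("/tmp/out/abc_10/part-0.avro", failed)); // false
    // Path written by the failed attempt abc_1: a failed path.
    System.out.println(isFailedPath("/tmp/out/abc_1/part-0.avro", failed));  // true
  }
}
```

With a bare contains(id) check, the first call would also return true, which is the incorrect match described in the comment.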
// remove all invalid empty files due to speculative task execution
List<TaskCompletionEvent> failedEvents = CompactionAvroJobConfigurator.getUnsuccessfulTaskCompletionEvent(job);
List<Path> allFilePaths = DatasetHelper.getApplicableFilePaths(this.tmpFs, this.dataset.outputTmpPath(), Lists.newArrayList("avro"));
List<Path> goodPaths = new ArrayList<>();
This code shows up in CompactionCompleteFileOperationAction.java too. Can you make a common method for this?
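One way the shared method could look, as a rough sketch under stated assumptions: paths and failed attempt ids are modeled as plain strings instead of Hadoop's Path and TaskCompletionEvent, the attempt id is assumed to be a whole path component, and the class and method names (CompactionPathFilter, keepGoodPaths) are hypothetical. Both call sites would then delegate to the one helper.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical shared utility for illustration; not part of the PR.
class CompactionPathFilter {

  // True if the path contains any failed attempt id as a path component.
  static boolean isFailedPath(String path, List<String> failedAttemptIds) {
    return failedAttemptIds.stream()
        .anyMatch(id -> path.contains("/" + id + "/"));
  }

  // Returns only the paths that were not written by a failed attempt,
  // preserving the input order.
  static List<String> keepGoodPaths(List<String> allPaths, List<String> failedAttemptIds) {
    return allPaths.stream()
        .filter(path -> !isFailedPath(path, failedAttemptIds))
        .collect(Collectors.toList());
  }
}
```

A caller would pass the list of candidate avro paths and the failed attempt ids, then move or keep only the returned paths.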
@@ -352,6 +370,11 @@ public void run() {
    }
  }

  private boolean isFailedPath(Path path, List<TaskCompletionEvent> failedEvents) {
This code is also in CompactionCompleteFileOperationAction.java.
+1.
Closes apache#2158 from yukuai518/empty
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Remove all files that were generated by failed task attempts, to avoid zero-sized files.
Tests
No unit tests changed.
Commits