MAPREDUCE-7435. Manifest Committer OOM on abfs #5519
Merged
steveloughran
merged 18 commits into
apache:trunk
from
steveloughran:mapreduce/MAPREDUCE-7435-committer-oom
Jun 9, 2023
Changes from 1 commit
17db68e
MAPREDUCE-7435. Manifest Committer OOM on abfs
steveloughran e153791
MAPREDUCE-7435. committer OOM
steveloughran 16422e4
MAPREDUCE-7435. oom: switch to sequence file for storage of the files.
steveloughran f95a926
MAPREDUCE-7435. oom: switch to sequence file for storage of the files.
steveloughran 3046b08
MAPREDUCE-7435. starting to get write/read chain working
steveloughran 3182c97
MAPREDUCE-7435. starting to get write/read chain working
steveloughran f18e9da
MAPREDUCE-7435. Async queue/write working
steveloughran 30d90e0
MAPREDUCE-7435. following chain through to validation
steveloughran ed04b54
MAPREDUCE-7435. Parallel writing test
steveloughran 61a6846
MAPREDUCE-7435. checkstyle, remote iterator work, azure tuning
steveloughran ffb25e7
MAPREDUCE-7435. improve ITestAbfsLoadManifestsStage performance
steveloughran 9874acb
MAPREDUCE-7435. tweak test performance by disabling parallel TA dir c…
steveloughran 7af5ab2
MAPREDUCE-7435. reduce delete overhead on renaming by a HEAD
steveloughran f969993
MAPREDUCE-7435. javadocs and other warnings, *not spotbugs*
steveloughran 8e83fdc
MAPREDUCE-7435. Mehakmeet comments, excluding timeouts.
steveloughran b289707
MAPREDUCE-7435. validation reporting missing files
steveloughran 355fa35
MAPREDUCE-7435. Mehakmeet review
steveloughran 070c788
MAPREDUCE-7435. checkstyle: remove unused imports.
steveloughran
@@ -26,7 +26,7 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import org.apache.commons.lang3.tuple.Pair;
+import org.apache.commons.lang3.tuple.Triple;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
 import org.apache.hadoop.fs.statistics.impl.IOStatisticsStore;
@@ -35,6 +35,7 @@
 
 import static java.util.Objects.requireNonNull;
 import static org.apache.commons.lang3.StringUtils.isNotBlank;
+import static org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterConstants.SUCCESS_MARKER_FILE_LIMIT;
 import static org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterStatisticNames.COMMITTER_BYTES_COMMITTED_COUNT;
 import static org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterStatisticNames.COMMITTER_FILES_COMMITTED_COUNT;
 import static org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterStatisticNames.OP_STAGE_JOB_COMMIT;
@@ -84,23 +85,22 @@ protected CommitJobStage.Result executeStage(
       LoadManifestsStage.Result result = new LoadManifestsStage(stageConfig).apply(

Review comment (on the line above): suggestion: We can include a duration tracker to know the time taken to load manifests in the final stats.
Reply: already done in AbstractJobOrTaskStage

           new LoadManifestsStage.Arguments(
               File.createTempFile("manifest", ".list"),
-              false, /* do not cache manifests */
+              /* do not cache manifests */
               stageConfig.getWriterQueueCapacity()));
-      LoadManifestsStage.SummaryInfo summary = result.getSummary();
+      LoadManifestsStage.SummaryInfo loadedManifestSummary = result.getSummary();
       loadedManifestData = result.getLoadedManifestData();
 
-      LOG.debug("{}: Job Summary {}", getName(), summary);
+      LOG.debug("{}: Job Summary {}", getName(), loadedManifestSummary);
       LOG.info("{}: Committing job with file count: {}; total size {} bytes",
           getName(),
-          summary.getFileCount(),
-          String.format("%,d", summary.getTotalFileSize()));
+          loadedManifestSummary.getFileCount(),
+          String.format("%,d", loadedManifestSummary.getTotalFileSize()));
       addHeapInformation(heapInfo, OP_STAGE_JOB_LOAD_MANIFESTS);
-
 
       // add in the manifest statistics to our local IOStatistics for
       // reporting.
       IOStatisticsStore iostats = getIOStatistics();
-      iostats.aggregate(summary.getIOStatistics());
+      iostats.aggregate(loadedManifestSummary.getIOStatistics());
 
       // prepare destination directories.
       final CreateOutputDirectoriesStage.Result dirStageResults =
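
The hunk above hands `LoadManifestsStage` a local temporary file and `stageConfig.getWriterQueueCapacity()`, and the commit list mentions an async queue/write path. As a rough illustration of that general pattern only, the sketch below streams entries through a bounded queue to a writer thread that appends them to a local file, so heap use is bounded by the queue capacity rather than by the number of entries. It is not the Hadoop implementation (the commit messages say that uses a sequence file); `EntrySpillSketch`, `spill` and the one-line-per-entry text format are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; not the Hadoop LoadManifestsStage code. */
final class EntrySpillSketch {

  /** End-of-stream sentinel, compared by identity so no real entry can match it. */
  private static final String END = new String("<end>");

  /**
   * Write entries to a local file through a bounded queue.
   * @return the number of entries written
   */
  static long spill(Iterable<String> entries, File dest, int queueCapacity)
      throws IOException, InterruptedException {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
    AtomicLong written = new AtomicLong();
    AtomicReference<IOException> failure = new AtomicReference<>();

    try (BufferedWriter out =
             Files.newBufferedWriter(dest.toPath(), StandardCharsets.UTF_8)) {
      Thread writer = new Thread(() -> {
        try {
          for (String e = queue.take(); e != END; e = queue.take()) {
            if (failure.get() != null) {
              continue; // after a failure, keep draining so the producer never blocks
            }
            try {
              out.write(e);
              out.newLine();
              written.incrementAndGet();
            } catch (IOException ex) {
              failure.set(ex);
            }
          }
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
        }
      }, "entry-writer");
      writer.start();
      try {
        for (String e : entries) {
          queue.put(e); // blocks while the queue is full
        }
      } finally {
        queue.put(END);
        writer.join();
      }
    }
    if (failure.get() != null) {
      throw failure.get();
    }
    return written.get();
  }

  public static void main(String[] args) throws Exception {
    File dest = File.createTempFile("entries", ".list");
    long count = spill(Arrays.asList("part-0000", "part-0001"), dest, 1);
    System.out.println(count + " entries written to " + dest);
  }
}
```

The queue capacity trades memory for throughput: a capacity of 1 serializes producer and writer, while a larger value lets the producer run ahead by at most that many entries.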
@@ -113,7 +113,9 @@ protected CommitJobStage.Result executeStage(
       // and hence all aggregate stats from the tasks.
       ManifestSuccessData successData;
       successData = new RenameFilesStage(stageConfig).apply(
-          Pair.of(loadedManifestData, dirStageResults.getCreatedDirectories()));
+          Triple.of(loadedManifestData,
+              dirStageResults.getCreatedDirectories(),
+              stageConfig.getSuccessMarkerFileLimit()));
       if (LOG.isDebugEnabled()) {
         LOG.debug("{}: _SUCCESS file summary {}", getName(), successData.toJson());
       }
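
This hunk passes `stageConfig.getSuccessMarkerFileLimit()` into `RenameFilesStage`, but the page does not show how the stage uses it. One plausible reading, stated here as an assumption, is that the limit caps how many file names are recorded for the `_SUCCESS` summary while the totals still cover every file. The sketch below shows that idea in isolation; `BoundedFileSample` and its methods are hypothetical and are not part of the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not the Hadoop RenameFilesStage code. */
final class BoundedFileSample {

  private final int limit;
  private final List<String> sample = new ArrayList<>();
  private long total;

  BoundedFileSample(int limit) {
    if (limit < 0) {
      throw new IllegalArgumentException("limit must be >= 0: " + limit);
    }
    this.limit = limit;
  }

  /** Count every file, but remember at most {@code limit} names. */
  synchronized void add(String path) {
    total++;
    if (sample.size() < limit) {
      sample.add(path);
    }
  }

  synchronized long getTotal() {
    return total;
  }

  /** @return an immutable copy of the retained names, in insertion order. */
  synchronized List<String> getSample() {
    return Collections.unmodifiableList(new ArrayList<>(sample));
  }

  public static void main(String[] args) {
    BoundedFileSample files = new BoundedFileSample(2);
    for (String name : new String[] {"part-0000", "part-0001", "part-0002"}) {
      files.add(name);
    }
    System.out.println(files.getTotal() + " files, sample " + files.getSample());
  }
}
```

With this shape the memory held for the summary stays proportional to the limit, however many files the job commits.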
@@ -124,10 +126,10 @@ protected CommitJobStage.Result executeStage(
       // aggregating tasks.
       iostats.setCounter(
           COMMITTER_FILES_COMMITTED_COUNT,
-          summary.getFileCount());
+          loadedManifestSummary.getFileCount());
       iostats.setCounter(
           COMMITTER_BYTES_COMMITTED_COUNT,
-          summary.getTotalFileSize());
+          loadedManifestSummary.getTotalFileSize());
       successData.snapshotIOStatistics(iostats);
       successData.getIOStatistics().aggregate(heapInfo);
 
Review comment: Sorry, I think I missed these constants being added. Don't you think they should be configurable, as a fallback, so that these values never cause any issues and are easy to change? I guess if it waits this long, we can assume it's just hanging as well. Your call on whether to make them configurable.
Reply: My view is that if things are this bad it is a disaster and the job is failing, as either the thread concurrency is broken or the local fs has failed.
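
The constants under discussion are not shown on this page; from the wording they appear to be fixed wait limits. A minimal sketch of the position taken in the reply, assuming the limit guards a hand-off to a bounded queue: a timed `offer` whose expiry is treated as a fatal error rather than something to tune. `QueueTimeoutSketch`, `offerOrFail` and the timeout values are hypothetical and are not taken from the PR.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; names and timeout values are not from the PR. */
final class QueueTimeoutSketch {

  /**
   * Offer an item, failing the caller if the queue stays full for the
   * whole timeout. A consumer that cannot free a slot in that time is
   * assumed to be dead or stuck, so the error is not retried.
   */
  static <T> void offerOrFail(BlockingQueue<T> queue, T item,
      long timeout, TimeUnit unit) throws IOException, InterruptedException {
    if (!queue.offer(item, timeout, unit)) {
      throw new IOException("Timed out after " + timeout + " " + unit
          + " waiting to queue " + item);
    }
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
    offerOrFail(queue, "first", 50, TimeUnit.MILLISECONDS);
    try {
      // nothing is consuming, so the second offer cannot succeed
      offerOrFail(queue, "second", 50, TimeUnit.MILLISECONDS);
    } catch (IOException e) {
      System.out.println("failed as expected: " + e.getMessage());
    }
  }
}
```

Whether such a limit should be configurable is the judgment call debated above; the sketch takes it as a parameter only so that the example can use a short value.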