[SPARK-17119][Core]allow the history server to delete .inprogress files(configurable) #16293
What changes were proposed in this pull request?
The History Server (HS) currently considers only completed applications when deleting event logs from spark.history.fs.logDirectory (since SPARK-6879). As a result, .inprogress files (from failed jobs, jobs where the SparkContext is not closed, spark-shell exits, etc.) can accumulate over time and impact the HS.
Instead of requiring these files to be deleted manually, this change adds a configurable feature that lets users decide whether .inprogress files should also be deleted after a period of time:
spark.history.fs.cleaner.deleteInProgress.enabled
spark.history.fs.cleaner.noProgressMaxAge
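
To make the intended behaviour concrete, here is a minimal Python sketch of the selection rule the two settings suggest. It is not the patch itself (the real change lives in the Scala FsHistoryProvider), and it assumes that a log's age is measured from its last-modified time, that completed logs keep using the existing cleaner max age, and that .inprogress logs are only eligible when the new flag is on. `EventLog`, `logs_to_delete`, and all parameter names are hypothetical.

```python
from dataclasses import dataclass

IN_PROGRESS_SUFFIX = ".inprogress"


@dataclass(frozen=True)
class EventLog:
    """Hypothetical stand-in for an event log file in the log directory."""
    name: str
    last_modified_s: float  # seconds since the epoch


def logs_to_delete(logs, now_s, max_age_s, delete_in_progress, no_progress_max_age_s):
    """Return the names of logs the cleaner would remove.

    Assumed mapping to the settings in this PR:
      delete_in_progress    -> spark.history.fs.cleaner.deleteInProgress.enabled
      no_progress_max_age_s -> spark.history.fs.cleaner.noProgressMaxAge
    max_age_s stands for the existing max age applied to completed logs.
    """
    expired = []
    for log in logs:
        age_s = now_s - log.last_modified_s
        if log.name.endswith(IN_PROGRESS_SUFFIX):
            # In-progress logs are skipped unless the feature is enabled.
            if delete_in_progress and age_s > no_progress_max_age_s:
                expired.append(log.name)
        elif age_s > max_age_s:
            expired.append(log.name)
    return expired
```

With the flag off, the function only ever returns completed logs, which matches the behaviour described above for the current History Server.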
How was this patch tested?
Verified with manual tests.
Unit tests were added in FsHistoryProviderSuite.scala, but I was not able to run ./dev/run-tests for the whole project on my laptop: it failed on SparkSinkSuite and on network-related tests under org.apache.spark.network.* (all due to java.io.IOException: Failed to connect to /<my_laptop_ip>:62343).
[info] SparkSinkSuite:
[info] - Success with ack *** FAILED *** (1 minute)
[info] java.io.IOException: Error connecting to /0.0.0.0:62298
[info] at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)
Doc
monitoring.md is also updated.