New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-31438][CORE][SQL] Support JobCleaned Status in SparkListener #28280
Conversation
Change-Id: I44e4d41aa9f8ddf30c29058a92d94656410c021e
Can one of the admins verify this patch? |
// use private val and synchronized to keep thread safe | ||
private val jobCleanedHooks = new HashMap[Int, Int => Unit]() | ||
|
||
def addCleanedHook(jobId: Int, fun: Int => Unit): Unit = synchronized{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synchronized {
|
||
override def onJobCleaned(jobCleaned: SparkListenerJobCleaned): Unit = { | ||
jobCleanedHooks.get(jobCleaned.jobId) | ||
.foreach{function => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
foreach { function
// Make sure tmp path deleted while getting Exception before sc.runJob | ||
deleteExternalTmpPath(hadoopConf) | ||
throw new SparkException( | ||
s"Failed inserting ubti table ${table.identifier.quotedString}", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ubti table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, wrong word
* JobCleanedHookListener is a basic job cleaned listener. It holds jobCleanedHooks for | ||
* jobs and run cleaned hook after a job is cleaned. | ||
*/ | ||
class JobCleanedHookListener extends SparkListener with Logging { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to add UT for the new class in SparkListenerSuite or somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am doing it.
Change-Id: Ifda55fb05cb3de8500a15e1e14ee61adf982d3ba
Change-Id: Ibf035e0ab30ba37c3522a6d6bd4c23af841fca1d
kindly remind @rdblue @cloud-fan @xuanyuanking. |
Can you give more details about the use cases? There might be better ways to solve it. |
@cloud-fan Thanks for your reply. Actually, this have been discussed in #28129. There are always some temporary directory left behind after Application Finished. This is happened when we run Here is the driver log for this.
|
is it possible that a task knows it fails to commit and clean up the files? |
Em, I have thought about this. |
Can you give an example about the temp directory structure? |
Here is one example for this.
|
how about the task removing parent directory if it's empty? |
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
Is there any progress on this issue? When I was insert into hive table with speculative enabled, I encountered the same problem. |
What changes were proposed in this pull request?
In Spark, we need do some hook after job cleaned, such as cleaning hive external temporary paths. This has already discussed in GitHub Pull Request #28129.
The JobEnd Status is not suitable for this. As JobEnd is responsible for Job finished, once all result has generated, it should be finished. After finish, Scheduler will leave the still running tasks to be zombie tasks and delete abnormal tasks asynchronously.
Thus, we add JobCleaned Status to enable user to do some hook after all tasks cleaned in the job and add a JobCleanedListener to do some hooks.
Does this PR introduce any user-facing change?
NO
How was this patch tested?
run all tests