-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-19889][SQL] Make TaskContext callbacks thread safe #17244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #74323 has finished for PR 17244 at commit
|
|
LGTM |
|
Using |
|
Test build #74327 has finished for PR 17244 at commit
|
| @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( | |||
| // Whether the task has failed. | |||
| @volatile private var failed: Boolean = false | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This need not be volatile anymore - given that it is updated and queried within a synchronized block.
We could revisit for completed too - though that would be an extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If drop the volatility then we need to make isCompleted synchronized as well; to ensure safe publication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, which is why I mentioned it as extension :-)
For failed, it is already valid to remove volatile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| @GuardedBy("this") | ||
| override def addTaskCompletionListener(listener: TaskCompletionListener): this.type = { | ||
| onCompleteCallbacks += listener | ||
| synchronized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: method synchronized instead of block ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| @GuardedBy("this") | ||
| override def addTaskFailureListener(listener: TaskFailureListener): this.type = { | ||
| onFailureCallbacks += listener | ||
| synchronized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: method synchronized instead of block ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| if (completed) { | ||
| listener.onTaskCompletion(this) | ||
| } | ||
| // Always add the listener because it is legal to call them multiple times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not realize this, interesting !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I was rather surprised about this, but the current code path seems to allow this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the doc in TaskContext to reflect this.
|
Test build #74335 has finished for PR 17244 at commit
|
|
Test build #74404 has finished for PR 17244 at commit
|
|
Test build #74405 has finished for PR 17244 at commit
|
sameeragarwal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nits, LGTM!
| * Adds a (Java friendly) listener to be executed on task completion. | ||
| * This will be called in all situation - success, failure, or cancellation. | ||
| * This will be called in all situation - success, failure, or cancellation. Adding a listener | ||
| * to an already completed task will result in that listeners being called immediately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
micro nit: s/listeners/listener here and below
|
|
||
| /** | ||
| * Adds a listener to be executed on task failure. | ||
| * Operations defined here must be idempotent, as `onTaskFailure` can be called multiple times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why delete this? onTaskFailure can also be called multiple times right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was disabled in #11504. So the comment does not make sense anymore.
| context.addTaskCompletionListener(_ => invocations += 1) | ||
| assert(invocations == 1) | ||
| context.markTaskCompleted() | ||
| assert(invocations == 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call context.markTaskCompleted() once again and assert invocations == 2 to have a test for idempotency?
|
Ok, had a small discussion offline. It seems weird that we have different calling policies for failure and completion listeners. I am going to change the invocation of completion listeners to exactly once as well. |
|
Test build #74466 has finished for PR 17244 at commit
|
|
LGTM |
|
Thanks for the reviews! Merging to master. |
| override def addTaskCompletionListener(listener: TaskCompletionListener) | ||
| : this.type = synchronized { | ||
| if (completed) { | ||
| listener.onTaskCompletion(this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we also try catch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or call the invokeListeners
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we do that, if we are going to rethrow the exception anyway? The only difference is that it would be a TaskCompletionListenerException instead. Calling invokeListeners would also call already invoked listeners, which is what we are trying to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
invokeListeners takes a list of listeners, so we are able to only call this listener.
I think it's better to make these listeners consistent, i.e. throw TaskCompletionListenerException when failure happens during calling listener.
What changes were proposed in this pull request?
It is sometimes useful to use multiple threads in a task to parallelize tasks. These threads might register some completion/failure listeners to clean up when the task completes or fails. We currently cannot register such a callback and be sure that it will get called, because the context might be in the process of invoking its callbacks, when the the callback gets registered.
This PR improves this by making sure that you cannot add a completion/failure listener from a different thread when the context is being marked as completed/failed in another thread. This is done by synchronizing these methods on the task context itself.
Failure listeners were called only once. Completion listeners now follow the same pattern; this lifts the idempotency requirement for completion listeners and makes it easier to implement them. In some cases we can (accidentally) add a completion/failure listener after the fact, these listeners will be called immediately in order make sure we can safely clean-up after a task.
As a result of this change we could make the
failureandcompletedflags non-volatile. TheisCompleted()method now uses synchronization to ensure that updates are visible across threads.How was this patch tested?
Adding tests to
TaskContestSuiteto test adding listeners to a completed/failed context.