-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-29582][PYSPARK] Support TaskContext.get() in a barrier task from Python side
#26239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi, @cloud-fan @HyukjinKwon, could you help to review this? Thanks a lot. |
Do you have any use cases that people want to reuse their production code and migrate on to barrier execution mode ? |
|
Yeah, we use rdd.mappartitionsWithIndex to startup a distributed dl model training, which is smilier to horovod.spark. Now we want to switch to barrier mode. And we want to reduce the code difference between scala and python. I think this patch is also useful for other people.
在 2019年10月24日,下午9:12,Jiang Xingbo <notifications@github.com<mailto:notifications@github.com>> 写道:
This is useful when people switch from normal code to barrier code
Do you have any use cases that people want to reuse their production code and migrate on to barrier execution mode ?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub<#26239?email_source=notifications&email_token=ADBEWSE5BQPFGSB6HUI72K3QQGNKJA5CNFSM4JEN6GRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECE625I#issuecomment-545910133>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ADBEWSDQTLZZPSDEYTRVC5TQQGNKJANCNFSM4JEN6GRA>.
|
|
ok to test |
|
Test build #112617 has finished for PR 26239 at commit
|
|
Test build #112638 has finished for PR 26239 at commit
|
|
Test build #112656 has finished for PR 26239 at commit
|
|
|
||
| # reset task context to None | ||
| TaskContext._setTaskContext(None) | ||
| BarrierTaskContext._setTaskContext(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what happens if it fails with exceptions in the middle of execution in this worker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really needed? We always set the global TaskContext and never reset it previouslly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what happens if it fails with exceptions in the middle of execution in this worker?
If exceptions occured, the worker will be closed with sys.exit(-1).
Is it really needed? We always set the global TaskContext and never reset it previouslly.
Previously:
val rdd = ...
val barriered = rdd.barrier().mapPartitions(...)
barriered.mapPartitions(...) # here the BarrierTaskContext still existed.
This code is just a guard program, it shouldn't increase extra overhead or behavior change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is just a guard program, it shouldn't increase extra overhead or behavior change.
I guess that's only when the worker is reused. Can you clarify it with comments here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for reviewing. updated.
|
Looks making sense to me otherwise |
|
Test build #112818 has finished for PR 26239 at commit
|
TaskContext.get() in a barrier task from Python side
|
Please also update the previous and proposed behaviors in the PR description, like: |
python/pyspark/taskcontext.py
Outdated
| A RuntimeError will raise if it is not in a barrier stage. | ||
| """ | ||
| if not isinstance(cls._taskContext, BarrierTaskContext): | ||
| raise RuntimeError('''It is not in a barrier stage''') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not supported behavior, so please raise Exception instead of RuntimeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think Exception is consistent at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for reviewing. updated
|
Test build #112925 has finished for PR 26239 at commit
|
|
Test build #112927 has finished for PR 26239 at commit
|
|
LGTM |
python/pyspark/taskcontext.py
Outdated
| running tasks. | ||
|
|
||
| .. note:: Must be called on the worker, not the driver. Returns None if not initialized. | ||
| A RuntimeError will raise if it is not in a barrier stage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| A RuntimeError will raise if it is not in a barrier stage. | |
| An exception will raise if it is not in a barrier stage. |
|
Test build #112986 has finished for PR 26239 at commit
|
|
Test build #112989 has finished for PR 26239 at commit
|
|
Merged to master. |
|
thanks all ! |
What changes were proposed in this pull request?
Add support of
TaskContext.get()in a barrier task from Python side, this makes it easier to migrate legacy user code to barrier execution mode.Why are the changes needed?
In Spark Core, there is a
TaskContextobject which is a singleton. We set a task context instance which can be TaskContext or BarrierTaskContext before the task function startup, and unset it to none after the function end. So we can both get TaskContext and BarrierTaskContext with the object. However we can only get the BarrierTaskContext withBarrierTaskContext, we will getNoneif we get it byTaskContext.getin a barrier stage.This is useful when people switch from normal code to barrier code, and only need a little update.
Does this PR introduce any user-facing change?
Yes.
Previously:
Proposed:
How was this patch tested?
New UT tests.