[SPARK-25248] [CORE] Audit barrier Scala APIs for 2.4 #22240
  override val stageId: Int,
  override val stageAttemptNumber: Int,
  override val partitionId: Int,
  override val taskAttemptId: Long,
  override val attemptNumber: Int,
- override val taskMemoryManager: TaskMemoryManager,
+ private[spark] override val taskMemoryManager: TaskMemoryManager,
This is not exposed by `TaskContext`.
@@ -68,7 +74,7 @@ class BarrierTaskContext(
  *
  * CAUTION! In a barrier stage, each task must have the same number of barrier() calls, in all
  * possible code branches. Otherwise, you may get the job hanging or a SparkException after
- * timeout. Some examples of misuses listed below:
+ * timeout. Some examples of '''misuses''' listed below:
use bold font to make sure users don't misread
nit: `listed` -> `are listed`?
just saw it, will include it if Jenkins fails:)
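To make the CAUTION in the doc comment above concrete, here is a minimal sketch of the failure mode. It uses a plain JVM `java.util.concurrent.CyclicBarrier` as an analogy, not Spark's `BarrierTaskContext.barrier()`, and the object and method names are hypothetical. The short timeout stands in for the barrier timeout the doc comment mentions.

```scala
import java.util.concurrent.{CyclicBarrier, TimeUnit, TimeoutException}

// Analogy only: a two-party JVM barrier where one "task" never arrives,
// as if it had taken a code branch that skips the barrier call.
object BarrierMisuseSketch {
  // Returns true if the single arriving task timed out waiting for its peer.
  def runWithMissingPeer(timeoutMs: Long): Boolean = {
    val barrier = new CyclicBarrier(2) // two parties are expected
    try {
      // Only this caller reaches the barrier; the second party never does.
      barrier.await(timeoutMs, TimeUnit.MILLISECONDS)
      false
    } catch {
      case _: TimeoutException => true // the wait gave up after the timeout
    }
  }
}
```

With every party calling `await` the same number of times, the wait would return normally instead of timing out.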
*/
@Experimental
@Since("2.4.0")
class BarrierTaskContext private[spark] (
Made the constructor package private to force users to get it from `#get()`.
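A rough sketch of the pattern described in this comment, assuming a plain `private` constructor (Spark uses `private[spark]`, which needs an enclosing `spark` package) and a thread-local holder. `DemoContext` and `install` are hypothetical names for illustration, not Spark APIs.

```scala
// Hypothetical stand-in for a context class; names are illustrative, not Spark's.
final class DemoContext private (val partitionId: Int)

object DemoContext {
  // Per-thread slot holding the active context.
  private val current = new ThreadLocal[DemoContext]

  // Stand-in for the runtime hook that installs the context before user code runs.
  // The companion object is allowed to call the private constructor.
  def install(partitionId: Int): Unit = current.set(new DemoContext(partitionId))

  // The only way for user code to obtain an instance.
  def get(): DemoContext = current.get()
}
```

Outside the companion, `new DemoContext(0)` does not compile, so callers have to go through `DemoContext.get()`.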
Test build #95276 has finished for PR 22240 at commit
@@ -28,4 +28,4 @@ import org.apache.spark.annotation.{Experimental, Since}
  */
  @Experimental
  @Since("2.4.0")
- class BarrierTaskInfo(val address: String)
+ class BarrierTaskInfo private[spark] (val address: String)
hide the constructor since this is not to be constructed by user
*/
@Experimental
@Since("2.4.0")
class RDDBarrier[T: ClassTag] private[spark] (rdd: RDD[T]) {
also hide the constructor here
Test build #95280 has finished for PR 22240 at commit
Test build #95329 has finished for PR 22240 at commit
retest this please
LGTM
Test build #95361 has finished for PR 22240 at commit
Test build #95379 has finished for PR 22240 at commit
test this please
Test build #95398 has finished for PR 22240 at commit
retest this please
## What changes were proposed in this pull request?

I made one pass over the Python APIs for barrier mode and updated them to match the Scala doc in #22240. Major changes:

* export the public classes
* expand the docs
* add doc for BarrierTaskInfo.address

cc: jiangxb1987

Closes #22261 from mengxr/SPARK-25248.1.

Authored-by: Xiangrui Meng <meng@databricks.com>
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Test build #95418 has finished for PR 22240 at commit
@@ -82,31 +82,22 @@ private[spark] abstract class Task[T](
SparkEnv.get.blockManager.registerTask(taskAttemptId)
// TODO SPARK-24874 Allow create BarrierTaskContext based on partitions, instead of whether
// the stage is barrier.
context = if (isBarrier) {
`context` is still used in the following statements.
object BarrierTaskContext {
/**
 * :: Experimental ::
 * Return the currently active BarrierTaskContext. This can be called inside of user functions to
nit: `Return` -> `Returns`?
Test build #95639 has finished for PR 22240 at commit
Test build #95640 has finished for PR 22240 at commit
Test build #95641 has finished for PR 22240 at commit
Test build #95642 has finished for PR 22240 at commit
Test build #95648 has finished for PR 22240 at commit
retest this please
Test build #95661 has finished for PR 22240 at commit
LGTM
Merged into master. Thanks for review!
What changes were proposed in this pull request?
I made one pass over the barrier APIs added to Spark 2.4 and updated some scopes and docs. I will update the Python docs once the Scala doc is reviewed.
One major issue is that `BarrierTaskContext` extends `TaskContextImpl`, which exposes some public methods. Internally, there were also several direct references to `TaskContextImpl` methods instead of `TaskContext`. This PR moved some methods from `TaskContextImpl` to `TaskContext`, keeping them package private, and used delegate methods to avoid inheriting `TaskContextImpl` and exposing unnecessary APIs.
TODOs: