[SPARK-26867][YARN] Spark Support of YARN Placement Constraint #32804
Conversation
Test build #139410 has finished for PR 32804 at commit
Test build #139414 has finished for PR 32804 at commit
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test status failure
+CC @otterc, @venkata91
Test build #139445 has finished for PR 32804 at commit
Kubernetes integration test starting
Kubernetes integration test status success
```scala
val expression = (nodeAttributes.isDefined, locality.isDefined) match {
  case (true, true) => Some(or(and(nodeAttributes.get, locality.get), nodeAttributes.get))
```
It would be better to use the constraint below:

```scala
delayedOr(
  timedClockConstraint(and(nodeAttributes, locality), delayedOrIntervalMilliseconds, TimeUnit.MILLISECONDS),
  timedClockConstraint(nodeAttributes, delayedOrIntervalMilliseconds * 2, TimeUnit.MILLISECONDS))
```

But it always gets stuck when tested on a yarn-3.3.0 cluster. Hoping for help with this part: it's hard to balance the node attribute constraint and the locality requirement.
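For readers following the thread, a minimal sketch of how the full match expression could look with this suggestion folded in; the non-(true, true) cases and the delayedOrIntervalMilliseconds value are assumptions, not part of the posted diff:

```scala
import java.util.concurrent.TimeUnit

import org.apache.hadoop.yarn.api.resource.PlacementConstraints.{and, delayedOr, timedClockConstraint}

// Assumed context: nodeAttributes and locality are Option[AbstractConstraint]
// values computed earlier in the method, and delayedOrIntervalMilliseconds is
// a hypothetical config value for when the scheduler may relax the constraint.
val expression = (nodeAttributes.isDefined, locality.isDefined) match {
  case (true, true) =>
    // Try to satisfy both constraints first; once the interval elapses,
    // fall back to the node-attribute-only constraint.
    Some(delayedOr(
      timedClockConstraint(and(nodeAttributes.get, locality.get),
        delayedOrIntervalMilliseconds, TimeUnit.MILLISECONDS),
      timedClockConstraint(nodeAttributes.get,
        delayedOrIntervalMilliseconds * 2, TimeUnit.MILLISECONDS)))
  case (true, false) => Some(nodeAttributes.get)
  case (false, true) => Some(locality.get)
  case (false, false) => None
}
```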
Test build #139553 has finished for PR 32804 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Thanks for working on this, it looks very interesting.
Can you add more description about this? This seems like a lot of changes and not what I expected from the description. I was expecting us to just pass the node attributes along to YARN, but this is much more than that, so please describe in detail how this works.
- How exactly does this work with constraints vs. data locality? Is there some sort of timed wait or error handling if a node never becomes available or the cluster doesn't have a node with that attribute?
- You had about 4 examples of things you could do, but are there more? What dependencies does it need?
- How does an attribute allow you to get cardinality or affinity?
- Are there any constraints on the Hadoop version? (I thought this was only introduced in 3.2.0.)
- Eventually the documentation .md file would need to be updated.
```diff
@@ -27,6 +27,24 @@ import org.apache.spark.network.util.ByteUnit
 package object config extends Logging {

+  /* Common app configuration. */
+  private[spark] val SCHEDULING_REQUEST_ENABLED =
+    ConfigBuilder("spark.yarn.schedulingRequestEnabled")
```
I think we need to come up with a better name and more description. From just reading this, I have no idea what it does. Spark always requests containers from YARN to schedule on, so why would this be off? (That is what I read from the config name.)
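For context, a completed definition might look like the sketch below; the doc text and default value are assumptions, since the hunk is truncated here:

```scala
private[spark] val SCHEDULING_REQUEST_ENABLED =
  ConfigBuilder("spark.yarn.schedulingRequestEnabled")
    // Assumed wording: executors are requested through YARN SchedulingRequests
    // (which can carry placement constraints) instead of plain ResourceRequests.
    .doc("When true, request executor containers via YARN SchedulingRequests " +
      "so that placement constraints such as node attributes can be applied.")
    .booleanConf
    .createWithDefault(false)
```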
```scala
/**
 * This strategy is calculating the optimal locality preferences of YARN containers by considering
```
This looks like a direct copy of the ContainerLocalityPreferences javadoc. I'm assuming this is different, so please update the description; if I missed the difference, I think we need to pull it up to the top to make clear how it's different.
```scala
    hostToLocalTaskCount: Map[String, Int],
    allocatedHostToContainersMap: HashMap[String, Set[ContainerId]],
    localityMatchedPendingAllocations: Seq[SchedulingRequest],
    schedulingRequestToNodes: mutable.HashMap[Long, Array[String]],
```
not documented
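A sketch of the missing scaladoc; these descriptions are inferred from the parameter names, not taken from the PR:

```scala
/**
 * @param hostToLocalTaskCount map from preferred host to the number of tasks
 *                             that would like to run on it (inferred)
 * @param allocatedHostToContainersMap map from host to the containers already
 *                                     allocated on it (inferred)
 * @param localityMatchedPendingAllocations pending SchedulingRequests whose
 *                                          locality matches current preferences (inferred)
 * @param schedulingRequestToNodes map from a SchedulingRequest's allocation
 *                                 request id to its candidate node names (inferred)
 */
```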
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
Thanks for your effort. Any update on this? @AngersZhuuuu |
What changes were proposed in this pull request?
Support YARN Placement Constraints. In this PR:
- LocalityPreferredSchedulingRequestContainerPlacementStrategy: computes locality preferences for SchedulingRequests.
- ContainerImpl: for UT, since the interface Container's common API can't support setAllocationRequestId and getAllocationRequestId.
- YarnSchedulingRequestAllocator: the allocator implementation used when placement constraints are enabled.

Why are the changes needed?
Spark can allow users to configure placement constraints so that users have more control over where the executors get placed, for example by requiring executors to run on nodes with a particular node attribute (see the sketch below).
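A hedged sketch of the kind of constraint this enables, built with Hadoop's placement constraint API (available since Hadoop 3.2, per the review discussion); the attribute name attr1 mirrors the test section below, everything else is illustrative:

```scala
import org.apache.hadoop.yarn.api.records.NodeAttributeOpCode
import org.apache.hadoop.yarn.api.resource.PlacementConstraints
import org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE
import org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.nodeAttribute

// Only place containers on nodes whose attribute attr1 equals "1".
val constraint = PlacementConstraints.build(
  PlacementConstraints.targetNodeAttribute(NODE, NodeAttributeOpCode.EQ,
    nodeAttribute("attr1", "1")))
```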
Does this PR introduce any user-facing change?
Users can set spark.yarn.schedulingRequestEnabled to use YARN placement constraints, and set spark.yarn.executor.nodeAttributes to specify the constraint, as illustrated below.
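As an illustration, enabling this programmatically might look like the sketch below; the value syntax for spark.yarn.executor.nodeAttributes is an assumption based on the attr1 examples in this PR, not on documented behavior:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.schedulingRequestEnabled", "true")   // opt in to SchedulingRequests
  .set("spark.yarn.executor.nodeAttributes", "attr1=1") // assumed value syntax
```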
How was this patch tested?
Added UT, and manually tested this in a YARN cluster:
- Set a node attribute.
- Under dynamic allocation it works as expected: all allocated containers are on NM ip-xxx.
- If we set the node attribute as attr1=2, we can't allocate any executor container.