[SPARK-39853][CORE] Support stage level task resource profile for standalone cluster when dynamic allocation disabled #37268
Conversation
Can one of the admins verify this patch?
cc @Ngone51 , could you please review this PR? Thanks.
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
cc @Ngone51 , code refactored (add a new
/**
 * Target executor's resource profile id, used for schedule.
 */
override def targetExecutorRpId: Int = ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID
I think we should override `_id` instead so that `ResourceProfile.getNextProfileId` doesn't increase for task resource profiles.
I am not sure about this, since `TaskResourceProfile` is also a special `ResourceProfile` and `_id` is used to identify different `ResourceProfile`s.
@@ -388,14 +388,16 @@ private[spark] class TaskSchedulerImpl(
    val execId = shuffledOffers(i).executorId
    val host = shuffledOffers(i).host
    val taskSetRpID = taskSet.taskSet.resourceProfileId
    val prof = sc.resourceProfileManager.resourceProfileFromId(taskSetRpID)
    val targetExecutorRpID = prof.targetExecutorRpId
I think once we do https://github.com/apache/spark/pull/37268/files#r934220131, we'd no longer need this.
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
We should also update
or we could also extend
Can we please add more of a high level design/overview here?
Why is this only being added for standalone mode?
Also I assume the intent here is to reuse executors since there's no dynamic allocation - please see #33941
/**
 * Target executor's resource profile id, used for schedule.
 */
def targetExecutorRpId: Int = id
I don't understand what this is and how it is different from the id. This needs more explanation.
`id` is an identifier for a `ResourceProfile`; `targetExecutorRpId` describes which executors `TaskScheduler` shall assign tasks to. Before this PR, `TaskScheduler` did exact id matching between the task rp id and the executor rp id, so `id` was enough.
But if we want to share/reuse executors, `TaskScheduler` will need to schedule tasks to different/compatible executors, so the task's RP id and the executor's RP id may not be the same in that case. The `targetExecutorRpId` here tries to describe which executors to assign tasks with this `ResourceProfile` to; it could be different from `id`.
Pretty much like a compatible resource profile id in concept.
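The distinction between `id` and `targetExecutorRpId` can be sketched with a toy model (illustrative class names only, not Spark's actual `ResourceProfile` hierarchy):

```scala
// Toy model of the proposed API (illustrative names, not Spark's real classes).
object DefaultIds {
  val DEFAULT_RESOURCE_PROFILE_ID = 0
}

class ResourceProfileModel(val id: Int) {
  // By default, tasks of a profile are only scheduled on executors
  // that were created with exactly the same profile id.
  def targetExecutorRpId: Int = id
}

class TaskResourceProfileModel(id: Int) extends ResourceProfileModel(id) {
  // Task-only profiles keep their unique id but target executors
  // launched with the default resource profile.
  override def targetExecutorRpId: Int = DefaultIds.DEFAULT_RESOURCE_PROFILE_ID
}
```

Here a task-only profile keeps a unique `id` for identification while pointing scheduling at default-profile executors.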
This assumes the resource profile has to know there is a TaskResourceProfile. This also seems to limit you to one compatible executor resource profile? Why can't it match any number of compatible executors? I guess with the current implementation you limit it to dynamic allocation off, so that is why you can, but I don't want to make API changes based on that limit.
I don't think we have an actual task RP id and executor RP id for now; they're all still resource profile ids.
Would overriding `_id` as `_id = DEFAULT_RESOURCE_PROFILE_ID` in `TaskResourceProfile` help get rid of this API?
(In this way, the task is still limited to one executor resource profile. But it doesn't mean dynamic allocation has to be turned off, right?)
If we really want to achieve "match on any number of compatible executors" (which I think is a good direction), I think we need to completely separate the resource profile id into a task resource requirement id and an executor resource requirement id.
While that sounds like an interesting workaround, the id is public, so if I create a task resource profile that has the same id as another one, that seems odd, right? Also I think that would break the UI, or at least make it funny without other changes.
What if we pass `ResourceProfile.taskResources` to `TaskSet` and pass `ResourceProfile.executorResources` to `WorkerOffer` directly? It seems like we would no longer depend on the resource profile id for scheduling this way. cc @ivoson @tgravescs
Thanks @Ngone51
I think that's pretty much the idea of sharing/reusing executors with the policy: share any executors that can fulfill the task's resource requests. The problem may be whether we want to let users specify the reuse policy.
How about we narrow down the scenario in this PR: schedule tasks with `TaskResourceProfile` to executors with `DEFAULT_RESOURCE_PROFILE` directly when dynamic allocation is off (without checking the rp id)?
Even though the idea is pretty much like reusing compatible executors, it's much simpler in this case. And we can still leave SPARK-36699 to handle the API change; we don't introduce the API change in this PR.
What do you think? @tgravescs @Ngone51
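The narrowed rule proposed here can be sketched as a small self-contained predicate (hypothetical names, not the actual `TaskSchedulerImpl` code):

```scala
// Hypothetical sketch of the narrowed matching rule (illustrative names only).
sealed trait RpModel { def id: Int }
case class NormalRp(id: Int) extends RpModel
case class TaskRp(id: Int) extends RpModel

val DefaultRpId = 0

def canBeScheduled(taskSetRp: RpModel, executorRpId: Int,
    dynamicAllocationEnabled: Boolean): Boolean = taskSetRp match {
  // Narrowed case: task-only profiles ride on default-profile executors
  // when dynamic allocation is off, regardless of their own rp id.
  case TaskRp(_) if !dynamicAllocationEnabled => executorRpId == DefaultRpId
  // Otherwise, fall back to exact resource-profile-id matching.
  case _ => taskSetRp.id == executorRpId
}
```

This keeps exact-id matching as the general rule and only special-cases task-only profiles when dynamic allocation is off.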
I'm fine with narrowing the case here, but I don't want the public API affected. So if you just mean hardcoding it in the scheduler or something, that might be fine. We still need to make sure that if a user passes TaskResourceProfile, we error out if it's not this specific case.
Yes, we can limit the use case of `TaskResourceProfile`. I'll make the change and update the PR soon. Thanks.
Even though the idea is pretty much like reusing compatible executors, it's much simpler in this case. And we can still leave SPARK-36699 to handle the API change; we don't introduce the API change in this PR.
SGTM! Thanks @ivoson
Hi @tgravescs , thanks for your feedback. You are right, the idea here is to reuse executors with
This PR introduces a new special
so I would like to see the issue or this PR description have much more details about design, API, and its behavior.
This is not true, this is very much a user-impacting API. Also, the issue I linked to talked about the policy for reusing executors. What is the proposal for that here? Docs will need to be updated to explain the new API and behavior.
Force-pushed b97e384 to 835638f
Thanks @tgravescs , will update the doc and then we can discuss.
Updated the PR's description with more details, please take a look and share your thoughts. cc @tgravescs @Ngone51 Proposed new API changes in the description based on previous comments; will work on the code change after we come to an agreement. Thanks.
thanks for adding more details.
It seems a little bit odd to ask the ResourceProfile to give you compatible other ResourceProfiles. This feels like it should be in the ResourceProfileManager, which knows about all the ResourceProfiles. I guess that is why you pass in the ResourceProfileManager here? Is the intention that the user could explicitly set which ResourceProfiles it's compatible with? If so, I definitely would want a way to not have to specify it. The other issue raised that wasn't addressed was the reuse policy. I guess in this case we are limiting the executor profile to 1 because we don't have dynamic allocation, so one could argue that if you use a task resource request with that, you know what you get. Which I am fine with, but we need to be clear that it might very well waste resources. Also, if the intent is to not support TaskResourceProfile with dynamic allocation, I think we should throw an exception if anyone uses it with the dynamic allocation config on.
Yes, exactly. I put the
As mentioned above, in this case we will only have
Thanks @tgravescs for your feedback. Does this behavior change make sense?
Yeah, so to me I think it makes more sense to put it into the ResourceProfileManager API to say "find me compatible ones", but at the same time, now that I said it, we may want the user to be able to specify something via the ResourceProfile. That is where the reuse policy could be specified, for instance, or if the user really wanted to limit it to exactly one executor resource profile, that might be nice. I need to think about it a bit more.
 * normal [[ResourceProfile]]
 * @return This ResourceProfileBuilder
 */
def taskOnly(isTaskResourceProfile: Boolean = true): this.type = {
Why do we still need `isTaskResourceProfile` since the function is already named `taskOnly()`?
@tgravescs This introduces the new API but doesn't affect the existing ones. Are you good with it?
Why do we still need `isTaskResourceProfile` since the function is already named `taskOnly()`?

@Ngone51 Just wanted to provide a method so the user can override the property `isTaskResourceProfile` to reuse the resource profile builder to create a normal `ResourceProfile`, since there are existing methods to override/clean up resource requests: `clearExecutorResourceRequests`.
Do you think it is necessary? Or any other suggestions?
I don't think it's necessary. To me, the name `taskOnly()` implies `isTaskResourceProfile=true`. So setting `isTaskResourceProfile=false` is very weird.
Thanks. Removed the parameter.
If the user hasn't specified executor requirements it can just be task-only requirements.
Sounds good to me. And we can make `TaskResourceProfile` internal for now.
With dynamic allocation enabled, the user can still create a ResourceProfile without specifying executor requirements, and request new executors for this rp.
Is the cluster manager able to launch the new executor if there are no executor requirements (e.g., no memory, no CPU) specified? @ivoson
In this PR, we add a new TaskResourceProfile, which is limited to some scenarios (e.g. dynamic allocation off) and will reuse executors. So we may want the user to explicitly specify that it's a TaskResourceProfile to avoid misunderstanding.

Why does the user have to explicitly set it? Isn't it implicit if they don't specify executor resources and call build()? I guess you could argue you could perhaps give them a better error message if that isn't really what they intended, but that likely still becomes an issue with dynamic allocation support. I'm fine with TaskResourceProfile being public as a shortcut after building, especially if it makes the scheduler code cleaner to check the instance type.
The only difference I can see between this and dynamic allocation is that with dynamic allocation, if you don't specify the executor resources, it defaults to using the initial resources, but it gets a new executor based on the current implementation behavior. If we support reusing executors with dynamic allocation, that restriction goes away and I would expect it to find any executor that would fit - or, like the linked issue, we would specify some sort of reuse policy where the user could indicate which type of executors to reuse. Is there some other use case you had in mind?
Please don't force push the code, it makes reviewing what changed much harder.
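The "implicit" behavior suggested above (build a task-only profile whenever no executor requests were set) can be modeled with a toy builder. Names are hypothetical, not Spark's actual `ResourceProfileBuilder`:

```scala
// Toy builder modeling the implicit task-only behavior (illustrative names).
class ProfileM(val taskCpus: Option[Int], val executorCores: Option[Int])
class TaskOnlyProfileM(taskCpus: Option[Int]) extends ProfileM(taskCpus, None)

class BuilderM {
  private var taskCpus: Option[Int] = None
  private var executorCores: Option[Int] = None
  def cpusPerTask(n: Int): this.type = { taskCpus = Some(n); this }
  def executorCoresReq(n: Int): this.type = { executorCores = Some(n); this }
  // build() yields a task-only profile when no executor requests were given,
  // so the user never has to set an explicit flag.
  def build(): ProfileM =
    if (executorCores.isEmpty) new TaskOnlyProfileM(taskCpus)
    else new ProfileM(taskCpus, executorCores)
}
```

The scheduler could then check the instance type rather than a user-set flag.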
Please don't force push the code, it makes reviewing what changed much harder.

Thanks. Will avoid force pushes in the future.

but that likely still becomes an issue with dynamic allocation support.

Yes, I agree. My previous concern was that we would limit `TaskResourceProfile` to dynamic allocation off; an exception would be thrown if a user builds a `TaskResourceProfile` when dynamic allocation is enabled. This would introduce a behavior change.
If we don't have that limit, then we need to think about the policy for scheduling tasks with `TaskResourceProfile` when dynamic allocation is enabled.
cc @Ngone51 @tgravescs What do you think? Shall we support dynamic allocation as well here?
Co-authored-by: wuyi <yi.wu@databricks.com>
…scala Co-authored-by: wuyi <yi.wu@databricks.com>
…scala Co-authored-by: wuyi <yi.wu@databricks.com>
Gentle ping @tgravescs @Ngone51 . Could you please help review this PR when you have time? Thanks.
Sorry, might be next Monday before I get a chance.
Mostly looks good. We need to update the docs like:
https://github.com/apache/spark/blob/master/docs/configuration.md#stage-level-scheduling-overview
It says "the current implementation acquires new executors for each ResourceProfile created"
Also should update https://github.com/apache/spark/blob/master/docs/spark-standalone.md#stage-level-scheduling-overview
Many thanks @tgravescs . The last commit updated the docs. Please help review. Thanks. @tgravescs @Ngone51
// exception or in a real application. Otherwise in all other testing scenarios we want
// to skip throwing the exception so that we can test in other modes to make testing easier.
if ((notRunningUnitTests || testExceptionThrown) &&
if (rp.isInstanceOf[TaskResourceProfile] && !dynamicEnabled) {
Does it mean `TaskResourceProfile` can be used when dynamic allocation is enabled? And in that case, `TaskResourceProfile` seems to never meet the requirement (i.e. `taskRpId == executorRpId`) in `canBeScheduled()` (except for the first created `TaskResourceProfile`), right?
Hey @Ngone51, thanks for the feedback. For your concerns:

- `TaskResourceProfile` can be used when dynamic allocation is enabled.
- When dynamic allocation is enabled, `TaskResourceProfile` will be treated as a normal `ResourceProfile` with no specific executor resource requirements, and the dynamic allocation manager will also request executors for the `TaskResourceProfile`, so we'll have executors matching the same resource profile id. The behavior will be the same as what we have in the master branch.
Yes, it can be used with dynamic allocation; in that case it uses the default resource profile's executor resources, but it must acquire new executors. The TaskResourceProfile gets a unique rp id just like a standard resource profile, and it should go through the same path to get executors via dynamic allocation like a normal ResourceProfile (i.e. stage submitted kicks off). Is there something I'm not thinking about here?
...default resource profile executor resources but it must acquire new executors.
So it should be the default resource profile executor resources but not the default rp id? Then, it makes sense to me.
Yes, when dynamic allocation is enabled, it is just like a normal resource profile with a unique id, requesting executors based on default executor resources requirement.
 * when dynamic allocation is disabled, tasks will be scheduled to executors with default resource
 * profile based on task resources described by this task resource profile.
 * And when dynamic allocation is enabled, will require new executors for this profile base on
 * default build-in executor resources and assign tasks by resource profile id.
should say something like: based on the default executor resources requested at startup and assign tasks only on executors created with this resource profile.
Thanks, done.
 * with executorRpId.
 *
 * Here are the rules:
 * 1. Tasks with [[TaskResourceProfile]] can be scheduled to executors with
This makes it sound like that is the only time TaskResourceProfile is used; perhaps say "dynamic allocation disabled and tasks with TaskResourceProfile...", or possibly add another rule covering dynamic allocation enabled with TaskResourceProfile.
Makes sense. Changed the wording in the latest commit.
Co-authored-by: Thomas Graves <tgraves@apache.org>
@tgravescs do you have any other concerns?
Thanks, merged to master!
Thanks @tgravescs and @Ngone51
…n cluster when dynamic allocation disabled

### What changes were proposed in this pull request?
This PR is a follow-up of #37268 which supports stage level task resource profile for standalone cluster when dynamic allocation disabled. This PR enables stage-level task resource profile for yarn cluster.

### Why are the changes needed?
Users who work on spark ML/DL cases running on Yarn would expect the stage-level task resource profile feature.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
The current tests of #37268 can also cover this PR since both yarn and standalone cluster share the same TaskSchedulerImpl class which implements this feature. Apart from that, modified the existing test to cover yarn cluster. I also performed some manual tests which have been updated in the comments.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43030 from wbo4958/yarn-task-resoure-profile.

Authored-by: Bobby Wang <wbo4958@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
… cluster when dynamic allocation disabled

### What changes were proposed in this pull request?
This PR is a follow-up of #37268 which supports stage-level task resource profile for standalone cluster when dynamic allocation is disabled. This PR enables stage-level task resource profile for the Kubernetes cluster.

### Why are the changes needed?
Users who work on spark ML/DL cases running on Kubernetes would expect the stage-level task resource profile feature.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
The current tests of #37268 can also cover this PR since both Kubernetes and standalone cluster share the same TaskSchedulerImpl class which implements this feature. Apart from that, modified the existing test to cover the Kubernetes cluster. I also performed some manual tests which have been updated in the comments.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43323 from wbo4958/k8s-stage-level.

Authored-by: Bobby Wang <wbo4958@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
(cherry picked from commit 632eabd)
Signed-off-by: Thomas Graves <tgraves@apache.org>
What changes were proposed in this pull request?
Currently stage level scheduling works for yarn/k8s/standalone cluster when dynamic allocation is enabled, and spark app will acquire executors with different resource profiles and assign tasks to executors with the same resource profile id.
This PR proposes to add stage level scheduling when dynamic allocation is off. In this case, the spark app will only have executors with the default resource profile, but different `Stages` can still customize their task resource requests, which should be compatible with the default resource profile's executor resources. All these `Stages` with different task resource requests will reuse/share the same set of executors with the default resource profile.

This PR proposes to:

1. Add a new class of `ResourceProfile`: `TaskResourceProfile`, which can be used to describe different task resource requests when dynamic allocation is off. Tasks bound to this `TaskResourceProfile` will reuse executors with the default resource profile. An `Exception` should be thrown if executors with the default resource profile cannot fulfill the task resource requests.
2. `DAGScheduler` and `TaskScheduler` will schedule tasks with a customized `ResourceProfile` based on resource profile type and resource profile id; taskSets with `TaskResourceProfile` can be scheduled to executors with `DEFAULT_RESOURCE_PROFILE_ID`, and other taskSets can be scheduled to executors with exactly the same resource profile id.

Why are the changes needed?
When dynamic allocation is disabled, we can also leverage stage level scheduling to customize task resource requests for different stages.

Does this PR introduce any user-facing change?
Spark users can specify `TaskResourceProfile` to customize task resource requests for different stages when dynamic allocation is off.

How was this patch tested?
New UTs added.
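The validation rule described above (throw when default executors cannot fulfill a task-only profile's requests) can be sketched with toy types; these names are illustrative, not Spark's actual implementation:

```scala
// Illustrative fulfillment check: when dynamic allocation is off, a
// task-only profile must fit within the default executors' resources,
// otherwise its tasks could never be scheduled anywhere.
case class ExecutorResources(cores: Int, memoryMb: Long)
case class TaskRequests(cpus: Int)

def validateTaskProfile(defaultExec: ExecutorResources, req: TaskRequests): Unit = {
  // Mirrors the rule that an Exception is thrown on unsatisfiable requests.
  require(req.cpus <= defaultExec.cores,
    s"Task needs ${req.cpus} cpus but default executors only have ${defaultExec.cores} cores")
}
```

A real check would also cover custom task resources such as GPUs, not just CPU counts.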