New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-38187][K8S][TESTS] Add K8S IT for volcano
minResources cpu/memory spec
#35640
Conversation
96d7b5f
to
ce75f5c
Compare
docs/running-on-kubernetes.md
Outdated
@@ -1356,6 +1356,26 @@ See the [configuration page](configuration.html) for information on Spark config | |||
</td> | |||
<td>3.3.0</td> | |||
</tr> | |||
<tr> | |||
<td><code>spark.kubernetes.job.minCPU</code></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just wondering if we can reuse some of the Spark configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this's a good question, there were some related use case of Spark with volcano in production can be shared in here:
- Case1 (this PR):
minCPU
= specifiedminCPU
: for users who know their own cluster resources well, this is the basic user case. Especially, when users don't want to setminRes
strictly to which the spark job real needed resource amounts, this also helps to enhance the utilization of cluster in some level when cluster resource is limited. - Case2:
minCPU
= driver.request +executor.number
*executor.request
: for users who don't care much about job resource usage. - Case3:
minCPU
= (driver.request +executor.number
*executor.request
) *factor
, for users want to guarantee the resources of the job in some level, but also want to improve the utilization of the cluster.
So, we might add the basic minRes configuration first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also cc @holdenk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to document what is used in case of (none)
, @Yikun .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no value is specified, there will be no lower limit on job-level CPU resources.
will add it
docs/running-on-kubernetes.md
Outdated
<td>3.3.0</td> | ||
</tr> | ||
<tr> | ||
<td><code>spark.kubernetes.job.minMemory</code></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
ce75f5c
to
361988f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a small typo in the name of method checkAnnotat
ion
...netes/core/src/test/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStepSuite.scala
Outdated
Show resolved
Hide resolved
...ion-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala
Outdated
Show resolved
Hide resolved
Please check the dev mailing list . Ramping down for Apache Spark 3.3 release started, @Yikun . |
361988f
to
86a5f3e
Compare
|
...egration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/VolcanoSuite.scala
Outdated
Show resolved
Hide resolved
0f42741
to
d8b38a8
Compare
3f6dc20
to
8bf21e8
Compare
|
8bf21e8
to
05f5512
Compare
|
@dongjoon-hyun Done. |
...kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
Outdated
Show resolved
Hide resolved
...kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
Outdated
Show resolved
Hide resolved
resource-managers/kubernetes/integration-tests/src/test/resources/volcano/queue-2u-3g.yml
Outdated
Show resolved
Hide resolved
...ion-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala
Outdated
Show resolved
Hide resolved
@@ -299,6 +300,20 @@ private[spark] object Config extends Logging { | |||
.stringConf | |||
.createOptional | |||
|
|||
val KUBERNETES_JOB_MIN_CPU = ConfigBuilder("spark.kubernetes.job.minCPU") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, this naming looks inconsistent to me. For the other K8s configuration, we use Core
instead of CPU
. Please double check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this might be a good suggestion for improvement. I also searched in spark, only spark.task.cpus
are using cpu
, and others are using cores
. I guess it (cores
) might be because of the name originally inherited from Yarn.
So, I think cores
is also fine for me, like spark.kubernetes.job.minCores
. if no objection I can change this soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also cc @holdenk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few more comments.
90829d5
to
a6fc888
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun I'm not sure there were some misunderstanding, something below might help you understand if something confused: If we are in the same page then:
No more hidden reason, just make sure it can be more generic, then it can be created by kubernetesClient.resourceList(preKubernetesResources: _*).createOrReplace(). So we only return volcano model in here.
if you meaned use this to build volcano-model, client side yaml load could be a alternative way to complete this, but I think we still need to left placeholder in yaml and do format in feature step. This perhaps not flexiable, and a little bit hard to maintain. and also introduce a volcano client module deps. :), feel free to left any question you had, I will try my best to make work better. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yikun . Here is my PR for volcano based on my original suggestion to you. I believe the two PodGroup
template configurations are enough for Volcano
because it is a simpler and more extensible and future-proof way instead of introducing many new configurations one by one like this PR.
…mplates ### What changes were proposed in this pull request? This PR aims to support driver/executor `PodGroup` templates like the following. ```yaml apiVersion: scheduling.volcano.sh/v1beta1 kind: PodGroup spec: minMember: 1000 minResources: cpu: "4" memory: "16Gi" priorityClassName: executor-priority queue: executor-queue ``` ### Why are the changes needed? This is a simpler, more extensible and robust way to support Volcano future because we don't need to add new configurations like #35640 for all Volcano features. ### Does this PR introduce _any_ user-facing change? No because this is a new feature. ### How was this patch tested? Pass the CIs. Closes #35776 from dongjoon-hyun/SPARK-38455. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Could you rebase this PR to the master and convert it as a test case addition PR, @Yikun ? |
a6fc888
to
500ea86
Compare
@dongjoon-hyun Done! Thanks! : ) |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @Yikun . Merged to master.
$ build/sbt -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests -Dtest.exclude.tags=minikube -Dspark.kubernetes.test.deployMode=docker-for-desktop 'kubernetes-integration-tests/testOnly -- -z SPARK-38187'
...
[info] KubernetesSuite:
[info] VolcanoSuite:
[info] - SPARK-38187: Run SparkPi Jobs with minCPU (24 seconds, 485 milliseconds)
[info] - SPARK-38187: Run SparkPi Jobs with minMemory (24 seconds, 665 milliseconds)
[info] Run completed in 1 minute, 12 seconds.
[info] Total number of tests run: 2
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 99 s (01:39), completed Mar 9, 2022 5:26:29 PM
volcano
minResources cpu/memory spec
…emory spec ### What changes were proposed in this pull request? This PR adds two tests to make sure resource reservation supported. - Run SparkPi Jobs with minCPU - Run SparkPi Jobs with minMemory ### Why are the changes needed? Test resource reservation (min Resoruce) with volcano implementations ### Does this PR introduce _any_ user-facing change? No, K8S IT only ### How was this patch tested? - integration test ``` [info] VolcanoSuite: [info] - Run SparkPi with volcano scheduler (12 seconds, 738 milliseconds) [info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (13 seconds, 294 milliseconds) [info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (25 seconds, 659 milliseconds) [info] - SPARK-38423: Run SparkPi Jobs with priorityClassName (19 seconds, 310 milliseconds) [info] - SPARK-38423: Run driver job to validate priority order (16 seconds, 467 milliseconds) [info] - SPARK-38187: Run SparkPi Jobs with minCPU (29 seconds, 546 milliseconds) [info] - SPARK-38187: Run SparkPi Jobs with minMemory (30 seconds, 473 milliseconds) [info] Run completed in 2 minutes, 30 seconds. [info] Total number of tests run: 7 [info] Suites: completed 2, aborted 0 [info] Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. [success] Total time: 236 s (03:56), completed 2022-3-10 9:17:46 ``` Closes apache#35640 from Yikun/SPARK-38187-minRes. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR adds two tests to make sure resource reservation supported.
Why are the changes needed?
Test resource reservation (min Resoruce) with volcano implementations
Does this PR introduce any user-facing change?
No, K8S IT only
How was this patch tested?