
[SPARK-47401][K8S][DOCS] Update YuniKorn docs with v1.5 #45523

Closed
wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-47401

Conversation


@dongjoon-hyun (Member) commented Mar 14, 2024

What changes were proposed in this pull request?

This PR aims to update YuniKorn docs with v1.5 for Apache Spark 4.0.0.
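
The diff itself is not reproduced here; presumably it amounts to bumping the pinned chart version in the YuniKorn section of `docs/running-on-kubernetes.md`, roughly like the sketch below (the file path and surrounding command are assumptions, not copied from the actual patch):

```
# hypothetical hunk in docs/running-on-kubernetes.md
-helm install yunikorn yunikorn/yunikorn --namespace yunikorn --version 1.4.0 --create-namespace --set embedAdmissionController=false
+helm install yunikorn yunikorn/yunikorn --namespace yunikorn --version 1.5.0 --create-namespace --set embedAdmissionController=false
```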

Why are the changes needed?

Apache YuniKorn v1.5.0 was released on 2024-03-14 with 219 resolved JIRAs.

  • https://yunikorn.apache.org/release-announce/1.5.0
    • Kubernetes version support: v1.24 ~ 1.29
    • Event streaming API
    • Web UI enhancements
    • Improved Prometheus metric grouping
    • Revamped scheduler initialization support
    • Better allocation traceability
    • REST API enhancements

I installed YuniKorn v1.5.0 on K8s 1.29 and tested manually.
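
The install step itself is not shown below; it was presumably done with the Helm commands from the YuniKorn section of the Spark docs, roughly as follows (the exact flags used for this particular run are an assumption):

```
# Add the Apache YuniKorn Helm repository and install the 1.5.0 chart
# into its own namespace (flags follow the documented setup; assumed here).
helm repo add yunikorn https://apache.github.io/yunikorn-release
helm repo update
helm install yunikorn yunikorn/yunikorn --namespace yunikorn \
  --version 1.5.0 --create-namespace --set embedAdmissionController=false
```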

**K8s v1.29**
```
$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.1
```

**YuniKorn v1.5**
```
$ helm list -n yunikorn
NAME    	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART         	APP VERSION
yunikorn	yunikorn 	1       	2024-03-14 10:05:25.323637 -0700 PDT	deployed	yunikorn-1.5.0
```

```
$ build/sbt -Pkubernetes -Pkubernetes-integration-tests -Dspark.kubernetes.test.deployMode=docker-desktop "kubernetes-integration-tests/testOnly *.YuniKornSuite" -Dtest.exclude.tags=minikube,local,decom,r -Dtest.default.exclude.tags=
...
[info] YuniKornSuite:
[info] - SPARK-42190: Run SparkPi with local[*] (6 seconds, 893 milliseconds)
[info] - Run SparkPi with no resources (8 seconds, 801 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (8 seconds, 809 milliseconds)
[info] - Run SparkPi with a very long application name. (9 seconds, 779 milliseconds)
[info] - Use SparkLauncher.NO_RESOURCE (8 seconds, 855 milliseconds)
[info] - Run SparkPi with a master URL without a scheme. (8 seconds, 787 milliseconds)
[info] - Run SparkPi with an argument. (8 seconds, 867 milliseconds)
[info] - Run SparkPi with custom labels, annotations, and environment variables. (8 seconds, 897 milliseconds)
[info] - All pods have the same service account by default (7 seconds, 776 milliseconds)
[info] - Run extraJVMOptions check on driver (5 seconds, 424 milliseconds)
[info] - SPARK-42474: Run extraJVMOptions JVM GC option check - G1GC (5 seconds, 876 milliseconds)
[info] - SPARK-42474: Run extraJVMOptions JVM GC option check - Other GC (4 seconds, 841 milliseconds)
[info] - SPARK-42769: All executor pods have SPARK_DRIVER_POD_IP env variable (9 seconds, 812 milliseconds)
[info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (10 seconds, 288 milliseconds)
[info] - Run SparkPi with env and mount secrets. (12 seconds, 83 milliseconds)
[info] - Run PySpark on simple pi.py example (9 seconds, 813 milliseconds)
[info] - Run PySpark to test a pyfiles example (9 seconds, 923 milliseconds)
[info] - Run PySpark with memory customization (9 seconds, 811 milliseconds)
[info] - Run in client mode. (4 seconds, 364 milliseconds)
[info] - Start pod creation from template (8 seconds, 817 milliseconds)
[info] - SPARK-38398: Schedule pod creation from template (9 seconds, 839 milliseconds)
[info] - A driver-only Spark job with a tmpfs-backed localDir volume (6 seconds, 121 milliseconds)
[info] - A driver-only Spark job with a tmpfs-backed emptyDir data volume (5 seconds, 839 milliseconds)
[info] - A driver-only Spark job with a disk-backed emptyDir volume (5 seconds, 898 milliseconds)
[info] - A driver-only Spark job with an OnDemand PVC volume (6 seconds, 239 milliseconds)
[info] - A Spark job with tmpfs-backed localDir volumes (9 seconds, 63 milliseconds)
[info] - A Spark job with two executors with OnDemand PVC volumes (8 seconds, 938 milliseconds)
[info] - PVs with local hostpath storage on statefulsets !!! CANCELED !!! (2 milliseconds)
...
[info] - PVs with local hostpath and storageClass on statefulsets !!! IGNORED !!!
[info] - PVs with local storage !!! CANCELED !!! (1 millisecond)
...
[info] Run completed in 6 minutes, 30 seconds.
[info] Total number of tests run: 27
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 27, failed 0, canceled 2, ignored 1, pending 0
[info] All tests passed.
[success] Total time: 396 s (06:36), completed Mar 14, 2024, 10:19:47 AM
```

```
$ kubectl describe pod -l spark-role=driver -n spark-f5906efe18864a22be5397ffa30f2b30
...
Events:
  Type    Reason             Age   From      Message
  ----    ------             ----  ----      -------
  Normal  Scheduling         1s    yunikorn  spark-f5906efe18864a22be5397ffa30f2b30/spark-test-app-11c19d5f8b914e719d9d5e4333e7fe16-driver is queued and waiting for allocation
  Normal  Scheduled          1s    yunikorn  Successfully assigned spark-f5906efe18864a22be5397ffa30f2b30/spark-test-app-11c19d5f8b914e719d9d5e4333e7fe16-driver to node docker-desktop
  Normal  PodBindSuccessful  1s    yunikorn  Pod spark-f5906efe18864a22be5397ffa30f2b30/spark-test-app-11c19d5f8b914e719d9d5e4333e7fe16-driver is successfully bound to node docker-desktop
  Normal  Pulled             1s    kubelet   Container image "docker.io/kubespark/spark:dev" already present on machine
  Normal  Created            1s    kubelet   Created container spark-kubernetes-driver
  Normal  Started            1s    kubelet   Started container spark-kubernetes-driver
```
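
The `yunikorn` rows in the events above presumably appear because the suite submits applications with Spark's documented YuniKorn settings rather than relying on the YuniKorn admission controller. A sketch of that option set, as described in the running-on-kubernetes docs (the queue name and exact keys are quoted from the docs, not from this test run, so treat them as assumptions here):

```
# spark-submit options that route driver and executor pods to the YuniKorn scheduler
--conf spark.kubernetes.scheduler.name=yunikorn
--conf spark.kubernetes.driver.label.queue=root.default
--conf spark.kubernetes.executor.label.queue=root.default
# {{APP_ID}} is a built-in placeholder replaced with the Spark application ID at submit time
--conf spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id={{APP_ID}}
--conf spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id={{APP_ID}}
```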

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions bot added the DOCS label Mar 14, 2024
@dongjoon-hyun (Member Author) commented:

Could you review this K8s documentation PR, @huaxingao ?

@dongjoon-hyun (Member Author) commented:

Thank you, @yaooqinn !

Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-47401 branch March 14, 2024 17:33
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024

Closes apache#45523 from dongjoon-hyun/SPARK-47401.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>