[SPARK-54366][INFRA] Add free_disk_space step to K8s integration test GitHub Action job
#53077
Closed
6a081be to 3e71066
Title changed: `free_disk_space_container` step to K8s integration test GitHub Action job → `free_disk_space` step to K8s integration test GitHub Action job
…st GitHub Action job
3e71066 to 23fb967
Member
Author
Could you review this INFRA PR too, please, @sarutak?
sarutak
approved these changes
Nov 15, 2025
Member
sarutak
left a comment
LGTM.
I confirmed the way to free up is the same as:
spark/.github/workflows/build_and_test.yml
Lines 343 to 347 in d02a6d4
    - name: Free up disk space
      run: |
        if [ -f ./dev/free_disk_space ]; then
          ./dev/free_disk_space
        fi
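The guard in the quoted step runs the cleanup script only when the checked-out tree actually contains it, presumably so that the step is a no-op on branches that do not ship the script. A minimal shell sketch of that pattern follows; `run_if_present` is a hypothetical helper name used only for illustration, not something taken from the workflow:

```shell
#!/bin/sh
# Hypothetical helper (illustration only): execute a script if it exists
# as a regular file, otherwise report that it was skipped.
run_if_present() {
  script="$1"
  if [ -f "$script" ]; then
    "$script"
  else
    echo "skip: $script not found"
  fi
}

# Example call, mirroring the quoted workflow step:
#   run_if_present ./dev/free_disk_space
```

The workflow itself inlines the `if [ -f ... ]` test rather than using a helper; the function form is only meant to make the two branches explicit.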
Member
Author
Thank you again!
dongjoon-hyun
added a commit
that referenced
this pull request
Nov 15, 2025
…st GitHub Action job

### What changes were proposed in this pull request?

This PR aims to add `free_disk_space` step to K8s integration test GitHub Action job.

### Why are the changes needed?

The K8s integration test CI is flaky due to `No space left on device` error.

- https://github.com/apache/spark/actions/runs/19354883389/job/55448531341

```
[info] 25/11/14 21:27:02 ERROR TaskSchedulerImpl: Lost executor 4 on 10.244.0.67: Unable to create executor due to /var/data/spark-163899dd-08da-4b76-b71d-c428207a3bdf/spark-1e9d976f-69b3-4274-af03-300cfc4d6fd5/-14621403551763155568738_cache -> ./software.amazon.awssdk_bundle-2.29.52.jar: No space left on device
```

Like the other four GitHub Action jobs, `free_disk_space_container` will mitigate this situation in this job.

**BEFORE**

```
$ git grep 'free_disk_space$'
.github/workflows/build_and_test.yml: ./dev/free_disk_space
.github/workflows/release.yml: ./dev/free_disk_space
```

**AFTER**

```
$ git grep 'free_disk_space$'
.github/workflows/build_and_test.yml: ./dev/free_disk_space
.github/workflows/build_and_test.yml: ./dev/free_disk_space
.github/workflows/release.yml: ./dev/free_disk_space
```

### Does this PR introduce _any_ user-facing change?

No, this is a test infra change.

### How was this patch tested?

Pass the CIs and check the log. The following is the log result of this PR.

- https://github.com/dongjoon-hyun/spark/actions/runs/19395758483/job/55495933312

**BEFORE CLEANUP**

```
+ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        72G   54G   18G  76% /
tmpfs           7.9G   84K  7.9G   1% /dev/shm
tmpfs           3.2G  1.1M  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda16      881M   62M  758M   8% /boot
/dev/sda15      105M  6.2M   99M   6% /boot/efi
/dev/sdb1        74G  4.1G   66G   6% /mnt
tmpfs           1.6G   12K  1.6G   1% /run/user/1001
```

**AFTER CLEANUP**

```
+ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        72G   21G   52G  29% /
tmpfs           7.9G   84K  7.9G   1% /dev/shm
tmpfs           3.2G  1.1M  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda16      881M   62M  758M   8% /boot
/dev/sda15      105M  6.2M   99M   6% /boot/efi
/dev/sdb1        74G  4.1G   66G   6% /mnt
tmpfs           1.6G   12K  1.6G   1% /run/user/1001
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53077 from dongjoon-hyun/SPARK-54366.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 0311f44)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
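In the log above, `Used` on `/` drops from 54G to 21G and `Avail` rises from 18G to 52G. For anyone who wants the same comparison as a single number rather than two `df -h` tables, the following is a rough shell sketch. It is not part of this PR; `avail_kib` is a hypothetical helper name, and the cleanup call is left as a comment so the snippet is safe to run anywhere:

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print the available space, in KiB,
# of the filesystem that holds the given path. `df -P -k` is the POSIX
# portable format, whose fourth column on the data line is "Available".
avail_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

before=$(avail_kib /)
# The cleanup would run here, e.g.: ./dev/free_disk_space
after=$(avail_kib /)

# Report the difference in MiB (integer arithmetic, so sub-MiB changes round toward zero).
echo "reclaimed: $(( (after - before) / 1024 )) MiB"
```

Using `df -Pk` instead of `df -h` keeps the values as plain integers, which avoids parsing human-readable suffixes such as `G` and `M`.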
Member
Author
Merged to master/4.1/4.0/3.5.
### What changes were proposed in this pull request?

This PR aims to add `free_disk_space` step to K8s integration test GitHub Action job.

### Why are the changes needed?

The K8s integration test CI is flaky due to `No space left on device` error.

Like the other four GitHub Action jobs, `free_disk_space_container` will mitigate this situation in this job.

**BEFORE**

**AFTER**

### Does this PR introduce _any_ user-facing change?

No, this is a test infra change.

### How was this patch tested?

Pass the CIs and check the log. The following is the log result of this PR.

**BEFORE CLEANUP**

**AFTER CLEANUP**

### Was this patch authored or co-authored using generative AI tooling?

No.