[SPARK-46745][INFRA] Purge pip cache in dockerfile #44768

Closed
wants to merge 3 commits into from

@zhengruifeng zhengruifeng commented Jan 17, 2024

What changes were proposed in this pull request?

Purge the pip cache in the Dockerfile.

Why are the changes needed?

To save 4 to 5 GB of disk space:

before

https://github.com/zhengruifeng/spark/actions/runs/7541725028/job/20530432798

#45 [39/39] RUN df -h
#45 0.090 Filesystem      Size  Used Avail Use% Mounted on
#45 0.090 overlay          84G   70G   15G  83% /
#45 0.090 tmpfs            64M     0   64M   0% /dev
#45 0.090 shm              64M     0   64M   0% /dev/shm
#45 0.090 /dev/root        84G   70G   15G  83% /etc/resolv.conf
#45 0.090 tmpfs           7.9G     0  7.9G   0% /proc/acpi
#45 0.090 tmpfs           7.9G     0  7.9G   0% /sys/firmware
#45 0.090 tmpfs           7.9G     0  7.9G   0% /proc/scsi
#45 DONE 2.0s

after

https://github.com/zhengruifeng/spark/actions/runs/7549204209/job/20552796796

#48 [42/43] RUN python3.12 -m pip cache purge
#48 0.670 Files removed: 392
#48 DONE 0.7s

#49 [43/43] RUN df -h
#49 0.075 Filesystem      Size  Used Avail Use% Mounted on
#49 0.075 overlay          84G   65G   19G  79% /
#49 0.075 tmpfs            64M     0   64M   0% /dev
#49 0.075 shm              64M     0   64M   0% /dev/shm
#49 0.075 /dev/root        84G   65G   19G  79% /etc/resolv.conf
#49 0.075 tmpfs           7.9G     0  7.9G   0% /proc/acpi
#49 0.075 tmpfs           7.9G     0  7.9G   0% /sys/firmware
#49 0.075 tmpfs           7.9G     0  7.9G   0% /proc/scsi
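
As a quick check of the savings figure quoted above, the `overlay` rows of the two `df -h` logs can be compared directly. This is only a minimal sketch: the two rows are copied from the logs, and the helper name `df_field` is made up for illustration.

```shell
#!/usr/bin/env bash
# Rows copied from the "before" and "after" df -h logs (overlay filesystem only).
before='overlay          84G   70G   15G  83% /'
after='overlay          84G   65G   19G  79% /'

# Print one column of a df row as a bare number.
# Assumes the value carries a trailing "G", as it does in these logs.
df_field() {
  printf '%s\n' "$1" | awk -v col="$2" '{ sub(/G$/, "", $col); print $col }'
}

# Column 3 is Used, column 4 is Avail.
used_delta=$(( $(df_field "$before" 3) - $(df_field "$after" 3) ))
avail_delta=$(( $(df_field "$after" 4) - $(df_field "$before" 4) ))

echo "Used shrank by ${used_delta}G, Avail grew by ${avail_delta}G"
```

Because `df -h` rounds its values, the two deltas do not have to match exactly, which fits the approximate range given in the description.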

Does this PR introduce any user-facing change?

no, infra-only

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

no

@@ -417,10 +417,6 @@ jobs:
     - name: Free up disk space
       shell: 'script -q -e -c "bash {0}"'
       run: |
         if [[ "$MODULES_TO_TEST" != *"pyspark-ml"* ]] && [[ "$BRANCH" != "branch-3.5" ]]; then
Contributor Author

This no longer makes much sense here, since we install those libraries for each Python version.

@dongjoon-hyun dongjoon-hyun left a comment

Yes, this will reduce the final status of filesystems.

At the same time, this increases the number of layers, doesn't it?

I'm curious about the download size, @zhengruifeng . Could you check the result of

$ docker images

zhengruifeng commented Jan 18, 2024

> Yes, this will reduce the final status of filesystems.
>
> At the same time, this increases the number of layers, doesn't it?
>
> I'm curious about the download size, @zhengruifeng . Could you check the result of
>
> $ docker images

The image built from the current PR is 11.1GB:

ruifeng.zheng@xxx:~/spark$ docker images | grep cleanup
test_cleanup       0.1              97b0f1ca0bb6   43 seconds ago   11.1GB
ruifeng.zheng@xxx:~/spark$

We can probably combine those pip commands to limit the number of layers.
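
One way to picture combining the pip commands: chain the install and the cache purge so that both could sit in a single Dockerfile `RUN` instruction. Image layers are additive, so a file deleted by a later instruction still takes up space in the earlier layer that created it. The sketch below is an assumption about what such a chain could look like, not the change that was eventually merged. The helper name `install_and_purge`, the `DRY_RUN` switch, and the package list are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: install packages and purge the pip cache in one command chain,
# so both steps could share a single RUN instruction (and therefore one layer).
# With DRY_RUN=1 the commands are printed instead of executed.
install_and_purge() {
  local py="$1"
  shift
  local runner=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    runner="echo"
  fi
  $runner "$py" -m pip install "$@" &&
    $runner "$py" -m pip cache purge
}

# Hypothetical package list; the real Dockerfile's packages are not shown in this thread.
DRY_RUN=1 install_and_purge python3.12 numpy pandas
```

In dry-run mode this prints the two chained commands. pip also has a `--no-cache-dir` option for `pip install`, which avoids writing to the cache in the first place; that is a different technique from the purge step discussed here.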

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Yes, we can do that later separately.

@dongjoon-hyun

Merged to master. Thank you, @zhengruifeng and @HyukjinKwon .

@zhengruifeng zhengruifeng deleted the infra_docker_cleanup branch January 18, 2024 04:12
@zhengruifeng

thanks @dongjoon-hyun and @HyukjinKwon for reviews!

yaooqinn pushed a commit that referenced this pull request Jan 18, 2024
### What changes were proposed in this pull request?
Reduce the number of layers in the testing Dockerfile.

### Why are the changes needed?
to address #44768 (review)

### Does this PR introduce _any_ user-facing change?
no, infra-only

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #44781 from zhengruifeng/infra_docker_layers.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>