New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-40384][INFRA] Only do base image real in time build when infra dockerfile is changed #37828
Conversation
cc @HyukjinKwon |
@HyukjinKwon Thanks, I will revisit today, and mark it ready for review. |
Ready to go |
@@ -87,6 +91,7 @@ jobs: | |||
sparkr=`./dev/is-changed.py -m sparkr` | |||
tpcds=`./dev/is-changed.py -m sql` | |||
docker=`./dev/is-changed.py -m docker-integration-tests` | |||
infra_image=`./dev/is-changed.py -m infra-image` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm .. actually quick question. would that still build the base image when there are other changes together with Dockerfile? Do we return true when root
module is detected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would that still build the base image when there are other changes together with Dockerfile?
Yes, module_names
contains all changed module's name, test_modules will be ['infra-image'], this will return true if infra-image
module is detected:
elif len(set(test_modules).intersection(module_names)) == 0:
print("false")
if opts.fail:
sys.exit(1)
else:
print("true")
BTW, If the fork repository is not synchronized to the latest, if the dockerfile changes are included in the synchronized code, it will also return true. You could see test Yikun#170 , infra-image return true, then build and use the base image in lint, sparkr and pyspark. In this case, if GHCR is unstable, only a rebase needed in fork repo.
Do we return true when root module is detected?
No, we only return true when infra dockerfile changes.
If only root module is detected, will first skip this root check
# `./dev/is-changed.py -m infra-image` == True only when changing the infra dockerfile
elif ("root" in test_modules or modules.root in changed_modules) and (
["infra-image"] != test_modules
):
and then do continue check as above (elif
and else
) return result according infra-image
module is detected. The current PR is an real example to show this case, root dectected (because we change the yaml), but didn't change the dockerfile, infra-image return false, then use the master-static
image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could merge this in the morning of some day (such as tmr), I'll keep watching.
Merged to master. |
|
Offline discussed with @Yikun. Let's revert it out for now (since this isn't a problem if ghcr isn't broken, and to make our CI implementation simple) |
@HyukjinKwon Thanks! |
… dockerfile is changed ### What changes were proposed in this pull request? Do base image real in time build only when infra dockerfile is changed. ### Why are the changes needed? Recently github ghcr is [unstable](https://github.com/orgs/community/discussions/32184), so we better use infra static image when no infra dockerfile changes. After this change even github ghcr image write has some issue, only infra dockerfile PR is blocked, other PR don't have any impact. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. CI passed, and [use](https://github.com/Yikun/spark/runs/8242449708?check_suite_focus=true#step:2:19) the `ghcr.io/apache/spark/apache-spark-github-action-image-cache:master-static`. 2. local test infra changes trigger CI: #170 3. test `./dev/is-changed.py -m infra-image` ![image](https://user-images.githubusercontent.com/1736354/189041038-1351a223-749f-4a65-beb8-32fd04116a24.png) ``` # previous hash of dockerfile is changed $ export APACHE_SPARK_REF=871152bda69a5d5db714eecd42fdd3e7d2e04557 $ ./dev/is-changed.py -m infra-image true # hash of dockerfile is changed $ export APACHE_SPARK_REF=0830575100c1bcbc89d98a6859fe3b8d46ca2e6e $ ./dev/is-changed.py -m infra-image false # next hash of dockerfile is changed $ export APACHE_SPARK_REF=e6c58c1bd6f64ebfb337348fa6132c0b230dc932 $ ./dev/is-changed.py -m infra-image false ``` Closes apache#37828 from Yikun/infra-image-static. Authored-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Do base image real in time build only when infra dockerfile is changed.
Why are the changes needed?
Recently github ghcr is unstable, so we better use infra static image when no infra dockerfile changes.
After this change even github ghcr image write has some issue, only infra dockerfile PR is blocked, other PR don't have any impact.
Does this PR introduce any user-facing change?
No
How was this patch tested?
ghcr.io/apache/spark/apache-spark-github-action-image-cache:master-static
../dev/is-changed.py -m infra-image