Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-40384][INFRA] Only do base image real in time build when infra dockerfile is changed #37828

Closed
wants to merge 1 commit into from

Conversation

Yikun
Copy link
Member

@Yikun Yikun commented Sep 8, 2022

What changes were proposed in this pull request?

Do base image real in time build only when infra dockerfile is changed.

Why are the changes needed?

Recently github ghcr is unstable, so we better use infra static image when no infra dockerfile changes.

After this change even github ghcr image write has some issue, only infra dockerfile PR is blocked, other PR don't have any impact.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. CI passed, and use the ghcr.io/apache/spark/apache-spark-github-action-image-cache:master-static.
  2. local test infra changes trigger CI: Test infra change Yikun/spark#170
  3. test ./dev/is-changed.py -m infra-image
    image
# previous hash of dockerfile is changed
$ export APACHE_SPARK_REF=871152bda69a5d5db714eecd42fdd3e7d2e04557
$ ./dev/is-changed.py -m infra-image
true
# hash of dockerfile is changed
$ export APACHE_SPARK_REF=0830575100c1bcbc89d98a6859fe3b8d46ca2e6e 
$ ./dev/is-changed.py -m infra-image
false
# next hash  of dockerfile is changed
$ export APACHE_SPARK_REF=e6c58c1bd6f64ebfb337348fa6132c0b230dc932
$ ./dev/is-changed.py -m infra-image
false

@Yikun
Copy link
Member Author

Yikun commented Sep 8, 2022

cc @HyukjinKwon

@Yikun Yikun changed the title [SPARK-40384][INFRA] Do base image real in time build only when infra dockerfile is changed [SPARK-40384][INFRA] Only do base image real in time build when infra dockerfile is changed Sep 8, 2022
@Yikun
Copy link
Member Author

Yikun commented Sep 13, 2022

@HyukjinKwon Thanks, I will revisit today, and mark it ready for review.

@Yikun Yikun marked this pull request as ready for review September 13, 2022 08:21
@Yikun
Copy link
Member Author

Yikun commented Sep 13, 2022

Ready to go

@@ -87,6 +91,7 @@ jobs:
sparkr=`./dev/is-changed.py -m sparkr`
tpcds=`./dev/is-changed.py -m sql`
docker=`./dev/is-changed.py -m docker-integration-tests`
infra_image=`./dev/is-changed.py -m infra-image`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm .. actually quick question. would that still build the base image when there are other changes together with Dockerfile? Do we return true when root module is detected?

Copy link
Member Author

@Yikun Yikun Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that still build the base image when there are other changes together with Dockerfile?

Yes, module_names contains all changed module's name, test_modules will be ['infra-image'], this will return true if infra-image module is detected:

    elif len(set(test_modules).intersection(module_names)) == 0:
        print("false")
        if opts.fail:
            sys.exit(1)
    else:
        print("true")

BTW, If the fork repository is not synchronized to the latest, if the dockerfile changes are included in the synchronized code, it will also return true. You could see test Yikun#170 , infra-image return true, then build and use the base image in lint, sparkr and pyspark. In this case, if GHCR is unstable, only a rebase needed in fork repo.

Do we return true when root module is detected?

No, we only return true when infra dockerfile changes.

If only root module is detected, will first skip this root check

    # `./dev/is-changed.py -m infra-image` == True only when changing the infra dockerfile
    elif ("root" in test_modules or modules.root in changed_modules) and (
        ["infra-image"] != test_modules
    ):

and then do continue check as above (elif and else) return result according infra-image module is detected. The current PR is an real example to show this case, root dectected (because we change the yaml), but didn't change the dockerfile, infra-image return false, then use the master-static image.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could merge this in the morning of some day (such as tmr), I'll keep watching.

@HyukjinKwon
Copy link
Member

Merged to master.

@Yikun
Copy link
Member Author

Yikun commented Sep 13, 2022

@HyukjinKwon
Copy link
Member

Offline discussed with @Yikun. Let's revert it out for now (since this isn't a problem if ghcr isn't broken, and to make our CI implementation simple)

@Yikun
Copy link
Member Author

Yikun commented Sep 13, 2022

@HyukjinKwon Thanks!

Yikun added a commit to Yikun/spark that referenced this pull request Oct 6, 2022
… dockerfile is changed

### What changes were proposed in this pull request?
Do base image real in time build only when infra dockerfile is changed.

### Why are the changes needed?
Recently github ghcr is [unstable](https://github.com/orgs/community/discussions/32184), so we better use infra static image when no infra dockerfile changes.

After this change even github ghcr image write has some issue, only infra dockerfile PR is blocked, other PR don't have any impact.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. CI passed, and [use](https://github.com/Yikun/spark/runs/8242449708?check_suite_focus=true#step:2:19) the `ghcr.io/apache/spark/apache-spark-github-action-image-cache:master-static`.
2. local test infra changes trigger CI: #170
3. test `./dev/is-changed.py -m infra-image`
![image](https://user-images.githubusercontent.com/1736354/189041038-1351a223-749f-4a65-beb8-32fd04116a24.png)

```
# previous hash of dockerfile is changed
$ export APACHE_SPARK_REF=871152bda69a5d5db714eecd42fdd3e7d2e04557
$ ./dev/is-changed.py -m infra-image
true
# hash of dockerfile is changed
$ export APACHE_SPARK_REF=0830575100c1bcbc89d98a6859fe3b8d46ca2e6e
$ ./dev/is-changed.py -m infra-image
false
# next hash  of dockerfile is changed
$ export APACHE_SPARK_REF=e6c58c1bd6f64ebfb337348fa6132c0b230dc932
$ ./dev/is-changed.py -m infra-image
false
```

Closes apache#37828 from Yikun/infra-image-static.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants