
ci: improve GitHub Action CI with re-usable workflows #2753

Merged: 18 commits merged into master on Sep 9, 2022

Conversation

@georglauterbach (Member) commented Aug 29, 2022

Description

New re-usable workflows are introduced to handle building, testing and
publishing the container image in a uniform and easy way. Now, the
scheduled_builds, default_on_push and a part of the test_merge_requests
workflow can use the same code for building, testing and publishing the
container images. This is DRY.

Follow-up for: #2742

Type of change

  • Improvement (non-breaking change that improves existing functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (README.md or the documentation under docs/)
  • If necessary I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

A new re-usable workflow was introduced to handle building the container
image in a generic way. Now, the `scheduled_builds`, `default_on_push`
and a part of the `test_merge_requests` workflow can use the same code
for building and publishing the container images. This is DRY.

The `test_merge_requests` workflow was adjusted in the following way:
building the ARM image is now done _after_ the initial build to run in
parallel with BATS. This saves around 13 minutes of CI time. The new ARM
build uses the generic build workflow. The other two jobs do not use the
generic workflow yet as I was not sure whether the generic workflow fits
our needs in this case.
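As a minimal sketch of the re-usable workflow mechanism described above: the callee declares a `workflow_call` trigger with typed inputs, and callers invoke it with `uses:`. The file names and the `platforms` input mirror this PR, but the exact fields shown here are illustrative, not the repo's actual contents.

```yaml
# Callee side: hypothetical excerpt of .github/workflows/generic_build.yml
on:
  workflow_call:
    inputs:
      platforms:
        description: 'Platforms to build the container image for'
        required: false
        type: string
        default: linux/amd64
---
# Caller side: hypothetical excerpt of .github/workflows/default_on_push.yml
jobs:
  build-image:
    uses: ./.github/workflows/generic_build.yml
    with:
      platforms: linux/amd64
```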
@georglauterbach added the area/ci and kind/improvement (Improve an existing feature, configuration file or the documentation) labels on Aug 29, 2022
@georglauterbach georglauterbach added this to the v11.2.0 milestone Aug 29, 2022
@georglauterbach georglauterbach self-assigned this Aug 29, 2022
@georglauterbach (Member, Author)

@polarathene I am not 100% sure how to properly test whether there aren't any errors in the new workflows. Do you have an idea?

@polarathene (Member) left a comment

Thanks for tackling this! 💪

I'm pressed for time, apologies for the verbose review - no time to revise it or provide a decent summary right now :(

Resolved
  • I believe I spotted some mistakes with workflow steps in the wrong places, and I'd like to see any conditional caching handled better if it's still deemed necessary (be sure to avoid caching separate ARM builds with the same cache key, without type=gha this might warrant opting out of cache) (EDIT: If AMD64 is always built with, or earlier for the same cache-key, it shouldn't be an issue).
  • See the scheduled_builds.yml feedback for a potential concern with cache (still affects previous approach AFAIK), additional insights for cache handling is covered over several comments for generic_build.yml.
  • Due to the nature of this PR, I'd like to request that unrelated stylistic changes (quotes, minor revisions to values or field order) that aren't required for the functionality be deferred to a separate PR (annoying, I know 😞). They add noise to the review, which I usually don't mind, but this change should remain focused 😅 (ok with a force-push to rewrite history if that's easier).


```diff
 on:
   schedule:
-    - cron: "0 0 * * 5"
+    - cron: 0 0 * * 5
```
Member

Just to clarify, this schedule is for building at midnight on Fridays each week, right?

The purpose of this scheduled build was presumably for keeping ClamAV up to date? Maybe as a way of checking that regular builds continue to work if there hasn't been any master branch updates in a while too?

The cache storage AFAIK has a retention period of 7 days for a cached entry (although I'm not sure if that's extended for an entry each time it's retrieved). But this concern probably applies to previous cache approach too.

I think when the cache is used, this will prevent the early package install portion, and other layers like ClamAV updating its DB, from changing.

Cached layers during a build are only invalidated when the build context changes. For the same commit on master, since default_on_push.yml would have run earlier, that build context would not have changed.

Once the build cache has expired and a fresh build occurs, then this will work. However, now that we're properly caching layers (no more early ARG per commit or branch ref change invalidating the whole build cache), the re-use of existing cache (via restore-keys) while useful, may also prolong this cached layer effect if you expect non-deterministic builds from newer updates to packages or ClamAV implicitly for image builds at different points in time.

If that is a concern that needs to be addressed (perhaps more important for ClamAV?), then I believe we can invalidate specific Dockerfile stages when required for generic_build.yml as an input?

docker/build-push-action or underlying buildx supports this AFAIK, otherwise you can build stages separately with multiple steps where necessary to cache the stages separately (be mindful of cache-to modes min/max in that case). I know there are some examples of such in that actions issues / docs somewhere.

Managing that multi-stage / multi-step approach is probably not pleasant with type=local due to the cache-from + cache-to shuffling workaround and possibly needing multiple actions/cache steps. type=gha makes all that go away.

Alternatively, since this is a background cron task, it could instead just have the build ignore cache. I think the action supports customizing the buildx command instead of having to use conditionals around our cache support; using an input that adds --no-cache or similar to the build command should suffice.

That won't update the cache for an existing cache-key, however, so the update for edge would be regressed by default_on_push.yml, unless that likewise ignores cache (entirely or selectively).
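One way to sketch that opt-out: docker/build-push-action does expose a `no-cache` input, so a re-usable workflow could plumb a boolean through to it. The input name and job layout below are assumptions for illustration, not the PR's actual implementation.

```yaml
# Hypothetical excerpt of generic_build.yml: let callers skip layer cache,
# so scheduled builds pick up fresh package installs.
on:
  workflow_call:
    inputs:
      no-cache:
        description: 'Skip the layer cache so package installs run fresh'
        required: false
        type: boolean
        default: false

jobs:
  build:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - uses: docker/build-push-action@v3
        with:
          # Forwarded straight to buildx --no-cache when true
          no-cache: ${{ inputs.no-cache }}
```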


One area of interest is the buildx feature of COPY --link that you've handled in another PR. I don't recall how that changes / improves the cache situation, but I remember it's meant to avoid rippling invalidation of layers unnecessarily. It's also possible to use it with an image (like official ClamAV) and excludes the COPY from living inside our published image build, instead performing a pull of the image and copying over content from it during the docker pull event that pieces together the image on the Docker host for it to be usable.

AFAIK, that would not require this cron approach to keep such updates, as the user will get the latest at the time they pull our image, instead of when we last built and pushed.

However... I don't think our users would appreciate requiring a runtime released since March 2022 with buildx configured. Requiring that to build the Dockerfile is one thing, but for the broader usage of the published image is probably a bit too soon...?


It may just be better to have Dockerfile get database updates from the official ClamAV image. I'm not sure if that requires pinning the FROM version for ClamAV, or if it will always check the remote instead of using cache (I haven't looked at the GHA build logs in a while, but maybe the FROM instruction for Debian image is always pulled instead of cached?)

Member Author

The original intent was to provide up-to-date package versions to users using :edge, which includes updates to ClamAV I guess, but because we have a cron job inside the container that runs freshclam daily, I think this is not a concern.

The COPY --link change is best suited for another PR :) We may be able to get rid of this workflow if we really want to, but right now, I see no harm. The cache problem might however be a real concern. If you're right and I understood you correctly, we do not get package updates because we will now use a cache, which "skips" package installation. We should solve this somehow, otherwise this workflow really is useless.

@polarathene

This comment was marked as outdated.

This way, CI is more flexible and easier to maintain. We now have
dedicated workflows for building, testing and publishing. (Fingers
crossed cache works as expected...)
I tested the current workflows in a private fork. Works like a charm.
The corresponding PR (#2753) contains all the information about the CI
adjustments in summary.
@georglauterbach

This comment was marked as resolved.

@polarathene (Member) left a comment

Partial review pass. You've committed since I started the review, so some of the 14 items may be outdated 😅

georglauterbach and others added 4 commits August 31, 2022 11:30
Co-authored-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
The new solution is much cleaner. The new `platforms` input has a
default that builds for AMD64, and one can optionally build ARM images
(completely independent from AMD64 even).
@polarathene (Member) left a comment

Fantastic! 🎉

Almost there:

  • A few revisions to comments I'll apply.
  • One input I'll leave up for discussion to remove.
  • A concern with docker/metadata-action (partly related to scheduled_builds.yml), but I believe that's not going to be an issue.

I think that last point is what you were asking about here:

One thing left: are tags now addressed correctly?

If so, looks like it'll be fine. Expand the detailed view on that feedback comment for my review on that same concern.

how would you handle extra platforms?

I'd probably just go with a bool toggle as an input, used with an expression on platforms, which better documents that we only support one or the other (at least once we drop ARMv7). Your updated approach is good too 👍

You can still add linux/amd64 for the ARM build if you like; it'd just use the cache that should be available. But it's fine as-is: so long as the dependency on the AMD64 build job is present, I don't see anywhere that we'd not have a cache with AMD64 available.


Cache key scoping

GHA caching docs have a section on restricted cache access:

A workflow can access and restore a cache created in the current branch, the base branch (including base branches of forked repositories), or the default branch (usually main).

  • test_merge_requests.yml will produce the cache key for AMD64, but that cache key would only have a hit for PRs (or local branches) when they trigger the workflow and create the cache key if it doesn't exist on master / base branch.
  • default_on_push.yml will do the same for master branch or when semver tags are pushed.
    • The cache key from master commits might be the same as the PR, but possibly duplicate the cache storage used anyway due to scope change.
    • PRs (forked or local branches) will then have access to the same cache key.
    • An example mentions that two different tags would not be able to share cache from each other (but presumably can from the default branch).
    • It is not made clear if existing PRs would be able to leverage a newly added cache key on master, or only new PRs since then. Likewise with the master branch and tagging. I would think the scope may be based on having common commit history (especially given the feature-b -> feature-a -> main branch example), explained at the bottom of the cache docs.
    • If tagging a release after pushing the release PR too quickly, then both may attempt to upload cache for the same key, but that shouldn't matter.
  • scheduled_builds.yml will take the longest as it builds all platforms at once (midnight on Fridays each week). It would rarely create a cache key, unless an existing key had since expired or been evicted.
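For reference, the restore-keys fallback discussed above looks roughly like this with type=local caching; the cache path and key names are illustrative, not the repo's actual values. A prefix in restore-keys lets a run restore the most recent cache entry whose key starts with that prefix when the exact key misses.

```yaml
# Hypothetical cache step: exact key first, prefix fallback second.
- name: 'Cache Docker layers'
  uses: actions/cache@v3
  with:
    path: /tmp/.buildx-cache
    key: buildx-cache-${{ hashFiles('Dockerfile') }}
    restore-keys: |
      buildx-cache-
```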

Comment on lines +24 to +35

```yaml
      - name: 'Prepare tags'
        id: prep
        uses: docker/metadata-action@v4.0
        with:
          images: |
            ${{ secrets.DOCKER_REPOSITORY }}
            ${{ secrets.GHCR_REPOSITORY }}
          tags: |
            type=edge,branch=master
            type=semver,pattern={{major}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}.{{minor}}.{{patch}}
```
Member

Concerns (Resolved)

default_on_push.yml uses this docker/metadata-action config, but you've omitted the flavor input it had?:

```yaml
          flavor: |
            latest=auto
```

Just looked that up, this is an implicit default. auto is valid for type=semver.

  • type=semver applies to tag events (I assume that detection is carried over into this re-usable workflow, but we might have to verify that; it seems so, according to docs).
  • type=edge is used otherwise for associating the edge image tag to latest master branch commit.

I saw this comment on the docker/metadata-action repo, where type=edge was apparently being treated as type=ref,event=branch if the workflow was run on a branch that did not match the specified branch. This was publishing branch names as image tags.

UPDATE: I cannot see in source why that would happen. And it looks like the action properly handles no valid tags to apply? Either this was fixed since, or there's a configuration issue elsewhere I think and we don't have to worry about it.


Also note that scheduled_builds.yml differed here. No flavor (which is fine since it's implicit) and tags differed with:

```yaml
          tags: |
            type=raw,value=edge
```

type=raw seems to just associate the edge tag with the image built, with no constraint to being the latest commit of the triggering workflow's specified branch?

I'm not sure when type=raw,value=edge would be preferred over type=edge in this case. But we can use expressions to better constrain when a custom tag is applied.
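As a sketch of constraining a custom tag with an expression: docker/metadata-action supports an enable attribute on tag entries, so a raw tag can be gated on the ref. The condition shown is illustrative, not the PR's actual config.

```yaml
          # Hypothetical: only apply the raw edge tag when running on master.
          tags: |
            type=raw,value=edge,enable=${{ github.ref == 'refs/heads/master' }}
```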


Given the above, if I understand things correctly..

  • scheduled_builds.yml will trigger type=edge,branch=master, updating the edge image tag.
  • scheduled_builds.yml is not restricted to the default branch (master), as the workflow history shows it running from your active ci/generic-build branch. If this wasn't failing, I think we'd have published a ci/generic-build image tag?
  • default_on_push.yml will handle the following workflow events.
    • Tag pushed: Use semver patterns to publish image tags (without the v prefix).
    • Commit pushed to master branch: Use edge image tag if the event trigger was a commit.

Changes to consider:

  • scheduled_builds.yml should probably be constrained to running only on master.
  • type=edge,branch=master should be what we want for edge 👍
  • The caveat is, when none of these tag types are valid for a workflow run, it'll apparently fall back to defaults. If a different branch from master (or a PR and other supported defaults) triggers generic_publish.yml, we could accidentally publish tags? (EDIT: Nope, never mind)

This looks fine 👍

I cannot identify how type=edge,branch=master could fall back to using the branch name as an image tag when the workflow uses a different branch (detailed above in "Concerns"). I haven't tried to reproduce the reported bug, so I might have missed something.

@polarathene (Member) left a comment

I went over the GHA docs for re-usable workflows. Looks like these additions should be taken care of:

  • Provide access to secrets (secrets: inherit, unless you can use environment?).
  • Specify revision used (uses: docker-mailserver/docker-mailserver/<generic file name>@master).
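On the caller side, those two additions might look like this; the workflow file name is taken from this PR's naming, and the rest is a sketch under the assumption that secrets are passed via inherit rather than a GitHub environment.

```yaml
# Hypothetical caller job: pinned revision plus inherited secrets.
jobs:
  publish:
    uses: docker-mailserver/docker-mailserver/.github/workflows/generic_publish.yml@master
    secrets: inherit
```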

Potential concern with cache re-use

I've "unresolved" this old review comment, as it raises a concern about scheduled_builds.yml (but probably affects the other two workflows as well). But we can look into that after this PR.

The gist of it for @casperklein is that when cache is used (which is going to be most of the time, as we use restore-keys for fallback cache), if the early RUN layer that installs packages is not altered via git changes to invalidate the layer, then no new updates (even if available) will be applied, as cache is re-used.

I'm not sure how this affects ClamAV with COPY --link which might be the same issue or differ. But freshclam is scheduled to run in a container, while updating packages is not. So perhaps nothing to worry about?

scheduled_builds.yml could use --no-cache or similar opt-out, but as soon as master is updated, a new edge update would regress any fresher updates as cache was used again.

@polarathene (Member) left a comment

Brilliant! 🎉

approved

@georglauterbach (Member, Author)

@casperklein @wernerfred this is now ready for your review 😆 :)

@casperklein (Member) left a comment

Nice improvement 👍 Looks good to me, nothing to complain about 😆
But I've not tested it in a test repo. If you both think that's fine, that should be okay.

@georglauterbach (Member, Author)

Nice improvement 👍 Looks good to me, nothing to complain about 😆

But I've not tested it in a test repo. If you both think that's fine, that should be okay.

👍 I hope there is no typo somewhere important 🙈

@georglauterbach georglauterbach merged commit f8e1bb0 into master Sep 9, 2022
@georglauterbach georglauterbach deleted the ci/generic-build branch September 9, 2022 09:12
@georglauterbach georglauterbach mentioned this pull request Sep 9, 2022
7 tasks