ci: improve GitHub Action CI with re-usable workflows #2753
Conversation
A new re-usable workflow was introduced to handle building the container image in a generic way. Now, the `scheduled_builds`, `default_on_push` and a part of the `test_merge_requests` workflow can use the same code for building and publishing the container images. This is DRY. The `test_merge_requests` workflow was adjusted in the following way: building the ARM image is now done _after_ the initial build to run in parallel with BATS. This saves around 13 minutes of CI time. The new ARM build uses the generic build workflow. The other two jobs do not use the generic workflow yet as I was not sure whether the generic workflow fits our needs in this case.
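For readers unfamiliar with re-usable workflows, the general shape is a `workflow_call` trigger in the shared file plus a `uses:` job in the caller. The file name, input, and job names below are illustrative sketches, not the PR's exact configuration:

```yaml
# Illustrative sketch of a re-usable build workflow (e.g. generic_build.yml)
name: 'Generic build'

on:
  workflow_call:
    inputs:
      # Hypothetical input; the real workflow may define different inputs
      platforms:
        required: false
        type: string
        default: 'linux/amd64'

jobs:
  build:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      # ... build (and cache) the container image for ${{ inputs.platforms }}
```

A caller workflow then invokes it as a job via `uses: ./.github/workflows/generic_build.yml` and passes inputs through `with:`.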
@polarathene I am not 100% sure how to properly test whether there aren't any errors in the new workflows. Do you have an idea?
Thanks for tackling this! 💪
I'm pressed for time, apologies for the verbose review - no time to revise it or provide a decent summary right now :(
Resolved
- I believe I spotted some mistakes with workflow steps in the wrong places, and I'd like to see any conditional caching handled better if it's still deemed necessary (be sure to avoid caching separate ARM builds with the same cache key; without `type=gha` this might warrant opting out of cache). *(EDIT: If AMD64 is always built with, or earlier for the same cache key, it shouldn't be an issue.)*
- See the `scheduled_builds.yml` feedback for a potential concern with cache (still affects the previous approach AFAIK); additional insights for cache handling are covered over several comments for `generic_build.yml`.
- Due to the nature of this PR, I'd like to request that unrelated stylistic changes (quotes, minor revisions to values or field order) that aren't required for the functionality changes be deferred to a separate PR (annoying, I know 😞). They add noise to the review, which I usually don't mind, but this change should remain focused 😅 (OK with a force-push to rewrite history if that's easier.)
```diff
 on:
   schedule:
-    - cron: "0 0 * * 5"
+    - cron: 0 0 * * 5
```
Just to clarify, this schedule is for building at midnight on Fridays each week, right?

The purpose of this scheduled build was presumably for keeping ClamAV up to date? Maybe also as a way of checking that regular builds continue to work if there haven't been any master branch updates in a while?

The cache storage AFAIK has a retention period of 7 days for a cached entry (although I'm not sure if that's extended each time an entry is retrieved). But this concern probably applies to the previous cache approach too.

I think when the cache is used, this will prevent the early package-install portion, and other layers like ClamAV updating its DB, from changing. Cached layers during a build are only invalidated when the build context changes. For the same commit on master, since `default_on_push.yml` would have run earlier, that build context would not have changed.

Once the build cache has expired and a fresh build occurs, then this will work. However, now that we're properly caching layers (no more early `ARG` per commit or branch ref change invalidating the whole build cache), the re-use of existing cache (via `restore-keys`), while useful, may also prolong this cached-layer effect if you expect non-deterministic builds from newer updates to packages or ClamAV implicitly for image builds at different points in time.
If that is a concern that needs to be addressed (perhaps more important for ClamAV?), then I believe we can invalidate specific Dockerfile stages when required for `generic_build.yml` as an input? `docker/build-push-action` or the underlying `buildx` supports this AFAIK; otherwise you can build stages separately with multiple steps where necessary to cache the stages separately (be mindful of the `cache-to` modes `min`/`max` in that case). I know there are some examples of such in that action's issues / docs somewhere.
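One common way to expose stage invalidation as an input is a cache-busting build argument: an `ARG` placed just before the package-install `RUN` in the `Dockerfile` invalidates that stage (and everything after it) whenever its value changes. The input name below is hypothetical, not something this PR defines:

```yaml
# Sketch only: 'cache-bust' is a hypothetical workflow_call input.
# A matching `ARG CACHE_BUST` just before the package-install RUN in the
# Dockerfile would invalidate that stage (and later ones) whenever the
# value changes, e.g. when a scheduled run passes in the current date.
- name: 'Build container image'
  uses: docker/build-push-action@v3
  with:
    build-args: |
      CACHE_BUST=${{ inputs.cache-bust }}
```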
Managing that multi-stage / multi-step approach is probably not pleasant with `type=local` due to the `cache-from` + `cache-to` shuffling workaround and possibly needing multiple `actions/cache` steps. `type=gha` makes all that go away.
Alternatively, since this is a background cron task, it could instead just have the build ignore cache. I think the action supports customizing the `buildx` command instead of having to use conditionals around our cache support; using an input that adds `--no-cache` or similar to the build command should suffice.
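As a sketch of that opt-out (the boolean `use-cache` input is hypothetical; `docker/build-push-action` does expose a `no-cache` input):

```yaml
# Hypothetical input on the re-usable workflow:
on:
  workflow_call:
    inputs:
      use-cache:
        description: 'Set to false to force a fresh build (e.g. for scheduled builds)'
        required: false
        type: boolean
        default: true

# ... then in the build step:
#   - uses: docker/build-push-action@v3
#     with:
#       no-cache: ${{ !inputs.use-cache }}
```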
That won't update the cache for an existing cache key, however, so the update for `edge` would be regressed by `default_on_push.yml`, unless it likewise ignores cache (entirely or selectively).
One area of interest is the buildx feature of `COPY --link` that you've handled in another PR. I don't recall how that changes / improves the cache situation, but I remember it's meant to avoid rippling invalidation of layers unnecessarily. It's also possible to use it with an image (like the official ClamAV one) and exclude the COPY from living inside our published image build, instead performing a pull of the image and copying over content from it during the `docker pull` event that pieces together the image on the Docker host for it to be usable.

AFAIK, that would not require this `cron` approach to keep such updates current, as the user will get the latest at the time they pull our image, instead of when we last built and pushed.
However... I don't think our users would appreciate requiring a runtime released since March 2022 with buildx configured. Requiring that to build the `Dockerfile` is one thing, but requiring it for the broader usage of the published image is probably a bit too soon...?

It may just be better to have the `Dockerfile` get database updates from the official ClamAV image. I'm not sure if that requires pinning the FROM version for ClamAV, or if it will always check the remote instead of using cache (I haven't looked at the GHA build logs in a while, but maybe the FROM instruction for the Debian image is always pulled instead of cached?)
The original intent was to provide up-to-date package versions to users using `:edge`, which includes updates to ClamAV I guess, but because we have a cron job inside the container that runs `freshclam` daily, I think this is not a concern.

The `COPY --link ..` is best suited for another PR :) We may be able to get rid of this workflow if we really want to, but right now, I see no harm. The cache problem might however be a real concern. If you're right and I understood you correctly, we do not get package updates because we will now use a cache, which "skips" package installation. We should solve this somehow, otherwise this workflow really is useless.
This way, CI is more flexible and easier to maintain. We now have dedicated workflows for building, testing and publishing. (Fingers crossed cache works as expected...)
I tested the current workflows in a private fork. Works like a charm. The corresponding PR (#2753) contains all the information about the CI adjustments in summary.
Partial review pass. You've committed since I started the review, so some of the 14 items may be outdated 😅
Co-authored-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
The new solution is much cleaner. The new `platforms` input has a default that builds for AMD64, and one can optionally build ARM images (completely independent from AMD64 even).
Fantastic! 🎉
Almost there:

- A few revisions to comments I'll apply.
- One input I'll leave up for discussion to remove.
- A concern with `docker/metadata-action` (partly related to `scheduled_builds.yml`), but I believe that's not going to be an issue.

I think that last point is what you were asking about with?:

> One thing's left: are tags now addressed correctly

If so, looks like it'll be fine. Expand the detailed view on that feedback comment for my review on that same concern.
> how would you handle extra platforms?

I'd probably just go with a bool toggle as an input, for use with an expression on `platforms` that better documents we only support one or the other (at least once we drop ARMv7). Your updated approach is good too 👍

You can still add `linux/amd64` for the ARM build if you like; it'd just use the cache that should be available. But it's fine as-is. So long as the dependency on the AMD64 build job is present, I don't see anywhere that we'd not have a cache with AMD64 available.
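The bool-toggle idea could be sketched like this (the `build-arm` input is hypothetical; GHA expressions support the `&&`/`||` ternary pattern for non-empty values):

```yaml
with:
  # 'build-arm' is a hypothetical boolean input on the re-usable workflow;
  # false falls back to the AMD64-only default.
  platforms: ${{ inputs.build-arm && 'linux/amd64,linux/arm64' || 'linux/amd64' }}
```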
Cache key scoping

GHA caching docs have a section on restricted cache access:

> A workflow can access and restore a cache created in the current branch, the base branch (including base branches of forked repositories), or the default branch (usually `main`).

- `test_merge_requests.yml` will produce the cache key for AMD64, but that cache key would only have a hit for PRs (or local branches) when they trigger the workflow and create the cache key, if it doesn't exist on `master` / the base branch.
- `default_on_push.yml` will do the same for the `master` branch, or when semver tags are pushed.
  - The cache key from `master` commits might be the same as the PR's, but possibly duplicates the cache storage used anyway due to the scope change.
  - PRs (forked or local branches) will then have access to the same cache key.
  - An example mentions that two different tags would not be able to share cache from each other (but presumably can from the default branch).
  - ~~It is not made clear if existing PRs would be able to leverage a newly added cache key on `master`, or only new PRs since.~~ *(EDIT: Explained at the bottom of the cache docs.)* Likewise with the `master` branch and tagging. I would think the scope may be based on having common commit history (especially based on the `feature-b` -> `feature-a` -> `main` branch example).
  - If tagging a release after pushing the release PR too quickly, then both may attempt to upload cache for the same key, but that shouldn't matter.
- The cache key from `scheduled_builds.yml` will take the longest, as it builds all platforms at once (midnight on Fridays each week). It would rarely create a cache key, unless an existing key had since expired or been evicted.
```yaml
- name: 'Prepare tags'
  id: prep
  uses: docker/metadata-action@v4.0
  with:
    images: |
      ${{ secrets.DOCKER_REPOSITORY }}
      ${{ secrets.GHCR_REPOSITORY }}
    tags: |
      type=edge,branch=master
      type=semver,pattern={{major}}
      type=semver,pattern={{major}}.{{minor}}
      type=semver,pattern={{major}}.{{minor}}.{{patch}}
```
Concerns (Resolved)

`default_on_push.yml` uses this `docker/metadata-action` config, but you've omitted the `flavor` input it had?:

```yaml
flavor: |
  latest=auto
```

Just looked that up; this is an implicit default. `auto` is valid for `type=semver`.

- `type=semver` applies to `tag` events (I assume that detection is carried over into this re-usable workflow, but ~~we might have to verify that~~ it seems so, according to the docs).
- `type=edge` is used otherwise, for associating the `edge` image tag to the latest master branch commit.

I saw this comment on the `docker/metadata-action` repo, where `type=edge` was apparently being treated as `type=ref,event=branch` if the workflow was run on a branch that did not match the specified branch. This was publishing branch names as image tags.

UPDATE: I cannot see in the source why that would happen. And it looks like the action properly handles having no valid tags to apply? Either this was fixed since, or there's a configuration issue elsewhere, I think, and we don't have to worry about it.
Also note that `scheduled_builds.yml` differed here. No `flavor` (which is fine, since it's implicit) and `tags` differed with:

```yaml
tags: |
  type=raw,value=edge
```

`type=raw` seems to just associate the `edge` tag to the image built, with no constraint to being the latest commit of the workflow's triggered/specified branch? I'm not sure when `type=raw,value=edge` would be preferred over `type=edge` in this case. But we can use expressions to better constrain when a custom tag is applied.
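If such an expression-based constraint were wanted, `docker/metadata-action` tag entries accept an `enable` attribute; a hedged sketch:

```yaml
tags: |
  # Only apply the raw 'edge' tag when the workflow actually ran on master:
  type=raw,value=edge,enable=${{ github.ref == 'refs/heads/master' }}
```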
Given the above, if I understand things correctly:

- `scheduled_builds.yml` will trigger `type=edge,branch=master`, updating the `edge` image tag.
- `scheduled_builds.yml` is not restricted to the default branch (`master`), as the workflow history shows it running from your active `ci/generic-build` branch. If this wasn't failing, I think we'd have published a `ci/generic-build` image tag?
- `default_on_push.yml` will handle the following workflow events:
  - Tag pushed: Use `semver` patterns to publish image tags (without the `v` prefix).
  - Commit pushed to `master` branch: Use the `edge` image tag if the event trigger was a commit.

Changes to consider:

- `scheduled_builds.yml` should probably be constrained to running only on `master`.
- `type=edge,branch=master` should be what we want for `edge` 👍
- ~~The caveat is, when none of these tag types are valid for a workflow run, it'll apparently fall back to defaults. If a different branch from `master` (or a PR and other supported defaults) triggers `generic_publish.yml`, we could accidentally publish tags?~~ *(EDIT: Nope, nevermind.)*
This looks fine 👍

I cannot identify how `type=edge,branch=master`, when the workflow uses a different branch, could fall back to using the branch name as an image tag (detailed above in "Concerns"). I haven't tried to reproduce the reported bug, so I might have missed something.
I went over the GHA docs for re-usable workflows. Looks like these additions should be taken care of:

- Provide access to secrets (`secrets: inherit`, unless you can use `environment`?).
- Specify the revision used (`uses: docker-mailserver/docker-mailserver/<generic file name>@master`).
Potential concern with cache re-use

I've "unresolved" this old review comment, as it raises a concern about `scheduled_builds.yml` (but probably affects the other two workflows as well). But we can look into that after this PR.

The gist of it for @casperklein is that when cache is used (which is going to be most of the time, as we use `restore-keys` for fallback cache), if the early `RUN` layer that installs packages is not altered via git changes to invalidate the layer, then no new updates (if available) will be applied, as cache is re-used.

I'm not sure how this affects ClamAV with `COPY --link`, which might be the same issue or differ. But `freshclam` is scheduled to run in the container, while updating packages is not. So perhaps nothing to worry about?

`scheduled_builds.yml` could use `--no-cache` or a similar opt-out, but as soon as `master` is updated, a new `edge` update would regress any fresher updates, as cache would be used again.
@casperklein @wernerfred this is now ready for your review 😆 :)
Nice improvement 👍 Looks good to me, nothing to complain about 😆

But I've not tested it in a test repo. If you both think that's fine, that should be okay.

👍 I hope there is no typo somewhere important 🙈
Description
New re-usable workflows are introduced to handle building, testing and publishing the container image in a uniform and easy way. Now, the `scheduled_builds`, `default_on_push` and a part of the `test_merge_requests` workflow can use the same code for building, testing and publishing the container images. This is DRY.
Follow-up for: #2742
Type of change
Checklist: