
ci: improve GitHub Action CI with re-usable workflows #2753

Merged: 18 commits merged into master on Sep 9, 2022

Conversation

@georglauterbach (Member) commented Aug 29, 2022

Description

New re-usable workflows are introduced to handle building, testing and
publishing the container image in a uniform and easy way. Now, the
scheduled_builds, default_on_push and a part of the test_merge_requests
workflow can use the same code for building, testing and publishing the
container images. This is DRY.

Follow-up for: #2742

Type of change

  • Improvement (non-breaking change that improves existing functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (README.md or the documentation under docs/)
  • If necessary I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

A new re-usable workflow was introduced to handle building the container
image in a generic way. Now, the `scheduled_builds`, `default_on_push`
and a part of the `test_merge_requests` workflow can use the same code
for building and publishing the container images. This is DRY.

The `test_merge_requests` workflow was adjusted in the following way:
building the ARM image is now done _after_ the initial build to run in
parallel with BATS. This saves around 13 minutes of CI time. The new ARM
build uses the generic build workflow. The other two jobs do not use the
generic workflow yet as I was not sure whether the generic workflow fits
our needs in this case.
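As a minimal sketch of the re-usable workflow mechanism described above: the callee declares a `workflow_call` trigger with typed inputs, and callers invoke it with `uses:`. The file names and the `platforms` input mirror this PR, but the exact fields shown here are illustrative, not the repo's actual contents.

```yaml
# Callee side: hypothetical excerpt of .github/workflows/generic_build.yml
on:
  workflow_call:
    inputs:
      platforms:
        description: 'Platforms to build the container image for'
        required: false
        type: string
        default: linux/amd64
---
# Caller side: hypothetical excerpt of .github/workflows/default_on_push.yml
jobs:
  build-image:
    uses: ./.github/workflows/generic_build.yml
    with:
      platforms: linux/amd64
```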
@georglauterbach added the area/ci and kind/improvement (Improve an existing feature, configuration file or the documentation) labels on Aug 29, 2022
@georglauterbach georglauterbach added this to the v11.2.0 milestone Aug 29, 2022
@georglauterbach georglauterbach self-assigned this Aug 29, 2022
@georglauterbach (Member, Author)

@polarathene I am not 100% sure how to properly test whether there aren't any errors in the new workflows. Do you have an idea?

@polarathene (Member) left a comment

Thanks for tackling this! 💪

I'm pressed for time, apologies for the verbose review - no time to revise it or provide a decent summary right now :(

Resolved
  • I believe I spotted some mistakes with workflow steps in the wrong places, and I'd like to see any conditional caching handled better if it's still deemed necessary (be sure to avoid caching separate ARM builds with the same cache key, without type=gha this might warrant opting out of cache) (EDIT: If AMD64 is always built with, or earlier for the same cache-key, it shouldn't be an issue).
  • See the scheduled_builds.yml feedback for a potential concern with cache (still affects previous approach AFAIK), additional insights for cache handling is covered over several comments for generic_build.yml.
  • Due to the nature of this PR, I'd like to request that unrelated stylistic changes (quotes, minor revisions to values or field order) that aren't required for the functionality be deferred to a separate PR (annoying, I know 😞). They add noise to the review, which I usually don't mind, but this change should remain focused 😅 (ok with a force-push to rewrite history if that's easier).


```diff
 on:
   schedule:
-    - cron: "0 0 * * 5"
+    - cron: 0 0 * * 5
```
Member

Just to clarify, this schedule is for building at midnight on Fridays each week, right?

The purpose of this scheduled build was presumably for keeping ClamAV up to date? Maybe as a way of checking that regular builds continue to work if there hasn't been any master branch updates in a while too?

The cache storage AFAIK has a retention period of 7 days for a cached entry (although I'm not sure if that's extended for an entry each time it's retrieved). But this concern probably applies to previous cache approach too.

I think when the cache is used, this will prevent the early package install portion, and other layers like ClamAV updating its DB, from changing.

Cached layers during a build are only invalidated when the build context changes. For the same commit on master, since default_on_push.yml would have run earlier, that build context would not have changed.

Once the build cache has expired and a fresh build occurs, then this will work. However, now that we're properly caching layers (no more early ARG per commit or branch ref change invalidating the whole build cache), the re-use of existing cache (via restore-keys) while useful, may also prolong this cached layer effect if you expect non-deterministic builds from newer updates to packages or ClamAV implicitly for image builds at different points in time.

If that is a concern that needs to be addressed (perhaps more important for ClamAV?), then I believe we can invalidate specific Dockerfile stages when required for generic_build.yml as an input?

docker/build-push-action or underlying buildx supports this AFAIK, otherwise you can build stages separately with multiple steps where necessary to cache the stages separately (be mindful of cache-to modes min/max in that case). I know there are some examples of such in that actions issues / docs somewhere.

Managing that multi-stage / multi-step approach is probably not pleasant with type=local due to the cache-from + cache-to shuffling workaround and possibly needing multiple actions/cache steps. type=gha makes all that go away.

Alternatively, since this is a background cron task, it could instead just have the build ignore cache. I think the action supports customizing the buildx command instead of having to use conditionals around our cache support; using an input that adds --no-cache or similar to the build command should suffice.

That won't update the cache for an existing cache-key, however, so the update for edge would be regressed by default_on_push.yml, unless that likewise ignores cache (entirely or selectively).
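One way to sketch that opt-out: docker/build-push-action does expose a `no-cache` input, so a re-usable workflow could plumb a boolean through to it. The input name and job layout below are assumptions for illustration, not the PR's actual implementation.

```yaml
# Hypothetical excerpt of generic_build.yml: let callers skip layer cache,
# so scheduled builds pick up fresh package installs.
on:
  workflow_call:
    inputs:
      no-cache:
        description: 'Skip the layer cache so package installs run fresh'
        required: false
        type: boolean
        default: false

jobs:
  build:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - uses: docker/build-push-action@v3
        with:
          # Forwarded straight to buildx --no-cache when true
          no-cache: ${{ inputs.no-cache }}
```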


One area of interest is the buildx feature of COPY --link that you've handled in another PR. I don't recall how that changes / improves the cache situation, but I remember it's meant to avoid rippling invalidation of layers unnecessarily. It's also possible to use it with an image (like official ClamAV) and excludes the COPY from living inside our published image build, instead performing a pull of the image and copying over content from it during the docker pull event that pieces together the image on the Docker host for it to be usable.

AFAIK, that would not require this cron approach to keep such updates, as the user will get the latest at the time they pull our image, instead of when we last built and pushed.

However... I don't think our users would appreciate requiring a runtime released since March 2022 with buildx configured. Requiring that to build the Dockerfile is one thing, but for the broader usage of the published image is probably a bit too soon...?


It may just be better to have Dockerfile get database updates from the official ClamAV image. I'm not sure if that requires pinning the FROM version for ClamAV, or if it will always check the remote instead of using cache (I haven't looked at the GHA build logs in a while, but maybe the FROM instruction for Debian image is always pulled instead of cached?)

Member Author

The original intent was to provide up-to-date package versions to users using :edge, which includes updates to ClamAV I guess, but because we have a cron job inside the container that runs freshclam daily, I think this is not a concern.

The COPY --link change is best suited for another PR :) We may be able to get rid of this workflow if we really want to, but right now, I see no harm. The cache problem might however be a real concern. If you're right and I understood you correctly, we do not get package updates because we will now use a cache, which "skips" package installation. We should solve this somehow, otherwise this workflow really is useless.

@polarathene

This comment was marked as outdated.

This way, CI is more flexible and easier to maintain. We now have
dedicated workflows for building, testing and publishing. (Fingers
crossed cache works as expected...)
I tested the current workflows in a private fork. Works like a charm.
The corresponding PR (#2753) contains all the information about the CI
adjustments in summary.
@georglauterbach

This comment was marked as resolved.

@polarathene (Member) left a comment

Partial review pass. You've committed since I started the review, so some of the 14 items may be outdated 😅

georglauterbach and others added 4 commits August 31, 2022 11:30
Co-authored-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>
The new solution is much cleaner. The new `platforms` input has a
default that builds for AMD64, and one can optionally build ARM images
(completely independent from AMD64 even).
@polarathene (Member) left a comment

Fantastic! 🎉

Almost there:

  • A few revisions to comments I'll apply.
  • One input I'll leave up for discussion to remove.
  • A concern with docker/metadata-action (partly related to scheduled_builds.yml), but I believe that's not going to be an issue.

I think that last point is what you were asking about here:

One thing left: are tags now addressed correctly?

If so, looks like it'll be fine. Expand the detailed view on that feedback comment for my review on that same concern.

how would you handle extra platforms?

I'd probably just go with a bool toggle as an input, used with an expression on platforms, which better documents that we only support one or the other (at least once we drop ARMv7). Your updated approach is good too 👍

You can still add linux/amd64 for the ARM build if you like; it'd just use the cache that should be available. But it's fine as-is: so long as the dependency on the AMD64 build job is present, I don't see anywhere that we'd not have a cache with AMD64 available.


Cache key scoping

GHA caching docs have a section on restricted cache access:

A workflow can access and restore a cache created in the current branch, the base branch (including base branches of forked repositories), or the default branch (usually main).

  • test_merge_requests.yml will produce the cache key for AMD64, but that cache key would only have a hit for PRs (or local branches) when they trigger the workflow and create the cache key if it doesn't exist on master / base branch.
  • default_on_push.yml will do the same for master branch or when semver tags are pushed.
    • The cache key from master commits might be the same as the PR, but possibly duplicate the cache storage used anyway due to scope change.
    • PRs (forked or local branches) will then have access to the same cache key.
    • An example mentions that two different tags would not be able to share cache from each other (but presumably can from the default branch).
    • It is not made clear if existing PRs would be able to leverage a newly added cache key on master, or only new PRs since then. Likewise with the master branch and tagging. I would think the scope may be based on having common commit history (especially given the feature-b -> feature-a -> main branch example), explained at the bottom of the cache docs.
    • If tagging a release after pushing the release PR too quickly, then both may attempt to upload cache for the same key, but that shouldn't matter.
  • scheduled_builds.yml will take the longest as it builds all platforms at once (midnight on Fridays each week). It would rarely create a cache key, unless an existing key had since expired or been evicted.
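For reference, the restore-keys fallback discussed above looks roughly like this with type=local caching; the cache path and key names are illustrative, not the repo's actual values. A prefix in restore-keys lets a run restore the most recent cache entry whose key starts with that prefix when the exact key misses.

```yaml
# Hypothetical cache step: exact key first, prefix fallback second.
- name: 'Cache Docker layers'
  uses: actions/cache@v3
  with:
    path: /tmp/.buildx-cache
    key: buildx-cache-${{ hashFiles('Dockerfile') }}
    restore-keys: |
      buildx-cache-
```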

Comment on lines +24 to +35

```yaml
      - name: 'Prepare tags'
        id: prep
        uses: docker/metadata-action@v4.0
        with:
          images: |
            ${{ secrets.DOCKER_REPOSITORY }}
            ${{ secrets.GHCR_REPOSITORY }}
          tags: |
            type=edge,branch=master
            type=semver,pattern={{major}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}.{{minor}}.{{patch}}
```
Member

Concerns (Resolved)

default_on_push.yml uses this docker/metadata-action config, but you've omitted the flavor input it had?:

```yaml
          flavor: |
            latest=auto
```

Just looked that up, this is an implicit default. auto is valid for type=semver.

  • type=semver applies to tag events (I assume that detection is carried over into this re-usable workflow, but we might have to verify that; it seems so, according to docs).
  • type=edge is used otherwise for associating the edge image tag to latest master branch commit.

I saw this comment on the docker/metadata-action repo, where type=edge was apparently being treated as type=ref,event=branch if the workflow was run on a branch that did not match the specified branch. This was publishing branch names as image tags.

UPDATE: I cannot see in source why that would happen. And it looks like the action properly handles no valid tags to apply? Either this was fixed since, or there's a configuration issue elsewhere I think and we don't have to worry about it.


Also note that scheduled_builds.yml differed here. No flavor (which is fine since it's implicit) and tags differed with:

```yaml
          tags: |
            type=raw,value=edge
```

type=raw seems to just associate the edge tag with the image built, with no constraint to being the latest commit of the triggering workflow's specified branch?

I'm not sure when type=raw,value=edge would be preferred over type=edge in this case. But we can use expressions to better constrain when a custom tag is applied.
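As a sketch of constraining a custom tag with an expression: docker/metadata-action supports an enable attribute on tag entries, so a raw tag can be gated on the ref. The condition shown is illustrative, not the PR's actual config.

```yaml
          # Hypothetical: only apply the raw edge tag when running on master.
          tags: |
            type=raw,value=edge,enable=${{ github.ref == 'refs/heads/master' }}
```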


Given the above, if I understand things correctly..

  • scheduled_builds.yml will trigger type=edge,branch=master, updating the edge image tag.
  • scheduled_builds.yml is not restricted to the default branch (master), as the workflow history shows it running from your active ci/generic-build branch. If this wasn't failing, I think we'd have published a ci/generic-build image tag?
  • default_on_push.yml will handle the following workflow events.
    • Tag pushed: Use semver patterns to publish image tags (without the v prefix).
    • Commit pushed to master branch: Use edge image tag if the event trigger was a commit.

Changes to consider:

  • scheduled_builds.yml should probably be constrained to running only on master.
  • type=edge,branch=master should be what we want for edge 👍
  • The caveat is, when none of these tag types are valid for a workflow run, it'll apparently fall back to defaults. If a different branch from master (or a PR and other supported defaults) triggers generic_publish.yml, we could accidentally publish tags? (EDIT: Nope, never mind)

This looks fine 👍

I cannot identify how type=edge,branch=master could fall back to using the branch name as an image tag when the workflow uses a different branch (detailed above in "Concerns"). I haven't tried to reproduce the reported bug, so I might have missed something.

@polarathene (Member) left a comment

I went over the GHA docs for re-usable workflows. Looks like these additions should be taken care of:

  • Provide access to secrets (secrets: inherit, unless you can use environment?).
  • Specify revision used (uses: docker-mailserver/docker-mailserver/<generic file name>@master).
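On the caller side, those two additions might look like this; the workflow file name is taken from this PR's naming, and the rest is a sketch under the assumption that secrets are passed via inherit rather than a GitHub environment.

```yaml
# Hypothetical caller job: pinned revision plus inherited secrets.
jobs:
  publish:
    uses: docker-mailserver/docker-mailserver/.github/workflows/generic_publish.yml@master
    secrets: inherit
```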

Potential concern with cache re-use

I've "unresolved" this old review comment, as it raises a concern about scheduled_builds.yml (but probably affects the other two workflows as well). But we can look into that after this PR.

The gist of it for @casperklein is that when cache is used (which is going to be most of the time, as we use restore-keys for fallback cache), if the early RUN layer that installs packages is not altered via git changes to invalidate the layer, then no new updates (even if available) will be applied, as cache is re-used.

I'm not sure how this affects ClamAV with COPY --link which might be the same issue or differ. But freshclam is scheduled to run in a container, while updating packages is not. So perhaps nothing to worry about?

scheduled_builds.yml could use --no-cache or similar opt-out, but as soon as master is updated, a new edge update would regress any fresher updates as cache was used again.

@polarathene (Member) left a comment

Brilliant! 🎉

approved

@georglauterbach (Member, Author)

@casperklein @wernerfred this is now ready for your review 😆 :)

@casperklein (Member) left a comment

Nice improvement 👍 Looks good to me, nothing to complain about 😆
But I've not tested it in a test repo. If you both think that's fine, that should be okay.

@georglauterbach (Member, Author)

Nice improvement 👍 Looks good to me, nothing to complain about 😆

But I've not tested it in a test repo. If you both think that's fine, that should be okay.

👍 I hope there is no typo somewhere important 🙈

@georglauterbach georglauterbach merged commit f8e1bb0 into master Sep 9, 2022
@georglauterbach georglauterbach deleted the ci/generic-build branch September 9, 2022 09:12
@georglauterbach georglauterbach mentioned this pull request Sep 9, 2022
7 tasks