
Refactor podvm workflows and enable on libvirt e2e #1436

Merged

Conversation

wainersm
Member

The libvirt e2e workflow on pull request currently does not create a podvm; instead it uses the pre-built images. We have two problems here: 1) the pre-built podvm images are usually old and outdated; 2) not building on pull request means changes to the podvm aren't tested on CI.

Back when I introduced the libvirt e2e workflow in PR #1318, @stevenhorsman mentioned on Slack that it would be better to build the podvm too. I agreed, and even though the build takes a long time to complete, I am sending these changes to enable the podvm build alongside the libvirt e2e workflow.

Currently the podvm_builder, podvm_binaries, and podvm workflows are triggered and gated at release time to publish images to quay.io. I wanted to reuse those workflows for the libvirt e2e, so the first commits prepare the workflows for reuse.

The podvm images created by the libvirt e2e workflow will be published to ghcr.io with the ci-prN tag (where N is the PR number), so they won't be mixed up with the official images on quay.io. By the way, the caa images for that workflow are already published this way. I have a script to clean up images in ghcr.io for closed PRs; I should contribute it soon...

OK, I tested these changes on my cloud-api-adaptor fork, at https://github.com/wainersm/cc-cloud-api-adaptor/actions/runs/6173089285 . Unfortunately, since libvirt e2e runs on pull_request_target, we will only be able to fully test it after merging. The podvm release workflow couldn't be tested either, unfortunately, and I'm okay with it if we find it prudent not to merge these changes before the 0.8.0 release.

Caching... it has been discussed, at least in #1410, how to speed up the binaries build by leveraging caching or a similar mechanism. At first I thought of implementing something in that direction, but then I realized it would be too many changes (and I don't have a solid proposal either), so I preferred to stick with the naive approach taken here: it simply builds the builder, the binaries, and finally the podvms every time. This whole process can take ~1h... which is to say, improving that situation in follow-up changes is on my radar.

@wainersm wainersm added the `CI` label (Issues related to CI workflows) on Sep 13, 2023
Member

@stevenhorsman stevenhorsman left a comment


Generally it looks very good (if we agree to delay caching to a later PR, which I do). I had a couple of comments/questions.

---
name: Create release Pod VM Images
on:
release:
Member


I wonder if it's also worth having workflow_dispatch here so we can trigger it manually without needing to do the release process?
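For illustration, the manual trigger suggested here would be a small addition to the workflow's `on:` section (a sketch only, assuming the surrounding workflow looks like the excerpt above):

```yaml
on:
  release:
    types: [created]
  # Hypothetical addition: allow triggering the build manually,
  # without going through the release process
  workflow_dispatch:
```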


case "$event_name" in
release)
tags="${img}:latest,${img}:${{ github.sha }}" ;;
Member


I wonder if we should also tag release images with the release version as well as the sha and latest?
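For illustration, the extra release-version tag suggested here could be added to the shown case statement roughly like this (a plain-shell sketch; `img`, `sha`, and `release_tag` are illustrative stand-ins for the `${{ ... }}` workflow expressions):

```shell
# Illustrative stand-ins for the ${{ ... }} expressions used in the workflow
img="quay.io/example/podvm-generic-ubuntu-amd64"
sha="0123abc"
release_tag="v0.8.0"
event_name="release"

case "$event_name" in
    release)
        # latest + the commit sha + the release version tag
        tags="${img}:latest,${img}:${sha},${img}:${release_tag}" ;;
esac
echo "$tags"
```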


case "$event_name" in
release)
tags="${img}:latest,${img}:${{ github.sha }}" ;;
Member


Same question about release version

@wainersm wainersm force-pushed the workflow_podvm_builder_refactor branch from 3606b31 to 2947456 on September 15, 2023 17:25
@wainersm
Member Author

hi @stevenhorsman! Thanks for the prompt review!

I addressed your comments.

@wainersm
Member Author

ah, regarding caching... I am running some experiments with the cache-to and cache-from arguments of docker build... they will be sent in a separate PR.
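For context, one way to wire up that experiment is with the `docker/build-push-action` GitHub Action, which supports `cache-from`/`cache-to` with the GitHub Actions cache backend (a sketch only; the step name, context path, and tag variable are illustrative):

```yaml
- name: Build podvm builder image
  uses: docker/build-push-action@v5
  with:
    context: podvm
    push: true
    tags: ${{ env.tags }}
    # Reuse image layers cached by previous workflow runs
    cache-from: type=gha
    cache-to: type=gha,mode=max
```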

Member

@stevenhorsman stevenhorsman left a comment


LGTM. Thanks!

@wainersm
Member Author

Hi @snir911 , can I have your review here?

@stevenhorsman stevenhorsman mentioned this pull request Sep 22, 2023
Contributor

@snir911 snir911 left a comment


Thanks Wainer! I added a couple of comments.

pull_request|pull_request_target)
tags="${img}:ci-pr${{ github.event.number }}" ;;
push|workflow_dispatch)
tags="${img}:latest,${img}:${{ github.sha }}" ;;
Contributor


workflow_dispatch is manual execution, no? Do we want to push to latest in this case?

Member Author


@snir911 yes, workflow_dispatch is manual execution. With the current workflows we sometimes (very rarely, I'd say) manually build the images from main, and they should get labeled latest.

@@ -100,3 +117,4 @@ jobs:
"UBUNTU_IMAGE_CHECKSUM="
"BINARIES_IMG=${{ inputs.registry }}/podvm-binaries-${{ matrix.os }}-${{ matrix.arch }}:${{ inputs.binaries_img_tag }}"
"BUILDER_IMG=${{ inputs.registry }}/podvm-builder-${{ matrix.os }}:${{ inputs.builder_img_tag }}"
"CAA_SRC_REF=${{ env.CAA_SRC_REF }}"
Contributor


note that there's also PR #1448 ongoing; IIUC it will allow you to simplify this series

Member Author


yep, @jtumber-ibm has also sent another PR to avoid rebuilding the agent all the time. Plenty of work is in the queue to simplify/speed up the build process, but this PR sticks with the old way.

@@ -57,12 +57,31 @@ jobs:
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Determine image tags
Contributor


Shouldn't the GITHUB_ENV be passed across downstream jobs? If I understand it correctly, this step should only be present in podvm_builder and the remaining steps should simply rely on it (I'd like to avoid duplicating the embedded steps).

Contributor


Hmm, I can see later that you perhaps want these to be callable separately. Not sure it's a benefit, as it requires quite a lot of code duplication. Anyway, let me know what you think about using GITHUB_ENV or inputs for these; the steps depend on each other anyway, so I don't think they are likely to be called with different arguments.

Member Author


hi @ldoktor ! First of all, thanks for reviewing this PR.

I did not understand your idea, though.

Sending a variable to $GITHUB_ENV ensures the next steps within the same job will have the variable exported in the environment.

In this PR, I made the podvm_builder, podvm_binaries, and podvm workflows callable so they can be used by different workflows. The way to pass values to those callable workflows is via inputs. If there is a way to inherit environment variables between jobs, I am not aware of it.
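As a sketch of that mechanism (job names, file names, and input names here are illustrative), a caller workflow passes values to callable workflows via `with:`, and each callable workflow declares matching inputs under `on: workflow_call:`:

```yaml
# Caller workflow (e.g. the e2e one); each called workflow must declare
# `registry` and `image_tag` as workflow_call inputs on its side.
jobs:
  podvm_builder:
    uses: ./.github/workflows/podvm_builder.yaml
    with:
      registry: ghcr.io/${{ github.repository_owner }}
      image_tag: ci-pr${{ github.event.number }}

  podvm_binaries:
    needs: [podvm_builder]
    uses: ./.github/workflows/podvm_binaries.yaml
    with:
      registry: ghcr.io/${{ github.repository_owner }}
      image_tag: ci-pr${{ github.event.number }}
```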

Contributor


I'm not sure if GITHUB_ENV can be passed between jobs, so you are most certainly right. But the inputs should work. Not sure it would result in less boilerplate, though.

Contributor


I mean, it's not a deal-breaker; I just dislike the amount of repeated work. If we constructed the CAA_SRC_REF in the first workflow and passed only the product, we might remove most of the duplication.

[ -n "$caa_src_ref" ] || \
caa_src_ref="$(yq '.git.cloud-api-adaptor.reference' versions.yaml)"
[ -n "$caa_src_ref" ]
echo "CAA_SRC_REF=${caa_src_ref}" >> $GITHUB_ENV
Contributor


The same question goes here: isn't it possible to forward the CAA_SRC_REF to the downstream steps?

Member Author


`echo "CAA_SRC_REF=${caa_src_ref}" >> $GITHUB_ENV` ensures CAA_SRC_REF is forwarded to downstream steps in the same job. Do you mean between jobs?

- name: Read properties from versions.yaml
run: |
# There reference to CAA sources will honored if passed to this
# workflow via inputs.
Contributor


I'm not a native speaker, but this sentence is quite hard for me to digest. How about "Allow overriding the CAA version from versions.yaml via the caa_src_ref input"? (again, I'm not a native speaker and not well familiar with CAA)

Member Author


Re-reading what I wrote, I didn't even understand it myself :D

Gonna accept your suggestion.

@@ -63,7 +63,15 @@ RUN echo $CAA_SRC

RUN echo $CAA_SRC_REF

RUN git clone ${CAA_SRC} -b ${CAA_SRC_REF} cloud-api-adaptor
RUN if [ -n "${CAA_SRC}" ]; then \
rm -rf cloud-api-adaptor && \
Contributor


Hmm, is it likely that cloud-api-adaptor is present already? If so, dropping everything via rm -rf seems like a waste to me, as many objects could be reused. How about git remote set-url origin ${CAA_SRC}?

Member Author


Resetting the remote url (and perhaps fetching the branch again?) seems a good idea. There are a couple of places where we delete the repository and clone it again where we could use that idea. However, I would prefer to send it in a separate PR, as this change would require re-testing this PR (which is not an easy task :( ). Is that okay for you?

Contributor


Sure, it's just an optimization
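For reference, the optimization discussed above could be sketched in the Dockerfile roughly as follows (untested; this only illustrates the `git remote set-url` idea, not a change made in this PR):

```dockerfile
# Hypothetical alternative to rm -rf + re-clone: reuse an existing checkout
RUN if [ -d cloud-api-adaptor/.git ]; then \
        git -C cloud-api-adaptor remote set-url origin "${CAA_SRC}" && \
        git -C cloud-api-adaptor fetch origin "${CAA_SRC_REF}" && \
        git -C cloud-api-adaptor checkout FETCH_HEAD; \
    else \
        git clone "${CAA_SRC}" -b "${CAA_SRC_REF}" cloud-api-adaptor; \
    fi
```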

@wainersm wainersm force-pushed the workflow_podvm_builder_refactor branch from 2947456 to 20b1808 on October 4, 2023 17:46
@wainersm
Member Author

wainersm commented Oct 4, 2023

I just updated the PR to re-word some comments as suggested by @ldoktor. Everything else is the same.

@ldoktor
Contributor

ldoktor commented Oct 5, 2023

Seems alright. The git rm -rf optimization should be handled separately (if needed). As for the repeated image tag handling, I don't have enough experience with GH Actions to say for sure, so it's probably fine; but if there is a possibility to share the code, or to just accept produced variables instead of re-typing the same code again and again, I'd welcome that. More importantly, function-wise it seems to be working well, so unless you find it easy to address the duplication, it's fine.

@stevenhorsman
Member

@wainersm, once this is refactored, do you think we need this merged to help with the 0.8 release testing? It would be great to merge it sooner rather than later and get more regular CI running, as we've established that the podvm builds have gone a bit wrong for CentOS lately.

The podvm_builder, podvm_binaries, and podvm workflows were converted to
`workflow_call` so that they can be re-used.

Before, at release time, `podvm_builder` was triggered first on a `release` event,
then the others followed upon completion of the previous one. With this change,
the `podvm_publish` workflow was created as the entry point for the `release`
event (or manual dispatch), from which the (now) callable `podvm_builder`,
`podvm_binaries`, and `podvm` are called directly.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added the `registry` parameter to the builder/binaries/podvm workflows
to allow publishing the images to an arbitrary registry. Also added a login
handler for when the registry is `ghcr.io`.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Currently the podvm images are tagged with the versions.yaml hash. This
change allows one to use an arbitrary tag by exporting the PODVM_TAG variable.

The podvm_builder, podvm_binaries, and podvm workflows were changed to accept
the `image_tag` input (not required).

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Until now, the `e2e_on_pull` workflow used pre-built podvm images, from
which the qcow2 files are extracted and uploaded as artifacts that are picked
up by downstream jobs (e.g. `e2e_libvirt`). This changes the workflow to
leverage the new `podvm_builder`, `podvm_binaries`, and `podvm` workflows,
so that the podvm images are built from the pull request code and pushed to
ghcr.io, and the downstream jobs extract the qcow2 file themselves.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
@wainersm wainersm force-pushed the workflow_podvm_builder_refactor branch from 20b1808 to 6f6b91d on November 15, 2023 20:36
@wainersm
Member Author

Because of the recent changes to the podvm build that allow using the local source tree, I had to redo most of this PR. I actually dropped some commits and adapted others; in general the implementation is much simpler now. The logic is the same, though: build the podvm images tagged with ci-prNUMBER, then push them to ghcr.io.

@ldoktor, mind having another look? Most of the boilerplate is gone.
@stevenhorsman, if you find (you always find!) some time to review it... :)

I tried to test it, but the podvm_binaries build failed (tried twice) due to a networking issue, as you can see in https://github.com/wainersm/cc-cloud-api-adaptor/actions/runs/6881815692/job/18720954383?pr=63

I hope the networking issue is solved later today or tomorrow so I can give it a third try.

@wainersm
Member Author

ok, I re-ran the job today; all podvms passed except for CentOS Stream, which I already expected to fail. Here it is: https://github.com/wainersm/cc-cloud-api-adaptor/actions/runs/6881815692/job/18745812378

Member

@stevenhorsman stevenhorsman left a comment


Looking good, and I think it's simpler thanks to the work done by @tumberino? Thanks for the work x2!

on:
# Create release Pod VM Images
release:
types: [created]
Member


This is fine for now, but I wonder if longer term we want to have a master release workflow that calls this via workflow_call so we can do things like trigger the CAA build and release tagging at the same time?

Member Author


hi @stevenhorsman ,

By "release tagging" did you mean tagging the images (e.g. quay.io/confidential-containers/cloud-api-adaptor:v0.9.0 and quay.io/confidential-containers/podvm-generic-ubuntu-amd64:v0.9.0)?

I was thinking of sending a follow-up PR to tag the podvm image, yes. I wasn't thinking about the CAA image; it seems a good idea, btw.

Contributor

@ldoktor ldoktor left a comment


Yep, the changes make sense to me; the naming seems to correspond to the Makefile, and it looks nicer than the previous version.

@wainersm
Member Author

> Yep, the changes make sense to me, naming seems corresponding to Makefile and it looks nicer then the previous version.

Thanks @ldoktor !

Contributor

@snir911 snir911 left a comment


Thanks! LGTM

@wainersm wainersm merged commit a818375 into confidential-containers:main Nov 20, 2023
19 checks passed