tests/e2e: Migrate e2e test for s390x to GHA #295

BbolroC · 2023-11-28T13:07:37Z

This PR is to migrate the following e2e tests for s390x in Jenkins to GHA:

confidential-containers-operator-main-ubuntu-20.04-s390x-containerd_kata-qemu-baseline
confidential-containers-operator-main-ubuntu-20.04-s390x-containerd_kata-qemu-PR
confidential-containers-operator-main-ubuntu-20.04-s390x-SE-daily

It will be staying in a work-in-progress state and the workflow structure will be changed based on that of x86_64. (i.e. a decision could be made when #260 is finalised.)

The workflow itself has been tested several times

The rationale for the {pre,post} action in the workflow is as follows:

A self-hosted runner for s390x cannot be instantiated and used instantly on request, rather should be ready/running 24/7. It is inevitable to institute a hook for the management over test runs.
The management script could be committed to the repo, but it wouldn't be flexible enough to tackle any environmental issues on the runner.

Of course, an introduction of the actions makes it hard to incorporate workflows for different architectures into one via matrix because if: for the selective run for an architecture (e.g. if: ${{ matrix.arch == 's390x' }}) conflicts with if: always() which is used for the post action. I look forward to any feedback for those two actions.

There are 2 things which must be prepared before the merge:

Registration for a self-hosted runner for s390x
Configuration of a secret named REGISTRY_CREDENTIAL_ENCODED.

Thanks.

Fixes: #294

Signed-off-by: Hyounggyu Choi Hyounggyu.Choi@ibm.com

portersrc · 2023-12-01T14:11:01Z

Hey @BbolroC, looks good to me.
Regarding the pre- and post-action scripts and if: ${{ matrix.arch == 's390x' }} vs. if: always(): You're already using runs-on: s390x and a self-hosted runner for this workflow, so does that not solve the problem?

BbolroC · 2023-12-04T11:32:30Z

Hey @BbolroC, looks good to me. Regarding the pre- and post-action scripts and if: ${{ matrix.arch == 's390x' }} vs. if: always(): You're already using runs-on: s390x and a self-hosted runner for this workflow, so does that not solve the problem?

Hi, thank you for your comment. I'd like to add more information to further clarify the conflict I previously explained:

Let's say we would like to incorporate 2 different architectures by using 2 runners named x86_64 and s390x, respectively.

name: Example Workflow

jobs:
  job1:
    matrix:
      os: [x86_64, s390x]
    runs-on: {{ matrix.os }}

    steps:
      - name: Pre Step
        run: echo "Running Pre Step"
        if: {{ matrix.os == 's390x' }}

      - name: Main Step
        run: echo "Running Main Step"

      - name: Post Step
        if: {{ matrix.os == 's390x' }}
        run: echo "Running Post Step"

Thinking that the {pre,post} steps are only required for s390x, the conditional if: {{ matrix.os == 's390x' }} at the post step prevents the step from achieving what if: always() did before because it is not 100% true. Consequently, the post step fails to execute when other steps encounter an error.

Upon further consideration, a potential resolution could involve passing a matrix element as an environment variable into a step, like:

      - name: Post Step
        if: always()
        run: echo "Dealing with a step selectively based on an environment variable `OS`"
        env:
          OS: {{ matrix.os }}

I can confidently state that the introduction of the {pre,post} actions wouldn't pose any issues for incorporating workflows across multiple architectures. Thanks.

wainersm · 2023-12-12T18:03:15Z

.github/workflows/ci-s390x.yaml

+    steps:
+      - name: Adjust a permission for repo
+        run: |
+          sudo chown -R $USER:$USER $GITHUB_WORKSPACE


hi @BbolroC should this chown call be part of the pre-action too?

Yeah, you are totally right. Thanks for pointing it out. I will update the PR.

FYI: this optimisation will also apply to kata GHA (kata-containers/kata-containers#8649) soon. 😉

wainersm · 2023-12-12T18:23:59Z

Hey @BbolroC, looks good to me. Regarding the pre- and post-action scripts and if: ${{ matrix.arch == 's390x' }} vs. if: always(): You're already using runs-on: s390x and a self-hosted runner for this workflow, so does that not solve the problem?

Hi, thank you for your comment. I'd like to add more information to further clarify the conflict I previously explained:

Let's say we would like to incorporate 2 different architectures by using 2 runners named x86_64 and s390x, respectively.
name: Example Workflow

jobs:
  job1:
    matrix:
      os: [x86_64, s390x]
    runs-on: {{ matrix.os }}

    steps:
      - name: Pre Step
        run: echo "Running Pre Step"
        if: {{ matrix.os == 's390x' }}

      - name: Main Step
        run: echo "Running Main Step"

      - name: Post Step
        if: {{ matrix.os == 's390x' }}
        run: echo "Running Post Step"
Thinking that the {pre,post} steps are only required for s390x, the conditional if: {{ matrix.os == 's390x' }} at the post step prevents the step from achieving what if: always() did before because it is not 100% true. Consequently, the post step fails to execute when other steps encounter an error.

Upon further consideration, a potential resolution could involve passing a matrix element as an environment variable into a step, like:
      - name: Post Step
        if: always()
        run: echo "Dealing with a step selectively based on an environment variable `OS`"
        env:
          OS: {{ matrix.os }}

Indeed it is a possible solution. You can use an environment variable as shown above and/or you can check the existence of a post-step script:

      - name: Take a post-action
        if: always()
        run: |
          if [ -f "${HOME}/script/post_action.sh" ];then
            ${HOME}/script/post_action.sh cc-operator-pr
          fi

Likewise the pre-step.

Another solution would be to decouple the post-step from this workflow. If you use the https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_run event than you can get a 2nd workflow ran afterwards which could run the post-step actions. However I think this solution kind of overkill.

I can confidently state that the introduction of the {pre,post} actions wouldn't pose any issues for incorporating workflows across multiple architectures. Thanks.

wainersm · 2023-12-12T18:29:59Z

@ryansavino could you also jump in this discussion? Essentially @BbolroC, @portersrc and I have figured out ways to have a multi-arch and non-TEE/TEE integrated workflow for running the e2e tests. I do believe that for SEV/SNP you will need to have pre/post actions as well.

BbolroC · 2024-01-11T18:07:21Z

As discussed above, the workflows introduced in this PR are incorporated into the existing x86_64 workflow ccruntime_e2e.yaml. So the workflows of interest are triggered not only by PR, but also nightly.

The whole workflow looks like https://github.com/BbolroC/cc-operator/actions/runs/7492328011/job/20395717523.

@wainersm @stevenhorsman @fidencio please take a look at this and give me a feedback. Thanks!

wainersm · 2024-01-15T20:23:41Z

.github/workflows/ccruntime_e2e.yaml

-          ./run-local.sh -r "${{ matrix.runtimeclass }}" -u
+          args="-u"
+          if [ $RUNNING_INSTANCE = "s390x" ]; then
+            args=""


@BbolroC so s390x is not going to run the undo (-u) operations from the e2e framework. Will it be done (undo) by the post-action then?

Yes, it was concluded that the option -u is not suitable for a self-hosted runner context. The functionality of the option is delegated to the {pre,post}-action.

wainersm · 2024-01-15T20:29:41Z

.github/workflows/ccruntime-on-push.yaml

+
+jobs:
+  e2e-on-push:
+    if: ${{ contains(github.event.pull_request.labels.*.name, 'ok-to-test') }}


Very nice that you are protecting the pipeline with labels. We even talked today on #confidentia-containers-ci about securing our pipelines and this is one measurement.

wainersm · 2024-01-15T21:28:01Z

.github/workflows/ccruntime-on-push.yaml

@@ -0,0 +1,17 @@
+name: ccruntime e2e test on-push
+on:
+  pull_request_target:


Changing from pull_request to pull_request_target will change the behavior of actions/checkout@v4, i.e., it will check out the pull request tree which is not rebased with main. We will need to rebase the check'ed code ourselves to emulate the same behavior of pull_request. This is a well-known problem, we can fix it as kata CI and in a follow up PR if you prefer.

Yeah, then I will borrow the script and place it in this PR rather than instituting it in the next PR.

wainersm · 2024-01-15T21:31:10Z

.github/workflows/ccruntime-on-push.yaml

@@ -0,0 +1,17 @@
+name: ccruntime e2e test on-push


Nit: not use on-push on the workflow name, file and jobs as this workflow is triggered on pull request.

Yes, I will rename ones relevant to this comment. Thanks for the comment.

ldoktor

I think @wainersm said it all, it looks almost ready, just the little naming issue and (if you have time) the pull_request_target rebase it should be ready to go.

BbolroC · 2024-01-16T12:36:39Z

@wainersm @ldoktor I've updated the PR based on your feedback. Looking forward to the next step. Thanks!

This PR is to migrate the following e2e tests for s390x to GHA: - confidential-containers-operator-main-ubuntu-20.04-s390x-containerd_kata-qemu-baseline - confidential-containers-operator-main-ubuntu-20.04-s390x-containerd_kata-qemu-PR - confidential-containers-operator-main-ubuntu-20.04-s390x-SE-daily They are incorporated into the existing x86_64 test workflow `ccruntime_e2e.yaml`. The workflows incorparated are triggered not only by PR, but also nightly at 2:00 UTC. Fixes: confidential-containers#294 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>

wainersm

I think it is ready to go. Thanks @BbolroC !

stevenhorsman · 2024-01-16T14:57:40Z

tests/git-helper.sh

+		# Recover from any previous rebase left halfway
+		git rebase --abort 2> /dev/null || true
+		if ! git rebase origin/${TARGET_BRANCH}; then
+			>&2 echo "Rebase failed, exiting"


Hey Choi, in the kata-containers version there was a different path for self-hosted runner than deleted the workspace directory to avoid merge conflicts impacting runs IIRC - is that needed here too?

I intentionally removed those lines here, wanting to see if that is also the case for this repo. Even if that is the case again, that could be handled by {pre,post}-action, I guess.

Great, thanks for the explanation!

stevenhorsman

LGTM

BbolroC · 2024-01-16T15:56:10Z

Now, we are facing the pull_request_target problem (the old checks marked as required are not triggered any more). Please help me out!

stevenhorsman · 2024-01-16T16:07:13Z

Now, we are facing the pull_request_target problem (the old checks marked as required are not triggered any more). Please help me out!

Just to confirm, this is just a one off where we need to remove them as required and then after this merge we've have new jobs to make required that superceded them?

BbolroC · 2024-01-16T16:11:02Z

Now, we are facing the pull_request_target problem (the old checks marked as required are not triggered any more). Please help me out!

Just to confirm, this is just a one off where we need to remove them as required and then after this merge we've have new jobs to make required that superceded them?

Yes, I think so, unfortunately. If we don't like this approach, let's stop proceeding things and start re-discussing what could be a better way to trigger the tests on the expected github events for a PR in a protective (resource) way.

stevenhorsman · 2024-01-16T16:16:50Z

Yes, I think so, unfortunately. If we don't like this approach, let's stop proceeding things and start re-discussing what could be a better way to trigger the tests on the expected github events for a PR in a protective (resource) way.

I think that makes sense to take the hit now, especially given we are far away from releases. @wainersm are you happy with this?

ldoktor

Thanks for the changes, what is the problem with the required tests? I remember the discussion but still new to this repo. In different repos when GH CI did not scale we simply tested the PR locally and pushed things via cmdline (where the required conditions are not checked). Worst case we can always revert if the GH runners find the main version broken.

wainersm · 2024-01-17T13:51:38Z

Yes, I think so, unfortunately. If we don't like this approach, let's stop proceeding things and start re-discussing what could be a better way to trigger the tests on the expected github events for a PR in a protective (resource) way.

I think that makes sense to take the hit now, especially given we are far away from releases. @wainersm are you happy with this?

Yes @stevenhorsman , I'm happy with that.

@BbolroC I added an entry on the agenda of our meeting today to discuss the methods we have used to protect the self-hosted runners.

wainersm · 2024-01-17T14:07:11Z

Thanks for the changes, what is the problem with the required tests? I remember the discussion but still new to this repo. In different repos when GH CI did not scale we simply tested the PR locally and pushed things via cmdline (where the required conditions are not checked). Worst case we can always revert if the GH runners find the main version broken.

Hi @ldoktor , let me see if I can explain accurately...

This PR change the current jobs so that they are triggered on a different event (pull_request_target) and now requires the ok-to-test label. If we add the ok-to-test label on this PR the new jobs will be triggered but will fail because on a pull_request_target it reads the workflows from the target branch (main) which doesn't exist until this PR is merged. I.e. it is chicken-and-eggs situation. So we will need to make the current jobs non-required so that we can merge the PR first.

This is to fix 2 issues identified during the follow-up test for confidential-containers#295. - git-helper.sh misses an execution permission - actions/checkout@v4 does not fetch the code from the git ref This commit is to fix these issues. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>

This PR is to migrate the AMD SEV and SNP e2e tests from Jenkins to GHA. Builds on top of PR confidential-containers#295 Signed-off-by: Adithya Krishnan Kannan<adkannan@amd.com>

This PR is to migrate the AMD SEV and SNP e2e tests from Jenkins to GHA. Builds on top of PR confidential-containers#295 Signed-off-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>

This PR is to migrate the AMD SEV and SNP e2e tests from Jenkins to GHA. Builds on top of PR confidential-containers#295 Signed-off-by: Adithya Krishnan Kannan <13063254+AdithyaKrishnan@users.noreply.github.com>

This PR is to migrate the AMD SEV and SNP e2e tests from Jenkins to GHA. Includes some fixes suggested by ldoktor and fitzthum. Builds on top of PR confidential-containers#295 Signed-off-by: Adithya Krishnan Kannan <13063254+AdithyaKrishnan@users.noreply.github.com>

BbolroC force-pushed the migrate-to-gha-s390x branch from 696f3cb to ed0fe0b Compare November 29, 2023 12:37

wainersm reviewed Dec 12, 2023

View reviewed changes

BbolroC force-pushed the migrate-to-gha-s390x branch from ed0fe0b to 25fe509 Compare January 11, 2024 18:00

BbolroC changed the title ~~WIP tests/e2e: Migrate e2e test for s390x to GHA~~ tests/e2e: Migrate e2e test for s390x to GHA Jan 11, 2024

BbolroC force-pushed the migrate-to-gha-s390x branch from 25fe509 to 70c8cd1 Compare January 11, 2024 18:09

wainersm reviewed Jan 15, 2024

View reviewed changes

ldoktor reviewed Jan 16, 2024

View reviewed changes

BbolroC force-pushed the migrate-to-gha-s390x branch 3 times, most recently from 27a0adf to 71b3f69 Compare January 16, 2024 12:34

BbolroC force-pushed the migrate-to-gha-s390x branch 4 times, most recently from 8436322 to 4b8f594 Compare January 16, 2024 13:04

BbolroC force-pushed the migrate-to-gha-s390x branch from 4b8f594 to e1d7fe2 Compare January 16, 2024 13:31

wainersm approved these changes Jan 16, 2024

View reviewed changes

stevenhorsman reviewed Jan 16, 2024

View reviewed changes

stevenhorsman approved these changes Jan 16, 2024

View reviewed changes

BbolroC mentioned this pull request Jan 16, 2024

doc: Include custom build for DEVELOPMENT.md #322

Merged

ldoktor approved these changes Jan 17, 2024

View reviewed changes

wainersm merged commit 24fc700 into confidential-containers:main Jan 17, 2024
5 checks passed

BbolroC deleted the migrate-to-gha-s390x branch January 17, 2024 16:17

BbolroC mentioned this pull request Jan 17, 2024

Fix error on rebasing pr branch atop of target one #326

Merged

This was referenced Jan 17, 2024

CI: should not keep the PR 'ok-to-test' labeled #327

Open

Add GHA daily jobs #306

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests/e2e: Migrate e2e test for s390x to GHA #295

tests/e2e: Migrate e2e test for s390x to GHA #295

BbolroC commented Nov 28, 2023 •

edited

portersrc commented Dec 1, 2023

BbolroC commented Dec 4, 2023

wainersm Dec 12, 2023

BbolroC Dec 13, 2023

wainersm commented Dec 12, 2023

wainersm commented Dec 12, 2023

BbolroC commented Jan 11, 2024 •

edited

wainersm Jan 15, 2024

BbolroC Jan 16, 2024

wainersm Jan 15, 2024

wainersm Jan 15, 2024

BbolroC Jan 16, 2024

wainersm Jan 15, 2024

BbolroC Jan 16, 2024

ldoktor left a comment

BbolroC commented Jan 16, 2024

wainersm left a comment

stevenhorsman Jan 16, 2024 •

edited

BbolroC Jan 16, 2024

stevenhorsman Jan 16, 2024

stevenhorsman left a comment

BbolroC commented Jan 16, 2024 •

edited

stevenhorsman commented Jan 16, 2024

BbolroC commented Jan 16, 2024 •

edited

stevenhorsman commented Jan 16, 2024

ldoktor left a comment

wainersm commented Jan 17, 2024

wainersm commented Jan 17, 2024

tests/e2e: Migrate e2e test for s390x to GHA #295

tests/e2e: Migrate e2e test for s390x to GHA #295

Conversation

BbolroC commented Nov 28, 2023 • edited

portersrc commented Dec 1, 2023

BbolroC commented Dec 4, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wainersm commented Dec 12, 2023

wainersm commented Dec 12, 2023

BbolroC commented Jan 11, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ldoktor left a comment

Choose a reason for hiding this comment

BbolroC commented Jan 16, 2024

wainersm left a comment

Choose a reason for hiding this comment

stevenhorsman Jan 16, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stevenhorsman left a comment

Choose a reason for hiding this comment

BbolroC commented Jan 16, 2024 • edited

stevenhorsman commented Jan 16, 2024

BbolroC commented Jan 16, 2024 • edited

stevenhorsman commented Jan 16, 2024

ldoktor left a comment

Choose a reason for hiding this comment

wainersm commented Jan 17, 2024

wainersm commented Jan 17, 2024

BbolroC commented Nov 28, 2023 •

edited

BbolroC commented Jan 11, 2024 •

edited

stevenhorsman Jan 16, 2024 •

edited

BbolroC commented Jan 16, 2024 •

edited

BbolroC commented Jan 16, 2024 •

edited