
[Proposal 2: Run] Call and define benchmark tests as GitHub Action jobs & workflows #93

Open
wants to merge 1 commit into main from proposal-002-run

Conversation

@nikimanoledaki (Contributor) commented on May 14, 2024

What type of PR is this?

kind/enhancement

What this PR does / why we need it:

This proposal describes an automated, modular way to fetch and run the benchmark tests.

For context, please read Proposal 1 here first :)

Which issue(s) this PR fixes:

Fixes #86

Special notes for your reviewer (optional):

@locomundo left a comment

added/proposed a few little things

@nikimanoledaki force-pushed the proposal-002-run branch 2 times, most recently from 4377aa0 to 7c6970c on May 31, 2024 10:10
locomundo pushed a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 26, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.
locomundo pushed a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 26, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.

Co-authored-by: nikimanoledaki <niki.manoledaki@grafana.com>
Signed-off-by: locomundo <flavia@aknostic.com>
locomundo pushed a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 26, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.

Signed-off-by: Flavia Paganelli <flavia@aknostic.com>
Co-Authored-By: nikimanoledaki <niki.manoledaki@grafana.com>
locomundo pushed a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 26, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.

Co-authored-by: nikimanoledaki <niki.manoledaki@grafana.com>
Signed-off-by: Flavia Paganelli <flavia@aknostic.com>
@nikimanoledaki changed the title from "Add Proposal 2: Run" to "[Proposal 2: Run] Call and define benchmark tests as GitHub Action jobs & workflows" on Jun 28, 2024
nikimanoledaki added a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 28, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.

Co-authored-by: nikimanoledaki <niki.manoledaki@grafana.com>
Signed-off-by: Flavia Paganelli <flavia@aknostic.com>
@nikimanoledaki force-pushed the proposal-002-run branch 2 times, most recently from 71c8a6d to e783d3b on June 28, 2024 10:39
Co-authored-by: locomundo <flavia@30mhz.com>
Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>

* Remove option 1 of defining job in-line
* Move proposal 2 to docs/proposals
* Defining test jobs for proposal 2
* Split use cases and add sections for auth, versioning
@nikimanoledaki marked this pull request as ready for review on June 28, 2024 12:35
@nikimanoledaki requested a review from a team as a code owner on June 28, 2024 12:35
# other Falco tests:
# - redis-test e.g. https://github.com/falcosecurity/cncf-green-review-testing/blob/main/kustomize/falco-driver/ebpf/redis.yaml
# - event-generator-test e.g. https://github.com/falcosecurity/cncf-green-review-testing/blob/main/kustomize/falco-driver/ebpf/falco-event-generator.yaml
# TODO: should each test be a workflow or a job in a single workflow? as in, one test workflow per cncf project or multiple workflows per cncf project? TBD
Contributor Author

Would appreciate input on this one!

Contributor

I'd be in favour of starting with one test workflow per CNCF project. I think that's simpler and should work for the current Falco benchmarks.

We could take another look at this when we know what the next CNCF project will be. As mentioned earlier, it's hard to predict what the future requirements will be.

Contributor

Yes I am also in favor of having one workflow per project with multiple jobs

I agree - simple is always better - we can always make it more complex if the need arises

One question, though: is this going to work for Falco? Don't they already have three different tests?
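
For illustration, here is a minimal sketch of the "one workflow per CNCF project, one job per test" option discussed above, with the three Falco tests mentioned earlier as separate jobs. The file name, trigger, inputs, and step contents are assumptions, not something this proposal specifies:

```yaml
# Hypothetical .github/workflows/falco.yaml: one workflow for the Falco project,
# containing one GitHub Actions job per benchmark test.
name: falco-benchmarks

on:
  workflow_call:
    inputs:
      version:
        description: Falco version to benchmark (illustrative input)
        required: true
        type: string

jobs:
  stress-ng-test:
    runs-on: ubuntu-latest
    steps:
      - name: Run stress-ng benchmark
        run: echo "apply stress-ng manifest, wait for the functional unit, delete"
  redis-test:
    runs-on: ubuntu-latest
    steps:
      - name: Run redis benchmark
        run: echo "apply redis manifest, wait for the functional unit, delete"
  event-generator-test:
    runs-on: ubuntu-latest
    steps:
      - name: Run event-generator benchmark
        run: echo "apply falco-event-generator manifest, wait, delete"
```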

@rossf7 (Contributor) left a comment

@nikimanoledaki @locomundo Great job, this is looking good! 🚀

A few thoughts from me. Mainly on my favourite topic. Deleting stuff! 🧹😄

Let's recap some of the components defined in [Proposal 1](proposal-001-trigger-and-deploy.md):
1. **Green Reviews pipeline**: the Continuous Integration pipeline which deploys a CNCF project to a test cluster, runs a set of benchmarks while measuring carbon emissions, and stores the results. It is implemented by the workflows listed below.
2. **Cron workflow**: the initial GitHub Action workflow (described in Proposal 1), which dispatches a project workflow (see next definition).
3. **Benchmark workflow**: the project workflow dispatched by the cron workflow, for example a Falco workflow. It deploys the project, calls the test workflow (see below), and then cleans up the deployment. It can be dispatched more than once if the project has multiple subcomponents. A project workflow is just another GitHub Action workflow and contains a list of GitHub Action jobs: the test jobs described in more detail below.
Contributor

At the moment the benchmark pipeline has both the deploy and delete steps.

This was something I advocated for, but I think it might be more flexible to split out the delete step so the cron workflow can call it at the end of the process. WDYT?

Contributor

Yes, I agree, it would be better to split :)

Sounds good. Does this need to be changed in Proposal 1?
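
To make the split concrete, here is a hedged sketch of a cron workflow that calls deploy, benchmark, and delete as separate reusable workflows, so cleanup is owned by the cron workflow and still runs when the benchmark fails. The schedule and workflow paths are illustrative assumptions:

```yaml
# Hypothetical cron workflow: deploy, benchmark, and delete are separate reusable
# workflows, and the delete job runs last even if the benchmark job fails.
name: cron

on:
  schedule:
    - cron: "0 6 * * 1"  # illustrative schedule

jobs:
  deploy:
    uses: ./.github/workflows/deploy-falco.yaml
  benchmark:
    needs: deploy
    uses: ./.github/workflows/benchmark-falco.yaml
  delete:
    needs: benchmark
    if: always()  # run cleanup even when the benchmark job fails
    uses: ./.github/workflows/delete-falco.yaml
```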

# the action to take here depends on the functional unit of the CNCF project, e.g. wait for an amount of time or for resources
kubectl apply -f https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/main/kustomize/falco-driver/ebpf/stress-ng.yaml
sleep 15m
kubectl delete -f https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/main/kustomize/falco-driver/ebpf/stress-ng.yaml
Contributor

I know this is an example, but I think with a long wait this could be a problem if the apply fails for some reason.

Could we have a convention that the delete is a separate step, and ensure it always runs?
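
A minimal sketch of that convention, assuming the test is defined as a job with the delete as its own step guarded by `if: always()`, so the manifest is removed even if the apply or the wait fails. The job name and wait duration are illustrative:

```yaml
# Hypothetical test job: apply, wait, and delete are separate steps, and the delete
# step always runs so the benchmark workload never lingers on the test cluster.
jobs:
  stress-ng-test:
    runs-on: ubuntu-latest
    steps:
      - name: Apply benchmark manifest
        run: kubectl apply -f https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/main/kustomize/falco-driver/ebpf/stress-ng.yaml
      - name: Wait for the functional unit duration
        run: sleep 900  # 15 minutes
      - name: Delete benchmark manifest
        if: always()  # clean up even when an earlier step failed
        run: kubectl delete --ignore-not-found -f https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/main/kustomize/falco-driver/ebpf/stress-ng.yaml
```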

@AntonioDiTuri (Contributor) left a comment

Great job so far! I have only pointed out some minor rephrasing and a few questions.

so that people aren't searching all over the place for lost context.
-->

* [Slack discussion on benchmark test framework](https://cloud-native.slack.com/archives/C060EDHN431/p1708416918423089?thread_ts=1708348336.259699&cid=C060EDHN431)
Contributor

Do we need to link this discussion? I think it is quite old and difficult to follow; I would personally remove it.


**CNCF project maintainer creates a new benchmark test for their project**

As a project maintainer, I create and run a benchmark test from a separate GitHub repository or from the green-reviews-tooling repository by following the steps indicated in the Green Reviews documentation, so that the project’s benchmark tests run in the Green Reviews pipeline and I can see the project's sustainability metrics.
Contributor

I would slightly rephrase:

As a project maintainer, I am interested in creating benchmark tests by following the steps indicated in the Green Reviews documentation. My main interest is that the benchmarks recreate a runtime environment that is as realistic as possible. Only with the right environment can we see the right metrics.

Comments:

  • I don't think we need details in the user stories about where the benchmarks will be hosted; I would keep it more high level
  • The rest is just rephrasing to simplify a bit


As a Green Reviews maintainer, I can help CNCF project maintainers define the Functional Unit of a project so that the project maintainers can create a benchmark test.

**Project maintainer modifies or removes a benchmark test**
Contributor

CNCF project maintainer?

Yes, so the CNCF project maintainers would be the developers of Falco, for example. Is it not clear?


As with every design document, there’s a risk that the solution doesn’t cover all cases, especially considering that at Green Reviews we have only had one test case (a very appreciated guinea pig 🙂), which is Falco. This proposal needs to at least support that case. When other CNCF projects start using Green Reviews, we will learn more and adapt the project as needed.

Another risk is the question of ownership between Green Reviews contributors and CNCF project maintainers. As much as possible, Green Reviews contributors should empower project maintainers to create the benchmark tests. This does not stop Green Reviews maintainers from creating new benchmark tests; however, project maintainers should be encouraged to lead this work. Otherwise, this could add load on Green Reviews maintainers and would be hard to scale. Project maintainers should lead this since they have the expertise about how the project scales and performs under load.
Contributor

I would simplify a bit and refer to the investigation issue:


Another risk is the question of ownership between Green Reviews contributors and CNCF project maintainers.

CNCF project maintainers should be allowed to define and maintain their own benchmarks to make sure that the right environment is configured. However, if every project runs different benchmarks, it would be tough to compare results (sustainability metrics) between them.

This needs deeper investigation, which will be discussed in a separate proposal.

Yes, I think we are talking about two different potential (and perhaps conflicting) issues here:

  • The scalability issue if we get many more CNCF projects interested and there is an expectation that the Green Reviews WG participates in writing the tests or, in general, helps the CNCF projects get on board.
  • The validity of comparing results if every CNCF project has different criteria for creating these tests.

Perhaps we should mention both, with the link you suggest for the second, @AntonioDiTuri ?


This proposal adds the following components:
1. **[new] Test job**: A test job is an instance of a GitHub Action job. It is called from the test workflow right after the CNCF project has been deployed. The test job runs the test instructions for a CNCF project. Which benchmark test to run is defined by inputs in the calling workflow: a CNCF project and a subcomponent.
Contributor

I would put the workflow at position 1 and the job at position 2, since the job is called from the workflow.
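
To illustrate the "inputs in the calling workflow" mentioned in the quoted text, here is a hedged sketch of a test workflow that receives a CNCF project and a subcomponent from its caller. The input and job names are assumptions for illustration only:

```yaml
# Hypothetical test workflow: the caller selects the benchmark by passing the CNCF
# project and subcomponent as workflow_call inputs.
on:
  workflow_call:
    inputs:
      cncf_project:
        required: true
        type: string
      subcomponent:
        required: false
        type: string

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Show which benchmark to run
        run: echo "Running benchmark for ${{ inputs.cncf_project }} / ${{ inputs.subcomponent }}"
```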


The job has test instructions to apply the upstream Kubernetes manifest which contains a `while` loop that runs `stress-ng`. The manifest already defines where the test should run in the cluster i.e. in which namespace. The functional unit test is time-bound in this case and scoped to 15 minutes. Therefore, we deploy this test, wait for 15 minutes, then delete the manifest to end the loop. The test instructions depend on the functional unit of each CNCF project.

In the example above, the Kubernetes manifest that is applied to the cluster is located in a different repository, but it could be located anywhere. This accommodates CNCF project maintainers who may have their own repository with configuration related to the Green Reviews pipeline, as is the case with the Falco project.
Contributor

I would simplify here:

In the example above, the Kubernetes manifest that is applied to the cluster is located in a different repository; this is the case of an externally defined benchmark.
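
One hedged sketch of how an externally defined benchmark could be wired in: the manifest location is simply a workflow input, so it can live in the project's own repository (as Falco's does) or anywhere else. The input name is an assumption, not part of the proposal:

```yaml
# Hypothetical: the benchmark manifest location is an input, so externally hosted and
# in-repo benchmarks are handled the same way.
on:
  workflow_call:
    inputs:
      benchmark_manifest:
        description: URL or path of the Kubernetes manifest defining the benchmark test
        required: true
        type: string

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - name: Apply benchmark manifest
        run: kubectl apply -f "${{ inputs.benchmark_manifest }}"
```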


### Authentication

Before the test workflow is called, we assume that the workflow already has access to a secret containing a kubeconfig to authenticate with the test cluster, and that Falco has already been deployed to it. The pipeline must authenticate with the Kubernetes cluster before running the test job.
Contributor

Is this taken care of in proposal 1?

Contributor

Yes, but every job starts a new VM, so we need to authenticate again for every job that needs to interact with the cluster. I think the test workflow should take a kubeconfig as input and use it to authenticate.

Yes, that was the idea we had as well @SFulpius 👍
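
A minimal sketch of that per-job authentication, assuming the kubeconfig is available to the workflow as a base64-encoded secret; the secret name KUBECONFIG_B64 and the job name are illustrative. Each job that talks to the cluster would repeat this step, since every job runs on a fresh VM:

```yaml
# Hypothetical: write the kubeconfig from a repository secret before any kubectl calls.
jobs:
  stress-ng-test:
    runs-on: ubuntu-latest
    steps:
      - name: Authenticate with the test cluster
        run: |
          mkdir -p "$HOME/.kube"
          echo "${{ secrets.KUBECONFIG_B64 }}" | base64 -d > "$HOME/.kube/config"
          chmod 600 "$HOME/.kube/config"
      - name: Verify access
        run: kubectl get nodes
```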

## Alternatives

<!--
What other approaches did you consider, and why did you rule them out? These do
Contributor

I would list here the alternative design of having one workflow per job

Contributor Author

To add:

  • CNCF <-> benchmark test: one-to-one or one-to-many relationship? Start with 1:1 and support 1:many in the future

Contributor

Can we have the test instructions rendered better, maybe with a tiny example written in them?

Contributor

Can we improve the arrows a bit so that it looks nicer?

so that people aren't searching all over the place for lost context.
-->

* [Slack discussion on benchmark test framework](https://cloud-native.slack.com/archives/C060EDHN431/p1708416918423089?thread_ts=1708348336.259699&cid=C060EDHN431)
Contributor Author

Suggested change (remove this line):
* [Slack discussion on benchmark test framework](https://cloud-native.slack.com/archives/C060EDHN431/p1708416918423089?thread_ts=1708348336.259699&cid=C060EDHN431)

Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ACTION] Run
5 participants