
[Proposal 2: Run] Call and define benchmark tests as GitHub Action jobs & workflows #93

Open
wants to merge 1 commit into main from proposal-002-run

Conversation

@nikimanoledaki (Contributor) commented on May 14, 2024

What type of PR is this?

kind/enhancement

What this PR does / why we need it:

This proposal describes an automated, modular way to fetch and run the benchmark tests.

For context, please read Proposal 1 here first :)

Which issue(s) this PR fixes:

Fixes #86

Special notes for your reviewer (optional):

@locomundo left a comment

added/proposed a few little things

@nikimanoledaki force-pushed the proposal-002-run branch 2 times, most recently from 4377aa0 to 7c6970c on May 31, 2024 10:10
locomundo pushed a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 26, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.
locomundo pushed a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 26, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.

Co-authored-by: nikimanoledaki <niki.manoledaki@grafana.com>
Signed-off-by: locomundo <flavia@aknostic.com>
locomundo pushed a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 26, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.

Signed-off-by: Flavia Paganelli <flavia@aknostic.com>
Co-Authored-By: nikimanoledaki <niki.manoledaki@grafana.com>
locomundo pushed a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 26, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.

Co-authored-by: nikimanoledaki <niki.manoledaki@grafana.com>
Signed-off-by: Flavia Paganelli <flavia@aknostic.com>
@nikimanoledaki changed the title from "Add Proposal 2: Run" to "[Proposal 2: Run] Call and define benchmark tests as GitHub Action jobs & workflows" on Jun 28, 2024
nikimanoledaki added a commit to nikimanoledaki/green-reviews-tooling that referenced this pull request Jun 28, 2024
Worked with @nikimanoledaki offline, pushing all our changes so far.

Co-authored-by: nikimanoledaki <niki.manoledaki@grafana.com>
Signed-off-by: Flavia Paganelli <flavia@aknostic.com>
@nikimanoledaki force-pushed the proposal-002-run branch 2 times, most recently from 71c8a6d to e783d3b on June 28, 2024 10:39
Co-authored-by: locomundo <flavia@30mhz.com>
Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>

* Remove option 1 of defining job in-line
* Move proposal 2 to docs/proposals
* Defining test jobs for proposal 2
* Split use cases and add sections for auth, versioning
@nikimanoledaki marked this pull request as ready for review on June 28, 2024 12:35
@nikimanoledaki requested a review from a team as a code owner on June 28, 2024 12:35
# other Falco tests:
# - redis-test e.g. https://github.com/falcosecurity/cncf-green-review-testing/blob/main/kustomize/falco-driver/ebpf/redis.yaml
# - event-generator-test e.g. https://github.com/falcosecurity/cncf-green-review-testing/blob/main/kustomize/falco-driver/ebpf/falco-event-generator.yaml
# TODO: should each test be a workflow or a job in a single workflow? as in, one test workflow per cncf project or multiple workflows per cncf project? TBD
Contributor Author

Would appreciate input on this one!

Contributor

I'd be in favour of starting with one test workflow per CNCF project. I think that's simpler and should work for the current Falco benchmarks.

We could take another look at this when we know what the next CNCF project will be. As mentioned earlier, it's hard to predict what the future requirements will be.

Contributor

Yes I am also in favor of having one workflow per project with multiple jobs

I agree - simple is always better - we can always make it more complex if the need arises

One question, though: is this going to work for Falco? Don't they already have three different tests?
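
For illustration, here is a minimal sketch of the "one workflow per CNCF project, one job per test" option discussed above, with the three Falco tests mentioned earlier as separate jobs. The file name, trigger, inputs, and step contents are assumptions, not something this proposal specifies:

```yaml
# Hypothetical .github/workflows/falco.yaml: one workflow for the Falco project,
# containing one GitHub Actions job per benchmark test.
name: falco-benchmarks

on:
  workflow_call:
    inputs:
      version:
        description: Falco version to benchmark (illustrative input)
        required: true
        type: string

jobs:
  stress-ng-test:
    runs-on: ubuntu-latest
    steps:
      - name: Run stress-ng benchmark
        run: echo "apply stress-ng manifest, wait for the functional unit, delete"
  redis-test:
    runs-on: ubuntu-latest
    steps:
      - name: Run redis benchmark
        run: echo "apply redis manifest, wait for the functional unit, delete"
  event-generator-test:
    runs-on: ubuntu-latest
    steps:
      - name: Run event-generator benchmark
        run: echo "apply falco-event-generator manifest, wait, delete"
```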

@rossf7 (Contributor) left a comment

@nikimanoledaki @locomundo Great job, this is looking good! 🚀

A few thoughts from me. Mainly on my favourite topic. Deleting stuff! 🧹😄

Let's recap some of the components defined in [Proposal 1](proposal-001-trigger-and-deploy.md):
1. **Green Reviews pipeline**: the Continuous Integration pipeline which deploys a CNCF project to a test cluster, runs a set of benchmarks while measuring carbon emissions, and stores the results. It is implemented by the workflows listed below.
2. **Cron workflow**: the initial GitHub Action workflow (described in Proposal 1), which dispatches a project workflow (see next definition).
3. **Benchmark workflow**: the project workflow dispatched by the cron workflow, for example a Falco workflow. It deploys the project, calls the test workflow (see below), and then cleans up the deployment. It can be dispatched more than once if the project has multiple subcomponents. A project workflow is just another GitHub Action workflow and contains a list of GitHub Action jobs: the test jobs described in more detail below.
Contributor

At the moment the benchmark pipeline has both the deploy and delete steps.

This was something I advocated for, but I think it might be more flexible to split out the delete step so the cron workflow can call it at the end of the process. WDYT?

Contributor

Yes, I agree, it would be better to split :)

Sounds good. Does this need to be changed in Proposal 1?
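
To make the split concrete, here is a hedged sketch of a cron workflow that calls deploy, benchmark, and delete as separate reusable workflows, so cleanup is owned by the cron workflow and still runs when the benchmark fails. The schedule and workflow paths are illustrative assumptions:

```yaml
# Hypothetical cron workflow: deploy, benchmark, and delete are separate reusable
# workflows, and the delete job runs last even if the benchmark job fails.
name: cron

on:
  schedule:
    - cron: "0 6 * * 1"  # illustrative schedule

jobs:
  deploy:
    uses: ./.github/workflows/deploy-falco.yaml
  benchmark:
    needs: deploy
    uses: ./.github/workflows/benchmark-falco.yaml
  delete:
    needs: benchmark
    if: always()  # run cleanup even when the benchmark job fails
    uses: ./.github/workflows/delete-falco.yaml
```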

# the action to take here depends on the functional unit of the CNCF project, e.g. wait for an amount of time or for resources
kubectl apply -f https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/main/kustomize/falco-driver/ebpf/stress-ng.yaml
sleep 15m
kubectl delete -f https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/main/kustomize/falco-driver/ebpf/stress-ng.yaml
Contributor

I know this is an example, but I think with a long wait this could be a problem if the apply fails for some reason.

Could we have a convention that the delete is a separate step, and ensure it always runs?
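
A minimal sketch of that convention, assuming the test is defined as a job with the delete as its own step guarded by `if: always()`, so the manifest is removed even if the apply or the wait fails. The job name and wait duration are illustrative:

```yaml
# Hypothetical test job: apply, wait, and delete are separate steps, and the delete
# step always runs so the benchmark workload never lingers on the test cluster.
jobs:
  stress-ng-test:
    runs-on: ubuntu-latest
    steps:
      - name: Apply benchmark manifest
        run: kubectl apply -f https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/main/kustomize/falco-driver/ebpf/stress-ng.yaml
      - name: Wait for the functional unit duration
        run: sleep 900  # 15 minutes
      - name: Delete benchmark manifest
        if: always()  # clean up even when an earlier step failed
        run: kubectl delete --ignore-not-found -f https://raw.githubusercontent.com/falcosecurity/cncf-green-review-testing/main/kustomize/falco-driver/ebpf/stress-ng.yaml
```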

@AntonioDiTuri (Contributor) left a comment

Great job so far! I have only pointed out some minor rephrasing and a few questions.

so that people aren't searching all over the place for lost context.
-->

* [Slack discussion on benchmark test framework](https://cloud-native.slack.com/archives/C060EDHN431/p1708416918423089?thread_ts=1708348336.259699&cid=C060EDHN431)
Contributor

Do we need to link this discussion? I think it is quite old and difficult to follow; I would personally remove it.


**CNCF project maintainer creates a new benchmark test for their project**

As a project maintainer, I create and run a benchmark test from a separate GitHub repository or from the green-reviews-tooling repository by following the steps indicated in the Green Reviews documentation, so that the project’s benchmark tests run in the Green Reviews pipeline and I can see the project's sustainability metrics.
Contributor

I would slightly rephrase:

As a project maintainer, I am interested in creating benchmark tests by following the steps indicated in the Green Reviews documentation. My main interest is that the benchmarks recreate a runtime environment that is as realistic as possible. Only with the right environment can we see the right metrics.

Comments:

  • I don't think we need details in the user stories about where the benchmarks will be hosted; I would keep it more high level
  • The rest is just rephrasing to simplify a bit


As a Green Reviews maintainer, I can help CNCF project maintainers define the Functional Unit of a project so that the project maintainers can create a benchmark test.

**Project maintainer modifies or removes a benchmark test**
Contributor

CNCF project maintainer?

Yes, so the CNCF project maintainers would be the developers of Falco, for example. Is it not clear?


As with every design document, there’s a risk that the solution doesn’t cover all cases, especially considering that at Green Reviews we have only had one test case (a very appreciated guinea pig 🙂), which is Falco. This proposal needs to at least support that case. When other CNCF projects start using Green Reviews, we will learn more and adapt the project as needed.

Another risk is the question of ownership between Green Reviews contributors and CNCF project maintainers. As much as possible, Green Reviews contributors should empower project maintainers to create the benchmark tests. This does not stop Green Reviews maintainers from creating new benchmark tests; however, project maintainers should be encouraged to lead this work. Otherwise, this could add load on Green Reviews maintainers and would be hard to scale. Project maintainers should lead this since they have the expertise about how the project scales and performs under load.
Contributor

I would simplify a bit and refer to the investigation issue:


Another risk is the question of ownership between Green Reviews contributors and CNCF project maintainers.

CNCF project maintainers should be allowed to define and maintain their own benchmarks to make sure that the right environment is configured. However, if every project runs different benchmarks, it would be tough to compare results (sustainability metrics) between them.

This needs deeper investigation, which will be discussed in a separate proposal.

Yes, I think we are talking about two different potential (and perhaps conflicting) issues here:

  • The scalability issue if we get many more CNCF projects interested and there is an expectation that the Green Reviews WG participates in writing the tests or, in general, helps the CNCF projects get on board.
  • The validity of comparing results if every CNCF project has different criteria for creating these tests.

Perhaps we should mention both, with the link you suggest for the second, @AntonioDiTuri ?


This proposal adds the following components:
1. **[new] Test job**: A test job is an instance of a GitHub Action job. It is called from the test workflow right after the CNCF project has been deployed. The test job runs the test instructions for a CNCF project. Which benchmark test to run is defined by inputs in the calling workflow: a CNCF project and a subcomponent.
Contributor

I would put the workflow at position 1 and the job at position 2, since the job is called from the workflow.
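
To illustrate the "inputs in the calling workflow" mentioned in the quoted text, here is a hedged sketch of a test workflow that receives a CNCF project and a subcomponent from its caller. The input and job names are assumptions for illustration only:

```yaml
# Hypothetical test workflow: the caller selects the benchmark by passing the CNCF
# project and subcomponent as workflow_call inputs.
on:
  workflow_call:
    inputs:
      cncf_project:
        required: true
        type: string
      subcomponent:
        required: false
        type: string

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Show which benchmark to run
        run: echo "Running benchmark for ${{ inputs.cncf_project }} / ${{ inputs.subcomponent }}"
```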


The job has test instructions to apply the upstream Kubernetes manifest which contains a `while` loop that runs `stress-ng`. The manifest already defines where the test should run in the cluster i.e. in which namespace. The functional unit test is time-bound in this case and scoped to 15 minutes. Therefore, we deploy this test, wait for 15 minutes, then delete the manifest to end the loop. The test instructions depend on the functional unit of each CNCF project.

In the example above, the Kubernetes manifest that is applied to the cluster is located in a different repository, but it could be located anywhere. This accommodates CNCF project maintainers who may have their own repository with configuration related to the Green Reviews pipeline, as is the case with the Falco project.
Contributor

I would simplify here:

In the example above, the Kubernetes manifest that is applied to the cluster is located in a different repository; this is the case of an externally defined benchmark.
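
One hedged sketch of how an externally defined benchmark could be wired in: the manifest location is simply a workflow input, so it can live in the project's own repository (as Falco's does) or anywhere else. The input name is an assumption, not part of the proposal:

```yaml
# Hypothetical: the benchmark manifest location is an input, so externally hosted and
# in-repo benchmarks are handled the same way.
on:
  workflow_call:
    inputs:
      benchmark_manifest:
        description: URL or path of the Kubernetes manifest defining the benchmark test
        required: true
        type: string

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - name: Apply benchmark manifest
        run: kubectl apply -f "${{ inputs.benchmark_manifest }}"
```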


### Authentication

Before the test workflow is called, we assume that the workflow already has access to a secret containing a kubeconfig to authenticate with the test cluster, and that Falco has already been deployed to it. The pipeline must authenticate with the Kubernetes cluster before running the test job.
Contributor

Is this taken care of in proposal 1?

Contributor

Yes, but every job starts a new VM, so we need to authenticate again for every job that needs to interact with the cluster. I think the test workflow should take a kubeconfig as input and use it to authenticate.

Yes, that was the idea we had as well @SFulpius 👍
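
A minimal sketch of that per-job authentication, assuming the kubeconfig is available to the workflow as a base64-encoded secret; the secret name KUBECONFIG_B64 and the job name are illustrative. Each job that talks to the cluster would repeat this step, since every job runs on a fresh VM:

```yaml
# Hypothetical: write the kubeconfig from a repository secret before any kubectl calls.
jobs:
  stress-ng-test:
    runs-on: ubuntu-latest
    steps:
      - name: Authenticate with the test cluster
        run: |
          mkdir -p "$HOME/.kube"
          echo "${{ secrets.KUBECONFIG_B64 }}" | base64 -d > "$HOME/.kube/config"
          chmod 600 "$HOME/.kube/config"
      - name: Verify access
        run: kubectl get nodes
```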

## Alternatives

<!--
What other approaches did you consider, and why did you rule them out? These do
Contributor

I would list here the alternative design of having one workflow per job

Contributor Author

To add:

  • CNCF <-> benchmark test: one-to-one or one-to-many relationship? Start with 1:1 and support 1:many in the future

Contributor

Can we have the test instructions rendered better, maybe with a tiny example written in them?

Contributor

Can we improve the arrows a bit so that it looks nicer?

so that people aren't searching all over the place for lost context.
-->

* [Slack discussion on benchmark test framework](https://cloud-native.slack.com/archives/C060EDHN431/p1708416918423089?thread_ts=1708348336.259699&cid=C060EDHN431)
Contributor Author

Suggested change (remove this line):
* [Slack discussion on benchmark test framework](https://cloud-native.slack.com/archives/C060EDHN431/p1708416918423089?thread_ts=1708348336.259699&cid=C060EDHN431)

Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ACTION] Run
5 participants