[Tracking, Green Reviews WG] Design Green Reviews WG pipeline workflow #182

Closed
4 tasks done
nikimanoledaki opened this issue Aug 28, 2023 · 10 comments
Labels
board/wg-green-reviews Filter for the WG Green Reviews project board

Comments

@nikimanoledaki
Contributor

nikimanoledaki commented Aug 28, 2023

Description

Following from the proposal "Proof of Environmental Sustainability activities and best practices for CNCF projects", the Green Reviews WG is setting up infrastructure to measure the sustainability footprint of CNCF projects. This issue tracks the design & implementation of this technical work.

Please add your suggestions / comments / questions to the Design Doc here! 🌱

Milestones

  • Create a document to start outlining the sections above.
  • Discuss the progress during Green Reviews WG meetings.
  • Identify the starting point (or multiple points in parallel) that can be worked on.
  • Break down work into actionable items that folks can take on.

[Tracking] Epic

Co-Authored By: Kristina D @guidemetothemoon

@nikimanoledaki nikimanoledaki changed the title [Tracking: Green Reviews] [Green Reviews] Aug 28, 2023
@nikimanoledaki nikimanoledaki changed the title [Green Reviews] [Tracking] Create design doc for Green Reviews WG project Aug 28, 2023
@leonardpahlke leonardpahlke added the board/wg-green-reviews Filter for the WG Green Reviews project board label Aug 28, 2023
@nikimanoledaki nikimanoledaki changed the title [Tracking] Create design doc for Green Reviews WG project [Tracking] Design workflow for Green Reviews WG project Aug 29, 2023
@nikimanoledaki nikimanoledaki changed the title [Tracking] Design workflow for Green Reviews WG project [Tracking] Design Green Reviews WG pipeline workflow Aug 29, 2023
@AntonioDiTuri
Contributor

AntonioDiTuri commented Sep 13, 2023

I have a few comments/questions:

Here is the Google Doc tracking the implementation details and related discussion:

https://docs.google.com/document/d/19fzZW-IMv2kDNatKFHeHh7wqcEN0e2N60wzxvCGZd48/edit?usp=sharing

Questions:

In this link I have seen some steps to trigger GH Actions between different repos. Is it a valid approach?

Could you please clarify what we would need k6 for? I am not an expert and it is not so clear to me :)

Does Prometheus handle the remote write? Where can I read some doc on how?

Proposals:

We may need to add a step to "install the project" that we would like to test.
E.g. on the deployed test server we would need to install the target project (e.g. Falco).

Can't wait to start!

@guidemetothemoon guidemetothemoon changed the title [Tracking] Design Green Reviews WG pipeline workflow [Tracking, Green Reviews WG] Design Green Reviews WG pipeline workflow Sep 27, 2023
@nikimanoledaki
Contributor Author

nikimanoledaki commented Sep 27, 2023

I have seen some steps to trigger GH actions between different repos. Is it a valid approach?

This is the way to go, yes. The workflow to build and deploy the CNCF project should trigger the GH Action workflow in the green-reviews-tooling repo.

To achieve this, we need to add a workflow_dispatch in the CNCF project build pipeline (assuming they use GitHub Actions). Then, we need to add a repository_dispatch in the tooling repo. This will be the trigger for the load tests.
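As a rough sketch of what the cross-repo trigger could look like: a step in the project's pipeline could call GitHub's REST endpoint for repository dispatch events, and a workflow in the tooling repo listening on `repository_dispatch` would pick it up. The event type `project-deployed`, the payload fields, and the token placeholder below are assumptions for illustration, not something agreed in this thread.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, token, event_type, payload):
    """Build (but do not send) a repository_dispatch request for owner/repo."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps(
        {"event_type": event_type, "client_payload": payload}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical values: the event type and payload are placeholders.
request = build_dispatch_request(
    "cncf-tags",
    "green-reviews-tooling",
    "<token>",
    "project-deployed",
    {"project": "falco"},
)
# Sending would be: urllib.request.urlopen(request)
```

The request is only built here, not sent, so the snippet can be tried without credentials. A real pipeline step would need a token with access to the tooling repo.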

Could you please clarify what we would need k6 for?

These should be benchmark tests that perform an action against the CNCF Project. To use an example more familiar to me: in a GitOps benchmark test, we create and deploy an application, which Flux/ArgoCD then reconciles in the cluster. In theory, this would be the equivalent of the SCI Functional Unit:

The functional unit defines how your application scales. For instance, if your application scales by APIs then choose API as your functional unit.

This will look different for each CNCF Project. Unsure what this looks like for Falco - it is what we are trying to figure out next :)

In this scenario, the benchmark tests run on the same Node where Kepler is running, so that we can measure the energy consumption of the Functional Unit.
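To make the role of the Functional Unit concrete, here is a minimal sketch of the SCI calculation as defined by the Green Software Foundation, SCI = ((E * I) + M) per R. The input numbers are made up for illustration; in this setup E would come from the Kepler measurements and R from the number of functional units the benchmark exercised.

```python
def sci_score(energy_kwh, carbon_intensity, embodied_g, functional_units):
    """Return SCI in gCO2e per functional unit.

    energy_kwh:        E, energy consumed by the software (kWh)
    carbon_intensity:  I, grid carbon intensity (gCO2e per kWh)
    embodied_g:        M, embodied emissions attributed to the run (gCO2e)
    functional_units:  R, number of functional units (e.g. API calls)
    """
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    operational_g = energy_kwh * carbon_intensity
    return (operational_g + embodied_g) / functional_units


# Illustrative numbers only, not measurements.
score = sci_score(
    energy_kwh=2.0,
    carbon_intensity=400.0,
    embodied_g=100.0,
    functional_units=1000,
)
print(score)
```

Choosing a different Functional Unit per project only changes what is counted in `functional_units`; the rest of the calculation stays the same.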

Does Prometheus handle the remote write?

devstats is a Grafana dashboard so we should be able to create a Prometheus data source for it. The Grafana instance will read from the Prometheus instance running on our Node (which contains the Kepler metrics). TBD :)
https://github.com/cncf/devstats/blob/master/GRAFANA.md
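For reference, reading the Kepler metrics back out of Prometheus could look roughly like the sketch below, which builds a query URL for the Prometheus HTTP API (`/api/v1/query`). The Prometheus address is a placeholder, and the metric name `kepler_container_joules_total` with its `container_namespace` label is an assumption about how Kepler exposes per-container energy; check it against the Kepler version that gets deployed.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def kepler_power_query_url(prometheus_url, namespace, window="5m"):
    """Build an instant-query URL for the power drawn by one namespace.

    rate() over a joules counter yields joules per second, i.e. watts.
    """
    promql = (
        "sum(rate(kepler_container_joules_total"
        f'{{container_namespace="{namespace}"}}[{window}]))'
    )
    query_string = urlencode({"query": promql})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{query_string}"


# Placeholder address for the Prometheus instance on the test Node.
url = kepler_power_query_url("http://prometheus.example:9090", "falco")
print(url)
```

A Grafana data source pointed at the same Prometheus instance would run the same PromQL expression.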

@leonardpahlke
Member

@nikimanoledaki, @AntonioDiTuri the Falco project tests their deployments on Equinix using this Ansible config: https://github.com/falcosecurity/kernel-testing/tree/main. Perhaps this could be a good starting point for us? We could copy the resources and deploy the setup (for testing purposes, i.e. running the Falco tests), then reduce the deployment (remove all settings we do not need) and add new configuration (to enable assessing the SCI score, measuring energy, etc.). We will likely need to build a small project (in Go or similar) next to the Ansible config to make some calculations.

cc @incertum

@incertum

incertum commented Oct 2, 2023

@leonardpahlke, our Falco core maintainers meeting is on October 5th. Will check with the other maintainers (@FedeDP, @Andreagit97) to see how we can generalize the Ansible setup so that it can be used for more future projects beyond Falco. Voting in favor of creating a Go project that we can all contribute to. This may come in handy for future setup configurations and other aspects.

@incertum

incertum commented Oct 5, 2023

I see that this ticket is referenced: https://www.bbb.org/us/il/aurora/profile/window-installation/green-t-windows-0654-88593900

Does this mean that we plan to run Kubernetes on a bare-metal Equinix cluster, without a hypervisor like EKS?

Do we plan to deploy each project directly on the hosts, or in the Falco case as a daemonset?

Alternatives such as Firecracker microVMs are being considered to work with Falco's setup: cncf-tags/green-reviews-tooling#1 (comment)

Please note that the Firecracker VM setup you are referencing is used to test different kernel versions. For the Green Reviews WG efforts, one kernel version is sufficient, and we would like to deploy Falco realistically, as the goal here is currently slightly different: we want to test performance.

@nikimanoledaki
Contributor Author

nikimanoledaki commented Oct 9, 2023

@incertum that is correct, Equinix is our infrastructure so we will be running on a BM cluster without a hypervisor. In the linked issue about the cluster config, Ross and I are evaluating the pros/cons of different tooling and cluster setups. We have not decided on any yet. We're slightly blocked on how testing will work for each CNCF Project, which will determine what the cluster will look like and which tooling is best. We're looking into various CAPI setups but are open to Ansible as well. Our only requirement is that we use IaC/GitOps so that the configuration can live in the dedicated tooling repo: https://github.com/cncf-tags/green-reviews-tooling

Do we plan to deploy each project directly on the hosts, or in the Falco case as a daemonset?

This is a great question - we are still in the process of designing the E2E flow. We need to think about how we will scale when we test each project separately, with an emphasis on isolation when we run the tests for each project. This could be done with CAPI by running a new worker cluster for each project/test. However, this could be too much overhead. We could also achieve this degree of isolation within a single cluster by running each CNCF Project and its tests on a dedicated Node, using a combination of Namespace + Node taint/toleration for each project that we test. So we could run Falco as a DaemonSet, but essentially it would be 1 Pod that runs on 1 Node.
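To illustrate the single-cluster option, here is a minimal sketch that generates a DaemonSet manifest pinned to a dedicated Node through a Namespace, a nodeSelector and a matching toleration. The label/taint key `green-reviews/project`, the image tag and the helper name are hypothetical; nothing here has been decided.

```python
import json


def isolated_daemonset(project, image, namespace, taint_key):
    """Return a DaemonSet manifest restricted to Nodes reserved for one project."""
    labels = {"app": project}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": project, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Only schedule onto Nodes labelled for this project.
                    "nodeSelector": {taint_key: project},
                    # Tolerate the taint that keeps other workloads away.
                    "tolerations": [
                        {
                            "key": taint_key,
                            "operator": "Equal",
                            "value": project,
                            "effect": "NoSchedule",
                        }
                    ],
                    "containers": [{"name": project, "image": image}],
                },
            },
        },
    }


# Hypothetical key and image tag, for illustration only.
manifest = isolated_daemonset(
    "falco", "falcosecurity/falco:latest", "falco", "green-reviews/project"
)
print(json.dumps(manifest, indent=2))
```

The dedicated Node would need the matching label and taint, for example `kubectl taint nodes <node> green-reviews/project=falco:NoSchedule`. With only one such Node, the DaemonSet ends up as 1 Pod on 1 Node, as described above.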

Please let me know what you think about all of this! Appreciate your feedback and that this is a WIP with a lot of back-and-forth to make sure we're building the right thing 😊

To make a decision about the above, it would also help to first decide what tests we will run for each CNCF Project and how we will run them. With regard to what we test, I agree with @leonardpahlke's idea in the comment above that we could reuse the existing Falco test steps/structure with some changes to fit our use case. We can replicate the test steps even if the tooling is different.

With regard to how we run these tests, we should discuss whether we will use the Project's existing test tooling or a unified test tooling. For example, we could use the tooling that each CNCF Project uses, such as Ansible in the case of Falco. On the one hand, we could potentially ask Project maintainers to maintain this test suite. On the other hand, we introduce some discrepancy in how we test CNCF Projects if we use a different tool each time. @leonardpahlke curious as to what you have in mind for this part.

Alternatively, we could use a unified testing tool such as k6, which has been brought up before, but we are not yet sure that k6 is the right tool for testing Falco and other CNCF Projects. Before we move forward, it would be great to establish whether k6 is a good tool for this. @incertum, do you think it would be possible to collaborate with you and the other Falco maintainers to do a spike on porting the simplest Falco test to k6?

@immavalls works with k6 and brought up the following, here:

Will this be using https://github.com/marketplace/actions/k6-load-test? or is the plan to run the tests in a k8s cluster using https://github.com/grafana/k6-operator? I can help figure this part out with more information


we would like to deploy Falco realistically, as the goal here is currently slightly different: we want to test performance.

Could you expand further on what you mean with regard to deploying Falco realistically? Is there something we are missing or should look out for in your experience of how Falco is deployed?

@rossf7

rossf7 commented Oct 13, 2023

@incertum I've been looking at https://github.com/falcosecurity/kernel-testing with @nikimanoledaki to see if we can reuse the approach you're using with GitHub Actions and Ansible.

I may well be missing something, but at a high level, AIUI, the test is triggered by Falco's Prow instance, which runs the Ansible playbook that connects to an Equinix machine using SSH.

Is the Equinix machine running Kubernetes and how do you manage it?

I ask because in our case we need to run the tests on a K8s node. This is so Kepler can attribute the energy consumption from the CPU socket to the Falco pods.

@incertum

@rossf7

GitHub Actions + Ansible Kubernetes module could be a valid solution.
Any alternative will work for us, Ansible is not required from our side.

I'd like to clarify that our kernel testing involved spinning up VMs to test different kernels, and yes, SSH was used, along with some other setup requirements. However, for the Green Reviews WG + Falco, VMs and SSH are not necessary. I shared our setup to demonstrate how we successfully established a robust pipeline using GitHub Actions + Ansible against an Equinix machine. My intention was to illustrate that the steps of our pipeline could potentially inspire the Green Reviews workflow. I realize this may have complicated the discussion; apologies for any confusion caused.

@rossf7

rossf7 commented Oct 14, 2023

@incertum No worries and thank you this is very helpful!

GitHub Actions + Ansible Kubernetes module could be a valid solution.
Any alternative will work for us, Ansible is not required from our side.

We're looking into using Ansible, and the Kubernetes module could be useful for our pipeline, but it's good to know other tooling is an option.

For the cluster we can use the Equinix Ansible module, but we still need to provision K8s. CAPI / CAPEM is a challenge without a management cluster. Kubeadm is also an option, but it would be good to see how other CNCF projects solve this.

@nikimanoledaki nikimanoledaki added this to the Measure the cloud native sustainability footprint of Falco manually milestone Oct 23, 2023
@nikimanoledaki nikimanoledaki removed this from the [Green Reviews WG] Measure the cloud native sustainability footprint of Falco manually milestone Jan 24, 2024
@nikimanoledaki
Contributor Author

Closing this issue since we are tracking more specific work in the WG repository: https://github.com/cncf-tags/green-reviews-tooling


5 participants