[Proposal] Proof of Environmental Sustainability activities and best practices for CNCF projects #64

mkorbi · 2023-03-09T21:31:45Z

Description

We would implement a process/approach for CNCF projects (and others) to qualify their commitment to Environmental Sustainability and give them a KPI they can use to show improvements.

For this we can leverage the Software Carbon Intensity (SCI). I think we will have to define two SCIs:

Default SCI: Within the default SCI, we will define some values, such as the Embodied and Energy Carbon Intensity. We will also specify the test setup.
Custom SCI: The custom SCI is up to the project, and we will assist. We will also provide a reference implementation that can be used (e.g. we identify an approach to easily measure the relevant data).

Besides the SCI, we can think of a checklist derived from the Green Software Patterns and, where possible, track how closely the projects adhere to them.

Both can be checked per release. If the project works on optimizations, they will become visible over time.

In the beginning, the definition and KPIs tracked per project could be stored in an "ensu.md" within their repositories. Where a project has multiple components, either each will require its own definition or all together are evaluated.
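To illustrate how a KPI recorded per release could make optimizations visible over time, here is a minimal sketch. The data structure, the `sci_trend` helper, the release tags, and the SCI values are all hypothetical placeholders; nothing here reflects an agreed "ensu.md" format or real measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseScore:
    """One SCI value recorded for a release (illustrative structure only)."""
    release: str
    sci: float  # assumed unit: gCO2eq per functional unit


def sci_trend(scores: list[ReleaseScore]) -> list[tuple[str, float]]:
    """Return the relative SCI change in percent of each release vs. the previous one.

    Negative values mean the release emits less carbon per functional unit.
    """
    changes = []
    for prev, curr in zip(scores, scores[1:]):
        change_pct = (curr.sci - prev.sci) / prev.sci * 100.0
        changes.append((curr.release, change_pct))
    return changes


if __name__ == "__main__":
    # Placeholder numbers, not real measurements.
    history = [
        ReleaseScore("v1.0.0", 10.0),
        ReleaseScore("v1.1.0", 8.0),
        ReleaseScore("v1.2.0", 9.0),
    ]
    for release, change in sci_trend(history):
        print(f"{release}: {change:+.1f}% vs. previous release")
```

A comparison like this only makes sense if every release is measured with the same test setup, which is why a fixed default setup matters.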

In the future, we can further automate this and display the SCI value per project in a Grafana dashboard. We could add tags like the following to the projects as indicator etc.

We could add this as entry criteria for CNCF projects when the approach is matured.

Impact

This activity will increase the visibility and relevance of our topic.
Projects have a reference value against which they can optimize their software.
We will provide a relevant approach to how to implement this measurement.

Timeline

I would like to postpone working on this until we have all deliverables for KubeCon EU. However, if this proposal finds support within the TAG, I also want to introduce this idea to the TOC and receive their feedback before we start planning activities.

Scope

The initial scope should focus on a pragmatic approach. Therefore, we need 3-5 projects for testing before rolling it out. Also, we should work and push this topic top down, from the big graduated to the smaller incubated and sandboxed projects.

Resources

Original working document: https://docs.google.com/document/d/1qLjYlkOvcduxRnGo2-zfaGb9170Ziq0Pf9-OeI1GFqo/edit#
WG Proposal: [Proposal] Green Reviews Working Group #116
WG Proposal working Doc: https://docs.google.com/document/d/18MkLryOSbZpSsgl_ileLbHp_K4xn8TqxMzcHDtNuY_o/edit

Sealjay · 2023-03-22T16:03:26Z

Hi @mkorbi - we discussed this in our Green-Software-Foundation/opensource-wg#75 meeting today - @srini1978 of Microsoft has been working on a way to generate CarbonQL scores in an automated way

Are you interested in collaborating with us on this? FYI @jawche @seanmcilroy29 @dtoakley-tw @dtoakley

We also have a guide here: https://sci-guide.greensoftware.foundation/

mkorbi · 2023-03-23T21:53:35Z

Hey @Sealjay, we are looking forward to working with you on this and I'm very interested in @srini1978's approach. I joined the meeting yesterday, but no one (1-2 silent people) joined within the first 10 minutes; maybe I have the wrong invite.

About the guide, I'm well aware of it. For me, the question is what the right approach is to support the OSS projects with it. That's why I came up with the idea of a halfway-generic SCI spec to get things rolling, and then extending it with custom SCIs. Not sure if it makes sense though.

Sealjay · 2023-03-23T22:33:11Z

Perfect @mkorbi! And sorry, we moved the meeting to :15 past the hour yesterday.

So this is the CarbonQL project:
https://github.com/Green-Software-Foundation/carbon-ql

I'm not sure I understand the idea of a custom SCI though - is this about defining how the score was calculated and the variables used? If so, that might be related to the SCI reporting requirements:
https://github.com/Green-Software-Foundation/sci-reporting/blob/main/reporting_requirements.md

Sealjay · 2023-03-29T12:57:21Z

@mkorbi it will be at :30 past the hour next week going forward; still happy to discuss.

Automated scoring project doesn't exist yet, but is being kicked off.

mkorbi · 2023-04-04T15:37:55Z

Keeping track on here:

I had a meeting with @incertum and @jasondellaluce on how we can get started with this topic on Falco, as they asked at the same time for advice on how to improve energy efficiency.
Post-KubeCon we will move ahead, as there will also be a new Falco release at the end of May. Until then, we discussed that we as a TAG will proceed with working out some more details.

TheFoxAtWork · 2023-05-22T19:22:35Z

Checking in here on this - specifically best practices for CNCF projects in accordance with this group's charter:

Capabilities, benchmarks, and processes to evaluate technological and architectural health of projects

Is the expectation for this issue's deliverable an initial guide to evaluating resource consumption for projects in a default configuration, so that interested projects can receive such an evaluation from this TAG? There is a balance between projects in CNCF that can do this for things running in an environment; however, there is also an outstanding need for projects to understand how they are performing and where they could improve, so that as they develop features and other capabilities, this is at the forefront of those decisions. Even starting with common trade-offs for efficiency would be beneficial.

catblade · 2023-05-22T20:12:58Z

@TheFoxAtWork This whole space is fairly new and I don't think we are anywhere near where we want to be regarding measurement. The tooling regarding networking consumption or cooling offsets for CPUs heating (cooling is 30-50% of datacenter costs) are not there. Most measurements currently have to do more with the power use of the CPUs, maybe something regarding memory use. I think stating maybe what is in existence regarding measurements (which we have some of in the landscape doc) and then continuing to expand on what capabilities exist is helpful.

Let me give an example of my concerns:
The SCI repo with their published example here:
https://github.com/Green-Software-Foundation/sci/blob/main/case-studies/eshoppen.md

They talk about the energy consumption being measured here:
https://github.com/Green-Software-Foundation/sci/blob/main/case-studies/eshoppen.md#energy-e
P[kWh] = (Power consumed by CPU or Pc * Number of cores + Power consumed by Memory or Pr + Power consumed by GPU or Pg * Number of GPUs) / 1000
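As a rough illustration of the formula quoted above, the following sketch computes the summed CPU, memory, and GPU power. It adds two assumptions that are not in the quoted text: the inputs are in watts, and energy is obtained by multiplying constant power by runtime in hours. The function and parameter names are hypothetical.

```python
def power_kw(cpu_w_per_core: float, cores: int, memory_w: float,
             gpu_w_per_gpu: float = 0.0, gpus: int = 0) -> float:
    """Sum CPU, memory and GPU power and convert from W to kW (quoted formula)."""
    return (cpu_w_per_core * cores + memory_w + gpu_w_per_gpu * gpus) / 1000.0


def energy_kwh(power_in_kw: float, hours: float) -> float:
    """Energy in kWh, assuming constant power over the period (an assumption)."""
    return power_in_kw * hours


if __name__ == "__main__":
    # Placeholder values, not measurements.
    p = power_kw(cpu_w_per_core=10.0, cores=4, memory_w=20.0,
                 gpu_w_per_gpu=100.0, gpus=1)
    print(f"power: {p:.3f} kW, energy over 24 h: {energy_kwh(p, 24.0):.2f} kWh")
```

Nothing in this calculation has a term for networking or cooling, which is exactly the gap raised in the following paragraph.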

This slice of energy measurement, for instance, misses the networking power consumption (and "those switches can be indistinguishable from blast furnaces that happen to route packets"-not my words) and heating requirements (for everything involved-sometimes those labs sound like jet engines because of the power required to cool). If we dig into even more optimal ways we can save energy, I am also unaware of anything that measures the amount of energy required by things like crossing the UPI bus in a multi-socket system in the case that the CPU/memory/GPUs are not co-located in the same NUMA node. I'm sure there are other pieces I'm missing, like packet-processing core consumption (something that may be done as part of auxiliary functionality on the board), length of time to run a process according to its efficiency, et cetera. Additionally, I suspect that not only does the CPU Utilization not scale linearly with power consumption, but also the heat generated does not.

And none of this talks about time-to-failure, as discussed in this paper (HPC has done a lot of work on saving power in massively distributed systems): https://www.osti.gov/servlets/purl/1140455

(things that keep me up at night)

...

Which is a very long way of saying I worry and this is not an easy space.

TheFoxAtWork · 2023-05-22T20:22:37Z

Understood. Recommend narrowing the focus to areas where we've got something we can begin with: CPU & GPU utilization, cores, and memory. How do our projects today measure up across those categories? Are there specific operations, configurations, and functions that increase or decrease those categories? What about specific functionality that could be smartly considered to reduce consumption across those categories? Take security and event logging, for instance: is it more efficient to do on-host processing or to send logs off host to a central service to aggregate, analyze, process, and display? Can we educate adopters on reasonable expectations for logging to reduce consumption? Logging touches these "hot spots" in sustainable computing: processes, storage, networking, detection, etc. These hot spots have other considerations for reducing their footprint. Do you really need to "log all the things"? It comes down to balancing why it's needed, what it conveys, and other observations that convey the same value for less consumption. Can logging be limited until indicators of an issue occur, which in turn trigger on-demand expanded logging?
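The last question, limiting logging until an indicator of an issue appears, can be sketched with Python's standard `logging` module. This is only one possible approach under stated assumptions: the `EscalatingFilter` and `ListHandler` classes, the level thresholds, and the `burst` size are hypothetical choices for illustration, not an existing feature of any CNCF project.

```python
import logging


class EscalatingFilter(logging.Filter):
    """Drop low-severity records until a trigger record is seen.

    After a record at or above `trigger_level`, the next `burst` records of
    any level are let through, then the filter returns to quiet mode.
    """

    def __init__(self, quiet_level=logging.WARNING,
                 trigger_level=logging.ERROR, burst=100):
        super().__init__()
        self.quiet_level = quiet_level
        self.trigger_level = trigger_level
        self.burst = burst
        self._remaining = 0

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= self.trigger_level:
            self._remaining = self.burst  # (re)start expanded logging
            return True
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return record.levelno >= self.quiet_level


class ListHandler(logging.Handler):
    """Collects emitted messages in memory so the demo needs no files."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo():
    logger = logging.getLogger("escalation-demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = ListHandler()
    handler.addFilter(EscalatingFilter(burst=2))
    logger.addHandler(handler)
    try:
        logger.debug("d1")    # quiet mode: dropped
        logger.error("boom")  # trigger: kept, opens a burst of 2 records
        logger.debug("d2")    # kept (burst)
        logger.debug("d3")    # kept (burst)
        logger.debug("d4")    # burst used up: dropped
        logger.warning("w1")  # at quiet_level: kept
    finally:
        logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    print(demo())
```

The trade-off is that debug context from before the trigger is lost; whether that is acceptable depends on what the logs are needed for.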

catblade · 2023-05-22T22:47:48Z

@TheFoxAtWork You are absolutely correct. We have a tendency to measure everything and goodness gracious do we love our dashboards.

I've been advocating we partner with something like GSF with SCI in the "what" to measure, and generally how to get those metrics, and then as part of the CNCF TAG work here do more of the "how" and "minimal resource consumption" part.

I would like more scientists/industrial engineers involved. Part of my general concern is that CPUs are such a small part of the total power consumption while cooling is a larger part of that total usage. We may be optimizing for the components that are much smaller in impact over other factors.

We also have to be aware that some customers will not find those measurement methods acceptable, depending on how the chips work and whether the kernel scheduler looking at the current core usage causes an interrupt (think traffic that cares about kernel interrupts, like most things with quick packet processing). I know that K8s does not allow for core assignment like that, but there are many workarounds that let us get around that (see CMK for instance) which are being used in industry.

Sealjay · 2023-05-22T23:13:48Z

@TheFoxAtWork This whole space is fairly new and I don't think we are anywhere near where we want to be regarding measurement. [...

Let me give an example of my concerns:
[...]

This slice of energy measurement, for instance, misses the networking power consumption (and "those switches can be indistinguishable from blast furnaces that happen to route packets"-not my words) and heating requirements (for everything involved-sometimes those labs sound like jet engines because of the power required to cool). If we dig into even more optimal ways we can save energy, [...]

I'd agree on networking. Note that the SCI is a living document, so please do propose PRs or issues to include networking considerations further.

It's part of a bigger landscape for us - so we include networking in our training patterns ( https://patterns.greensoftware.foundation/catalog/cloud/reduce-transmitted-data ) and it's in discussions for some of the other measurement tooling like carbonQL.

catblade · 2023-05-23T12:37:19Z

@Sealjay do you think we could get the GSF to give a presentation at our next meeting, on the 7th of June, on SCI and current efforts shaped around that?

Sealjay · 2023-05-23T14:52:26Z

Sure, I'll drop Abhishek and Henry an email (they are the chairs of the Standards WG.)

catblade · 2023-05-23T17:12:26Z

Contact me via Slack (or @mkorbi or @leonardpahlke or @caradelia) and we can make sure it is reflected on the agenda ahead of time.

leonardpahlke · 2023-05-23T19:27:13Z

Contact me via Slack (or @mkorbi or @leonardpahlke or @caradelia) and we can make sure it is reflected on the agenda ahead of time.

Please do so by dropping a message to the #tag-env-sustainability channel - so others are aware. Looking forward to it 🙌

Sealjay · 2023-05-23T23:20:04Z

I'm on a mobile without slack at the moment, but just dropped @catblade and @mkorbi an email - @Henry-WattTime is happy to support.

nikimanoledaki · 2023-06-15T17:26:24Z

There is an ongoing discussion among GSF folks about how to measure the energy consumption of networking, as raised by @catblade: Green-Software-Foundation/sci-guide#13

That repo has a bunch of other open issues around this. It looks like the right place to start a similar investigation on how to quantify the energy consumed by cooling. There is a reference to cooling here, which points to: https://devblogs.microsoft.com/sustainable-software/how-to-measure-the-power-consumption-of-your-backend-service/, which uses the Thermal Design Point as a reference.

Echoing what @catblade said on the GSF leading "what" to measure while the TAG takes care of "how" to do that in a cloud-native context.

There are open-ended questions about handling known unknowns, mapping these, and incorporating new/evolving data points. However, it would be great to start collecting data on what we can already measure (CPU, GPU, memory), as @TheFoxAtWork said.

SCI = ((E * I) + M) per R
(E) - Energy consumption
(I) - Emissions factors
(M) - Embodied emissions data for servers
(R) - Functional unit
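For reference, the calculation above can be written out as a small sketch. It assumes E is given in kWh, I in gCO2eq per kWh, and M in gCO2eq, and that R is the functional unit from the SCI specification (for example, the number of requests served). The function name and the example numbers are placeholders, not measured values.

```python
def sci(energy_kwh: float, intensity_g_per_kwh: float,
        embodied_g: float, functional_units: float) -> float:
    """SCI = ((E * I) + M) per R, returned in gCO2eq per functional unit."""
    if functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    operational_g = energy_kwh * intensity_g_per_kwh  # E * I
    return (operational_g + embodied_g) / functional_units


if __name__ == "__main__":
    # Placeholder inputs: 2 kWh, 400 gCO2eq/kWh, 200 gCO2eq embodied, 1000 requests.
    score = sci(2.0, 400.0, 200.0, 1000)
    print(f"SCI: {score:.2f} gCO2eq per request")
```

The arithmetic is the easy part; as noted below, the difficulty lies in obtaining consistent values for I and M in a cloud context.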

While we have some of the Energy component, it is equally challenging to collect data in a consistent way for carbon emissions (I) and embodied carbon emissions (M) in the context of the cloud. There are a few different methods for doing this. We may want to start by crowdsourcing and documenting these different methodologies. @mkorbi I remember you mentioned something around this during the project meetup at KubeCon EU?

Lastly, the Tools & Practices / GreenOps / etc WG has gathered some momentum and a group of folks who would like to contribute to this kind of technical work, but it lacks focus so we are trying to narrow down the scope. In the last meeting, we decided to shift focus to support this initiative. How about we merge this project and the WG effort?

leonardpahlke · 2023-06-20T11:04:16Z

FYI: falcosecurity/falco#2435 (comment)

TheFoxAtWork · 2023-06-29T14:18:28Z

I just reviewed the proposal - this is much more narrowly scoped and looks great. I left a series of comments, mostly focusing on further refinement (there is a significant level of effort in item 4 that requires a lot of front-loading to be successful).

leonardpahlke · 2023-06-29T15:09:38Z

Updated the issue description to mention the WG proposal and the current working document to collaborate on the WG charter.

mkorbi · 2023-07-21T07:43:48Z

Service Desk ticket opened for an account and hardware

mkorbi · 2023-08-14T12:36:52Z

I am currently running some testing and screwing things together, which slowly leads me to the following picture we can provide.

In that case, we would have to provide:

GH Action template
GH Action that executes the testing
Test server with the setup
DevStats config?
Schema/approach for the test, e.g. idle, avg. load, scaled load?

/cc @nikimanoledaki @guidemetothemoon

immavalls · 2023-08-23T15:15:03Z

@mkorbi happy to chime in if help is needed with k6.io, as this is my squad at Grafana. For projects on k8s maybe the k6-operator can work well, not requiring a test server. On GitHub have you tried to use a docker image or a k6 GitHub action?

leonardpahlke · 2024-04-03T14:58:43Z

I will close this issue since we started the WG Green Reviews last summer which addresses this issue.

mkorbi added the issue/tracking Tracking action items label Mar 9, 2023

mkorbi added this to the KubeCon+CloudNativeCon NA 2023 milestone Mar 9, 2023

mkorbi self-assigned this Mar 9, 2023

Sealjay mentioned this issue Mar 22, 2023

2023 03 22 Green-Software-Foundation/opensource-wg#75

Closed

33 tasks

mkorbi mentioned this issue Mar 26, 2023

[UMBRELLA] Falco collaboration with CNCF tag-env-sustainability falcosecurity/falco#2435

Open

seanmcilroy29 mentioned this issue Apr 5, 2023

2023 04 05 Green-Software-Foundation/opensource-wg#77

Closed

25 tasks

Sealjay mentioned this issue Apr 19, 2023

2023 04 19 Green-Software-Foundation/opensource-wg#81

Closed

15 tasks

mkorbi added the info/help-wanted Extra attention is needed label May 29, 2023

nikimanoledaki mentioned this issue Jun 7, 2023

[Proposal] Green Reviews Working Group #116

Closed

12 tasks

leonardpahlke assigned nikimanoledaki and leonardpahlke Jul 5, 2023

guidemetothemoon mentioned this issue Jul 15, 2023

Add TAG's Working Groups section and Green Reviews WG #151

Merged

mkorbi mentioned this issue Aug 9, 2023

TAG Environmental Sustainability Carbon Intensity Measurement cncf/cluster#246

Closed

nikimanoledaki mentioned this issue Aug 28, 2023

[Tracking, Green Reviews WG] Design Green Reviews WG pipeline workflow #182

Closed

4 tasks

AntonioDiTuri mentioned this issue Oct 4, 2023

[BLOG] Getting started as a TAG ENV Contributor #212

Closed

19 tasks

leonardpahlke closed this as completed Apr 3, 2024
