Collating data into an easier to use format #16

jawache · 2022-06-19T19:01:39Z

We have to assume that most of the people who eventually come to this repo will have little expertise in this space. To succeed, I believe this project needs to digest the vast amount of information out there and settle on concrete, simplified (yes, probably opinionated) numbers.

@mrchrisadams we're starting to get some great detailed answers in this repo. The questions seem to be boiling down to just two forms:

What's the kWh per X for something?
What's the Carbon per X for something?

Note: "Carbon per X" is another way of saying an "SCI Score".

Once an issue is closed, I propose we add it to a DATA FAQ document in the repo, with a format something like:

Question: What's the "XXX per XXX"?
Summary Answer: The summary/averaged/good-enough-for-most-situations number.
Detailed Answer: Link to the issue where this is discussed in more detail (this also gives people the opportunity to comment on the choice in case they disagree) OR link to an external source of this information if we are just pointing to someone else's data.
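If the FAQ ends up being generated from structured data, each entry could be as simple as the three fields above. A minimal sketch, assuming a small Python script builds the FAQ document; the FaqEntry class, its field names and the placeholder values are hypothetical and not part of this repo:

```python
from dataclasses import dataclass


@dataclass
class FaqEntry:
    """One entry in the proposed DATA FAQ (hypothetical field names)."""
    question: str         # e.g. "What's the kWh per GB transferred?"
    summary_answer: str   # the good-enough-for-most-situations number
    detailed_answer: str  # link to the closed issue, or to an external data source

    def to_text(self) -> str:
        # Render the entry in the Question / Summary Answer / Detailed Answer layout.
        return (
            f"Question: {self.question}\n"
            f"Summary Answer: {self.summary_answer}\n"
            f"Detailed Answer: {self.detailed_answer}"
        )


entry = FaqEntry(
    question="What's the Carbon per GB transferred?",
    summary_answer="<agreed number and units>",
    detailed_answer="<link to the closed issue>",
)
print(entry.to_text())
```

Keeping the entries as data rather than free text would make it easy to regenerate the FAQ whenever an issue is closed.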

The first place people should go is the FAQ, and the FAQ is what we should be sharing around. The issue template should have a checkbox asking people whether they have read the FAQ; with most of the information in the FAQ, raising an issue should eventually become a rare occurrence.

What do you think?

mrchrisadams · 2022-06-22T11:50:12Z

Hi @jawache, I discussed this with co-chair Brian G. Hopper earlier in the week, and, independently of seeing this issue, he suggested a similar approach.

We agreed that creating a high-level "annotated bibliography" for the topics below would be helpful because, while we're now getting incoming questions and answers, they're not very navigable.

Most of the time, as you say, people are looking for a few simple conversion factors, with guidance on where each one is safe to use and what trade-offs they are making by using it, so that they have an idea of what to use in each of the key areas of the SCI, i.e. the E, I and M in SCI = ((E * I) + M) per R

Where:

E = Energy consumed by a software system
I = Location-based marginal carbon emissions
M = Embodied emissions of a software system
R = Functional unit (e.g. carbon per additional user, API call, ML job, etc.)

(see more in the SCI methodology summary)
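Read as arithmetic, the formula is a single calculation once E, I, M and R have values. A minimal sketch, assuming E is in kWh, I in gCO2e per kWh and M in gCO2e, all covering the same time window; the function name and the input values in the example are made up for illustration and are not figures from this repo:

```python
def sci_score(energy_kwh: float, intensity_g_per_kwh: float,
              embodied_g: float, functional_units: float) -> float:
    """SCI = ((E * I) + M) per R, returned in gCO2e per functional unit."""
    if functional_units <= 0:
        raise ValueError("R must be a positive number of functional units")
    operational_g = energy_kwh * intensity_g_per_kwh  # E * I
    return (operational_g + embodied_g) / functional_units  # ((E * I) + M) / R


# Made-up example: 10 kWh at 400 gCO2e/kWh plus 1,000 gCO2e embodied,
# spread over 5,000 API calls.
print(sci_score(10.0, 400.0, 1000.0, 5000))  # 1.0 gCO2e per API call
```

This is why the questions boil down to "kWh per X" and "Carbon per X": each conversion factor in the FAQ fills in one of the inputs above.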

An updated approach

Brian is currently putting together a high-level outline for the figures below, based on the questions and answers shared so far, with guidance on how to get 'good enough' figures, pointers to the areas where we've identified a need for more research, and common pitfalls.

The structure is likely to follow the one below, which also lets us divide responsibility for specific areas among individual contributors:

(I) - Emissions factors. These may be regional yearly averages to begin, but ideally marginal and at a finer granularity than that.
(M) - Embodied Emissions data for servers, mobile devices and laptops
(E) - Energy consumption (kilowatt hours) for different components of the software boundary, over a given time period. Examples:
    CPU/GPUs at different percentages of utilization
    Data storage
    Memory allocation
    Data transferred over a network
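As an example of the kind of "good enough" figure the E section could offer for CPUs at different utilization levels, one common simplification is to interpolate linearly between a processor's idle and maximum power draw. A minimal sketch, assuming that linear model (not something agreed in this thread; real power curves are often non-linear) and using placeholder wattages rather than measured values; the function name is hypothetical:

```python
def cpu_energy_kwh(idle_watts: float, max_watts: float,
                   utilization: float, hours: float) -> float:
    """Estimate CPU energy (kWh) over a period at a constant average utilization.

    Assumes power draw scales linearly between idle_watts (0% utilization)
    and max_watts (100% utilization).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    avg_watts = idle_watts + utilization * (max_watts - idle_watts)  # linear interpolation
    return avg_watts * hours / 1000.0  # watt-hours -> kilowatt-hours


# Placeholder wattages, not measured values: 50 W idle, 150 W max, 50% busy for 24 hours.
print(cpu_energy_kwh(50.0, 150.0, 0.5, 24.0))  # 2.4
```

The idle and maximum wattages are exactly the sort of per-component numbers the FAQ would need to source and justify.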

In each case we'll link to sources providing ascending levels of detail, from global averages to much more detailed information like that provided by Boavizta, WattTime, Electricity Map and Ember, along with the corresponding libraries that incorporate this info for easy re-use.

Brian will share a high-level outline for discussion towards the end of the month. I'll link to it here, and also share it with the relevant groups like the open source WG.

I'm also tagging @seanmcilroy29 so he can link to this for the corresponding working groups who would need to refer to it.

jawache · 2022-06-24T23:25:46Z

Totally agreed @mrchrisadams! Given it's all SCI-related, the user flow will likely be that someone is trying to figure out a value for E, I or M for their SCI calculation, so when they head to this repo they will specifically be looking for an answer to an E, I or M question.

But I think there needs to be another heading to cover what we can perhaps just bucket as "managed services", e.g. managed databases and any SaaS product or API service your app consumes.

So I think there will be 4 categories.

Energy (kWh per R)
Carbon Intensity Of Energy (Carbon per kWh)
Embodied (Embodied Carbon per R)
Managed Services/SCI Scores (Total Carbon per R)

Your SCI score = the sum of the SCI scores for the services you manage PLUS the sum of the SCI scores for the managed services you consume.
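The four categories slot together into that sum. A minimal sketch, assuming every managed service's SCI score has already been expressed per the same functional unit R as your own (the thread doesn't spell out how that conversion happens), with gCO2e as the carbon unit; the function name and the example numbers are hypothetical:

```python
from typing import Iterable


def total_carbon_per_r(energy_kwh_per_r: float,
                       intensity_g_per_kwh: float,
                       embodied_g_per_r: float,
                       managed_services_g_per_r: Iterable[float]) -> float:
    """Combine the four proposed categories into one total carbon figure per R (gCO2e)."""
    # Energy (kWh per R) x Carbon Intensity (gCO2e per kWh) gives operational carbon per R
    operational_g_per_r = energy_kwh_per_r * intensity_g_per_kwh
    # SCI score for the parts of the system you run yourself
    own_g_per_r = operational_g_per_r + embodied_g_per_r
    # Plus the SCI scores of every managed service consumed, already expressed per R
    return own_g_per_r + sum(managed_services_g_per_r)


# Made-up inputs: 0.002 kWh and 0.2 g embodied per API call at 400 gCO2e/kWh,
# plus two managed services contributing 0.5 g and 0.25 g per call.
print(total_carbon_per_r(0.002, 400.0, 0.2, [0.5, 0.25]))  # roughly 1.75
```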

In the perfect future, every managed service in the world will have to calculate and publish an SCI score. So you just calculate your SCI score, add in the SCI scores of all the managed services you consumed and that's it.

E.g. you can think of networking as a managed service. With networking, I guess most people are just looking for an SCI score for networking: carbon per GB. There can be many flavours of a networking SCI score (DC<>DC, consumer, etc.), but people are just looking for carbon per GB.

There is already demand for SCI scores for other managed services, e.g. for managed databases we might want to provide a "Carbon per GB Stored", or for serverless infra maybe "Carbon per Minute of Execution"?
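Each of those per-unit scores only needs one extra number, how many units one functional unit consumes, to become carbon per R for the consuming app. A minimal sketch, assuming that simple multiplication; the class, function and every factor in the example are hypothetical placeholders, not published figures:

```python
from dataclasses import dataclass


@dataclass
class ManagedServiceUsage:
    """Usage of one managed service, described by its own per-unit SCI-style score."""
    name: str                 # e.g. "networking", "managed database", "serverless"
    carbon_per_unit_g: float  # gCO2e per service unit (per GB transferred, per GB stored, per minute...)
    units_per_r: float        # service units consumed by one functional unit R of your app

    def carbon_per_r(self) -> float:
        # Convert the service's per-unit score into carbon per R of your application.
        return self.carbon_per_unit_g * self.units_per_r


def managed_services_carbon_per_r(usages: list[ManagedServiceUsage]) -> float:
    """Sum the carbon per R contributed by every managed service consumed."""
    return sum(u.carbon_per_r() for u in usages)


# Hypothetical factors for illustration only; none of these are published figures.
usages = [
    ManagedServiceUsage("networking", carbon_per_unit_g=2.0, units_per_r=0.25),  # per GB transferred
    ManagedServiceUsage("serverless", carbon_per_unit_g=1.0, units_per_r=0.25),  # per minute of execution
]
print(managed_services_carbon_per_r(usages))  # 0.75
```

This keeps the published number in the service's own natural unit (GB, minutes) and leaves the conversion to R with whoever consumes the service.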
