Object storage Carbon estimates #101

Open
1 task done
nickbarber opened this issue Mar 14, 2024 · 2 comments
Labels: plug-in-project, registered, submitted

nickbarber commented Mar 14, 2024

Prize category

Best Plugin

Overview

A plugin to estimate the carbon emissions of object storage based on a few factors such as:

  • data amount
  • length of time data is stored
  • availability of data
  • data velocity

Questions to be answered

No response

Have you got a project team yet?

Yes and we aren't recruiting

Project team

@nickbarber, @mgriffin-scottlogic, @jmain-scottlogic, @ishmael-burdeau

Terms of Participation

Project Submission

Summary

A series of plugins to estimate the energy used by data storage, particularly cloud object storage, as well as the impact of reading and writing data to storage devices. We have created four plugins that can be used in conjunction with each other, and with other plugins as needed: one to estimate the energy used by cloud object storage, one to return a replication factor multiplier based on the defaults of some cloud providers, one to estimate the energy consumed by stored data, and one to estimate the energy consumed by reading and writing data.

Problems

How to calculate the carbon emissions generated by data storage, including object (blob) storage, and by reading and writing data.

Application

Consists of multiple plugins, designed so that they can be used together or as separate components for non-cloud usage.
One plugin retrieves the replication factor for cloud storage services.
Another takes drive size and power, along with duration, data stored and the replication factor, to estimate the total energy associated with storage.
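
The two plugins described above can be sketched roughly as follows. This is a minimal illustration, not the code in the linked repository: the function names, the service keys, the replication factor values, and the formula itself (drive power scaled by the number of drives the replicated data occupies, multiplied by duration) are all assumptions made for the example.

```typescript
// Hypothetical sketch: names, values and the formula are illustrative
// assumptions, not taken from the if-carbon-hack-plugin repository.

// Placeholder replication factors per storage service (made-up values).
const REPLICATION_FACTORS: Record<string, number> = {
  'example-standard': 3,
  'example-single-zone': 1,
};

// Return the replication factor for a service, defaulting to 1 (no replication).
function replicationFactor(service: string): number {
  const factor = REPLICATION_FACTORS[service];
  return factor === undefined ? 1 : factor;
}

interface StorageInput {
  driveSizeGb: number; // capacity of one drive, in GB
  drivePowerWatts: number; // average power draw of one drive, in W
  durationHours: number; // how long the data is stored, in hours
  dataStoredGb: number; // logical data stored, in GB
  replication: number; // number of copies kept
}

// Energy in kWh attributed to the stored data, assuming energy scales
// linearly with the fraction of drive capacity the replicated data occupies.
function storageEnergyKwh(input: StorageInput): number {
  const drivesUsed =
    (input.dataStoredGb * input.replication) / input.driveSizeGb;
  const wattHours = drivesUsed * input.drivePowerWatts * input.durationHours;
  return wattHours / 1000;
}
```

Keeping the replication lookup separate from the energy calculation mirrors the split described above: the energy function can be used for a single local drive by passing a replication of 1.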

Prize category

Best Plugin

Judging criteria

Gives users a way to calculate the carbon emissions of their data storage, and has been created in such a way that it can be applied from single drives up to cloud managed services as long as the user has access to the relevant required data.

Video

https://www.youtube.com/watch?v=dGMidYCsEnk

Artefacts

https://github.com/mgriffin-scottlogic/if-carbon-hack-plugin

Usage

https://github.com/mgriffin-scottlogic/if-carbon-hack-plugin?tab=readme-ov-file#usage

Process

We tried to get data out of cloud services that would indicate what was being used on the back end of cloud object storage services, but were unable to get much useful information.
We used an open-source object storage system to test hypotheses we had about the impact of storage, and looked up research others had done on the topic. We therefore decided to create a plugin solution in its simplest form, one that outputs the energy used by a storage device. We discovered that energy usage differs drastically between read/write and idle, and therefore split the plugins.
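
As a rough illustration of why idle and read/write energy are modelled separately, the sketch below computes each from different inputs. It is a simplified assumption for this write-up, not the formula used in the repository: the function names, the idea of deriving active time from data size and throughput, and all parameters are hypothetical.

```typescript
// Hypothetical sketch of separating idle storage energy from read/write
// energy. All names and the model itself are illustrative assumptions.

// Energy (kWh) for a drive sitting idle for a given number of hours.
function idleEnergyKwh(idlePowerWatts: number, hours: number): number {
  return (idlePowerWatts * hours) / 1000;
}

// Energy (kWh) for transferring data, assuming the drive draws its active
// power for exactly as long as the transfer takes at the given throughput.
function readWriteEnergyKwh(
  dataGb: number,
  throughputGbPerHour: number,
  activePowerWatts: number
): number {
  const activeHours = dataGb / throughputGbPerHour;
  return (activePowerWatts * activeHours) / 1000;
}
```

Idle energy depends only on time, while read/write energy depends on how much data moves, so the two need different observations as input.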

Inspiration

Discussions between Scott Logic and DWP about what impact 30TB of data in S3 has on carbon emissions, and the realisation that there was no clear way to calculate or estimate this outside of AWS reporting.

Challenges

Getting data out of cloud services and understanding what is happening at lower levels, especially with replication, redundancy and availability levels (e.g. intelligent tiering).
Finding the right places to break up plugins.
Understanding the CPU/memory impact of object storage on top of the storage component, whether it be a hosted service or a system running locally.

Accomplishments

Calculating an estimate within reasonable error of AWS's own reporting for Common Crawl
Making use of existing if-plugins to estimate the embodied carbon of storage

Learnings

Simply storing data has less impact than we expected. CPU usage for reading, writing and similar operations has a bigger impact.
There is lots of scope to reuse the standard Impact Framework plugins in helpful ways (for example, reusing the embodied carbon calculation for data on drives).

What's next?

Further research into what information is required to get more detailed calculations for cloud object storage services.

Computation and memory overheads of object storage systems

Erasure coding vs replication.

Automated duplication of observations for replicated regions
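
To make the erasure coding vs replication item concrete, the sketch below compares the raw storage each scheme needs per unit of logical data. The overhead formulas are standard, but the function names are hypothetical and no particular cloud provider's scheme is implied.

```typescript
// Raw storage required per unit of logical data under each scheme.
// Function names are illustrative; no provider's actual scheme is implied.

// n-way replication stores n full copies of the data.
function replicationOverhead(copies: number): number {
  return copies;
}

// Erasure coding with k data fragments and m parity fragments stores
// (k + m) / k times the logical data.
function erasureCodingOverhead(
  dataFragments: number,
  parityFragments: number
): number {
  return (dataFragments + parityFragments) / dataFragments;
}
```

For example, three-way replication gives an overhead of 3, while a scheme with 10 data and 4 parity fragments gives 1.4. A storage energy estimate that assumes replication could therefore overstate the raw capacity in use if the service actually uses erasure coding.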

nickbarber added the draft label Mar 14, 2024
russelltrow added the plug-in-project and registered labels and removed the draft label Mar 14, 2024
Contributor

jawache commented Mar 17, 2024

This is such an important proposal. Lots is written about CPU carbon emissions from computation, but emissions from storage/DB usage are an important gap we need to fill 🙏

russelltrow added the submitted label Apr 8, 2024