Propose new environmental consideration information for ML models #396

mrutkows · 2024-03-08T16:35:25Z

As an AI producer or operator, I want the ability to represent environmental concerns, including energy consumption and CO2 emissions, throughout the lifecycle of a model: from data acquisition, training and fine-tuning through to MLOps (including inference). I want to use CycloneDX to help my organization comply with the environmental transparency requirements in the AI Act.

The fact that datasets used to train AI models are increasingly large and take an enormous amount of energy (and indirectly produce large CO2 emissions) to develop, train and run has come to the forefront. This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.

Background:

many more from any search engine...

jkowalleck · 2024-03-10T19:22:03Z

I don't understand the issue.

This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.

This description in no way describes the actual problem; it only gives a reason why a certain problem should be solved.

jkowalleck · 2024-03-10T19:33:02Z

@stevespringett can you help me here? I don't see a reason for putting these values in an ML-BOM.
Putting self-proclaimed side-data in BOM - does this actually help anybody? Is there some write-up or video-recording from the CycloneDX ML-WorkingGroup related to this topic?
With my current understanding of the topic, this all looks like an abuse of BOM for bragging-purposes (like: look how large/low my numbers are... and my numbers are better than yours...).

stevespringett · 2024-03-12T19:34:45Z

@jkowalleck The energy crisis for AI was just starting to happen when the AI/ML workgroup was operational. Over the last year, the crisis has grown exponentially. Organizations previously were talking about being carbon neutral. With the energy demands of AI, that likely is not possible. This reality is captured in the text of the AI Act. The energy considerations can also be combined with CDXA so that organizations can attest to the data in the model card.

The environment consideration support that Matt is working on will help CycloneDX adopters meet requirements in the AI Act.

According to the text adopted by the European Parliament, the AI Act sets out requirements for so-called "high-risk AI systems." These systems must be designed and developed with logging capabilities that enable the recording of energy consumption, the measurement or calculation of resource use, and the environmental impact throughout the system's lifecycle. These requirements primarily focus on transparency, ensuring that stakeholders have access to data on energy consumption. However, it is important to note that, in this case, the AI Act does not compel measures to reduce the energy consumption of AI systems.

Source: https://www.techpolicy.press/addressing-ai-energy-consumption-why-the-eu-must-embrace-ecodesign-for-software/

This is the use case that Matt is trying to achieve with this feature.

stevespringett · 2024-03-12T21:12:08Z

To frame this in a use case:

As an AI producer or operator, I want the ability to represent environmental concerns, including energy consumption and CO2 emissions, throughout the lifecycle of a model: from data acquisition, training and fine-tuning through to MLOps (including inference). I want to use CycloneDX to help my organization comply with the environmental transparency requirements in the AI Act.

jkowalleck · 2024-03-13T09:01:40Z

Environmental costs for ML-BOM are just one aspect.
Would you also add cost for SaaSBOM - how much does it cost to run the service?
Would you also add time cost for SBOM - like how many hours went into the development of a component?
Would you also add health/medical costs for HBOM - how many people suffered for mining the materials used in a component?

The thing is, all these "costs" are currently (in the real world) priced in money (taxes, operational costs, R&D, etc.).
If we wanted to add environmental costs specifically, then I would argue that we should add costs in general - for every component/service/...

stevespringett · 2024-03-13T20:13:34Z

If we wanted to add environmental costs specifically, then I would argue that we should add costs in general - for every component/service/...

Valid point. However, the same logic could be applied to the majority of the model card, including performance metrics and biases. But that's not where the industry is currently at. But in the proposed design, we could reuse this data outside of just the model card in a generic sense and make it available to every component and service.

jkowalleck · 2024-03-14T09:42:52Z

But in the proposed design, we could reuse this data outside of just the model card in a generic sense and make it available to every component and service.

That sounds good: finding a generalized solution that can be reused 👍

PS: here are others asking for a generic approach

a request over here: Propose new environmental consideration information for ML models #395 (comment)
a request in the presentation here: https://youtu.be/jmMxm3wVcWo?t=2272

jkowalleck · 2024-03-20T16:46:06Z

Existing work/art in the field: Green Software Foundation - Impact Framework - see https://if.greensoftware.foundation/

jkowalleck · 2024-03-21T17:24:40Z

A follow-up will be #406.

The fact that datasets used to train AI models are increasingly large and take an enormous amount of energy (and indirectly produce large CO2 emissions) to develop, train and run has come to the forefront. This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.

- Adds `ModelCardConsiderations.environmentalConsiderations`; this fixes #396
- Adds `OrganizationalEntity.address`

----

TODO

- [x] modify JSON schema
- [x] modify XML schema
- [x] modify protobuf schema
- [x] add examples & test resources
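To make the proposed `environmentalConsiderations` addition more concrete, here is a minimal Python sketch that builds a model card fragment recording energy use per lifecycle activity. The nested field names (`energyConsumptions`, `activity`, `energyProviders`, `energySource`, `energyProvided`, `activityEnergyCost`, `co2CostEquivalent`) and the units reflect my reading of the CycloneDX 1.6 JSON schema and should be checked against the published schema before use. The helper functions, the provider name and all numbers are hypothetical and purely illustrative.

```python
import json


def energy_consumption(activity, kwh, co2_tonnes, source="unknown"):
    """Build one energy-consumption entry for a lifecycle activity.

    Field names are assumed from the CycloneDX 1.6 JSON schema; verify them.
    """
    return {
        "activity": activity,  # e.g. "training", "fine-tuning", "inference"
        "energyProviders": [
            {
                "organization": {"name": "Example Energy Co."},  # hypothetical
                "energySource": source,
                "energyProvided": {"value": kwh, "unit": "kWh"},
            }
        ],
        "activityEnergyCost": {"value": kwh, "unit": "kWh"},
        "co2CostEquivalent": {"value": co2_tonnes, "unit": "tCO2eq"},
    }


def model_card_fragment(consumptions):
    """Wrap the entries in the modelCard.considerations structure."""
    return {
        "modelCard": {
            "considerations": {
                "environmentalConsiderations": {
                    "energyConsumptions": list(consumptions),
                }
            }
        }
    }


def total_energy_kwh(fragment):
    """Sum the declared energy cost across all recorded activities."""
    env = fragment["modelCard"]["considerations"]["environmentalConsiderations"]
    return sum(e["activityEnergyCost"]["value"] for e in env["energyConsumptions"])


# Illustrative numbers only; they are not measurements of any real model.
fragment = model_card_fragment(
    [
        energy_consumption("training", 1200.0, 0.5, "solar"),
        energy_consumption("inference", 300.0, 0.1, "wind"),
    ]
)
print(json.dumps(fragment, indent=2))
```

Keeping one entry per activity mirrors the lifecycle framing in the use case (data acquisition, training, fine-tuning, inference), and lets a consumer aggregate the figures with something like `total_energy_kwh(fragment)`.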

## Added

* Core enhancement: Attestation ([#192](#192) via [#348](#348))
* Core enhancement: Cryptography Bill of Materials (CBOM) ([#171](#171), [#291](#291) via [#347](#347))
* Feature to express the URL to source distribution ([#98](#98) via [#269](#269))
* Feature to express the URL to RFC 9116 compliant documents ([#380](#380) via [#381](#381))
* Feature to express tags/keywords for services and components (via [#383](#383))
* Feature to express details for component authors ([#335](#335) via [#379](#379))
* Feature to express details for component and BOM manufacturer ([#346](#346) via [#379](#379))
* Feature to express communicate concluded values from observed evidences ([#411](#411) via [#412](#412))
* Features to express license acknowledgement ([#407](#407) via [#408](#408))
* Feature to express environmental consideration information for model cards ([#396](#396) via [#395](#395))
* Feature to express the address of organizational entities (via [#395](#395))
* Feature to express additional component identifiers: Universal Bill Of Receipts Identifier and Software Heritage persistent IDs ([#413](#413) via [#414](#414))

## Fixed

* Allow multiple evidence identities by XML/JSON schema ([#272](#272) via [#359](#359))
  This was already correct via ProtoBuff schema.
* Prevent empty `license` entities by XML schema ([#288](#288) via [#292](#292))
  This was already correct in JSON/ProtoBuff schema.
* Prevent empty or malformed `property` entities by JSON schema ([#371](#371) via [#375](#375))
  This was already correct in XML/ProtoBuff schema.
* Allow multiple `licenses` in `Metadata` by ProtoBuff schema ([#264](#264) via [#401](#401))
  This was already correct in XML/JSON schema.

## Changed

* Allow arbitrary `$schema` values by JSON schema ([#402](#402) via [#403](#403))
* Increased max length of `versionRange` (via [`3e01ce6`](3e01ce6))
* Harmonized length of `version` (via [#417](#417))

## Deprecated

* Data model "Component"'s field `author` was deprecated. (via [#379](#379))
  Use field `authors` or field `manufacturer` instead.
* Data model "Metadata"'s field `manufacture` was deprecated. ([#346](#346) via [#379](#379))
  Use "Metadata"'s field `component`'s field `manufacturer` instead.
  - for XML: `/bom/metadata/component/manufacturer`
  - for JSON: `$.metadata.component.manufacturer`
  - for ProtoBuf: `Bom:metadata.component.manufacturer`

## Documentation

* Centralize version and version-range (via [#322](#322))
* Streamlined SPDX expression related descriptions (via [#327](#327))
* Enhanced descriptions of `bom-ref`/`refType` ([#336](#336) via [#344](#344))
* Enhanced readability of enum documentation in JSON schema ([#361](#361) via [#362](#362))
* Fixed typo "compliment" -> "complement" (via [#369](#369))
* Added documentation for enum "ComponentScope"'s values in JSON schema ([#293](#293) via [`d92e58e`](d92e58e))
  Texts were a taken from the existing ones in XML/ProtoBuff schema.
* Added documentation for enum "TaskType"'s values ([#245](#245) via [#377](#377))
* Improve documentation for data model "Metadata"'s field `licenses` ([#273](#273) via [#378](#378))
* Added documentation for enum "MachineLearningApproachType"'s values ([#351](#351) via [#416](#416))
* Rephrased some texts here and there.

## Test data

* Added test data for newly added use cases
* Added quality assurance for our ProtoBuf schemas ([#384](#384) via [#385](#385))
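The deprecation of `metadata.manufacture` in favour of `$.metadata.component.manufacturer` can be illustrated with a small migration sketch for JSON BOMs loaded as Python dictionaries. The function name `migrate_manufacture` is hypothetical and not part of any CycloneDX library. It also assumes that a BOM without a `metadata.component` should be left untouched, since creating a component here would require fields (such as its type and name) that this sketch cannot infer.

```python
import copy


def migrate_manufacture(bom):
    """Return a copy of `bom` with the deprecated `metadata.manufacture`
    moved to `metadata.component.manufacturer`.

    Hypothetical helper; the input dictionary is never modified.
    """
    out = copy.deepcopy(bom)
    metadata = out.get("metadata")
    if not isinstance(metadata, dict) or "manufacture" not in metadata:
        return out  # nothing to migrate

    component = metadata.get("component")
    if not isinstance(component, dict):
        return out  # assumption: do not invent a component, keep the old field

    # Keep an existing `manufacturer` if one is already declared.
    component.setdefault("manufacturer", metadata["manufacture"])
    del metadata["manufacture"]
    return out


legacy = {
    "metadata": {
        "manufacture": {"name": "Acme"},  # illustrative value
        "component": {"type": "application", "name": "demo"},
    }
}
migrated = migrate_manufacture(legacy)
```

Working on a deep copy keeps the original document available for comparison, which can be useful when a tool needs to emit both the old and the new form during a transition period.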

mrutkows mentioned this issue Mar 8, 2024

Propose new environmental consideration information for ML models #395

Merged

4 tasks

jkowalleck linked a pull request Mar 10, 2024 that will close this issue

Propose new environmental consideration information for ML models #395

Merged

4 tasks

jkowalleck mentioned this issue Mar 21, 2024

environmental/economical/ethical costs of service/component/etc for runtime/manufacturing/etc #406

Open

jkowalleck mentioned this issue Mar 29, 2024

v1.6 #323

Merged

stevespringett added this to the 1.6 milestone Apr 1, 2024

stevespringett closed this as completed Apr 1, 2024

andreas-hilti mentioned this issue Aug 25, 2024

support cdx 1.6 CycloneDX/cyclonedx-dotnet-library#288

Merged
