Propose new environmental consideration information for ML models #396

Closed
mrutkows opened this issue Mar 8, 2024 · 9 comments · Fixed by #395

@mrutkows
Contributor

mrutkows commented Mar 8, 2024

see #396 (comment)

As an AI producer or operator, I want the ability to represent environmental concerns, including energy consumption and CO2 emissions, throughout the lifecycle of a model, from data acquisition, training and fine-tuning through to MLOps (including inference). I want to use CycloneDX to help my organization comply with the environmental transparency requirements in the AI Act.


The fact that datasets used to train AI models are increasingly large and take an enormous amount of energy (and indirectly produce large CO2 emissions) to develop, train and run has come to the forefront. This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.

Background:

many more from any search engine...

@jkowalleck
Member

I don't understand the issue.

This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.

This description in no way describes the actual problem; it only gives a reason why a certain problem should be solved.

@jkowalleck
Member

jkowalleck commented Mar 10, 2024

@stevespringett can you help me here? I don't see a reason for putting these values in an ML-BOM.
Putting self-proclaimed side-data in BOM - does this actually help anybody? Is there some write-up or video-recording from the CycloneDX ML-WorkingGroup related to this topic?
With my current understanding of the topic, this all looks like an abuse of BOM for bragging-purposes (like: look how large/low my numbers are... and my numbers are better than yours...).

@stevespringett
Member

@jkowalleck The energy crisis for AI was just starting to happen when the AI/ML workgroup was operational. Over the last year, the crisis has grown exponentially. Organizations previously were talking about being carbon neutral. With the energy demands of AI, that likely is not possible. This reality is captured in the text of the AI Act. The energy considerations can also be combined with CDXA so that organizations can attest to the data in the model card.

The environment consideration support that Matt is working on will help CycloneDX adopters meet requirements in the AI Act.

According to the text adopted by the European Parliament, the AI Act sets out requirements for so-called "high-risk AI systems." These systems must be designed and developed with logging capabilities that enable the recording of energy consumption, the measurement or calculation of resource use, and the environmental impact throughout the system's lifecycle. These requirements primarily focus on transparency, ensuring that stakeholders have access to data on energy consumption. However, it is important to note that, in this case, the AI Act does not compel measures to reduce the energy consumption of AI systems.

Source: https://www.techpolicy.press/addressing-ai-energy-consumption-why-the-eu-must-embrace-ecodesign-for-software/

This is the use case that Matt is trying to achieve with this feature.

@stevespringett
Member

To frame this in a use case:

As an AI producer or operator, I want the ability to represent environmental concerns, including energy consumption and CO2 emissions, throughout the lifecycle of a model, from data acquisition, training and fine-tuning through to MLOps (including inference). I want to use CycloneDX to help my organization comply with the environmental transparency requirements in the AI Act.

@jkowalleck
Member

Environmental costs for ML-BOM are just one aspect.
Would you also add cost for SaaSBOM - how much does it cost to run the service?
Would you also add time cost for SBOM - like how many hours went into the development of a component?
Would you also add health/medical costs for HBOM - how many people suffered for mining the materials used in a component?

Thing is, all these "costs" are currently (in real world) priced in money (taxes, operational costs, RnD, etc).
If we wanted to add environmental costs specifically, then I would argue that we should add costs in general - for every component/service/...

@stevespringett
Member

If we wanted to add environmental costs specifically, then I would argue that we should add costs in general - for every component/service/...

Valid point. However, the same logic could be applied to the majority of the model card, including performance metrics and biases. But that's not where the industry is currently at. But in the proposed design, we could reuse this data outside of just the model card in a generic sense and make it available to every component and service.

@jkowalleck
Member

jkowalleck commented Mar 14, 2024

But in the proposed design, we could reuse this data outside of just the model card in a generic sense and make it available to every component and service.

that sounds good. finding a generalized solution that can be reused 👍

PS: here are others asking for a generic approach

@jkowalleck
Member

jkowalleck commented Mar 20, 2024

Existing work/art in the field : Green Software Foundation - Impact Framework - see https://if.greensoftware.foundation/

@jkowalleck
Member

A follow-up will be #406.

stevespringett added a commit that referenced this issue Mar 29, 2024
The fact that datasets used to train AI models are increasingly large
and take an enormous amount of energy (and indirectly produce large CO2
emissions) to develop, train and run has come to the forefront. This PR
contains proposed additions to the "modelCard" type to account for these
considerations when selecting/utilizing a model.

- Adds `ModelCardConsiderations.environmentalConsiderations` 
  this fixes #396
- Adds `OrganizationalEntity.address`

----

TODO

- [x] modify JSON schema
- [x] modify XML schema
- [x] modify protobuf schema
- [x] add examples & test resources
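
To make the use case above concrete, here is a rough Python sketch of what a model card carrying environmental data might look like as JSON. Only `modelCard`, its considerations object, and `environmentalConsiderations` are named in this thread; the nested field names (`energyConsumptions`, `activity`, `activityEnergyCost`, `co2CostEquivalent`) and the units are assumptions that should be checked against the published 1.6 schema. The component name and all numeric values are made-up placeholders, not measurements.

```python
import json


def energy_entry(activity, kwh, co2_tonnes):
    """Build one per-activity energy record (field names are assumptions)."""
    return {
        "activity": activity,
        "activityEnergyCost": {"value": kwh, "unit": "kWh"},
        "co2CostEquivalent": {"value": co2_tonnes, "unit": "tCO2eq"},
    }


# Hypothetical component; the numbers are illustrative placeholders only.
model_component = {
    "type": "machine-learning-model",
    "name": "example-model",
    "modelCard": {
        "considerations": {
            "environmentalConsiderations": {
                "energyConsumptions": [
                    energy_entry("training", 1200.0, 0.5),
                    energy_entry("inference", 50.0, 0.02),
                ]
            }
        }
    },
}


def total_energy_kwh(component):
    """Sum the reported energy cost over all lifecycle activities."""
    env = (
        component.get("modelCard", {})
        .get("considerations", {})
        .get("environmentalConsiderations", {})
    )
    return sum(
        entry["activityEnergyCost"]["value"]
        for entry in env.get("energyConsumptions", [])
    )


if __name__ == "__main__":
    print(json.dumps(model_component, indent=2))
    print("total kWh:", total_energy_kwh(model_component))
```

Keeping one record per lifecycle activity lets a consumer aggregate or filter by phase, which matches the lifecycle framing in the use case.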
@jkowalleck jkowalleck mentioned this issue Mar 29, 2024
@stevespringett stevespringett added this to the 1.6 milestone Apr 1, 2024
stevespringett added a commit that referenced this issue Apr 9, 2024
## Added

* Core enhancement: Attestation
([#192](#192) via
[#348](#348))
* Core enhancement: Cryptography Bill of Materials — CBOM
([#171](#171),
[#291](#291) via
[#347](#347))
* Feature to express the URL to source distribution
([#98](#98) via
[#269](#269))
* Feature to express the URL to RFC 9116 compliant documents
([#380](#380) via
[#381](#381))
* Feature to express tags/keywords for services and components (via
[#383](#383))
* Feature to express details for component authors
([#335](#335) via
[#379](#379))
* Feature to express details for component and BOM manufacturer
([#346](#346) via
[#379](#379))
* Feature to communicate concluded values from observed
evidence ([#411](#411)
via [#412](#412))
* Features to express license acknowledgement
([#407](#407) via
[#408](#408))
* Feature to express environmental consideration information for model
cards ([#396](#396) via
[#395](#395))
* Feature to express the address of organizational entities (via
[#395](#395))
* Feature to express additional component identifiers: Universal Bill Of
Receipts Identifier and Software Heritage persistent IDs
([#413](#413) via
[#414](#414))

## Fixed

* Allow multiple evidence identities by XML/JSON schema
([#272](#272) via
[#359](#359))
  This was already correct via ProtoBuf schema.
* Prevent empty `license` entities by XML schema
([#288](#288) via
[#292](#292))
  This was already correct in JSON/ProtoBuf schema.
* Prevent empty or malformed `property` entities by JSON schema
([#371](#371) via
[#375](#375))
  This was already correct in XML/ProtoBuf schema.
* Allow multiple `licenses` in `Metadata` by ProtoBuf schema
([#264](#264) via
[#401](#401))
  This was already correct in XML/JSON schema.

## Changed

* Allow arbitrary `$schema` values by JSON schema
([#402](#402) via
[#403](#403))
* Increased max length of `versionRange` (via
[`3e01ce6`](3e01ce6))
* Harmonized length of `version` (via
[#417](#417))

## Deprecated

* Data model "Component"'s field `author` was deprecated. (via
[#379](#379))
  Use field `authors` or field `manufacturer` instead.
* Data model "Metadata"'s field `manufacture` was deprecated.
([#346](#346) via
[#379](#379))
  Use "Metadata"'s field `component`'s field `manufacturer` instead. 
  - for XML: `/bom/metadata/component/manufacturer`
  - for JSON: `$.metadata.component.manufacturer`
  - for ProtoBuf: `Bom:metadata.component.manufacturer`
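
For consumers that must read both older and newer documents, the `manufacture` deprecation above could be handled roughly as follows. This is a minimal sketch operating on an already-parsed JSON BOM; the function name `bom_manufacturer` is hypothetical, and only the two JSON paths listed above are taken from the changelog.

```python
def bom_manufacturer(bom):
    """Return the manufacturer of a parsed JSON BOM, or None if absent.

    Prefers $.metadata.component.manufacturer and falls back to the
    deprecated $.metadata.manufacture for older documents.
    """
    metadata = bom.get("metadata") or {}
    component = metadata.get("component") or {}
    preferred = component.get("manufacturer")
    if preferred is not None:
        return preferred
    # Deprecated location, kept only as a fallback.
    return metadata.get("manufacture")
```

Checking the new location first means a document that carries both fields resolves to the non-deprecated value.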

## Documentation

* Centralize version and version-range (via
[#322](#322))
* Streamlined SPDX expression related descriptions (via
[#327](#327))
* Enhanced descriptions of `bom-ref`/`refType`
([#336](#336) via
[#344](#344))
* Enhanced readability of enum documentation in JSON schema
([#361](#361) via
[#362](#362))
* Fixed typo "compliment" -> "complement" (via
[#369](#369))
* Added documentation for enum "ComponentScope"'s values in JSON schema
([#293](#293) via
[`d92e58e`](d92e58e))
  Texts were taken from the existing ones in XML/ProtoBuf schema.
* Added documentation for enum "TaskType"'s values
([#245](#245) via
[#377](#377))
* Improve documentation for data model "Metadata"'s field `licenses`
([#273](#273) via
[#378](#378))
* Added documentation for enum "MachineLearningApproachType"'s values
([#351](#351) via
[#416](#416))
* Rephrased some texts here and there.

## Test data

* Added test data for newly added use cases
* Added quality assurance for our ProtoBuf schemas
([#384](#384) via
[#385](#385))