Propose new environmental consideration information for ML models #395

Merged
stevespringett merged 76 commits into CycloneDX:1.6-dev
Mar 29, 2024

Conversation

mrutkows
Contributor

@mrutkows mrutkows commented Mar 7, 2024

The fact that datasets used to train AI models are increasingly large and take an enormous amount of energy (and indirectly produce large CO2 emissions) to develop, train and run has come to the forefront. This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.


TODO

  • modify JSON schema
  • modify XML schema
  • modify protobuf schema
  • add examples & test resources
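
To make the proposal concrete, here is a minimal sketch of what a model card carrying energy data might look like in a JSON BOM. The thread itself does not show the final field names, so `environmentalConsiderations`, `energyConsumptions`, `activityEnergyCost`, `co2CostEquivalent`, and the units are assumptions for illustration and should be checked against `schema/bom-1.6.schema.json`. The component name and the numbers are made up.

```python
import json


def energy_consumption(activity, kwh, co2_tonnes):
    """Build one hypothetical energy-consumption entry for a model card.

    The key names are assumptions, not confirmed by this thread.
    """
    return {
        "activity": activity,
        "activityEnergyCost": {"value": kwh, "unit": "kWh"},
        "co2CostEquivalent": {"value": co2_tonnes, "unit": "tCO2eq"},
    }


# A made-up ML component; only the nesting under "modelCard" matters here.
model_component = {
    "type": "machine-learning-model",
    "name": "example-model",
    "modelCard": {
        "considerations": {
            "environmentalConsiderations": {
                "energyConsumptions": [
                    energy_consumption("training", 1200.0, 0.5),
                    energy_consumption("inference", 3.5, 0.001),
                ]
            }
        }
    },
}

print(json.dumps(model_component, indent=2))
```

Keeping one entry per lifecycle activity lets a consumer compare, for example, training cost against inference cost when selecting a model.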

@mrutkows
Contributor Author

mrutkows commented Mar 7, 2024

@jkowalleck @stevespringett
NOTE: we need to add "externalReferences" and "properties" where applicable.

@mrutkows
Contributor Author

mrutkows commented Mar 7, 2024

My presentation from the 03-07-2024 Ecma TC-54 call:
2024-03-07 ML-BOM Environmental Consideration Proposal - Rutkowski.pdf

@jkowalleck jkowalleck added this to the 1.6 milestone Mar 8, 2024
@jkowalleck jkowalleck removed the labels "CDX 1.6" and "format: JSON" Mar 8, 2024
@jkowalleck
Member

@mrutkows is this PR related to an existing issue?
If not, please create an issue that describes the reasoning, use cases, edge cases, out-of-scopes, and such.

@mrutkows mrutkows marked this pull request as ready for review March 8, 2024 18:02
@mrutkows mrutkows requested a review from a team as a code owner March 8, 2024 18:02
@mrutkows mrutkows marked this pull request as draft March 8, 2024 18:04
@jkowalleck jkowalleck linked an issue Mar 10, 2024 that may be closed by this pull request
@jkowalleck
Member

This PR claims to solve #396.

Well, since #396 does not yet describe a problem in any way, I cannot see how this PR intends to solve it.
@mrutkows, please take the 10 minutes to describe the actual problem before I take the two hours to actually review the "solution".

@prabhu
Contributor

prabhu commented Mar 12, 2024

Could we make this a bit more generic so that it could be attached to services, manufacturers, etc? Initially, it could be a type in external reference since the analysis is usually very specific to the needs of a particular organization.

@stevespringett
Member

@mrutkows I think we have consensus to use the current model card approach to adding support for environmental concerns and then in v1.7, we can expand that support to every component and service.

What is the status of this PR? We likely have one week to flesh this out; otherwise it will need to be postponed to v1.7.

@jkowalleck
Member

I still have questions and concerns. I vote for postponing and continuing the discussion in #396.
If the data is needed earlier, we could still create CDX properties for this and see how they are adopted.
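
As a rough illustration of the properties-based fallback, the sketch below attaches energy data to a component as name/value properties instead of a dedicated schema type. The `example:ml:energy:` prefix is a made-up namespace, not a registered CycloneDX property taxonomy, and the component is hypothetical. Property values are written as strings here.

```python
def energy_properties(activity, kwh):
    """Return a hypothetical property list describing energy use.

    The "example:ml:energy:" prefix is invented for this sketch.
    """
    return [
        {"name": f"example:ml:energy:{activity}:kWh", "value": str(kwh)},
    ]


# A made-up component that carries the data as generic properties.
component = {
    "type": "machine-learning-model",
    "name": "example-model",
    "properties": energy_properties("training", 1200.0),
}

print(component["properties"])
```

The trade-off is that properties need no schema change, but consumers must agree on the naming convention out of band.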

@jkowalleck
Member

jkowalleck commented Mar 21, 2024

re: #395 (comment)
@mrutkows, Vinod and I had a discussion and will work on moving this forward so that it becomes part of 1.6.

Signed-off-by: Matt Rutkowski <mrutkows@us.ibm.com>
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
@jkowalleck
Member

@mrutkows here is the needed fix for the examples: mrutkows#3

as soon as this is merged, we should be golden.

mrutkows and others added 7 commits March 27, 2024 04:40
Signed-off-by: Matt Rutkowski <mrutkows@us.ibm.com>
@mrutkows
Contributor Author

@mrutkows here is the needed fix for the examples: mrutkows#3

as soon as this is merged, we should be golden.

Thanks Jan!!!

BTW, there is still something wrong in the XSD in that I should have an energyConsumptionsType (plural) as well as the current energyConsumptionType to account for the corresponding anonymous type in the JSON schema... I will commit that change shortly.
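
The plural/singular distinction can be sketched as follows: a JSON array maps naturally to an XML wrapper element (the plural type) that contains one child element per entry (the singular type). The element names `energyConsumptions`, `energyConsumption`, and `activity` are assumptions inferred from the type names mentioned above, not copied from the XSD.

```python
import xml.etree.ElementTree as ET


def consumptions_to_xml(entries):
    """Wrap a list of entries in a plural element with singular children.

    Element names are assumed for illustration only.
    """
    wrapper = ET.Element("energyConsumptions")  # plural wrapper (the array)
    for entry in entries:
        item = ET.SubElement(wrapper, "energyConsumption")  # one array item
        ET.SubElement(item, "activity").text = entry["activity"]
    return wrapper


root = consumptions_to_xml([{"activity": "training"}, {"activity": "inference"}])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Without the wrapper type, the XML schema would have no named counterpart for the anonymous array type in the JSON schema.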

…type in JSON

Signed-off-by: Matt Rutkowski <mrutkows@us.ibm.com>
Member

@jkowalleck jkowalleck left a comment

looks good overall.

schema/bom-1.6.proto (resolved)
schema/bom-1.6.proto (outdated, resolved)
schema/bom-1.6.proto (resolved)
docgen/examples/md/mlbom-1.6-env-considerations-1.json (outdated, resolved)
Signed-off-by: Matt Rutkowski <mrutkows@us.ibm.com>
@mrutkows mrutkows changed the title [WIP] Propose new environmental consideration information for ML models Propose new environmental consideration information for ML models Mar 27, 2024
@jkowalleck jkowalleck requested review from a team and stevespringett March 27, 2024 21:06
Member

@stevespringett stevespringett left a comment

This looks good @mrutkows . Thanks for all the hard work on this. If you can sort out the other discrepancy, we can get this approved.

schema/bom-1.6.schema.json (resolved)
@stevespringett stevespringett added the labels "promote to tc54", "tc54 reviewed", and "tc54 accepted" Mar 28, 2024
@stevespringett stevespringett merged commit 6f284bd into CycloneDX:1.6-dev Mar 29, 2024
8 checks passed
@jkowalleck jkowalleck mentioned this pull request Mar 29, 2024
stevespringett added a commit that referenced this pull request Apr 9, 2024
## Added

* Core enhancement: Attestation
([#192](#192) via
[#348](#348))
* Core enhancement: Cryptography Bill of Materials — CBOM
([#171](#171),
[#291](#291) via
[#347](#347))
* Feature to express the URL to source distribution
([#98](#98) via
[#269](#269))
* Feature to express the URL to RFC 9116 compliant documents
([#380](#380) via
[#381](#381))
* Feature to express tags/keywords for services and components (via
[#383](#383))
* Feature to express details for component authors
([#335](#335) via
[#379](#379))
* Feature to express details for component and BOM manufacturer
([#346](#346) via
[#379](#379))
* Feature to express concluded values from observed
evidence ([#411](#411)
via [#412](#412))
* Features to express license acknowledgement
([#407](#407) via
[#408](#408))
* Feature to express environmental consideration information for model
cards ([#396](#396) via
[#395](#395))
* Feature to express the address of organizational entities (via
[#395](#395))
* Feature to express additional component identifiers: Universal Bill Of
Receipts Identifier and Software Heritage persistent IDs
([#413](#413) via
[#414](#414))

## Fixed

* Allow multiple evidence identities by XML/JSON schema
([#272](#272) via
[#359](#359))
  This was already correct in ProtoBuf schema.
* Prevent empty `license` entities by XML schema
([#288](#288) via
[#292](#292))
  This was already correct in JSON/ProtoBuf schema.
* Prevent empty or malformed `property` entities by JSON schema
([#371](#371) via
[#375](#375))
  This was already correct in XML/ProtoBuf schema.
* Allow multiple `licenses` in `Metadata` by ProtoBuf schema
([#264](#264) via
[#401](#401))
  This was already correct in XML/JSON schema.

## Changed

* Allow arbitrary `$schema` values by JSON schema
([#402](#402) via
[#403](#403))
* Increased max length of `versionRange` (via
[`3e01ce6`](3e01ce6))
* Harmonized length of `version` (via
[#417](#417))

## Deprecated

* Data model "Component"'s field `author` was deprecated. (via
[#379](#379))
  Use field `authors` or field `manufacturer` instead.
* Data model "Metadata"'s field `manufacture` was deprecated.
([#346](#346) via
[#379](#379))
  Use "Metadata"'s field `component`'s field `manufacturer` instead. 
  - for XML: `/bom/metadata/component/manufacturer`
  - for JSON: `$.metadata.component.manufacturer`
  - for ProtoBuf: `Bom:metadata.component.manufacturer`
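
A minimal sketch of the JSON-side migration for the deprecated `manufacture` field, assuming the BOM is already loaded as a Python dict and that the metadata-level value describes the same organization as the metadata component's manufacturer. The function name `migrate_manufacture` is hypothetical and not part of any CycloneDX tooling.

```python
import copy


def migrate_manufacture(bom):
    """Return a copy of a JSON BOM with `$.metadata.manufacture` moved to
    `$.metadata.component.manufacturer`.

    The deprecated field is left in place when there is no metadata
    component, or when the component already declares a manufacturer.
    """
    result = copy.deepcopy(bom)
    metadata = result.get("metadata", {})
    if "manufacture" not in metadata:
        return result
    component = metadata.get("component")
    if component is None or "manufacturer" in component:
        return result  # no safe target; keep the deprecated field as-is
    component["manufacturer"] = metadata.pop("manufacture")
    return result


old_bom = {
    "metadata": {
        "manufacture": {"name": "Acme"},  # made-up organization
        "component": {"type": "application", "name": "app"},
    }
}
new_bom = migrate_manufacture(old_bom)
print(new_bom["metadata"]["component"]["manufacturer"])
```

Working on a deep copy keeps the input BOM unchanged, which makes it easy to diff the two documents before writing anything back.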

## Documentation

* Centralize version and version-range (via
[#322](#322))
* Streamlined SPDX expression related descriptions (via
[#327](#327))
* Enhanced descriptions of `bom-ref`/`refType`
([#336](#336) via
[#344](#344))
* Enhanced readability of enum documentation in JSON schema
([#361](#361) via
[#362](#362))
* Fixed typo "compliment" -> "complement" (via
[#369](#369))
* Added documentation for enum "ComponentScope"'s values in JSON schema
([#293](#293) via
[`d92e58e`](d92e58e))
  Texts were taken from the existing ones in XML/ProtoBuf schema.
* Added documentation for enum "TaskType"'s values
([#245](#245) via
[#377](#377))
* Improve documentation for data model "Metadata"'s field `licenses`
([#273](#273) via
[#378](#378))
* Added documentation for enum "MachineLearningApproachType"'s values
([#351](#351) via
[#416](#416))
* Rephrased some texts here and there.

## Test data

* Added test data for newly added use cases
* Added quality assurance for our ProtoBuf schemas
([#384](#384) via
[#385](#385))
Labels
promote to tc54, proposed core enhancement, ready for review, request for comment, tc54 accepted, tc54 reviewed
Development

Successfully merging this pull request may close these issues.

Propose new environmental consideration information for ML models
4 participants