-
-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify whether base64 encoding of license content is really optional #382
Comments
If you look at the spec./schema (i.e., specifically, https://cyclonedx.org/docs/1.5/json/#metadata_component_licenses_oneOf_i0_items_license_text_content) the If the In short, feel free to simply fill in the For further clarity, the "one of" language assures that schema will ONLY allow "one of" the values indicated; at this time, ONLY a single value is allowed (only if that field is used) which would be "base64". Again, limiting the encoding types, esp. to a single value, helps in automated processing so that no unexpected/unclear encoding values appear. |
Thanks that's very helpful! I'm really glad non-encoded is an option. I did quite carefully read all the spec (including the part you quoted) yesterday - multiple times :) - and couldn't convince myself 100% which way to interpret it (I found reasons for both possible interpretations :) ), so a brief remark in the next spec to make it explicit would be super helpful to avoid possible confusion. Perhaps add an example where the encode= is not specified. And in the "encoding" parameter you could state that the behaviour if the encoding parameter is not specified i.e. that the text is not encoded . (I realise that might sound too obvious to someone who already knows the intention of the spec, but the text stating it must be one of "base64" sent me off on a different interpretation. (I was also looking at this thread CycloneDX/cyclonedx-core-java#100 where it looks like someone else concluded base64 was mandatory) |
Base64-encoding of license texts is optional [1]. As it results in larger and less readable files, disable it. [1]: CycloneDX/specification#382 Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
We're having a debate about this on CycloneDX/cyclonedx-core-java#100 where it seems the main Java library for CycloneDX has taken a different interpretation of the encoding question. |
Base64-encoding of license texts is optional [1]. As it results in larger and less readable files, disable it. [1]: CycloneDX/specification#382 Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
PS/EDIT: I learned I was wrong, the following is not true:
nope. no default value anymore, this was fixed. at the state of CycloneDX 1.5 and before:
In addition, the example: https://cyclonedx.org/use-cases/#provenance {
"contentType": "text/xml",
"content": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiID8+CjxTb2Z0d2FyZUlkZW50aXR5IHhtbDpsYW5nPSJFTiIgbmFtZT0iQ3J5cHRvIExpYnJhcnkiIHZlcnNpb249IjMuMC4wIiAKIHZlcnNpb25TY2hlbWU9Im11bHRpcGFydG51bWVyaWMiIAogdGFnSWQ9InN3aWRnZW4tNWRjYjc5YWYtYTFkMi02MWIzLTM0ZmQtNTM2YzUzYjA4ODEwXzMuMC4wIiAKIHhtbG5zPSJodHRwOi8vc3RhbmRhcmRzLmlzby5vcmcvaXNvLzE5NzcwLy0yLzIwMTUvc2NoZW1hLnhzZCI+IAogeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgCiB4c2k6c2NoZW1hTG9jYXRpb249Imh0dHA6Ly9zdGFuZGFyZHMuaXNvLm9yZy9pc28vMTk3NzAvLTIvMjAxNS1jdXJyZW50L3NjaGVtYS54c2Qgc2NoZW1hLnhzZCIgPgogIDxNZXRhIGdlbmVyYXRvcj0iU1dJRCBUYWcgT25saW5lIEdlbmVyYXRvciB2MC4xIiAvPiAKICA8RW50aXR5IG5hbWU9IkV4YW1wbGUsIEluYy4iIHJlZ2lkPSJleGFtcGxlLmNvbSIgcm9sZT0idGFnQ3JlYXRvciIgLz4gCjwvU29mdHdhcmVJZGVudGl0eT4="
}
this is wrong, as stated before. |
Aha, well that's a shame. But thanks for clarifying, much appreciated! Given several of us have come to different conclusions about what the current spec means it's certainly valuable to make the wording in the next spec more explicit. But I hope you would also consider supporting human-readable/non-base64 encoding in future - the spec allows for other legal texts such as copyrights to be human-readable, and the lack of supporting this for custom license texts makes the format unusable as a machine-readable way to comply with legal requirements. Even though I much prefer CycloneDX for all other aspects, it looks like I will have so switch over to SPDX SBOMs unfortunately :( |
@ben-sag , {content": "KGMpIDIwMjQgSmFuZSBEb2Uu", "encoding": "base64"} |
Yes you can include any text, but it's not human readable.... so although you're 99% of the way to having a really useful file format that can both list the components and fulfil the legal obligations (e.g. to publish license texts), the fact it's mandated to be in a non-human readable format (base64) stops the second use case. It's fine to allow base64 but forcing it as the only option is a surprising and imho unnecessary limitation. (Especially since you don't do that for other extracted text like copyrights but only license text). |
let me ask it this way: would you give a JSON file to your lawyer? and would they open it as a pure text file and read the raw text? literally give a lawyer this example https://raw.githubusercontent.com/CycloneDX/bom-examples/master/SBOM/juice-shop/via_npm/bare/bom.1.4.json
This is during transport. Nothing prevents you from base64-decoding the text, before handing it to some person.
The field you are referring to is of type "attachment", not "text". Please understand, that JSON and XML might be human-readable, but are not intended to be consumed by humans. They are data interchange formats for the digital age. |
@ben-sag I checked again and I was wrong with my original statement. {"content": "(c) 2024 Jane Doe"} |
This reverts commit 2300487 as only setting the `encoding` field is optional, but `content` indeed always has to be Base64-encoded [1]. [1]: CycloneDX/specification#382 (comment) Signed-off-by: Sebastian Schuberth <sebastian@doubleopen.org>
Hmm. Does that mean that depending on the specification / schema level chosen, a user of the Java library needs to manually encode e.g. license texts? |
IMO, adding a method to the Java library that will encode/decode the license text would be a good feature to add. |
IMO, just adding such a function is not of much value, the Java code for that already is a one-liner. Instead, the library should automatically Base64-encode whatever is required to be Base64-encoded. |
I couldn't see any change to the spec - from v1.2 right through to v1.5 the encoding field is marked as "optional". So the Java library should not be doing any automatic transformations but just passing through the text unchanged. I do think given the confusion on this thread there's a very strong case now for a small change making it clear/explicit in the spec whether non-base64 content is permitted (sounds like it is?). |
I believe (and asking as well) what Jan is saying, being the gatekeeper for v1.6, is that v1.6 it is no longer "optional"??? However, the current JSON schema still indicates "text" is of type "attachment" which is still described as follows (links to current/committed v1.6 json schema): "attachment": {
"type": "object",
"title": "Attachment",
"description": "Specifies the metadata and content for an attachment.",
"required": [
"content"
],
...
"properties": {
"contentType": {
"type": "string",
"title": "Content-Type",
"description": "Specifies the content type of the text. Defaults to text/plain if not specified.",
"default": "text/plain"
},
"encoding": {
"type": "string",
"title": "Encoding",
"description": "Specifies the optional encoding the text is represented in.",
"enum": [
"base64"
],
"meta:enum": {
"base64": "Base64 is a binary-to-text encoding scheme that represents binary data in an ASCII string."
}
},
"content": {
"type": "string",
"title": "Attachment Text",
"description": "The attachment data. Proactive controls such as input validation and sanitization should be employed to prevent misuse of attachment text."
}
}
}, which still has the only "required" field being "content" (i.e., "encoding" is not required) ... and the description of "encoding" still has "optional". Perhaps there is an outstanding (not merged) pull request (PR) that has this projected change? |
TLDR; specification/schema/bom-1.6.schema.json Lines 1128 to 1130 in 2873c49
Therefore, omitting the "encoding" field means no encoding is applied, as intended.
I was referring to a state when we had a schema issue during CycloneDX 1.3 JSON version only, when there was a default value for the "encoding" -- which was an unintended thing and was fixed in the CycloneDX JSON schema a while ago. This affected only JSON documents that were validated by the schema. The actual CycloneDX specification never had a default value for the attachment's "encoding" field. |
caused by CycloneDX/specification#382 (comment) > [...] add an example where the encode= is not specified [...] we have two examples of encoded text attachments in the use cases already. this PR makes one of them an unencoded text attachment. this way we have both use cases shown. since the MIT license us not a template, but an actual text with actual values in it, and since it is pretty short, I used an example with MIT license. in contrast - the Apache license is a template - it must not be modified and therefore it makes no practical since to have it as an attachment ... --------- Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com> Signed-off-by: Jan Kowalleck <jan.kowalleck@owasp.org>
The specs (incl 1.4) are unclear about whether it is mandatory or optional to use base64 encoding for license text's "content".
The fact the doc for "encoding" states it "must be one of"... "base64" makes me think it's mandatory to use base64, but the fact it also says "Specifies the optional encoding" makes me think maybe it's optional (e.g. if I don't specify an encoding parameter maybe it's treated as normal unencoded text). I looked at the examples but there is none that have content without encoding.
Can you clarify, and in the next spec put something into the doc of the "content" field to make clear whether plain unencoded text is permitted? And maybe something in the examples to demonstrate unencoded text.
(btw I'm very much hoping that use of base64 is not mandatory for this field - many licenses make it legally necessary to include the full license text and the SBOM is a very natural place to put that... but I doubt a law court would accept a base64 encoding of the text as meeting that requirement)
The text was updated successfully, but these errors were encountered: