-
Notifications
You must be signed in to change notification settings - Fork 648
Structured cbor #3036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Structured cbor #3036
Conversation
Full disclosure: This PR incorporates code from a draft generated by Junie (albeit an impressive draft that saved a day of work). This is not a dumb copypasta of AI-generated code. Even if it were already feature-complete It would still not yet be marked ready for review because we have yet to review everything internally. I also want to stress that "we" is not a euphemism. There will be at least two of us reviewing and discussing internally, almost certainly with additional input from other humans in the process of readying this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed the code at a general level. There is a lot of repetition, so I've only commented on the first case, not every one. I guess some of the AI generation is visible in the details.
encodeKeyTags = true, | ||
encodeValueTags = true, | ||
encodeObjectTags = true, | ||
verifyKeyTags = true, | ||
verifyValueTags = true, | ||
verifyObjectTags = true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As these parameters change the behaviour of the system you should not change the default. Instead you could provide a way for users to choose a newer/different configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! definitely not intended
* See [RFC 8949 3.4. Tagging of Items](https://datatracker.ietf.org/doc/html/rfc8949#name-tagging-of-items). | ||
*/ | ||
@OptIn(ExperimentalUnsignedTypes::class) | ||
public val tags: ULongArray = ulongArrayOf() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to keep out the default initializer to ensure that any concrete subclass passes the value (it is sealed so it is not a usability issue to require it to be passed along).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if tags are uncommon you should probably make it nullable to avoid the overhead of the empty array. Alternatively you could create an internal empty array value that is reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd go for an internal empty array. It does not make sense to differentiate between a null value and an empty array
@Serializable(with = CborIntSerializer::class) | ||
public class CborNegativeInt( | ||
public val value: Long, | ||
tags: ULongArray = ulongArrayOf() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, remove the default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well actually, I am confident that all empty arrays of primitives delegate to a Singleton, so no premature optimisation planned here and in all other places
* Class representing CBOR `null` value | ||
*/ | ||
@Serializable(with = CborNullSerializer::class) | ||
public class CborNull(tags: ULongArray=ulongArrayOf()) : CborPrimitive(tags) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value should remain, but use an internal array (or null) to avoid extensive copies of empty arrays.
} | ||
|
||
override fun serialize(encoder: Encoder, value: CborElement) { | ||
encoder.asCborEncoder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally you allow serializing to other formats with the "normal" behaviour (see the serializers for JsonElement
etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't the JsonElement serializer have a verify
method that ensures it can only be used with the JSON format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out: it does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JesusMcCloud You are right. I had missed the check because it was encapsulated. The worst part is that it would actually work without that verify check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does make life easier for those implementing the serializers. for me at least, i don't have to think about how this will behave with other formats, esp. wrt. tagging.
} | ||
} | ||
|
||
public object CborIntSerializer : KSerializer<CborNegativeInt> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be internal like the other specialist serializers. They are accessible on the type itself.
* Class representing signed CBOR integer (major type 1). | ||
*/ | ||
@Serializable(with = CborIntSerializer::class) | ||
public class CborNegativeInt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name NegativeInt is misleading as it appears to mean a value that can only be negative, not a signed value (that can be positive or negative). Given this name is poor, you also want to rename the counterpart from PositiveInt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was intentional. IIRC only negative ints are encoded as signed int in cbor, so it really must be a negative int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what if they are created programmatically? If you have this value based type you need a factory that creates the correct version on demand (and probably have an "CborInt" parent type that is sign independent)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a sealed parent makes sense (with a custom invoke function that returns the subtype based on sign), but the name for this class is the only one i could think of that is actually accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JesusMcCloud With the explanation it certainly does make sense to call it positive/negative, although you could also just have a single type and make the distinction only on writing (as it is value dependent and deterministic anyway). Maybe the single type is best as that also avoids instances being created with invalid values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it definitely makes life easier when you don't parse, but create. when parsing, though, I am slightly leaning towards being very explicit and very close to the wire format (which is what we're representing). I'll let @sandwwraith decide
private fun defer(deferred: () -> SerialDescriptor): SerialDescriptor = object : SerialDescriptor { | ||
private val original: SerialDescriptor by lazy(deferred) | ||
|
||
override val serialName: String | ||
get() = original.serialName | ||
override val kind: SerialKind | ||
get() = original.kind | ||
override val elementsCount: Int | ||
get() = original.elementsCount | ||
|
||
override fun getElementName(index: Int): String = original.getElementName(index) | ||
override fun getElementIndex(name: String): Int = original.getElementIndex(name) | ||
override fun getElementAnnotations(index: Int): List<Annotation> = original.getElementAnnotations(index) | ||
override fun getElementDescriptor(index: Int): SerialDescriptor = original.getElementDescriptor(index) | ||
override fun isElementOptional(index: Int): Boolean = original.isElementOptional(index) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than this, it may be better to have the descriptor property itself be lazy, rather than each of its elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well…
fixes #2975
(not yet ready for review, as an internal review, cleanups and more thorough test cases are still pending; hence draft)
Discussion points @sandwwraith :
useDefiniteLengthEncoding
: YespreferCborLabelsOverNames
: Probably no. Rather: if a label is present on a CborElement, use that over the nameStill TODO: