Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBOR Feature Drop: COSE #2412

Merged
merged 98 commits into from
Jul 22, 2024
Merged

Conversation

JesusMcCloud
Copy link

This PR obsoletes #2371 and #2359 as it contains the features of both PRs and many more.

Specifically, this PR contains all feature required to serialize and parse COSE-compliant CBOR (thanks to @nodh). While some canonicalization steps (such as sorting keys) still needs to be performed manually. It does get the job done quite well. Namely, we have successfully used the features introduced here to create and validate ISO/IEC 18013-5:2021-compliant mobile driving license data.

This PR introduces the following features to the CBOR format:

  • Serial Labels
  • Tagging of keys and values
  • Definite length encoding (this is the largest change, as it effectively makes the cbor encoder two-pass)
  • Globally preferring major type 2 for byte array encoding
  • Encodes null complex objects as empty map (to be COSE-compliant)

@sandwwraith
Copy link
Member

Hi, thanks for your PR! Just as a quick heads-up so you know I've seen it — I'm going on vacation soon, so I'll be able to properly review it in September.

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I appreciate the amount of work you've done — especially for the tests. I didn't look into the implementation (yet), focusing on surface API for now.

Regarding failure on CI — it's connected to the fact that Long.toUnsignedString was added only in Android API 26. However, it is possible to use this declaration on Android with desugaring enabled, so there shouldn't be any problems. You have to crate a special annotation @SuppressAnimalSniffer (it already exists for core and json, but not for cbor, see e.g. here:

@SuppressAnimalSniffer // Long.toUnsignedString(long)
) and use it in configuration here:
switch (name) {

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at the test thoroughly (to understand if there is a need for more test cases), but I highlighted all of the problematic moments from my perspective. Don't forget to press 'show hidden conversations' on Github :)

* Note that `equals()` and `hashCode()` only use `value`, not `serialized`.
*/
@Serializable(with = ByteStringWrapperSerializer::class)
public class ByteStringWrapper<T>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: it seems you need to update apiDump since this class is no longer data

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, done!

@Serializable(with = ByteStringWrapperSerializer::class)
public class ByteStringWrapper<T>(
public val value: T,
public val serialized: ByteArray = byteArrayOf()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've skimmed through the diff and can't find any usages of this value. Is there any reason to be a val? In what cases will the user be interested in accessing it? Also, it is very easy to create inconsistent data with this approach (e.g. ByteStringWrapper(original.value, byteArrayOf(garbage))). If it is not necessary to have it, this class can be a value class. Or even no class would be needed, as this can be handled with yet another @SerialInfo annotation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nodh can you take care of this, please?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll use the CBOR serialization for ISO 18013-5 compliaent mobile driving licences. There verifiers need to check that (the digest of) the serialized byte values appear in another structure (signed by the issuer). So in that case its quite useful to have the bytes as they were serialized (and parsed) in the deserialized ByteStringWrapper object too.

@SerialInfo
@Target(AnnotationTarget.CLASS)
@ExperimentalSerializationApi
public annotation class CborArray(@OptIn(ExperimentalUnsignedTypes::class) vararg val tag: ULong)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vararg val tag is still undocumented though

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry! done!

readProperties++
descriptor.getElementIndexOrThrow(elemName)
descriptor.getElementIndexOrThrow(elemName).also { index ->
if (cbor.verifyKeyTags) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of duplicated code can be extracted to function (perhaps local)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

} else {
if (cbor.verifyKeyTags) {
descriptor.getKeyTags(index)?.let { keyTags ->
if (!(keyTags contentEquals tags)) throw CborDecodingException("CBOR tags $tags do not match declared tags $keyTags")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General observation (for all throw expressions): It is would be much easier to debug errors if they also included locations. Here we don't have a json path analog, but appending descriptor.toString() will already make the job easier significantly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done where we have access to the descriptor (which we do not always have)

readByte()
}

return (if (collectedTags.isEmpty()) null else collectedTags.toULongArray()).also { collected ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO also creates unnecessary nesting here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs to be transformed into an ULongArray for comparison and for returning it. the alternative of adding a temp variable just for two accesses seems a bit cumbersome to me


return (if (collectedTags.isEmpty()) null else collectedTags.toULongArray()).also { collected ->
tags?.let {
if (!it.contentEquals(collected)) throw CborDecodingException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that contentEquals has both nullable receiver and parameter. It results in behavior like this: intArrayOf(1,2,3).contentEquals(null) -> false. I'm not sure it is what you intended to have here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want to compare if tags are actually set. Otherwise, we don't care.

I'll also add that comment to the code.

@@ -633,9 +952,55 @@ private fun Iterable<ByteArray>.flatten(): ByteArray {

@OptIn(ExperimentalSerializationApi::class)
private fun SerialDescriptor.isByteString(index: Int): Boolean {
return getElementAnnotations(index).find { it is ByteString } != null
return kotlin.runCatching { getElementAnnotations(index).find { it is ByteString } != null }.getOrDefault(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why runCatching is needed here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's only a theoretical possibility, but an IndexOutOfBoundsException could be thrown. I'll remove it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, such a call fails for the WASM target, even though we use runCatching. I have no explanation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on the problem? I do not see any related WASM failures

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was an implementation bug on my end, that has ben resolved in the meantime. I removed the runCatching in a1a7dad

@develar
Copy link

develar commented Dec 17, 2023

Globally preferring major type 2 for byte array encoding

It would be awesome, as currently it's somewhat surprising that the default encoded size of byte arrays is nearly twice as large.

@JesusMcCloud
Copy link
Author

WASM still fails, because even runCatching does not catch a runtimeException for an expected IndexOutOfBoundsException when checking for tags. Help would be much appreachiated.

@JesusMcCloud
Copy link
Author

JesusMcCloud commented Jan 12, 2024

WASM still fails, because even runCatching does not catch a runtimeException for an expected IndexOutOfBoundsException when checking for tags. Help would be much appreachiated.

Here are the details:

  Current test: testOpenPolymorphism
  Process output:
   wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:1
  
  
  RuntimeError: array element access out of bounds
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlin.Array.get (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[2207]:0x4247e)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.descriptors.SerialDescriptorImpl.getElementAnnotations (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[3974]:0x595a3)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.internal.getKeyTags (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5369]:0x6b6e2)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.internal.Preamble.encode (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5180]:0x68fa4)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.internal.Token.encode (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5173]:0x68d56)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.internal.CborWriter.encode (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5294]:0x6a0c3)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.Cbor.encodeToByteArray (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5042]:0x6743b)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlinx.serialization.cbor.CborPolymorphismTest.testOpenPolymorphism (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[6111]:0x79d2a)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlin.wasm.internal.testContainer$kotlinx.serialization.cbor test fun$CborPolymorphismTest test fun$testOpenPolymorphism test fun.invoke (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[6715]:0x8440e)
      at <org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>.kotlin.test.TeamcityAdapter$test$lambda.invoke (wasm://wasm/<org.jetbrains.kotlinx:kotlinx-serialization-cbor_test>-00616f4e:wasm-function[5605]:0x6f4a9)

EDIT: seems it's already been reported and there's really not much we can do: https://youtrack.jetbrains.com/issue/KT-59081/WASM-Cant-catch-index-out-of-bounds-and-divide-by-0-exceptions

@sandwwraith sandwwraith self-assigned this Apr 17, 2024
@JesusMcCloud
Copy link
Author

In my browser, this PR is now behaves uber-glitchy when all comments are shown, maybe it is just too large. The page jumps around when scrolling and is utterly unusable for me. I really tried to address all comments and i sincerely hope I did, because every single one was very valuable and improved the overall quality. If something slipped through, I'm really sorry!
If this page still renders fine for you, hopefully everything shows up as outdated. The gliches are also the reason I stopped accepting suggestion through here and simply applied them and committed them regularly.

Copy link
Contributor

@whyoleg whyoleg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like I have nothing more to add here except for a few minor final touches :)
Tremendous amount of work! Thank you for your contribution and fixing all the comments!

P.S. I've mostly reviewed public API all the time as I don’t know all the details of CBOR implementation

* to annotate every `ByteArray` in a class hierarchy.
*
*/
public class CborConfiguration(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing ExperimentalSerializationApi as Cbor is all experimental and probably it is better to make constructor internal. toString would be nice, like in JsonConfiguration, but optional

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, you are right. will do!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 167877b

return reader.decodeSerializableValue(deserializer)
}
}

@OptIn(ExperimentalSerializationApi::class)
private class CborImpl(encodeDefaults: Boolean, ignoreUnknownKeys: Boolean, serializersModule: SerializersModule) :
Cbor(encodeDefaults, ignoreUnknownKeys, serializersModule)
private class CborImpl(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I believe it can accept just configuration and redirect to Cbor constructor (like in Json)
Overall, IMO, it would be nice to have even just small implementation details be aligned between formats, so that those who will want to develop custom formats will see the pattern and the whole ecosystem of formats will be aligned on public API and internals somehow. But it is a minor and can be done in a follow-up PR if needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was working with what I got and did not really consider aligning across formats, but you are right, of course!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 2294cc7

val reference = ClassAs1Array(alg = -7)

val cbor = Cbor(from = Cbor.CoseCompliant) {
useDefiniteLengthEncoding = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this flag is not needed here anymore as it is set in CoseCompilant. Same in several more places in this file

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 507ccd1

import kotlin.test.*


class CborDefLenTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: would be nice to have full name here instead of shortcuts :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 15bbc3f

@@ -13,9 +15,9 @@ class CborReaderTest {

private val ignoreUnknownKeys = Cbor { ignoreUnknownKeys = true }

private fun withDecoder(input: String, block: CborDecoder.() -> Unit) {
private fun withDecoder(input: String, block: CborParser.() -> Unit) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor, optional: class is now called CborParser so both test name and function can be renamed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split up into tow distinct Test classes in 46227cf

keyTags.joinToString(
prefix = "[",
postfix = "]"
) { it.toString() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you copied it.toString()from CborParser.processTags here, CborReader now also requires @SuppressAnimalSniffer annotation. But I would prefer if you just add another method to CborParser — e.g. verifyTagsAndThrow(expected: ULongArray, actual: ULongArray) to reduce the amount of copy-pasted code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I understood correctly. See 7ec8e02

Copy link
Contributor

@whyoleg whyoleg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Github is not very good at tracking large conversations.
Looks good to me after fixing minor comments!

docs/formats.md Outdated Show resolved Hide resolved
docs/formats.md Outdated Show resolved Hide resolved
docs/formats.md Outdated Show resolved Hide resolved
docs/formats.md Outdated Show resolved Hide resolved
Co-authored-by: Leonid Startsev <sandwwraith@users.noreply.github.com>
@JesusMcCloud
Copy link
Author

I have to say, I now understand why kotlinx.serialization works as well and reliably as it does. you really do take QA seriously!

@sandwwraith sandwwraith merged commit 2017084 into Kotlin:dev Jul 22, 2024
3 checks passed
@sandwwraith
Copy link
Member

I want to thank you again for this huge amount of work and your time. Well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants