Serialization depends on reference identity #7209

Open
ulrikrasmussen opened this issue Jun 24, 2022 · 1 comment

Comments

@ulrikrasmussen
Contributor

Consider the following program:

import net.corda.core.serialization.serialize
import net.corda.testing.node.internal.setDriverSerialization

fun main(args: Array<String>) {
    // Set up a serialization environment so that serialize() can be called outside a node.
    setDriverSerialization(ClassLoader.getSystemClassLoader())
    val x = Pair("foo", "bar")
    // list1 holds two distinct but structurally equal Pair instances.
    val list1 = listOf(x, x.copy())
    // list2 holds the same Pair instance twice.
    val list2 = listOf(x, x)
    println("list1 == list2: ${list1 == list2}")
    println("list1.serialize() == list2.serialize(): ${list1.serialize() == list2.serialize()}")
}

This prints:

list1 == list2: true
list1.serialize() == list2.serialize(): false

Serialization is not deterministic with respect to structural equality: two equal values serialize to different bytes!

This is because Corda automatically de-duplicates multiple occurrences of objects in the serialized data using an IdentityHashMap, i.e. by reference equality (see objectHistory in SerializationOutput.kt). Supposedly this is done to reduce the size of serialized data in some cases.
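To make the mechanism concrete, here is a minimal sketch of identity-based de-duplication. It is a toy encoder, not Corda's actual implementation: the `encode` function and its `pair(...)`/`ref(...)` output format are made up for illustration, and only the use of `IdentityHashMap` mirrors what `objectHistory` does.

```kotlin
import java.util.IdentityHashMap

// Toy encoder (hypothetical, NOT Corda's wire format): writes each Pair in
// full the first time an instance is seen, and a back-reference afterwards.
fun encode(items: List<Pair<String, String>>): String {
    // Keys are compared with ===, so equal-but-distinct instances do not match.
    val seen = IdentityHashMap<Any, Int>()
    val out = StringBuilder()
    for (item in items) {
        val index = seen[item]
        if (index != null) {
            out.append("ref($index);")
        } else {
            seen[item] = seen.size
            out.append("pair(${item.first},${item.second});")
        }
    }
    return out.toString()
}

fun main() {
    val x = Pair("foo", "bar")
    println(encode(listOf(x, x.copy()))) // pair(foo,bar);pair(foo,bar);
    println(encode(listOf(x, x)))        // pair(foo,bar);ref(0);
}
```

The two lists are equal under `==`, yet the output differs because only the second list repeats the same instance.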

This behavior is unfortunate because transaction hashes now depend on the degree of sharing in the heap. For data classes, reference equality is semantically irrelevant, so two transactions can be semantically equivalent and still have different serialized representations. This causes no problems when a transaction is created by a single party and distributed from there. But consider a use case where a transaction contains fairly large data objects that are already known to every party in a flow that needs to sign it. The parties could then exchange only the remaining information, such as the privacy nonce, and each derive the transaction deterministically on their own side, which avoids sending the large objects to each other. This approach becomes quite brittle, however, when the reference identity of data (which is often non-deterministic, e.g. after a cache eviction) affects the transaction hash.
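The effect on hashes can be sketched with a toy model. Everything here is an assumption made for illustration: `encodeForHash`, its `dedupByIdentity` flag, and the textual format are hypothetical, and Corda does not compute transaction IDs this way. The sketch only shows that hashing an identity-deduplicated encoding makes the hash depend on sharing, while an encoding without de-duplication does not.

```kotlin
import java.security.MessageDigest
import java.util.IdentityHashMap

// Toy encoder (hypothetical, not Corda's format). When dedupByIdentity is
// true, a repeated instance is written as a back-reference.
fun encodeForHash(items: List<Pair<String, String>>, dedupByIdentity: Boolean): ByteArray {
    val seen = IdentityHashMap<Any, Int>()
    val out = StringBuilder()
    for (item in items) {
        val index = if (dedupByIdentity) seen[item] else null
        if (index != null) {
            out.append("ref($index);")
        } else {
            seen[item] = seen.size
            out.append("pair(${item.first},${item.second});")
        }
    }
    return out.toString().toByteArray(Charsets.UTF_8)
}

// SHA-256 of the encoded bytes, as a lowercase hex string.
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256").digest(bytes).joinToString("") { "%02x".format(it) }

fun main() {
    val x = Pair("foo", "bar")
    val shared = listOf(x, x)        // e.g. both entries served from a cache
    val copied = listOf(x, x.copy()) // e.g. second entry re-created after eviction

    // With identity de-duplication, equal data yields different hashes.
    println(sha256Hex(encodeForHash(shared, true)) == sha256Hex(encodeForHash(copied, true)))   // false
    // Without it, the hash depends only on the values.
    println(sha256Hex(encodeForHash(shared, false)) == sha256Hex(encodeForHash(copied, false))) // true
}
```

Two parties that rebuild the same data locally would agree on the hash only in the second case, regardless of how their heaps happen to share objects.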

I think that this behavior should be disabled or at least be optional.

@r3jirabot

Automatically created Jira issue: CORDA-4260
