Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tearoff fixes #78

Merged
merged 16 commits into from Feb 3, 2017
Merged

Tearoff fixes #78

merged 16 commits into from Feb 3, 2017

Conversation

kasiastreich
Copy link
Contributor

Add timestamp to merkle tree calculation.
Add fix of duplicate node issue by changing duplicated hash by adding an index of duplicated leave at a given level in a tree.

The problem was that it was possible to construct 2 different transactions with the same ids. Having number of leaves that were not power of 2. However, it's hard to use such situation in a meaningful way (in most cases transactions like that shouldn't pass contract verification). We should also consider, if we really want to have a transaction with the same commands or attachments, the same inputs are impossible (double spend). Only way that it is reasonable to have the same entries in tx are outputs and it is theoretically possible to use leaves duplication fact to add additional outputs at the end. For this trick to work such transaction can consist only of inputs or outputs, no commands, attachments, timestamp (but as mentioned before, for cash it wouldn't pass contract verification - amounts don't match).

…ex of duplicated leave at a given level in a tree.

The problem was that it was possible to construct 2 different transactions with the same ids. Trick worked for txs having
number of leaves that were not power of 2.
@kasiastreich kasiastreich self-assigned this Dec 20, 2016
@kasiastreich kasiastreich changed the title Kstreich tearoff fixes Tearoff fixes Dec 20, 2016
@mikehearn
Copy link
Contributor

mikehearn commented Dec 20, 2016

Great! I'll take a look at the code soon but before I start, I was thinking yesterday that a simpler fix than including an index into the hash and checking it would be to just pad the tree with the zero hash. As nobody knows a pre-image for SHA2(x) == 0 that would also fix the issue in a simpler way. Does it make sense?

@mikehearn
Copy link
Contributor

Code looks good - nice improvements - but:

  • I think there are still fields missing like 'type', 'notary' etc.
  • I think using the zero hash would be simpler than the current trickyness around repeated hashing, for the case of a duplicated last node.

WDYT?

@kasiastreich
Copy link
Contributor Author

I thought about zero hashes, they are also fine, I can implement that if you think it's nicer. As to fields missing, from our previous conversations I understood that leaves should include inputs, outputs, attachments, commands and timestamp only (based on whitepaper tear-offs section: "Typically an oracle will be presented with the Merkle branches for the command or state that contains the data, and
the timestamp field, and nothing else."), unless you want that additional data to be just sent with FilteredTransaction.

@mikehearn
Copy link
Contributor

The hash of the transaction has to cover everything. When the white paper says "presented" it means that the Merkle branches for (and contents of) those fields are presented, not that two transactions with different notaries should have an identical Merkle root.

Yeah I think the zero hash is a bit simpler, right?

@kasiastreich
Copy link
Contributor Author

Ok, I misunderstood it at the beginning (conversations, jira etc.) - I will add type, notary, signers and implement zero hashes. Diagrams in the whitepaper also need to be changed.

…en level to padding leaves' list

with zero hash to size of the nearest power of 2 - so we always have a full binary tree.
…struction of Merkle trees - padding with

zero hashes and including all WireTransaction fields in id computation.
@kasiastreich
Copy link
Contributor Author

I addressed the comments.

return x.serialize(kryo).hash
}

val zeroHash = SecureHash.SHA256(ByteArray(32, { 0.toByte() }))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be in the SHA256 companion object rather than in this time. That way it's available for everyone.

val newHash = this.hash.hashConcat(right.hash)
return Node(newHash, this, right)
}

// Check if a tree is full binary tree. Returns the height of the tree if full, otherwise throws exception.
@Throws(MerkleTreeException::class)
fun checkFull(level: Int = 0): Int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be a public API? If not, make it private and you can take out the @throws annotation (it's only useful for java interop). If it is, make the documentation use standard kdoc / javadoc syntax /** */ so it shows up in the generated docs.


fun hashNodes(right: MerkleTree): MerkleTree {
private fun hashNodes(right: MerkleTree): MerkleTree {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this function can be moved into the function where it's used, as it has only one call site.

@@ -102,17 +105,22 @@ sealed class MerkleTree(val hash: SecureHash) {

/**
* Class that holds filtered leaves for a partial Merkle transaction. We assume mixed leaves types.
* Notice that we include only certain parts of wire transaction (no type, signers, notary). Timestamp is always added,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why not allow all parts of the tx to be present in the filtered tree? I don't get why it's limited here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It boils down to mixed information I got before. Why Oracle needs to see all signers of tx (I am again referring to tech WP - use case described there is different)? Or if it doesn't have to, we need to provide additional filters on transaction or user needs to check type, either way it's complication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this API would be as generic as possible: it'd let you select the subset of the tx you reveal. The current API is sufficient for the primary use case, but it wouldn't necessarily stretch to others.

@@ -42,7 +43,7 @@ class WireTransaction(
@Volatile @Transient private var cachedBytes: SerializedBytes<WireTransaction>? = null
val serialized: SerializedBytes<WireTransaction> get() = cachedBytes ?: serialize().apply { cachedBytes = this }

//We need cashed leaves hashes and whole tree for an id and Partial Merkle Tree calculation.
// We need cashed leaves hashes and whole tree for an id and Partial Merkle Tree calculation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"cached"

I'm not sure why we need this though. The comment says we do, but doesn't explain why. Why not just cache the id hash, why cache the intermediate parts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code is ancient, but I think that my reasoning behind this was that you may want to build many partial trees from one tx. Building is done using already calculated hash tree, probably not that costly though (but there is no upper bound on tx size). I guess it's not a problem so I can remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, if we find we need such a thing and need it so often that we have to optimise hash calculation, we can put it back. For now I'd like to keep it simple.

/**
* Build filtered transaction using provided filtering functions.
*/
fun buildFilteredTransaction(filterFuns: FilterFuns): FilteredTransaction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a bit of thinking around the FilterFuns API. It seems like it may be awkward to use from Java (functions with default arguments need a bit of annotating to work well from Java) and hard to extend in future. Why not just have an (Any) -> boolean function and expect the user to do type testing. That way, if we ever add a new kind of thing to transactions, existing code still works and doesn't break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some reading on interoperability, thanks.
I have few questions to how you see FilterFuns API (as I don't know other use cases of more general approach and I would love to learn some).

  • Do you want to have possibility of filtering every field of a tx? Or, say timestamp/notary/type is always included?
  • What checks will be performed after receiving filtered transaction. In Oracle case I see type checks and contents, however I can imagine some more elaborate verification that starts to complicate in a direction of subtransaction verification (if inputs/outputs are included).
  • Or for now, just don't worry and make it as simple as possible with moving all filtering (with one function) and checking to the user, but this approach is flattening the transaction - I am not sure if it's what we want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My inclination is to go for more flexibility here rather than less, as long as the receiver of a filtered TX can easily say "it must have components X, Y and Z" and the API requires them to do so. So trying to make the data format flexible but the receive API hard to misuse. So yeah - why not allow every field to be filtered.

The point here is that we will need to freeze the API this year, and then if we've provided something that's just not flexible enough and which we can't evolve, we'd need to provide a second API, which isn't so great.

entries.forEach { it.mapTo(resultHashes, { x -> serializedHash(x) }) }
if (timestamp != null) resultHashes.add(serializedHash(timestamp))
if (notary != null) resultHashes.add(serializedHash(notary))
resultHashes.add(serializedHash(mustSign.sortedBy { it.hashCode() }))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can assume that List<> is always ordered, or should be, so there is no reason to sort here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it dependent on the order the commands were added when building tx (TransactionBuilder)? Then you will have two txs with different ids and the same content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering is implicitly a part of the content in these sorts of structures. If you reverse the order of inputs, for instance, it's correct that this yields a different hash.

We don't really want to be incorporating java hashcodes into actual secure hashes used for transaction IDs. Ideally it is as close to hashing the serialised binary as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, 'ordering of commands' is a bad example, but there is a case of having transactions with the same contents ordered in exactly the same way on every field but signers in different. When you first add input with notary then commands, against: first commands then input. Is ordering of signers that important? It's a field derived from other fields in the transaction as a function of it's contents, not referentially transparent. Additionally I don't incorporate them in the tree separately - it's one hash - just want to be sure it's not wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mustSign field is nasty and I'd like to get rid of it, so this discussion may not matter much, but having the exact same tx hash result from the same semantically equivalent contents is not necessary.

…alculation.

Instead of many filtering functions over a transaction only one needs to be provided.
Additional change to check and verification of FilteredTransaction.
Make WireTransaction and FilteredLeaves traversable with list functions.
Copy link
Contributor

@mikehearn mikehearn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Just a few more things to resolve.

val tree = buildPartialTree(merkleRoot, includeHashes, usedHashes)
//Too much included hashes or different ones.
// Too much included hashes or different ones.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much -> Too many (indefinite vs definite quantities)

@@ -118,7 +131,7 @@ class PartialMerkleTree(val root: PartialTree) {
fun verify(merkleRootHash: SecureHash, hashesToCheck: List<SecureHash>): Boolean {
val usedHashes = ArrayList<SecureHash>()
val verifyRoot = verify(root, usedHashes)
//It means that we obtained more/less hashes than needed or different sets of hashes.
// It means that we obtained more/less hashes than needed or different sets of hashes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

less -> fewer (this is an obscure English grammar rule that many native speakers get wrong too!)

// We may want to specify our own behaviour on certain tx fields.
// Like if we include them at all, what to do with null values, if we treat list as one or not etc. for building
// torn-off transaction and id calculation.
fun <T>traverseWithListFun(traverseFun: (List<Any>) -> T): T {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after <T>

Why not just make this an extension function on WireTransaction, why introduce a new interface?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hmm, I guess it is so the same interface is implemented by a filtered transaction as well. OK, that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just provide a property that returns all the members as a list? Why insist that a function be applied to the result?

}

/**
* Holds filter functions on transactions fields.
* Functions are used to build a partial tree only out of some subset of original transaction fields.
* Class that holds filtered leaves for a partial Merkle transaction. We assume mixed leaves types, notice that every
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We assume mixed leaf types

@@ -101,40 +80,65 @@ sealed class MerkleTree(val hash: SecureHash) {
}

/**
* Class that holds filtered leaves for a partial Merkle transaction. We assume mixed leaves types.
* It assures that we always calculate hashes in the same order. Plus lets us define which fields of WireTransaction will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to TraversableTransaction? The "tx" abbreviation is usually kept for little local variables and other places where it's not a big deal to shorten. Also maybe the docs for this interface can be a bit clearer? The current phrasing is a bit awkward.

@@ -15,17 +15,17 @@ You can read more on the concept `here <https://en.wikipedia.org/wiki/Merkle_tre
Merkle trees in Corda
---------------------

Transactions are split into leaves, each of them contains either input, output, command or attachment. Other fields like
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason github won't let me comment on line 9, but there's a typo above this section: achive -> achieve

...
// Performing validation of obtained FilteredLeaves.
fun commandValidator(elem: Command): Boolean {
if (!(identity.owningKey in elem.signers && elem.value is Fix))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation of this line looks wrong.


In the Oracle example this step takes place in ``RatesFixFlow``:

.. container:: codeset

.. sourcecode:: kotlin

val flow = RatesFixFlow(partialTx, filterFuns, oracle, fixOf, "0.675".bd, "0.1".bd)
val flow = RatesFixFlow(partialTx, ::filtering, oracle, fixOf, "0.675".bd, "0.1".bd)


``FilteredTransaction`` holds ``filteredLeaves`` (data that we wanted to reveal) and Merkle branch for them.

.. container:: codeset

.. sourcecode:: kotlin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a way to include parts of source code files directly in the docsite - look at some other files for "literalinclude". That way you don't have to duplicate the code in the docs.

@@ -29,7 +28,7 @@ import java.util.*
*/
open class RatesFixFlow(protected val tx: TransactionBuilder,
/** Filtering functions over transaction, used to build partial transaction presented to oracle. */
private val filterFuns: FilterFuns,
private val filtering: (Any) -> Boolean, // Not the best solution, when lambda expression passed (serialization).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the function still needed after the first checkpoint? It could be marked as transient if we don't want it to survive checkpointing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's needed in the FixSignFlow subflow after querying the oracle for a fix (filtering is passed in assembleSharedTX to RatesFixFlow). I could just move it inside, but I guess it should belong to assembling transaction? If so, it should survive checkpointing.

}
}

fun filterAllCmds(elem: Any): Boolean {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be fun filterAllCmds(elem: Any) = elem is Command - but then maybe the function name is a bit misleading ("All" in a name normally implies it takes a collection)

@mikehearn
Copy link
Contributor

I thought a bit more about the function and the checkpointing.

You can store a lambda in a checkpoint. It works, technically. But the problem is that it's very likely that developers won't realise how much stuff they're capturing by doing that, and it could result in giant or unworkable checkpoints.

I suggest instead of passing in a lambda, you make the call site call a protected method. The user of the flow can subclass it in order to change what is filtered. This will encourage developers to pass in any data they need explicitly, and that will be more likely to lead to the scope of the checkpoint being limited. It should be a minor change, I think? You can note in the javadoc of the protected method that when overriding, be careful not to make the sub-class an anonymous or inner class that could capture huge chunks of the app.

@mikehearn
Copy link
Contributor

Other than that, it's all looking fine now. This is the last thing we need to consider, I believe.

…ixFlow class.

Comment on situation when capturing too much scope and connected problems with checkpointing.
Change oracle and tear-offs documentation.
@mikehearn
Copy link
Contributor

Looks good to me!

@kasiastreich kasiastreich merged commit 45d8e0f into master Feb 3, 2017
@kasiastreich kasiastreich deleted the kstreich-tearoff-fixes branch February 3, 2017 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants