protocol hardfork: fixed-length OutputID replacing Outpoint #417

oleganza · 2017-01-20T20:11:42Z

REPLACED BY #421

Problem

Outpoint is a variable-length structure <txid>:<index> that is 33-40 bytes long (33 bytes for most transactions). Transaction inputs use it to identify the exact output in the UTXO set ("Assets Merkle Tree"). The tree leaves contain SHA3(output), which saves space but requires transactions to carry redundant copies of spent outputs for validation (otherwise nodes would have to store the entire outputs instead of their hashes: over 2x more data, and the ratio is much bigger in protocol 2). Also, for HSM-friendliness the TXSIGHASH must contain a redundant hash of the output: SHA3(txid || input index || SHA3(output)).

Solution:

We define two new terms:

OutputID = SHA3(TxHash || OutputIndex)
UnspentID = SHA3(OutputID || SHA3(OutputCommitment))

How are these used:

Transaction input contains OutputID to identify the output being spent. This is a unique identifier of the output.
Transaction input uses a second serialization flag to indicate whether it contains the entire previous Output Commitment or only its hash (instead of leaving that place empty).
UTXO set becomes a proper set containing UnspentIDs instead of a mapping {Outpoint -> SHA3(OutputCommitment)}.

When a node validates a transaction, it computes the UnspentID from the provided OutputID and the previous OutputCommitment. If that UnspentID is present in the UTXO set, the previous output is proved to be both authentic and available for spending.
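The two definitions and the validation check can be sketched in Go roughly as follows. This is a minimal illustration, not the code in this PR: SHA-256 stands in for SHA3 so the sketch needs only the standard library, the 4-byte little-endian index encoding is an assumption (the description does not specify the serialization), and `UTXOSet` and `CanSpend` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Hash is a 32-byte digest.
type Hash [32]byte

// hash256 is a stand-in for the protocol's SHA3 (assumption: SHA-256 is
// used here only to keep the sketch standard-library-only).
func hash256(data []byte) Hash { return Hash(sha256.Sum256(data)) }

// ComputeOutputID returns hash(txHash || index).
// Assumption: the index is encoded as 4 little-endian bytes.
func ComputeOutputID(txHash Hash, index uint32) Hash {
	var buf [36]byte
	copy(buf[:32], txHash[:])
	binary.LittleEndian.PutUint32(buf[32:], index)
	return hash256(buf[:])
}

// ComputeUnspentID returns hash(outputID || hash(outputCommitment)).
func ComputeUnspentID(outputID Hash, outputCommitment []byte) Hash {
	commitmentHash := hash256(outputCommitment)
	return hash256(append(outputID[:], commitmentHash[:]...))
}

// UTXOSet is a plain set of unspent IDs (hypothetical type).
type UTXOSet map[Hash]struct{}

// CanSpend recomputes the unspent ID from the input's output ID and the
// previous output commitment, then checks set membership.
func (s UTXOSet) CanSpend(outputID Hash, prevCommitment []byte) bool {
	_, ok := s[ComputeUnspentID(outputID, prevCommitment)]
	return ok
}

func main() {
	txHash := hash256([]byte("example transaction"))
	commitment := []byte("example output commitment")
	outputID := ComputeOutputID(txHash, 0)

	utxos := UTXOSet{ComputeUnspentID(outputID, commitment): {}}
	fmt.Println(utxos.CanSpend(outputID, commitment))       // true
	fmt.Println(utxos.CanSpend(outputID, []byte("forged"))) // false
}
```

A forged commitment produces a different unspent ID, so a single membership check covers both authenticity and availability.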

Upsides:

The outputID is constant-size and shorter: 32 bytes instead of 33-40 bytes. This simplifies merkle tree design, transaction data structure and all pieces of software that need to handle outpoints.
All outputs (via unspentIDs) in the transaction are randomized across the Assets Merkle Tree instead of being crammed inside a common subpath <txid>||....
Inputs automatically commit directly to the spent outputs, so TXSIGHASH does not need to do that and can be simplified to SHA3(txid || input index). HSM is able to verify which output this input commits to without having access to the entire parent transaction.
We keep the term outpoint to mean a pair (txid, index), but it is internal to Chain Core, where it supports random access to UTXOs. The validation protocol no longer uses outpoints.
The UTXO set takes 2x less RAM because it only contains unspent IDs (32 bytes) instead of key-value pairs (64+ bytes).
When we get to tx entries design, we'll generalize the idea of OutputID to EntryID, so that any entry can have a unique identifier.

Downsides:

OutputID no longer indicates the transaction ID, which makes it impossible to navigate the chain of transactions without also having a mapping OutputID -> txid:index. The UTXO tree is not enough, as it only reflects the latest state of the chain and throws away spent outputs. Note that in order to navigate the transactions in practice one still needs the mapping txid -> tx, so maintaining one more index might not be a significant increase in complexity. Chain is doing this indexing already and we keep that mapping.
Chain Core no longer returns (txid,position) pair for annotated txinputs (called spent_output:{transaction_id:String,position:Int}), but instead returns output_id (spent_output_id:String). To maintain full compatibility, we'd need to make an additional request to locate the previous output's txid and position, but I'm not sure any application actually relies on such historical data. For spending (locating unspents), we fully maintain compatibility with clients using (txid,position) pairs.
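The extra index mentioned in the first downside could look roughly like the sketch below. All names here (`Outpoint`, `OutpointIndex`, `Record`, `Lookup`) are hypothetical and only illustrate the idea of a side table from output ID back to (txid, index); unlike the UTXO set, entries would be kept after the output is spent.

```go
package main

import "fmt"

// Hash is a 32-byte digest.
type Hash [32]byte

// Outpoint is the (txid, index) pair, kept only for navigation.
type Outpoint struct {
	TxID  Hash
	Index uint32
}

// OutpointIndex maps an output ID back to the outpoint that created it.
// Entries are never deleted on spend, so history stays navigable.
type OutpointIndex map[Hash]Outpoint

// Record stores the outpoint for a newly created output.
func (ix OutpointIndex) Record(outputID Hash, op Outpoint) { ix[outputID] = op }

// Lookup returns the outpoint for an output ID, if it was recorded.
func (ix OutpointIndex) Lookup(outputID Hash) (Outpoint, bool) {
	op, ok := ix[outputID]
	return op, ok
}

func main() {
	ix := OutpointIndex{}
	outputID := Hash{0x01} // placeholder value, not a real output ID
	ix.Record(outputID, Outpoint{TxID: Hash{0xaa}, Index: 2})

	if op, ok := ix.Lookup(outputID); ok {
		fmt.Println(op.Index) // 2
	}
}
```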

This is a part of a package of breaking changes in P1: #239

oleganza · 2017-01-20T22:40:06Z

PTAL @kr @jbowens @tessr @erykwalder @dominic @jeffomatic @danrobinson @devongundry

jbowens · 2017-01-21T01:42:37Z

core/account/builder.go

@@ -93,11 +93,10 @@ func (a *spendAction) Build(ctx context.Context, b *txbuilder.TemplateBuilder) e
 	return nil
 }

-func (m *Manager) NewSpendUTXOAction(outpoint bc.Outpoint) txbuilder.Action {
+func (m *Manager) NewSpendUTXOAction(outputid bc.OutputID) txbuilder.Action {


nit: s/outputid/outputID/

jbowens · 2017-01-21T01:45:00Z

core/account/indexer.go

@@ -89,24 +99,23 @@ func (m *Manager) indexAccountUTXOs(ctx context.Context, b *bc.Block) error {
 	}

 	// Delete consumed account UTXOs.
-	deltxhash, delindex := prevoutDBKeys(b.Transactions...)
+	deloutputids := prevoutDBKeys(b.Transactions...)


nit: s/deloutputids/delOutputIDs/

jbowens · 2017-01-21T01:50:31Z

core/query/index.go

-				prevoutHashes = append(prevoutHashes, outpoint.Hash[:])
-				prevoutIndexes = append(prevoutIndexes, outpoint.Index)
+				oid := in.OutputID()
+				prevoutOIDs = append(prevoutOIDs, oid[:])


nit: OID might get confused with Postgres's object identifiers. Maybe rename to prevoutIDs?

jbowens · 2017-01-21T01:51:35Z

core/schema.sql

@@ -7,6 +7,7 @@

 SET statement_timeout = 0;
 SET lock_timeout = 0;
+SET idle_in_transaction_session_timeout = 0;


this is Postgres 9.6 only. should probably delete this since some of us are still on 9.5

jbowens · 2017-01-21T01:53:28Z

protocol/bc/outpoint.go

+import (
+	"chain/encoding/blockchain"
+	"io"
+	"strconv"


nit: import grouping

oleganza · 2017-01-21T20:40:54Z

Thanks, @jbowens. Addressed all nits.

kr

Minor comment on the main description for the PR: that text will be used as the commit message, so the wording should probably be tweaked so it's not presented as a "proposal" but instead as "here's what we did".

kr · 2017-01-21T23:49:42Z

app.json

    }
  ],
  "env": {
    "GO_INSTALL_PACKAGE_SPEC": "./cmd/...",
    "CMD": "cored"
  }
-}
+}


nit: looks like this file didn't actually change, so ideally it shouldn't appear in the diff.

yup, reverted.

kr · 2017-01-22T00:03:47Z

core/account/builder.go

+			return txbuilder.MissingFieldsError("output_id")
+		}
+		outid = *a.OutputID
+	}


Maybe this should collapse the error cases into one:

if a.OutputID == nil && (a.TxHash == nil || a.TxOut == nil) {
	return txbuilder.MissingFieldsError("output_id")
}
if a.OutputID != nil {
	outid = *a.OutputID
} else {
	outid = bc.ComputeOutputID(*a.TxHash, *a.TxOut)
}

If someone is using the old interface, and using it incorrectly, such that they would have got a MissingFieldsError for transaction_id or position, they'll have to update their code anyway, at which point they might as well start using output_id.
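For illustration, a self-contained sketch of the collapsed check: prefer output_id when it is present, fall back to (transaction_id, position), and report a single missing-field error otherwise. The `spendRequest` type, `resolveOutputID`, the error value, and the SHA-256-based `computeOutputID` are all hypothetical stand-ins, not the code in `core/account/builder.go`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// Hash is a 32-byte digest.
type Hash [32]byte

// spendRequest holds the optional fields a client may send (hypothetical).
type spendRequest struct {
	OutputID *Hash
	TxHash   *Hash
	TxOut    *uint32
}

var errMissingOutputID = errors.New("missing field: output_id")

// computeOutputID stands in for bc.ComputeOutputID.
// Assumptions: SHA-256 instead of SHA3, 4-byte little-endian index.
func computeOutputID(txHash Hash, index uint32) Hash {
	var buf [36]byte
	copy(buf[:32], txHash[:])
	binary.LittleEndian.PutUint32(buf[32:], index)
	return Hash(sha256.Sum256(buf[:]))
}

// resolveOutputID collapses all error cases into one "output_id" error.
func resolveOutputID(r spendRequest) (Hash, error) {
	if r.OutputID == nil && (r.TxHash == nil || r.TxOut == nil) {
		return Hash{}, errMissingOutputID
	}
	if r.OutputID != nil {
		return *r.OutputID, nil
	}
	return computeOutputID(*r.TxHash, *r.TxOut), nil
}

func main() {
	tx, pos := Hash{0x01}, uint32(0)

	_, err := resolveOutputID(spendRequest{TxHash: &tx}) // position missing
	fmt.Println(err)                                     // missing field: output_id

	id, err := resolveOutputID(spendRequest{TxHash: &tx, TxOut: &pos})
	fmt.Println(err == nil, id == computeOutputID(tx, pos)) // true true
}
```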

It's harder to tell what is meant there... But I get the point. How about this flat list of 3 options: 36ad7c0

kr · 2017-01-22T00:07:44Z

core/account/indexer.go

@@ -51,8 +51,14 @@ func (m *Manager) indexAnnotatedAccount(ctx context.Context, a *Account) error {
 	})
 }

-type output struct {
+type outputWithOutpoint struct {


Looking at the names output and outputWithOutpoint side by side, it seems like outputWithOutpoint would have a superset of the fields of output, but actually it's the opposite. How do you feel about something like accountOutput and output (respectively) instead?

Good call. Renamed to rawOutput and accountOutput: 31e2a03

kr · 2017-01-22T00:15:48Z

core/migrate/data.go

@@ -107,4 +107,12 @@ var migrations = []migration{
 	{Name: "2017-01-19.0.asset.drop-mutable-flag.sql", SQL: `
 		ALTER TABLE assets DROP COLUMN definition_mutable;
 	`},
+	{Name: "2017-01-20.0.core.add-output-id-to-outputs.sql", SQL: `
+		ALTER TABLE annotated_outputs ADD COLUMN output_id bytea NOT NULL;
+		ALTER TABLE ONLY annotated_outputs ADD CONSTRAINT annotated_outputs_unique_output_id UNIQUE (output_id);


These two lines are equivalent to:

ALTER TABLE annotated_outputs ADD COLUMN output_id bytea UNIQUE NOT NULL;

Thanks. Addressed: 3e96e6b

kr · 2017-01-22T00:18:30Z

core/migrate/data.go

+		ALTER TABLE account_utxos ADD COLUMN output_id bytea NOT NULL;
+		ALTER TABLE ONLY account_utxos ADD CONSTRAINT account_utxos_unique_output_id UNIQUE (output_id);
+		ALTER TABLE account_utxos ADD COLUMN unspent_id bytea NOT NULL;
+		ALTER TABLE ONLY account_utxos ADD CONSTRAINT account_utxos_unique_unspent_id UNIQUE (unspent_id);


You can do more than one thing in a single ALTER TABLE ... statement. Consider this:

ALTER TABLE account_utxos ADD COLUMN output_id bytea UNIQUE NOT NULL, ADD COLUMN unspent_id bytea UNIQUE NOT NULL;

Thanks. Addressed: 3e96e6b

kr · 2017-01-22T00:48:30Z

protocol/bc/outputid.go

+)
+
+// OutputID identifies previous transaction output in transaction inputs.
+type OutputID Hash


This seems like a case where embedding the other type would be preferable, since we want to keep all the methods.

// OutputID identifies previous transaction output in transaction inputs.
type OutputID struct{ Hash }

// UnspentID identifies and commits to unspent output.
type UnspentID struct{ Hash }

If we did that, converting from a Hash to OutputID and back would look like:

outputID = OutputID{hash}
hash = outputID.Hash

The memory layout is the same either way.

Thanks, I didn't know about this feature. I've also tried to use outid.Bytes() instead of tmp := outid.Hash; tmp[:] when I need a byteslice. Is that the right way? 9042b59

I've also tried to use outid.Bytes() instead of tmp := outid.Hash; tmp[:] when I need a byteslice. Is that the right way?

I see nothing wrong with writing outid.Hash[:], but I guess when the expression outid is actually a function call, it can't be written without a temporary variable. The Bytes method seems reasonable to me! But I wouldn't get dogmatic about using it everywhere; if outid.Hash[:] in one line is also an option, that's fine too IMO.
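A small sketch of the difference discussed in this thread. A defined type (`type X Hash`) shares the underlying array but has none of `Hash`'s methods, while a struct that embeds `Hash` gets them promoted. The `String` and `Bytes` methods, `DefinedID`, and `lookupOutputID` below are illustrative assumptions, not the actual `bc` package.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Hash is a 32-byte digest with a couple of methods.
type Hash [32]byte

// String returns the hex encoding of the hash.
func (h Hash) String() string { return hex.EncodeToString(h[:]) }

// Bytes returns a copy of the hash as a byte slice.
func (h Hash) Bytes() []byte { return h[:] }

// DefinedID is a defined type: same underlying array, no inherited methods.
type DefinedID Hash

// OutputID embeds Hash, so String and Bytes are promoted to OutputID.
type OutputID struct{ Hash }

// lookupOutputID is a hypothetical function returning an OutputID by value.
func lookupOutputID() OutputID { return OutputID{Hash{0xab}} }

func main() {
	h := Hash{0x01}

	outputID := OutputID{h} // Hash -> OutputID
	back := outputID.Hash   // OutputID -> Hash
	fmt.Println(back == h)  // true

	// Promoted method: no conversion needed.
	fmt.Println(len(outputID.Bytes())) // 32

	// lookupOutputID().Hash[:] does not compile, because a function result
	// is not addressable; the Bytes method avoids the temporary variable.
	b := lookupOutputID().Bytes()
	fmt.Println(b[0] == 0xab) // true

	// With a defined type, d.String() would not compile; an explicit
	// conversion back to Hash is required first.
	d := DefinedID(h)
	fmt.Println(Hash(d).String() == h.String()) // true
}
```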

kr · 2017-01-22T00:57:45Z

protocol/bc/outputid.go

+}
+
+// WriteTo writes p to w.
+// It assumes w has sticky errors.


This sentence isn't true; it should be deleted.

kr · 2017-01-22T01:08:51Z

protocol/state/outputs.go

-	o.WriteTo(w)
-	return b.Bytes()
+func OutputKey(o bc.UnspentID) (bkey []byte) {
+	// TODO(oleg): check if we no longer need this buffer writing.


You're right, we don't need it. The buffer writing was just to serialize the outpoint. Since an UnspentID's internal representation is the actual bytes we want to put in the tree, we can just return them as you're doing here.

In fact, maybe we should inline (get rid of) this whole function.

Removed. And added convenience Bytes()->[]byte methods to avoid creating local var all the time to do a slice. Is it a good idea? 7327600

kr · 2017-01-22T01:11:08Z

protocol/vm/introspection.go

@@ -193,7 +193,7 @@ func opIndex(vm *virtualMachine) error {
 	return vm.pushInt64(int64(vm.inputIndex), true)
 }

-func opOutpoint(vm *virtualMachine) error {
+func opOutputid(vm *virtualMachine) error {


nit: opOutputID

kr · 2017-01-22T01:19:09Z

wording should probably be tweaked

Actually this might just consist of replacing the one word "proposal" with "solution" or something along those lines.

oleganza · 2017-01-23T00:47:20Z

Replaced by #421

See previous reviews: #417 Closes #421

oleganza added the blockchain reset label Jan 20, 2017

This was referenced Jan 20, 2017

Breaking changes to the Protocol 1 #239

Closed

REPLACED protocol: [hardfork] fixed-length OutputID replacing Outpoint #270

Closed

oleganza force-pushed the protocol-outputid branch from e5b3127 to 2837c16 Compare January 20, 2017 23:56

jbowens reviewed Jan 21, 2017

kr reviewed Jan 22, 2017

oleganza force-pushed the protocol-outputid branch from 0d678f9 to 607dbe9 Compare January 22, 2017 07:27

oleganza added 20 commits January 22, 2017 16:29

squashed on top of new refactoring

45028a4

fixed up block binary for the benchmark

0879a85

do not recompute outputid unnecessarily

6d4b810

compat layer to support both txid:index and output_id in API

2651c2f

output_id support in Ruby SDK

8835abc

updated java sdk to support output_id

84c2a24

replace examples and test code using setPosition/setTransactionId with setOutputId

91451b6

hide output navigation from the sample code snippets

069a6a8

add output_id to unspent docs

551b431

update docs for api objects

3c53474

typo fix

6e98c58

replace calls to spentOutput with spentOutputId

d8febc7

semicolon, rly?

92cee94

make the examples execute w/o output_id since SubmitResponse does not return output ids yet

c11d04c

making jfmt happy

452d002

added outputId to node SDK

aab7e2c

addressed nits

1dd20c7

txin.OutputID renamed to txin.SpentOutputID for clarity and to match API

fa3b0b9

revert formatting in app.json

e3f444f

reorganize error checking to follow them easier

2990520

oleganza added 6 commits January 22, 2017 16:29

(output,outputWithOutpoint) -> (accountOutput,rawOutput)

f5fd7ac

cleaner schema migration

34d1788

removed incorrect comment

5fd2ed9

better name for opOutputID

c1e3952

eliminate OutputKey(), add Bytes() method to avoid assigning a nasty local variable

d8b0eff

embed Hash into OutputID and UnspentID types

8d24965

oleganza force-pushed the protocol-outputid branch from 9042b59 to 8d24965 Compare January 23, 2017 00:29

oleganza mentioned this pull request Jan 23, 2017

protocol/state: replace Outpoint with OutputID #421

Merged

oleganza closed this Jan 23, 2017

oleganza deleted the protocol-outputid branch January 26, 2017 04:38

jeffomatic mentioned this pull request Feb 13, 2017

Spec deviation: input writes prevout unconditionally for all serialization flags #381

Closed
