New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
protocol hardfork: fixed-length OutputID replacing Outpoint #417
Conversation
e5b3127
to
2837c16
Compare
@@ -93,11 +93,10 @@ func (a *spendAction) Build(ctx context.Context, b *txbuilder.TemplateBuilder) e | |||
return nil | |||
} | |||
|
|||
func (m *Manager) NewSpendUTXOAction(outpoint bc.Outpoint) txbuilder.Action { | |||
func (m *Manager) NewSpendUTXOAction(outputid bc.OutputID) txbuilder.Action { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/outputid/outputID/
@@ -89,24 +99,23 @@ func (m *Manager) indexAccountUTXOs(ctx context.Context, b *bc.Block) error { | |||
} | |||
|
|||
// Delete consumed account UTXOs. | |||
deltxhash, delindex := prevoutDBKeys(b.Transactions...) | |||
deloutputids := prevoutDBKeys(b.Transactions...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/deloutputids/delOutputIDs/
prevoutHashes = append(prevoutHashes, outpoint.Hash[:]) | ||
prevoutIndexes = append(prevoutIndexes, outpoint.Index) | ||
oid := in.OutputID() | ||
prevoutOIDs = append(prevoutOIDs, oid[:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: OID
might get confused with Postgres's object identifiers. Maybe rename to prevoutIDs
?
@@ -7,6 +7,7 @@ | |||
|
|||
SET statement_timeout = 0; | |||
SET lock_timeout = 0; | |||
SET idle_in_transaction_session_timeout = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is Postgres 9.6 only. should probably delete this since some of us are still on 9.5
import ( | ||
"chain/encoding/blockchain" | ||
"io" | ||
"strconv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: import grouping
Thanks, @jbowens. Addressed all nits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment on the main description for the PR: that text will be used as the commit message, so the wording should probably be tweaked so it's not presented as a "proposal" but instead as "here's what we did".
} | ||
], | ||
"env": { | ||
"GO_INSTALL_PACKAGE_SPEC": "./cmd/...", | ||
"CMD": "cored" | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like this file didn't actually change, so ideally it shouldn't appear in the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, reverted.
return txbuilder.MissingFieldsError("output_id") | ||
} | ||
outid = *a.OutputID | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should collapse the error cases into one:
if a.OutputID == nil && (a.TxHash == nil || a.TxOut == nil) {
return txbuilder.MissingFieldsError("output_id")
}
if a.OutputID == nil {
outid = *a.OutputID
} else {
outid = bc.ComputeOutputID(*a.TxHash, *a.TxOut)
}
If someone is using the old interface, and using it incorrectly, such that they would have got a MissingFieldsError for transaction_id or position, they'll have to update their code anyway, at which point they might as well start using output_id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's harder to parse what's being meant here... But I get the point, how about this flat list of 3 options: 36ad7c0
@@ -51,8 +51,14 @@ func (m *Manager) indexAnnotatedAccount(ctx context.Context, a *Account) error { | |||
}) | |||
} | |||
|
|||
type output struct { | |||
type outputWithOutpoint struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the names output
and outputWithOutpoint
side by side, it seems like outputWithOutpoint
would have a superset of the field of output
, but actually it's the opposite. How do you feel about something like accountOutput
and output
(respectively) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Renamed to rawOutput
and accountOutput
: 31e2a03
@@ -107,4 +107,12 @@ var migrations = []migration{ | |||
{Name: "2017-01-19.0.asset.drop-mutable-flag.sql", SQL: ` | |||
ALTER TABLE assets DROP COLUMN definition_mutable; | |||
`}, | |||
{Name: "2017-01-20.0.core.add-output-id-to-outputs.sql", SQL: ` | |||
ALTER TABLE annotated_outputs ADD COLUMN output_id bytea NOT NULL; | |||
ALTER TABLE ONLY annotated_outputs ADD CONSTRAINT annotated_outputs_unique_output_id UNIQUE (output_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines are equivalent to:
ALTER TABLE annotated_outputs
ADD COLUMN output_id bytea UNIQUE NOT NULL;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Addressed: 3e96e6b
ALTER TABLE account_utxos ADD COLUMN output_id bytea NOT NULL; | ||
ALTER TABLE ONLY account_utxos ADD CONSTRAINT account_utxos_unique_output_id UNIQUE (output_id); | ||
ALTER TABLE account_utxos ADD COLUMN unspent_id bytea NOT NULL; | ||
ALTER TABLE ONLY account_utxos ADD CONSTRAINT account_utxos_unique_unspent_id UNIQUE (unspent_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do more than one thing in a single ALTER TABLE ...
statement. Consider this:
ALTER TABLE account_utxos
ADD COLUMN output_id bytea UNIQUE NOT NULL,
ADD COLUMN unspent_id bytea UNIQUE NOT NULL;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Addressed: 3e96e6b
) | ||
|
||
// OutputID identifies previous transaction output in transaction inputs. | ||
type OutputID Hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a case where embedding the other type would be preferable, since we want to keep all the methods.
// OutputID identifies previous transaction output in transaction inputs.
type OutputID struct{ Hash }
// UnspentID identifies and commits to unspent output.
type UnspentID struct{ Hash }
If we did that, converting from a Hash to OutputID and back would look like:
outputID = OutputID{hash}
hash = outputID.Hash
The memory layout is the same either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I didn't know about this feature. I've also tried to use outid.Bytes()
instead of tmp := outid.Hash; tmp[:]
when I need a byteslice. Is that the right way? 9042b59
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also tried to use outid.Bytes() instead of tmp := outid.Hash; tmp[:] when I need a byteslice. Is that the right way?
I see nothing wrong with writing outid.Hash[:]
, but I guess when the expression outid
is actually a function call, it can't be written without a temporary variable. The Bytes
method seems reasonable to me! But I wouldn't get dogmatic about using it everywhere; if outid.Hash[:]
in one line is also an option, that's fine too IMO.
} | ||
|
||
// WriteTo writes p to w. | ||
// It assumes w has sticky errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence isn't true; it should be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
o.WriteTo(w) | ||
return b.Bytes() | ||
func OutputKey(o bc.UnspentID) (bkey []byte) { | ||
// TODO(oleg): check if we no longer need this buffer writing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, we don't need it. The buffer writing was just to serialize the outpoint. Since an UnspentID's internal representation is the actual bytes we want to put in the tree, we can just return them as you're doing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, maybe we should inline (get rid of) this whole function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. And added convenience Bytes()->[]byte
methods to avoid creating local var all the time to do a slice. Is it a good idea? 7327600
@@ -193,7 +193,7 @@ func opIndex(vm *virtualMachine) error { | |||
return vm.pushInt64(int64(vm.inputIndex), true) | |||
} | |||
|
|||
func opOutpoint(vm *virtualMachine) error { | |||
func opOutputid(vm *virtualMachine) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: opOutputID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
Actually this might just consist of replacing the one word "proposal" with "solution" or something along those lines. |
0d678f9
to
607dbe9
Compare
… return output ids yet
9042b59
to
8d24965
Compare
Replaced by #421 |
**Problem** Outpoint is a variable-length structure `<txid>:<index>` which is 33-40 bytes long (33 bytes for most transactions). It is used by transaction inputs to identify exact output in the UTXO set ("Assets Merkle Tree"). The tree leafs contain `SHA3(output)` which allows save space and requires transactions to carry redundant copies of spent outputs to perform validation (otherwise nodes would have to store the entire outputs instead of their hashes — over 2x more data, and the ratio is much bigger in protocol 2). Also, for HSM-friendliness the TXSIGHASH must contain a redundant output's hash: `SHA3(txid || input index || SHA3(output))`. **Solution:** We define two new terms: * `OutputID = SHA3(TxHash || OutputIndex)` * `UnspentID = SHA3(OutputID || SHA3(OutputCommitment))` How are these used: 1. Transaction input contains **OutputID** to identify the output being spent. This is a unique identifier of the output. 2. Transaction input uses **second serialization flag** to indicate if it contains the entire previous Output Commitment, or its hash (instead of empty place). 3. UTXO set becomes a **proper set** containing **UnspentIDs** instead of `{Outpoint -> SHA3(OutputCommitment)}`. When a node validates a transaction, it computes `UnspentID` using provided `OutputID` and previous `OutputCommitment`. If the given unspent ID is present in the UTXO set, then previous output is proved to be both authentic and available for spending. **Upsides:** 1. The outputID is constant-size and shorter: 32 bytes instead of 33-40 bytes. This simplifies merkle tree design, transaction data structure and all pieces of software that need to handle outpoints. 2. All outputs (via unspentIDs) in the transaction are randomized across the Assets Merkle Tree instead of being crammed inside a common subpath `<txid>||...`. 3. Inputs automatically commit directly to the spent outputs, so TXSIGHASH does not need to do that and can be simplified to `SHA3(txid || input index)`. HSM is able to verify which output this input commits to without having access to the entire parent transaction. 4. We keep the term _outpoint_ to mean a pair `(txid, index)`, but is internal to Chain Core to support random access to UTXOs. Validation protocol no longer uses outpoints. 5. UTXO takes 2x less RAM because it only contains unpent IDs (32 bytes) instead of a key-value pair (64+ bytes). 6. When we get to _tx entries_ design, we'll generalize the idea of OutputID to EntryID, so that any entry can have a unique identifier. **Downsides:** 1. OutputID no longer indicates the transaction ID which makes it impossible to navigate the chain of transactions without also having a mapping `outpoint -> txid:index`. UTXO tree is not enough as it's only reflecting the latest state of the chain and throws away spent outpoints. Note that in order to navigate the transactions in practice one still needs the mapping `txid -> tx`, so maintaining one more index might not be a significant increase in complexity. Chain is doing this indexing already and we keep that mapping. 2. Chain Core no longer returns (txid,position) pair for annotated txinputs (called `spent_output:{transaction_id:String,position:Int}`), but instead returns output_id (`spent_output_id:String`). To maintain full compatibility, we'd need to make an additional request to locate the previous output's txid and position, but I'm not sure any application actually relies on such historical data. For spending (locating unspents), we fully maintain compatibility with clients using (txid,position) pairs. This is a part of a package of breaking changes in P1: #239 See previous reviews: #417 Closes #421
REPLACED BY #421
Problem
Outpoint is a variable-length structure
<txid>:<index>
which is 33-40 bytes long (33 bytes for most transactions). It is used by transaction inputs to identify exact output in the UTXO set ("Assets Merkle Tree"). The tree leafs containSHA3(output)
which allows save space and requires transactions to carry redundant copies of spent outputs to perform validation (otherwise nodes would have to store the entire outputs instead of their hashes — over 2x more data, and the ratio is much bigger in protocol 2). Also, for HSM-friendliness the TXSIGHASH must contain a redundant output's hash:SHA3(txid || input index || SHA3(output))
.Solution:
We define two new terms:
OutputID = SHA3(TxHash || OutputIndex)
UnspentID = SHA3(OutputID || SHA3(OutputCommitment))
How are these used:
{Outpoint -> SHA3(OutputCommitment)}
.When a node validates a transaction, it computes
UnspentID
using providedOutputID
and previousOutputCommitment
. If the given unspent ID is present in the UTXO set, then previous output is proved to be both authentic and available for spending.Upsides:
<txid>||...
.SHA3(txid || input index)
. HSM is able to verify which output this input commits to without having access to the entire parent transaction.(txid, index)
, but is internal to Chain Core to support random access to UTXOs. Validation protocol no longer uses outpoints.Downsides:
outpoint -> txid:index
. UTXO tree is not enough as it's only reflecting the latest state of the chain and throws away spent outpoints. Note that in order to navigate the transactions in practice one still needs the mappingtxid -> tx
, so maintaining one more index might not be a significant increase in complexity. Chain is doing this indexing already and we keep that mapping.spent_output:{transaction_id:String,position:Int}
), but instead returns output_id (spent_output_id:String
). To maintain full compatibility, we'd need to make an additional request to locate the previous output's txid and position, but I'm not sure any application actually relies on such historical data. For spending (locating unspents), we fully maintain compatibility with clients using (txid,position) pairs.This is a part of a package of breaking changes in P1: #239
REPLACED BY #421