Add a getutxos command to the p2p protocol #4351

Merged
merged 1 commit into from Aug 25, 2014

Conversation

Contributor

mikehearn commented Jun 16, 2014

Introduction

The getutxos command allows querying the UTXO set given a set of outpoints. It has a simple implementation and the results are not authenticated in any way. Despite this, there are times when it is a useful capability to have. I believe @jgarzik also has a use case for this, though I don't know what it is.

As a motivating example I present Lighthouse, an app I'm writing that implements assurance contracts:

http://blog.vinumeris.com/2014/05/17/lighthouse/

Lighthouse works by collecting pledges, which contain an invalid transaction signed with SIGHASH_ANYONECANPAY. Once sufficient pledges are collected to make the combination valid, we say the contract is complete and it can be broadcast onto the network, spending the pledged outputs. Before that occurs however, a pledge can be revoked and the pledged money redeemed by double spending the pledged output. For instance you might want to do this if it becomes clear not enough people care about the assurance contract for it to reach its goal in a timely manner, or if you simply need the money back due to some kind of cashflow crunch.

It is convenient to be able to see when a pledge has been revoked, so the user interface can be updated, and so when the final contract is created revoked pledges can be left out. For this purpose "getutxos" is used.

Protocol

The getutxos message takes a boolean which controls whether outputs in the memory pool are considered, and a vector of COutPoint structures. It returns a bitmap with the same number of bits as outpoints queried, rounded up to the nearest multiple of 8, followed by a list of CTxOut structures, one for each set bit in the bitmap. The bitmap encodes whether each queried UTXO was found (i.e. is indeed unspent).

Authentication

The results of getutxos are not authenticated. This is because the obvious way to authenticate them requires the work maaku has been doing on UTXO commitments to be merged and activated by default, miners to upgrade, and a forking change to be made that enforces their accuracy. All of that is a big project that may or may not ever come to fruition.

For the Lighthouse security model however, this doesn't matter much. The reason is that the pledge transactions you're getting (which may be malicious) don't come from the P2P network. They come in the form of files either from a simple rendezvous server, or e.g. a shared folder or email attachments. The people sending these files have no way to influence the choice of peers made by the app. Once the outputs are returned, they are used to check the signatures on the pledge, thus verifying that the pledge spends the UTXO returned by the P2P network.

So we can be attacked in the following ways:

  • The pledge may be attempting to pledge non-existent outputs, but this will be detected if the majority of peers are honest.
  • The peers may be malicious and return a wrong or bogus output, but this will be detected when the signature is checked; the exception is the value (!), which we want to fix anyway by including the value in the sighash at some point, because we need that to make the TREZOR efficient/faster.
  • The peers may bogusly claim no such UTXO exists when it does, but this would result in the pledge being seen as invalid. When the project creator asks the pledgor why they revoked their money, and learns that in fact they never did, the bogus peers will be detected. No money is at risk as the pledges cannot be spent.
  • If the pledgor AND all the peers collaborate (i.e. the pledgor controls your internet connection) then they can make you believe you have a valid pledge when you don't. This would result in the app getting "jammed" and attempting to close an uncloseable contract. No money is at risk and the user will eventually wonder why their contract is not confirming. Once they get to a working internet connection the subterfuge will be discovered.

There is a final issue: the answer to getutxos can of course change the instant the result is generated, thus leading you to construct an uncloseable transaction if the process of revocation races with the construction. The app can detect this by watching for either a reject message, or an inv requesting one of the inputs that is supposed to be in the UTXO set (i.e. the remote peer thinks it's an orphan). This can then cause the app to re-request the UTXO set and drop the raced pledge.

In practice I do not anticipate such attacks are likely to occur, as they're complicated to pull off and it's not obvious what the gain is.

There may be other apps that wish to use getutxos, with different security models. They may find this useful despite the lack of UTXO commitments, and the fact that the answer can change a moment later, if:

  • They are connecting to a trusted peer, i.e. localhost.
  • They trust their internet connection and peer selection, i.e. because they don't believe their datacenter or ISP will commit financial fraud against them, or they are tunnelling via endpoints they trust like a randomly chosen Tor exit.
  • They are not using the response for anything important or worth attacking, like some kind of visualisation.

Upgrade

If enforced UTXO commitments are added to the block chain in future, it would make sense to rev the P2P protocol to add the proofs (merkle branches) to the response.

Testing

I attempted to write unit tests for this, but Core has no infrastructure for building test chains. The miner_tests.cpp code does it, but at the cost of not allowing any other unit test to do so, as it doesn't reset or clean up the global state afterwards! I tried to fix this and ended up down a giant rabbit hole.

So instead I've tested it with a local test app I wrote, which also exercises the client side part in bitcoinj.

BIP

If the code is ACKd then I will write a short BIP explaining the new message.

Philosophy

On IRC I have discussed this patch a little bit before. One objection that was raised is that we shouldn't add anything to the P2P protocol unless it's unattackable, because otherwise it's a sharp knife that people might use to cut themselves.

I personally disagree with this notion for the following reasons.

Firstly, many parts of the P2P protocol are not completely unattackable: malicious remote nodes can withhold broadcast transactions, invent fictional ones (you'd think they're orphans), omit Bloom filter responses, send addr messages for IPs that were never announced, etc. We shouldn't hold new messages to a standard that existing messages don't meet.

Secondly, even with UTXO commitments in the block chain, given the sick state of mining this only requires a collaboration of two people to undo, although that failure would be publicly detectable which is at least something. But anyway, there's a clean upgrade path if/when UTXO authentication becomes available.

Thirdly, we have a valid use case that's actually implemented. This isn't some academic pie in the sky project.

Finally, Bitcoin is already the sharpest knife imaginable. I don't think we should start rejecting useful features on the grounds that someone else might screw up with them.

If the above analysis ends up not holding for some reason, and people do get attacked due to the lack of authentication, then Lighthouse and other apps can always fall back to connecting to trusted nodes (perhaps over SSL). But I would rather optimistically assume success up front and see what happens than pessimistically assume the worst and centralise things up front.

Contributor

petertodd commented Jun 16, 2014

Why is there absolutely no privacy at all in this feature? You could easily search by prefix rather than being forced to always give the peer the exact outputs you are interested in. (recall how leveldb queries work re: the iterators)

Also, re: security, Lighthouse is particularly bad as lying about UTXOs - falsely claiming they don't exist/are spent when they are unspent - can certainly lead to serious exploits where clients are fooled into thinking an assurance contract is not fully funded when in fact it is over-funded, leading to large fees being paid to miners. You've not only got a potential exploit, you've got a strong financial motivation to exploit it.


Contributor

petertodd commented Jun 16, 2014

One last thing: needs a NODE_GETUTXO service bit - having an unencrypted copy of the UTXO set is definitely a service that not all nodes can be expected to have. (recall @gmaxwell's clever suggestion of self-encrypting the UTXO set to avoid issues around storage of problematic data)


Contributor

mikehearn commented Jun 16, 2014

If the app thinks a pledge is revoked it won't be included in the contract that is broadcast, so it can't lead to overpayment.

Re: encrypted UTXO set. That makes no sense. Nodes must be able to do this lookup internally to operate. Gregory's suggestion was to obfuscate the contents on disk only to avoid problems with silly AV scanners, not that the node itself can't read its own database.

There is no prefix filtering because that would complicate the implementation considerably. You are welcome to implement such an upgrade in a future patch, if you like.


Contributor

petertodd commented Jun 16, 2014

If the app thinks a pledge is revoked it won't be included in the contract that is broadcast, so it can't lead to overpayment.

The attacker would of course broadcast the pledges themselves; pledges are public information.

Re: encrypted UTXO set. That makes no sense.

I was thinking in the case where privacy is implemented, but actually on second thought my complaint is invalid for this implementation as you're not returning the UTXO data associated with the UTXO.

There is no prefix filtering because that would complicate the implementation considerably. You are welcome to implement such an upgrade in a future patch, if you like.

You can query leveldb with the prefix, get a cursor in return, then just scan the cursor until the end of the prefix range. There's no good reason to create yet more infrastructure with zero privacy, and the lack of privacy makes attacking specific targets without being detected much easier.


Contributor

mikehearn commented Jun 16, 2014

Pledges are not public. You're making assumptions about the design without understanding it.

Your second statement is nonsensical. The code does return "the UTXO data associated with the UTXO", what else would it do?

Your third statement is something I already know: I am the one who implemented LevelDB for Bitcoin. My point stands. This patch is what it is. If you'd like it to be better feel free to contribute code to make it so.


Contributor

petertodd commented Jun 16, 2014

Pledges are not public. You're making assumptions about the design without understanding it.

Either pledges are public information and can be attacked, or they are not and some single user is running the crowdfund, (the project owner) in which case the overhead of just using existing systems is not a big deal. In particular, in the "single project owner design" all pledges can easily be added to a single bloom filter and the chain scanned to keep the state of spent/unspent up to date at the same low cost as keeping an SPV wallet up-to-date. (remember that users are going to be pledging specific amounts and/or using a specific wallet for their pledges, so in the vast majority of cases you'll need to create a transaction output for that pledge, which means the bloom filter behavior is identical to that of a standard SPV wallet)

Relying on pledges not being public information for your security is a rather risky design with difficult to predict consequences. Easy to imagine, for instance, a user publishing a list of pledge amounts showing the progress of their campaign, and that list being used for the attack. (trick project owner into publishing an invalid tx with an input spent, record signatures, then make it look like other pledges are now spent and get more pledges) Even a "multiple utxo's is one pledge" design can easily fall to an attacker who just guesses what UTXO's are probably part of the pledge based on amounts pledged. Again, all attacks that are much more difficult to pull off if the app isn't giving away exact info on what transaction outputs it's looking for. (although UTXO anonymity does suffer from the inherent problem that UTXO size grows indefinitely, so your k-anonymity set is much weaker than it looks as many entries can be ignored due to old age - another argument for the bloom filter alternative)

Your second statement is nonsensical. The code does return "the UTXO data associated with the UTXO", what else would it do?

It can return only a spent/unspent bit. What's the use-case for requiring more than that? It's easy to foresee this encouraging applications to abuse the UTXO set as a database. (back to the policy question: do we really want to redefine NODE_NETWORK as being able to provide UTXO data to peers as well?)


Member

laanwj commented Jun 17, 2014

Looks OK to me, implementation-wise. Talking of testing, if you cannot integrate testing into the unit test suite for some reason, I think at least some Python script should be included in qa/... to be able to test this functionality. Such a test script could create a node, import a bootstrap file, and launch getutxos queries at it.

I think we are at the point where we need to define an extension mechanism - whether that's done with a NODE_* bit or some other way doesn't matter. That would encourage experimentation, and I think this is an excellent use case for one. There is no need to force all NODE_NETWORK nodes above a certain version to provide a specific query service. Alternative implementations (obelisk, btcd) may or may not want to implement this, and may want to experiment with their own extensions. The bootstrapping network then needs a way to find only nodes that provide a certain extension. Let's not repeat the bloom debacle.

Contributor

petertodd commented Jun 17, 2014

@laanwj I suggested awhile back we use a simple bitmask:

x0000000000000001.testnet-seed-mask.bitcoin.petertodd.org

Returning all seeds with at least NODE_NETWORK set. There's most likely to be a relatively small number of combinations people use, so DNS caching will still work fine. (though I'm no DNS expert) Of course, as always relying heavily on seeds is foolish, so just setting up app-specific seeds probably makes more sense in many cases and lets the authors of those apps implement whatever feature tests they need to ensure they're serving up seeds that actually support the features required by the apps. (e.g. right now if there ever exist NODE_BLOOM-using nodes on the network and someone does a bloom IO attack against the nodes returned by the seeds, maliciously or by accident, you'll easily wind up with only NODE_BLOOM-supporting nodes being returned, breaking anything relying on bloom filters)

Also, as an aside it'd be reasonable to set aside a few service bits for experimental usage, with an understanding that occasional conflicts will be inevitable. In my replace-by-fee implementation that uses preferential peering to let replace-by-fee nodes find each other quickly I have:

// Reserve 24-31 for temporary experiments
NODE_REPLACE_BY_FEE = (1 << 26)

https://github.com/petertodd/bitcoin/blob/f789d6d569063fb92d1ca6d941cc29034a7f19ef/src/protocol.h#L66


Member

laanwj commented Jun 17, 2014

@petertodd The problem with service bits is that there is only a very limited number of them. It would, IMO, be better to have a string namespace defining extensions. A new version of the network protocol could add a command that returns a list of strings defining the supported extensions and maybe even per-extension versions. It's still very simple and conflicts could be much more easily avoided.


Contributor

petertodd commented Jun 17, 2014

@laanwj Well we've got 64 of them, 56 if you reserve some for experiments; I don't see us using up that many all that soon. A string namespace thing can be added in the future for sure, but I just don't see the short-term, or even medium-term, need. After all, NODE_BLOOM was AFAIK the first fully fleshed out proposal to even use a single service bit, with the closest runner up being @sipa's thoughts on pruning.

That said, strings, and especially UUIDs, (ugh) would definitely reduce the politics around them.


Member

laanwj commented Jun 17, 2014

It's not about fear of running out but about reducing the need for central coordination. Anyhow, let's stop hijacking this thread.
Using a service bit in this case is fine with me.


Contributor

petertodd commented Jun 17, 2014

@laanwj Agreed.


Contributor

mikehearn commented Jun 17, 2014

I don't think the attack you have in mind works.

Let's assume that pledges are public for a moment, e.g. because the user chooses to publish them or collects them in a way that inherently makes them public, like people attaching them to forum posts. I don't fully get what attack you have in mind, but I think you're saying that if you can control the internet connection of the fundraiser for an extended period of time, you could ensure they don't close the contract as early as possible and continue to solicit pledges. Then Dick Dastardly steps in, takes all the pledges and steals the excess by working with a corrupt miner.

But this attack makes no sense. If the pledges are public, any of the legitimate pledgors can also observe the contract's state and close it. The attacker has no special privileges. Unless you control the internet connections of all of them simultaneously and permanently, the attack cannot work: legitimate users will stop pledging once they see it's reached the goal and then either close it themselves, or ask the owner via a secure channel why they aren't doing so.

What you're talking about is only an issue if the pledges are NOT public, but the attacker is able to obtain them all anyway, AND control the user's internet connection so they do not believe the contract is closeable, AND they continue to solicit funds and raise money. Given that Lighthouse includes an integrated Tor client and can therefore tunnel through your control anyway, I don't think this is a realistic scenario.

It can return only a spent/unspent bit. What's the use-case for requiring more than that?

It's explained in the nice document I wrote above, please read it! Then you would actually know what data is returned and why.

@laanwj As far as I know only Peter thinks Bloom filtering was a debacle: if we had added a service bit, all nodes on the network would set it except for one or two that don't follow the protocol properly anyway, so who knows what they would do. If a node doesn't wish to support this simple command they can just not increase their protocol version past 90000. If they do, it should be only a few lines of code to add. As you note, using a service bit means implementations can be found but there are only a handful of them to go around, and not using a service bit means some entirely new mechanism must be designed which is way out of scope for this patch.

Re: QA tests, as far as I can tell they are only capable of covering the JSON-RPC interface. We don't seem to have any testing infrastructure for doing P2P tests except for the pull tester. I could try adding some code to that, but that code is maintained in the bitcoinj repository.

laanwj (Member) commented Jun 17, 2014

@mikehearn Binding features to version numbers assumes a linear, centralized progression. It means that everyone that implements A also needs to implement B even though they are unrelated. I don't think this is desirable anymore.

And as said above, using a service bit is fine with me. I do think we need another mechanism for signalling extensions to the protocol in the future, but for now we're stuck with that.

mikehearn (Contributor) commented Jun 17, 2014

OK, I can add a service bit, although AFAIK nobody actually has any code that searches out nodes with a particular bit? I'm not sure Core does and bitcoinj definitely doesn't. But that can be resolved later.

The question of optionality in standards is one with a long history, by the way. The IETF has a guide on the topic here:

http://tools.ietf.org/html/rfc2360#section-2.10

Deciding when to make features optional is tricky; when PNG was designed, there were (iirc) debates over whether gz compression should be optional or mandatory. The feeling was that if it were optional, at least a few implementations would be lazy and skip it, then in order to ensure that their images rendered everywhere PNG creators would always avoid using it, thus making even more implementations not bother, and in the end PNGs would just end up bigger for no good reason: just because a few minority implementors didn't want to write a bit of extra code. So they made gz compression mandatory. Another feature that GIF had (animation) was made optional and put into a separate MNG format (later another attempt, APNG). Needless to say, the situation they feared did happen and animated images on the web today are all GIFs.

So I will add a service bit, even though this feature is so trivial everyone could implement it. If it were larger and represented a much bigger cost, I'd be much keener on the idea. As is, I advise caution - simply making every feature from now on optional is not necessarily good design. The tradeoffs must be carefully balanced.

maaku (Contributor) commented Jun 17, 2014

NODE_NETWORK is a hack. It is conflating two things: storing the whole block chain, and storing the current UTXO set. These are orthogonal things. I think there should be a service bit here, but the meaning is not constrained to just a 'getutxos' call. NODE_NETWORK should be split into NODE_ARCHIVAL and NODE_UTXOSET, with the latter eventually indicating presence of other things as well, such as a future p2p message that returns utxo proofs.
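As a sketch of how such a split could look on the wire: service bits are just flags OR-ed into the services field of the version message, so a split like the one proposed above would amount to testing individual bits. The bit assignments below are hypothetical, following the naming in the comment; real values would be allocated through the BIP process:

```python
# Hypothetical bit assignments for illustration only.
NODE_ARCHIVAL = 1 << 0  # serves historical blocks (today's NODE_NETWORK role)
NODE_UTXOSET  = 1 << 1  # maintains the UTXO set, can answer getutxos

def supports(services: int, flag: int) -> bool:
    """Check a peer's advertised services bitfield for a capability."""
    return services & flag == flag
```

A client that wants UTXO queries would then select peers with `supports(peer_services, NODE_UTXOSET)` rather than comparing protocol version numbers.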

laanwj (Member) commented Jun 17, 2014

I agree it would be preferable for everyone to agree and do the same thing, but that makes progress incredibly difficult. From my (maybe over-cynical) view of the bitcoin community that means that nothing new ever happens. There's always some reason not to agree with a change, it could be some perceived risk, disagreement on the feature set or how the interface should look, or even paranoid fantasies.

Having optional features could mean the difference between something like this, which is useful but not absolutely perfect, being merged, or nothing being done at all. So I also advise caution on trying to push it to the entire network with a version bump.

maaku (Contributor) commented Jun 17, 2014

@laanwj how else do you indicate presence of this one particular p2p message except by version bump? That's what the version field is for.

mikehearn (Contributor) commented Jun 17, 2014

Ah, you're right, that's why software projects have maintainers instead of requiring universal agreement from whoever shows up :) There will always be people who disagree or want something better (but don't want to do the work). Sometimes those disagreements will make sense, and other times they will be bike shedding.

If we look at projects like the kernel, it's successful partly because Linus lets debates run for a while, he develops opinions and then if things aren't going anywhere he steps in and moves things forward. Bitcoin has worked the same way in the past with @gavinandresen doing that, and I hope we will retain good project leadership going forward.

Gavin, what are your thoughts on protocol extensibility / optionality? As it seems nobody has problems with the code in this patch itself.

maaku (Contributor) commented Jun 17, 2014

@mikehearn it would be better imho if the return value included the height and hash of the best block. That would help you figure out what is going on when you get different answers from peers, and parallels the information returned by a future getutxos2 that returns merkle proofs.
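A rough sketch of how a client could use that extra information: group the replies from several peers by the (height, best-block-hash) pair they claim, and keep only those from the majority chain tip. The tuple layout below is an assumption for illustration, not the actual wire format:

```python
from collections import Counter

def consistent_answers(replies):
    """Keep only the getutxos replies whose claimed chain tip matches
    the tip reported by the most peers.

    Each reply is modeled as a (height, best_block_hash, utxo_result)
    tuple, sketching what a response extended with best-block fields
    would carry.
    """
    tips = Counter((height, tip) for height, tip, _ in replies)
    best_tip, _ = tips.most_common(1)[0]
    return [result for height, tip, result in replies
            if (height, tip) == best_tip]
```

Peers lagging a block behind, or on a stale fork, then become visible as a minority tip instead of just an unexplained disagreement in the UTXO data.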

mikehearn (Contributor) commented Jun 17, 2014

Good idea! I'll implement that this afternoon or tomorrow.

petertodd (Contributor) commented Jun 17, 2014

@mikehearn Sybil attacking the Bitcoin network really isn't all that hard; I really hope Lighthouse doesn't blindly trust the DNS seeds like so much other bitcoinj code does. re: having getutxos return actual UTXO's vs. spent/unspent, I see nothing in the design of Lighthouse that prevents pledges from containing the transactions required to prove the UTXO data. Also, last I talked to Gregory Maxwell about the issue he had strong opinions that NODE_BLOOM was the right idea - he did after all ask me to implement it. Warren Togami also is in that camp (and asked me to re-base the patch and submit the BIP).

laanwj (Member) commented Jun 17, 2014

@mikehearn It may have been that way in the past, but Bitcoin Core is not the only node implementation anymore. Don't confuse leadership over this project with leadership over the global P2P network, which has various other actors as well now.

Edit: another concrete advantage of an optional-feature approach is that features can be disabled again if they either prove to be not so useful for what they were imagined for, or the implementation causes problems, or a later extension provides a better alternative. Locking it to >= a protocol version means every version in the future is expected to implement it.

mikehearn (Contributor) commented Jun 17, 2014

@laanwj New version numbers can mean anything, including "feature X is no longer supported". So I don't think we need service bits for that.

sipa (Member) commented Jun 17, 2014

We've talked about it, and I'm sure you're aware of my opinion already, but I'll still repeat it here to offer for wider discussion.

I do not believe we should encourage users of the p2p protocol to rely on unverifiable data. Anyone using 'getutxos' is almost certainly not maintaining a UTXO set already, and thus not doing chain verification, so not able to check that whatever a peer claims as a response to getutxos is remotely meaningful. As opposed to other data SPV clients use, this does not even require faking PoW.

Yes, there are other parts of the protocol that are largely unverified. Addr messages for new peers, the height at startup, requesting the mempool contents, ... But those are either being deprecated (like the height at startup), or have infrastructure in place to minimize the impact of false data. In contrast, I do not see any use of getutxos where the result can be verified; if you're verifying, you don't need it. To the extent possible, Bitcoin works zero-trust, and I believe improving upon that should be a goal of the core protocol.

Of course, that does not mean that the ability to request UTXO data is useless. I just don't believe it should be part of the core protocol.

I think the problem is that in some cases, there are very good reasons to connect to a particular (set of) full node(s) and trust their responses. For example, when you have different Bitcoin-related services running within a (local and trusted) network, connected to the outside world using a bitcoind "gateway". In this case, you are using bitcoind as a service for your system, rather than as a pure p2p node.

So far, we have separated service provision through RPC from the P2P protocol. This often makes sense, but is not standardized, is not very efficient, and is inconvenient when most of the data you need already arrives through P2P.

My proposal would therefore be to add "trusted extensions" to the P2P protocol. They would only be available to trusted clients of a full node (through IP masking, a different listening port, maybe host authentication at some point, ...). I've seen several use cases for these:

  • When a local network wallet rebroadcasts transactions, you want the gateway to rebroadcast as well. Current default behavior is to only relay the first time you see a transaction.
  • You want local clients to bypass rate limiting systems, without triggering DoS banning (currently done for localhost, which is broken for Tor).
  • Some functionality really is only available when you have a trusted bitcoind. Mempool acceptance checking for detecting conflicting wallet transactions is one; getutxos is another. Mechanisms for these could be available, but only to trusted clients.

This may be controversial (and probably needs a separate mailing list/issue), as it could all be done through a separate, out-of-band non-P2P protocol, or just RPC.

Comments?
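The "IP masking" gating mentioned above could be sketched as a simple whitelist check against trusted networks before a privileged message is answered. The network ranges below are placeholders, not a proposed default:

```python
import ipaddress

# Placeholder trusted ranges: the local host and a typical RFC 1918 LAN.
TRUSTED_NETS = [ipaddress.ip_network("127.0.0.0/8"),
                ipaddress.ip_network("192.168.0.0/16")]

def is_trusted(peer_ip: str) -> bool:
    """Return True if the peer's address falls inside a trusted network."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in TRUSTED_NETS)
```

A node would then answer a trusted-extension message only when `is_trusted(peer_ip)` holds, and otherwise ignore it (or treat it as a protocol violation).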

mikehearn (Contributor) commented Jun 17, 2014

I think I put all my comments on that in the original writeup. Yes, in the ideal world everything would be perfect and authenticated by ghash.io ;) However we do not live in such a world and are dragging ourselves towards it one step at a time.

BTW, on height in version message being "deprecated", that's the first I've heard of this. SPV clients use it. If someone wants to deprecate that they're welcome to update all the clients that require it. But let's discuss that in a separate thread.

sipa (Member) commented Jun 17, 2014

Oh, not in the protocol. I just mean that full nodes don't use it at all anymore. I wish it didn't exist in the first place, but it's too unimportant to bother changing in the protocol.

mikehearn (Contributor) commented Jun 17, 2014

Ah, OK.

petertodd (Contributor) commented Jun 17, 2014

@sipa +1

There's nothing wrong with trust. We'd like everything to be decentralized, but we don't live in a perfect world so occasionally we introduce trust to solve problems that we don't have decentralized solutions for yet. We did that in the payment protocol because we had no other way to authenticate payments; we should be doing that in UTXO lookup, because we have no other way to authenticate UTXO's. (yet)

We also have a responsibility to design systems that naturally lead to safe implementations that are robust against attack. This patch is anything but that on multiple levels - even little details like how it gives you 100% unauthenticated UTXO data rather than just a "known/unknown" response encourage inexperienced programmers to take dangerous shortcuts, like relying on your untrusted peer(s) for utxo data corresponding to a tx rather than at least getting actual proof via the tx and its merkle path to the block header. (PoW proof) Equally the presence of vulnerable targets encourages attackers to sybil attack the Bitcoin network to exploit those targets - we don't want to encourage that.

That said, I'm not convinced we need to add trusted extensions to the Bitcoin Core P2P protocol; that functionality already exists in the form of Electrum among others. UTXO's can be looked up easily, you can authenticate the identity of the server you're talking to via SSL, and it is already used for that purpose by a few applications. (e.g. the SPV colored coin client ChromaWallet) A client implementation is simple(1) and Electrum supports things like merkle paths where possible to reduce the trust in the server(s) to a minimum. Why reinvent the wheel?

  1. https://github.com/bitcoinx/ngcccbase/blob/master/ngcccbase/services/electrum.py
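For reference, talking to an Electrum server is a matter of newline-delimited JSON-RPC over a TCP (or SSL) socket. A minimal request builder might look like the sketch below; the `blockchain.address.listunspent` method name and the framing are per the Stratum-style protocol of that era, and are assumptions not verified against any particular server:

```python
import json

def listunspent_request(address: str, request_id: int = 0) -> bytes:
    """Build one newline-terminated JSON-RPC request line asking an
    Electrum server for the unspent outputs of an address."""
    req = {"id": request_id,
           "method": "blockchain.address.listunspent",
           "params": [address]}
    return (json.dumps(req) + "\n").encode()
```

The bytes would then be written to the server socket, and each reply read back as one JSON line keyed by the same `id`.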
laanwj (Member) commented Jun 17, 2014

@sipa Agreed - getutxos is in the same category of 'information queries from trusted node' as the mempool check for unconfirmed/conflicted transactions that an external wallet could use.

Regarding the height in version messages: yes, nodes have lied about this, resulting in 'funny' information in the UI so we don't use it anymore, not even behind a median filter. See #4065.

mikehearn (Contributor) commented Jun 17, 2014

I have a section in the commit message about philosophy for a reason - this discussion is now firmly in the realm of the philosophical.

There have been cases in the past few years where people loudly proclaimed that something should not be done because of $ATTACK or $CONCERN, then we did it, and so far things worked out OK. A good example of this was SPV clients in general, a few people said:

  • Nodes will silently drop transactions just because they can, so every client should use a trusted server.
  • People will DoS the network by requesting lots of blocks just because they can, so Bloom filters should be disabled by default (litecoin did this)
  • People will sybil the network and make clients believe in non-existent mempool transactions, so everyone should just use a trusted server.

In fact none of these things have happened, the concerns were overblown. Will they happen in future? Maybe! But also maybe not. So far we benefited tremendously: pushing SPV forward was the right call.

When any change is proposed it's natural and human to immediately come up with as many objections as possible. The reason is, if we object now and something does go wrong, we can make ourselves look smart and wise by saying "told you so". But if we object and nothing goes wrong, people usually forget about it and move on. This gives a huge incentive to consider only risks and not benefits, it gives huge incentives to try and shoot things down. Sometimes people call this stop energy, a term coined by Dave Winer: http://www.userland.com/whatIsStopEnergy - I see it here. Nobody above is talking about the considerable benefits of a fully decentralised assurance contract app. Instead people are focusing only on costs, costs like "maybe someone will do something dumb", which is always a concern with Bitcoin.

Now there are two possible outcomes here:

  1. Although I have explained why various attacks are not a concern above, let's say my analysis is wrong somehow and someone finds a way to exploit the lack of block chain authentication on getutxos and causes problems for my users. Let's also say that other ways to fix the problem, like using Tor and cross-checking nodes, don't work. In that case I will have to fall back to using a set of trusted nodes instead and people can say "told you so". I'm sure they will enjoy it.

  2. In fact the concerns are overblown and nobody mounts successful attacks, either because it's too hard, or because there's no benefit, or because by the time someone finds a way to do it and decides they want to, the world has moved on and e.g. we have UTXO commitments or simply Bitcoin assurance contracts are irrelevant for some reason.

In the latter case, it's a repeat of Bloom filtering so far - we will have benefited! More decentralisation! More simplicity!

The argument being made here is, let's just assume failure and skip straight to the centralised trust based solution. Or more subtly, let's set up a hypothetical straw developer who we assume does something dumb, and use that as a reason to not add features.

I have a different idea - let's add this feature and see what happens. Maybe it turns out to be useless and people get attacked too much in practice, in which case it would fall out of use and in future could be removed from the protocol with another version bump. Or maybe it works out OK, eventually gets extended to contain UTXO proofs despite the lack of real-world attacks, and the story has a happy ending.

@gavinandresen (Contributor) commented Jun 17, 2014

Sorry @sipa, I agree with Mike -- let's add this feature.

RE: service bits versus version numbers: In my experience, APIs/protocols fail when they wimp out and make lots of things optional. It becomes impossible to test the 2^N combinations of N optional features once N is greater than... oh, two.

The unspent transaction output set is something every 'full' node should know, so I see no reason to do a service bit over bumping the version number.

RE: fears that lazy programmers will Architect In Bad Ways: "better is better." Letting SPV clients query the state of a full node's UTXO set is useful functionality. And simple is generally more secure than complex.


@sipa (Member) commented Jun 17, 2014

RE: fears that lazy programmers will Architect In Bad Ways: "better is better." Letting SPV clients query the state of a full node's UTXO set is useful functionality. And simple is generally more secure than complex.

I don't disagree at all that it is useful. I even gave an extra use case for it (mempool conflict checking).

I just don't want to make the distinction between the p2p system and services offered by full nodes fuzzier.

Getutxos is not costly (in the current implementation) and I'm not particularly worried about DoS attacks that could result from it. I'm worried about providing a service that the ecosystem grows to rely upon, making it harder to change implementations (gmaxwell's idea of provable deniability of chainstate data through encrypting utxos is a nice example).

If you're going to use data that requires trusting a full node, fine. Let's just make sure people actually trust it.


@gmaxwell (Member) commented Jun 17, 2014

This doesn't appear incompatible with the txout set encryption. The idea there is to key the utxo set with some hash of the txid:vout and encrypt the data with some different hash of the txid:vout, so the node itself does not have the data needed to decrypt the txout until the moment it's needed. Since this request would provide the txids it would still work even if it returned the data... though given the motivation for the encrypted txouts we might prefer not to receive the txid until strictly needed, and instead do query-by-hash for this kind of spendability check.
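As a rough illustration of the keying scheme described above, here is a toy Python sketch (the tag labels, XOR-based 'encryption', and fixed 32-byte entries are simplifications of mine, not part of the actual proposal):

```python
import hashlib

def tagged_hash(tag: bytes, txid: bytes, vout: int) -> bytes:
    # Two independent 32-byte hashes of the same outpoint, distinguished by tag.
    return hashlib.sha256(tag + txid + vout.to_bytes(4, "little")).digest()

class EncryptedUtxoSet:
    """Toy chainstate: entries are keyed by H("key"||outpoint) and XOR-padded
    with H("enc"||outpoint), so the database alone reveals neither the txids
    it contains nor the txout data; both hashes require knowing the outpoint."""

    def __init__(self):
        self.db = {}

    def add(self, txid: bytes, vout: int, txout: bytes):
        assert len(txout) <= 32  # toy limitation: one hash-width of data
        key = tagged_hash(b"key", txid, vout)
        pad = tagged_hash(b"enc", txid, vout)
        self.db[key] = bytes(a ^ b for a, b in zip(txout.ljust(32, b"\0"), pad))

    def lookup(self, txid: bytes, vout: int):
        # Only a querier who already knows the outpoint can locate and decrypt.
        key = tagged_hash(b"key", txid, vout)
        if key not in self.db:
            return None  # spent, or never existed
        pad = tagged_hash(b"enc", txid, vout)
        return bytes(a ^ b for a, b in zip(self.db[key], pad)).rstrip(b"\0")
```

The point of the construction is that a getutxos-style query already supplies the txid:vout, so the responding node can serve it without keeping the decryption material at rest.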

That said, I consider serving additional unauthenticated data strongly inadvisable. It risks incentivizing sybil attacks against the network, and we've already seen people (apparently) trying to attack miners in the past with nodes lying about the time— so these kinds of attacks are not just theoretical. We should be moving in the opposite direction in the core protocol, not making it worse. And if we do provide facilities which are not necessary for the basic operation of the system, they should be behind service flags so we have the freedom to abandon them later without instantly breaking any node that calls them.

Trusted services are already offered by electrum nodes— which have authenticated and encrypted connections and a curated node database which may prevent sybil attacks, at the expense of a more centralized dependency— which should be acceptable here, since the argument was that the data doesn't need to be authenticated at all. Why can't this use the existing electrum infrastructure for quasi-trusted wallet data?

Is the mempool bool really the right design? ISTM that nodes that want to know whether a txout is confirmed or in the mempool are going to need to query all of them twice.
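One hypothetical alternative to the mempool bool, sketched here purely for illustration (this is not what the PR implements), is to return a per-output status so a single query answers both questions:

```python
from enum import Enum

class UtxoStatus(Enum):
    SPENT_OR_UNKNOWN = 0
    UNSPENT_CONFIRMED = 1
    UNSPENT_MEMPOOL_ONLY = 2

def classify(outpoint, chainstate, mempool_spends, mempool_creates):
    """Answer both questions at once: is the outpoint unspent, and is that
    answer based on confirmed data or only on the node's mempool view?"""
    if outpoint in chainstate:
        if outpoint in mempool_spends:
            return UtxoStatus.SPENT_OR_UNKNOWN  # spent by an unconfirmed tx
        return UtxoStatus.UNSPENT_CONFIRMED
    if outpoint in mempool_creates:
        return UtxoStatus.UNSPENT_MEMPOOL_ONLY  # created by an unconfirmed tx
    return UtxoStatus.SPENT_OR_UNKNOWN
```

A status byte per queried outpoint would cost little more on the wire than the bitmap and would spare clients the double round trip.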


@maaku (Contributor) commented Jun 17, 2014

My +1 goes to both @sipa and @mikehearn on this. This is a trusted call, and we are giving people enough rope to shoot themselves in the foot. Ideally stuff like this should be disabled by default and/or placed behind a special authenticated connection. But that is a separate issue necessitating a separate pull request -- this gets my ACK once the best block hash & height is added.

When I first heard about this at the conference I thought this was crazy -- we need to be implementing trustless mechanisms for these things! If there is a use case now, then let that drive development on, e.g., UTXO commitments, and let's do this the right way. I still feel that way generally. However in this particular case I am more than willing to make an exception: Lighthouse will be transformative to bitcoin development, and is exactly the ideal platform for crowdfunding work on trustless tech. So I'm okay with merging this now, and reaping the benefits for bitcoin while also working on all the improvements mentioned.

The UTXO commitments I'm working on are currently developer-time limited. There's a working Python implementation and complete test vectors, as well as an optimized commitment mechanism (#3977, which needs a rebase), and C++ serialization code for the proofs. All we need is the C++ code for updating proofs, and the LevelDB backend. So if there are other bitcoind hackers out there interested in doing this The Right Way, contact me. However it requires a soft-fork, so rolling it out necessitates some degree of developer consensus, community education, and a miner voting process (or the beneficence of ghash.io...), all of which together requires as much as a year if past experience is any judge. Lighthouse, on the other hand, can do good for crowdfunded bitcoin development right now.

@gavinandresen It is only temporarily the case that the full UTXO set is something every full node needs to know. With either TxO or UTxO commitments it becomes possible to prepend spentness proofs to block and transaction propagation messages, at which point nodes are free to drop (portions of) the UTXO set. There is consensus we are heading in a direction which enables this, just not consensus over TxO vs UTxO and the exact details of the data structure.


@jgarzik (Contributor) commented Jun 17, 2014

<vendor hat: on>
This duplicates multiple other open source projects such as Insight, which provides the same queries and more: https://github.com/bitpay/insight-api

Running Insight is trivial for anyone running bitcoind. Anyone not running bitcoind can probably ask or find someone trusted who is already running such a server.

I'm just not seeing a driving use case here [that is not already filled by existing software]. You don't have hordes asking for this feature; and if people are asking for this feature, it is easy to point them to an existing project that can roll this out instantly.

(Because, remember, you cannot start using this functionality even if you merge the PR today)


@laanwj (Member) commented Jun 18, 2014

@jgarzik Using insight for this seems overkill, as it needs no extra indexes - bitcoind has the required information ready. If this would have been a request to add a private RPC call instead of a public P2P network message, IMO it would have been an easy ACK.

@sipa

I don't disagree at all that it is useful. I even gave an extra use case for it (mempool conflict checking). I just want to not make the distinction between the p2p system and services offered by full nodes fuzzier.

@gmaxwell

We should be moving in the opposite direction in the core protocol, not making it worse. And if we do provide facilities which are not necessary for the basic operation of the system they should be behind service flags so we have the freedom to abandon them later without instantly breaking any node that calls them.

Right, that was my idea as well. It can be useful, but if you want to offer these kinds of 'courtesy' services, they should be separate - either behind a service bit or on a separate network like Electrum. Not mandatory extensions that the Bitcoin P2P network is stuck with forever.
(as you say, you can't remove features introduced with a version bump: all existing software will interpret >=version as 'has this and this feature' no matter what is specified later).

As @maaku says, NODE_NETWORK already is a hack that conflates different 'services'. I'd prefer to make this part of a new service and add a NODE_QUERIES bit or such.
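As a sketch of the service-flag approach, a client would gate optional queries on the peer's advertised nServices field rather than its version number (the NODE_QUERIES name and bit value below are illustrative; the flag this PR actually added was called NODE_GETUTXO):

```python
NODE_NETWORK = 1 << 0  # existing flag: peer can serve the full block chain
NODE_QUERIES = 1 << 1  # illustrative bit for optional 'courtesy' queries

def peer_supports(services: int, flag: int) -> bool:
    # Gate optional features on advertised service bits, not on the protocol
    # version number, so support can be dropped later without a version bump.
    return services & flag == flag
```

A node that later stops offering the service simply stops advertising the bit, and older clients degrade gracefully instead of breaking.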


@mikehearn (Contributor) commented Jun 18, 2014

It does not make sense to run a full-blown block explorer for something that every bitcoind can serve without any additional indexes.

@Drak There is no danger to people's money. Have you read my original post or the analysis above? Even if you disagree for some reason, you just .... don't run an app that uses the message.

@maaku Thanks, it's nice to see someone weigh the potential benefits too. I'll try and make some time to test/play with your UTXO commitments work in future.

I do not believe adding such messages is "going in the wrong direction". Telling people to pick some trusted people and trust them is the wrong direction. This is the same reasoning by which we'd have no SPV clients at all because some theoretical attack exists, so we should all just use Electrum and presumably if one or two Electrum server operators turn out to be corrupt, we ask governments to regulate them? Or remove them from a magic list and wait until they come back under another identity? Am I supposed to ask Electrum operators to submit to an exchange-style passport examination? Or just assume because I met the guy in a bar once and he seemed OK he must be trustworthy? I don't think "use Electrum" is quite the silver bullet it's being made out to be. Actually I suspect that randomly picking a few nodes out of 8000 (especially via a Tor circuit using one of their ~1000 available exits) and cross checking their answers stands just as good a chance of being robust.
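The cross-checking idea in the paragraph above might look something like this on the client side (a hypothetical heuristic, not part of the PR):

```python
import random
from collections import Counter

def cross_check(peers, outpoint, query, k=4):
    """Ask k randomly chosen peers the same getutxos question and accept the
    answer only if every sampled peer agrees; any disagreement means a peer
    is lying or stale, so return None and let the caller retry or fall back
    to trusted nodes."""
    sample = random.sample(peers, min(k, len(peers)))
    answers = [query(peer, outpoint) for peer in sample]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes == len(sample) else None
```

An attacker then has to control (or intercept) every sampled connection at once, which is what routing the connections over independent Tor circuits is meant to make expensive.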

Additionally, let's be realistic about what UTXO commitment authentication in this context really means: we have something close to a miner cartel today. If we assume that nodes are untrustworthy and it's the proofs that matter, then those proofs could be mis-generated by a grand total of two or three people; hardly a quantum leap in trustlessness. UTXO proofs make sense on the assumption we can fix the mining market though, so we should still optimistically do them.

Now, look, there's genuine and reasonable disagreement here. But we should all be willing to consider the idea that we're wrong. I faced that possibility above: if attacks happen, then I will have to rewrite things to use some trusted nodes (probably bitcoin p2p tunnelled over SSL to nodes either run by myself, or by people I know). So I know what my plan is. But if you're against this feature, what if you are wrong? What if the attacks don't happen? Then you have made a real Bitcoin app less decentralised, and set a powerful precedent that nobody should propose new features for Bitcoin unless every imaginable risk can be mitigated, including risks borne by developers who haven't come along yet. This practically ensures nobody ever tries to make the protocol better. That's a big cost we should take very seriously.

Anyway, I'm encouraged enough by support from Gavin and Mark that I'll go ahead and add the height/block hash to the message. I've already added the service bit so I think that meets what @laanwj wants. When the code is merged I'll write the BIP.


@petertodd (Contributor) commented Jun 18, 2014

@sipa made a great point above in that getutxos is fundamentally even worse than bloom filters in terms of trust, because there is absolutely no security at all. At least CMerkleBlocks have strong assurances that a given transaction exists, even if there's no current way to know if you've seen all transactions (which, incidentally, is something we can even get reasonably good assurance of via random sampling without a soft-fork). Similarly it's knowledge that only gets better every time you connect to an honest node, even after connecting to a dishonest one - the set of transactions and block headers you know about can only be added to.

getutxos doesn't have any of that assurance. There's no proof whatsoever and there's no way to reconcile conflicting responses. You can handwave and say you'll cross-check answers, but that's assuming you even have a set of nodes to randomly pick from - you don't. Fundamentally your root of trust in this design is the DNS seeds you started to learn nodes from. Compromise those and your SPV client will never learn about an honest node, and you're screwed. But unlike Electrum, because there is no authentication anywhere in the P2P network, not only are you trusting that root of trust, you're trusting your ISP, you're trusting Tor exit node operators, etc. It gets even worse when you remember that "compromising" the DNS seeds can mean nothing more than running a few dozen fast nodes and DoS-attacking the other nodes so yours end up at the top of the DNS seed lists.

The whole rant about "What do you do if an Electrum node operator is dishonest?" is particularly bizarre: obviously you update the software to stop using that node. It'll happen once in a while, it'll be detected, and you have a very clear and simple procedure to follow if it does. That's why Electrum itself has a config screen that lets you pick what node(s) you want to trust. (and cross-checks them too) Just like Tor it relies on the fact that the operators of the service are well known. (incidentally, over half the Tor bandwidth is provided by people/organizations the Tor project knows, and that's a number that's watched carefully)

Why are we wasting our time on this very insecure design with no privacy? Electrum already exists, use it in the short term. As for the long term @maaku or someone else will finish a proper authenticated solution and we can add a secure NODE_GETUTXO service. Heck, the saddest part of all this is the use-case initially presented - Lighthouse - simply doesn't need NODE_GETUTXO and would work perfectly well using bloom filters as I explained above.

The success of Bitcoin's decentralization is based on cryptographic proof, not blindly trusting strangers.


@laanwj (Member) commented Jun 18, 2014

Talking of Electrum: at some point in a mailing list discussion the author of Electrum was also interested in UTXO queries, though the exact semantics were different: a query of transactions by output/address instead of by txout point (see https://www.mail-archive.com/bitcoin-development@lists.sourceforge.net/msg04744.html). That would make it possible to implement an Electrum server on top of bitcoind - although in that case it doesn't matter whether the call gets added to RPC or as a 'trusted extension' on P2P.


@jgarzik (Contributor) commented Aug 26, 2014

IRC report:
<dhill> so getutxos makes bitcoind send messages larger than the max 32MB


@dajohi (Contributor) commented Aug 26, 2014

ReadMessage: message payload is too large - header indicates 2892934254 bytes, but max message payload is 33554432 bytes.

Just some thoughts:

  1. bitcoind should not attempt to send messages that exceed the max message payload (currently 32MB)
  2. bitcoind should ignore duplicate outpoints in a getutxos request.
  3. perhaps using MAX_INV_SZ (50000) is too high for a getutxos request limit. Perhaps it needs its own define which is much smaller.

I produced this by using a small script and btcwire: testnet tx bd1f9401a9c284a04353f925276af62f23f452d297eb2cc582d037064b2a795f, with getutxos requesting outpoint 1 ... 50,000 times (the limit).
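The arithmetic behind the oversized reply, and the size check suggested in point 1, can be sketched as follows (the per-result size is inferred from the 2,892,934,254-byte header reported above; the constant names are illustrative, not taken from the code):

```python
MAX_MESSAGE_PAYLOAD = 32 * 1024 * 1024  # 33,554,432 bytes, the 32MB cap

def utxos_reply_size(result_sizes):
    # A 'utxos' reply carries one bitmap bit per queried outpoint plus the
    # serialized CTxOut for each hit.
    bitmap = (len(result_sizes) + 7) // 8
    return bitmap + sum(result_sizes)

# Roughly the per-result size implied by the reported header: one large
# testnet output requested 50,000 times without deduplication.
PER_RESULT = 57_858
naive_reply = utxos_reply_size([PER_RESULT] * 50_000)  # ~2.9 GB

# Point 1: refuse to build a reply that exceeds the cap.
def check_reply_size(result_sizes):
    size = utxos_reply_size(result_sizes)
    if size > MAX_MESSAGE_PAYLOAD:
        raise ValueError(f"reply of {size} bytes exceeds max message payload")
    return size

# Point 2: deduplicating outpoints collapses the attack to a single result.
dedup_reply = utxos_reply_size([PER_RESULT])
```

Under these assumptions the naive reply is roughly 86 times the cap, while the deduplicated reply is well under it, which is why points 1 and 2 together close the hole.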


@laanwj
Member

laanwj commented Aug 27, 2014

I would have hoped this kind of testing was done in all the time that this was still a pull request.
But it's clear that there are still too many gotchas, going to revert.

@mikehearn
Contributor

mikehearn commented Aug 27, 2014

MAX_INV_SZ is used for getdata as well, which returns entire transactions, and that doesn't seem to remove duplicate requests either. There's a manual check against the size of the send buffer on the getdata code path which would be fairly easy to duplicate here, but this sort of network code is easy to duplicate.

At any rate, it should be an easy fix. @laanwj Why don't you wait for me to fix it instead? It's not a big surprise that things in master get more testing than things that are not.

@sipa
Member

sipa commented Aug 27, 2014

getdata returns the results as individual tx/block messages, which are throttled based on space in network send buffers, so the same problem does not exist there.

@btcdrak
Member

btcdrak commented Aug 27, 2014

@mikehearn @laanwj Better to revert this. Clearly this PR needs a lot of work and testing first.

@mikehearn
Contributor

mikehearn commented Aug 27, 2014

Yes, that's what I said.

As CCoin is a fixed size structure and the size of the bitmap is equal to the size of the inputs, just keeping track of the space required to send and stopping if there'd be insufficient space is good enough. We could remove duplicates too but it doesn't seem necessary if there's a simple counter.

With respect to reverting things - no, reverting any change that someone finds a bug in is not a sane strategy. If we were about to do a major release it'd be different, but then we shouldn't be merging features at all. Applied consistently this strategy would have resulted in every major change to Bitcoin being rejected. Recent example: using this patch I discovered a serious regression in re-org handling that was the result of (I think) headers first. Rolling back all of headers first would have been a mistake for all kinds of reasons, not least of which is that I wasn't intending to find that bug and wouldn't have done so if the buggy code had still been sitting on a branch.

@btcdrak
Member

btcdrak commented Aug 27, 2014

@mikehearn For a new feature like this, recently merged (just yesterday!), a revert is absolutely the right course of action. It means if the feature is fixed and merged in again there is one clean merge, one unit. It makes the history much more understandable. From what I can see, we also need more discussion about the rationale of this PR. Clearly there has been a lot of 'hand waving' and not enough research, which when someone actually did, uncovered some pretty nasty issues. Throwing caution to the wind is not right for Bitcoin Core. I still don't see why this isn't done over RPC personally.

Looking at history, other PRs have been reverted pending further investigation, I don't see why this PR needs special treatment.

@sipa
Member

sipa commented Aug 27, 2014

Keeping track of the output size constructed would indeed remove the problem. But then what to do to inform the peer? Split the result in two? Truncate it? DoS ban the peer? At least some thought is needed.

I would prefer just to only return spentness. No matter what, that data is not authenticated, and can't be by the setup envisioned here. Adding the full txout data just complicates things (and yes, makes it a bit more expensive to lie, but your system doesn't require peers to be honest, right?). Most concerns about potential incentive shifts and DoS potential are gone that way.

Of course, I would still prefer us not to provide unauthenticated access to UTXO information at all. If there is no way to prevent peers lying, either you don't care about the truth, or you're better off using a (set of) central servers, either trusted or with reputation to lose.

@mikehearn
Contributor

mikehearn commented Aug 27, 2014

It can just be truncated. The result bitmap is supposed to be the same length as the input query. If it's not then you know the result has been truncated. I don't think any normal client would ever hit this case anyway. Not complicated.

Can we please stop going over the rationale again and again and again? @btcdrak the RPC stack doesn't even try to handle resource usage attacks so your suggestion would make things worse rather than better.

@sipa The patch is implemented in this way for a reason. My app runs the scripts to try and ensure you can sign for the output you are pledging. Because a pledge is not a valid transaction there is no other way to test this: you cannot broadcast it and see what happens (and proxying transactions would just allow clients to get you banned anyway in the current architecture). The assumption is that your network peers (randomly chosen) are not the same as the people sending you pledges. This is a very realistic and reasonable assumption.

Both of the above things have been explained multiple times over the past few months including in the description of the patch. I really don't know what I can do here to make things clearer. Is there a problem with my writing style or something? I get the overwhelming impression this entire community of people comes to strong opinions on my work yet does not read a word I've written, and it's unbelievably frustrating.

I will add some more DoS controls to this feature, although we should all remember that Bitcoin Core can be DoSd in lots of different ways - this is hardly changing the status quo.

@petertodd
Contributor

petertodd commented Aug 27, 2014

@mikehearn re: "not reading a word I've written", you're doing the exact same thing: @btcdrak made clear above that he believed you should be running a full node: "I can't see a valid reason for not running a full node when you need access to UTXO whatever your project is."; his suggestion of using RPC is for local nodes where resource usage attacks are irrelevant.

In any case, stop trying to deflect technical criticisms with responses based on personal disputes and personal attacks.

Additionally, remember that when there exists a technical concern about a proposal of local and global DoS attacks and economics/resource usage issues, submitting a patch with a fairly obvious DoS attack in it is a strong indication that the submitter hasn't thought through the consequences. We don't have the time to consider in detail every single pull-req, and for that matter, fix all the bugs in them, so it's only reasonable that finding such an issue be considered a strong indication that the patch should be rejected/reverted for now.

"I will add some more DoS controls to this feature, although we should all remember that Bitcoin Core can be DoSd in lots of different ways - this is hardly changing the status quo."

It's adding to a yet unsolved problem. Don't be surprised if people are reluctant to dig the hole deeper when we don't know if we're ever going to find a ladder out.

@mikehearn
Contributor

mikehearn commented Aug 27, 2014

There's a fix here:

#4770

@genjix

genjix commented Aug 27, 2014

wow what a stupid change, all the more reason why we can't have a single group making unilateral decisions on one codebase. I had never heard of this. I think you guys need to stop trying to throw all this crap into the Bitcoin protocol, and focus on keeping it small + focused. Bloated software overextends itself introducing security flaws through new attack surfaces.

@mikehearn
Contributor

mikehearn commented Aug 27, 2014

I actually can't make my bitcoind crash even when firing many such bogus queries in parallel, even without the fix. Memory usage does go up, and if you don't have much RAM that could be a problem, but it drops down again immediately after.

With the extra ten lines to track bytes used the code is clearly better, but it's way overkill to go running to reddit and claim it's an "easy way to crash bitcoind". Heck our networking code doesn't even have a limit to aim for. Bitcoind could run out of RAM on some systems just by handling a lot of clients, or if there was a lot of transactions in the mempool.

@petertodd
Contributor

petertodd commented Aug 27, 2014

Lots of nodes out there without all that much RAM. Other DoS attacks are prevented by existing resource limits, e.g. tx fees, coin-age, etc. and "handling lots of clients" is something we already have limits on via the connection limits. I should know - I've spent a lot of time looking for and fixing cheap DoS attacks. (e.g. the sigops one I fixed) getutxos is unique in how easy it is to use to crash systems at no cost.

@mikehearn You'd be smart to just own up and say "oops, I screwed that up" rather than trying to make excuses. Heck, I personally screwed up a bit by ACKing the patch without noticing that flaw.

@laanwj
Member

laanwj commented Aug 27, 2014

"I had never heard of this"

It is not as if this was kept secret. To be fair, the BIP was posted to the bitcoin development mailing list, https://sourceforge.net/p/bitcoin/mailman/message/32590257/
The BIP was reviewed, finalized and merged at some point, bitcoin/bips#88
This pull has been open for months as well.
You could have heard of this, and given your opinion in all of those instances...

@mikehearn
Contributor

mikehearn commented Aug 27, 2014

Peter, I wrote a patch right? I'm grateful that Dave did this testing and found this problem. I am less grateful that he then ran to reddit and started complaining about how badly unit tested Bitcoin Core is (blame Satoshi for that one, if he must).

I think it's an open question about how such things can be caught in future. Any future change could result in large temporary memory usage without anyone noticing. The lack of any definition for "large" makes this harder - as I said, I can't actually make my testnet node crash even when repeating the conditions that were given. So how to find this sort of thing systematically is tricky. Ideally Bitcoin Core would print a warning if its memory usage went over a certain amount, but we don't have that.

One solution that will definitely NOT work is blaming people for being imperfect programmers. Core has shipped fatal bugs before and will do so again. The right solution is usually to ask "how can we stop such mistakes systematically"?

@genjix

genjix commented Aug 27, 2014

There's a ton of silly discussion on that mailing list which consists of "who has the most stamina to invest in arguments", which I don't have time for. Therefore I cannot sift through all that looking for the gems of important discussion to register my single objection.

@mikehearn you can avoid mistakes by taking features out, not by trying to stuff more features in (which we don't need). This is a case of developers going mad for features they want, which should be happening on another layer of Bitcoin, not the core protocol, which should stay pure and focused. I see very little impetus for real implementation work happening apart from odd bugfixes or cosmetic changes, and a lot of "THE NEXT BIG THING" spurred by corporations who see Bitcoin as the new payments innovation instead of wanting to protect the integrity, security and values of Bitcoin. I'm a protocol conservative.

@btcdrak
Member

btcdrak commented Aug 27, 2014

Reverted in 70352e1

@dgenr8
Contributor

dgenr8 commented Sep 1, 2014

Nothing about this change is harmful enough to violate the process. A BIP was even merged, for crying out loud.

It would be great if core supported optional and p2p-queryable indexes for everything. An optional way to authenticate data served would also be great. Lack of these extra features should not doom this change. They could be in a layer maintained by the core project. There is absolutely no reason to punt stuff like this to third parties if open-source developers want to create it.

@btcdrak
Member

btcdrak commented Sep 1, 2014

@dgenr8 A BIP getting merged doesn't make it a standard, it just starts it in the 'draft' workflow status: https://github.com/bitcoin/bips/blob/master/bip-0001/process.png

@gmaxwell
Member

gmaxwell commented Sep 1, 2014

There are several bad BIPs which (IMO) no one should ever use; the BIP process doesn't tell you if something is good or not... it just specifies it.

jonasschnelli added a commit to jonasschnelli/bitcoin that referenced this pull request Dec 3, 2014

[REST] getutxos REST command (based on Bip64)
has parts of @mhearn #4351
* allows querying the utxos over REST
* same binary input and outputs as mentioned in Bip64
* input format = output format
* various rpc/rest regtests

jonasschnelli added a commit to jonasschnelli/bitcoin that referenced this pull request Mar 4, 2015

[REST] getutxos REST command (based on Bip64)
has parts of @mhearn #4351
* allows querying the utxos over REST
* same binary input and outputs as mentioned in Bip64
* input format = output format
* various rpc/rest regtests

jonasschnelli added a commit to jonasschnelli/bitcoin that referenced this pull request Apr 21, 2015

[REST] getutxos REST command (based on Bip64)
has parts of @mhearn #4351
* allows querying the utxos over REST
* same binary input and outputs as mentioned in Bip64
* input format = output format
* various rpc/rest regtests

rebroad added a commit to rebroad/bitcoin that referenced this pull request Sep 1, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Sep 1, 2016

rebroad added a commit to rebroad/bitcoin that referenced this pull request Sep 1, 2016
