Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short-id collisions allow attackers to cause arbritary bans of thinblock peers. #485

Closed
gmaxwell opened this issue Apr 27, 2017 · 20 comments

Comments

@gmaxwell
Copy link
Contributor

gmaxwell commented Apr 27, 2017

It was pointed out previously that it is easy to construct txids which makes the shortid's collide (as they're only 64-bits long). Rather than fixing this easily fixed design flaw the excuse was just made that it /merely/ results in an extra roundtrip delay and loss of the thinblock bandwidth savings.

But in the latest released code these collisions cause BU nodes to ban each other-- because they result in an incorrect hash root-- and would allow a malicious attacker to carve up a network of BU nodes into arbitrary largely-attacker-controlled shapes.

(Fortunately there are so few BU nodes that all of them banning each other is hardly a major issue-- but I think it is a bad sign that you're effectively counting on people not using your software to get away with poor design and maintenance practices)

To further highlight that this is a design flaw and not just an implementation gaff: Classic independently introduce the same vulnerability, but with entirely different code.

The thinblock system was poorly designed and its reported flaws have not been fixed for months, and have over and over again caused network wide disruption in BU nodes. Perhaps it will be fixed eventually, but it is irresponsible to leave it running while the fixes are still flowing. Had compact blocks had blown up like this in the Bitcoin project, we would have temporarily turned it off, not continued to make incorrect panicked hot-fixes.

@indomitablesnowman
Copy link

indomitablesnowman commented Apr 27, 2017

I agree with @gmaxwell. Until all bugs are resolved in every line of code, all OSS should be immediately pulled from production. It is only right that SegWit be allowed to become the dominant technology in the space--and its supporters justly enriched--due mostly to gmaxwell's superior intelligence, leadership qualities, and stewardship of the Bitcoin protocol since Satoshi and Gavin abandoned it. As though there could be any further issue, his neckbeard and physique should set aside such trivial concerns about his ability to manage very, very large, convoluted projects. Clearly he knows much better than we mere mortals what is takes to produce thin blocks!

hasta SegWit siempre!

@ptschip
Copy link
Collaborator

ptschip commented Apr 27, 2017 via email

@gmaxwell
Copy link
Contributor Author

gmaxwell commented Apr 27, 2017

@ptschip you need to take a step back and consider your response. I wrote about a serious vulnerability about short-id collisions partitioning you nodes and your response begins with a paragraph of insults ("We seem to be on you mind a great deal these days...we just won't go away will we. How's your ulcer doing?"); then a more or less unrelated argument about compact blocks.

You make no mention actual issue, other than to point out that your PR's making emergency fixes (which have introduced vulnerabilities) are temporary. I totally understand that you cant make an xthin that works right over night, but the right thing to do while it isn't working is to just shut it off... not keep applying bandaids that make it worse.

So I'll take it upon myself to instrument compact blocks as we do for xthins.

Compact blocks has extensive logging. All the performance is readily available from that, just turn up debugging and parse the logs.

I truly believe compact blocks will never perform as well as xthins;

Most of the time compact blocks transfers the block in 0.5 RTT, when it needs to request missing transactions it takes 1.5RTT. Your protocol takes a minimum of 1.5 RTT. It is more or less impossible for your protocol to do better than BIP152 from the perspective of just latency because most of the time BIP152 achieves latency lower than xthin's best case.

From a bandwidth perspective, BIP152 uses short-ids which are 75% of the size of yours, and does not need to send the many kilobytes of bloom filter. The requests for missing transactions are also tens of times more efficient. So when HB mode is not used, BIP152 is unconditionally less bandwidth consuming than xthin.

So in the case where the user cares exclusively about latency or where the user cares exclusively about bandwidth it appears impossible by construction for BIP152 to not outperform xthin.

If compact blocks can do what xthins does well then why use xthins?

You might hypothesize some user that cares about latency but doesn't mind doubling it, and cares a ton about bandwidth.. but not their outbound usage, for some reason... and never uploads blocks.. And then say for that user xthin is narrowly better. Perhaps, but what does it matter when their node is constantly down?

If that user really exists and is concern, it would be far simpler to write a simple bloom based request block that drove BIP152 prefill to be used when HB isn't in use. (We haven't done it because it appeared more or less useless in testing)

I think the bloom filter is an odd duck: it expends bandwidth to optimize latency-- to save a round trip-- but at a the expense of increasing the minimum latency by a round trip (and the average by most of that). Perhaps a bit like hovercraft: not great on land, not great on the water-- though you could come up with situations where its useful. But your hovercraft is full of eels.

And FWIW, for most of xthin's early life the compression stats you published were just wrong: it ignored re-requests and the missized bloom filters caused constant re-requests. Relying on internally generated stats is risky and maintaining that code has a cost. Every minute you spent polishing those stats could have been spent making xthin not an attack vector.

@zander
Copy link
Contributor

zander commented Apr 27, 2017

@gmaxwell did you actually try this? The code actually explicitly guards against the issue you are talking about. The collision is detected using the local mempool and orphans. Granted, you may get very unlucky that your collision tx didn't hit your mempool yet. But this is a corner case with a negligible attack surface.

As far as I can see, your attack is not going to work due to this code;

https://github.com/BitcoinUnlimited/BitcoinUnlimited/blob/release/src/thinblock.cpp#L372

which doesn't continue with the handling of a xthin block in exactly the usecase you describe.

@deadalnix
Copy link
Contributor

deadalnix commented Apr 27, 2017

@gmaxwell Let's not get this devolve into CB vs XThin.
@ptschip This is a legit problem with the changes introduced in the last version of XThin. There is nothing to debate. This is bug report and a serious one, it need to be addressed.
@zander The issue is legit and was raised by Greg a long time ago. While it wasn't that big of a deal at the time, it can now be leveraged to make BU nodes ban each others.

@ftrader
Copy link
Collaborator

ftrader commented Apr 27, 2017

The responsible thing now is to look into the issue with care and analyze it.
If the described partitioning attack is feasible, and the solution possibly takes longer, then recommending to disable xthinblocks until it is resolved is a responsible approach.

At this stage thanks to @gmaxwell for raising the issue again.

@orweinberger
Copy link

@ptschip You managed to produce the most unprofessional comment I've ever seen in Github written as a reply to a serious bug in your software.

@indomitablesnowman
Copy link

Greg is such a great guy. Always looking out for other people.

@gmaxwell
Copy link
Contributor Author

gmaxwell commented Apr 27, 2017

@zander Yes. I tried it. It works fine.

The code actually explicitly guards against the issue you are talking about. The collision is detected using the local mempool and orphans.

Nope. The collisions cannot be in your mempool because they are conflicting transactions (this is, in fact, the most obvious way to produce collisions, to produce colliding txids that do not conflict requires a permuting a large collection of inputs; though its technically faster with savvy use of sighash flags).

If not for this the existing handling would still be inadequate because the attacker can just flood collisions so that you're always missing some (plus conflicting txn propagate far less far...).

You cannot keep thinking that the attacker will collaborate with you and produce only the subset of cases that you handle!

(And just responding to reports with rationalization certainly does not help.)

@zander
Copy link
Contributor

zander commented Apr 28, 2017

Yes. I tried it. It works fine.

Please send your exploit by email to me so I can see how it works. Thanks!

(And just responding to reports with rationalization certainly does not help.)

I agree, I hope the owners of this code will fix the bug. I know that something coming from gmaxwell is sometimes ignored, but I have to agree on this specific item that responding with rationalization is not the way to deal with issue reports.

@gmaxwell
Copy link
Contributor Author

@zander No joy, I'm not giving exploit code to you. Too much risk that it gets used on the network and I get blamed. I think I adequately explained above what the issue is, but if not, I'd be happy to clarify.

I agree, I hope the owners of this code will fix the bug.

Your implementation has the same issue too.

@zander
Copy link
Contributor

zander commented Apr 28, 2017

No joy, I'm not giving exploit code to you.

Well, you explained how it works, I can write it myself, but I thought you would want to show your version.

Your implementation has the same issue too.

Classic has never released anything other than the BU implementation of xthin, maybe with some bugfixes I personally added.

@sickpig
Copy link
Collaborator

sickpig commented Apr 29, 2017

@GregoryMaxwellIsALiar
could we please try to keep the discussion. on technical level? Thanks in advance

@ftrader
Copy link
Collaborator

ftrader commented Apr 29, 2017

I'm going to repeat what I wrote in #311 (comment)

We don't have anyone called 'GregoryMaxwellIsALiar' on the BU team, and that person certainly doesn't speak for us as BU developers.
https://en.wikipedia.org/wiki/False_flag

@indomitablesnowman
Copy link

We don't even know that 'GregoryMaxwellIsALiar' is a 'he'. Let's please be cautious not to senselessly gender the conversation in this way. I'm sure that Gregory means well. He just wants us to think correctly about these things. Given time he will illuminate our understanding of the bitcoin protocol, and we will all be the better for it.

@ftrader
Copy link
Collaborator

ftrader commented Apr 29, 2017

@indominablesnowman : good point, changed my comment

@GregoryMaxwellIsALiar : you might feel it's necessary to vent here, but I assure you it's not.

@awemany
Copy link
Contributor

awemany commented Apr 30, 2017

@GregoryMaxwellIsALiar : In the unlikely case you are genuinely venting and not false flag here - I agree with @ftrader ; this is not helpful.

To put another pair of eyeballs on this:

@zander: I do see how the code gracefully degrades if there is any known/visible conflict. But are we really, REALLY sure it won't degrade without such a conflict?

For example, you say "Granted, you may get very unlucky that your collision tx didn't hit your mempool yet. " - what if I maliciously leave the mismatching transaction out when talking to BU/Classic peers? Or do you think reasonable assumptions about well-connectedness should apply here as well?

This is a lot of assumptions however, and that makes me feel somewhat uneasy. That doesn't seem to be a defense-in-depth approach! Is there a downside to not ban but simply always degrade the xthin fetching to full hash ids in all cases where there is a collision or merkle tree mismatch?

@ptschip: Also, I do think that the comment "There is a remote possiblity of a Tx hash collision" in L370 might mislead and strengthen the wrong mindset when looking at the code - the 64bit collision probability is only "remote" in non-adversarial conditions.

@stefment
Copy link

stefment commented May 1, 2017

why not implement compact blocks and segwit in BU?

@sickpig
Copy link
Collaborator

sickpig commented May 1, 2017 via email

@sickpig
Copy link
Collaborator

sickpig commented May 31, 2017

closed by #506 on release and #479 on dev branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

11 participants