Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transitioning to stronger hash function #58

Closed
the8472 opened this issue Feb 23, 2017 · 69 comments
Closed

Transitioning to stronger hash function #58

the8472 opened this issue Feb 23, 2017 · 69 comments

Comments

@the8472
Copy link
Contributor

the8472 commented Feb 23, 2017

With the published collision attack on SHA1 bittorrent may become less attractive for use-cases which don't just want simple integrity-checking (for which it still is perfectly fine) but also authentication.

We should come up with a plan to transition to stronger hashes for all uses, ideally while maintaining backwards compatibility for a transition period.

Thoughts:

  • To reuse existing software/infrastructure we'll have to truncate to 160bits in some places but not necessarily all of them.
  • To avoid bloat up torrents any further the new hashes should probably be exclusively in a merkle tree format to exist alongside with pieces
@the8472
Copy link
Contributor Author

the8472 commented Feb 23, 2017

And if we're doing merkle with a new hash algorithm that also means we don't have to pay attention to backwards compatibility with BEP 30, instead we can incorporate new ideas from issue 29.

@the8472
Copy link
Contributor Author

the8472 commented Feb 23, 2017

Also add hash agility to avoid another migration in the future.

@arvidn
Copy link
Contributor

arvidn commented Feb 23, 2017

I was just going to mention that. It's a good opportunity to start with a hash tree from scratch

@ssiloti
Copy link
Contributor

ssiloti commented Feb 23, 2017

I agree that this should be rolled into a new merkle tree torrent spec. Clients should only have to support two types of torrents: legacy/SHA1 and merkle/<new hash function>.

@arvidn
Copy link
Contributor

arvidn commented Feb 23, 2017

Maybe even going so far to have a tree per file. but maybe that's too much (it would have quite significant consequences of the internals of clients)

@the8472
Copy link
Contributor Author

the8472 commented Feb 23, 2017

I'm sortof in favor of having a tree per file but I recognize its complexity it adds. It would alleviate some headaches about appending to torrents and deduping files. But to be backwards-compatible we'll have to include BEP47 paddings because it requires piece alignment. In a from-scratch spec they would be implicit.

On a more procedural note, how would we handle declaring parts of BEP3 as legacy?

@gubatron
Copy link
Contributor

@dionyziz I think your input on hashing functions to consider/benchmark for the purposes of hashing file pieces (and merkle tree creation) for integrity verification (and other operations) of torrent files would be most appreciated given your field of expertise. (My guess is that SHA-256 and SHA-3 would be the first candidates, but I've no clue in terms of performance of this functions vs the old SHA1)

@ssiloti
Copy link
Contributor

ssiloti commented Feb 23, 2017

@the8472 The body of BEP3 should be left unmodified and all changes described in a new BEP. Depending on how extensive the changes are, it might make sense to use BEP3 as the base for the new BEP to create a new stand-alone protocol spec. Once the new BEP is finalized we can add an "Updated by" or "Obsoleted by" header to BEP3 referencing it.

@bramcohen
Copy link

The breakage of sha1 is a collision rather than a reversal, so it isn't a huge problem for BitTorrent (if someone sends you a torrent with broken data in it they can screw you in lots of other and worse ways ways) but this may be a good impetus for doing all the backwards-incompatible things we'd like to do.

There are three backwards-incompatible things I think are good ideas:

(1) Change the infodict to be a bencoded string of its own rather than an included object. This will get around the problems of bencoding being not really canonical. This is a trivial no-brainer if we're already doing something incompatible.

(2) Switch to separate hashing for the different files in a multifile torrent. There's no technical downside to this except that it's a pain to implement. We should probably just bite the bullet and do it.

(3) Switch to sha256 and using a merkle tree. For various reasons sha256 is clearly the right hash to use. The main problem with bep30 is that it results in the same file hashing to different roots based on the piece size, which is completely unnecessary. The hashing should go down to fixed size terminals, no more than 16kib but maybe 8 or 4 for more flexibility, and it should not change with the piece size. The piece size should still be specified in the info dict but it's there so that peers are all speaking the same language when they discuss what they have, without affecting hashes. This requires that piece sizes only be powers of two, so apologies to anyone fond of setting piece sizes to prime numbers.

Suggestions for other worthwhile changes which can only be done incompatibly are welcome. Most extensions don't require incompatibility.

@kyhwana
Copy link

kyhwana commented Feb 23, 2017

For an example, see the .torrent files inside https://keybase.pub/kyhwana/sha-2/ folders a/ and ab/ "a.pdf" inside a/ and ab/ have the same SHA1's but different SHA256 (and are different images)
Note that I set the piece size to 64KB, so the piece has the same SHA1's.

I imagine you could position the changed data inside pieces at the right offsets to send malicious pieces and have it verify as well.

@dionyziz
Copy link

This is a security problem for BitTorrent, because users may trust a .torrent file download, for example using an HTTPS link on a trusted domain and trusting the PKI system, but may not trust their peers, because their network could be compromised. If a SHA1 collision can be found between a malicious binary file (e.g. a virus) and a non-malicious binary file (e.g. a video game), this is a very reasonable attack: It is a reasonable threat model to assume the network is compromised, but HTTPS is not.

I recommend upgrading to SHA256.

@bramcohen
Copy link

To be clear: The current problem with sha1 is generation of collisions, not reversals, so if you can trust the source of the .torrent file but not peers you're still fine, and that's the usual threat model. That said it's a problem worth fixing and the only reason we haven't done so already is that it requires a backwards incompatible change.

@dionyziz
Copy link

dionyziz commented Feb 23, 2017

Trusting the source of a .torrent file but not your peers is not fine. The attack could work like this: Initially, the attacker generates two different files, A and B. The attacker also creates a torrent file for A and makes sure that there exists a SHA1 collision with the contents of file B. Let's say these are C++ source code files. A is an innocent and useful program, while B is a malicious program. Then the attacker sends A to a trusted third party, for example the package maintainer of a certain linux distribution. The trusted third party manually verifies that the source code for A is benevolent and that the torrent contains the correct hash and vouches for its validity by hosting the verified .torrent file on their HTTPS website. The attacker subsequently replaces the file contents of A on the network with the contents of B during the download by modifying peer data in transit. The torrent client will not be able to detect this tampering.

This attack does not require a reversal, only a collision generation.

@zookozcash
Copy link

I'd like to put in a plug for BLAKE2. It has a deeper security margin than SHA-256 has and it is much faster. BLAKE2 is widely used. It is already included in your favorite crypto library.

BLAKE2 is a descendant of DJB's Salsa20/Chacha20 designs, and its immediate ancestor BLAKE1 was thoroughly reviewed during the SHA3 competition. (Those are slides that I presented at ACNS 2013.)

I'd be interested in a measurement of whether hash function performance difference between SHA-256 and BLAKE2 makes a difference in BitTorrent's usage. Even if hash function performance isn't a critical issue in this case, BLAKE2 is a fine choice.

@bramcohen
Copy link

A reasonable argument can be made for blake2. Its advantages are that it's faster in software and has a larger hash size. Its disadvantages are that it's less standard, meaning people will give me more grief about it if I use it, and that it's slower if there's hardware acceleration which is likely to become commonplace for sha256 and nothing else, and that the speed of this operation doesn't matter much anyway (yes I'm talking out of both sides of my mouth here) and that the longer key length takes up more space without a practical improvement to the already ludicrous security parameter. With using merkle roots maybe the longer length doesn't matter in practice. Or blake2s could be used for shorter hash lengths and better performance on 512 bit inputs. Or maybe blake2b could be used for the terminals and truncated and blake2s could be used climbing up the tree. (blake2b is slower on 512 bit inputs because its block size is 1024.) Truth be known I'd be fine with any of these options, I just want to do the thing which will cause the least bullshit complaining.

@zookozcash
Copy link

You can have BLAKE2b or BLAKE2s generate shorter outputs. This is similar to truncating the outputs, except that the intended output length is mixed in so that H(x, gimme-bytes=24) is not equal to H(x, gimme-bytes=32)[:24]. Either way.

BLAKE2 is fairly widely-accepted nowadays, I think. I'm not sure if people would give you grief about using it. It tends to be the default hash function in new crypto designs — things like the Password Hashing Competition, libsodium, etc.

@jeremyBanks
Copy link

jeremyBanks commented Feb 24, 2017

Is this expected to be a one-time update to a new algorithm, or should some accommodation be made for future algorithm changes, as the8472 suggested?

Presumably, a new default algorithm such as SHA-2-256 or Blake2b will be specified. But the spec could also describe a standard way of indicating the hash algorithm in-band. Clients could choose to implement support for additional algorithms, and they could be adopted as needed without further changes to the protocol.

One option for indicating this on the wire or in magnet links could be the "multihash" digest format used by IPFS (draft spec seems to be here). This adds a ~2-byte prefix to each digest indicating the algorithm used and the digest length (kind-of similar to the BitTorrent protocol handshake prefix). There would be minimal additional complexity for clients that don't implement algorithm flexibility: they would just verify and discard or add the default two-byte hash prefix (e.g. 0x1220 for SHA-2-256, or 0x1214 if it's truncated to 160 bits).

@bramcohen
Copy link

Hash algorithm extensibility is a delusion. If a change is made to use a hash algorithm which isn't currently defined then that change will, by definition, be incompatible, even if you play a semantic game of pretending you were prepared for it. In protocols in general extensibility mechanisms are a huge source of complexity and security problems, and all the new hash functions we're talking about here should be good until the end of time, so it's better to pick one and call it a day.

@the8472
Copy link
Contributor Author

the8472 commented Feb 24, 2017

@bramcohen

(1) Change the infodict to be a bencoded string of its own rather than an included object. This will get around the problems of bencoding being not really canonical. This is a trivial no-brainer if we're already doing something incompatible.

This seems unnecessary. Most of those problems are solved or worked around these days. And it would prevent a transition format that supports both legacy and new hashes.

(2) Switch to separate hashing for the different files in a multifile torrent. There's no technical downside to this except that it's a pain to implement. We should probably just bite the bullet and do it.

agreed

(3) Switch to sha256 and using a merkle tree. For various reasons sha256 is clearly the right hash to use. The main problem with bep30 is that it results in the same file hashing to different roots based on the piece size, which is completely unnecessary. The hashing should go down to fixed size terminals, no more than 16kib but maybe 8 or 4 for more flexibility, and it should not change with the piece size. The piece size should still be specified in the info dict but it's there so that peers are all speaking the same language when they discuss what they have, without affecting hashes.

great idea

This requires that piece sizes only be powers of two, so apologies to anyone fond of setting piece sizes to prime numbers.

😢

Hash algorithm extensibility is a delusion. If a change is made to use a hash algorithm which isn't currently defined then that change will, by definition, be incompatible

Strong disagreement. It obviously is not about forwards compatibility in the sense of supporting not-yet-existing hash functions. It is about making future migrations possible without changing any structural aspects of the protocol. Yes, clients would still need updating to include a new hash function, but all the code branches deciding which hash to use should already be there. To force implementers to spend time on that now we could specify two hash functions. Since functions of vastly different design usually don't fall at the same time that would also mean if one falls we'd still have another one in operation and enough time to add a 3rd one to the spec while deprecating the vulnerable one.

All it takes is a "blake2" or "keccak" string in the info dict and possibly magnets.

Unlike interactive protocols we wouldn't need any negotiations either. Torrents would simply contain a declaration of what they are. Clients can then either complain that they don't support it yet or that something is obsolete and should not be trusted.

and all the new hash functions we're talking about here should be good until the end of time, so it's better to pick one and call it a day.

Unlike asymmetric cryptography those hash functions are not mathematically proven to be NP. And historically hash functions had somewhat limited lifetimes.
Or maybe you're saying the end of time will soon be upon us?

@zookozcash

You can have BLAKE2b or BLAKE2s generate shorter outputs.

We will probably need different truncation in some places since we can accommodate 256bit or even 512bit hashes in some places but have to make do with 160bit lengths in other parts of the protocol (DHT, Trackers, handshake).

@zookozcash
Copy link

zookozcash commented Feb 24, 2017 via email

@the8472
Copy link
Contributor Author

the8472 commented Feb 24, 2017

@bramcohen

but this may be a good impetus for doing all the backwards-incompatible things we'd like to do.

Something that may also be useful to shrink the metadata size is to use directory trees instead of flat lists. In some multifile torrents the per-file information actually dominates the size and not the pieces.

files: [{
  path: ["aaaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbb","f1"]
},{
  path: ["aaaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbb","f2"]
},{
  path: ["aaaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbb","f3"]
},{
  path: ["aaaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbb","f4"]
},{
  path: ["aaaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbb","f5"]
}]

takes up a lot of space in torrents with many small files. Instead we could do

files: [{
  path: ["aaaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbb"],
  files: [{
    path: ["f1"]
  },{
    path: ["f2"]
  },{
    path: ["f3"]
  },{
    path: ["f4"]
  },{
    path: ["f5"]
  }]
}]

@gubatron
Copy link
Contributor

gubatron commented Feb 24, 2017

Hash algorithm extensibility is a delusion. If a change is made to use a hash algorithm which isn't currently defined then that change will, by definition, be incompatible

Strongly disagree here. Here's great example where this could be of great utility

Last night I was thinking about the problem Bitcoiners have with the initial sync of the blockchain and how it could greatly be improved by all nodes using libtorrent.

If a torrent's metadata specified what hashing algorithm has been used for its pieces in the info section, Bitcoin clients could seed and announce (themselves on the DHT) each block as a separate merkle torrents, where the merkle hash included in the torrent metadata would correspond to merkle root specified for the block header in Bitcoin-land, other cryptocurrencies could make use of this no matter what hashing algorithm they use if the hashing algo were specified.

@dionyziz
Copy link

@gubatron admittedly, the bitcoin case doesn't need authentication – the blockchain can be downloaded over untrusted torrent peers and even using an untrusted bittorrent file. Authenticity can subsequently follow by verifying the proof of work contained within.

@gubatron
Copy link
Contributor

gubatron commented Feb 25, 2017

@dionyziz yes, in the case of bitcoin, if nodes were to sync using torrents, the only way to trust the torrents (doesn't matter their source) is if their inner merkle-tree sum matches the merkle root on the corresponding block header for that torrent.

I guess I might as well ask, as this is the part that is still missing in my mind.

Do torrents support variable piece sizes?

In the case of regular file sharing, can you have smaller pieces so that a piece doesn't span across the end of a file and the beginning of another one?

The idea would be to make each transaction in the block the equivalent of a piece that gets hashed as one of the merkle tree nodes, so that the merkle root in the block header is the same as the one defined inside the torrent. This would be one hell of a custom torrent, but I think it could fly if variable piece sizes are doable in the torrent metadata.

@the8472
Copy link
Contributor Author

the8472 commented Feb 25, 2017

In the case of regular file sharing, can you have smaller pieces so that a piece doesn't span across the end of a file and the beginning of another one?

This is more or less achieved by aligning files in the piece-space via paddings in the end. Clients not aware of paddings download zeroes. Those that are effectively download a shorter piece.

If you define one file per block then you can have that today.

But I think bitcoin is a distraction from bittorrent's core uses. It's nice if it can be mapped nicely onto BT, but the main goal is to distribute files.

@arvidn
Copy link
Contributor

arvidn commented Feb 26, 2017

@bramcohen regarding distinguishing the leaf size of a hash tree and the piece size, I think Steven had a good point (in the other thread about a new merkle tree extension). Fundamentally, the only reason to have pieces be larger than 16kiB is to not spend a lot of bandwidth on the BITFIELD- and HAVE messages. However, it might make more sense to provide alternatives that are more compact/efficient that those. I spent some time with an extension for that (but there's some more work to be done there). There's also the binmap paper.

@MuxZeroNet
Copy link

No one have mentioned magnet links yet. Will existing magnet link URNs also be vulnerable? Do we have to migrate to a new URN like this?

>>> import base64
>>> import hashlib

>>> s = hashlib.sha512("Replace SHA-1 with stronger message digest functions!").digest()
>>> base64.b32encode(s)
'EYVH4MBTKSZQOARCINCZZXUARJFP2SY4GXZITGS27VSZKUANE27XMNUZZ52O6UP6GUKS3LMVSGWYVBKAL3CLY5AXKN4C4HDYLBY4J2A='

>>> new_infohash = base64.b32encode(s).lower().strip("=")
>>> new_urn = "bsha"
>>> new_maglink = "magnet:?xt=urn:%s:%s" % (new_urn, new_infohash)
>>> new_maglink
'magnet:?xt=urn:bsha:eyvh4mbtkszqoarcinczzxuarjfp2sy4gxzitgs27vszkuane27xmnuzz52o6up6guks3lmvsgwyvbkal3cly5axkn4c4hdylby4j2a'

@arvidn
Copy link
Contributor

arvidn commented Feb 26, 2017

In principle the piece and hash tree hash function could be different than the one used to calculate the info-hash, but it wouldn't make much sense to upgrade SHA-1 of pieces without also upgrading the info-hash function. Magnet links normally use hex-encoding for the info-hash though (early versions used base32, but that's mostly phased out)

@the8472
Copy link
Contributor Author

the8472 commented Feb 26, 2017

@MuxZeroNet

Will existing magnet link URNs also be vulnerable?

Existing hashes are never vulnerable to collision attacks. This is not specific to bittorrent, it's a consequence of what collision attacks do. But it is difficult to tell whether a hash predates the first collision attack on a hash function.

Do we have to migrate to a new URN like this?

Are you talking about the syntax of magnets (btih) or the hash function used? We might be able to mostly keep the syntax or simply extend it to keep both legacy and forward compatibility. But we will need to use a new hashfunction to avoid certain attack scenarios that would make bittorrent less viable for some distribution use-cases.

@arvidn

Fundamentally, the only reason to have pieces be larger than 16kiB is to not spend a lot of bandwidth on the BITFIELD- and HAVE messages.

It also limits the worst-case memory use for keeping track of which pieces each peer has. If they are at 50% and select their pieces at random instead of batches then clever in-memory encodings won't save you.

Not to mention that such extensions would increase the complexity of a new protocol if they were strict dependencies. I think being able to specify >16KiB piece sizes would help keeping things simple.

@bramcohen
Copy link

The hashes likely to have anybody behind them are sha256, sha3, blake2b, and blake2s (and maybe a mix of blake2b and blake2s). I continue to maintain that making it parameterizable is a bad idea, and after getting feedback from some more crypto people it sounds like sha256 is viewed as fine by everybody and is depended on by everything, so that should be viewed as the default.

@bramcohen
Copy link

The advantage of going smaller than 16KiB is that if you split smaller requests across peers then you can hash check them individually. That said noone seems interested in going smaller on requests, so I'll just drop that thought. But since nobody ever deviates from 16KiB, why don't we make that the forced amount? Then requests just have to specify an index without specifying a request size, so there's less complexity and less concerns about what request sizes are and are not acceptable, which is a bit of a concern because peers can force extra data to be sent to them after a choke happens by requesting large piece sizes so an already started request has to be finished even after a choke happens.

@the8472
Copy link
Contributor Author

the8472 commented Mar 2, 2017

The advantage of going smaller than 16KiB is that if you split smaller requests across peers then you can hash check them individually.

Reducing leaf hash sizes doesn't help there unless you also reduce the piece size. Clients do not have access to the leaf hashes until after they have verified a whole piece and can derive them from the data itself.

Why don't we make that the forced amount?

At this point? Ontological inertia. Source code exists, let us reuse it if there is no good justification for the effort to change it. We're already doing a fairly big redesign of the metadata, there's no need to touch the wire protocol.

At least in principle low-bandwidth clients could do <16KiB requests to keep the congestion-induced latency for control messages low (maybe coupled to µTP-MTU?). I don't think any implementation actually does this, but at least it's possible.

@arvidn
Copy link
Contributor

arvidn commented Mar 2, 2017

16 kiB has been (for a long time) the upper bound on request sizes (which I think makes sense).

It still makes sense to request smaller blocks for the tail piece, which is done today. With more tail pieces, which the proposed hash tree would create, it becomes even more useful to be able to request smaller blocks.

@bramcohen
Copy link

Reducing leaf hash sizes doesn't help there unless you also reduce the piece size. Clients do not have access to the leaf hashes until after they have verified a whole piece and can derive them from the data itself.

Huh? The whole point of going Merkle is that you can verify the very first chunk you get, even when all you've got is the root.

We're already doing a fairly big redesign of the metadata, there's no need to touch the wire protocol.

We need to add hash proofs if nothing else

It still makes sense to request smaller blocks for the tail piece, which is done today. With more tail pieces, which the proposed hash tree would create, it becomes even more useful to be able to request smaller blocks.

What's wrong with just truncating the last piece? Certainly the last chunk should be assumed to be truncated, which makes indices work fine.

@arvidn
Copy link
Contributor

arvidn commented Mar 2, 2017

What's wrong with just truncating the last piece? Certainly the last chunk should be assumed to be truncated, which makes indices work fine.

That's a fair point. I have a hard time coming up with any other reasons for unaligned or oddly sized requests. However, I think @the8472 has a reasonable point though, that adding a new PIECE message can be done independently, and deferring it probably eases the transition

@the8472
Copy link
Contributor Author

the8472 commented Mar 2, 2017

I think the draft now contains all necessary components for an upgrade plus some of the "always wanted to do that" changes discussed.
General discussion and nitpicks are welcome, nothing is set in stone.

@the8472
Copy link
Contributor Author

the8472 commented Mar 2, 2017

@bramcohen

Huh? The whole point of going Merkle is that you can verify the very first chunk you get, even when all you've got is the root.

That is one possible use of merkle trees. But as specified by BEP30 it is in conflict with other use-cases (discussed in #29). The new draft retains all the old possibilities provided by piece lists, vastly simplifies deduplication. This makes .torrent files including the root dictionary about about as heavy-weight as info dictionaries were before. But magnets/metadata exchange can now provide in a faster startup by grabbing piece level hashes on demand.
If that is not sufficient for some low-latency use-cases then we can, as @arvidn suggested add another extension in the future to provide wire protocol messages that extend hash information down from the root or the piece layer to the block layer. (Note: if piece length == 16KiB then piece layer == block layer.)

So with some extensions can get all the goodies of merkle torrents if we want them, but the draft for the new base protocol aims to cover all the use cases enabled by the old base protocol (and then some).

@kyhwana
Copy link

kyhwana commented Mar 5, 2017

See also https://biterrant.io/ for a Proof of Concept attack involving good/malicious binaries.

@bramcohen
Copy link

There's another old idea which we have the opportunity to add now: Instead of peers presenting the infohash directly, they present a hash of the hash and use the actual infohash as a shared secret for kicking off an encrypted connection so that peers who don't have access to the infodict can't download the file. This relates to the general subject of how radical we want to be and how many stages we want to do this in. What I'm going to do now is read through the detailed proposal and critique it primarily from the point of view of forming a rollout plan.

@the8472
Copy link
Contributor Author

the8472 commented Mar 8, 2017

Well, the attempt at peer protocol encryption for the purpose of traffic shaping evasion kind of failed after working for a short time. Failed in the sense that it is an arms race and we did not continue to refine it to keep winning that race.

But there are numerous other security properties that can be achieved with encrypted peer connections. But I think some of them can be better achieved with payload encryption (#18, #20), but I'll have to revise that proposal in light of a new torrent format.

@arvidn
Copy link
Contributor

arvidn commented Mar 8, 2017

@bramcohen When using protocol encryption today, peers don't advertise the real info-hash, but a hash derived from it. Primarily what I think would be required to achieve a bit more privacy would be to consistently advertise such derived hash to the DHT, trackers and local peer discovery (and also to reject un-encrypted connections)

@the8472
Copy link
Contributor Author

the8472 commented Mar 12, 2017

That conflicts with making public content easily accessible. Many search sites depend on or supplement their data with DHT-indexed torrents and thus provide a valuable service to users.
Of course privacy is also desired by users - many probably would prefer to transfer their data anonymously if that could be had cheaply - but for public content anyone can obtain the torrents, join a swarm and PEX a peer list, so there are limits on what is practically achievable.

Of course there are other privacy aspects. For example passive ISP-level/public wifi surveillance. This could be prevented by unsigned DH-exchanges on the protocol level and for DHT stores.
And then there's private data that's not meant for public distribution. In that case payload encryption could solve that issue and also provide at-rest protection for the data. The other measures wouldn't hurt since it wouldn't make sense to have that indexed anyway.
Another aspect is censorship of specific infohashes on the network level. I don't recall this happening anywhere except trackers blacklisting particular infohashes in response to DMCAs, but one possible countermeasure there is infohash-hopping with some storage-intensive derivation algorithm.

Anyway, my point is we can cover multiple use-cases at the same time but that requires discussion about several distinct threat models and countermeasures. I think that would be better served by a separate discussion.

@zookozcash
Copy link

zookozcash commented Mar 13, 2017 via email

@sigmarelax
Copy link

If "good until the end of time" is your goal as you do not want another transition in x years, I don't believe you should use SHA-256.

While not weakened at full-rounds, there have been some notable attacks found, partial list over at wikipedia. Additionally, the US government has been moving secure systems off of SHA-256 since last year, in favor of SHA-384.

It doesn't seem logical to me to implement SHA-256 for the long-term when there are alternatives (particularly one that is well-vetted, faster, has a better security margin).

@the8472
Copy link
Contributor Author

the8472 commented Mar 18, 2017

I for one am not opposed to stronger hash functions. The question is whether the other devs will agree to them.

Impact:

  • more unwieldy magnets
  • larger multi-file torrents
  • larger torrents at fixed piece sizes (doubling piece size == half the hashes in the root dict)
  • more expensive in terms of cpu-time. the question is whether that would matter anywhere

@ssiloti
Copy link
Contributor

ssiloti commented Mar 18, 2017

I'm going to go out on a limb and say that 128 bits of security should be enough for anyone. It is enough to put brute force attacks firmly in the realm of boiling the oceans. Even if SHA-256 were subject to a break as severe as SHA-1, losing ~20 bits of security, it would still be comfortably secure against practical attacks with conventional computers. It's going to take a radical new attack to find a SHA-256 collision, and such an attack may well render SHA-384 vulnerable as well.

Note that while SHA-512 and its truncated derivatives are faster in software on 64-bit processors, such CPUs are increasingly likely to have even faster hardware support for SHA-256. Software performance is more likely to matter on embedded systems with 32-bit CPUs where SHA-256 is faster.

Given the trade-offs, I'm willing to take a chance with SHA-256.

@the8472
Copy link
Contributor Author

the8472 commented Mar 18, 2017

it would still be comfortably secure against practical attacks with conventional computers

But if we are debating "until the end of time" security then we have to take quantum computers into account. The BHT algorithm claims to reduce collision resistance to 1/3rd of the hash length instead of 1/2. DJB refutes this, but he leaves the question whether that is the final word on the issue open.

@the8472
Copy link
Contributor Author

the8472 commented Mar 18, 2017

Software performance is more likely to matter on embedded systems with 32-bit CPUs where SHA-256 is faster.

Well, 64bit systems are making a foray into the embedded landscape, e.g. the raspi3.

But looking at some openssl benchmarks it looks like most systems are very anemic anyway and the difference between sha256 and sha512 is a factor of 2-3 at most. So if sha512 is prohibitively expensive then sha256 won't fare much better.

Do we have a case where every ounce of performance matters? I don't really want to be responsible for a "256bits ought to be enough for everyone" a few years down the road if we can avoid it.

@ssiloti
Copy link
Contributor

ssiloti commented Mar 18, 2017

I don't think we have a clear enough picture at this point to predict which functions will prevail in a post-quantum world. Practical quantum computing is still a long ways off and the economics of it are totally unknown, so a wait-and-see approach seems appropriate.

Regarding performance. ARMv8 has an optional crypto extension which supports SHA-256, so 64 bit embedded devices at least potentially carry hardware support. To be fair it looks like the RPi3 lacks that extension, but it also lacks enough I/O throughput for it to matter.

The main reason to care about performance is power consumption. Reducing power draw by a factor of 2 or 3 is a big deal in mobile and datacenter settings. It also makes it easier to run a client on a cheap VPS as they tend to be heavily oversubscribed for CPU time and less so for bandwidth.

@the8472
Copy link
Contributor Author

the8472 commented Mar 18, 2017

I guess we'll see in a decade or two. Anyway, I'm mostly waiting on review from @bramcohen. Once any differences are settled we should also generate some test-data and add them to the spec. Due to the merkle trees the complexity is a little higher, so hopefully we can avoid implementation errors that way.

@zookozcash
Copy link

Here's a better link to my arguments for why BLAKE2 is safer than SHA2:

zcash/zcash#706 (comment)

Note in particular the security margin bit — SHA-256 is breakable up to 31 out of 64 rounds. BLAKE2b is breakable up to 2 & ½ of 12 rounds (https://en.wikipedia.org/wiki/Hash_function_security_summary). Note also the "SHA-256's father has turned his back on it" argument, which I present not as an argument that SHA-256 is weak, but that SHA-256 will suffer a worse and worse reputation as time goes by.

@the8472
Copy link
Contributor Author

the8472 commented Aug 7, 2017

Closing this as the BEP has been added as draft.

Feedback is still welcome from implementors, especially if they encounter some roadblocks. At the moment it still is a draft after all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests