
BEP 30 (merkle-tree) improvements #29

Open
the8472 opened this issue May 27, 2016 · 35 comments

Comments

@the8472
Contributor

the8472 commented May 27, 2016

Thinking about PR #28: if we want to migrate to 16KiB pieces and merkle-tree torrents, then I've found that the trees are hostile to BEP 38 and similar file-reuse strategies.

BEP 30 also mentions related caveats:

In theory we can create one swarm. In that swarm, new clients could serve pieces to old clients. For the new clients to benefit from the old clients, however, we need to add some way for the new to obtain the hashes required to check a piece. Designing a fool proof solution for this problem is not trivial.

Because we let the initial seeders recalculate the hash tree, this extension is incompatible with the proposed HTTP Seeding extensions in BEP 17 [4] and 19 [5] .

I think we need to fix this to some extent. I would suggest adding a list of hashes from somewhere down in the tree, chosen so that they roughly align with files, falling completely inside them if the files are large enough. For single-file torrents nothing would change. But multi-file torrents need a way of partial reconstruction that allows clients to generate Tr_hashpiece messages.

Since each file in a torrent comes with its own footprint (path/length) scaling the number of included tree nodes with the number of files should only inflate the metadata size by a constant factor.
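As a sketch of what "hashes that roughly align with files" could mean in tree terms: the minimal set of perfect subtrees covering a file's leaf range can be computed with standard interval-to-subtree logic. The function name and the (level, index) representation are mine, not from any BEP:

```python
def covering_nodes(first_leaf, last_leaf_excl):
    """Return the minimal set of perfect subtrees covering the leaf
    range [first_leaf, last_leaf_excl) as (level, index) pairs,
    where level 0 is the leaf layer. A file spanning that range
    could then be represented by just these nodes' hashes."""
    nodes = []
    lo, hi, level = first_leaf, last_leaf_excl, 0
    while lo < hi:
        if lo & 1:              # left edge not subtree-aligned
            nodes.append((level, lo))
            lo += 1
        if hi & 1:              # right edge not subtree-aligned
            hi -= 1
            nodes.append((level, hi))
        lo, hi, level = lo >> 1, hi >> 1, level + 1
    return nodes

# A file occupying leaves 0..3 is covered by one level-2 node;
# one occupying leaves 1..3 needs a leaf plus a level-1 node.
```

A well-aligned file collapses to a single node; a misaligned one costs a handful of extra hashes per edge, which is why the per-file overhead stays roughly constant.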

The overhead could be squeezed even further by lumping really small files together. E.g. if a torrent contains one large video file and a directory of hundreds of small image files, a user is likely to have either all or none of the images, so covering more than one of them with a subhash shouldn't be a big problem for reconstruction either.

Alternatively we could add some subtree_request and subtree_piece so clients that need it could grab a local copy of the whole leaf hash set.

poke @arvidn

@arvidn
Contributor

arvidn commented May 27, 2016

you're saying that files should not only be piece-aligned, but aligned to a power-of-two number of pieces, in order to have a node in the tree that represents that file and nothing else?

It would (potentially) create large holes in the piece space.

Maybe I'm misunderstanding you, would this be to support BEP38 across merkle-torrents and non-merkle torrents? or between two merkle-torrents?

In general, I think it was a mistake to have a single root hash for multi-file torrents, perhaps there's some tweak to get some of the benefits of both.

or maybe, given that libtorrent and tribler (afaik) are the only implementations of merkle tree torrents, we could tweak the definition.

@the8472
Contributor Author

the8472 commented May 27, 2016

you're saying that files should not only be piece-aligned, but aligned to a power-of-two number of pieces, in order to have a node in the tree that represents that file and nothing else?

In general, I think it was a mistake to have a single root hash for multi-file torrents, perhaps there's some tweak to get some of the benefits of both.

No, while those things would be helpful they're not sufficient to cover the following scenarios that work today:

MTT failure mode A:

  • You have a multi-GB file.
  • It gets slightly corrupted (bitflips, in-place edits).
  • You want to repair it by shoving it back into the torrent client.

With regular torrents this works fine, since only a single piece will fail. With MTTs it won't because hashing is an all-or-nothing thing if you only have the root hash.

MTT failure mode B:

  • You already have some or all of the payload files, obtained outside the swarm (another torrent, a direct download, BEP 38-style reuse).
  • You add the torrent and want to verify and seed that existing data instead of re-downloading it.

Again, with regular torrents this works perfectly fine. With MTTs it's impossible to reuse anything since you lack subhashes.

MTT failure mode C:

Validating range downloads from a webseed.


All these scenarios either require or benefit from having hashes at a sub-file granularity available at the start of a download.

@the8472
Contributor Author

the8472 commented May 27, 2016

If the goal is to have 16KiB resolution then we don't actually need root-hash-only MTTs.

After all, the status quo is that we're mostly fine with a hash covering something between 256KiB and 4MiB.

So what if we did 16KiB pieces, but kept a flat hash array of merkle hashes N~=8 layers up in the tree, effectively providing a subroot every 4MiB? N can be tweaked of course.

That way anything that needs to reconstruct/match partial files can operate at 4MiB boundaries, which we already know is good enough today. While request/piece/have could operate at 16KiB granularity.
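A quick sanity check of the layer arithmetic (constants taken from the paragraph above; nothing here is specified by BEP 30 itself):

```python
PIECE_SIZE = 16 * 1024        # proposed 16KiB pieces
LAYERS_UP = 8                 # N ~= 8 layers above the leaves

# Each node N layers up covers 2^N leaves, so:
subroot_span = PIECE_SIZE * 2 ** LAYERS_UP
print(subroot_span == 4 * 1024 * 1024)   # True: each subroot covers 4MiB
```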

We could still include the root to be backwards-compatible with current implementations.

If early startup performance is a really big concern for some implementation, it could even start the payload download once it receives the root hash in the metadata transfer (assuming a streaming bdecoder), before it decodes the flat subhash array, at the small risk that the root hash is bogus.

@the8472
Contributor Author

the8472 commented May 27, 2016

Also poke @arno481

@ssiloti
Contributor

ssiloti commented May 27, 2016

I'm in favor of a protocol extension to support downloading the leaf hashes. To me the most important feature of merkle torrents is their small metadata so I think we should preserve that as much as possible.

I have a proposal I'm working on to support updating merkle torrents with piece size granularity for which I was already planning on introducing such a protocol extension. The proposal is intended to support applications like streaming database access so having a low time-to-first-piece is critical. I want to be able to defer downloading the leaf hashes until there are no high-priority piece requests outstanding.

A protocol extension to download the leaf hashes would solve all of the failure modes and provide flexibility to applications which might want control over when to spend bandwidth downloading fine-grained piece hashes. I think it would be simpler to implement than dealing with sub-roots too.

@the8472
Contributor Author

the8472 commented May 27, 2016

I'm a tiny bit skeptical because that requires work by everyone instead of just the one who creates the torrent.

If we don't put additional hashes inside the info dictionary, then we should at least specify a way they can be stored in the root dictionary, so files can be shipped in a commonly understood "fat" .torrent format. It would also serve as a way to store the hashes locally in the torrent itself, which also helps when you migrate torrent client state between computers or something like that.

The proposal is intended to support applications like streaming database access

Sounds crazy, but I guess so are 1TB torrents with 50k files.

@the8472
Contributor Author

the8472 commented May 27, 2016

A protocol extension to download the leaf hashes would solve all of the failure modes

As proposed it would not solve the webseed case in the scenario where there are only a few or incomplete peers in the swarm, which may not have a complete set of hashes to exchange either, and where the webseed is supposed to act as a fallback or even the sole initial seed.

So I think a .torrent inclusion should be encouraged. But I think including the leaf hashes themselves may not always be good, because it would bloat the file beyond acceptable levels (the same problem we have today). So some higher-level inclusion will still be necessary.

Something like {info: {...}, merklenodes: {height: N, hashes: "..."}}, where clients could always swap in a lower height if they can obtain it, while increasing the height is a lossy but space-saving operation.
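The height swap described above is just a fold of adjacent hashes. A minimal sketch, assuming SHA-1 nodes and zero-padding of odd levels (details an actual spec would have to pin down):

```python
import hashlib

def raise_height(hashes, pad=b"\x00" * 20):
    """Turn the stored hashes at height N-1 into the hashes at height N
    by combining adjacent pairs. Raising the height is lossy but halves
    the storage; lowering it requires fetching finer-grained hashes
    from elsewhere (peers, or recomputation from payload)."""
    if len(hashes) % 2:
        hashes = list(hashes) + [pad]   # padding rule is an assumption
    return [hashlib.sha1(hashes[i] + hashes[i + 1]).digest()
            for i in range(0, len(hashes), 2)]
```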

@the8472
Contributor Author

the8472 commented May 27, 2016

Alternatively we could just put those things in the info dictionary by default but could let clients opt-out of that if they need that super fast streaming behavior.

Or, as I mentioned earlier, a clever implementation of the metadata fetch could just skip over the subhashes and only grab the root hash. The downside there would be that the info-hash -> root hash relation could not be established immediately. But I think if you're loading databases from bittorrent that wouldn't be a big issue anyway.

@arvidn
Copy link
Contributor

arvidn commented May 27, 2016

having a common format for including a hash-level in the .torrent format (and perhaps even the complete tree) is a good idea in general I think. The problem with having it in the info-dictionary is that it becomes mandatory. But outside of the info-dictionary (it can still be verified against the root hash inside the info-dict) it would be a good fit I think. Having clients update their .torrent files as they download would probably be a good idea as well, and agreeing on a common format would be ideal.

@ssiloti
Contributor

ssiloti commented May 27, 2016

Defining a standard way of including the hashes at N height in the root of the torrent seems reasonable. I want to keep that out of the info-dict though. Being able to tie the info-hash to the root hash is important, otherwise the chain of authentication is broken and you can't trust the piece data you're receiving. I suppose I could bypass the info-hash and authenticate the root hash directly, but I'd rather not.

I created a read-only proof-of-concept of streaming a sqlite database over bittorrent a while ago. It works pretty well even with an unmodified libtorrent. Typical query latency with a multi-GB database is a few seconds and that could be improved quite a bit with some optimization. You just have to be sure to avoid sequential scans of large tables.

https://github.com/bittorrent/sqltorrent

@the8472
Contributor Author

the8472 commented May 27, 2016

I think I have just devised a scheme that lets us embed things in the info dictionary and then later omit them or send them out of order while maintaining the full verification chain.

  • Bencode the whole thing, leaving a placeholder string somewhere after the part to omit. The chunk consists of <Left, ToOmit, Right>.
  • h-left = sha1(Left)
  • h-state = sha1_pause(Left, ToOmit), i.e. the extracted SHA-1 internal state
  • Put h-left and h-state in the placeholder position: Right -> RightEx.
  • infohash = sha1(<Left, ToOmit, RightEx>)
  • Now we can send either <Left, RightEx> or <Left, ToOmit, RightEx>.
  • The receiver can verify correctness: (infohash == sha1_resume(h-state, RightEx) && h-left == sha1(Left)) || (infohash == sha1(<Left, ToOmit, RightEx>))

If the necessary steps are taken at torrent creation time we can embed the sub-hashes but the metadata-fetching client can skip them and fetch them later, all without protocol extensions.
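For illustration, the scheme can be simulated in-process with hashlib's .copy(); a real implementation would need a SHA-1 library that can serialize and restore the internal compression state across machines, which stock crypto libraries typically don't offer. All byte strings below are placeholders, not real bencoding:

```python
import hashlib

left = b"d4:infod3:foo"        # bencoded bytes before the omittable span
to_omit = b"9:subhashes"       # the part clients may skip
right = b"3:baree"             # bencoded bytes after the placeholder

# Creator side:
h_left = hashlib.sha1(left).digest()
paused = hashlib.sha1(left)
paused.update(to_omit)
h_state = paused.copy()        # stands in for the serialized SHA-1 state
# RightEx would embed h_left and h_state in the placeholder; for
# simplicity we treat `right` itself as RightEx here.
infohash = hashlib.sha1(left + to_omit + right).digest()

# Receiver with the full <Left, ToOmit, RightEx>:
assert hashlib.sha1(left + to_omit + right).digest() == infohash

# Receiver with only <Left, RightEx> plus h-left and h-state:
resumed = h_state.copy()
resumed.update(right)
assert resumed.digest() == infohash
assert hashlib.sha1(left).digest() == h_left
```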

@ssiloti
Contributor

ssiloti commented May 28, 2016

That seems like an awfully ugly hack just to avoid defining a couple of extension messages. I don't think crypto libraries typically provide a way of loading a hash's internal state from a buffer, so implementations would be required to use a modified SHA-1 implementation.

@the8472
Contributor Author

the8472 commented May 28, 2016

Avoiding those extra messages is not my goal, it's a means to achieve my actual goal, which is ensuring that the hashes get propagated to those who want them by default, i.e. their omission would require extra work and effectively be opt-in.

@ssiloti
Contributor

ssiloti commented May 28, 2016

I have sympathy for wanting to keep the more common case simple, but in this case I think the solution is worse than the problem. It's far cleaner and easier to request the piece list separately than it is to implement this crazy hash extension scheme. Transferring a separate piece list is damn near trivial considering that it could re-use most of the code that's already required to implement BEP 9; I don't think it's too much of a burden.

@arvidn
Contributor

arvidn commented May 28, 2016

@the8472 could you explain a bit what you mean by "ensuring that the hashes get propagated to those who want them by default". do you mean that it's important that they be part of the metadata you get via the ut_metadata protocol? (which is only the info-dict)

@the8472
Contributor Author

the8472 commented May 28, 2016

Well, if subhashes are not included in the info dictionary then peers obviously do not get them by default. They have to rely on others to get them.

So those others have to
a) support the new piece list extension
b) already have obtained the hashes themselves or be able to synthesize them from payload data

This looks like a potential game of telephone to me, where something similar to the missing-piece bottleneck that can occur in small swarms could happen.

Making piece-list inclusion the default case, one that requires no protocol additions and that clients would have to intentionally skip, would avoid that.

@ssiloti
Contributor

ssiloti commented May 28, 2016

Personally I'd take having merkle torrents not be usable for a few more years while the protocol extension is rolled out over having to live with some hack for all eternity. This stuff isn't very interesting for the existing bulk file transfer users anyways. I think most of the interest will come from people with novel applications who don't care much about backwards compatibility with conventional torrent clients.

@ssiloti
Contributor

ssiloti commented May 28, 2016

Also, the lack of support for merkle torrents outside of libtorrent means it would take a long time for them to be adopted by existing users anyways.

@the8472
Contributor Author

the8472 commented May 28, 2016

Right now MTTs also lack support because they're a neat trick, but have too many downsides. So I would like to avoid bifurcating the protocol landscape into mutually exclusive use-cases.

So do you agree that BEP30 needs improvement and we're just haggling over the details?

@ssiloti
Contributor

ssiloti commented May 29, 2016

I agree that BEP30 needs improvement. I just don't think the hack to include an optional piece list in the info-dict is worthwhile. The only thing it buys you is better compatibility with BEP 30 implementations which don't support this proposed BEP, but libtorrent is both the only widely deployed implementation of BEP 30 and likely to be the first to get this new functionality.

While new versions of libtorrent might not get rolled out as quickly as we'd like, those legacy BEP30 clients will eventually become a small minority. IMO it would be a shame to still have to live with the hack at that point.

@the8472
Contributor Author

the8472 commented May 29, 2016

Fine, I'll hold my horses until you got your proposal ready.

@arvidn
Contributor

arvidn commented May 30, 2016

by "default", you mean over the metadata transfer protocol, right?

@the8472
Contributor Author

the8472 commented May 30, 2016

Not really. I mean the metadata exchange obviously is involved, but it's not the only thing. For example websites hosting .torrent files might reconstitute them and strip things from the outer dictionary. I don't think many do, but they could.

My intent of saying "by default" here can be read as follows:

No matter how you get the torrent/metadata, you (or anyone else along the supply chain) will have to do no extra work to get the sub-hashes if they were included by the creator. Getting a size-reduced version for fast startup is what will require some extra work, i.e. be non-default.

That reduces the risk of the chain getting broken somewhere.

@the8472
Contributor Author

the8472 commented Jul 21, 2016

Idea in the opposite direction:

Instead of trying to put the hashes into the info dictionary we could put them into the piece-space itself. That way no new protocol extension or metadata transfer layer is needed since clients already know how to exchange pieces.

The included hashes don't have to cover themselves, since the whole tree can be calculated towards the root from those hashes; they effectively occur in two places, as nodes in the tree and as payload in the piece space.

In addition the data can be included in the root dictionary of the torrent. A client aware of this wouldn't have to store the hash pieces in the filesystem but could retrieve them from the torrent itself or store them there.
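If the leaf hashes arrive through the piece space, a client holding only the root from the info-dict can verify the whole batch by folding it up the tree. A sketch, assuming SHA-1 and zero filler hashes for padding (a real client must use exactly the filler rule BEP 30 defines):

```python
import hashlib

def merkle_root(leaf_hashes, pad=b"\x00" * 20):
    """Fold a list of 20-byte SHA-1 leaf hashes up to the root hash.
    The per-level zero-padding here is an assumption for illustration,
    not the BEP 30 rule verbatim."""
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(pad)
        level = [hashlib.sha1(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# A downloader can thus validate leaf hashes obtained from the extra
# piece-space region in one pass against the trusted root.
```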

@ssiloti
Contributor

ssiloti commented Jul 22, 2016

Being able to leverage clients' existing file prioritization features to control how the piece list is downloaded would be really nice. I like it.

@the8472 the8472 mentioned this issue Jul 23, 2016
@arvidn
Contributor

arvidn commented Jul 25, 2016

it would be a bit weird though, assuming I understand you correctly. Would a client implicitly extend the file list with another file at the end, and implicitly extend the size of the torrent, and serve up the piece hashes when chunks/blocks are requested from past-the-end pieces? The bitfields, have, cancel, request, allow-fast, suggest, etc.. all these messages would need additional special case logic (again, afaics).

It sounds like an implementation could quickly become messy doing this, especially considering interoperability with a client that doesn't do this implicit extension.

On the other hand, it seems pretty convenient to leverage the existing HAVE and BITFIELD messages to tell peers that we have hashes (they imply that already, since having a piece means you have its hash too). It would only require one additional request/response message pair to request hashes.

@the8472
Contributor Author

the8472 commented Jul 25, 2016

Well, since it would reshape the hash tree, this could only apply to a new MTT format anyway. So one could also make it explicit in the info dictionary by adding the necessary information about which part of the piece space is occupied. Clients could then internally just append a pseudo-file into which the pieces are loaded.

And while we're at it the tree could also be restructured into a tree of per-file trees.

@the8472
Contributor Author

the8472 commented Jul 25, 2016

Wait, that was nonsense. The hashes could be added to the piece space without altering the tree, since they already are part of the tree. Still, an indication that the extra piece space is being used could be added to the info dict, signalled in the LTEP handshake, or possibly piggyback on other incompatible changes.

@ssiloti
Contributor

ssiloti commented Jul 25, 2016

Apparently I misunderstood then. I thought the piece hashes would be added to the torrent exactly as if they were an extra file added to the end of the torrent. That would allow legacy MTT clients to distribute the piece hashes without knowing their meaning. Users of legacy clients would just see an extra file in the download folder.

@the8472
Contributor Author

the8472 commented Jul 25, 2016

That was my initial idea. Arvid's response made me consider another option.

Including it as just another file works, but it makes torrent creation a little more complicated since you have to juggle alignments so the tree can be built and then eat itself. The alignment requirements could be fairly large, as Arvid notes: powers of two. It would also make all MTTs multi-file torrents. But it would mostly be a creation-side modification.

Including it as a range in the piece space not covered by the MT needs extra signalling, which violates the spreading-pieces-by-default that I've asked for in previous comments.

Wishlist for MTTv2:

  • way to transfer leaf or high-depth hashes
  • a common storage definition for the piece hashes in the torrent root dictionary
  • per-file trees -> possibly large alignment requirements or unbalanced trees. maybe needs some extra information to reconstruct trees? idk.
  • a better approach for alignment/padding/piece space holes

The last one could go into a separate BEP that can also be used for normal torrents, but it would interact with the MT hashing.

@arvidn
Contributor

arvidn commented Jul 25, 2016

With a from-scratch merkle-tree-torrent implementation, per-file roots would be great I think. It would also solve a long-standing issue for people wanting to identify identical files across different torrents, as well as making pieces file-aligned (just like on the filesystem).

@the8472
Contributor Author

the8472 commented Feb 4, 2017

Another bullet point for the wishlist: Represent files as a tree of directories and files instead of a flat list. It would eliminate the path prefix redundancy and make the single/multi file distinction go away.

@banksJeremy

banksJeremy commented Feb 4, 2017

Hello. I'm a novice, but after reading this discussion I had a related idea I wanted to mention. (I apologize if this is the wrong forum or my participation is inappropriate.) This would be another way to get the pieces data given a merkle torrent, when needed to verify data from other sources like web seeds.

When generating a BEP-30 merkle torrent info dictionary, the client should also generate the equivalent traditional (non-merkle) torrent info dictionary internally (using the same piece size), and add the infohash of this traditional torrent to the merkle torrent info dictionary (perhaps under the key traditional infohash).

This traditional torrent would also be calculated and saved internally when beginning to seed one of these torrents. Seeding peers would accept incoming connections for either the merkle torrent infohash, or the traditional torrent infohash.

Peers leeching the merkle torrent, who realize they need the full metadata for some reason, could download the full metadata (or the necessary pieces of it) by connecting using the traditional torrent infohash and performing ordinary BEP 9 metadata exchange. (We would add an extension flag so that seeds could indicate their support for this proposal, so that leeches know they can connect with the other infohash.) Once peers have the full metadata, they could prefer to continue downloading from peers using the traditional torrent protocol, because they no longer require the overhead of the merkle torrent messages.

Getting further afield, peers implementing this proposal could also generate and seed the corresponding merkle torrent for every traditional torrent, to bootstrap a merkle swarm that later downloaders could use instead. (Albeit without the very-small piece sizes we'd like. Maybe still a meaningful benefit?) We could extend the magnet link format to include both infohashes such that existing clients would only see the traditional torrent (backwards-compatibility) but newer clients would see and verify both.

I'll be working on a slightly more detailed description here, unless I realize it's a bad idea.

The proposals elsewhere in this thread for more significant changes to the Merkle torrent structure would work better than what I described here. However, perhaps a more conservative change like this would be more practical to have adopted, and help encourage a transition to merkle torrents? I'm planning to implement something like this in a client I'm working on with a friend, but we have a lot of other functionality to build out first, so we won't have a reference implementation to present any time soon.

Thanks for reading.

@the8472
Contributor Author

the8472 commented Feb 4, 2017

I was mulling over a similar idea, but to transform between single-root and per-file root representation.

I think this could be extended further to incorporate some of the suggested changes here into the transformation rules, e.g. something projected into the piece space could be represented as a regular file in the traditional torrent.

@the8472
Contributor Author

the8472 commented Feb 9, 2017

Another idea: We can include per-file MTT roots in the file list, similar to BEP 47. That way we don't have to build a tree-of-trees and it would remain backwards-compatible. It would also alleviate the need for large alignment gaps, since the per-file roots could be calculated as if each file were a single-file torrent.

So I think we have a bunch of improvements that could be added in a backwards-compatible way:

  • make the whole hash tree available from the piece space via request/piece messages. Support can be signaled in the extension handshake. Some clever layout or (ab)use of the tr_hashpiece message could also allow random access to that list with full validation of the subset up to the root.
  • define a canonical format for embedding the pieces or part of the tree in the outer torrent dictionary
  • define canonical transformation rules between MTT/flat torrents as @jeremyBanks suggests
  • embed per-file roots in multi-file lists
  • maybe allow clients to negotiate the use of non-tr_hashpiece messages. But that may be unnecessary if we have the first two points and clients can reconstruct the hash chain from partial data.

It doesn't get us all the way where I want to be, but if that could be the default behavior it might be good enough.
