Bittorrent v2 #2197

Closed
ssiloti opened this issue Jul 31, 2017 · 24 comments

Comments

@ssiloti
Collaborator

ssiloti commented Jul 31, 2017

I’m opening this issue as a place to discuss and coordinate the implementation of bittorrent v2. A draft spec of the bittorrent v2 protocol has been published as BEP 52.

The first thing I plan to work on is support for creating and loading v2 metadata. This will mainly involve extending create_torrent, torrent_info, and file_storage to support both v1 and v2 metadata.

I predict the most complex and invasive changes will be to support hybrid torrents in torrent. I think we’ll want to split out part of torrent into a swarm class. Then torrent can hold both a v1 and v2 swarm with shared data like piece_picker remaining in torrent.

Should we keep support for BEP 30 merkle tree torrents? BEP 52 effectively renders BEP 30 obsolete, so unless there are some existing users of BEP 30 we can probably drop it and simplify the code.

Obviously this isn’t going into the 1.2 release, so we’ll want to keep bittorrent v2 work on a feature branch at least until the RC_1_2 branch is created.

@arvidn
Owner

arvidn commented Aug 1, 2017

I agree that the BEP30 merkle tree does not need to be supported with bittorrent v2.

It sounds like a reasonable approach. I also have a new-disk-io branch which revamps the disk I/O (it basically re-implements everything behind the disk_interface) to use memory-mapped files (on 64-bit systems that support mmap).

@ssiloti
Collaborator Author

ssiloti commented Aug 7, 2017

Another question of keep or discard: The create_torrent::optimize_alignment option is incompatible with v2 metadata, so I'd like to deprecate it. This also raises the larger question of whether we should continue to support generating v1-only metadata at all. For the vast majority of people there's no reason not to generate hybrid metadata. There may be someone who really cares about the size of the torrent files, but on the other hand we really want to push people to generate v2 metadata, as it's key to enabling the transition to the v2 protocol.

@arvidn
Owner

arvidn commented Aug 7, 2017

Right, the reason optimize_alignment is incompatible is that in v2 all files are aligned and "tail-padded" (i.e. no files ever share a piece, regardless of how small the file is).

v2 basically requires the pad_file_limit == 0, alignment == , tail_padding = true.

The optimize_alignment flag is still required to inject the pad files at all right now, so v2 semantics are similar to that, except the "pad files" are implied.
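To make the alignment rule concrete, here is a minimal sketch of how file offsets could be laid out when every file starts on a piece boundary and the padding is implied rather than stored as pad files. The function name `v2_layout` is hypothetical and not part of libtorrent, and leaving the last file unpadded is an assumption of this sketch.

```python
def v2_layout(file_sizes, piece_length):
    """Return (offset, size, implied_padding) for each file.

    Every file starts on a piece boundary, so no two files share a piece.
    The gap up to the next boundary is the implied padding. Leaving the
    last file unpadded is an assumption of this sketch.
    """
    layout = []
    offset = 0
    for index, size in enumerate(file_sizes):
        remainder = size % piece_length
        is_last = index == len(file_sizes) - 1
        padding = 0 if remainder == 0 or is_last else piece_length - remainder
        layout.append((offset, size, padding))
        offset += size + padding
    return layout


# Three files with a 16-byte piece length (tiny, for illustration only).
for offset, size, padding in v2_layout([10, 16, 5], 16):
    print(offset, size, padding)
```

Each offset in the result is a multiple of the piece length, which is the property that optimize_alignment cannot rearrange.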

@ssiloti
Collaborator Author

ssiloti commented Aug 7, 2017

I'm only talking about dropping support for generating v1 only metadata. By default libtorrent will generate hybrid metadata which will work with both v1 and v2 clients.

@ssiloti
Collaborator Author

ssiloti commented Aug 7, 2017

BEP 52 defines the metadata in a way which allows a torrent to have both v1 and v2 metadata in the same info-dict. So you can generate a torrent file which has only v1 keys, only v2 keys, or both. Including both v1 and v2 keys will be the default for the foreseeable future, with v2 only as an option for users who don't care about backwards compatibility.
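As an illustration of a single info-dict carrying both sets of keys, the sketch below builds a tiny hybrid dictionary and derives both info-hashes from the same bencoded bytes. The key names follow BEP 3 and BEP 52; the `bencode` helper is a minimal stand-in written for this example, and the hash values stored in `pieces` and `pieces root` are placeholders, not real piece hashes.

```python
import hashlib


def bencode(value):
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


hybrid_info = {
    "name": "example.txt",
    "piece length": 16384,
    # v1 keys
    "length": 5,
    "pieces": b"\x00" * 20,  # placeholder SHA-1 piece hash
    # v2 keys
    "meta version": 2,
    "file tree": {
        "example.txt": {"": {"length": 5, "pieces root": b"\x00" * 32}},  # placeholder root
    },
}

encoded = bencode(hybrid_info)
v1_info_hash = hashlib.sha1(encoded).digest()    # 20 bytes, used by v1 peers
v2_info_hash = hashlib.sha256(encoded).digest()  # 32 bytes, used by v2 peers
print(v1_info_hash.hex())
print(v2_info_hash.hex())
```

Dropping the v1 keys would give a v2-only torrent, and dropping "meta version" and "file tree" would give a v1-only one.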

@ssiloti
Collaborator Author

ssiloti commented Aug 7, 2017

I'm not sure what you're referring to taking 10 years. BEP 52 was published a few months ago. The main impetus behind BEP 52 is improved security by changing the hash function from SHA1 to SHA256. This change wasn't seen as urgent until Google published the first SHA1 collision earlier this year.

BEP 52 requires a bit more than BEP 47 pad files. The files also need to be sorted by path.
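One way to picture the ordering requirement: because the v2 file tree is a bencoded dictionary, its keys are written in sorted order, so walking the tree depth-first in key order yields one deterministic file sequence. The sketch below assumes that a v1 file list in a hybrid torrent would follow that same sequence. `flatten_file_tree` is a hypothetical helper, and comparing names as UTF-8 bytes is an assumption of this example.

```python
def flatten_file_tree(tree, prefix=()):
    """Walk a v2-style file tree depth-first in sorted key order.

    Returns a list of (path_components, length) tuples. A node containing
    the empty-string key is treated as a file; anything else is a directory.
    """
    files = []
    for name in sorted(tree, key=lambda n: n.encode("utf-8")):
        node = tree[name]
        path = prefix + (name,)
        if "" in node:
            files.append((path, node[""]["length"]))
        else:
            files.extend(flatten_file_tree(node, path))
    return files


tree = {
    "b.txt": {"": {"length": 1}},
    "a": {
        "z.txt": {"": {"length": 2}},
        "c.txt": {"": {"length": 3}},
    },
}
for path, length in flatten_file_tree(tree):
    print("/".join(path), length)
```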

@the8472

the8472 commented Aug 7, 2017

@Col-blimp you can read the background in bittorrent/bittorrent.org#58 and bittorrent/bittorrent.org#59

@ssiloti
Collaborator Author

ssiloti commented Aug 8, 2017

BEP 52 is a modification of BEP 3 so it inherited BEP 3's creation date.

@ssiloti
Collaborator Author

ssiloti commented Aug 29, 2017

For those who want to follow along, I've put up a work-in-progress branch. Currently it just has support for generating hybrid torrent files. It's based on arvid's new-disk-io branch because I anticipate that branch will be merged to master before v2 and I'd rather not implement the disk I/O code twice.

@ssiloti
Collaborator Author

ssiloti commented Sep 25, 2017

I'm planning on dropping support for generating and parsing torrent files which have a file with the same name as a directory. This is kind of a pain to support in the v2 parsing code and I don't see a good reason to continue support for it. AFAIK most (all?) filesystems forbid such a conflict.

@the8472

the8472 commented Sep 25, 2017

> This is kind of a pain to support in the v2 parsing code

The v2 spec forbids it anyway.

@ssiloti
Collaborator Author

ssiloti commented Sep 25, 2017

Does it? I don't see any language which explicitly forbids it. The file tree structure can certainly encode such a conflict by placing an empty dict key among a directory's subordinate path elements.

@the8472

the8472 commented Sep 25, 2017

length
Length of the file in bytes. Presence of this field indicates that the dictionary describes a file, not a directory. Which means it must not have any sibling entries.
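The conflict discussed here can be detected mechanically. In the sketch below a file is a node holding the empty-string key, and a node that holds that key together with any sibling entries is reported as a file/directory conflict. `find_conflicts` is a hypothetical helper for illustration, not libtorrent's parser.

```python
def find_conflicts(tree, prefix=()):
    """Return the paths whose node is both a file and a directory.

    A node describes a file when it contains the empty-string key. If that
    key has sibling entries, the same name is being used as a directory too.
    """
    conflicts = []
    for name, node in tree.items():
        path = prefix + (name,)
        if "" in node:
            if len(node) > 1:
                conflicts.append("/".join(path))
        else:
            conflicts.extend(find_conflicts(node, path))
    return conflicts


bad_tree = {
    "docs": {
        "": {"length": 4},                    # "docs" as a file...
        "readme.txt": {"": {"length": 9}},    # ...and as a directory
    },
    "ok.txt": {"": {"length": 1}},
}
print(find_conflicts(bad_tree))
```

A parser that rejects any torrent with a non-empty result would implement the restriction proposed above.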

@ssiloti
Collaborator Author

ssiloti commented Sep 27, 2017

I'm planning on restricting the torrent_info::remap_files feature so that the new files must have a size that's a multiple of the piece size, or equal to the remaining size of the torrent. In other words, piece alignment is required and pad files are forbidden. Otherwise this feature would negate much of the simplification we get from requiring piece aligned files in v2.
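The proposed restriction amounts to a simple check over the new file sizes. The following is a sketch of that rule only; `remap_sizes_allowed` is a hypothetical name, and rejecting zero-length files is an assumption of this example rather than something stated in the proposal.

```python
def remap_sizes_allowed(new_sizes, piece_length, total_size):
    """Check that each remapped file keeps the torrent piece-aligned.

    A file is acceptable if its size is a multiple of the piece length, or
    if it consumes exactly the bytes remaining in the torrent (which makes
    it the last file). The sizes must add up to the torrent size.
    """
    remaining = total_size
    for size in new_sizes:
        if size <= 0 or size > remaining:
            return False
        if size % piece_length != 0 and size != remaining:
            return False
        remaining -= size
    return remaining == 0


print(remap_sizes_allowed([32, 16, 5], 16, 53))  # aligned files, then the tail
print(remap_sizes_allowed([10, 43], 16, 53))     # first file breaks alignment
```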

@arvidn
Owner

arvidn commented Sep 27, 2017

I'm not sure remap_files() offers a ton of utility in general, and it's probably not worth spending a lot of effort on supporting v2 torrents. Would it be simpler to just make it work on v1 files, and not at all on v2?

@oleiba

oleiba commented Aug 13, 2018

What's the state of the draft? Communication seemed to pause since August 2017.

In particular I'm interested in the migration to a collision-resistant hash and in Merkle-root .torrent files (the replacement for BEP 30).

@ssiloti
Collaborator Author

ssiloti commented Aug 14, 2018

The draft BEP is unchanged. There's an alpha quality implementation at https://github.com/ssiloti/libtorrent/tree/v2

@X-Coder264

@ssiloti I have a question regarding your implementation of the v2 spec.

So since hybrid torrents will have two info hashes (a SHA1 for v1 and SHA2-256 for v2) that means there will be two announces to the tracker (one for each info hash). How will the downloaded and uploaded data be announced to the tracker? For example let's say a peer uploaded 100 MB in the v1 swarm and 200 MB in the v2 swarm. Will the announces to the tracker be like /announce?info_hash=<v1_hash>&uploaded=100 MB and /announce?info_hash=<v2_hash>&uploaded=200 MB or are both announces gonna have uploaded=300 MB? Hopefully it's the former (as the latter doesn't make sense and the former is the only one possible if the torrent is v1 only or v2 only).

I'm asking because of this:

> Implementations supporting both formats can join both swarms by calculating the new and old infohashes and downloading them to the same storage.

I don't know anything about libtorrent internals nor am I a C++ dev (I just took a quick look at your v2 branch commits), but it seems to me that if the client can join both swarms the tracking of what traffic goes to which swarm/peer gets considerably more complex. Hopefully libtorrent will still be able to track that and send the announce requests properly.

@ssiloti
Collaborator Author

ssiloti commented Aug 15, 2018

Right now both announces will report 300 MB because v1 and v2 peers share the same torrent and thus the same stat object.

As you say, keeping separate stats would add significant complexity. Keeping separate counts of corrupt bytes would be particularly troublesome. Ideally trackers which care about these numbers would gain awareness that the v1 and v2 hashes refer to the same torrent, but I suspect that's not going to happen so we're probably going to have to take on the extra complexity.

@X-Coder264

X-Coder264 commented Aug 15, 2018

I was afraid that'd be the answer I'd receive. Actually, I see now that you've already written this in the first post (which I read a long time ago and forgot about it).

> I predict the most complex and invasive changes will be to support hybrid torrents in torrent. I think we’ll want to split out part of torrent into a swarm class. Then torrent can hold both a v1 and v2 swarm with shared data like piece_picker remaining in torrent.

It isn't a problem for a tracker to gain awareness that the v1 and v2 hashes refer to the same torrent, but two things:

a) That would make one of the two announces redundant (e.g. on the first announce the tracker is like "OK, cool, your stats have been updated in the database", then the second announce comes and the tracker is again "OK, cool", but the stats have already been updated by the previous announce, so we are basically doing nothing here). There's also the problem that trackers would have to handle both of those announce requests reaching the server at the same time, which of course adds complexity to the code.

b) What happens when some client (library) implements the keeping and announcing of separate stats? There's no way for the tracker to behave in two completely different ways (for some clients to just basically "ignore" the second announce while for others having to take into account both announces).

The complete attention of the spec has been given to the client side and to the changes of the .torrent file while none was given to the tracker's side of the story. Stuff like this (how should the client announce when joining both swarms of a hybrid torrent) is something that IMO must be defined in the spec itself. Having some kind of unwritten rules which become the de facto standard somewhere along the road is just bad. It's just a waste of development time (when the trackers need to rewrite stuff to be in line with the client's behavior or vice versa) and also prolongs the adoption of the standard. I don't know where the discussion about the BEP is taking place now (since bittorrent/bittorrent.org#59 was merged), but since both you and @arvidn worked/discussed on that hopefully you can bring this up so that stuff like that will be clearly specified before the BEP status changes from draft to final and accepted version.

@ssiloti
Collaborator Author

ssiloti commented Aug 16, 2018

I don't follow what the complexity is that you refer to in paragraph a. Merging announces on multiple infohashes is the same as merging multiple announces on the same infohash. The upload/download stats are cumulative so the tracker takes the maximum of the values it has seen. The tracker doesn't even need to keep track of which infohashes refer to the same torrent, it can key off of the peer_id which will be the same for both v1 and v2 announces.
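The merging rule described here can be sketched on the tracker side as follows. This is an illustration only: the class `TrackerStats` and its method are hypothetical, and it assumes that a peer_id identifies one client's participation in one torrent and that both announces report the same shared cumulative totals.

```python
class TrackerStats:
    """Keeps cumulative transfer totals per peer_id.

    Because the reported values are cumulative, merging announces means
    keeping the maximum seen so far, regardless of which info-hash
    (v1 or v2) the announce arrived on.
    """

    def __init__(self):
        self._by_peer = {}

    def announce(self, peer_id, info_hash, uploaded, downloaded):
        # info_hash is accepted but deliberately not used as part of the key.
        prev_up, prev_down = self._by_peer.get(peer_id, (0, 0))
        merged = (max(prev_up, uploaded), max(prev_down, downloaded))
        self._by_peer[peer_id] = merged
        return merged


MB = 1024 * 1024
stats = TrackerStats()
stats.announce(b"peer-1", b"v1-hash", 300 * MB, 0)
total = stats.announce(b"peer-1", b"v2-hash", 300 * MB, 0)
print(total[0] // MB)  # the second announce does not double-count
```

With this scheme the order in which the two announces arrive does not matter, since taking the maximum is commutative.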

@X-Coder264

Actually you are right, forget what I said about the complexity in paragraph a, it's like I completely forgot that stats are cumulative while I was writing that 😛

btw, I've opened bittorrent/bittorrent.org#87 so that this can be further addressed there.

@ssiloti
Collaborator Author

ssiloti commented May 28, 2019

I've updated the v2 branch in my repo with a heavily squashed and cleaned up patch set and rebased on the current master.

@ssiloti
Collaborator Author

ssiloti commented Aug 10, 2019

Protocol v2 support has landed in master! See #3873

@ssiloti ssiloti closed this as completed Aug 10, 2019
6 participants
@ssiloti @arvidn @the8472 @X-Coder264 @oleiba and others