Swarm merging #829

Closed
ericzutter opened this issue Jun 17, 2016 · 27 comments

@ericzutter

Can you add the "swarm merging" feature?

Swarm merging automatically detects when two or more of your incomplete downloads share one or more files of identical size and will attempt to merge the torrent swarms to download the file faster or, possibly, complete an existing file with bad availability.
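As a rough illustration of the detection step described above, here is a minimal Python sketch that groups files from different incomplete downloads by size. The `torrents` layout and the function name `merge_candidates` are assumptions made for this example; they are not taken from Vuze or libtorrent, and matching sizes only yields candidates that would still have to be verified by hash.

```python
from collections import defaultdict


def merge_candidates(torrents):
    """Group files from different torrents by size.

    `torrents` maps a torrent name to a list of (path, size) tuples.
    Returns {size: [(torrent, path), ...]} for every size that appears
    in at least two different torrents.
    """
    by_size = defaultdict(list)
    for name, files in torrents.items():
        for path, size in files:
            by_size[size].append((name, path))
    return {
        size: entries
        for size, entries in by_size.items()
        if len({torrent for torrent, _ in entries}) >= 2
    }


# Hypothetical incomplete downloads.
torrents = {
    "A": [("a/video.mkv", 734_003_200), ("a/readme.txt", 120)],
    "B": [("b/video.mkv", 734_003_200), ("b/cover.jpg", 48_213)],
}
candidates = merge_candidates(torrents)
```

Only the 734,003,200-byte file is reported here, because it is the only size shared by two different torrents.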

You can find more info about "swarm merging" on http://wiki.vuze.com/w/Swarm_Merging

@arvidn
Owner

arvidn commented Jun 17, 2016

does this overlap with the mutable torrent feature in libtorrent?

@ericzutter
Author

I looked at the description of "mutable torrents" at http://bittorrent.org/beps/bep_0038.html There seems to be some overlap.

I have some questions about BEP 38:

  1. BEP 38 says "Currently, when downloading a torrent, users must download all files within that torrent regardless of whether or not they have already downloaded them using a different torrent." In a BitTorrent client you can choose which files from a torrent you want to download, so I don't understand why BEP 38 says that you "must download ALL files".
  2. BEP 38 says "File Name Comparisons: The client should attempt to re-use the files either in whole or in part.". Does this mean that when two torrents contain the same file, the client will merge the peers/seeds of the two torrents to download that file?
  3. BEP 38 says "Hard Links : the client should use a hard link on systems that support it rather than create a copy of the file". On Windows 7, when two torrents contain the same file, will the file be stored once or twice on disk? Will it use a Windows shortcut as the "hard link"?
  4. Are there any settings in http://libtorrent.org/reference-Settings.html that are related to "mutable torrents"?

I am using the BitTorrent client Deluge on Windows 7 with the Ltconfig plugin, which lets you modify libtorrent settings. The latest version of Deluge, 1.3.12, uses libtorrent 1.0.6. When Deluge is updated to libtorrent 1.1.0, I will be able to test the "mutable torrents" feature.

@Falcosc
Contributor

Falcosc commented Dec 29, 2016

@ericzutter did you test it with libtorrent 1.1?
Just make sure that libtorrent is not built with TORRENT_DISABLE_MUTABLE_TORRENTS.
Then it should work.

@arvidn
Owner

arvidn commented Jan 11, 2017

  1. it's contrasting downloading the file vs. re-using a previously downloaded identical copy. Under the assumption that you want all files, you must download them.
  2. yes, under certain conditions.
    a. the files must be piece aligned (i.e. use pad-files)
    b. the two torrent files must use the same piece size
    c. at least one torrent must mention the other as "similar", or both torrents must be part of the same collection
  3. NTFS supports hard links, and libtorrent will use them on that filesystem.
  4. I can't recall any settings related to mutable torrents off hand
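The three conditions under point 2 can be written down as a simple predicate. This is a minimal Python sketch under assumptions: `TorrentMeta` and its field names are hypothetical stand-ins for parsed torrent metadata (loosely modeled on the "similar" and "collections" notions from BEP 38), not libtorrent types.

```python
from dataclasses import dataclass, field


@dataclass
class TorrentMeta:
    # Hypothetical, simplified view of a parsed .torrent file.
    info_hash: str
    piece_length: int
    file_offsets: dict  # path -> byte offset of the file inside the torrent
    similar: set = field(default_factory=set)      # info-hashes listed as "similar"
    collections: set = field(default_factory=set)  # collection names


def can_share_file(a, path_a, b, path_b):
    """Check conditions a, b and c from the comment above."""
    # a. both copies of the file start on a piece boundary (pad files)
    aligned = (a.file_offsets[path_a] % a.piece_length == 0
               and b.file_offsets[path_b] % b.piece_length == 0)
    # b. both torrents use the same piece size
    same_piece_size = a.piece_length == b.piece_length
    # c. one torrent names the other as similar, or they share a collection
    related = (b.info_hash in a.similar
               or a.info_hash in b.similar
               or bool(a.collections & b.collections))
    return aligned and same_piece_size and related


MIB = 1024 * 1024
old = TorrentMeta("aa", MIB, {"x.bin": 0})
new = TorrentMeta("bb", MIB, {"y.bin": 3 * MIB}, similar={"aa"})
shareable = can_share_file(old, "x.bin", new, "y.bin")
```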

@26486836

There's some overlap, yes.
An easy way to add support for this feature would be to check if there are any other torrents containing files with the same file size already present in the list of torrents. If so, the equivalent piece hashes are calculated. (i.e. use offset and piece size from torrentA:fileA to calculate hashes for torrentB:fileB). If they match, the torrents are considered the same until proven otherwise (e.g. a piece is found the hash of which is valid according to torrent A but not torrent B) or the download is complete. But is this outside the scope of libtorrent? This behavior is not present in any specification that I know of.
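One way to read the hash check proposed above is sketched below in Python: take the bytes already downloaded for the file via torrent A, work out which pieces of torrent B fall entirely inside the file, and compare their SHA-1 digests with torrent B's piece hashes. The function names and the in-memory `data` argument are assumptions for illustration only; a real client would read from disk and would also have to deal with pieces that straddle file boundaries.

```python
import hashlib


def pieces_inside_file(file_offset, file_size, piece_length):
    """Yield (piece_index, start, end) for every piece that lies entirely
    inside the file; start and end are offsets within the file."""
    first = -(-file_offset // piece_length)           # ceiling division
    last = (file_offset + file_size) // piece_length  # exclusive
    for index in range(first, last):
        start = index * piece_length - file_offset
        yield index, start, start + piece_length


def file_matches(data, file_offset, piece_length, piece_hashes):
    """True if every fully contained piece of `data` hashes to the digest
    that the other torrent expects, and at least one piece was checked."""
    checked = 0
    for index, start, end in pieces_inside_file(file_offset, len(data), piece_length):
        if hashlib.sha1(data[start:end]).digest() != piece_hashes[index]:
            return False
        checked += 1
    return checked > 0
```

Pieces that only partly overlap the file are skipped, so a match here is evidence that the files are identical rather than proof, which fits the "considered the same until proven otherwise" wording above.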

@arvidn
Owner

arvidn commented Jun 20, 2017

supporting torrents that do not have aligned pieces is complicated and error prone. I'm not excited to have that code in libtorrent (let alone write it and tests).

I recognize uTorrent does support this in its mutable torrent implementation, but I'm not maintaining uTorrent (anymore) :)

Perhaps a better approach would be to implement support for BitTorrent V2, where pieces are always file-aligned, and I believe pieces will likely be uniform everywhere as well, and there will be file hashes (each file has a root hash).

Once this is supported and in wide use, mutable torrents, and sharing files between torrents, will be a lot simpler.

Now, this ticket, iiuc, is also asking for downloading pieces into a single file from multiple swarms simultaneously. This is significantly more difficult, since that also requires piece-pickers across torrents to share state in some way.

@26486836

No, that depends on how you do it. They could be aligned to 16KiB pieces in the "guest torrent", any non-overlapping portion would simply be discarded and the blocks would be treated as incomplete. It would be quite a small loss anyway, <1%, and it would only be temporary for as long as only half the 16KiB chunk is downloaded (i.e. until the missing piece in the "host torrent" is downloaded)
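To put a number on the loss, here is a small Python sketch of the block translation: given the byte range of the file that a finished "host" piece covers, it lists the 16 KiB blocks of the "guest" torrent that are completely covered. The function name and the assumption that guest blocks sit on 16 KiB boundaries counted from the start of the guest torrent are choices made for this example.

```python
BLOCK = 16 * 1024


def covered_blocks(host_start, host_end, guest_file_offset):
    """Return the guest-torrent block numbers fully covered by the file byte
    range [host_start, host_end).

    Blocks are numbered in 16 KiB units from the start of the guest torrent;
    `guest_file_offset` is where the shared file begins inside that torrent.
    """
    lo = host_start + guest_file_offset  # same range in guest coordinates
    hi = host_end + guest_file_offset
    first = -(-lo // BLOCK)              # first block starting at or after lo
    last = hi // BLOCK                   # exclusive
    return range(first, max(first, last))


# A finished 4 MiB host piece at the start of the file; in the guest torrent
# the file starts 5000 bytes in, so the block boundaries do not line up.
piece = 4 * 1024 * 1024
blocks = covered_blocks(0, piece, 5000)
discarded = piece - len(blocks) * BLOCK
```

In this example one block's worth of data (16 KiB out of 4 MiB) cannot be handed over until the neighbouring host piece arrives.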

But then it wouldn't work for legacy torrents, and they're the ones in the greatest need of this feature, since their age tends to leave them with few seeders.

No, that's a possible improvement, but it's not strictly needed. You could just have it connect to localhost and seed a "translated" piece from torrent A to torrent B. As soon as both downloads are complete and verified, they can be hardlinked, so the file size downsides aren't that big in practice. Also consider that only one file needs to be downloaded in practice, since then you can either prove that torrentA:fileA = torrentB:fileB in which case both are done, or torrentA:fileA ≠ torrentB:fileB in which case the link between them can be removed (or attempted to salvage as per BEP0038)

@arvidn
Owner

arvidn commented Jun 20, 2017

I would assume that you don't want to download the same bytes from two separate peers. if you don't share state in the piece pickers, a piece being downloaded in one torrent will still be eligible for picking in the other. That would cause some double-downloading.

@26486836

No, only in rare cases. As soon as 2 adjacent pieces are downloaded, the equivalent piece can be inserted into the other torrent, along with the two "half pieces" in the form of 16KiB blocks, treating the neighbor pieces as incompletely downloaded. So waste only occurs in the rare case where the two pieces are downloading at the exact same time. While this is unfortunate, it's not the end of the world and can be considered a separate feature request. It's not that big of a spill.

@arvidn
Owner

arvidn commented Jun 20, 2017

well, it's rare until it isn't.

How would you quantify the rarity? One way would be to remove the "picked" state from the piece picker and only record the "finished" state once a block or piece is downloaded. This would make all peers in a swarm potentially pick the same piece. This would overestimate the collisions since all peers experience the same piece-rarity and will pick from pieces in the same order. But imagine 100 swarms that share a file, and each swarm has one peer.

@26486836

One piece will finish before the other. Say piece A and piece B are picked at the same time. Piece A finishes first. Then piece B is marked as completed and copied over. There is no need to synchronize any state other than that. It can simply act as a node in the other swarm and connect to localhost to feed it the piece it desires; it won't even need to change any states "by force".

Piece A and piece B would need to start and complete downloading at the exact same time for anything to be wasted. And if they do, why is it such a huge issue? Most pieces are around 4 MB; this is hardly a great loss.

@arvidn
Owner

arvidn commented Jun 21, 2017

The problem isn't piece A and piece B being picked at the same time, the problem is that piece A is picked twice from different peers. You seem to assert that this is rare. Within a single swarm the piece picker establishes a single, total ranking of pieces. Peers then pick pieces in priority order. Naturally every peer will tend to pick the same piece.

To illustrate a scenario where not sharing information would be a serious problem, consider the cases where you have two swarms, sharing a file you're trying to download. In each swarm, you're connected to one other peer (which happens to be the same peer, downloading the same file at the same time as you).

That peer is also connected to a seed. Every piece that peer downloads will be announced to you in both swarms. In both of those swarms, there will only be a single piece pickable at any given time (i.e. the same piece). You will end up consistently picking the same piece from both swarms as the download progresses, wasting 50% of your download.

What I meant by "it's rare until it isn't" is exactly this. In most scenarios the waste may be negligible, but every now and then it will cause catastrophic behavior.
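The two-swarm scenario can be reproduced with a toy model. The Python sketch below is only an illustration of the argument, not a model of libtorrent's piece picker: in every round the common peer announces exactly one new piece in both swarms, and each swarm's picker requests it unless the pickers share their "already requested" state. The name `simulate` and the round-based structure are assumptions.

```python
def simulate(num_pieces, shared_state):
    """Return (pieces_downloaded, pieces_wasted) for two swarms that are fed
    by the same peer, which announces one new piece per round in both."""
    have = set()   # pieces written to the shared file
    downloads = 0
    for piece in range(num_pieces):
        in_flight = set()
        for _swarm in ("A", "B"):
            # In each swarm, `piece` is the only pickable piece this round.
            if shared_state and piece in in_flight:
                continue  # the other swarm's picker already requested it
            in_flight.add(piece)
            downloads += 1
        have.add(piece)
    return downloads, downloads - len(have)


independent = simulate(100, shared_state=False)
shared = simulate(100, shared_state=True)
```

With independent pickers every piece is fetched once per swarm, so half of everything downloaded is thrown away; with shared state nothing is.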

@26486836

One implies the other. If piece A is downloaded before piece B, it will be "translated" and copied into two half pieces for torrent B. Thus, the waste only occurs if they're downloaded at the exact same time.

And that scenario implies they're downloaded at the same time, like I said. For this to happen, the peer you're connected to needs to have upload speed >= their download speed, and download speed <= your download speed (else they will build up a "buffer" and seed different pieces to the two swarms).
But even in this case, it would need to split the upload 50/50 between the two swarms/upload slots, which isn't common behavior. It wants to complete a piece as soon as possible so it can close the upload slot, and only upload more if it doesn't make anything else upload slower, or am I mistaken?
And it can't increase the upload speed by uploading to you in both swarms at the same time, since the bottleneck is either their upload speed or your download speed.
Also, why are we not connected to the other seeder? We should get their IP via PEX.

I don't see any other scenario in which the availability of two torrents simultaneously creeps up from 0.0 to 1.0, involving the exact same two pieces, and you download both of them at the exact same time. So wouldn't it be possible to detect this behavior? Something like dividing the file into n chunks (for example n = ceil(filesize / (2 * max(piecesize1, piecesize2)))), treating these like a series of bits (e.g. 128 bits for 128 chunks), setting all chunks which a piece "touches" to 1, then giving any piece in torrent 2 which also "touches" one of these chunks a lower priority? It really just seems like we have a regular race condition on our hands.
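A minimal Python sketch of that chunk bitmap follows, reading the formula as a chunk size of twice the larger piece size. All names are hypothetical, and an integer is used as the "series of bits"; how a client would actually lower a piece's priority is left out.

```python
import math

MIB = 1024 * 1024


def chunk_layout(file_size, piece1, piece2):
    """Return (chunk_size, n_chunks) with chunk_size = 2 * max(piece sizes)."""
    chunk_size = 2 * max(piece1, piece2)
    return chunk_size, math.ceil(file_size / chunk_size)


def touched_mask(piece_index, piece_length, file_offset, chunk_size, n_chunks):
    """Bitmask of the file chunks that the given piece overlaps."""
    start = piece_index * piece_length - file_offset  # file coordinates
    end = start + piece_length                        # exclusive
    start = max(start, 0)
    if end <= start:
        return 0                                      # piece ends before the file
    first = start // chunk_size
    last = min((end - 1) // chunk_size, n_chunks - 1)
    mask = 0
    for chunk in range(first, last + 1):
        mask |= 1 << chunk
    return mask


chunk_size, n_chunks = chunk_layout(64 * MIB, 4 * MIB, 1 * MIB)

# Torrent 1 (4 MiB pieces) is currently downloading piece 3.
busy = touched_mask(3, 4 * MIB, 0, chunk_size, n_chunks)

# Torrent 2 (1 MiB pieces): piece 9 touches a busy chunk, piece 20 does not.
deprioritize_9 = bool(touched_mask(9, 1 * MIB, 0, chunk_size, n_chunks) & busy)
deprioritize_20 = bool(touched_mask(20, 1 * MIB, 0, chunk_size, n_chunks) & busy)
```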

@arvidn
Owner

arvidn commented Jun 28, 2017

> Thus, the waste only occurs if they're downloaded at the exact same time.

I'm not sure what you mean by "exact" here, but downloading a piece is not instantaneous. A piece can take a minute to download, it doesn't have to be particularly exact.

If you want to be convincing in statements about how common scenarios are, you're probably better off collecting and presenting data. Especially if you are to convince me to accept a patch that does this, and incur all that complexity and maintenance, you need good and convincing data.

> It really just seems like we have a regular race condition on our hands.

Right, it's not an inherent race though, it would just be a race because of a sloppy implementation. My experience from uTorrent is that even the most unlikely race conditions happen frequently, all you need is to deploy to some 100 million users or so.

@26486836

26486836 commented Jul 3, 2017

No, but finishing it is. It's either in a finished or an unfinished state. This transition is instant. And you don't need to do it on the level of the entire piece, doing it on the level of the 16KiB blocks is enough. Those don't take minutes to download.

So say half of the data is downloaded twice. It's still better than all the data being downloaded twice, which is what you have right now. And there's not much complexity as I see it, all you need to do is act as a local peer to yourself.

@awiebe

awiebe commented Feb 23, 2018

Hi, I know I'm a bit late to the party but it seems there is an unmentioned use of swarm merging here which I'm going to call file escrow. Over the course of some torrent's life it is common for data to be aggregated into larger torrents.

I use torrent files for moving around image data that needs to be aggregated, the use case being that as collections grow, larger and larger aggregate torrents appear which aggregate the sub-collections.

I'm going to use the usually illegal but common use case of episodes in a TV series to illustrate.

Consider the tree

  • Series X :{S1, S2, S3}
  • S1:{E1.1, E1.2, E1.3, E1.4, E1.5}
  • S2:{E2.1,....}
  • S3:{E3.1,...}
  • E1.1,E1.2,E1.3,...

As episodes are aggregated into seasons, and seasons into series, suppose you wanted all of series X, which had files that shared MD5s with the files in the subseries and individual episode torrents. If some pieces become unavailable, or if you previously downloaded a subcollection, it would be reasonable to expect that libtorrent should be able to report that duplicate data was found, and the client using libtorrent could ask to either copy the existing data into the aggregate download, or combine the individual downloads into the aggregate download. This also helps connect the swarms as the individual collections are deprecated in favour of asking for pieces of the aggregate torrent.

It's also helpful for aggregating rarer works, such as those whose copyright has lapsed but which may have been destroyed or are difficult to find.

E.g.

  • Fragment torrent 1:{1,4,5,7}
  • Fragment torrent 2:{1,2,4,6}
  • New Full torrent = FT1 + FT2

@arvidn
Owner

arvidn commented Feb 23, 2018

@awiebe I think mutable torrents is a good fit for that use case. when you create the new torrent, specify that it shares data with the two other ones. A client supporting mutable torrents will then reuse identical files that have already been downloaded. libtorrent will create hard-links to existing files.
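The file reuse step can be pictured with a few lines of Python. This is a sketch of the general idea only (hard-link when the filesystem allows it, otherwise copy); `reuse_file` is a hypothetical helper, not libtorrent's implementation or API.

```python
import os
import shutil


def reuse_file(existing, target):
    """Make `target` hold the same data as `existing`.

    Tries a hard link first so the data is stored only once, and falls back
    to a plain copy when linking fails (for example across volumes).
    Returns "hardlink" or "copy".
    """
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    try:
        os.link(existing, target)
        return "hardlink"
    except OSError:
        shutil.copy2(existing, target)
        return "copy"
```

A caller would invoke this once per file that the new torrent shares with an already completed one, after confirming that the contents really are identical.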

@awiebe

awiebe commented Feb 24, 2018

@arvidn Ah, thank you. I suppose the issue is that not enough clients have implemented this yet, but it's good to see that libtorrent supports it.

@arvidn
Owner

arvidn commented Feb 24, 2018

uTorrent also supports it, but I suspect the feature doesn't see a lot of testing. I don't have extensive tests for it in libtorrent.

@mrandreastoth

This may be of interest...

#829

@stale

stale bot commented Feb 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 29, 2020
@stale stale bot closed this as completed Mar 20, 2020
@Knocks

Knocks commented May 26, 2020

BiglyBT has swarm merging implemented correctly, and it works very well. It allows you to download a complete file from two different torrents, neither of which has seeds. If torrent 1 has a peer with 40% of a movie and torrent 2 has a peer with 60% of the same file, swarm merging will allow you not only to download a complete movie but to seed it to all the peers who had an incomplete file prior to you joining. This happens automatically and transparently to the user.
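Whether two partial peers add up to a complete file depends on which parts each one has, not just on the percentages. The Python sketch below merges per-piece availability across swarms; it assumes, purely for illustration, that the file is split into the same pieces in both torrents so that piece indexes correspond. `merged_availability` is a hypothetical name.

```python
def merged_availability(*bitfields):
    """Combine per-piece availability from several swarms.

    Each bitfield is a list of booleans, one per piece of the shared file.
    Returns (merged_bitfield, complete).
    """
    merged = [any(bits) for bits in zip(*bitfields)]
    return merged, all(merged)


# A peer in torrent 1 with the first 40% and a peer in torrent 2 with the
# remaining 60% of a 10-piece file: together they cover everything.
peer_1 = [True] * 4 + [False] * 6
peer_2 = [False] * 4 + [True] * 6
merged, complete = merged_availability(peer_1, peer_2)

# If the 60% overlaps the 40%, the merged swarm is still incomplete.
peer_3 = [True] * 6 + [False] * 4
_, complete_overlap = merged_availability(peer_1, peer_3)
```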

Swarm merging simply does wonders to torrents that may have been considered dead or incomplete for years. It literally revives parts of the internet. This feature is the only reason I use BiglyBT.

@cocokola

cocokola commented Aug 5, 2020

So officially this will not be added? The qBittorrent team determined that it needs to be part of this module. The stale notice said the issue would be closed if there were no further posts, but there have been at least two since. I think many users don't understand this and request the feature from the client they use rather than here, unless the developers of that client point them to this issue and they follow through.

Perhaps waiting for firm interest is a bit of a stretch? Wouldn't this capability benefit every product that depends on this library, should they decide to include it? Otherwise it is just a case of 'well, the other team should do this'. Some have done so, but wouldn't sharing the inner workings ensure a consistent experience no matter which client is used, once the feature is included?

@arvidn
Owner

arvidn commented Aug 5, 2020

as with most open source projects (I think) there is more interest in having features implemented than there is man-power to implement them. This feature would be exciting to implement, but it's not very high on my list of things that need to be done.

Patches are always welcome. This would be a pretty big feature (I think), so it would require a contributor willing to go through a fairly long review and integration process (the webtorrent and gnutls patches are good examples of this, I think).

I don't think a feature like this could be implemented (at least not effectively and efficiently) outside of libtorrent. It requires a fair amount of surgery in the internals, like the piece picker.

@Knocks

Knocks commented Aug 5, 2020

As much as I dislike using Vuze-based clients for hogging system resources and depending on Java, I am continuing to use them because swarm merging is such a game changer. I have abandoned my previous favorite client µTorrent because it doesn't have swarm merging but I would really like to go back to using it.

If you guys give this feature a try and see torrents that have been dead for years come alive and all of a sudden have lots of seeds, you will be converted.

@cocokola

cocokola commented Aug 5, 2020

I deal with so much manual crap weekly that this feature has the potential to resolve at least half of them based on a manual effort to look into several dozen examples.

This thread is closed, but I would highly encourage the team to put this in the medium scope pile, or at the very least to take it out of closed and add it to a waiting list. It amazes me that there aren't hack tools that can do this kind of analysis given multiple search links. Also, a tool that would connect to swarms, get real seed counts and score various indexers for their accuracy or their BS... but I digress.

@zero77

zero77 commented Aug 5, 2020

For the people who would like to see this added but are unable to open a PR:
has a crowdfund to get this added by an external developer been considered?
