Issue when creating and seeding large torrents via webtorrent #752

Closed
marcovidonis opened this issue May 26, 2022 · 13 comments

@marcovidonis
Collaborator

When I create a torrent and seed it, I run into the following problems on the download side:

  • if the download client is a browser running webtorrent, the connection to the seeder is established, but file transfer never starts;
  • if the download client is another Go module based on this project, it logs a long list of EOF errors, likely one per piece, before the download starts (see #716, "error running handshook webrtc conn"). For a 4 GB folder, this can take more than 30 minutes.

On the other hand, if I create a torrent for the same file in a browser-based webtorrent client such as instant.io, seed it there, and then download it from my Go module, the download starts immediately and I get no EOF errors.

I've noticed some differences in the torrent file being created with my code compared to the one created for the same file with instant.io: the infoHash is different, and the number of pieces is higher (for a 1.1 GB file I got 1249 pieces from instant.io and 4994 from my go module). I think these differences might point to the reason why seeding from my module seems to be incompatible with the browser client, and why download start is so inefficient in my go module.
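The two piece counts are consistent with the same payload split at two different piece lengths. A minimal sketch of the arithmetic, assuming a payload of 1248.5 MiB (a size picked only because it matches both reported counts) and assuming instant.io used 1 MiB pieces; `pieceCount` is a hypothetical helper, not part of any library. Since the piece length and piece hashes are part of the info dictionary, a different piece length also produces a different infoHash.

```go
package main

import "fmt"

// pieceCount returns how many pieces a payload of totalLength bytes needs
// at the given piece length (ceiling division).
func pieceCount(totalLength, pieceLength int64) int64 {
	return (totalLength + pieceLength - 1) / pieceLength
}

func main() {
	// Assumed payload size: 1248.5 MiB, consistent with both reported counts.
	const totalLength = 1248*(1<<20) + (1 << 19)

	fmt.Println(pieceCount(totalLength, 1<<18)) // 256 KiB pieces: prints 4994
	fmt.Println(pieceCount(totalLength, 1<<20)) // 1 MiB pieces (assumed): prints 1249
}
```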

Any idea how I can solve these issues?

Here's an extract of the code for my seeder client, based on your demo apps:

func serve(folderPath string, announceList [][]string) error {
	cfg := torrent.NewDefaultClientConfig()
	cfg.Seed = true
	cfg.NoDHT = true
	cfg.DisableTCP = true
	cfg.DisableUTP = true
	cfg.DisableAggressiveUpload = false
	cfg.DisableWebtorrent = false
	cfg.DisableWebseeds = false
	cl, err := torrent.NewClient(cfg)
	if err != nil {
		return fmt.Errorf("new torrent client: %w", err)
	}

	defer cl.Close()
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		cl.WriteStatus(w)
	})

	files, err := os.ReadDir(folderPath)
	if err != nil {
		return fmt.Errorf("error reading %q: %w", folderPath, err)
	}
	for _, fi := range files {
		// filepath.Join inserts the separator that plain concatenation
		// omits when folderPath has no trailing slash.
		go createAndServeTorrent(cl, filepath.Join(folderPath, fi.Name()), announceList)
	}
	select {}
}

func createAndServeTorrent(cl *torrent.Client, path string, announceList [][]string) error {
	info := metainfo.Info{
		PieceLength: 1 << 18,
	}
	err := info.BuildFromFilePath(path)
	if err != nil {
		return fmt.Errorf("building info from path %q: %w", path, err)
	}

	mi := metainfo.MetaInfo{
		InfoBytes:    bencode.MustMarshal(info),
		AnnounceList: announceList,
	}
	pc, err := storage.NewDefaultPieceCompletionForDir(".")
	if err != nil {
		return fmt.Errorf("new piece completion: %w", err)
	}
	defer pc.Close()
	ih := mi.HashInfoBytes()
	to, _ := cl.AddTorrentOpt(torrent.AddTorrentOpts{
		InfoHash: ih,
		Storage: storage.NewFileOpts(storage.NewFileClientOpts{
			ClientBaseDir: path,
			FilePathMaker: func(opts storage.FilePathMakerOpts) string {
				return filepath.Join(opts.File.Path...)
			},
			TorrentDirMaker: nil,
			PieceCompletion: pc,
		}),
	})

	defer to.Drop()
	err = to.MergeSpec(&torrent.TorrentSpec{
		InfoBytes: mi.InfoBytes,
		Trackers:  mi.AnnounceList,
	})
	if err != nil {
		return fmt.Errorf("setting trackers: %w", err)
	}

	<-to.GotInfo()

	magnet := mi.Magnet(nil, nil)
	fmt.Printf("%v -- %v\n\n", to.Name(), magnet.String())

	select {}
}

And here's an extract of my download client:

func main() {
	log.SetFlags(log.Flags() | log.Lshortfile)
	var args struct {
		MagnetLink string `name:"m" help:"magnet link URI"`
		tagflag.StartPos
	}
	tagflag.Parse(&args, tagflag.Description("Downloads file from magnet link"))

	clientConfig := torrent.NewDefaultClientConfig()
	clientConfig.DataDir = "./Downloads"
	clientConfig.NoDHT = true
	clientConfig.DisableTCP = true
	clientConfig.DisableUTP = true
	clientConfig.DisableWebtorrent = false
	clientConfig.DisableWebseeds = false

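	c, err := torrent.NewClient(clientConfig)
	if err != nil {
		log.Fatalf("Error creating torrent client: %v", err)
	}
	defer c.Close()
	t, err := c.AddMagnet(args.MagnetLink)
	if err != nil {
		log.Printf("Error adding magnet link: %v", err)
		os.Exit(1)
	}

	<-t.GotInfo()
	start := time.Now()
	t.DownloadAll()
	c.WaitAll()
	// start times the download itself; logging it also keeps the variable used.
	log.Printf("Download completed in %v", time.Since(start))
}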

I'm using github.com/anacrolix/torrent v1.43.1

@marcovidonis
Collaborator Author

I've done some digging and I've realised that there's room for optimisation when defining the piece length and therefore the number of pieces. I took inspiration from the Taipei-Torrent project, which is another Go-based BitTorrent client.

With an optimised piece length, my example above produced the same number of pieces, and the same performance, as seeding from the browser client.

This didn't solve the compatibility issue when downloading from the browser client, though.
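The kind of piece-length selection discussed here can be sketched as follows. This is an illustration under stated assumptions, not the code used in this issue: it picks a power-of-two piece length, starting at 16 KiB, that keeps the piece count below 2048. The function name and both constants are hypothetical choices for the example.

```go
package main

import "fmt"

const (
	minPieceLength = 16 * 1024 // assumed lower bound, a power of two
	maxPieceCount  = 2048      // assumed upper bound on the piece count
)

// choosePieceLength doubles the piece length until the payload fits in
// fewer than maxPieceCount pieces.
func choosePieceLength(totalLength int64) int64 {
	pieceLength := int64(minPieceLength)
	pieces := totalLength / pieceLength
	for pieces >= maxPieceCount {
		pieceLength <<= 1
		pieces >>= 1
	}
	return pieceLength
}

func main() {
	// 1248.5 MiB, an assumed size consistent with the piece counts reported earlier.
	const totalLength = 1248*(1<<20) + (1 << 19)
	fmt.Println(choosePieceLength(totalLength)) // prints 1048576 (1 MiB)
}
```

The result could then replace the fixed `PieceLength: 1 << 18` in the seeder extract, with the total size computed from the files before building the info.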

@anacrolix
Owner

@marcovidonis

> if the download client is a browser running webtorrent, the connection to the seeder is established, but file transfer never starts;

This upload direction is one I haven't tested extensively. I will try to replicate this myself. The reverse direction, where webtorrent is seeding, has been tested much more thoroughly, as you note later.

> if the download client is another go module based on this project, the code runs through a long list of EOF errors, likely one for each file piece, before download starts (see #716). For a 4GB folder, this can take more than 30 minutes.

Could you provide a sample of these errors? I want to distinguish peer connection errors from storage errors. Following up from reading your code: it's not clear whether this implementation, when seeding, knows that the pieces are complete. You may need to force the client to verify that the data is correct and complete. The EOF errors you mention may come from a problem locating the data; if the generated info has the wrong paths for the files, that would cause this. I see you have custom file storage opts, so the final path may be incorrect and the client may not be finding the expected data.

> I've done some digging and I've realised that there's room for optimisation when defining the piece length and therefore the number of pieces. I took inspiration from the Taipei-Torrent project, which is another Go-based BitTorrent client.
>
> This didn't solve the compatibility issue when downloading from the browser client, though.

Are you interested in submitting this optimization? I don't believe this implementation provides the automatic piece length selection that many other projects probably do. As you mention, this should have absolutely no effect on compatibility: whatever piece length is specified in the info in use should always apply.

Thank you very much for the detailed issue reports and comments!

@anacrolix
Owner

@marcovidonis I tested uploading from this implementation to WebTorrent, no issues. I suspect that there's an issue locating the data with your NewFileClientOpts values.

@marcovidonis
Collaborator Author

@anacrolix thanks for your replies. I think I might have solved the issues with NewFileClientOpts: In fact, I'd noticed that .torrent.db files would be created in two different places which might have caused some conflicts. However, this is now fixed and I'm still having issues.

In my setup, I have a 'seeder' client that holds all the pieces, and a 'downloader' client that starts with no pieces. I'm essentially modelling a one-way transfer.
The main problem appears when transferring folders with 1000+ files, that is, several thousand pieces. I see a long list of messages of types Extended, HaveAll and Unchoke coming from my seeder, and this goes on for minutes before I get the first Piece message. When seeding the same set of files from a webtorrent browser client, I get fewer than 10 messages before the first Piece. As a result, when seeding from my Go client, it takes much longer than necessary just to start the transfer, even if the transfer itself takes only a few seconds.
Do you have any idea where this issue might originate?

I also tested seeding from a Go client and downloading from a browser. It works fine with smaller files with a smaller number of pieces. However, for folders with 1000+ files, I notice that the download starts immediately but then speed drops to 0 at a random point during the transfer (could be 50%, could be 90%...).
Any clue as to why this could be happening? I've got a feeling that addressing the messages issue might solve this one as well.

@anacrolix
Owner

@marcovidonis is this one or a small number of torrents with a large number of files, or a lot of torrents? The issue appears when you seed from anacrolix/torrent to another anacrolix/torrent instance? Would you know what protocol they're communicating over (like WebRTC or TCP/uTP)? In the set-up, are the clients on the same network (like localhost)?

The potential message issue you describe is very interesting. Is there some way you could provide the sequence of outbound or inbound messages for a bad connection? Perhaps the logging is deficient here, I could make some improvements to facilitate this better.

My best guesses are some bad behaviour in the upload handlers (not super heavily tested, as I mentioned earlier), possibly made worse by very low-latency connections, or WebRTC.

@anacrolix
Owner

This looks very similar: #753.

@marcovidonis
Collaborator Author

@anacrolix This is a single torrent with a large number of files, seeding from an anacrolix/torrent instance to another anacrolix/torrent instance over WebRTC. All is done on the same network.

I later ran some more tests transferring a single torrent of 4000+ files and 250 MB on the same network, but between different clients. I measured the time between when the download client starts and when the torrent info is available: This is kind of a waiting time that I'm trying to keep under control.
This is what I got:

| From | To | Waiting time |
| --- | --- | --- |
| anacrolix/torrent | anacrolix/torrent | 250 s |
| browser client | anacrolix/torrent | 330 s |
| browser client | browser client | 10 s |
| anacrolix/torrent | browser client | 570 s |

I initially thought I might be hitting a limitation of the BitTorrent protocol (which would explain why torrents often ship a single .rar or .zip archive), but the fact that WebTorrent in the browser still performs well makes me doubt that.

I agree that #753 looks very similar. In fact, in the last case from the table above, the download started after 570 seconds but then speed dropped to 0 after a few seconds.

I'm not sure how to provide the list of messages, but I can look into that. What exactly are you looking for?

@anacrolix
Owner

@marcovidonis could you try the master branch? I've pushed a bunch of performance improvements in seeding to Transmission (512KB/s->16MB/s), and some of them seem to apply to WebTorrent too. Related to #753, when it does upload to qBittorrent, the speed appears very fast now.

I'm looking for signs the protocol is incorrect, but I'm less inclined to think that's what it is now.

The time to torrent info is interesting; is that the waiting time in your table? There was a bug where sending large infos to WebRTC peers would terminate the connection, and there was a debug-level error message for it. Here's the fix (it's on master). On second thoughts, I suspect that will fix the problem you're seeing. If that bug was the cause, there would have been a single log message from here about WebRTC connections not supporting large writes. The cause is pion/webrtc#2712, and I improved the workaround.
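For readers unfamiliar with this class of problem, the general idea of working around a transport that rejects large writes can be sketched as a writer that splits each write into bounded chunks. This is an illustration only and not the fix that landed on master: `chunkWriter`, `sizeRecorder`, and the 16 KiB limit are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"io"
)

// chunkWriter forwards writes to w in pieces of at most max bytes.
type chunkWriter struct {
	w   io.Writer
	max int
}

// Write splits p into chunks no larger than c.max and writes them in order.
func (c chunkWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		n := len(p)
		if n > c.max {
			n = c.max
		}
		m, err := c.w.Write(p[:n])
		written += m
		if err != nil {
			return written, err
		}
		if m < n {
			return written, io.ErrShortWrite
		}
		p = p[n:]
	}
	return written, nil
}

// sizeRecorder is a stand-in transport that records the size of each write.
type sizeRecorder struct{ sizes []int }

func (r *sizeRecorder) Write(p []byte) (int, error) {
	r.sizes = append(r.sizes, len(p))
	return len(p), nil
}

func main() {
	rec := &sizeRecorder{}
	cw := chunkWriter{w: rec, max: 16 * 1024} // assumed per-write limit
	n, err := cw.Write(make([]byte, 40000))
	fmt.Println(n, err, rec.sizes) // prints 40000 <nil> [16384 16384 7232]
}
```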

@marcovidonis
Collaborator Author

marcovidonis commented Jun 17, 2022

Yes, the waiting time is the same as time to torrent info.

I've upgraded my dependency to v1.44.0 and I did see a clear improvement in time to torrent info, both when seeding from an anacrolix/torrent client and downloading from a browser client, and vice versa.
I went from 330s to 1.6s in the browser to anacrolix/torrent case!

EDIT: after actually using master rather than v1.44.0, the time to torrent info improved dramatically also when transferring between two anacrolix/torrent clients. It's down to 350 ms now.

@anacrolix
Owner

I think we can consider this fixed? If so, I'll tag it for release. The qBittorrent problem remains, but that's tracked in another issue.

@marcovidonis
Collaborator Author

Mostly fixed, but I'm noticing that in roughly 1 out of 4 transfers of the same torrent, the time to torrent info jumps to 40-50s, while it's normally 0.4 s. It seems to happen quite randomly and I'm trying to understand what could be the cause.

@marcovidonis
Collaborator Author

Looks like the issue of the time to torrent info occasionally increasing is somehow related to the tracker. I've repeated my tests by running my own private tracker and never saw the problem.

So I'd say we can consider this fixed now. Thanks for your help!

@anacrolix
Owner

Fixed in v1.45.0.
