High CPU usage of sha1.blockAMD64 #905

Closed
kslr opened this issue Feb 15, 2024 · 8 comments

@kslr

kslr commented Feb 15, 2024

Hi,
I'm looking for ways to increase download speed. With Deluge I get ~150MB or so, but my headless client only reaches ~60MB (CPU at 300%+).
Using go pprof I noticed excessive CPU usage, and I'd like to know how to reduce it. Would it help to tweak SQLite parameters, tweak client parameters, or switch storage between bolt and mmap?

Basic info:
Ubuntu 22.04, 16 GB memory, 1 TB NVMe, 1G network, CPU E5-2650 (2.00GHz)

simple client code:

cfg := torrent.NewDefaultClientConfig()
cfg.Bep20 = "-TR2770-"
cfg.ExtendedHandshakeClientVersion = "transmission 2.77"
cfg.HTTPUserAgent = "Transmission/2.77"
cfg.EstablishedConnsPerTorrent = 200
cfg.HalfOpenConnsPerTorrent = 100
cfg.TorrentPeersHighWater = 2000
cfg.ListenPort = 0
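cfg.MaxUnverifiedBytes = 128 << 20 // 128 MiB
cfg.DisableAggressiveUpload = true
// Note: Go does not expand "~", so this path is taken literally, relative to the working directory.
cfg.DefaultStorage = storage.NewFileByInfoHash("~/Downloads")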

test data:
nyaa seeder top 150

pprof file:
cpu.pprof.zip

Thank you for your help

@anacrolix
Owner

Thanks for the thorough information, I'll take a look soon.

@anacrolix
Owner

Very interesting CPU profile. I take it your instance is really hauling ass (at least for the anacrolix/torrent implementation); there must be a lot of data going through it. I'm surprised to see hashing be such an issue. It could be possible to use a faster hash for the smart ban, which seems to account for about 60% of the SHA1 hashing overhead. The smart ban hash can be anything that's cryptographic, or possibly anything that can accept a seed or be salted (it just needs to be unguessable by an attacker; it's not critical). I wonder if I should provide the ability to turn off the smart ban, or use a faster hash.

The other thing of note is a non-negligible scheduling overhead. It might take more than a CPU trace to determine if things are optimal there. But certainly the main download path blocker is anything under receiveChunk so it's best to optimize that. Since piece hashing doesn't block the download path, I think if the smart ban stuff is optimized you will see huge performance gains.

@anacrolix
Owner

anacrolix commented Feb 18, 2024

This looks promising

~/ags/torrent % go test -run @ -bench SmartBan
goos: darwin
goarch: arm64
pkg: github.com/anacrolix/torrent
BenchmarkSmartBanRecordBlock/xxhash-10         	  868774	      1433 ns/op	11431.27 MB/s
BenchmarkSmartBanRecordBlock/sha1-10           	  172546	      7025 ns/op	2332.19 MB/s
PASS
ok  	github.com/anacrolix/torrent	2.588s

@anacrolix
Owner

@kslr Please try with https://github.com/anacrolix/torrent/tree/issue-905. It should be about 3x faster.

@anacrolix
Owner

For context, it looks like I forgot to include the smart ban block recording in the primary downloading benchmark.

@kslr
Author

kslr commented Feb 18, 2024

Very nice work. It now reaches a crazy speed (~250MB) at ~200% CPU, which I feel is more than enough, mainly because of my weaker CPU.

I think allowing the smart ban to be disabled would cause other problems, and the gain from optimizing the hash further probably wouldn't be that great, so consider keeping it simple.

@anacrolix
Owner

Thank you! I will merge the performance boost to main and release.

@anacrolix
Owner

Fixed in v1.54.1
