
ssh.exe uses 25% of CPU (quad core processor) #1076

Closed
darkdragon-001 opened this issue Sep 12, 2016 · 18 comments

Comments

@darkdragon-001

When copying a file via rsync -ahvzP --append-verify ./source user@host:/destination, the process ssh.exe consumes 25% of CPU.

@aseering
Contributor

Thanks @darkdragon-001 for posting. To clarify -- is this unexpected given your particular network setup? rsync uses ssh to transfer data securely over the network, and ssh does some heavyweight encryption. If the network connection between your machines is fast, and/or your CPU is slow, then it's very possible that ssh won't be able to encrypt data fast enough to keep up. In which case you'll see it using a full CPU core (25% on a quad-core machine).

ssh can be configured to use less CPU-intensive (and theoretically less secure) encryption, if that's what you're looking for.
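(As a rough sketch, and assuming both ends support the cipher you pick, you can pass one explicitly to the ssh that rsync spawns, e.g.:

rsync -ahvzP --append-verify -e "ssh -c aes128-ctr" ./source user@host:/destination

Whether this actually saves CPU depends on your OpenSSH build and hardware; on machines without AES-NI a different cipher may or may not help.)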

@darkdragon-001
Author

8 MBit/s = 1MB/s upstream, Intel Core 2 Quad Q9550 @ 2.83 GHz

Maybe I'll set up a computer connected via Gigabit Ethernet to test the maximum transfer speed...

@aseering
Contributor

aseering commented Sep 12, 2016

For what it's worth, the Core 2 Quad Q9550 is an 8-year-old chip, and relatively slow compared to newer processors.

I feel like that chip ought to be able to push a little more than 8 Mbit/s over ssh, but not a whole lot more. Maybe newer versions of ssh are negotiating more CPU-intensive encryption schemes?

Do you have reason to believe that this machine could transfer data significantly faster and/or with less CPU utilization using a real Ubuntu 14.04 installation or VM on the same hardware?

@therealkenc
Collaborator

A 1.86 GHz Core 2 should be able to push around 139 MiB/s using aes128-ctr, which is the default cipher for ssh on Ubuntu. You can do ~13 MB/s on a 200 MHz Pentium II. No idea why his core is maxed, mind.
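(For a rough upper bound on the raw cipher throughput of a given box, independent of ssh and the network, something like the following works; note it benchmarks OpenSSL's AES implementation, not OpenSSH's data path:

openssl speed -evp aes-128-ctr
)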

@aseering
Contributor

It has not been my experience that ssh itself achieves those speeds on this hardware... Do you have a benchmark of it (rather than an alternate implementation of just the algorithm) that you can cite?
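(A crude way to benchmark ssh itself, rather than just the algorithm, is to stream zeros through it and let dd report the rate; a sketch, assuming passwordless login to host is already set up:

dd if=/dev/zero bs=1M count=1000 | ssh host "cat > /dev/null"
)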

@aseering
Contributor

Hm... I think perhaps the truth is somewhere in the middle:

I used to use an old MacBook Pro with a 2.33 GHz Core 2 as my main computer. I just dug it out and tried running rsync from it using the OP's arguments. I was able to sustain in the neighborhood of 25 MB/s, which is considerably slower than 139 MB/s but considerably faster than 1 MB/s.

I'm seeing around 15 MB/s from WSL on a machine with a Core i7-6500U. Not that old-Mac-to-new-WSL is a very meaningful comparison... The receiving server was running native Ubuntu 14.04 in both cases.

It's worth observing that the OP is using a compressed data stream (-z), which probably uses nontrivial CPU time, but that happens in the rsync process, not the ssh process. (Unless any of the OP's ssh config files contain Compression yes...)

@darkdragon-001 -- do you have any customizations in /etc/ssh/ssh_config or ~/.ssh/config that would affect this connection?
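(One quick way to see the effective client settings for that host, assuming your OpenSSH is 6.8 or newer and supports -G, is:

ssh -G user@host | grep -iE "compression|cipher"

That prints the resolved configuration, so it catches options coming from either file.)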

@darkdragon-001
Author

~/.ssh/config does not exist
/etc/ssh/ssh_config contains

Host *
    SendEnv LANG LC_*
    HashKnownHosts yes
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials no

So I don't think this should affect the speed...

@aseering
Contributor

aseering commented Sep 13, 2016

What kind of machine are you rsync'ing to? Another machine running WSL and Ubuntu's OpenSSH? An Ubuntu server (what version)? Some other Linux? Something else entirely?

If you're willing to post the output of ssh -vv user@host, that might contain some useful information. Or if not the whole output, at least post what encryption is being negotiated.
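(For example, something along these lines should pull out just the cipher negotiation; the exact debug wording varies between OpenSSH versions:

ssh -vv user@host exit 2>&1 | grep -i cipher
)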

@therealkenc
Collaborator

therealkenc commented Sep 13, 2016

Your new numbers seem about right. Here's a dude from 2009 getting 23.7 MB/s between his AMD Athlon 4600 and a Core 2 Mac. These are all pre-AES-NI machines. On my circa-2011 i7 with AES-NI I am getting around 150 MB/s (~1.2 Gbit/s) on WSL with scp over localhost.
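(The localhost measurement is roughly: create a big file and copy it to yourself; the file name and size here are arbitrary:

dd if=/dev/zero of=bigfile bs=1M count=2048
scp bigfile localhost:/dev/null
)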

@darkdragon-001
Author

darkdragon-001 commented Sep 13, 2016

For most of the following tests, I omitted rsync's -z since copying a file full of zeros led to 80 MB/s over my 1 MB/s internet connection. I used dd if=/dev/urandom to create a file filled with random content for the tests. The rsync command is always run on WSL.
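(Roughly like this; the size here is just an example:

dd if=/dev/urandom of=testfile bs=1M count=1024
)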

Test1: Ubuntu 16.04 with SSD connected via Gigabit Ethernet to Windows 10 WSL with SSD (without -z):
~ 42.5 MB/s, rsync and ssh have between 5% and 20% CPU.

Test2: Windows 10 WSL with SSD connected via Gigabit Ethernet to Ubuntu 16.04 with SSD (with -z):
~ 85 MB/s, rsync about 25% and ssh nearly 0% CPU

Test2.5: Windows 10 WSL with SSD connected via Gigabit Ethernet to Ubuntu 16.04 with SSD (without -z):
~ 70 MB/s, rsync about 10% and ssh nearly 25% CPU

Test3: Windows 10 WSL with SSD connected via Internet (50 MBit/s down, 10 MBit/s up) to Ubuntu Server 16.04.01 LTS (without -z):
~ 1 MB/s, rsync about 0% and ssh about 25% CPU

@aseering
Contributor

Hm... That's very interesting.

If you run rsync with --rsh="ssh -vvv", ssh will print out connection debugging information. Do you see any interesting differences between its debug output in Test2 vs Test3?
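(Concretely, keeping your original arguments as a template, something like:

rsync -ahvzP --append-verify --rsh="ssh -vvv" ./source user@host:/destination
)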

Also, would it be possible to test copying from the Ubuntu 16.04 with SSD machine to the Ubuntu Server machine? Just to see how ssh behaves with WSL out of the picture entirely.

@therealkenc
Collaborator

Great tests. I wonder if we're really just seeing something like #981 here. How about just:

WSL$ scp you@native-server:bigfile /dev/null

@aseering
Contributor

@therealkenc -- ah, another good test. For what it's worth, if none of the above tests turn up anything, I'm wondering if it's related to #971 (which I'm not immediately sure how to test for).

@darkdragon-001
Author

Unfortunately, I have some problems with my server, so I could only do local tests. I used the scp command this time. Writing to an SSD or to /dev/null did not make any difference.

3 machines with SSDs connected via Gigabit Ethernet:

  • WSL (Core2Quad -> maximum 25% CPU)
  • U+ (Ubuntu 16.04, Corei7 -> maximum 25% CPU)
  • U- (Ubuntu 16.04, Pentium 4 -> maximum 100% CPU)

* indicates that the scp command was run on this machine

1: (ssh: 12%, scp:  2%) `U+`   --55MB/s--> *`WSL` (ssh: 20%, scp:  3%)
2: (ssh: 25%, scp:  7%) `WSL`* --75MB/s-->   `U+` (ssh: 10%, scp:  5%)
3: (ssh: 25%, scp:  2%) `WSL`* --40MB/s-->   `U-` (ssh: 50%, scp:  4%)
4: (ssh: 45%, scp:  3%) `U-`   --40MB/s--> *`WSL` (ssh: 15%, scp:  2%)
5: (ssh: 10%, scp:  1%) `U+`*  --70MB/s-->   `U-` (ssh: 45%, scp: 15%)
6: (ssh: 40%, scp:  8%) `U-`*  --60MB/s-->   `U+` (ssh: 15%, scp: 4%)
7: (ssh: 14%, scp:  1%) `U+`   --55MB/s-->  *`U-` (ssh: 40%, scp: 12%)
8: (ssh: 40%, scp:  7%) `U-`   --75MB/s-->  *`U+` (ssh: 17%, scp:  4%)

I noticed the following:

  • SSH on WSL is the bottleneck when sending (25% CPU) [2; 3]
  • WSL somehow limits the speed to U-, which is capable of more [3/5; 4/8]
  • Sending data is usually faster than fetching data (and needs less CPU) [2/1; 5/7]

@therealkenc
Collaborator

Great that you took the effort to test this further. Can you explain [3/5; 4/8]? Apologies, it will probably be obvious once you do (threads?). These numbers look pretty reasonable, considering, no? I mean, based on the numbers I've seen online for "older" machines (i.e. without AES-NI), you're being limited by AES, not your GigE local net. We're getting pretty pedantic at this point, but you could try doing the same U- scp tests with a Windows ssh (OpenSSH for Windows or Microsoft's port) and see if it is any different from WSL on the same box.

One thing I left unmentioned earlier in the thread is that AES proper (any implementation) isn't going to cross out of userspace, so theoretically this shouldn't be in WSL's wheelhouse. The WSL emulation layer doesn't even know your data is encrypted. What I was really looking for was WSL-related overhead in either the fs or tcp system calls. That's really the only place you should see any WSL influence here.
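(If anyone wants to chase that, a syscall-level summary of the sending side would show where the time goes; a sketch, assuming strace works under your WSL build, which hasn't always been the case:

strace -c -f scp bigfile user@host:/dev/null
)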

@therealkenc
Collaborator

therealkenc commented Sep 30, 2016

This issue went stale without comment from the team, but I can confirm something is going on here; I am just not sure what. Note the 374.88K bytes/sec below.

This is rsync into a chroot folder, but I doubt that's a factor. It isn't down to hardware (CPU speed, network speed), because I happened to do the same operation in an Ubuntu VM on the same machine to the same ISP just a few minutes earlier, and the operation was 10x faster. Sorry I don't have exact stats; the VM results are out of my scroll buffer. But in a VM it was "didn't even think to look" speed, while in WSL it was "taking so long I left" speed. I am really starting to think it's something related to #981, which I am also seeing in spades. This is in VolFs.

# emerge-webrsync
Fetching most recent snapshot ...
Trying to retrieve 20160929 snapshot from http://gentoo.mirrors.tera-byte.com ...
Fetching file portage-20160929.tar.xz.md5sum ...
Fetching file portage-20160929.tar.xz.gpgsig ...
Fetching file portage-20160929.tar.xz ...
Checking digest ...
Getting snapshot timestamp ...
Syncing local tree ...


Number of files: 207,023 (reg: 179,492, dir: 27,531)
Number of created files: 207,022 (reg: 179,492, dir: 27,530)
Number of deleted files: 0
Number of regular files transferred: 179,492
Total file size: 415.23M bytes
Total transferred file size: 415.23M bytes
Literal data: 415.23M bytes
Matched data: 0 bytes
File list size: 4.06M
File list generation time: 0.007 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 180.33M
Total bytes received: 3.55M

sent 180.33M bytes  received 3.55M bytes  374.88K bytes/sec
total size is 415.23M  speedup is 2.26
Cleaning up ...

Performing Global Updates
(Could take a couple of minutes if you have a lot of binary packages.)

@RowboTony

I have experienced this also while compiling Ruby. It seems there are several related issues, tied not to a specific command but rather to WSL appearing to be limited to 25% of CPU overall. Is this limit an intentional feature of WSL? I'd like to increase this limit if possible. See also #358

@therealkenc
Collaborator

Essentially a discussion thread that ran its course. A single-threaded compile or single-threaded ssh can (and will) max out a CPU core if that's the bottleneck (as opposed to disk or network I/O). There's no known limitation specific to WSL in that regard (if WSL couldn't use all CPUs, people's heads would explode). It is possible (but improbable) that there are scenarios where WSL behaves differently from the Real Thing for reasons, but in the event that is the case, what would be needed to track it further is a test case with a tight repro.
