
Socket activated SSHD shows error in journalctl early at boot #3279

Closed
CMCDragonkai opened this issue Jul 15, 2014 · 8 comments

@CMCDragonkai
Member

I've got a socket activated SSHD on my NixOS instance:

In configuration.nix

    services = {
        openssh = {
            enable = true;
            startWhenNeeded = true;
            permitRootLogin = "no";
            passwordAuthentication = false;
            extraConfig = ''
                PrintLastLog no
            '';
        };
    };

When I check `sudo systemctl status`, I always see this on every boot:

● matrix-node
    State: degraded
     Jobs: 0 queued
   Failed: 1 units
    Since: Tue 2014-07-15 02:20:03 UTC; 1h 34min ago
   CGroup: /
           ├─1 systemd --log-target=journal
           ├─system.slice
           │ ├─nix-daemon.service
           │ │ └─3233 nix-daemon --daemon
           │ ├─ntpd.service
           │ │ └─1751 ntpd -g -n -c /nix/store/02izw6mxdn7sirjy9n6m0zrb9bp3rn8j-ntp.conf -u ntp:nogroup -i /var/lib/ntp
           │ ├─nscd.service
           │ │ └─1697 nscd -f /nix/store/jyaww75i9jac8rpjs3skjx519nqg0yy8-nscd.conf
           │ ├─dbus.service
           │ │ └─1381 /nix/store/6cyp9201l5wyh209fwzvyksva5s9fz4a-dbus-tools-1.8.0/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd
           │ ├─systemd-logind.service
           │ │ └─1375 /nix/store/sk3017fxx2vhj69saizk5634y3w6hqhn-systemd-212/lib/systemd/systemd-logind
           │ ├─haveged.service
           │ │ └─1336 /nix/store/c6x8mp7yvzm8agcfaanwfjcnnmjk72r3-haveged-1.9.1/sbin/haveged -w 1024 -v 1
           │ ├─virtualbox.service
           │ │ └─1322 VBoxService --foreground
           │ ├─dhcpcd.service
           │ │ └─1714 dhcpcd --quiet --config /nix/store/272zp63msk1mj2ar1fhirisfn4vf8z9d-dhcpcd.conf
           │ ├─systemd-udevd.service
           │ │ └─1255 /nix/store/sk3017fxx2vhj69saizk5634y3w6hqhn-systemd-212/lib/systemd/systemd-udevd
           │ ├─systemd-journald.service
           │ │ └─1243 /nix/store/sk3017fxx2vhj69saizk5634y3w6hqhn-systemd-212/lib/systemd/systemd-journald
           │ └─system-getty.slice
           │   └─getty@tty1.service
           │     └─1331 agetty --noclear --login-program /nix/store/gzr868gspawhic2pcamqms8xbpp06rm2-shadow-4.1.5.1/bin/login --keep-baud tty1 115200 38400 9
           └─user.slice
             └─user-1000.slice
               ├─session-3.scope
               │ ├─2713 sshd: operator [priv]
               │ ├─2726 sshd: operator@pts/0
               │ ├─2727 -bash
               │ ├─3319 sudo systemctl status
               │ ├─3320 systemctl status
               │ └─3321 less
               └─user@1000.service
                 ├─2718 /nix/store/sk3017fxx2vhj69saizk5634y3w6hqhn-systemd-212/lib/systemd/systemd --user
                  ├─2719 (sd-pam)
                 └─ssh-agent.service
                   └─2725 /nix/store/2wc50fcn54axkg2kk71jm2r5h0w5rbh6-openssh-6.6p1/bin/ssh-agent -a /run/user/1000/ssh-agent

So there's always a failed service, and the OS always shows as degraded. This is unhelpful, because it will trigger false alerts in a centralised monitoring/logging system.

The specific failed service is an sshd service:

$ sudo systemctl status sshd@0-10.0.2.15:22-10.0.2.2:51317.service
● sshd@0-10.0.2.15:22-10.0.2.2:51317.service - SSH Daemon (10.0.2.2:51317)
   Loaded: loaded (/nix/store/wr8r8jrj204q3i0v4vfav8m63ssnv8w1-unit/sshd@.service)
   Active: failed (Result: exit-code) since Tue 2014-07-15 02:22:12 UTC; 1h 35min ago
  Process: 1650 ExecStart=/nix/store/2wc50fcn54axkg2kk71jm2r5h0w5rbh6-openssh-6.6p1/sbin/sshd -i -f /nix/store/ai2a554az21b5zhd1kamcznbim4gd924-sshd_config (code=exited, status=1/FAILURE)
  Process: 1648 ExecStartPre=/nix/store/i5wnidc4707k3pgcbhyjq3qb4ajgyx5n-unit-script/bin/sshd@-pre-start (code=exited, status=0/SUCCESS)
 Main PID: 1650 (code=exited, status=1/FAILURE)

Jul 15 02:20:12 matrix-node systemd[1]: Starting SSH Daemon (10.0.2.2:51317)...
Jul 15 02:20:12 matrix-node systemd[1]: Started SSH Daemon (10.0.2.2:51317).
Jul 15 02:22:12 matrix-node systemd[1]: sshd@0-10.0.2.15:22-10.0.2.2:51317.service: main process exited, code=exited, status=1/FAILURE
Jul 15 02:22:12 matrix-node systemd[1]: Unit sshd@0-10.0.2.15:22-10.0.2.2:51317.service entered failed state.

However, three instances of the sshd service were launched. The `systemctl status sshd.socket` output confirms this: three connections were accepted.

$ sudo systemctl status sshd.socket
● sshd.socket - SSH Socket
   Loaded: loaded (/nix/store/g8s84aaj5p3dcak3c2xinlwm7n7vxn52-unit/sshd.socket)
   Active: active (listening) since Tue 2014-07-15 02:20:06 UTC; 1h 36min ago
   Listen: [::]:22 (Stream)
 Accepted: 3; Connected: 1

Jul 15 02:20:06 matrix-node systemd[1]: Listening on SSH Socket.

I'm launching this through Vagrant, so there should be at least two sshd instances: the first when Vagrant boots the VM and SSHes in to set up the network and shared folders, and the second when I run `vagrant ssh` to log into the machine.

However, this shows three instances. Some more digging found that these three instances were:

sshd@0-10.0.2.15:22-10.0.2.2:51317.service  -> Failed - Unknown ???
sshd@0-10.0.2.15:22-10.0.2.2:51331.service -> Succeeded and Exited - Vagrant Startup Session
sshd@0-10.0.2.15:22-10.0.2.2:51446.service -> Still running (current session)

51317 happened first, then 51331, then 51446. Now I check `sudo journalctl -b -r` (reversed: most recent at the top, oldest at the bottom), and this is what I see:

...LOTS OF LOGS HERE...
**Jul 15 02:22:12 matrix-node systemd[1]: Unit sshd@0-10.0.2.15:22-10.0.2.2:51317.service entered failed state.**
Jul 15 02:22:12 matrix-node systemd[1]: sshd@0-10.0.2.15:22-10.0.2.2:51317.service: main process exited, code=exited, status=1/FAILURE
Jul 15 02:20:26 matrix-node systemd[1]: Started User Manager for UID 1000.
Jul 15 02:20:26 matrix-node systemd[2718]: Startup finished in 20ms.
...LOTS OF LOGS HERE...
Jul 15 02:20:26 matrix-node sshd[2713]: Accepted publickey for operator from 10.0.2.2 port 51336 ssh2: RSA dd:3b:b8:2e:85:04:06:e9:ab:ff:a8:0a:c0:04:6e:d6
**Jul 15 02:20:26 matrix-node systemd[1]: Started SSH Daemon (10.0.2.2:51336).**
Jul 15 02:20:26 matrix-node systemd[1]: Starting SSH Daemon (10.0.2.2:51336)...
...LOTS OF LOGS HERE...
Jul 15 02:20:15 matrix-node sshd[1757]: Accepted publickey for operator from 10.0.2.2 port 51331 ssh2: RSA dd:3b:b8:2e:85:04:06:e9:ab:ff:a8:0a:c0:04:6e:d6
**Jul 15 02:20:15 matrix-node systemd[1]: Started SSH Daemon (10.0.2.2:51331).**
Jul 15 02:20:15 matrix-node systemd[1]: Starting SSH Daemon (10.0.2.2:51331)...
...LOTS OF LOGS HERE...
**Jul 15 02:20:12 matrix-node systemd[1]: Started SSH Daemon (10.0.2.2:51317).**
Jul 15 02:20:12 matrix-node systemd[1]: Starting SSH Daemon (10.0.2.2:51317)...
Jul 15 02:20:12 matrix-node systemd[1]: Created slice system-sshd.slice.
Jul 15 02:20:12 matrix-node systemd[1]: Starting system-sshd.slice.

Take a look at the lines I marked with ** **. They show that 51317 started first and then failed two minutes later. This is the unknown SSH instance; I don't know why it started, since nothing ever tried to log in. The other two sshd service instances started and succeeded: one was Vagrant's, the other my current session.

So what is 51317? Why does it start when nothing is trying to log in, then subsequently fail and leave my VM in a degraded state? Does this have something to do with the socket activation?

@wmertens
Contributor

What happens when you just open and close the port with telnet?

@CMCDragonkai
Member Author

Interesting: when I telnet in and then run `systemctl status`, it now shows two failed ssh services.

$ telnet 127.0.0.1 2222
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
SSH-2.0-OpenSSH_6.6
dfgfgdfg
Protocol mismatch.
Connection closed by foreign host.
$ sudo systemctl status
● matrix-node
    State: degraded
     Jobs: 0 queued
   Failed: 2 units
    Since: Thu 2014-07-17 01:59:26 UTC; 24min ago
$ sudo systemctl
● sshd@0-10.0.2.15:22-10.0.2.2:50367.service                        loaded failed failed    SSH Daemon (10.0.2.2:50367)
  sshd@2-10.0.2.15:22-10.0.2.2:50423.service                        loaded active running   SSH Daemon (10.0.2.2:50423)
● sshd@3-10.0.2.15:22-10.0.2.2:51014.service                        loaded failed failed    SSH Daemon (10.0.2.2:51014)

I'm thinking this means:

  1. Something is connecting to the VM's SSH port as it boots.
  2. Why would sshd show up as a service failure if it's the client's fault for connecting with a mismatched protocol?

So is there a way to change this so that a client's failed connection doesn't count as an sshd failure?

@CMCDragonkai
Member Author

I spoke to someone on the systemd mailing list, they said:

sshd chooses to exit with a failure code in this case. What we should really
do is add SuccessExitStatus= setting to the sshd@.service and ignore that
code. Is it some specific value?

Then:

That looks like -1, but whatever. Does it work if 
you add SuccessExitStatus=255?

How would I change the SuccessExitStatus for the sshd@.service in Nix? I can't change it manually; is there some declarative Nix expression for this?
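For reference, a possible sketch (untested on this setup): NixOS generally lets you override generated units declaratively through the `systemd.services` option tree. Assuming the module names the template unit `sshd@`, something like this in configuration.nix should apply the setting the mailing list suggested:

```nix
# Sketch, assuming the generated template unit is called "sshd@":
# treat exit status 255 (what sshd -i returns when a client connection
# fails) as success, so the instance unit doesn't end up "failed".
{
  systemd.services."sshd@".serviceConfig.SuccessExitStatus = "255";
}
```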

@Profpatsch
Member

(triage) @CMCDragonkai can you re-check with a recent system?

@Profpatsch
Member

No response, so close @zimbatm

peterhoeg pushed a commit to peterhoeg/nixpkgs that referenced this issue Jun 25, 2016
sshd will at times fail when exiting. When socket activated, this will
leave a number of sshd@ service instances in the failed state, so we
simply ignore the error code if we are running socket activated.

Recommended by upstream:
http://systemd-devel.freedesktop.narkive.com/d0eapMCG/socket-activated-sshd-service-showing-up-as-a-failure-when-the-client-connection-fails

Fixes: NixOS#3279
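A module-side version of that idea (a hypothetical sketch, not the actual nixpkgs code; `cfg` and `lib` are the usual module arguments) would mark 255 as a success status only when socket activation is in use:

```nix
# Hypothetical sketch of the change the commit message describes:
# only ignore sshd's exit code when running socket activated.
serviceConfig = lib.optionalAttrs cfg.startWhenNeeded {
  SuccessExitStatus = "255";
};
```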
@CMCDragonkai
Member Author

@Profpatsch Apologies for the late reply, but it appears this has not been fixed yet. However, @peterhoeg has been working on it; the latest commit I can see is peterhoeg@83b9df2

@CMCDragonkai
Member Author

@peterhoeg Can you send a PR for your fix, so this issue can be truly closed?

peterhoeg pushed a commit to peterhoeg/nixpkgs that referenced this issue Aug 4, 2016
@peterhoeg
Member

By the way, I am REALLY sorry about spamming this issue. Note to self - only reference issues in the actual PR and not the commit message.

@globin globin reopened this Aug 5, 2016
peterhoeg pushed a commit to peterhoeg/nixpkgs that referenced this issue Aug 14, 2016