cockpit-ssh leaks sss_ssh_knownhostsproxy and memory on SIGTERM #18310

Closed
mlvgn opened this issue Feb 8, 2023 · 7 comments · Fixed by #18572 or #18632

mlvgn commented Feb 8, 2023

Explain what happens

  1. Open Cockpit on host A, which runs cockpit-ws;
  2. Add a new host B to Cockpit;
  3. Select host B in Cockpit;
  4. Open the web terminal and do something in it (for example, I edit configs);
  5. Close the browser tab.

A few days later I get a notification that host B is running out of memory. When I investigate, I find a cockpit-bridge process still running and using more than 10 GB of memory.

Version of Cockpit

283

Where is the problem in Cockpit?

Unknown or not applicable

Server operating system

Fedora

Server operating system version

36

What browsers are you using?

Chrome

System log

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 358874 username  20   0   13.1g  10.5g   6032 S   0.0  33.5  44:20.13 cockpit-bridge
3519570 username  20   0   17.8g   7.8g   5920 S   0.0  25.0  60:21.99 cockpit-bridge

Forwarded bug

https://bugzilla.redhat.com/show_bug.cgi?id=2185785

mlvgn added the bug label Feb 8, 2023

mlvgn commented Feb 8, 2023

I have several affected machines, and all of them are in the FreeIPA domain. If I close the browser, the cockpit-bridge process keeps running and its memory use keeps growing.

List of installed packages:

apps                 Applications                             /usr/share/cockpit/apps
base1                                                         /usr/share/cockpit/base1
metrics                                                       /usr/share/cockpit/metrics
network              Networking                               /usr/share/cockpit/networkmanager
performance                                                   /usr/share/cockpit/tuned
shell                                                         /usr/share/cockpit/shell
ssh                                                           /usr/share/cockpit/ssh
storage              Storage                                  /usr/share/cockpit/storaged
system               Overview, Logs, Services, Terminal       /usr/share/cockpit/systemd
updates              Software updates                         /usr/share/cockpit/packagekit
users                Accounts                                 /usr/share/cockpit/users


mlvgn commented Feb 8, 2023

I found that the services are hanging because there is an sss_ssh_knownhostsproxy connection from host A to host B:

tcp   ESTAB      0       0        172.16.54.5:43908 172.16.54.14:22    users:(("sss_ssh_knownho",pid=859102,fd=4))                                          
tcp   ESTAB      0       0        172.16.54.5:42582 172.16.54.14:22    users:(("sss_ssh_knownho",pid=857434,fd=4))                                          
tcp   ESTAB      462408  0        172.16.54.5:36800 172.16.54.14:22    users:(("sss_ssh_knownho",pid=118452,fd=4))                                          
tcp   ESTAB      0       0        172.16.54.5:48566 172.16.54.14:22    users:(("sss_ssh_knownho",pid=861084,fd=4))                                          
tcp   ESTAB      697560  0        172.16.54.5:46662 172.16.54.14:22    users:(("sss_ssh_knownho",pid=858115,fd=4))                                          
tcp   ESTAB      0       0        172.16.54.5:59216 172.16.54.14:22    users:(("sss_ssh_knownho",pid=860563,fd=4))                                          
tcp   ESTAB      912904  0        172.16.54.5:39610 172.16.54.14:22    users:(("sss_ssh_knownho",pid=63809,fd=4))                                           
tcp   ESTAB      892240  0        172.16.54.5:60632 172.16.54.14:22    users:(("sss_ssh_knownho",pid=857956,fd=4))                                          
tcp   ESTAB      0       0        172.16.54.5:33434 172.16.54.14:22    users:(("sss_ssh_knownho",pid=857239,fd=4))                                          
tcp   ESTAB      0       0        172.16.54.5:41442 172.16.54.14:22    users:(("sss_ssh_knownho",pid=861447,fd=4))                                          
tcp   ESTAB      0       0        172.16.54.5:55252 172.16.54.14:22    users:(("sss_ssh_knownho",pid=857618,fd=4))                                          
tcp   ESTAB      0       0        172.16.54.5:33222 172.16.54.14:22    users:(("sss_ssh_knownho",pid=844960,fd=4))                                          
tcp   ESTAB      0       0        172.16.54.5:35626 172.16.54.14:22    users:(("sss_ssh_knownho",pid=856936,fd=4))

If I kill the sss_ssh_knownhostsproxy processes on host A, then the cockpit-bridge processes on host B are terminated and the memory is freed.
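
For anyone who wants to check their own host A for the same symptom, here is a minimal sketch that picks the proxy connections out of `ss` output in the format pasted above. The names `ProxyConn`, `parse_proxy_connections` and `current_proxy_connections` are made up for this example, and `ss -tnp` is only a guess at a suitable invocation (the output above includes a leading `tcp` column, which the parser treats as optional). A large third column (Recv-Q) means data from host B is queued that the proxy is not reading.

```python
import re
import subprocess
from typing import List, NamedTuple


class ProxyConn(NamedTuple):
    """One established connection held by sss_ssh_knownhostsproxy (hypothetical type)."""
    pid: int
    recv_q: int
    local: str
    peer: str


# Matches lines such as:
# tcp ESTAB 462408 0 172.16.54.5:36800 172.16.54.14:22 users:(("sss_ssh_knownho",pid=118452,fd=4))
# The leading "tcp" (Netid) column is optional, since ss omits it for some invocations.
LINE_RE = re.compile(
    r'^(?:tcp\s+)?ESTAB\s+(?P<recv_q>\d+)\s+(?P<send_q>\d+)\s+'
    r'(?P<local>\S+)\s+(?P<peer>\S+)\s+'
    r'users:\(\("sss_ssh_knownho",pid=(?P<pid>\d+),'
)


def parse_proxy_connections(ss_output: str) -> List[ProxyConn]:
    """Return every established proxy connection found in ss output."""
    conns = []
    for line in ss_output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            conns.append(ProxyConn(int(m["pid"]), int(m["recv_q"]), m["local"], m["peer"]))
    return conns


def current_proxy_connections() -> List[ProxyConn]:
    """Run ss on this machine (assumed invocation; -p needs root to show other users' PIDs)."""
    out = subprocess.run(["ss", "-tnp"], capture_output=True, text=True, check=True).stdout
    return parse_proxy_connections(out)
```

The PIDs returned are the ones to inspect or kill on host A; killing them is the workaround described above, not a fix.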

martinpitt (Member) commented

Sorry for the late reply! I investigated this using our test/verify/check-system-realms TestIPA.testQualifiedUsers test case, and I confirm that after logging out, the session is stuck:

[root@x0 ~]# loginctl session-status 11
11 - admin@cockpit.lan (1403400000)
           Since: Mon 2023-03-20 08:33:51 UTC; 10min ago
          Leader: 2788
             TTY: web console
          Remote: ::ffff:172.27.0.2
         Service: cockpit; type web; class user
           State: closing
            Unit: session-11.scope
                  └─2940 /usr/bin/sss_ssh_knownhostsproxy -p 22 x0.cockpit.lan

and there's also the bridge process stuck:

admin@c+    2964  0.0  1.1 315112  7812 ?        Ssl  08:33   0:00 cockpit-bridge

I let this sit for over an hour, and it did not use a single extra byte of memory, though -- so I can't reproduce the memory leak, but the process leak is an issue. I'll look into that, thanks!
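
A rough way to automate the check above is to parse the `loginctl session-status` text and flag a session that is `closing` but still has processes in its scope. This is only a sketch against the output format pasted in this comment; `summarize_session_status` and `is_stuck` are hypothetical helpers, not part of Cockpit or its test suite.

```python
import re
from typing import Dict, List, Tuple

# Process lines in the scope's tree look like:
#   └─2940 /usr/bin/sss_ssh_knownhostsproxy -p 22 x0.cockpit.lan
PROC_RE = re.compile(r"^[└├]─(\d+)\s+(.*)$")


def summarize_session_status(status_text: str) -> Dict[str, object]:
    """Extract the session state and the leftover processes from session-status output."""
    state = None
    processes: List[Tuple[int, str]] = []
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("State:"):
            state = stripped.split(":", 1)[1].strip()
            continue
        m = PROC_RE.match(stripped)
        if m:
            processes.append((int(m.group(1)), m.group(2)))
    return {"state": state, "processes": processes}


def is_stuck(status_text: str) -> bool:
    """A session that is closing but still has processes has not been cleaned up."""
    summary = summarize_session_status(status_text)
    return summary["state"] == "closing" and bool(summary["processes"])
```

Note that a session is briefly `closing` during any normal logout, so a real check would have to wait and retry before calling it stuck.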

martinpitt self-assigned this Mar 20, 2023
martinpitt changed the title from "cockpit-bridge memory leak" to "cockpit-bridge leaks sss_ssh_knownhostsproxy and memory" Mar 20, 2023

martinpitt (Member) commented

Debugging notes:

I'm a bit confused by this. cockpit-ssh does call sss_ssh_knownhostsproxy, but (1) with different arguments, and (2) that process goes away immediately. It turns out that we don't even have to do this: something else, whatever launches this instance of knownhostsproxy, already does that for us. I am confirming this in PR #18572. Curiously, when cockpit-ssh stops calling the proxy, that also seems to stop this leak. Perhaps the two invocations step on each other's toes and cause a deadlock?

I'm not sure what that "something" could be. The most obvious candidate is libssh, but I don't see any reference to sssd or knownhostsproxy there. I also tried to call cockpit-ssh directly, and that does not launch this process.

martinpitt added a commit to martinpitt/cockpit that referenced this issue Mar 28, 2023
Something else seems to already call this (in a slightly different way),
so this is redundant. Moreover, the two invocations stepped on each
other's feet, leaving a stuck `sss_ssh_knownhostsproxy` around which
keeps accreting memory and blocks the user session from going away.
Validate the session cleanup in `TestIPA.testQualifiedUsers`.

Fixes cockpit-project#18310
jelly pushed a commit that referenced this issue Mar 29, 2023
Something else seems to already call this (in a slightly different way),
so this is redundant. Moreover, the two invocations stepped on each
other's feet, leaving a stuck `sss_ssh_knownhostsproxy` around which
keeps accreting memory and blocks the user session from going away.
Validate the session cleanup in `TestIPA.testQualifiedUsers`.

Fixes #18310
martinpitt (Member) commented

For closure, that's what the above "something" is:

/etc/ssh/ssh_config.d/04-ipa.conf:	ProxyCommand /usr/bin/sss_ssh_knownhostsproxy -p %p %h

That also explains why it's running all the time, instead of just once. This is actually fairly awkward (but not something which we can influence).
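
To see whether a given machine has a similar drop-in, a small sketch like the following lists every `ProxyCommand` directive under `/etc/ssh/ssh_config.d/`. The function names are invented for this example, and the parsing is deliberately naive: it ignores `Host`/`Match` scoping and `Include` directives, so it only shows where a `ProxyCommand` is defined, not which one ssh would pick for a particular host.

```python
import glob
import re
from typing import Dict, List

# ssh_config keywords are case-insensitive and may be separated from
# their value by whitespace or "=". Commented-out lines do not match.
PROXY_RE = re.compile(r"^\s*ProxyCommand[\s=]+(?P<cmd>.+?)\s*$", re.IGNORECASE)


def proxy_commands_in(text: str) -> List[str]:
    """Return the ProxyCommand values found in ssh_config-style text."""
    commands = []
    for line in text.splitlines():
        m = PROXY_RE.match(line)
        if m:
            commands.append(m["cmd"])
    return commands


def scan_ssh_config_dropins(pattern: str = "/etc/ssh/ssh_config.d/*.conf") -> Dict[str, List[str]]:
    """Map each drop-in file that sets a ProxyCommand to the commands it sets."""
    found = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as f:
            commands = proxy_commands_in(f.read())
        if commands:
            found[path] = commands
    return found
```

On an IPA-enrolled machine like the one above, `scan_ssh_config_dropins()` would be expected to report the `04-ipa.conf` entry.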

However, even after #18572 the new test still fails fairly often. I can reproduce this sometimes, and indeed it's still the proxy process:

           Since: Tue 2023-04-11 05:22:06 UTC; 1min 36s ago
          Leader: 2935
             TTY: web console
          Remote: ::ffff:172.27.0.2
         Service: cockpit; type web; class user
           State: closing
            Unit: session-11.scope
                  └─3025 /usr/bin/sss_ssh_knownhostsproxy -p 22 x0.cockpit.lan

strace says restart_syscall(<... resuming interrupted read ...>, but it's not clear what it is trying to read from.

I've seen it once with (printf '\n\n\n\n\n\n'; sleep 2) | /usr/libexec/cockpit-ssh x0.cockpit.lan, but unfortunately I cannot reproduce it reliably. This needs more debugging, and possibly a Bugzilla report and a naughty.

martinpitt reopened this Apr 11, 2023

martinpitt (Member) commented

I reported this to https://bugzilla.redhat.com/show_bug.cgi?id=2185785 -- let's continue to debug it there.

martinpitt changed the title from "cockpit-bridge leaks sss_ssh_knownhostsproxy and memory" to "cockpit-ssh leaks sss_ssh_knownhostsproxy and memory on SIGTERM" Apr 13, 2023

martinpitt (Member) commented

I did some further analysis on https://bugzilla.redhat.com/show_bug.cgi?id=2185785 and I think it's cockpit-ssh's fault after all. cockpit-ssh needs to intercept SIGTERM and properly close the connection to clean up; libssh shouldn't install signal handlers.
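
The general pattern is roughly the following. This is an illustrative Python sketch only, not cockpit-ssh's code (which is C on top of libssh); `ProxySession`, `make_term_handler` and `install_cleanup_handlers` are hypothetical names, and the child process merely stands in for a running `ProxyCommand`.

```python
import signal
import subprocess


class ProxySession:
    """Owns a helper process, standing in for an SSH session with a ProxyCommand."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL)

    def close(self, timeout=5.0):
        """Stop the helper and reap it so it cannot outlive its parent."""
        if self.proc.poll() is not None:
            return  # already exited and reaped
        self.proc.terminate()
        try:
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The helper ignored SIGTERM; force it.
            self.proc.kill()
            self.proc.wait()


def make_term_handler(session):
    """Build a signal handler that cleans up the session before exiting."""
    def handler(signum, frame):
        session.close()
        # Conventional exit status for "terminated by signal N".
        raise SystemExit(128 + signum)
    return handler


def install_cleanup_handlers(session):
    """Intercept SIGTERM and SIGINT in the application, not in the library."""
    handler = make_term_handler(session)
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, handler)
```

The point of the design is that the application, which owns the process lifetime, decides how to react to signals, while the connection object only has to offer a clean `close()`.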

martinpitt reopened this Apr 13, 2023
martinpitt added a commit to martinpitt/cockpit that referenced this issue Apr 13, 2023
When receiving SIGTERM, SIGINT, or SIGPIPE, give libssh a chance to
clean up the connection. In particular, that will close a running
`ProxyCommand` process.

Fixes cockpit-project#18310
https://bugzilla.redhat.com/show_bug.cgi?id=2185785
martinpitt added a commit to martinpitt/cockpit that referenced this issue Apr 14, 2023
When receiving SIGTERM or SIGINT, give libssh a chance to clean up the
connection. In particular, that will close a running `ProxyCommand`
process when logging out of Cockpit.

Fixes cockpit-project#18310
https://bugzilla.redhat.com/show_bug.cgi?id=2185785
allisonkarlitskaya pushed a commit that referenced this issue Apr 17, 2023
When receiving SIGTERM or SIGINT, give libssh a chance to clean up the
connection. In particular, that will close a running `ProxyCommand`
process when logging out of Cockpit.

Fixes #18310
https://bugzilla.redhat.com/show_bug.cgi?id=2185785