So, I just ran into it, and after a short chat with Patrick Uiterwijk, it looks like the race condition is simply that systemd has not yet opened the socket when the sssd service becomes active, i.e. what happens is:
1. sssd-*.service is started
2. sssd-*.socket is also triggered (but the socket is not bound yet)
3. The daemon comes up
4. The daemon binds to the socket itself, because systemd has not done so yet
5. systemd fails to bind the socket, and the sssd-*.socket units fail to start
6. systemd stops sssd-*.service because the socket failed
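Steps 4 and 5 come down to two parties binding the same AF_UNIX path, and whoever binds first wins. The minimal Python sketch below assumes Linux and uses a throwaway temporary path as a stand-in for the real KCM socket path. It shows that the second bind() fails with EADDRINUSE, which is the error the socket unit hits once the daemon has already bound the path.

```python
import errno
import os
import socket
import tempfile


def second_bind_fails(path: str) -> bool:
    """Bind `path` twice; return True if the second bind fails with EADDRINUSE."""
    daemon = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    unit = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        daemon.bind(path)  # step 4: the daemon binds the path first
        try:
            unit.bind(path)  # step 5: the socket unit tries afterwards
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        daemon.close()
        unit.close()
        if os.path.exists(path):
            os.unlink(path)  # AF_UNIX paths persist after close(), so clean up


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kcm.socket")  # hypothetical stand-in path
        print("second bind failed with EADDRINUSE:", second_bind_fails(path))
```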
There are different possible fixes for this:
- add proper Before=/After= lines
- prevent the service from ever trying to bind to the socket when running under systemd
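The second fix relies on the systemd socket-activation protocol. systemd sets LISTEN_PID to the service's PID and LISTEN_FDS to the number of sockets it passed, and the first passed descriptor is fd 3. The Python sketch below is not SSSD's code, and the helper names are hypothetical. It adopts the inherited socket when those variables are present for this process and binds the path itself only otherwise. In C, libsystemd's sd_listen_fds() performs the same check.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd passed by systemd (as in sd-daemon.h)


def listen_fds(environ=None, pid=None) -> int:
    """Return how many sockets systemd passed to this process (0 if none)."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables only apply if they were meant for this exact process.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return 0
        return max(int(environ.get("LISTEN_FDS", "0")), 0)
    except ValueError:
        return 0


def get_listening_socket(path: str) -> socket.socket:
    """Use the systemd-provided socket if there is one; bind `path` only otherwise."""
    if listen_fds() >= 1:
        # Socket-activated: never bind ourselves, adopt fd 3 instead.
        return socket.socket(fileno=SD_LISTEN_FDS_START)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen()
    return sock
```

With this check in place, a daemon started by systemd never races the socket unit for the path.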
Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/3529
I did a dnf update today and was left without a working Kerberos ticket cache. It appears there were some issues restarting the service, which left it unusable afterwards. A simple
systemctl restart sssd-kcm.socket
fixed the issue, obviously. Attaching both the system log and the DNF output.
Comments
Comment from benzea at 2017-10-02 18:09:31
Comment from jhrozek at 2017-10-02 18:46:30
Can you reproduce the issue? In the system log, I can only see that KCM had some issues, as you say, and that syslog reported the socket was already there.
Could you reproduce the issue again, this time adding:
to sssd.conf and restarting the sssd service?
Comment from benzea at 2017-10-02 19:05:27
Hm, I can't seem to reproduce this right now. Maybe the order or timing of the restarts in the post-install scripts is relevant?
Comment from jhrozek at 2017-10-02 19:40:52
Maybe... What did you upgrade from and to?
Comment from benzea at 2017-10-04 11:42:01
Hm, the attachment is being slightly mishandled, but I think it should be there.
https://pagure.io/SSSD/sssd/issue/raw/files/9a3c92c3286adc4d1040594531a0fe2440afd6178ace5366561a7b6c21a71df4-dnf_history_info_last
Not sure if this is related: I just resumed the machine from suspend and was left with a non-working SSSD-KCM. It looked a bit like sssd-secrets was stuck on something, and sssd-kcm simply refused to work (or even stop) at that point.
Attaching an strace of sssd-kcm; sorry, I don't have anything else at this point. It keeps repeating 'sendto(14, "GET /kcm/persistent/1000/ccache/"..., 115, MSG_NOSIGNAL, NULL, 0) = 115'. Running "klist -A" resulted in "klist: Internal credentials cache error while listing ccache collection" (also attaching that strace).
I have now enabled the debugging features, so let's hope something more useful comes out of that.
Comment from lslebodn at 2017-10-04 12:54:51
This seems to be the same bug as in the Fedora ticket https://bugzilla.redhat.com/show_bug.cgi?id=1494843#c4
Debug log files will be more useful than strace output:
https://bugzilla.redhat.com/show_bug.cgi?id=1494843#c12
Comment from lslebodn at 2017-10-06 18:25:24
@benzea Do you use GNOME Online Accounts + Kerberos? Or can you reproduce it with plain kinit?
Comment from benzea at 2017-10-06 18:28:05
GNOME Online Accounts is obviously running, but I have always added my Kerberos identities by running kinit.
Comment from benzea at 2017-11-03 11:17:07
Comment from lslebodn at 2017-11-03 11:35:34
On (03/11/17 10:17), Benjamin Berg wrote:
I checked a few other socket-activated services, and most of them use "Requires=$name.socket" instead of Before=/After=; some of them use Wants= plus After=.
I will check with the systemd guys.
LS
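One systemd detail is relevant to the Requires= versus Before=/After= question. Requires= and Wants= only pull the socket unit in and do not order it. Only After= guarantees the socket is listening before the service starts, so the two are usually combined. The Python sketch below is a hypothetical helper that renders such a [Unit] drop-in. The drop-in location mentioned in the comment follows the standard `<unit>.d/*.conf` convention, but the file name is an assumption, and this is not necessarily what was merged in #437.

```python
def socket_dependency_dropin(socket_unit: str, strong: bool = True) -> str:
    """Render a [Unit] drop-in tying a service to its socket unit.

    Requires=/Wants= only add a dependency; After= is what orders the
    service after the socket, so systemd binds the socket before the
    daemon starts.
    """
    dep = "Requires" if strong else "Wants"
    return "\n".join([
        "[Unit]",
        f"{dep}={socket_unit}",
        f"After={socket_unit}",
        "",  # trailing newline
    ])


if __name__ == "__main__":
    # Hypothetical target: /etc/systemd/system/sssd-kcm.service.d/socket.conf
    print(socket_dependency_dropin("sssd-kcm.socket"), end="")
```

After placing such a drop-in, `systemctl daemon-reload` is needed for systemd to pick it up.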
Comment from lslebodn at 2017-11-03 11:48:08
#437
Comment from lslebodn at 2017-11-03 11:48:49
Metadata Update from @lslebodn:
Comment from lslebodn at 2017-11-03 15:13:05
master:
Comment from lslebodn at 2017-11-03 15:14:03
Metadata Update from @lslebodn:
Comment from lslebodn at 2017-11-03 15:14:25
Metadata Update from @lslebodn:
Comment from lslebodn at 2017-11-03 15:14:35
Metadata Update from @lslebodn: