SSSD-kcm/secrets failed to restart during/after upgrade #4555

Closed
sssd-bot opened this issue May 2, 2020 · 0 comments
Labels
Closed: Fixed (Issue was closed as fixed.)

Comments

sssd-bot commented May 2, 2020

Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/3529

  • Created at 2017-10-02 18:08:42 by benzea
  • Closed at 2017-11-03 15:14:23 as Fixed
  • Assigned to lslebodn

I did a dnf update today and was left without a working Kerberos ticket cache. It appears that the service failed to restart properly during the update, which left it unusable afterwards. A simple systemctl restart sssd-kcm.socket fixed the issue.

Attaching both the system log and the DNF output.

Comments


Comment from benzea at 2017-10-02 18:09:31

system.log

dnf_history_info_last


Comment from jhrozek at 2017-10-02 18:46:30

Can you reproduce the issue? In the system log, I can only see that KCM had some issues, as you say, and that syslog reported the socket was already there.

Could you reproduce the issue again, this time adding:

[kcm]
debug_level=10
debug_microseconds=true

[secrets]
debug_level=10
debug_microseconds=true

to sssd.conf and restarting the sssd service?
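
For anyone who prefers to script this change, here is a minimal Python sketch that adds the two debug options quoted above to the [kcm] and [secrets] sections. It works on a configuration string rather than on the real file, the function name add_debug_options and the sample input are hypothetical, and configparser drops comments when rewriting, so treat it as an illustration rather than a recommended way to edit sssd.conf.

```python
import configparser
import io

# Debug options quoted from the comment above.
DEBUG_OPTIONS = {"debug_level": "10", "debug_microseconds": "true"}


def add_debug_options(conf_text, sections=("kcm", "secrets")):
    """Return conf_text with the debug options set in each named section.

    Hypothetical helper for illustration; it is not part of SSSD.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    for name in sections:
        if not parser.has_section(name):
            parser.add_section(name)
        for key, value in DEBUG_OPTIONS.items():
            parser.set(name, key, value)
    out = io.StringIO()
    # Write "key=value" without spaces, matching the snippet in the comment.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up minimal configuration, used only as sample input.
    print(add_debug_options("[sssd]\nservices=nss, pam\n"))
```

The sssd service still has to be restarted afterwards, as requested above.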


Comment from benzea at 2017-10-02 19:05:27

Hm, I don't seem to be able to reproduce this right now. Though maybe the order/timing of restarts in the post install scripts is relevant?


Comment from jhrozek at 2017-10-02 19:40:52

Maybe... What did you upgrade from and to?


Comment from benzea at 2017-10-04 11:42:01

Hm, the attachment is being slightly mishandled, but I think it should be there.

https://pagure.io/SSSD/sssd/issue/raw/files/9a3c92c3286adc4d1040594531a0fe2440afd6178ace5366561a7b6c21a71df4-dnf_history_info_last

Not sure if this might be related; I just unsuspended the machine, and got stuck with a non-working SSSD-KCM. Though it looked a bit like sssd-secrets was stuck on something and sssd-kcm just refused to work (or even stop) at that point.

Attaching an strace of sssd-kcm; sorry, I don't have anything else at this point. It keeps trying to do a 'sendto(14, "GET /kcm/persistent/1000/ccache/"..., 115, MSG_NOSIGNAL, NULL, 0) = 115'. Running "klist -A" resulted in "klist: Internal credentials cache error while listing ccache collection" (also attaching an strace of that).

I have now enabled the debugging features, so let's hope that something more useful comes out of that.

klist-A-failure-after-suspend-for-more-than-a-day

strace-sssd-kcm-hanging


Comment from lslebodn at 2017-10-04 12:54:51

Seems to be the same bug as in fedora ticket https://bugzilla.redhat.com/show_bug.cgi?id=1494843#c4

Debug log files will be more useful than strace output:
https://bugzilla.redhat.com/show_bug.cgi?id=1494843#c12


Comment from lslebodn at 2017-10-06 18:25:24

@benzea Do you use GNOME Online Accounts + Kerberos? Or can you reproduce it with plain kinit?


Comment from benzea at 2017-10-06 18:28:05

GNOME Online Accounts is running, of course, but I have always added my Kerberos identities manually by running kinit.


Comment from benzea at 2017-11-03 11:17:07

So, I just ran into it again, and after a short chat with Patrick Uiterwijk it looks like the race condition is simply that systemd has not yet opened the socket when the sssd service is being activated, i.e. what happens is:

  • sssd-*.service is started
  • sssd-*.socket is also triggered (but the socket is not bound yet)
  • daemon comes up
  • daemon binds to the socket as systemd has not done so yet
  • systemd fails to bind the socket and the sssd-*.socket units fail to start up
  • systemd stops sssd-*.service as the socket failed
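
The core of the sequence above is that two parties try to bind the same Unix socket path. A minimal Python sketch of just that step follows; the two sockets merely stand in for the daemon and for systemd, the path is a temporary one, and the sketch does not model how systemd itself handles an already existing socket file, which may differ. On Linux, the second plain bind() on the same path is expected to fail with EADDRINUSE.

```python
import errno
import os
import socket
import tempfile


def second_bind_errno(path):
    """Bind `path` twice and return the errno of the second attempt.

    Returns None if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)   # stands in for the daemon
    second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)  # stands in for systemd
    try:
        first.bind(path)  # the party that gets there first wins
        try:
            second.bind(path)  # the late party finds the path taken
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()
        if os.path.exists(path):
            os.unlink(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        err = second_bind_errno(os.path.join(tmp, "demo.socket"))
        print(errno.errorcode.get(err, err))
```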

There are different possible fixes for this:

  • add proper Before=/After= lines
  • prevent the service from ever trying to bind to the socket if running under systemd
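
To make the second option more concrete, here is a rough Python sketch of a daemon that adopts a socket passed in by systemd and otherwise refuses to bind unless explicitly allowed. It relies on the documented socket-activation protocol (the LISTEN_PID and LISTEN_FDS environment variables, with passed descriptors starting at 3). The function names and the allow_bind parameter are hypothetical, and this is not the change that was actually made in SSSD, which is written in C and whose fix is not shown in this thread.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first passed descriptor in the socket-activation protocol


def inherited_fds(environ, pid):
    """Return the descriptors systemd passed to process `pid`, or []."""
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # variables were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


def get_listen_socket(path, allow_bind=False, environ=os.environ):
    """Adopt a systemd-provided socket; bind `path` only if allow_bind is set.

    allow_bind is a hypothetical switch: a unit started by systemd would
    leave it off, so the daemon never competes with systemd for the path.
    """
    fds = inherited_fds(environ, os.getpid())
    if fds:
        # Already bound and listening; just wrap the descriptor.
        return socket.socket(fileno=fds[0])
    if not allow_bind:
        raise RuntimeError("no socket passed by systemd and binding is disabled")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen()
    return sock


if __name__ == "__main__":
    print(inherited_fds(os.environ, os.getpid()))
```

With binding disabled, a service started before its socket unit fails early instead of taking the socket path away from systemd.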

Comment from lslebodn at 2017-11-03 11:35:34

On (03/11/17 10:17), Benjamin Berg wrote:

So, I just ran into it again, and after a short chat with Patrick Uiterwijk it looks like the race condition is simply that systemd has not yet opened the socket when the sssd service is being activated, i.e. what happens is:

  • sssd-*.service is started
  • sssd-*.socket is also triggered (but the socket is not bound yet)
  • daemon comes up
  • daemon binds to the socket as systemd has not done so yet
  • systemd fails to bind the socket and the sssd-*.socket units fail to start up

There are different possible fixes for this:

  • add proper Before=/After= lines
  • prevent the service from ever trying to bind to the socket if running under systemd

I checked a few other socket-activated services.

Most of them use "Requires=$name.socket" instead of Before=/After=, and some of them use Wants= plus After=.

I will check with the systemd guys.

LS


Comment from lslebodn at 2017-11-03 11:48:08

#437


Comment from lslebodn at 2017-11-03 11:48:49

Metadata Update from @lslebodn:

  • Issue tagged with: PR

Comment from lslebodn at 2017-11-03 15:13:05

master:


Comment from lslebodn at 2017-11-03 15:14:03

Metadata Update from @lslebodn:

  • Custom field version adjusted to 1.15.3

Comment from lslebodn at 2017-11-03 15:14:25

Metadata Update from @lslebodn:

  • Issue close_status updated to: Fixed
  • Issue set to the milestone: SSSD 1.16.1
  • Issue status updated to: Closed (was: Open)

Comment from lslebodn at 2017-11-03 15:14:35

Metadata Update from @lslebodn:

  • Issue assigned to lslebodn

sssd-bot added the "Closed: Fixed" label on May 2, 2020
sssd-bot closed this as completed on May 2, 2020