Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requests queued during transition from offline to online mode #3397

Closed
sssd-bot opened this issue May 2, 2020 · 0 comments
Closed

Requests queued during transition from offline to online mode #3397

sssd-bot opened this issue May 2, 2020 · 0 comments
Assignees
Labels
Bugzilla Closed: Fixed Issue was closed as fixed.

Comments

@sssd-bot
Copy link

sssd-bot commented May 2, 2020

Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/2355


Using SSSD with "cache_credentials = true", users may experience periodic blocking for up to 6 seconds if SSSD is switching from offline to online mode and the LDAP server is unreachable.

The first request to SSSD after it has been offline for more than 60 seconds is immediately answered from the cache, but then triggers a reconnection trial to the LDAP server.

All subsequent requests reaching SSSD during the connection phase are queued and answered once the connection succeeds or fails. In case the LDAP server is unreachable, SSSD waits for 6 seconds before the connection trial is aborted. This means that the user may experience a delay of up to 6 seconds every 60 seconds (in the worst case).

See the following debug logs where the LDAP server is not responding, starting off in offline mode:

The first request to SSSD (which is triggering the reconnection trial) gets answered right away in offline mode:

(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info]
(0x0100): Got request for [4097][1][idnumber=10011]
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info]
(0x0100): Request processed. Returned 1,11,Fast reply - offline
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [sdap_id_op_connect_step]
(0x4000): beginning to connect
...
SSSD is now trying to reconnect to the LDAP server.

Only the subsequent requests that are received while SSSD is trying to (re-)connect to the LDAP server are queued until the connection times out (for at most 6 seconds). These pending requests are causing the system to block:

...

(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info]
(0x0100): Got request for [4097][1][name=brauchle]
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [sdap_id_op_connect_step]
(0x4000): waiting for connection to complete
...
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info]
(0x0100): Got request for [4097][1][idnumber=10011]
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [sdap_id_op_connect_step]
(0x4000): waiting for connection to complete
...

--> this is the time where the system may be unresponsive for 6 seconds <--

...

(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done]
(0x0020): Failed to connect, going offline (5 [Input/output error])
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [be_mark_offline]
(0x2000): Going offline!
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [be_run_offline_cb]
(0x0080): Going offline. Running callbacks.
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done]
(0x4000): notify offline to op #1
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done]
(0x4000): notify offline to op #2
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [acctinfo_callback]
(0x0100): Request processed. Returned 1,11,Offline
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done]
(0x4000): notify offline to op #3
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [acctinfo_callback]
(0x0100): Request processed. Returned 1,11,Offline


After the connection times out, the queued request are answered with cached entries.

So why not keep the "offline" flag set to "true" until the LDAP connection trial returns (positive or negative) and only if positive, switch to online mode?

As the first request (triggering the reconnection) is answered from the cache anyway, there is no point to keep the subsequent ones pending until the connection is established successfully.

Possibly one needs to consider that start up phase (with cold caches) as a special case and actually do queue incoming request in this case?

Comments


Comment from endzone at 2014-06-06 14:19:32

Repost of the log files in readable format:

The *first* request to SSSD, answered from cache. Triggers reconnect afterwards:
>> (Tue Jun  3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info]
>> (0x0100): Got request for [4097][1][idnumber=10011]
>> (Tue Jun  3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info]
>> (0x0100): Request processed. Returned 1,11,Fast reply - offline
>> (Tue Jun  3 08:16:42 2014) [sssd[be[default]]] [sdap_id_op_connect_step]
>> (0x4000): beginning to connect
...
SSSD is now trying to reconnect to the LDAP server.

*Subsequent* requests are queued until the connection times out.
These pending requests are causing the system to block:

...
>> (Tue Jun  3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info]
>> (0x0100): Got request for [4097][1][name=brauchle]
>> (Tue Jun  3 08:16:42 2014) [sssd[be[default]]] [sdap_id_op_connect_step]
>> (0x4000): waiting for connection to complete
...
>> (Tue Jun  3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info]
>> (0x0100): Got request for [4097][1][idnumber=10011]
>> (Tue Jun  3 08:16:42 2014) [sssd[be[default]]] [sdap_id_op_connect_step]
>> (0x4000): waiting for connection to complete
...

--> this is the time where the system may be unresponsive for 6 seconds <--

...
>> (Tue Jun  3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done]
>> (0x0020): Failed to connect, going offline (5 [Input/output error])
>> (Tue Jun  3 08:16:48 2014) [sssd[be[default]]] [be_mark_offline]
>> (0x2000): Going offline!
>> (Tue Jun  3 08:16:48 2014) [sssd[be[default]]] [be_run_offline_cb]
>> (0x0080): Going offline. Running callbacks.
>> (Tue Jun  3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done]
>> (0x4000): notify offline to op #1
>> (Tue Jun  3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done]
>> (0x4000): notify offline to op #2
>> (Tue Jun  3 08:16:48 2014) [sssd[be[default]]] [acctinfo_callback]
>> (0x0100): Request processed. Returned 1,11,Offline
>> (Tue Jun  3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done]
>> (0x4000): notify offline to op #3
>> (Tue Jun  3 08:16:48 2014) [sssd[be[default]]] [acctinfo_callback]
>> (0x0100): Request processed. Returned 1,11,Offline

Comment from sbose at 2014-06-12 17:04:28

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.11.7


Comment from jhrozek at 2014-06-17 11:28:32

Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1110226

rhbz: => [https://bugzilla.redhat.com/show_bug.cgi?id=1110226 1110226]


Comment from mzidek at 2014-06-18 14:04:46

Fields changed

owner: somebody => mzidek


Comment from mzidek at 2014-07-03 14:29:44

Fields changed

patch: 0 => 1


Comment from jhrozek at 2014-07-31 11:55:54


Comment from jhrozek at 2014-07-31 13:54:48

Pushed to sssd-1-11:
- f65efda
- e552a21
- e9ca61c

_comment0: Pushed to sssd-1-11:
f65efda
e552a21
e9ca61c
=> 1406807721234389
resolution: => fixed
status: new => closed


Comment from endzone at 2017-02-24 14:32:20

Metadata Update from @endzone:

  • Issue assigned to mzidek
  • Issue set to the milestone: SSSD 1.11.7
@sssd-bot sssd-bot added Bugzilla Closed: Fixed Issue was closed as fixed. labels May 2, 2020
@sssd-bot sssd-bot closed this as completed May 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bugzilla Closed: Fixed Issue was closed as fixed.
Projects
None yet
Development

No branches or pull requests

2 participants