With "cache_credentials = true" set, users may experience periodic blocking of up to 6 seconds when SSSD tries to switch from offline to online mode while the LDAP server is unreachable.
The first request to SSSD after it has been offline for more than 60 seconds is answered immediately from the cache, but it then triggers an attempt to reconnect to the LDAP server.
All subsequent requests reaching SSSD during the connection phase are queued and answered only once the connection attempt succeeds or fails. If the LDAP server is unreachable, SSSD waits 6 seconds before aborting the attempt. In the worst case, the user may therefore experience a delay of up to 6 seconds every 60 seconds.
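The following minimal Python model makes the timing concrete. It is not SSSD code. The class `CurrentBackend`, its fields and the single fixed retry interval are hypothetical simplifications, and only the 60 s and 6 s values come from this report. The model assumes the LDAP server is unreachable, so every reconnection attempt fails after the timeout.

```python
from dataclasses import dataclass
from typing import Optional

RETRY_INTERVAL = 60.0   # offline time before a reconnect is tried (from the report)
CONNECT_TIMEOUT = 6.0   # time until an attempt against a dead server fails (from the report)


@dataclass
class CurrentBackend:
    """Hypothetical model of the queueing seen in the logs (server unreachable)."""
    offline_since: float = 0.0
    connect_done_at: Optional[float] = None  # set while an attempt is running

    def request(self, now: float) -> float:
        """Return the latency (in seconds) a client sees for a request at `now`."""
        # An attempt against the unreachable server fails at connect_done_at,
        # after which the backend is offline again.
        if self.connect_done_at is not None and now >= self.connect_done_at:
            self.offline_since = self.connect_done_at
            self.connect_done_at = None

        if self.connect_done_at is not None:
            # "waiting for connection to complete": queued until the attempt ends
            return self.connect_done_at - now

        if now - self.offline_since >= RETRY_INTERVAL:
            # First request after the retry interval: start a reconnection attempt
            self.connect_done_at = now + CONNECT_TIMEOUT
        # "Fast reply - offline": answered from the cache without waiting
        return 0.0


backend = CurrentBackend()
# Two requests at t=60 s (as in the logs), then a few later ones
print([backend.request(t) for t in (60.0, 60.0, 61.5, 66.0, 70.0)])
# [0.0, 6.0, 4.5, 0.0, 0.0]
```

In this model, the first request at t=60 s gets a fast offline reply. The second request, arriving at the same instant, is held for the full 6 seconds, which mirrors the 08:16:42 to 08:16:48 gap in the logs.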
The following debug logs show SSSD starting in offline mode while the LDAP server is not responding.
The first request to SSSD (which triggers the reconnection attempt) is answered immediately in offline mode:
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info] (0x0100): Got request for [4097][1][idnumber=10011]
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info] (0x0100): Request processed. Returned 1,11,Fast reply - offline
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [sdap_id_op_connect_step] (0x4000): beginning to connect
...
SSSD is now trying to reconnect to the LDAP server.
Requests received while SSSD is trying to reconnect to the LDAP server are queued until the connection attempt times out (after at most 6 seconds). These pending requests are what cause the system to block:
...
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info] (0x0100): Got request for [4097][1][name=brauchle]
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [sdap_id_op_connect_step] (0x4000): waiting for connection to complete
...
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [be_get_account_info] (0x0100): Got request for [4097][1][idnumber=10011]
(Tue Jun 3 08:16:42 2014) [sssd[be[default]]] [sdap_id_op_connect_step] (0x4000): waiting for connection to complete
...
--> this is where the system may be unresponsive for up to 6 seconds <--
...
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [be_mark_offline] (0x2000): Going offline!
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #2
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [acctinfo_callback] (0x0100): Request processed. Returned 1,11,Offline
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #3
(Tue Jun 3 08:16:48 2014) [sssd[be[default]]] [acctinfo_callback] (0x0100): Request processed. Returned 1,11,Offline
After the connection attempt times out, the queued requests are answered with cached entries.
So why not keep the "offline" flag set to "true" until the LDAP connection attempt returns (successfully or not), and switch to online mode only if it succeeds?
Because the first request (the one triggering the reconnection) is answered from the cache anyway, there is no point in holding the subsequent ones until the connection is established.
Perhaps the start-up phase (with cold caches) needs to be treated as a special case, in which incoming requests should actually be queued?
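The proposal can be sketched with the same kind of simplified model. This is again a hypothetical illustration, not SSSD's implementation. `ProposedBackend`, the string cache keys and the rule that only cache misses wait for the attempt are assumptions chosen to mirror the suggestion above, including the cold-cache special case. As in the logs, the LDAP server is assumed to be unreachable.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

RETRY_INTERVAL = 60.0   # from the report
CONNECT_TIMEOUT = 6.0   # from the report


@dataclass
class ProposedBackend:
    """Hypothetical model of the proposed policy (server unreachable)."""
    cache: Set[str] = field(default_factory=set)  # keys with cached entries
    offline: bool = True  # stays True until an attempt succeeds
    offline_since: float = 0.0
    connect_done_at: Optional[float] = None

    def request(self, key: str, now: float) -> float:
        """Return the latency (in seconds) for looking up `key` at `now`."""
        if self.connect_done_at is not None and now >= self.connect_done_at:
            # The attempt failed: `offline` is left untouched (still True).
            self.offline_since = self.connect_done_at
            self.connect_done_at = None

        if (self.offline and self.connect_done_at is None
                and now - self.offline_since >= RETRY_INTERVAL):
            # Start a background attempt; do not clear the offline flag yet.
            self.connect_done_at = now + CONNECT_TIMEOUT

        if key in self.cache:
            return 0.0  # still offline, so answer from the cache right away
        if self.connect_done_at is not None:
            # Cold cache: nothing to answer with, so wait for the attempt.
            return self.connect_done_at - now
        return 0.0  # offline, no attempt running: report the miss immediately


backend = ProposedBackend(cache={"name=brauchle", "idnumber=10011"})
calls = [("idnumber=10011", 60.0), ("name=brauchle", 60.0),
         ("idnumber=10011", 61.5), ("name=newuser", 62.0),
         ("name=newuser", 67.0)]
print([backend.request(key, t) for key, t in calls])
# [0.0, 0.0, 0.0, 4.0, 0.0]
```

Cached lookups stay instant while the reconnection attempt runs. Only the hypothetical `name=newuser`, which has no cached entry, waits for the attempt to finish. In this sketch the `offline` flag is never cleared because every attempt fails; a successful attempt would be the only place to switch it to False.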
Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/2355
Comments
Comment from endzone at 2014-06-06 14:19:32
Repost of the log files in readable format:
Comment from sbose at 2014-06-12 17:04:28
Fields changed
milestone: NEEDS_TRIAGE => SSSD 1.11.7
Comment from jhrozek at 2014-06-17 11:28:32
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1110226
rhbz: => [https://bugzilla.redhat.com/show_bug.cgi?id=1110226 1110226]
Comment from mzidek at 2014-06-18 14:04:46
Fields changed
owner: somebody => mzidek
Comment from mzidek at 2014-07-03 14:29:44
Fields changed
patch: 0 => 1
Comment from jhrozek at 2014-07-31 11:55:54
Comment from jhrozek at 2014-07-31 13:54:48
Pushed to sssd-1-11:
- f65efda
- e552a21
- e9ca61c
resolution: => fixed
status: new => closed
Comment from endzone at 2017-02-24 14:32:20
Metadata Update from @endzone: