Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/743
Current logic forces the LDAP provider to retry the same server it just had a communication error with.
Comments
Comment from dpal at 2010-12-18 23:00:21
A couple of design ideas:
The context should keep a list of the servers it has communicated with since SSSD started. This list is updated when the connection is lost and SSSD detects that it needs to fail over. The list is effectively a set of key-value pairs, where the key is the host name and the value is the timestamp at which the connection was identified as broken. Currently the failover code seems to iterate through the servers read from the configuration file (I am not certain, since I have not dived into the details). If instead the list above is populated when SSSD starts and carried in the context, the failover code would not need to consult the configuration (and I would argue it should not). Instead, it would look at the list and pick the server whose failure was recorded the longest time ago. If the time elapsed since even that failure is still less than some predefined period, say 60 seconds (configurable), it would not try to fail over but would return offline status right away. The set can be implemented as a hash table. However, a collection might be better, since a pinned iterator would require less processing (sorting, matching, etc.) to determine which server to connect to.
Currently the code assumes that if one server is unavailable, this is a server outage and some other server might be available; however, this is usually not the case nowadays. If a service is unavailable, it is most likely due to network issues around the client, and the client should be assumed to be offline rather than one of the servers. I suggest adding a new parameter to the configuration file, named something like "client_failover_mode", with the following values:
mobile - the client is a mobile device or laptop. By default it assumes that if it cannot connect to a server, it is completely offline. However, the next time it needs to connect, it will try a different server from the list.
server (default) - assume a server outage rather than a client networking problem (the current mode).
vm - I feel that VMs would need a different set of assumptions, so we would probably have to do something different from the cases above. This can be deferred, but the point is that the option should not be a simple Boolean. I can imagine cases where retrying the same server once again would be preferable.
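To make the two ideas above concrete, here is a minimal C sketch of how the per-server failure timestamps, the 60-second retry window, and the proposed "mobile" and "server" modes could interact. It is an illustration under stated assumptions, not SSSD's actual failover code: every name (`struct server_entry`, `struct failover_ctx`, `parse_failover_mode`, `pick_server`, `report_failure`) is hypothetical, the table is a plain array instead of the hash table or collection discussed above, and the "vm" mode is left out because the comment defers it.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical modes mirroring the proposed client_failover_mode option.
 * "vm" is omitted because the proposal defers it. */
enum failover_mode { MODE_SERVER, MODE_MOBILE };

/* One entry per server: host name -> time the connection was last
 * identified as broken (0 means it has never failed). */
struct server_entry {
    const char *host;
    time_t last_failure;
};

/* Hypothetical context carried by the provider instead of re-reading
 * the configuration on every failover. */
struct failover_ctx {
    struct server_entry *servers;
    size_t count;
    double retry_window;       /* seconds, e.g. 60; configurable */
    enum failover_mode mode;
};

/* Parse the option value. A missing value selects the default "server"
 * mode. Returns 0 on success, -1 for an unknown value. */
static int parse_failover_mode(const char *value, enum failover_mode *out)
{
    if (value == NULL || strcmp(value, "server") == 0) {
        *out = MODE_SERVER;
        return 0;
    }
    if (strcmp(value, "mobile") == 0) {
        *out = MODE_MOBILE;
        return 0;
    }
    return -1;
}

/* Pick the server whose last failure is the oldest (never-failed servers
 * win, ties go to list order). Returns its index, or -1 meaning "report
 * offline" if even that failure happened within the retry window. */
static int pick_server(const struct failover_ctx *ctx, time_t now)
{
    int best = -1;

    for (size_t i = 0; i < ctx->count; i++) {
        if (best < 0 ||
            ctx->servers[i].last_failure < ctx->servers[best].last_failure) {
            best = (int)i;
        }
    }
    if (best < 0) {
        return -1;             /* empty list */
    }
    if (ctx->servers[best].last_failure != 0 &&
        difftime(now, ctx->servers[best].last_failure) < ctx->retry_window) {
        return -1;             /* every server failed too recently */
    }
    return best;
}

/* Record a broken connection to servers[idx] (idx must be valid).
 * Server mode: return the next server to try right away.
 * Mobile mode: assume the client is offline (-1); the next request will
 * call pick_server() and land on a different server. */
static int report_failure(struct failover_ctx *ctx, int idx, time_t now)
{
    ctx->servers[idx].last_failure = now;
    if (ctx->mode == MODE_MOBILE) {
        return -1;
    }
    return pick_server(ctx, now);
}
```

A linear scan is adequate for the handful of servers a typical configuration lists; the hash table or pinned-iterator collection suggested in the comment would matter mainly for avoiding the scan, not for the selection rule itself.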
Comment from sbose at 2010-12-20 13:16:49
I think I've found the possible reason a second server was tried; see 'Avoid multiple initializations in LDAP provider' on the sssd-devel list.
Which OpenLDAP version are you using, 2.3.x or 2.4.x?
Comment from sgallagh at 2010-12-21 15:15:24
Fields changed
component: SSSD => Failover
milestone: NEEDS_TRIAGE => SSSD 1.5.1
owner: somebody => sgallagh
Comment from sgallagh at 2011-01-04 15:30:10
Fields changed
owner: sgallagh => sbose
Comment from sgallagh at 2011-01-27 19:15:01
Fields changed
milestone: SSSD 1.5.1 => SSSD 1.5.2
upgrade: => 0
Comment from dpal at 2011-02-03 15:12:32
Fields changed
resolution: => worksforme
status: new => closed
Comment from dpal at 2012-01-19 03:08:14
Fields changed
rhbz: => 0
Comment from dpal at 2017-02-24 14:37:51
Metadata Update from @dpal: