ad: use parallel cldap ping for site discovery #5300
|
@pbrezina I see linking failures: |
... cmocka-based tests aren't compatible with the LTO linker flag. This only affects a few tests. In the Fedora spec file it is worked around quite bluntly: |
|
Hi, thanks for the patches. My tests with various configurations with multiple existing, non-existing or non-responding DCs went well. Coverity didn't have anything to complain about either. While reading the patches I was wondering if it would make sense to create a macro for bye, |
|
Hi, maybe there should be some limit in the for-loop so that SSSD does not accidentally try to ping hundreds of DCs in a larger AD environment. adcli implements a scheme described in https://winprotocoldoc.blob.core.windows.net/productionwindowsarchives/WinArchive/%5bMS-DISO%5d.pdf section 5.4.5.3 where the first 5 DCs are pinged with a 0.4s timeout, then the next 5 are pinged with a 0.2s timeout and finally all others with a timeout of 0.1s. Unfortunately I haven't found any newer document describing how Windows clients do CLDAP pings. bye, |
|
Do you mean there is a 100ms timeout for the DCs to deliver a reply? That's quite low, isn't it? |
All Windows clients use CLDAP (UDP) for LDAP ping. Even though AD also supports LDAP ping over TCP, IPA does not; therefore it is crucial for us to perform the ping over the CLDAP protocol. Resolves: SSSD#5215
If I understand the MSFT document correctly, the client still sends the pings sequentially and the timeout is the time before sending the next ping. I guess the client would still accept replies from other DCs pinged earlier. So the idea is more the other way round. The DCs are split into batches to avoid pinging all DCs at once. The first batch of DCs is given a longer time to reply to increase the chance that a reply is received in that time, the second batch a somewhat shorter time, and the rest an even shorter one, to make sure that the waiting time does not depend too much on the number of DCs. bye, |
|
So do you suggest implementing 3 batches that are sent at time T (5 servers), T+400ms (5 servers), T+600ms (remainder), or iterating over the DCs sequentially with a small delay before going to the next DC (pinging the first five servers will take 400*5 ms, the next five 200*5 ms and the remainder 100*X ms)? |
Hi, I was thinking of keeping the parallel pings but doing it in batches, so that in the typical/positive case we do not have to send pings to all DCs. bye, |
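To make the batching idea concrete, here is a minimal sketch of the send schedule discussed above (5 DCs at T, 5 more at T+400ms, the remainder at T+600ms). It is only an illustration, not the code from the patches: `plan_batches` and `BATCH_SCHEME` are hypothetical names, and the schedule is computed without sending anything.

```python
from typing import List, Optional, Tuple

# (batch size, wait in ms before sending the next batch), following the
# 5 / 0.4s, 5 / 0.2s, remainder / 0.1s scheme mentioned in this thread.
# None means "all remaining DCs". These names are illustrative only.
BATCH_SCHEME: List[Tuple[Optional[int], int]] = [(5, 400), (5, 200), (None, 100)]


def plan_batches(dcs: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (send offset in ms, batch of DCs) pairs for the given DC list."""
    plan = []
    offset_ms = 0
    start = 0
    for size, wait_ms in BATCH_SCHEME:
        if start >= len(dcs):
            break  # no DCs left, later batches are not needed
        end = len(dcs) if size is None else start + size
        plan.append((offset_ms, dcs[start:end]))
        start = end
        offset_ms += wait_ms
    return plan


if __name__ == "__main__":
    for offset, batch in plan_batches([f"dc{i}.example.com" for i in range(12)]):
        print(f"T+{offset}ms: {batch}")
```

In the positive case a reply arrives while the first batch is outstanding, so the later batches never need to be sent.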
06b3628 to c1c360f
|
Done. I added |
|
Hi, thanks for the changes, the patches are working well and with To make sure people will understand in future why the batch sizes and waiting times were chosen this way I wonder if you can add a reference to [MS-DISO] section 5.4.5.3 to the comment? bye, |
The previous implementation would not fall back to the off-site domain controllers. This would cause problems if the site actually changed.
Site and forest information is stable, not dynamic. To avoid spamming the network with CLDAP pings all the time, we renew the netlogon information only when SSSD starts and when we are recovering from an offline state, to detect a possible change (e.g. the user moves to another location with a laptop).
c1c360f to 7eb509b
|
Done. I only changed comment in |
|
Hi, thanks, ACK. bye, |
|
Pushed PR: #5300
|
|
Hi @pbrezina, I came across an issue with is needed to make sure that due to the bye, |
|
JFTR, the above comment was addressed via 37ba37a |
This PR adds support for CLDAP (connectionless LDAP over UDP) and makes
use of it during site discovery.
The discovery process is improved: it now sends the CLDAP ping to all
discovered domain controllers at once to avoid potential timeouts when
some of them are unreachable (which is quite common in real environments).
The first received reply is used.
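The "ping everything at once, take the first reply" behaviour can be sketched as follows. This is a simulation under stated assumptions, not the SSSD implementation: `fake_ping` stands in for a real CLDAP ping and simply replies after a given delay (or never, for an unreachable DC), and `first_reply` is a hypothetical helper name.

```python
import asyncio
from typing import Dict, Optional


async def fake_ping(dc: str, delay: Optional[float]) -> str:
    """Stand-in for a CLDAP ping: reply after `delay` seconds, never if None."""
    if delay is None:
        await asyncio.Event().wait()  # unreachable DC: blocks until cancelled
    else:
        await asyncio.sleep(delay)
    return dc


async def first_reply(delays: Dict[str, Optional[float]],
                      timeout: float) -> Optional[str]:
    """Ping all DCs in parallel and return the name of the first responder."""
    if not delays:
        return None
    tasks = [asyncio.create_task(fake_ping(dc, d)) for dc, d in delays.items()]
    done, pending = await asyncio.wait(
        tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED)
    # Slower or unreachable DCs are no longer interesting: cancel their pings.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    if not done:
        return None  # nobody answered within the timeout
    return next(iter(done)).result()


if __name__ == "__main__":
    winner = asyncio.run(first_reply(
        {"dc1.example.com": None, "dc2.example.com": 0.05}, timeout=1.0))
    print(winner)
```

The point of the design is that an unreachable DC costs nothing as long as some other DC answers, because no ping waits for the one before it.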
The netlogon information is now requested only when SSSD starts and when
we are recovering from an offline state, to avoid swamping the network with
many pings. This is fine given that the site and forest information is not
dynamic but usually stable.
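The refresh policy described above can be sketched like this. It is an illustration only: `NetlogonCache`, its method names and the `fetch` callback are hypothetical and do not correspond to identifiers in the patches.

```python
from typing import Callable, Optional


class NetlogonCache:
    """Caches site/forest information and refreshes it only on two events."""

    def __init__(self, fetch: Callable[[], dict]) -> None:
        # `fetch` is assumed to run the CLDAP-based discovery and return
        # the netlogon information as a dict.
        self._fetch = fetch
        self._info: Optional[dict] = None

    def on_start(self) -> None:
        # First event: the service starts.
        self._info = self._fetch()

    def on_back_online(self) -> None:
        # Second event: recovering from offline state. The site may have
        # changed in the meantime (e.g. a laptop moved to another location).
        self._info = self._fetch()

    def get(self) -> Optional[dict]:
        # Ordinary lookups use the cached value and never trigger a ping.
        return self._info
```

Any number of `get()` calls between the two events costs no network traffic, which is the trade-off the paragraph above argues for.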
An additional improvement is that if we already know the client's site we
ping only in-site controllers first. We were already doing that, but now we
also fall back to off-site controllers in case no in-site controller
is reachable. Previously we would just fail.
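A minimal sketch of the in-site-first lookup with off-site fallback, assuming a hypothetical `ping_all` callback that pings a group of DCs and returns the first responder or `None`. The names `discover` and `make_ping_all` are illustrative and not taken from the patches.

```python
from typing import Callable, List, Optional, Set


def discover(in_site: List[str], off_site: List[str],
             ping_all: Callable[[List[str]], Optional[str]]) -> Optional[str]:
    """Try in-site DCs first, then fall back to off-site DCs."""
    for group in (in_site, off_site):
        if not group:
            continue  # e.g. the client's site is not known yet
        winner = ping_all(group)
        if winner is not None:
            return winner
    return None  # neither group produced a reply


def make_ping_all(reachable: Set[str]) -> Callable[[List[str]], Optional[str]]:
    """Build a fake ping_all for demonstration: only `reachable` DCs reply."""
    def ping_all(dcs: List[str]) -> Optional[str]:
        for dc in dcs:
            if dc in reachable:
                return dc
        return None
    return ping_all


if __name__ == "__main__":
    ping = make_ping_all({"dc-remote.example.com"})
    print(discover(["dc-local.example.com"], ["dc-remote.example.com"], ping))
```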
There is a bug in OpenLDAP [1] that makes SSSD wait for the timeout in case none
of the controllers is reachable, but as long as at least one is reachable
the resolution will be fast.
Resolves:
#3743
#5215
[1] https://bugs.openldap.org/show_bug.cgi?id=9328