sssd SRV hardcoded timeouts (and general HA gripes) #3553

Closed
sssd-bot opened this issue May 2, 2020 · 0 comments

Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/2511


Given an environment that consists of multiple authentication nodes, you can configure sssd to reach them via an ordered list or via SRV records.

The issue here is that neither of these methods scales well.

Example 1 - ordered list

ipa_server = a, b, c, d
krb5_server = a, b, c, d
ldap_uri = a, b, c, d

Given 3500 client hosts with this configuration, all of them will use server a until it fails, and then move on to b, then c, etc.

This can act on your environment like a wrecking ball, because the entire client load moves from one server to the next at once.
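
To make the ordered-list behavior concrete, here is a minimal sketch in C. It is not SSSD code: `pick_ordered` and `count_load` are hypothetical helpers that only model "use the first server that is up". Because every client runs the same selection over the same list, the whole load lands on one server at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not SSSD code: return the index of the first
 * server marked as up, or -1 if every server is down. */
int pick_ordered(const bool *server_up, size_t n_servers)
{
    assert(server_up != NULL || n_servers == 0);
    for (size_t i = 0; i < n_servers; i++) {
        if (server_up[i]) {
            return (int)i;
        }
    }
    return -1;
}

/* Count how many of n_clients end up on each server. Every client
 * applies the same ordered selection, so all of them pick the same
 * index and load[] has a single non-zero entry. */
void count_load(const bool *server_up, size_t n_servers,
                size_t n_clients, size_t *load)
{
    assert(load != NULL || n_servers == 0);
    for (size_t s = 0; s < n_servers; s++) {
        load[s] = 0;
    }
    for (size_t c = 0; c < n_clients; c++) {
        int idx = pick_ordered(server_up, n_servers);
        if (idx >= 0) {
            load[idx]++;
        }
    }
}
```

With four servers up and 3500 clients, `load` comes out as 3500 for the first server and 0 for the rest; mark the first server down and all 3500 move to the second.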

Example 2 - Replace the ordered list with SRV records

This seems better, except for the fact that you guys hardcode a time limit and don't respect SRV TTLs.

src/providers/data_provider_fo.c:73
opts->srv_retry_timeout = 14400;

At a minimum SSSD should respect SRV TTLs so I can try and round robin.
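
As a rough sketch of what "respect the TTL" could mean (this is not the SSSD implementation): the next SRV lookup is scheduled from the TTL returned with the record, and the hardcoded 14400 seconds is used only when no TTL is known. The function names, the "negative means unknown" convention and the one-second floor are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* The hardcoded value quoted above, used here only as a fallback. */
#define FALLBACK_SRV_TIMEOUT 14400
/* Assumed floor so that a TTL of 0 does not cause a lookup storm. */
#define MIN_SRV_REFRESH 1

/* Hypothetical helper: when should the SRV lookup be repeated?
 * ttl_seconds < 0 means "no TTL known" (an assumption of this sketch). */
time_t srv_next_refresh(time_t resolved_at, long ttl_seconds)
{
    long wait = ttl_seconds;

    if (wait < 0) {
        wait = FALLBACK_SRV_TIMEOUT;
    }
    if (wait < MIN_SRV_REFRESH) {
        wait = MIN_SRV_REFRESH;
    }
    return resolved_at + (time_t)wait;
}

/* True once the cached SRV answer has outlived its TTL. */
bool srv_is_stale(time_t resolved_at, long ttl_seconds, time_t now)
{
    return now >= srv_next_refresh(resolved_at, ttl_seconds);
}
```

With a 300 second TTL on the SRV record, clients would re-resolve every five minutes instead of every four hours, which is what makes DNS-side round robin usable.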

Ideally (!) there should be some sort of HA/load-balancing scheme for lists of providers.
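
One possible scheme, sketched under assumptions: the weighted selection that RFC 2782 describes for SRV records. Among the records with the lowest priority value, a record is chosen with probability roughly proportional to its weight. `struct srv_rec` and `srv_pick` are hypothetical names, the caller supplies the random number, and the sketch does a single pick in input order rather than the full RFC ordering (which also moves zero-weight records to the front).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, not an SSSD structure. */
struct srv_rec {
    const char *target;
    uint16_t priority;  /* lower value is preferred */
    uint16_t weight;    /* relative share within one priority */
};

/* Return the index of the chosen record, or -1 if there are none.
 * random_value is any uniformly distributed number from the caller. */
int srv_pick(const struct srv_rec *recs, size_t n, uint32_t random_value)
{
    assert(recs != NULL || n == 0);
    if (n == 0) {
        return -1;
    }

    /* Find the lowest priority value present. */
    uint16_t best = recs[0].priority;
    for (size_t i = 1; i < n; i++) {
        if (recs[i].priority < best) {
            best = recs[i].priority;
        }
    }

    /* Sum the weights of the records at that priority. */
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].priority == best) {
            total += recs[i].weight;
        }
    }

    /* Roll in [0, total] and take the first record whose running
     * sum reaches the roll, as described in RFC 2782. */
    uint32_t roll = random_value % (total + 1);
    uint32_t running = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].priority != best) {
            continue;
        }
        running += recs[i].weight;
        if (running >= roll) {
            return (int)i;
        }
    }
    return -1; /* not reached: the last candidate always satisfies the test */
}
```

If each of the 3500 clients feeds its own random number into a pick like this, the load spreads across the servers of the preferred priority instead of piling onto the first entry.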

Help :)

Comments


Comment from dpal at 2014-11-29 07:06:27

Sounds like a dup of #1884.


Comment from gprocunier at 2014-12-02 17:55:19

Sure, but that ticket was opened 20 months ago (with no public updates) and I have provided two use cases where the current behavior is causing problems in our environment.

In the ordered list scenario, given enough servers you can create a Denial of Service from volume.


Comment from jhrozek at 2014-12-04 17:29:52

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.13 beta


Comment from jhrozek at 2014-12-05 21:12:20

This ticket and #1884 were requested via a downstream support case. Moving back to NEEDS_TRIAGE.

milestone: SSSD 1.13 beta => NEEDS_TRIAGE


Comment from jhrozek at 2014-12-07 21:23:15

Linked to Bugzilla bug: https://bugzilla.redhat.com/show_bug.cgi?id=1171376 (Red Hat Enterprise Linux 6)

rhbz: => [https://bugzilla.redhat.com/show_bug.cgi?id=1171376 1171376]


Comment from jhrozek at 2014-12-08 15:01:06

Fields changed

owner: somebody => jhrozek
status: new => assigned


Comment from jhrozek at 2015-01-29 15:38:09

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.12.4


Comment from jhrozek at 2015-02-17 21:03:11

We're not going to implement example 1 per se in 1.12, but something similar instead -- in 1.13, we're going to implement ticket #2499 which would handle the HA scenario using a single host name that resolves into multiple IP addresses.
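
For illustration only (this is not the #2499 implementation): a single host name that resolves into multiple IP addresses shows up on the client side as a list of results from the standard `getaddrinfo()` call. The helper name `count_addresses` is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: return how many addresses the host name
 * resolves to for a TCP service, or -1 if resolution fails. */
int count_addresses(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int count = 0;

    assert(host != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* one entry per address */

    if (getaddrinfo(host, port, &hints, &res) != 0) {
        return -1;
    }

    /* Each list entry is one candidate address a client could try. */
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        count++;
    }

    freeaddrinfo(res);
    return count;
}
```

A client that walks this list, or starts at a random entry, gets failover and some load spreading from one configured name.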

For 1.12, I've just sent a patch for ticket #1884 that implements honoring the SRV TTL values. I'm therefore closing this bug as a duplicate of #1884. Please let me know if you'd like some test packages (just the RHEL/Fedora release is fine) and I'll build them.

Thank you for your patience.

resolution: => duplicate
status: assigned => closed


Comment from gprocunier at 2017-02-24 14:24:24

Metadata Update from @gprocunier:

  • Issue assigned to jhrozek
  • Issue set to the milestone: SSSD 1.12.4