
New failover implementation #8566

Open
pbrezina wants to merge 10 commits into SSSD:failover from pbrezina:failover


@pbrezina
Member

This pull request is intended to be the start of a "failover" feature branch to which other developers will be able to contribute.

The main failover logic works, compiles and can be tested using a "minimal" provider that is included as an example. The purpose of the "minimal" provider is only to test the failover without the need to port the full provider code, and it will be removed prior to pushing the contents to the master branch. See minimal-provider-notes.txt for how to set it up, and see the commit "minimal: switch to new failover for service lookup and user authentication" for the switch to the new failover. That commit is the minimal set of changes needed to get it working; the real port should go further and will require more refactoring.

The work is still not finished and some functionality is missing. However, this functionality can be implemented in small, isolated areas of code and should not require larger changes or glue across the whole code base, so this is ready for review. Remaining work is tracked at [1]. Feel free to take any of these tickets, and open new tickets when you find something missing.

When reviewing, you can start with src/providers/failover/readme.md, which provides high-level documentation of the code. And of course, do not forget the design page [2].

Thanks, Pavel

@gemini-code-assist (bot) left a comment


Code Review

This pull request implements a new failover mechanism for SSSD, introducing prioritized server groups, parallelized candidate server discovery, and a transaction-based API for automated retries. It also provides a minimal provider implementation to demonstrate the new architecture. Critical logic bugs were identified in the server group resolution logic, where duplicate detection causes premature loop exit, and in the address change detection function, which currently returns inverted results.
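Both bugs flagged in the summary are classic C loop and predicate mistakes. The sketch below is not the PR's code: `collect_unique()`, `addr_changed()` and the plain-string server representation are hypothetical. It shows a duplicate check that skips the duplicate with `continue` (instead of leaving the outer loop), and an address comparison that returns true only when the addresses actually differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: copy server names from 'input' to 'out', dropping
 * duplicates. Returns the number of unique names stored. */
static size_t collect_unique(const char *const *input, size_t n_input,
                             const char **out, size_t out_size)
{
    size_t n_out = 0;

    assert(input != NULL || n_input == 0);
    assert(out != NULL || out_size == 0);

    for (size_t i = 0; i < n_input; i++) {
        bool dup = false;

        for (size_t j = 0; j < n_out; j++) {
            if (strcmp(input[i], out[j]) == 0) {
                dup = true;
                break; /* ends only the inner duplicate search */
            }
        }

        if (dup) {
            continue; /* skip this entry but keep processing the rest */
        }

        if (n_out == out_size) {
            break; /* output buffer is full */
        }
        out[n_out++] = input[i];
    }

    return n_out;
}

/* Hypothetical: returns true when the address HAS changed. */
static bool addr_changed(const char *old_addr, const char *new_addr)
{
    if (old_addr == NULL || new_addr == NULL) {
        /* changed if exactly one of them is missing */
        return old_addr != new_addr;
    }
    return strcmp(old_addr, new_addr) != 0;
}
```

With the buggy variant (leaving the outer loop on the first duplicate), the input `{"a", "b", "a", "c"}` would yield only two entries instead of three.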

Comment thread src/providers/failover/failover_group.c Outdated
Comment thread src/providers/failover/failover_server_resolve.c Outdated
Comment thread src/providers/failover/failover_group.c Dismissed
Comment thread src/providers/minimal/minimal_id.c Dismissed
#include "providers/failover/ldap/failover_ldap.h"

static errno_t
find_password_expiration_attributes(TALLOC_CTX *mem_ctx,

Check warning

Code scanning / CodeQL

Poorly documented large function Warning

Poorly documented function: fewer than 2% comments for a function of 114 lines.
Comment on lines +72 to +90
switch (ar->entry_type & BE_REQ_TYPE_MASK) {
case BE_REQ_SERVICES:
    DEBUG(SSSDBG_TRACE_FUNC, "Executing BE_REQ_SERVICES request\n");

    subreq = minimal_services_get_send(state, be_ctx->ev, fctx, id_ctx,
                                       sdom, ar->filter_value,
                                       ar->extra_value, ar->filter_type,
                                       noexist_delete);
    break;
default: /*fail*/
    ret = EINVAL;
    state->err = "Invalid request type";
    DEBUG(SSSDBG_OP_FAILURE,
          "Unexpected request type: 0x%X [%s:%s] in %s\n",
          ar->entry_type, ar->filter_value,
          ar->extra_value ? ar->extra_value : "-",
          ar->domain);
    goto done;
}

Check notice

Code scanning / CodeQL

No trivial switch statements Note

This switch statement should either handle more cases, or be rewritten as an if statement.
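The rewrite CodeQL suggests can be sketched as follows. This is only an illustration: `FAKE_REQ_TYPE_MASK`, `FAKE_REQ_SERVICES`, `struct fake_acct_req` and `dispatch_request()` are hypothetical stand-ins (with made-up values) for `BE_REQ_TYPE_MASK`, `BE_REQ_SERVICES` and the request handling in minimal_id.c, and the subrequest is replaced by a return code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real constants live in SSSD headers. */
#define FAKE_REQ_TYPE_MASK 0x0FFF
#define FAKE_REQ_SERVICES  0x0005

struct fake_acct_req {
    int entry_type;
    const char *filter_value;
};

/* Returns 0 if the request would be dispatched, EINVAL otherwise. */
static int dispatch_request(const struct fake_acct_req *ar, const char **_err)
{
    assert(ar != NULL && _err != NULL);

    /* With a single supported type, an if statement states the intent
     * more directly than a one-case switch. */
    if ((ar->entry_type & FAKE_REQ_TYPE_MASK) != FAKE_REQ_SERVICES) {
        *_err = "Invalid request type";
        fprintf(stderr, "Unexpected request type: 0x%X [%s]\n",
                (unsigned int)ar->entry_type,
                ar->filter_value != NULL ? ar->filter_value : "-");
        return EINVAL;
    }

    /* The real code would call minimal_services_get_send() here. */
    *_err = NULL;
    return 0;
}
```

The switch form only pays off once more request types (users, groups, ...) are handled; for a test-only provider that serves services alone, the if form is simpler.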
Comment on lines +120 to +128
switch (state->ar->entry_type & BE_REQ_TYPE_MASK) {
case BE_REQ_SERVICES:
    err = "Service lookup failed";
    ret = minimal_services_get_recv(subreq);
    break;
default: /* fail */
    ret = EINVAL;
    break;
}

Check notice

Code scanning / CodeQL

No trivial switch statements Note

This switch statement should either handle more cases, or be rewritten as an if statement.
Comment on lines +212 to +220
// TODO handle how to yield ERR_SERVER_FAILED
// ret = sdap_id_op_done(state->op, ret, &dp_error);
// if (dp_error == DP_ERR_OK && ret != EOK) {
//     /* retry */
//     ret = minimal_services_get_retry(req);
//     if (ret != EOK) {
//         tevent_req_error(req, ret);
//         return;
//     }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment on lines +222 to +225
//     /* Return to the mainloop to retry */
//     return;
// }
// state->sdap_ret = ret;

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment thread src/providers/minimal/minimal_id_services.c Dismissed
@alexey-tikhonov self-assigned this Apr 1, 2026
@alexey-tikhonov added the "Waiting for review" and "no-backport" (This should go to target branch only.) labels Apr 1, 2026
@alexey-tikhonov
Member

@pbrezina, is it expected that CI fails to build?

src/providers/minimal/minimal_id.c:28:10: fatal error: providers/minimal/minimal.h: No such file or directory

@pbrezina pbrezina force-pushed the failover branch 2 times, most recently from 0570a63 to 2a2c475 Compare April 13, 2026 09:09
@alexey-tikhonov
Member

alexey-tikhonov commented Apr 13, 2026

@pbrezina,
re: "oidc_child: parameterize entra_idp url" (and other) commits being included in this PR: IMO, it's better to rebase the base branch (https://github.com/SSSD/sssd/tree/failover) rather than the PR under review (https://github.com/pbrezina/sssd/tree/failover).

@alexey-tikhonov
Member

FreeBSD CI doesn't have the required headers installed:


  src/providers/minimal/minimal_ldap_auth.c:32:10: fatal error: 'shadow.h' file not found
     32 | #include <shadow.h>
        |          ^~~~~~~~~~

While 'minimal' isn't going to be merged into the main repo branches, this build failure can hide other issues.

@pbrezina pbrezina force-pushed the failover branch 4 times, most recently from 1fe8626 to ed511ad Compare April 13, 2026 10:47
…pec file

Add the sssd-minimal provider package to the spec file following the
same pattern as other providers (ldap, ipa, ad, etc.). This packages
the libsss_minimal.so library that was added in recent commits.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@pbrezina
Member Author

Now it is fixed. There were missing headers in noinst_HEADERS and some other problems. I reordered the commits, and every testing-only commit is clearly marked as not going to master.

The reviewer needs to pay attention only to the "failover" commit; the other commits are just for testing and demonstration.

Also disable CodeQL for the minimal provider. The
provider is for testing only, so it does not make sense to
fix any issues there.
This crafts and implements the new failover interface, but
it does not provide a complete implementation of the failover
mechanism yet. It brings the code to a state where the public
and private interfaces are stable, working and testable, so
the following tasks can be split up and worked on in parallel.

What is missing at this state:
- server configuration and discovery
  (failover_server_group/batch/vtable_op)
- server selection mechanism (sss_failover_vtable_op_server_next)
- kerberos authentication
- sharing servers between IPA/AD LDAP and KDC
- online/offline callbacks (resolve callback should not be needed)

Most importantly, it is now possible to start refactoring SSSD code
to use the new failover implementation.
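The server selection mechanism (`sss_failover_vtable_op_server_next`) is listed as missing. Purely to illustrate one possible policy, and not the design in the PR or on the design page, the sketch below walks prioritized server groups in order and returns the first server not marked unreachable. `struct demo_server`, `struct demo_group` and `demo_server_next()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_server {
    const char *name;
    bool unreachable;   /* set after a failed connection attempt or ping */
};

struct demo_group {
    struct demo_server *servers;
    size_t num_servers;
};

/* Return the first usable server, scanning groups from the highest
 * priority (index 0) to the lowest, or NULL if none is usable. */
static struct demo_server *
demo_server_next(struct demo_group *groups, size_t num_groups)
{
    assert(groups != NULL || num_groups == 0);

    for (size_t g = 0; g < num_groups; g++) {
        for (size_t s = 0; s < groups[g].num_servers; s++) {
            struct demo_server *srv = &groups[g].servers[s];
            if (!srv->unreachable) {
                return srv;
            }
        }
    }

    return NULL;
}
```

A real implementation would likely also need to handle servers whose state is still unknown and reset unreachable flags after a timeout; this sketch deliberately ignores both.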
@alexey-tikhonov
Member

alexey-tikhonov commented Apr 16, 2026

@pbrezina, would it be difficult to include a 'system' test that uses the "minimal" provider and covers any failover scenario?

If it's difficult, then disregard this, as the test would be discarded eventually.


### Failover Context

* [sss_failover.c]()
Member


Actual files are named without 'sss_' prefix.

@alexey-tikhonov
Member

What is missing at this state:

  • server configuration and discovery
    (failover_server_group/batch/vtable_op)
  • server selection mechanism (sss_failover_vtable_op_server_next)
  • kerberos authentication
  • sharing servers between IPA/AD LDAP and KDC
  • online/offline callbacks (resolve callback should not be needed)

Periodic refreshes are also not yet implemented, right?

}

/* Switch the attempt_req state to caller_req state so it is used seamlessly
* by the user. This is quite a hack and the attempt_state must stay
Member


Indeed. No other reasonable way?

errno_t ret;

state = tevent_req_data(req, struct sss_failover_transaction_state);
state->attempts++;
Member


Would it make sense to limit attempts?

}

void
sss_failover_server_mark_reachable(struct sss_failover_server *srv)
Member


Looks like it's not used anywhere.
Shouldn't it be called from sss_failover_ping_done()?


state->current_group++;
ret = sss_failover_refresh_candidates_group_next(req);
if (ret != EOK) {
Member


I think it should return for ret == EOK as well?
Otherwise it will fall through to tevent_req_done(req) below?
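The control-flow concern can be shown without tevent. In this hypothetical model, `demo_group_next()` returns `EOK` after it has started an asynchronous step (the request must then wait for its callback), `ENOENT` when no groups are left, and any other value on error. These return-code conventions are assumptions for illustration, not taken from the PR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EOK
#define EOK 0   /* stand-in for SSSD's EOK */
#endif

enum demo_outcome { DEMO_PENDING, DEMO_DONE, DEMO_ERROR };

struct demo_state {
    size_t current_group;
    size_t num_groups;
};

/* Hypothetical: start processing the current group, if any. */
static int demo_group_next(struct demo_state *state)
{
    if (state->current_group >= state->num_groups) {
        return ENOENT;          /* nothing left to try */
    }
    /* A real implementation would issue an async subrequest here. */
    return EOK;                 /* started; the result arrives later */
}

/* Called when the subrequest for the current group has finished. */
static enum demo_outcome demo_group_done(struct demo_state *state)
{
    int ret;

    state->current_group++;
    ret = demo_group_next(state);
    if (ret == EOK) {
        return DEMO_PENDING;    /* must not fall through to "done" */
    } else if (ret != ENOENT) {
        return DEMO_ERROR;
    }

    return DEMO_DONE;           /* every group has been processed */
}
```

Under these conventions, only `ENOENT` leads to completion; treating `EOK` like `ENOENT` would finish the request while a subrequest is still running.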

bool addr_changed,
bool reuse_connection,
bool authenticate_connection,
bool read_rootdse,
Member


These are pretty much LDAP-specific.
Is this a good fit for an abstract vtable API?

@alexey-tikhonov
Member

I only did an overview / preliminary round and am not at all sure that any of my comments are valid.
I'm setting 'changes requested' nonetheless, to get a response to my comments and check my understanding so far.


Labels

Changes requested, no-backport (This should go to target branch only.)


4 participants