Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - toggle between refreshable and one-time instantiation of pac4j clients #5315

Closed
wants to merge 1 commit into from

Conversation

hsartoris-bard
Copy link
Contributor

A change was introduced in aa5c1c6 to allow runtime refreshing of Pac4j clients. Unfortunately, this is extremely costly, particularly when running several delegated clients: with a CAS client, an OIDC client, and (particularly slow) two SAML clients, we are experiencing hot load times of over a second for /cas/login as every one of those clients are built from scratch for each login flow.
The other effect of this behavior is to render the lazy-init flag essentially useless, as the clients aren't being persisted and are as such effectively always lazy initialized.

This patch more or less restores the code as it existed previously, using lazy-init as a toggle between using RefreshableDelegatedClients vs a static list. Though this is not included in the PR, setting the default value for lazy-init to true would almost certainly reflect the behavior most end users hope for - losing a second or two during startup is nothing compared to default-enabled, persistent performance degradation across all requests.

@apereocas-bot
Copy link
Contributor

Thank you very much for submitting this pull request!

The patch presented here seems to contain no adequate unit and/or integration tests. Patches that alter CAS core
functionality without proper tests, regardless of how small they may seem, are automatically closed. By test, we specifically
mean, unit/integration/browser/functional tests; a description and series of instructions on how one may go about
reproducing or testing the patch is excellent but not sufficient for acceptance. If the change-set here
modifies CAS functionality that is extremely and obviously trivial (i.e. fixes a typo) or updates the CAS documentation or
presents any other type of change that generally can be accepted without proper tests,
please comment and explain on the pull request.

Please re-open the pull request (or ask project maintainers to do this for you) when you're ready and have included tests to verify
functionality. If you feel like you are not immediately available or have no time to work out test cases and steps to
reproduce and verify the changes, that's not an issue. Please revisit the pull request as soon as it's convenient for you
to resume work as your schedule allows. If you'd like to keep the patch around and open while you make progress,
please mark the patch as WIP by including the word "WIP" in the pull request title,
or create the initial pull request in "Draft" status.

Furthermore,

  • To learn about CAS test process and how you may design and development unit/integration tests, please see this link. If you are unclear on how to write basic unit tests, please see this link.
  • See Contributor Guidelines to learn more about how to contribute to the project and review the criteria that allows for the project to accept contributions.
  • If you are seeking assistance and have a question about your CAS deployment, please visit this page to learn more about support options.
  • If your change-set attempts to solve a problem or defect, be sure to include tests to first and foremost demonstrate the issue. Tests must be able to reproduce the problem without changes and fixes first so that project maintainers can ascertain the validity of the issue scenario without resorting to manual steps. Tests and testing done after the fact is not acceptable.
  • If your change-set introduces a new CAS feature or improvement, please consider starting
    discussion with the project to see if your patch can be accepted and maintained by project owners and maintainers. Unless your change is
    extremely trivial and has been through proper review and is sponsored by a project maintainer, please do not spend time
    modifying code and configuration in a proposed pull request.

If you believe this message to be an error, please post your explanation here as a comment and it will be reviewed.

Thanks again!

@leleuj
Copy link
Contributor

leleuj commented Nov 30, 2021

@mmoayyed while the implementation is debattable, I think this PR raises a good question about the "refreshable" or not nature of the pac4j clients and the related drawbacks.

@mmoayyed mmoayyed changed the title toggle between refreshable and one-time instantiation of pac4j clients WIP - toggle between refreshable and one-time instantiation of pac4j clients Dec 1, 2021
@mmoayyed mmoayyed reopened this Dec 1, 2021
@mmoayyed
Copy link
Member

mmoayyed commented Dec 1, 2021

Thank you for the patch and the notes. What you describe is quite true, but the implementation here is not something we can work with, specially since it also contains no tests or a way to verify broken behavior or that of the fix. So I'll go ahead and close this for now, and we'll see if we can provide a remedy in future RCs for 6.5. Thank you!

@mmoayyed mmoayyed closed this Dec 1, 2021
@leleuj
Copy link
Contributor

leleuj commented Dec 1, 2021

Maybe the solution could rather be provided at the pac4j level.

Just to be sure, what is the reason we have the RefreshableDelegatedClients?

@hsartoris-bard
Copy link
Contributor Author

@leleuj This is, to me, an important question - is there any real deploy-time use case for RefreshableDelegatedClients as-is? I suppose that runtime reloading of configuration is one such case, but I will note that, in my time developing and administering multiple CAS instances, I have never achieved sufficient simplicity of configuration to actually allow this. Moreover, it seems to me that if such behavior is desired, it should be conditional on the underlying configuration having actually changed.

@leleuj
Copy link
Contributor

leleuj commented Dec 1, 2021

@mmoayyed will tell us the use case needed by RefreshableDelegatedClients and we'll try to find a better solution or a way to offer a static alternative.

@leleuj
Copy link
Contributor

leleuj commented Dec 2, 2021

Since pac4j v5, the Clients component has gained some dynamism: you can add or remove clients on the fly.
It does not handle clients initialization or re-initialization, but this could certainly be done.
We could imagine having an init() method for the Clients component which loops through all the clients and initializes again the clients that have changed.

What do you think?

@leleuj
Copy link
Contributor

leleuj commented Dec 2, 2021

Or even an init method at the Client level which takes into account the fact that the client has changed or not... It might generate some side-effects... but why not?

My feeling is that if there is any issue, it should be solved by pac4j itself.

@mmoayyed
Copy link
Member

mmoayyed commented Dec 2, 2021

The use-case for refreshable clients is exactly is as the name suggests. We want to outsource the construction of identity providers to external systems, APIs who provide administration and management of clients in a way that is completely detached from CAS, allowing each client construction to change at will, and be managed entirely separately from the CAS configuration, without necessarily having to deal with the complexity and/or overhead of the Spring Cloud Config server and family.

Suffice it to say, if something is part of CAS, there is a valid, real, deploy-time use case for it to justify its existence with support and maintainer ability to take care of it. It also means somewhere someone is paying, one way or another for us maintainers to support and back it. If you don't have the simplicity or complexity to manage this type of setup, worry not. It's not for you. We work on 10s, sometimes 100s of deployments throughout the year; Not everything found in the codebase is right for everyone, and that's why the menu is large to choose from. However, this particular behavior should not be the default, and certainly should be controllable via configuration, and that is the case today in the current master with the most recent changes. So, there is nothing more to do here.

Furthermore, this is not something that should go into pac4j per se. One of the side-benefits of this change is the ability to push pac4j down the stack and not expose it directly to client code and integrations and customizations. A Clients interface has many other traits that make it hard for such direct integrations, and the facilities here act an intermediate abstraction and facade to bridge pac4j and CAS, handling caching, reloadability, serialization, clustered deployments, distributed nature of configuration, etc. It's not something we'd want pac4j to do, and it's something that one can already do with pac4j if necessary, outside the core library, and that is exactly what's happening here in CAS as well.

@mmoayyed
Copy link
Member

mmoayyed commented Dec 2, 2021

However, this particular behavior should not be the default, and certainly should be controllable via configuration, and that is the case today in the current master with the most recent changes. So, there is nothing more to do here.

I would encourage you to test and verify the changes in master today, and report feedback. When you do, please make sure you have tests that reproduce an issue, or verify a fix with tests. Fixes or feedback without tests takes time to verify, and it's free time we dont have any to spare. A non-tested non-testable fix, is a fix that does not exist.

I will also be doing a sweep some time later in the week, or the next to see if some things can be backported to 6.4.x.

@leleuj
Copy link
Contributor

leleuj commented Dec 2, 2021

OK. Thanks for the thoroughful explanations. I think I get the point.

Taking a look at the master, I'm a bit worried by the source code:

    @Override
    public Collection<IndirectClient> build() {
        if (this.clients.isEmpty() || !casProperties.getAuthn().getPac4j().getCore().isLazyInit()) {
            this.clients.clear();
            configureCasClient(clients);
...
            configureHiOrgServerClient(clients);
        }
        return clients;
    }

We clear the list of the clients before adding again the clients. Under heavy loads and many requests, as the RefreshableDelegatedClients is not synchronized, I think this can lead some issue.

@mmoayyed
Copy link
Member

mmoayyed commented Dec 2, 2021

Good point! I think we have pac4j-level puppeteer tests that cover this. Can you see if you can duplicate this as an issue with a test?

@mmoayyed
Copy link
Member

mmoayyed commented Dec 2, 2021

...or perhaps:

private final Set<IndirectClient> clients = Collections.synchronizedSet(new LinkedHashSet<>());

?

@leleuj
Copy link
Contributor

leleuj commented Dec 2, 2021

I don't think a puppeteer test is appropriate here, it's really about heavy load and multiple requests at the same time.

The scenario would be: a first request comes in to trigger an authn delegation for the SAML2Client, the list is rebuilt. At the same time, a second request comes in to trigger an authn delegation for the CasClient, the list is also rebuilt.
The two list reconstructions happen at the same time, I'm not sure what can happen but I fear issues.

I think the option would be to rebuild a new list and just set it when it's ready:

    @Override
    public Collection<IndirectClient> build() {
        if (this.clients.isEmpty() || !casProperties.getAuthn().getPac4j().getCore().isLazyInit()) {
            val newClients = new ArraysList<Client>();
            configureCasClient(newClients);
...
            configureHiOrgServerClient(newClients);
            this.clients = newClients;
        }
        return clients;
    }

I will make a test and let you know.

@mmoayyed
Copy link
Member

mmoayyed commented Dec 2, 2021

I don't think a puppeteer test is appropriate here, it's really about heavy load and multiple requests at the same time.

It's exactly the sort of test puppeteer can help with; we have many other similar types of tests that do very similar things for ticket registries, specially JPA. But, any test is a good test as long as it produces an issue.

@leleuj
Copy link
Contributor

leleuj commented Dec 2, 2021

Indeed, the JPA test does that. I didn't know that. I will see what I can do.

@leleuj
Copy link
Contributor

leleuj commented Dec 2, 2021

OK. I spent quite some time on this.

First of all, I followed your advice and tried to create a puppeteer test which fails. Here is the PR: #5318
It's a basic clic on the "Github" button which is done on multiple requests at the same time.
I tried several settings but was unable to trigger an error.

Then, I took another approach based on the following demo: https://github.com/casinthecloud/cas-pac4j-oauth-demo and changed the CAS source code to create a fake delay of 10 secondes in the build method of the DefaultDelegatedClientFactory component "to simulate a concurrency":

    @Override
    public Collection<IndirectClient> build() {
        this.clients.clear();

        try {
            Thread.sleep(10000);
        } catch (final InterruptedException e) {
            LOGGER.error("Pause interrupted", e);
        }

        configureCasClient(clients);
        configureFacebookClient(clients);
        configureOidcClient(clients);
        configureOAuth20Client(clients);
        configureSamlClient(clients);
        configureTwitterClient(clients);
        configureDropBoxClient(clients);
        configureFoursquareClient(clients);
        configureGitHubClient(clients);
        configureGoogleClient(clients);
        configureWindowsLiveClient(clients);
        configureYahooClient(clients);
        configureLinkedInClient(clients);
        configurePayPalClient(clients);
        configureWordPressClient(clients);
        configureBitBucketClient(clients);
        configureHiOrgServerClient(clients);

        return clients;
    }

I open a login page on two tabs on my Chrome. On the first tab, I click on "CAS", on the second tab, I click on "SAML2CLIENT".

The second tab returns an issue: "Authorization Denied".

If I change the source code to the fix I proposed, the error is gone for the same test (two tabs, one for CAS, the other one for SAML):

    @Override
    public Collection<IndirectClient> build() {
        val newClients = new HashSet<IndirectClient>();

        try {
            Thread.sleep(10000);
        } catch (final InterruptedException e) {
            LOGGER.error("Pause interrupted", e);
        }

        configureCasClient(newClients);
        configureFacebookClient(newClients);
        configureOidcClient(newClients);
        configureOAuth20Client(newClients);
        configureSamlClient(newClients);
        configureTwitterClient(newClients);
        configureDropBoxClient(newClients);
        configureFoursquareClient(newClients);
        configureGitHubClient(newClients);
        configureGoogleClient(newClients);
        configureWindowsLiveClient(newClients);
        configureYahooClient(newClients);
        configureLinkedInClient(newClients);
        configurePayPalClient(newClients);
        configureWordPressClient(newClients);
        configureBitBucketClient(newClients);
        configureHiOrgServerClient(newClients);

        this.clients = newClients;
        return newClients;
    }

Does that make sense?

GitHub
CAS server demo to test the authentication delegation - GitHub - casinthecloud/cas-pac4j-oauth-demo: CAS server demo to test the authentication delegation

@mmoayyed
Copy link
Member

mmoayyed commented Dec 2, 2021

It does, yes. Does the fix go away if you switch to a synchronized set, or simply add @synchronized to the build method?

@mmoayyed
Copy link
Member

mmoayyed commented Dec 2, 2021

What I am saying, while I understand the issue, I think you'd run into the same problem with your proposed fix. It's a matter of when, no?

I think a more solid fix perhaps is to sync all ops that deal with the clients fields, and put a lock around that. Either manually using a lock object or using Synchronized.

@leleuj
Copy link
Contributor

leleuj commented Dec 2, 2021

I haven't made the test yet, but I think @synchronized would solve the issue but not a synchronized set.

You're right, we could sync the whole ops that deal with the clients field, it seems the more cautious approach, but I fear performance issue.

With my proposed fix, despite multiple clients sets can be built at the same time, they don't interfere until the assignment to the shared clients field. And assignment is atomic in Java: so this clients field will always get a fully built list of clients.

@mmoayyed
Copy link
Member

mmoayyed commented Dec 2, 2021

I would be OK with that, sure. Do we need to worry about other methods/ops in that component that deal with the clients field? The destroy() function seems safe to me to ignore. If you agree, then I think your proposed patch should be the solution here as well.

@leleuj
Copy link
Contributor

leleuj commented Dec 2, 2021

I guess destroy would be called at shutdown by Spring, I don't think it can happen twice and I don't think it's a big deal, this seems safe indeed.
Will submit PR for 6.4 and 6.5.

@leleuj
Copy link
Contributor

leleuj commented Dec 3, 2021

To follow up on our discussion, I just opened the following PRs: #5319 and #5320

@mmoayyed
Copy link
Member

mmoayyed commented Dec 3, 2021

Thank you. I'll take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants