public-keys endpoint causing high db load #1145

nulian · 2021-03-16T13:17:20Z

We have a fusionauth setup with a lot of applications and with even more keys that makes the /api/jwt/public-keys?applicationId=11111111-1111-1111-1111-111111111111 endpoint very slow.

Was monitoring the sql queries that fusionauth does and it queries applications with active = true which isn't an indexed column.

After that it seems to query all keys ordered by insert_instant which also isn't indexed.
And then probably filter client side the correct key which probably could be optimized by searching for the just the keys needed.

We can probably partly fix the performance on our side by adding some indexes.
But would be nice for more optimized query on fetching the keys also and that we don't have to do custom alteration on the fusionauth database.

The text was updated successfully, but these errors were encountered:

mooreds · 2021-03-16T14:57:20Z

@nulian what version of FusionAuth are you running?

Would love to hear if you apply indices which make this faster. If you do, please share them.

Also, another option would be to put a caching proxy in front of that API; I'm not sure how often your public keys change, but a proxy that caches them even for 10 minutes would help.

Related issue: #374

robotdan · 2021-03-16T19:20:18Z

We are doing a single query on the applications table by a primary key - I don't think indexing active is necessary. If you find otherwise please share your results.

As a side note, we don't support modifications to our schema, so if you are not using our support, then go for it! :-) But we can't support you once you have modified the schema.

We can look at this one, but a better choice would be to use the JSON Web Key Set endpoint (JWKS).
https://fusionauth.io/docs/v1/tech/oauth/endpoints/#json-web-key-set-jwks

This will contain the public keys as well, but in an industry standard format that can be consumed by third party libraries, etc. We wrote the Public Key endpoint long before we supported JWKS.

How many applications do you have, and how many keys do you have managed in Key Master?

Johpie · 2021-03-18T10:23:56Z

Hi @mooreds and @robotdan,

I'm a colleague of @nulian and can provide some more details. We discussed the problem internally and a different team is going to look into the JSON Web Key Set (JWKS) to see if that's a better solution for us (but probably gonna take some time to implement).

In the meanwhile I can provide some more details of our setup:

Current fusionauth version: 1.20.0
Current elasticsearch version: 7.6.2

We are running fusionauth on AKS, which currently runs with 6 nodes. We are running 6 fusionauth pods distributed over those six nodes which are limited on 10 gb memory per pod. For the database we use "Azure Database for MySQL server" (version 5.7.32) configured with 8 vCores and limited on 1500 iops. (Before we ran into problems with this query we ran on 2 vCores).

Current amount of applications: ~11000 (This is going to grow to around 45000 soon) (49 mb data+index size)
Current amount of keys: ~33500 (198 mb data+index size)

The troubling query:

SELECT
     k.id                  AS k_id,
     k.algorithm           AS k_algorithm,
     k.certificate         AS k_certificate,
     k.expiration_instant  AS k_expiration_instant,
     k.insert_instant      AS k_insert_instant,
     k.issuer              AS k_issuer,
     k.kid                 AS k_kid,
     k.last_update_instant AS k_last_update_instant,
     k.name                AS k_name,
     k.private_key         AS k_private_key,
     k.public_key          AS k_public_key,
     k.secret              AS k_secret,
     k.type                AS k_type
     FROM `keys` as k
    ORDER BY k.insert_instant DESC;

When I run a EXPLAIN on this query, it is using filesort, meaning that it can't use an index to perform the sorting. When profiling this query it currently takes around 28~30 seconds to complete. And in our case it can happen that this same query is running multiple times at once.

We do have a caching mechanism implemented which releases the stress a bit on the DB server, but we still think adding indexes on the right column could help (we still have to test this in our test environment). Will come back with those results when we have them.

mooreds · 2021-03-18T13:50:29Z

Thanks for the details @Johpie and @nulian , looking forward to the results of your testing. My guess is that this will always do a filesort because the app is doing a tablescan. I'm not sure why we need to sort by insert_instant, tbh.

Are these 11000 applications all in one tenant or are they distributed across tenants?

As an aside, would your team/company be interested in being featured on the FusionAuth blog? It sure sounds like you are doing interesting things with FusionAuth. We do emailed interviews and publish posts like this one: https://fusionauth.io/blog/2020/11/18/reconinfosec-fusionauth/ If so, please email me at dan at fusionauth dot io and I'll send you over the questions.

robotdan · 2021-03-18T14:17:15Z

Linked to #374 - that issue is going to have to be on a short list - shortly. :-)

nulian · 2021-03-18T16:06:47Z

All the apps are in 1 tenant because we have 1 shared user database.

I think for the public-keys it would already make a big different if it uses the keys it looks up from the application table and only query for those in the keys table. Currently it still fetches the entire table while it could have just done a limited keys search in the keys table.

Johpie · 2021-03-22T11:19:17Z

After some investigation in our test environment, we concluded adding a index on the keys table is not going to help. Also removing the order by clause didn't help. We think it's just the sheer size of the table being transferred every time the query is called is killing our performance right now.

Not sure if we should close this issue, but I'll leave that up to you guys :-)

mooreds · 2021-03-22T11:29:59Z

Did you test the JWKS endpoint as well?

nulian · 2021-03-22T12:15:05Z

Had a look it's a bit faster 850ms vs 3.5s.
Biggest downside to that is everytime a customer on our platform creates a new app we will have to redownload that entire jwks key registry again.
It's 19.6MB at the moment and will probably keep growing.
So a filter would help a lot like that we could get the jwks of a certain kid or applicationId.

robotdan · 2021-03-22T13:42:01Z

JWKS is a spec, as far as I know it does not define a way to request a key by kid.

We can keep this issue open, I’d like to make the public key api a bit faster at this scale.

However, neither of these endpoints are really designed to hit on a request scope.

I would suggest identifying a way to get what you need from the response, build your JWT verifier or whatever you need and then don’t ask for the public keys again until you know you have added another client or you get a cache miss on your JWT verifier by kid.

Because even if we cache this, the HTTP response payload is going to be large enough that the transfer over the wire is going to add noticeable latency.

nulian · 2021-03-22T14:35:08Z

That is mostly our current implementation with the public-keys endpoint.
We request it the first time and then cache it till it no longer matches to reduce the load on fusionauth.
Though we do it per application instead of requesting the entire list.
And current caching implementation is per pod to reduce the complexity.

robotdan · 2021-03-22T15:49:41Z

Ok, great, thanks for the update @nulian . We can use this ticket to review this endpoint and see what we can do to make it perform much better under this type of scale. Thanks for using FusionAuth!

robotdan · 2021-09-24T18:08:04Z

This should be fixed in upcoming 1.30.2 release. We made some modifications to the way this endpoint is performing application and tenant lookups.

mooreds added the needs more info label Mar 16, 2021

robotdan added enhancement New feature or request and removed needs more info labels Mar 16, 2021

robotdan mentioned this issue Mar 18, 2021

Pagination for everyone! And all things! #374

Open

18 tasks

robotdan added the scale label Mar 22, 2021

robotdan self-assigned this Sep 24, 2021

robotdan added this to the 1.30.2 milestone Sep 24, 2021

robotdan closed this as completed Oct 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

public-keys endpoint causing high db load #1145

public-keys endpoint causing high db load #1145

nulian commented Mar 16, 2021 •

edited

Loading

mooreds commented Mar 16, 2021

robotdan commented Mar 16, 2021 •

edited

Loading

Johpie commented Mar 18, 2021 •

edited

Loading

mooreds commented Mar 18, 2021

robotdan commented Mar 18, 2021

nulian commented Mar 18, 2021

Johpie commented Mar 22, 2021

mooreds commented Mar 22, 2021

nulian commented Mar 22, 2021 •

edited

Loading

robotdan commented Mar 22, 2021

nulian commented Mar 22, 2021 •

edited

Loading

robotdan commented Mar 22, 2021

robotdan commented Sep 24, 2021

public-keys endpoint causing high db load #1145

public-keys endpoint causing high db load #1145

Comments

nulian commented Mar 16, 2021 • edited Loading

mooreds commented Mar 16, 2021

robotdan commented Mar 16, 2021 • edited Loading

Johpie commented Mar 18, 2021 • edited Loading

mooreds commented Mar 18, 2021

robotdan commented Mar 18, 2021

nulian commented Mar 18, 2021

Johpie commented Mar 22, 2021

mooreds commented Mar 22, 2021

nulian commented Mar 22, 2021 • edited Loading

robotdan commented Mar 22, 2021

nulian commented Mar 22, 2021 • edited Loading

robotdan commented Mar 22, 2021

robotdan commented Sep 24, 2021

nulian commented Mar 16, 2021 •

edited

Loading

robotdan commented Mar 16, 2021 •

edited

Loading

Johpie commented Mar 18, 2021 •

edited

Loading

nulian commented Mar 22, 2021 •

edited

Loading

nulian commented Mar 22, 2021 •

edited

Loading