public-keys endpoint causing high db load #1145
@nulian what version of FusionAuth are you running? Would love to hear if you apply indices which make this faster. If you do, please share them. Also, another option would be to put a caching proxy in front of that API; I'm not sure how often your public keys change, but a proxy that caches them even for 10 minutes would help. Related issue: #374
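The caching-proxy suggestion can be sketched as an nginx configuration (the upstream name, cache path, and zone name are placeholders for illustration, not a tested FusionAuth deployment):

```nginx
# Cache responses from the public-keys endpoint for 10 minutes.
# "fusionauth_backend" and the cache path are assumptions.
proxy_cache_path /var/cache/nginx/fa_keys levels=1:2
                 keys_zone=fa_keys:1m max_size=10m inactive=10m;

server {
    listen 80;

    location /api/jwt/public-keys {
        proxy_cache fa_keys;
        proxy_cache_valid 200 10m;       # serve cached copies for 10 minutes
        proxy_cache_key "$request_uri";  # includes ?applicationId=... in the key
        proxy_pass http://fusionauth_backend;
    }

    location / {
        proxy_pass http://fusionauth_backend;
    }
}
```

With this in place, only one request per 10-minute window reaches the database-backed endpoint; everything else is served from the proxy cache.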
We are doing a single query on the `keys` table. As a side note, we don't support modifications to our schema, so if you are not using our support, then go for it! :-) But we can't support you once you have modified the schema.

We can look at this one, but a better choice would be to use the JSON Web Key Set (JWKS) endpoint. It contains the public keys as well, but in an industry-standard format that can be consumed by third-party libraries. We wrote the Public Key endpoint long before we supported JWKS.

How many applications do you have, and how many keys do you have managed in Key Master?
I'm a colleague of @nulian and can provide some more details. We discussed the problem internally, and a different team is going to look into the JSON Web Key Set (JWKS) endpoint to see if that's a better solution for us (but that will probably take some time to implement). In the meantime, some details about our setup:

- Current FusionAuth version: 1.20.0
- We run FusionAuth on AKS, currently with 6 nodes: 6 FusionAuth pods distributed over those nodes, limited to 10 GB of memory per pod.
- For the database we use Azure Database for MySQL (version 5.7.32), configured with 8 vCores and limited to 1500 IOPS. (Before we ran into problems with this query, we ran on 2 vCores.)
- Current number of applications: ~11000, soon growing to around 45000 (49 MB data + index size).

The troubling query is the one that fetches all keys ordered by `insert_instant`.
When I run an EXPLAIN on this query, it uses filesort, meaning it can't use an index to perform the sorting. Profiling shows the query currently takes around 28-30 seconds to complete, and in our case the same query can be running multiple times at once. We do have a caching mechanism implemented that relieves some of the stress on the DB server, but we still think adding indexes on the right columns could help (we still have to test this in our test environment). We will come back with those results when we have them.
Thanks for the details @Johpie and @nulian, looking forward to the results of your testing. My guess is that this will always do a filesort because the app is doing a table scan. I'm not sure why we need to sort by `insert_instant`. Are these 11000 applications all in one tenant, or are they distributed across tenants?

As an aside, would your team/company be interested in being featured on the FusionAuth blog? It sure sounds like you are doing interesting things with FusionAuth. We do emailed interviews and publish posts like this one: https://fusionauth.io/blog/2020/11/18/reconinfosec-fusionauth/ If so, please email me at dan at fusionauth dot io and I'll send you over the questions.
Linked to #374 - that issue is going to have to be on a short list - shortly. :-) |
All the apps are in 1 tenant because we have 1 shared user database. I think for the public-keys endpoint it would already make a big difference if it used the key IDs it looks up from the application table and only queried for those in the keys table. Currently it fetches the entire keys table, when it could do a limited search for just the keys it needs.
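The suggested change can be sketched with an in-memory SQLite database. Note that the table and column names here are invented for illustration; FusionAuth's real schema differs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE application (id TEXT PRIMARY KEY, signing_key_id TEXT);
    CREATE TABLE signing_key (id TEXT PRIMARY KEY, public_key TEXT);
""")
conn.executemany("INSERT INTO signing_key VALUES (?, ?)",
                 [(f"key-{i}", f"PEM-{i}") for i in range(1000)])
conn.execute("INSERT INTO application VALUES ('app-1', 'key-7')")

# Behaviour described in the thread: fetch every key, filter client side.
all_keys = conn.execute("SELECT id, public_key FROM signing_key").fetchall()

# Suggested behaviour: look up the key id(s) the application references,
# then fetch only those rows.
needed = conn.execute("""
    SELECT k.id, k.public_key
      FROM signing_key k
      JOIN application a ON a.signing_key_id = k.id
     WHERE a.id = ?
""", ("app-1",)).fetchall()

print(len(all_keys), needed)  # 1000 rows transferred vs. the one key the app uses
```

At ~45000 applications the difference between transferring the whole key table and one row per request is exactly the wire-transfer cost discussed below.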
After some investigation in our test environment, we concluded that adding an index on the keys table is not going to help. Removing the ORDER BY clause didn't help either. We think the sheer size of the table being transferred every time the query is called is what's killing our performance right now. Not sure if we should close this issue, but I'll leave that up to you guys :-)
Did you test the JWKS endpoint as well? |
Had a look; it's a bit faster: 850 ms vs 3.5 s.
JWKS is a spec; as far as I know it does not define a way to request a key by kid. We can keep this issue open; I'd like to make the public key API a bit faster at this scale. However, neither of these endpoints is really designed to be hit on every request. I would suggest identifying a way to get what you need from the response, building your JWT verifier or whatever you need, and then not asking for the public keys again until you know you have added another client or you get a cache miss on your JWT verifier by kid. Because even if we cache this, the HTTP response payload is going to be large enough that the transfer over the wire will add noticeable latency.
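The client-side pattern suggested here (keep keys cached by kid, and refetch the set only on a miss) can be sketched as follows; `fetch_keys` is a stand-in for an HTTP GET of the JWKS or public-keys endpoint:

```python
class KeyCache:
    """Cache public keys by kid; refetch the full set only on a miss."""

    def __init__(self, fetch_keys):
        # fetch_keys() stands in for an HTTP call to the JWKS /
        # public-keys endpoint and returns a {kid: public_key} dict.
        self._fetch_keys = fetch_keys
        self._keys = {}

    def get(self, kid):
        if kid not in self._keys:
            # Cache miss: a new signing key may have been added,
            # so refresh the full set once.
            self._keys = self._fetch_keys()
        return self._keys.get(kid)  # None if the kid really is unknown


# Usage: the endpoint is hit once; every later verification is served
# from memory until an unknown kid appears.
calls = []
def fake_fetch():
    calls.append(1)
    return {"abc": "PEM-abc", "def": "PEM-def"}

cache = KeyCache(fake_fetch)
cache.get("abc")
cache.get("def")   # no extra fetch: already cached
print(len(calls))  # 1
```

In a real verifier the `None` case would be treated as "reject the token": if the kid is still unknown after a refetch, the key genuinely does not exist.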
That is mostly our current implementation with the public-keys endpoint. |
Ok, great, thanks for the update @nulian . We can use this ticket to review this endpoint and see what we can do to make it perform much better under this type of scale. Thanks for using FusionAuth! |
This should be fixed in upcoming |
We have a FusionAuth setup with a lot of applications, and with even more keys, that makes the /api/jwt/public-keys?applicationId=11111111-1111-1111-1111-111111111111 endpoint very slow.

Monitoring the SQL queries FusionAuth runs, I saw that it queries applications with `active = true`, which isn't an indexed column. After that it seems to query all keys ordered by `insert_instant`, which also isn't indexed. It then probably filters out the correct key client side, which could be optimized by querying for just the keys needed.

We can probably partly fix the performance on our side by adding some indexes, but it would be nice to have a more optimized query for fetching the keys as well, so that we don't have to make custom alterations to the FusionAuth database.