Performance Issue #21
@timbuchwaldt Agreed, this is a real issue, especially when your user base is growing and those users are opening multiple sessions. I was thinking about how to limit the subset of data fetched from the database, e.g. by filtering only active tokens, or by extending the model to include e.g. the IP address and filtering out all tokens with a specific IP in the first place. This is similar to your approach, but then we lose one piece of functionality: self-cleaning of the token table. What we finally implemented here is pretty simple: we use a cache to resolve tokens:
This is a very simple solution and saves us a lot of hassle. One more place where performance could be improved is setting the cache right after a successful login (at token creation), so cache population isn't delayed until the first token request, but we did not implement that. There was one more idea: to hash the keys in the cache itself, e.g. take the MD5 or SHA-xxx of the token and use that as the cache key. In our situation, though, it would not add any security and would only add load, delaying the whole process.
Limiting to just a subset of the data is basically what I proposed (and @Raphaa implemented in a fork): just add a second column containing part of the token. Limiting to IPs has obvious problems (e.g. taking your device to another place). Self-cleaning seems like a nice property, but it could easily be implemented as a task run by cron, celery beat, or similar systems. Heck, even running… Regarding caching: say one has around 50 Django instances running, so the user might turn up on any of the 50 and the cache would be empty most of the time.
Sure, but this is another deployment task to remember.
Wait, what? Are you talking about a local cache? That shouldn't be used in production… with memcache/redis there is no issue at all.
Using memcache/redis is just sellotaping. The issue (broken by design) would still be there, and it also requires an additional deployment task, just like cleaning the database does. Self-cleaning on every request is also bad design for my taste. Deploying Django is always more than copy-pasting via FTP, so I don't see why that should be unreasonable.
@timbuchwaldt Yes, I agree. No debate needed. Will add it to the dev todos.
Are there any updates on this? We are having major performance issues and so far we've pinpointed the problem to this exact case. I am willing to write a pull request for this; I just need to make sure it aligns with the developers' preferred solution.
@James1345 do you have any updates for this? Can @Raphaa submit a pull request of his fork? |
As much as I appreciate the work that has been done on this app, it's not production-ready because of this bug. @tumbak's solution seems to work fine. One suggestion: you have commented out expired-token deletion but didn't add any test in the same method. It still needs this line imo. Having a management command is helpful but is not obvious to novice users (and easy to forget in general). If we keep deletion where it was originally, what would the overhead be? The possibility of a duplicate token_slice is very small, so filtering by it and then deleting expired tokens doesn't seem to present a performance issue.
@Raphaa Thanks a lot. Just to be sure: for a production system, updating to the #28 version means that all current tokens become invalid, right? Should we then possibly add a data migration to delete all tokens beforehand? Or some way of making devs aware that this will break current tokens? Update: my uncertainty is whether this should then be considered a backwards-incompatible release.
@belugame @Raphaa I wouldn't have thought of invalidating all tokens as a backwards-incompatible fix, provided the migration deletes them all, as that has the same effect as a logout or token expiry. Provided the mechanism for getting/submitting tokens is unchanged, it would be transparent to REST clients connecting to an API using knox authentication, and the migration should be trivial for the maintainer of the API to apply.
As my commit in #28 says:
The first 8 characters of a token will be saved as token_key, so in the future knox won't iterate over all hashes, only over those whose first 8 characters match.
For a smooth migration, this minor release doesn't include the actual performance improvement yet. If a token is valid, its token_key will be filled in, but the lookup still runs over all tokens. In the next (I would say major) release, a breaking change will come, because token_key will become NOT NULL. All tokens reused between this and the upcoming (major) release will be updated automatically and are not affected. Any other, inactive user must reauthenticate. Then the actual performance improvement will take effect.
This is why I've created two pull requests. Admins should leave some time between updating to these two releases, for instance 30 days. Imho it is acceptable that users who didn't use the API for a month have to reauthenticate; users who used the API within those 30 days won't notice any change. I hope you understand my intention.
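The two-phase lookup described in the commit message could be sketched as follows. This is a hedged, standalone illustration, not knox's actual code: a list of tuples stands in for the AuthToken table, and `store_token`/`authenticate` are hypothetical names; only `token_key` and the 8-character prefix come from the commit description:

```python
import hashlib

TOKEN_KEY_LENGTH = 8  # first 8 characters stored in plaintext as token_key

# Hypothetical stand-in for the AuthToken table:
# rows of (token_key, sha512 digest of the full token, user id).
_rows = []


def store_token(token: str, user_id: int) -> None:
    digest = hashlib.sha512(token.encode()).hexdigest()
    _rows.append((token[:TOKEN_KEY_LENGTH], digest, user_id))


def authenticate(token: str):
    digest = hashlib.sha512(token.encode()).hexdigest()
    # Phase 1: cheap filter on the plaintext prefix. In SQL this is an
    # indexed `WHERE token_key = ...` instead of fetching every row.
    prefix = token[:TOKEN_KEY_LENGTH]
    candidates = [row for row in _rows if row[0] == prefix]
    # Phase 2: compare the full hash, almost always against one candidate.
    for _, row_digest, user_id in candidates:
        if row_digest == digest:
            return user_id
    return None
```

The expensive hash comparison now runs over the handful of rows sharing a prefix rather than over every stored token.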
I'm trying to estimate the at-scale performance with your changes; could I possibly have your input? Say for 10 million stored tokens: we retrieve all rows with a matching first 8 characters (expense 1), then iterate over them to find a match (expense 2). How big might these expenses be? I'm not sure about some of the math. I had started using knox, saw the authenticate_credentials() method, and knew it wouldn't scale. Now I'm determining whether I should continue and whether your updates would resolve the problem. Intuitively your fix should resolve the issue; I'm just double-checking.
@rootvar Thanks. Hmm, how can I translate that into the average number of tokens sharing the same first 8 characters, in order to estimate the time taken for the db lookup (expense 1)? Edit: oh, I see what you're saying; the chance of sharing the same first 8 characters should be ~1 in 62^8, i.e. very rarely will there be > 1 token in the query, if I have that right?
I can't give any numbers, but it won't be > 1 very often. I'd say other things in your app are more expensive. ;)
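The estimate in the exchange above can be sanity-checked with a quick back-of-the-envelope calculation, assuming (as the comment does) uniformly random tokens over a 62-character alphabet; the variable names are just for illustration:

```python
# Expected number of *other* tokens sharing a given 8-character prefix,
# assuming uniformly random tokens over a 62-character alphabet.
ALPHABET = 62
PREFIX_LEN = 8
stored_tokens = 10_000_000

prefixes = ALPHABET ** PREFIX_LEN                # 62^8 possible prefixes
expected_extra = (stored_tokens - 1) / prefixes  # extra candidates per lookup

print(f"{prefixes:.3e} possible prefixes")
print(f"expected extra candidates per lookup: {expected_extra:.2e}")
```

Even with 10 million stored tokens, the expected number of extra rows per query is far below one, so the prefix query effectively returns a single row.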
@Raphaa I can't merge your pull request at the moment as the tests are failing, could you look into it? |
As the readme doesn't really state it: django-rest-knox has a major performance issue.
https://github.com/James1345/django-rest-knox/blob/master/knox/auth.py#L50
This line leads to major performance hits once one reaches significant user numbers, as each API request leads to all tokens being sent to the Django instances. Once you reach a few tens of thousands of users and more than a handful of requests per second, the application spends more time receiving/parsing SQL and guessing tokens than doing actual work.
One way to change this would be the following: split the token into 2 parts, one stored in plaintext and used for the lookup, and one stored only as a hash.
This would lead to querying for just a single row (the one matching the plaintext part), then comparing one hash.
Fun fact: while load testing I watched my servers handle > 100 Mbit/s of SQL traffic due to this bug.