
Performance Issue #21

Closed

timbuchwaldt opened this issue Aug 17, 2016 · 17 comments · Fixed by #44

Comments

@timbuchwaldt

timbuchwaldt commented Aug 17, 2016

As the readme doesn't mention it: django-rest-knox has a major performance issue.

https://github.com/James1345/django-rest-knox/blob/master/knox/auth.py#L50

This line causes major performance hits once one reaches significant user numbers, as each API request leads to all tokens being sent to the Django instances. Once you reach a few tens of thousands of users and more than a handful of requests per second, the application spends more time receiving/parsing SQL and guessing tokens than doing actual work.
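The pattern in question looks roughly like this (a paraphrase for illustration, not a verbatim copy of knox/auth.py; `token_matches` stands in for knox's salt-and-hash comparison):

```python
# Paraphrase of the hot path being criticized: every request iterates
# over the ENTIRE token table, hashing and comparing row by row.
def authenticate_credentials(token, AuthToken, token_matches):
    for auth_token in AuthToken.objects.all():  # fetches ALL tokens, O(N)
        if token_matches(auth_token, token):    # hash + compare per row
            return auth_token.user, auth_token
    return None
```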

One way to change this would be the following: split the token into two parts:

  • Part one is an identifier stored in its own database column, e.g. a random string
  • Part two is the token knox currently uses

This would mean querying for only a single row (the one matching part one) and then comparing a single hash, as sketched below.
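A minimal sketch of the idea, assuming an 8-character prefix and a SHA-512 digest (both illustrative choices, not necessarily what knox would ship):

```python
# Sketch of the two-part token scheme; names and parameters are assumptions.
import binascii
import hashlib
import hmac
import os

TOKEN_KEY_LENGTH = 8  # assumed length of the indexed prefix ("part one")

def create_token():
    """Return (token_for_client, token_key_to_index, digest_to_store)."""
    token = binascii.hexlify(os.urandom(32)).decode()
    digest = hashlib.sha512(token.encode()).hexdigest()
    return token, token[:TOKEN_KEY_LENGTH], digest

def authenticate(token, candidates_for_key):
    """candidates_for_key(token_key) -> stored digests for that indexed key."""
    expected = hashlib.sha512(token.encode()).hexdigest()
    # The indexed prefix narrows the lookup to (almost always) one row;
    # only those rows' digests are compared, in constant time.
    for digest in candidates_for_key(token[:TOKEN_KEY_LENGTH]):
        if hmac.compare_digest(expected, digest):
            return True
    return False
```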

Fun fact: while load testing I watched my servers handle more than 100 Mbit/s of SQL traffic due to this bug.

@jerzyk

jerzyk commented Aug 26, 2016

@timbuchwaldt agreed, this is a real issue, especially when your user base is growing and those users are opening multiple sessions

I was thinking about how to limit the subset of data fetched from the database, e.g. by filtering only active tokens, or by extending the model to include e.g. the IP address and filtering all tokens by a specific IP in the first place

this is similar to your approach, but then we lose one piece of functionality - self-cleaning of the token table

finally, what we implemented here is pretty simple: we use the cache to resolve tokens:

  • when a user accesses the site for the first time, their token is not in the cache, so there is a call to the database; when successful, the cache entry is set
  • subsequent calls get the data from the cache itself, which is very fast as it is a direct key access

this is a very simple solution and saves us a lot of hassle
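A rough sketch of that flow (illustrative, assuming a shared memcached/redis cache configured as Django's default cache; `lookup_in_db` stands for the existing database resolution and the timeout is an assumption):

```python
# Sketch of cache-backed token resolution (not the actual production code).
from django.core.cache import cache

CACHE_TIMEOUT = 3600  # assumption: entries live for one hour

def resolve_token(token, lookup_in_db):
    """lookup_in_db(token) is the existing (slow) database resolution."""
    result = cache.get(token)
    if result is None:
        result = lookup_in_db(token)   # slow path, first request only
        if result is not None:
            cache.set(token, result, CACHE_TIMEOUT)
    return result                      # subsequent calls: direct key access
```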

one more place where performance could be improved is setting the cache after a successful login (at token creation), so the cache set is not delayed until the first token request, but we have not implemented that.

there was one more idea - to hash the keys in the cache itself, e.g. take the md5 or sha-xxx of the token and use that as the cache key - but in our situation it would not bring any additional security and would add some load, delaying the whole process.

@timbuchwaldt
Author

Limiting to just a subset of the data is basically what I proposed (and @Raphaa implemented in a fork): just add a second column containing part of the token. Limiting by IP has obvious problems (e.g. taking your device to another place).

Self-cleaning seems like a nice property, but it could easily be implemented as a task run by cron, Celery beat or similar systems. Heck, even running DELETE FROM tokens WHERE created_at < DATEADD(day, -30, GETDATE()) on a percentage of requests would do the trick. But passing around all the data on every request is just slow.
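For illustration, such a periodic cleanup could be as small as this (a sketch assuming knox's AuthToken model with its expires field; the Celery task wiring is hypothetical):

```python
# Hypothetical periodic cleanup task; not part of knox itself.
from celery import shared_task
from django.utils import timezone

@shared_task
def cleanup_expired_tokens():
    from knox.models import AuthToken  # assumes the model exposes `expires`
    deleted, _ = AuthToken.objects.filter(expires__lt=timezone.now()).delete()
    return deleted  # schedule e.g. daily via Celery beat or cron
```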

Regarding caching: say one has 50 Django instances running - a user might turn up on any of the 50, so the cache would be empty most of the time.

@jerzyk

jerzyk commented Aug 26, 2016

Self-cleaning seems like a nice property, but it could easily be implemented as a task run by cron, Celery beat or similar systems. Heck, even running DELETE FROM tokens WHERE created_at < DATEADD(day, -30, GETDATE()) on a percentage of requests would do the trick. But passing around all the data on every request is just slow.

sure, but this is another deployment task to remember

Regarding caching: say one has 50 Django instances running - a user might turn up on any of the 50, so the cache would be empty most of the time.

wait, what? are you talking about a local cache? that shouldn't be used in production... with memcached/redis there is no issue at all

@jasjukaitis
Contributor

Using memcached/redis is just sellotaping over the problem. The issue (broken by design) would still be there, and it would also require an additional deployment task, just like cleaning the database. Self-cleaning on every request is also bad design for my taste. Deploying Django is always more than copy-pasting via FTP, so I don't know why that should be unreasonable.

@James1345
Member

@timbuchwaldt Yes, I agree - no debate needed. Will add it to the dev todos.

@tumbak

tumbak commented Oct 30, 2016

Are there any updates on this? We are having major performance issues, and so far we've pinpointed the issue to this exact case.

I am willing to write a pull request for this, just need to make sure it aligns with the developers' suggested solution.

@mheppner

@James1345 do you have any updates for this? Can @Raphaa submit a pull request of his fork?

@rootvar

rootvar commented Dec 1, 2016

As much as I appreciate the work that has been done on this app, it's not production-ready because of this bug.

@tumbak's solution seems to work fine. One suggestion: you commented out the expired-token deletion but didn't add any check in its place in the same method. It still needs this line imo:
if auth_token.expires < timezone.now():

Having a management command is helpful, but it is not obvious to novice users (and easy to forget in general). If we keep deletion where it was originally, what would the overhead be? The probability of a duplicate token_slice is very small, so filtering by it and then deleting expired tokens doesn't seem to present a performance issue.
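In context, the suggested check would look roughly like this (a sketch; exception handling simplified, and knox's expires field may be nullable):

```python
# Sketch of the expiry check rootvar suggests keeping in the auth path.
from django.utils import timezone
from rest_framework import exceptions

def enforce_expiry(auth_token):
    if auth_token.expires is not None and auth_token.expires < timezone.now():
        auth_token.delete()  # keep the self-cleaning behaviour on access
        raise exceptions.AuthenticationFailed('Token has expired.')
```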

@jasjukaitis
Contributor

I've added my performance fix in two pull requests (two versions): #28 and #29

@belugame
Collaborator

belugame commented Dec 4, 2016

@Raphaa Thanks a lot. Just to be sure: for a production system, updating to the #28 version means that all current tokens become invalid, right? Should we then possibly add a data migration to delete all tokens first? Or some way of making devs aware that this will break current tokens.

Update: my uncertainty is whether this should then be considered a backwards-incompatible release.

@James1345
Member

@belugame @Raphaa I wouldn't consider invalidating all tokens a backwards-incompatible fix, provided the migration deletes them all, as that has the same effect as a logout or an expiring token. Provided the mechanism for getting/submitting tokens is unchanged, it would be transparent to REST clients connecting to an API that uses knox authentication, and the migration should be trivial for the maintainer of the API to apply.
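Such a data migration could be as simple as the following sketch (hypothetical migration name and dependency; knox's actual migration history may differ):

```python
# Hypothetical data migration that wipes all tokens on upgrade.
from django.db import migrations

def delete_all_tokens(apps, schema_editor):
    AuthToken = apps.get_model('knox', 'AuthToken')
    AuthToken.objects.all().delete()  # clients simply log in again

class Migration(migrations.Migration):
    dependencies = [('knox', '0001_initial')]
    operations = [
        migrations.RunPython(delete_all_tokens, migrations.RunPython.noop),
    ]
```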

@jasjukaitis
Contributor

jasjukaitis commented Dec 8, 2016 via email

@bf0

bf0 commented Dec 25, 2016

@Raphaa

I'm trying to estimate the at-scale performance with your changes; could I possibly have your input? Say for 10 million stored tokens: we retrieve all tokens matching the first 8 characters (expense 1), then iterate over them to find a match (expense 2). How big might these expenses be? I'm not sure about some of the math.

I was starting to use knox, saw the authenticate_credentials() method and knew it wouldn't scale. Now I'm determining whether I should continue / whether your updates would resolve the problem. Intuitively your fix should resolve the issue; I'm just double-checking.

@rootvar

rootvar commented Dec 25, 2016

@bf0

not @Raphaa, but it should be something like 62^8 possible prefixes (62 alphanumeric characters to the 8th power)

@bf0

bf0 commented Dec 25, 2016

@rootvar Thanks. Hmm, how can I translate that into the average number of tokens sharing the same first 8 characters, in order to estimate the time taken for the db lookup (expense 1)?

edit - oh, I see what you're saying: the chance of sharing the same first 8 characters should be ~1:62^8, i.e. very rarely will there be > 1 token in the query, if I have that right?
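As a back-of-envelope check (my own arithmetic, under the 62-character-alphabet assumption above): with N stored tokens, the expected number of other tokens sharing a given 8-character prefix is roughly (N - 1) / 62^8:

```python
# Back-of-envelope for the 10-million-token scenario.
N = 10_000_000                # stored tokens
keyspace = 62 ** 8            # possible 8-char alphanumeric prefixes
print(keyspace)               # 218340105584896 (~2.2e14)
print((N - 1) / keyspace)     # ~4.6e-08 expected extra rows per lookup
```

So a prefix query almost always returns exactly one row.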

@jasjukaitis
Contributor

I can't give any numbers, but it won't often be > 1. I'd say other things in your app are more expensive. ;)

@belugame
Collaborator

@Raphaa I can't merge your pull request at the moment as the tests are failing - could you look into it?
