Optimize generate_username query #12745

wjordan · 2017-01-18T18:48:26Z

This PR optimizes the UserHelpers#generate_username logic in three ways:

1. Tweak the 'throw dart' logic to be exponentially less likely to hit a conflict, at the expense of slightly longer numeric suffixes on average.
2. Optimize the fallback range-scan query to do parsing and numeric ordering within the DB directly, rather than sending a (potentially large) list of matches to do the parsing/ordering on the frontend.
3. Add an additional regex filter to the initial query, to filter integer suffixes from other usernames.

Not 100% confident that 2 and 3 are the best approach, but I'm reasonably confident they're improvements over the existing logic. However, it's possible we might want to adopt friendly_id instead, and rely on that library's well-maintained conflict-resolution implementation rather than continuing to maintain this logic ourselves.
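To illustrate the first point, here is a minimal sketch of a 'throw dart' loop whose suffix range grows tenfold on each retry. It is not the code in this PR: the method name throw_darts, the in-memory taken set (standing in for the users table), the digit schedule and the max_attempts default are all assumptions made for the example.

```ruby
require 'set'

# Hypothetical sketch: `taken` stands in for a lookup against the users
# table, and the digit schedule (2, 3, 4, ... digits) is an assumption.
def throw_darts(prefix, taken, rng: Random.new, max_attempts: 5)
  max_attempts.times do |attempt|
    # Each retry draws from a range ten times larger than the previous one,
    # so the odds of landing on an existing username shrink quickly.
    limit = 10**(attempt + 2)
    candidate = "#{prefix}#{rng.rand(1...limit)}"
    return candidate unless taken.include?(candidate)
  end
  nil # no free name found; the caller would fall back to the range scan
end

taken = Set.new(%w[will1 will2 will3])
puts throw_darts('will', taken)
```

The trade-off is the one described above: larger ranges make conflicts rarer, but the suffixes get longer on average.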

ashercodeorg · 2017-01-18T19:04:09Z

You'll need to fix the failing tests...

wjordan · 2017-01-18T19:39:44Z

Oh (looking at the failing tests), I didn't realize that the username field has a max length of only 20 characters! This constraint makes me much more hesitant to increase the average number of digits in the first part of this algorithm...
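One possible way to keep longer suffixes within that limit is to shorten the prefix before appending the digits. The sketch below only illustrates the constraint; the helper name with_suffix and the constant USERNAME_MAX_LENGTH are hypothetical, and this is not necessarily how the PR handles it.

```ruby
# The 20-character limit comes from the comment above; the names below
# are made up for this example.
USERNAME_MAX_LENGTH = 20

# Append a numeric suffix, trimming the prefix so the result still fits.
def with_suffix(prefix, suffix)
  digits = suffix.to_s
  "#{prefix[0, USERNAME_MAX_LENGTH - digits.length]}#{digits}"
end

puts with_suffix('will', 7)
puts with_suffix('abcdefghijklmnopqrst', 123)
```

Every extra digit in the suffix costs one character of the prefix once the name is at the limit, which is why longer suffixes are a concern here.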

ashercodeorg · 2017-03-21T13:55:10Z

Is this still active?

wjordan · 2017-09-25T17:27:20Z

The performance issue this PR hopes to address is still at large and getting increasingly problematic over time, so I took another pass at this PR today to address the test failures.

jopolsky

Looks good. I'm assuming that there should be no change to the index(es) used? Did you do an explain on the new query? Also, do you have any timing metrics?

jeremydstone · 2017-09-25T21:27:07Z

LGTM

wjordan · 2017-09-25T21:51:57Z

Here's a basic explain using will as the prefix, which shows it still uses the same index_users_on_username_and_deleted_at index for searching through the rows:

irb(main):050:0> User.where(['username LIKE ? and username RLIKE ?', "#{prefix}%", "^#{prefix}[0-9]+$"]).select("MAX(CAST(SUBSTRING(`username`, #{prefix.length + 1}) as unsigned)) as `id`").explain
D, [2017-09-25T21:31:13.150047 #28694] DEBUG -- :   User Load (154.1ms)  SELECT MAX(CAST(SUBSTRING(`username`, 5) as unsigned)) as `id` FROM `users` WHERE `users`.`deleted_at` IS NULL AND (username LIKE 'will%' and username RLIKE '^will[0-9]+$')
=> EXPLAIN for: SELECT MAX(CAST(SUBSTRING(`username`, 5) as unsigned)) as `id` FROM `users` WHERE `users`.`deleted_at` IS NULL AND (username LIKE 'will%' and username RLIKE '^will[0-9]+$')
+----+-------------+-------+-------+------------------------------------------------------------------+----------------------------------------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys                                                    | key                                    | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+------------------------------------------------------------------+----------------------------------------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | users | range | index_users_on_username_and_deleted_at,index_users_on_deleted_at | index_users_on_username_and_deleted_at | 774     | NULL | 195584 | Using where; Using index |
+----+-------------+-------+-------+------------------------------------------------------------------+----------------------------------------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

Manual timing tests confirm the new query is consistently faster than the original (and in some edge cases where the original hangs for minutes, the new query finishes in ~5 seconds).
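For readers less familiar with the SQL, the following plain-Ruby sketch mirrors what the query above computes: it keeps only names of the form prefix followed by digits, then takes the largest numeric suffix. The method name max_numeric_suffix and the sample list are assumptions for illustration, and MySQL collation behaviour (such as case-insensitive matching) is not modelled.

```ruby
# In-memory equivalent of:
#   SELECT MAX(CAST(SUBSTRING(username, prefix.length + 1) AS UNSIGNED))
#   WHERE username LIKE 'prefix%' AND username RLIKE '^prefix[0-9]+$'
def max_numeric_suffix(usernames, prefix)
  pattern = /\A#{Regexp.escape(prefix)}[0-9]+\z/
  usernames
    .select { |name| name.start_with?(prefix) && name.match?(pattern) }
    # SQL SUBSTRING is 1-based, Ruby string indexing is 0-based.
    .map { |name| name[prefix.length..-1].to_i }
    .max # nil when nothing matches, like MAX() over zero rows
end

names = %w[will will1 will42 william will9x willow7 bill100]
puts max_numeric_suffix(names, 'will')
```

Doing this inside the database means only a single value travels back to the application, instead of every matching username.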

Optimize generate_username query

f04b504

wjordan assigned ewjordan and ashercodeorg Jan 18, 2017

wjordan added 2 commits September 25, 2017 10:15

Merge remote-tracking branch 'origin/staging' into optimize_generate_…

ee0f18f

…user

passing tests

8363972

tweak darts logic for longer suffixes

wjordan unassigned ashercodeorg Sep 25, 2017

wjordan requested review from ewjordan and jopolsky September 25, 2017 17:27

jopolsky approved these changes Sep 25, 2017

View reviewed changes

ewjordan approved these changes Sep 25, 2017

View reviewed changes

wjordan merged commit d29d44e into staging Sep 25, 2017

wjordan deleted the optimize_generate_user branch September 26, 2017 16:36

wjordan added a commit that referenced this pull request Sep 27, 2017

Revert "Optimize generate_username query (#12745)"

1a9724f

This reverts commit d29d44e.

jeremydstone pushed a commit that referenced this pull request Sep 27, 2017

Merge pull request #17981 from code-dot-org/revert_12745

3ac58bc

Revert "Optimize generate_username query (#12745)"

wjordan mentioned this pull request Sep 30, 2017

Optimize generate_username query #18061

Merged
