Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize generate_username query #12745

Merged
merged 3 commits into from
Sep 25, 2017
Merged

Optimize generate_username query #12745

merged 3 commits into from
Sep 25, 2017

Conversation

wjordan
Copy link
Contributor

@wjordan wjordan commented Jan 18, 2017

This PR optimizes the UserHelpers#generate_username logic in three ways:

  1. tweak the 'throw dart' logic to be exponentially less likely to hit a conflict, at the expense of slightly longer numeric suffixes on average.
  2. Optimize the fallback range-scan query to do parsing and numeric ordering within the DB directly, rather than sending a (potentially large) list of matches to do the parsing/ordering on the frontend.
  3. add an additional regex filter to the initial query, to filter integer suffixes from other usernames.

Not 100% confident that 2 and 3 are the best approach, but I'm reasonably confident they're improvements over the existing logic. However, it's possible we might want to adopt friendly_id instead, and rely on that library's well-maintained conflict-resolution implementation rather than continuing to maintain this logic ourselves.

@ashercodeorg
Copy link
Contributor

You'll need to fix the failing tests...

@wjordan
Copy link
Contributor Author

wjordan commented Jan 18, 2017

oh, (looking at the failing tests) I didn't realize that the username field has a max length of only 20 characters! This constraint makes me much more hesitant to increase the average number of digits in the first part of this algorithm...

@ashercodeorg
Copy link
Contributor

Is this still active?

@wjordan
Copy link
Contributor Author

wjordan commented Sep 25, 2017

The performance issue this PR hopes to address is still at large and getting increasingly problematic over time, so I took another pass at this PR today to address the test failures.

Copy link

@jopolsky jopolsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'm assuming that there should be no change to the index(es) used? Did you do an explain on the new query? Also, do you have any timing metrics?

@jeremydstone
Copy link

LGTM

@wjordan
Copy link
Contributor Author

wjordan commented Sep 25, 2017

Here's a basic explain using will as the prefix, which shows it still uses the same index_users_on_username_and_deleted_at index for searching through the rows:

irb(main):050:0> User.where(['username LIKE ? and username RLIKE ?', "#{prefix}%", "^#{prefix}[0-9]+$"]).select("MAX(CAST(SUBSTRING(`username`, #{prefix.length + 1}) as unsigned)) as `id`").explain
D, [2017-09-25T21:31:13.150047 #28694] DEBUG -- :   User Load (154.1ms)  SELECT MAX(CAST(SUBSTRING(`username`, 5) as unsigned)) as `id` FROM `users` WHERE `users`.`deleted_at` IS NULL AND (username LIKE 'will%' and username RLIKE '^will[0-9]+$')
=> EXPLAIN for: SELECT MAX(CAST(SUBSTRING(`username`, 5) as unsigned)) as `id` FROM `users` WHERE `users`.`deleted_at` IS NULL AND (username LIKE 'will%' and username RLIKE '^will[0-9]+$')
+----+-------------+-------+-------+------------------------------------------------------------------+----------------------------------------+---------+------+--------+--------------------------+
| id | select_type | table | type  | possible_keys                                                    | key                                    | key_len | ref  | rows   | Extra                    |
+----+-------------+-------+-------+------------------------------------------------------------------+----------------------------------------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | users | range | index_users_on_username_and_deleted_at,index_users_on_deleted_at | index_users_on_username_and_deleted_at | 774     | NULL | 195584 | Using where; Using index |
+----+-------------+-------+-------+------------------------------------------------------------------+----------------------------------------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

Manual timing tests confirm the new query is consistently faster than the original (and in some edge cases where the original hangs for minutes, the new query finishes in ~5 seconds).

@wjordan wjordan merged commit d29d44e into staging Sep 25, 2017
@wjordan wjordan deleted the optimize_generate_user branch September 26, 2017 16:36
wjordan added a commit that referenced this pull request Sep 27, 2017
jeremydstone pushed a commit that referenced this pull request Sep 27, 2017
Revert "Optimize generate_username query (#12745)"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants