Optimize generate_username query #18061

Merged
merged 1 commit into from Sep 30, 2017

@wjordan wjordan commented Sep 30, 2017

Followup/fix to #12745.

This PR uses a faster and more accurate MySQL query for generating pseudo-sequential usernames.

While the query in #12745 selected an integer one greater than the maximum of all found integers, this query selects the first gap in the integer sequence. This is more reliable, because a single, random high-value integer no longer forces all future pseudo-sequential numbers to be strictly greater than it, and it also turns out to be a faster query.
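For illustration only, the difference between the two strategies can be sketched in a few lines of Python. This is not the code in this PR, which does the work in MySQL; the function names and sample usernames are hypothetical, and `sorted()` is an assumed stand-in for the order in which an index scan over `username` would visit rows.

```python
import re


def _suffix_pattern(prefix):
    # Matches the prefix followed by digits only, like RLIKE '^coder[0-9]+$'.
    return re.compile(re.escape(prefix) + r"([0-9]+)")


def max_plus_one(usernames, prefix):
    """Previous strategy: one greater than the largest numeric suffix."""
    pattern = _suffix_pattern(prefix)
    suffixes = []
    for name in usernames:
        match = pattern.fullmatch(name)
        if match:
            suffixes.append(int(match.group(1)))
    return max(suffixes) + 1 if suffixes else None


def first_gap(usernames, prefix):
    """New strategy: successor of an existing suffix that is not yet taken."""
    pattern = _suffix_pattern(prefix)
    taken = set(usernames)
    # Assumption: string order approximates the index scan order.
    for name in sorted(usernames):
        match = pattern.fullmatch(name)
        if match is None:
            continue
        candidate = int(match.group(1)) + 1
        if f"{prefix}{candidate}" not in taken:
            return candidate
    return None


# Hypothetical data: one stray high-value username among small ones.
users = ["coder1", "coder2", "coder3", "coder756735909813", "coderx"]
print(max_plus_one(users, "coder"))  # 756735909814
print(first_gap(users, "coder"))     # 4
```

With the stray `coder756735909813` present, the max-based approach jumps past it, while the gap-based approach keeps handing out short numbers.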

Compare:

Previous PR query (super-long number, 4.29 sec):

mysql> SELECT  MAX(CAST(SUBSTRING(`username`, 6) as unsigned)) as `id` FROM `users` WHERE `users`.`deleted_at` IS NULL AND (username LIKE 'coder%' and username RLIKE '^coder[0-9]+$') ORDER BY `users`.`id` ASC LIMIT 1;
+--------------+
| id           |
+--------------+
| 756735909814 |
+--------------+
1 row in set (4.29 sec)

Current PR query (shortest-available number, 2.58 sec):

mysql> SELECT CAST(SUBSTRING(username, 7) as unsigned) + 1 as id  FROM users u   WHERE username LIKE "coder%"     AND username RLIKE "^coder[0-9]+$"      AND NOT EXISTS (       SELECT 1       FROM users u2       WHERE u2.username = CONCAT("coder", CAST(SUBSTRING(u.username, 7) as unsigned) + 1)     )   LIMIT 1;
+------+
| id   |
+------+
| 1125 |
+------+
1 row in set (2.58 sec)

This will fix the issue encountered in our UI tests. The previous query quickly ran into an overrun issue: the max value "999999[...etc...]" was selected and increased by 1 to "1000000[...etc...]", which makes the string one character longer. If the 999-etc string was already at the maximum allowed username length, the newly generated username would fail validation. The new query will only encounter this issue if all of the gaps in the sequence are also exhausted, which is much less likely. I've also added a raise for this case, so we should be able to detect this error condition more easily.
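A rough Python sketch of the length problem and the explicit raise follows. The length limit, exception class, and function name are all hypothetical and chosen only to make the example small; the actual check in this PR is not shown here.

```python
# Hypothetical limit, picked so that "coder999" is exactly at the maximum.
MAX_USERNAME_LENGTH = 8


class UsernameTooLongError(ValueError):
    """Raised when a generated username exceeds the allowed length."""


def build_username(prefix, number, max_length=MAX_USERNAME_LENGTH):
    username = f"{prefix}{number}"
    if len(username) > max_length:
        # Fail loudly instead of returning a name that would fail validation.
        raise UsernameTooLongError(
            f"generated username {username!r} exceeds {max_length} characters"
        )
    return username


print(build_username("coder", 999))  # coder999

try:
    # 999 + 1 adds a digit, so the result is one character too long.
    build_username("coder", 999 + 1)
except UsernameTooLongError as error:
    print(f"detected: {error}")
```

Raising at generation time surfaces the exhausted-sequence case directly, rather than as a less obvious validation failure later on.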


wjordan commented Sep 30, 2017

EXPLAIN output for this query:

mysql> explain SELECT CAST(SUBSTRING(username, 7) as unsigned) + 1 as id  FROM users u   WHERE username LIKE "coder%"     AND username RLIKE "^coder[0-9]+$"      AND NOT EXISTS (       SELECT 1       FROM users u2       WHERE u2.username = CONCAT("coder", CAST(SUBSTRING(u.username, 7) as unsigned) + 1)     )   LIMIT 1;
+----+--------------------+-------+-------+----------------------------------------+----------------------------------------+---------+------+---------+--------------------------+
| id | select_type        | table | type  | possible_keys                          | key                                    | key_len | ref  | rows    | Extra                    |
+----+--------------------+-------+-------+----------------------------------------+----------------------------------------+---------+------+---------+--------------------------+
|  1 | PRIMARY            | u     | range | index_users_on_username_and_deleted_at | index_users_on_username_and_deleted_at | 768     | NULL | 4493510 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | u2    | ref   | index_users_on_username_and_deleted_at | index_users_on_username_and_deleted_at | 768     | func |       1 | Using where; Using index |
+----+--------------------+-------+-------+----------------------------------------+----------------------------------------+---------+------+---------+--------------------------+

Use more accurate MySQL query for finding gaps in integer sequence.

wjordan commented Sep 30, 2017

Since the earlier version of this PR has already been code-reviewed, I plan on merging this fix now so that I can run the full set of UI tests against it right away.

@wjordan wjordan merged commit 256c411 into staging Sep 30, 2017
@wjordan wjordan deleted the generate_username_fix branch November 8, 2017 01:49