Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further tests for the InlineVerifier #137

Merged
merged 6 commits into from
Sep 30, 2019

Conversation

shuhaowu
Copy link
Contributor

@shuhaowu shuhaowu commented Sep 26, 2019

Is a decently straightforward PR, although I do fix an issue with NULL vs 'NULL'.

  • Tested that InlineVerifier works across multiple charsets and collations.
  • Tested positive and negative zero floating point values.
  • Tested NULL values.

Appendix: Technical analysis of the NULL vs 'NULL' problem

One thing I uncovered is that NULL is being treated the same as 'NULL' by the InlineVerifier. Thus, we're prone to false negatives involving NULL. The cause of the issue is as folllows:

  • To generate a checksum, for each row, we compute the MD5 of each column and concatenate the result. Then we generate yet another MD5 for this concatenated string.
  • MD5 checksum is computed on the mysql server, using the MD5 function. However, MD5(NULL) -> NULL, and CONCAT(NULL, 'data') -> NULL. This means every row that has a NULL column will have a NULL checksum, which is unacceptable.
  • To get around this, we use the COALESCE function, which returns the first non null entry. Specifically, we used MD5(COALASCE(column, 'NULL')). If the column is NULL, we would thus compute MD5('NULL'). This causes the collision with the string 'NULL'.

The current workaround uses a magic string that's unlikely to be found in nature. Although it may be possible to use CONCAT_WS and NULLIF to accomplish this without the magic string. There are some reverted commits for reference. The commit at the end also adds a test case that will be triggered if the naive CONCAT_WS method is implemented in the future.

This reverts commit fb8942f.

CONCAT_WS fails if there are two columns where one column is NULL and
the other is some data on the source and the order of the above is
flipped on the target:

mysql> select concat_ws('', md5(1), md5(NULL), md5('hello'));
+------------------------------------------------------------------+
| concat_ws('', md5(1), md5(NULL), md5('hello'))                   |
+------------------------------------------------------------------+
| c4ca4238a0b923820dcc509a6f75849b5d41402abc4b2a76b9719d911017c592 |
+------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select concat_ws('', md5(1), md5('hello'), md5(NULL));
+------------------------------------------------------------------+
| concat_ws('', md5(1), md5('hello'), md5(NULL))                   |
+------------------------------------------------------------------+
| c4ca4238a0b923820dcc509a6f75849b5d41402abc4b2a76b9719d911017c592 |
+------------------------------------------------------------------+
1 row in set (0.00 sec)
Used a magic string to make it much less likely.
To prevent a naive CONCAT_WS introducing a regression.
@shuhaowu
Copy link
Contributor Author

I'm going to merge this for now. If there are any other concerns/comments about this, we can address it in a follow up PR.

@shuhaowu shuhaowu merged commit 7c0424d into Shopify:master Sep 30, 2019
@shuhaowu shuhaowu deleted the inline-verifier-further-tests branch September 30, 2019 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant