Fix import query parsing #14
Conversation
Force-pushed from 794e7cc to da3e71a
Force-pushed from 53bc21c to 5d7507f
@JanJakes, thanks for improving the WP-CLI parser.
I tested it with the two problematic SQL dumps, and both were executed correctly 🥳
It makes the import process a bit slower. Similarly to #13, it increases the import time by ~50%.
I tested it by importing a single 60 MB SQL dump file, which took 27 seconds compared to 18 seconds with the previous parser. I think the increase is due to checking the encoding. Maybe we could try executing the query as-is, which will work in most scenarios, and only check the statement's encoding if the query fails.
The good part of streaming the file is that we don't reach the memory limit, and I was able to import a big site.
cc @wojtekn
@JanJakes I've created this PR using your branch as a base. It keeps the same import speed because it converts the encoding only if the execution fails. Could you review it?
…-as-fallback Speedup encoding check by executing it as a fallback on execute_statements
I left one comment.
Besides that, it works as expected: all my test imports, including the problematic queries, work fine.
} catch ( Exception $e ) {
	try {
		// Try converting encoding and retry
		$detected_encoding = mb_detect_encoding( $statement, mb_list_encodings(), true );
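For context, here is a rough sketch of what the full execute-first, convert-on-failure flow discussed above might look like. It is only an illustration of the approach; `execute_statement()` is a hypothetical helper name, not the actual Import.php API.

```php
try {
	$this->execute_statement( $statement );
} catch ( Exception $e ) {
	// Only pay for encoding detection once execution has already failed.
	$detected_encoding = mb_detect_encoding( $statement, mb_list_encodings(), true );
	if ( false === $detected_encoding || 'UTF-8' === $detected_encoding ) {
		// Nothing to convert; surface the original error.
		throw $e;
	}
	// Convert the statement to UTF-8 and retry once.
	$converted_statement = mb_convert_encoding( $statement, 'UTF-8', $detected_encoding );
	$this->execute_statement( $converted_statement );
}
```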
Would it be safer and more reliable to detect the encoding based on the initial part of the file, e.g. a few kilobytes, and then, if it differs, convert all statements?
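A minimal sketch of that idea, assuming the dump path is available as `$sql_file` (a hypothetical variable, not part of this PR):

```php
// Detect the encoding once from the first few kilobytes of the dump.
$handle = fopen( $sql_file, 'rb' );
$sample = fread( $handle, 8 * 1024 );
fclose( $handle );

$file_encoding    = mb_detect_encoding( $sample, mb_list_encodings(), true );
$needs_conversion = $file_encoding && 'UTF-8' !== $file_encoding;

// Later, for every statement read from the stream:
if ( $needs_conversion ) {
	$statement = mb_convert_encoding( $statement, 'UTF-8', $file_encoding );
}
```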
Or, zooming out, is there a better way to detect and fix the encoding earlier in the process, e.g. using Node.js? The problem with mb_detect_encoding is that it's not very reliable: https://www.php.net/manual/en/function.mb-detect-encoding.php

Reading only the first few bytes doesn't reliably identify the correct encoding. We need to read the whole file at once or go statement by statement.
Reading the encoding in Node.js and passing it as an argument to WP-CLI sqlite could be a solution. For efficiency, I'll merge and release this PR, and we can consider future improvements if necessary.
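As a rough sketch of that direction: if the Node.js side detected the encoding up front, the WP-CLI command could accept it via an `--encoding` flag. Both the flag and the command method below are assumptions for illustration only, not an existing WP-CLI sqlite option.

```php
/**
 * Hypothetical: receive the encoding detected by Node.js as an associative argument.
 */
public function import( $args, $assoc_args ) {
	$encoding = isset( $assoc_args['encoding'] ) ? $assoc_args['encoding'] : 'UTF-8';

	// ...stream the dump, converting each statement from $encoding to UTF-8 when needed...
}
```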
Description:
This is a quick fix of import query parsing, replacing #13 for now. It adds multiple fixes and improvements that make the query recognition in the input dump almost complete.
The only known edge cases that aren't supported yet are the NO_BACKSLASH_ESCAPES SQL mode and the MySQL DELIMITER ... statement. We likely won't implement those in the current logic; it's better to make the new query tokenizer fully support streaming so that it handles all edge cases correctly.
Related issues:
Testing Instructions
Update ~/Library/Application Support/Studio/server-files/sqlite-command/src/Import.php with the changes from this PR.