Skip to content

Conversation

JanJakes
Copy link
Contributor

@JanJakes JanJakes commented Oct 9, 2025

Description:

This is a quick fix of import query parsing, replacing #13 for now. It adds multiple fixes and improvements that make the query recognition in the input dump almost complete.

The only known edge cases that aren't supported yet are the usage of NO_BACKSLASH_ESCAPES SQL mode, and the MySQL DELIMITER ... statement. We likely won't implement those in the current logic—it's better to make the new query tokenizer fully support streaming to handle all edge cases correctly.

Related issues:

Testing Instructions

@JanJakes JanJakes force-pushed the fix-import-query-parsing branch from 794e7cc to da3e71a Compare October 9, 2025 12:25
@JanJakes JanJakes force-pushed the fix-import-query-parsing branch from 53bc21c to 5d7507f Compare October 10, 2025 06:20
@JanJakes JanJakes requested review from sejas and wojtekn October 10, 2025 06:28
Copy link
Member

@sejas sejas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JanJakes, thanks for improving the WP-CLI parser.
I tested it with the two problematic SQL dumps, and both were executed correctly 🥳

It makes the import process a bit slower. Similarly to #13, it increases the import time by ~50%.
I tested it by importing a single 60MB SQL dump file which took 27 seconds compared to 18 seconds with the previous parser. I think the increase is due to checking the encoding. Maybe we could try executing the query as it is, which will work in most scenarios and if the query fails, we could then check the encoding of the statement.

The good part of streaming the file is that we don't reach the memory limit, and I was able to import a big site.

cc @wojtekn

@sejas
Copy link
Member

sejas commented Oct 10, 2025

I tested it by importing a single 60MB SQL dump file which took 27 seconds compared to 18 seconds with the previous parser.

@JanJakes I've created this PR using your branch as base. It keeps the same speed during the import because it converts the encoding only if the execution fails. Could you review it?

…-as-fallback

Speedup encoding check by executing it as a fallback on execute_statements
@JanJakes
Copy link
Contributor Author

@sejas @wojtekn This seems to be done and approved. I'm not sure if I should merge, so please, go ahead anytime.

@sejas
Copy link
Member

sejas commented Oct 10, 2025

Thank you @JanJakes ! @wojtekn will take another look and after that I'll merge it and create a new wp-cli sqlite release and close STU-844 and other issues.

Copy link
Contributor

@wojtekn wojtekn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one comment.

Besides that, it works as expected - all my test imports, along with problematic queries, work fine.

} catch ( Exception $e ) {
try {
// Try converting encoding and retry
$detected_encoding = mb_detect_encoding( $statement, mb_list_encodings(), true );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be safer and more reliable to detect encoding based on initial part of the file e.g. a few kilobytes and then if it differs, convert it for all statements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, zooming out, is there a better way to detect and fix encoding earlier in the process, e.g., using Node.js? The problem with mb_detect_encoding is that it's not very reliable: https://www.php.net/manual/en/function.mb-detect-encoding.php

Screenshot 2025-10-10 at 14 53 12

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading only the first bytes doesn't identify the correct encoding. We need to read the whole file at once or statement by statement.

Reading the encoding in Node.js and passing it as an argument to WP-CLI sqlite could be a solution. For efficiency, I'll merge and release this PR, and we can consider future improvements if necessary.

@sejas sejas merged commit 83be865 into main Oct 10, 2025
34 checks passed
@sejas sejas deleted the fix-import-query-parsing branch October 10, 2025 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Failed to parse the MySQL query error WP_MySQL_Token::get_value() throws error when importing wp_aioseo_notifications SQL query

3 participants