Fix import query parsing #14
Conversation
Force-pushed from 794e7cc to da3e71a
Force-pushed from 53bc21c to 5d7507f
@JanJakes, thanks for improving the WP-CLI parser.
I tested it with the two problematic SQL dumps, and both were executed correctly 🥳
It makes the import process a bit slower. Similarly to #13, it increases the import time by ~50%.
I tested it by importing a single 60 MB SQL dump file, which took 27 seconds compared to 18 seconds with the previous parser. I think the increase is due to checking the encoding. Maybe we could try executing the query as-is, which will work in most scenarios, and only check the statement's encoding if the query fails.
The good part of streaming the file is that we don't reach the memory limit, and I was able to import a big site.
cc @wojtekn
@JanJakes I've created this PR using your branch as a base. It keeps the same import speed because it converts the encoding only if the execution fails. Could you review it?
…-as-fallback Speedup encoding check by executing it as a fallback on execute_statements
I left one comment.
Besides that, it works as expected: all my test imports, including the problematic queries, work fine.
} catch ( Exception $e ) {
	try {
		// Try converting encoding and retry
		$detected_encoding = mb_detect_encoding( $statement, mb_list_encodings(), true );
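For context, here is a rough sketch of what the full execute-first, convert-on-failure flow discussed above might look like. It is only an illustration of the approach; `execute_statement()` is a hypothetical helper name, not the actual Import.php API.

```php
try {
	$this->execute_statement( $statement );
} catch ( Exception $e ) {
	// Only pay for encoding detection once execution has already failed.
	$detected_encoding = mb_detect_encoding( $statement, mb_list_encodings(), true );
	if ( false === $detected_encoding || 'UTF-8' === $detected_encoding ) {
		// Nothing to convert; surface the original error.
		throw $e;
	}
	// Convert the statement to UTF-8 and retry once.
	$converted_statement = mb_convert_encoding( $statement, 'UTF-8', $detected_encoding );
	$this->execute_statement( $converted_statement );
}
```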
Would it be safer and more reliable to detect the encoding based on the initial part of the file, e.g. a few kilobytes, and then, if it differs, convert all statements?
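A minimal sketch of that idea, assuming the dump path is available as `$sql_file` (a hypothetical variable, not part of this PR):

```php
// Detect the encoding once from the first few kilobytes of the dump.
$handle = fopen( $sql_file, 'rb' );
$sample = fread( $handle, 8 * 1024 );
fclose( $handle );

$file_encoding    = mb_detect_encoding( $sample, mb_list_encodings(), true );
$needs_conversion = $file_encoding && 'UTF-8' !== $file_encoding;

// Later, for every statement read from the stream:
if ( $needs_conversion ) {
	$statement = mb_convert_encoding( $statement, 'UTF-8', $file_encoding );
}
```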
Or, zooming out, is there a better way to detect and fix the encoding earlier in the process, e.g. using Node.js? The problem with mb_detect_encoding is that it's not very reliable: https://www.php.net/manual/en/function.mb-detect-encoding.php

Reading only the first few bytes doesn't reliably identify the correct encoding. We need to read the whole file at once or go statement by statement.
Reading the encoding in Node.js and passing it as an argument to WP-CLI sqlite could be a solution. For efficiency, I'll merge and release this PR, and we can consider future improvements if necessary.
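As a rough sketch of that direction: if the Node.js side detected the encoding up front, the WP-CLI command could accept it via an `--encoding` flag. Both the flag and the command method below are assumptions for illustration only, not an existing WP-CLI sqlite option.

```php
/**
 * Hypothetical: receive the encoding detected by Node.js as an associative argument.
 */
public function import( $args, $assoc_args ) {
	$encoding = isset( $assoc_args['encoding'] ) ? $assoc_args['encoding'] : 'UTF-8';

	// ...stream the dump, converting each statement from $encoding to UTF-8 when needed...
}
```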
Description:
This is a quick fix of import query parsing, replacing #13 for now. It adds multiple fixes and improvements that make the query recognition in the input dump almost complete.
The only known edge cases that aren't supported yet are the NO_BACKSLASH_ESCAPES SQL mode and the MySQL DELIMITER ... statement. We likely won't implement those in the current logic; it's better to make the new query tokenizer fully support streaming so that it handles all edge cases correctly.
Related issues:
Testing Instructions
Update ~/Library/Application Support/Studio/server-files/sqlite-command/src/Import.php with the changes from this PR.